Kaggle Data Hub – Telegram
29K subscribers
866 photos
14 videos
309 files
1.13K links
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
Handwritten Mathematical Expression Convert LaTeX

The Azu/Handwritten-Mathematical-Expression-Convert-LaTeX dataset on Hugging Face is designed for converting handwritten mathematical expressions into LaTeX code. It includes image-text pairs, where each image contains a handwritten mathematical expression, and the corresponding text is its LaTeX representation. The dataset is suitable for training models in Handwritten Mathematical Expression Recognition (HMER) tasks, combining computer vision and natural language processing techniques.

With between 10K and 100K samples, it supports tasks such as image-to-sequence modeling and LaTeX code generation. The dataset is formatted as an imagefolder, making it easy to integrate into machine learning pipelines.
#KaggleDatasets #DataScience #MachineLearning #DataAnalysis #DataVisualization #OpenData #DataCleaning #TextClassification #NLP #SentimentAnalysis #BigData #APIAutomation #DataLicensing #SocialMediaData #PythonIntegration #DataModeling #kaggle #ComputerVision #python #LLM #DeepLearning #Pytorch #HuggingFace #Dataset

https://news.1rj.ru/str/datasets1 🎁
Volleyball Ball Object Detection Dataset

Volleyball Court Images + Ball Object Detection Annotations

This dataset comprises volleyball court images and their ball object detection annotations.
The images were annotated precisely in order to train a yolov8x model to detect the ball in volleyball matches.
It is one of three datasets used to train yolov8x models for my project; the other two cover volleyball player and referee object detection and volleyball court key point regression.
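
As an illustration of how such annotations are typically consumed, here is a minimal sketch that converts one label line into a pixel-space bounding box. It assumes the annotations use the common YOLO text format (`class x_center y_center width height`, all normalized to 0..1); the post does not state the format, and the function name `yolo_to_pixel_box` is hypothetical.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO-format label line to pixel coordinates.

    Assumes the line is "class x_center y_center width height" with the
    four box values normalized to the range 0..1 (an assumption about
    this dataset, not something the post confirms).
    Returns (class_id, x_min, y_min, x_max, y_max).
    """
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized centre and size back to pixels.
    x_c, w = float(x_c) * image_width, float(w) * image_width
    y_c, h = float(y_c) * image_height, float(h) * image_height
    # Move from centre/size form to corner form.
    return (int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


# Illustrative label line (made up, not taken from the dataset).
box = yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", image_width=1920, image_height=1080)
```

Corner coordinates are what most drawing and cropping utilities expect, which is why the sketch converts away from the centre/size form.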

archive.zip
32.6 MB
Volleyball Ball Object Detection Dataset

Gender Recognition by Voice (processed)

Helps identify male and female voices.

Features:

The dataset includes the following extracted audio features:

mean_spectral_centroid: The average spectral centroid, representing the "center of mass" of the spectrum, indicating brightness.
std_spectral_centroid: The standard deviation of the spectral centroid, measuring variability in brightness.
mean_spectral_bandwidth: The average width of the spectrum, reflecting how spread out the frequencies are.
std_spectral_bandwidth: The standard deviation of spectral bandwidth, indicating variability in frequency spread.
mean_spectral_contrast: The average difference between peaks and valleys in the spectrum, indicating tonal contrast.
mean_spectral_flatness: The average flatness of the spectrum, measuring the noisiness of the signal.
mean_spectral_rolloff: The average frequency below which a specified percentage of the spectral energy resides, indicating sharpness.
zero_crossing_rate: The rate at which the signal crosses the zero amplitude axis, representing noisiness or percussiveness.
rms_energy: The root mean square energy of the signal, reflecting its loudness.
mean_pitch: The average pitch frequency of the audio.
min_pitch: The minimum pitch frequency.
max_pitch: The maximum pitch frequency.
std_pitch: The standard deviation of pitch frequency, measuring variability in pitch.
spectral_skew: The skewness of the spectral distribution, indicating asymmetry.
spectral_kurtosis: The kurtosis of the spectral distribution, indicating the peakiness of the spectrum.
energy_entropy: The entropy of the signal energy, representing its randomness.
log_energy: The logarithmic energy of the signal, a compressed representation of energy.
mfcc_1_mean to mfcc_13_mean: The mean of the first 13 Mel Frequency Cepstral Coefficients (MFCCs), representing the timbral characteristics of the audio.
mfcc_1_std to mfcc_13_std: The standard deviation of the first 13 MFCCs, indicating variability in timbral features.
label: The target variable indicating gender: male (1) or female (0).
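
To make a few of these feature definitions concrete, here is a minimal sketch that computes a zero crossing rate, an RMS energy, and a spectral centroid for a synthetic tone. It is an illustration only: the dataset's own extraction pipeline is not described in the post, the function names are hypothetical, and the naive DFT is chosen for readability rather than speed.

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def rms_energy(signal):
    """Root mean square of the samples, a simple loudness measure."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(signal)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


# Synthetic 100 Hz sine sampled at 800 Hz (80 samples, i.e. 10 full periods).
# The phase offset keeps samples away from exact zeros.
sample_rate = 800
tone = [
    math.sin(2 * math.pi * 100 * t / sample_rate + math.pi / 8)
    for t in range(80)
]

zcr = zero_crossing_rate(tone)
rms = rms_energy(tone)
centroid = spectral_centroid(tone, sample_rate)
```

For a pure tone, the centroid sits at the tone's frequency and the RMS energy is the amplitude divided by the square root of two, which makes this a convenient sanity check before applying the same functions to real recordings.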

Lightening Strikes Dataset NOAA

2018 lightning strike data by National Oceanic and Atmospheric Administration

Dataset Description: NOAA Lightning Strikes Dataset
The NOAA (National Oceanic and Atmospheric Administration) Lightning Strikes dataset provides insights into lightning activity over a given region or time period. This dataset is a product of NOAA's weather monitoring and storm tracking systems, offering valuable information for meteorologists, researchers, and disaster management authorities.

Amazon reviews - Full
amazon_review_full_csv.tgz
613.9 MB
Amazon reviews - Full

Abstract:
34,686,770 Amazon reviews from 6,643,669 users on 2,441,053 products, from the Stanford Network Analysis Project (SNAP). This full dataset contains 600,000 training samples and 130,000 testing samples in each class.
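
For readers who want to iterate over the reviews without unpacking the archive, here is a minimal sketch. It assumes each CSV row has three columns (class index, review title, review text) and that the archive member is named something like `amazon_review_full_csv/train.csv`; both are assumptions about the archive layout rather than facts stated in the post, and `iter_reviews` is a hypothetical helper.

```python
import csv
import io
import tarfile


def iter_reviews(tgz_file, member_name):
    """Yield (label, title, text) tuples from a CSV inside a .tgz archive.

    tgz_file is a binary file object for the archive; member_name is the
    path of the CSV inside it (assumed, check the archive listing).
    """
    with tarfile.open(fileobj=tgz_file, mode="r:gz") as tar:
        raw = tar.extractfile(member_name)
        # Decode bytes lazily so the whole file is never held in memory.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for label, title, body in csv.reader(text):
            yield int(label), title, body
```

A typical call would open `amazon_review_full_csv.tgz` in binary mode and pass the handle together with the member path; streaming this way avoids extracting the 613.9 MB archive to disk first.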

Liver Cancer Predictions

A Global Look at Liver Cancer Cases, Risk Factors, and Healthcare Access

About Dataset
This dataset contains information about liver cancer predictions in the 30 most populated countries. It includes factors such as age, gender, alcohol use, healthcare access, and screening availability. The data helps understand how liver cancer spreads, which groups are at higher risk, and how healthcare differences affect survival chances.
Oral Cancer Prediction Dataset – Top 30 Countries

📊 160,292 Records | 25 Factors | High-Risk Groups Analyzed

This dataset provides information on oral cancer cases from the 30 most populated countries. It considers key risk factors such as age, gender, tobacco/alcohol use, socioeconomic background, and diagnosis stage. The dataset helps researchers and healthcare experts understand trends in oral cancer and predict outcomes based on patient details.
Thyroid Cancer Risk Dataset

Assessing Thyroid Cancer Risk Through Key Health Indicators

This dataset contains 212,691 records related to thyroid cancer risk factors. It includes demographic information, clinical history, lifestyle factors, and key thyroid hormone levels for assessing the likelihood of thyroid cancer. The dataset can be useful for machine learning models that aim to predict thyroid cancer risk from various indicators.

Column Descriptions:
Patient_ID (int): Unique identifier for each patient.
Age (int): Age of the patient.
Gender (object): Patient’s gender (Male/Female).
Country (object): Country of residence.
Ethnicity (object): Patient’s ethnic background.
Family_History (object): Whether the patient has a family history of thyroid cancer (Yes/No).
Radiation_Exposure (object): History of radiation exposure (Yes/No).
Iodine_Deficiency (object): Presence of iodine deficiency (Yes/No).
Smoking (object): Whether the patient smokes (Yes/No).
Obesity (object): Whether the patient is obese (Yes/No).
Diabetes (object): Whether the patient has diabetes (Yes/No).
TSH_Level (float): Thyroid-Stimulating Hormone level (µIU/mL).
T3_Level (float): Triiodothyronine level (ng/dL).
T4_Level (float): Thyroxine level (µg/dL).
Nodule_Size (float): Size of thyroid nodules (cm).
Thyroid_Cancer_Risk (object): Estimated risk of thyroid cancer (Low/Medium/High).
Diagnosis (object): Final diagnosis (Benign/Malignant).
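
The column types listed above can be applied when loading the data. The following is a minimal sketch using only the standard library; the sample row is made up purely for illustration (it is not taken from the dataset), and the helper names `parse_row` and `load_rows` are hypothetical.

```python
import csv
import io

# Column groups taken from the column descriptions in the post.
INT_COLUMNS = {"Patient_ID", "Age"}
FLOAT_COLUMNS = {"TSH_Level", "T3_Level", "T4_Level", "Nodule_Size"}
YES_NO_COLUMNS = {
    "Family_History", "Radiation_Exposure", "Iodine_Deficiency",
    "Smoking", "Obesity", "Diabetes",
}


def parse_row(row):
    """Convert one CSV row (all strings) to typed Python values."""
    parsed = {}
    for name, value in row.items():
        if name in INT_COLUMNS:
            parsed[name] = int(value)
        elif name in FLOAT_COLUMNS:
            parsed[name] = float(value)
        elif name in YES_NO_COLUMNS:
            parsed[name] = value == "Yes"  # Yes/No becomes True/False
        else:
            parsed[name] = value  # categorical columns stay as text
    return parsed


def load_rows(handle):
    """Read every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Made-up sample standing in for the real file (illustrative values only).
SAMPLE = (
    "Patient_ID,Age,Gender,Country,Ethnicity,Family_History,"
    "Radiation_Exposure,Iodine_Deficiency,Smoking,Obesity,Diabetes,"
    "TSH_Level,T3_Level,T4_Level,Nodule_Size,Thyroid_Cancer_Risk,Diagnosis\n"
    "1,50,Female,Exampleland,Example,No,Yes,No,No,No,No,"
    "2.5,1.5,8.0,1.25,Low,Benign\n"
)
rows = load_rows(io.StringIO(SAMPLE))
```

Converting the Yes/No columns to booleans up front keeps later filtering and model preparation free of repeated string comparisons.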