Kaggle Data Hub – Telegram
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
Handwritten Mathematical Expression Convert LaTeX

The Azu/Handwritten-Mathematical-Expression-Convert-LaTeX dataset on Hugging Face is designed for converting handwritten mathematical expressions into LaTeX code. It includes image-text pairs, where each image contains a handwritten mathematical expression, and the corresponding text is its LaTeX representation. The dataset is suitable for training models in Handwritten Mathematical Expression Recognition (HMER) tasks, combining computer vision and natural language processing techniques.

With between 10K and 100K samples, it supports tasks such as image-to-sequence modeling and LaTeX code generation. The dataset is formatted as an imagefolder, making it easy to integrate into machine learning pipelines.
#KaggleDatasets #DataScience #MachineLearning #DataAnalysis #DataVisualization #OpenData #DataCleaning #TextClassification #NLP #SentimentAnalysis #BigData #APIAutomation #DataLicensing #SocialMediaData #PythonIntegration #DataModeling #kaggle #ComputerVision #python #LLM #DeepLearning #Pytorch #HuggingFace #Dataset

https://news.1rj.ru/str/datasets1 🎁
Volleyball Ball Object Detection Dataset

Volleyball Court Images + Ball Object Detection Annotations

This dataset comprises volleyball court images with ball object detection annotations.
The annotations were made precisely enough to train a YOLOv8x model to detect the ball in volleyball matches.
It is one of three datasets used to train YOLOv8x models for my project; the other two cover volleyball player and referee object detection and volleyball court keypoint regression.

archive.zip
32.6 MB
Volleyball Ball Object Detection Dataset

Gender Recognition by Voice (processed)

Helps identify male and female voices.

Features:

The dataset includes the following extracted audio features:

mean_spectral_centroid: The average spectral centroid, representing the "center of mass" of the spectrum, indicating brightness.
std_spectral_centroid: The standard deviation of the spectral centroid, measuring variability in brightness.
mean_spectral_bandwidth: The average width of the spectrum, reflecting how spread out the frequencies are.
std_spectral_bandwidth: The standard deviation of spectral bandwidth, indicating variability in frequency spread.
mean_spectral_contrast: The average difference between peaks and valleys in the spectrum, indicating tonal contrast.
mean_spectral_flatness: The average flatness of the spectrum, measuring the noisiness of the signal.
mean_spectral_rolloff: The average frequency below which a specified percentage of the spectral energy resides, indicating sharpness.
zero_crossing_rate: The rate at which the signal crosses the zero amplitude axis, representing noisiness or percussiveness.
rms_energy: The root mean square energy of the signal, reflecting its loudness.
mean_pitch: The average pitch frequency of the audio.
min_pitch: The minimum pitch frequency.
max_pitch: The maximum pitch frequency.
std_pitch: The standard deviation of pitch frequency, measuring variability in pitch.
spectral_skew: The skewness of the spectral distribution, indicating asymmetry.
spectral_kurtosis: The kurtosis of the spectral distribution, indicating the peakiness of the spectrum.
energy_entropy: The entropy of the signal energy, representing its randomness.
log_energy: The logarithmic energy of the signal, a compressed representation of energy.
mfcc_1_mean to mfcc_13_mean: The mean of the first 13 Mel Frequency Cepstral Coefficients (MFCCs), representing the timbral characteristics of the audio.
mfcc_1_std to mfcc_13_std: The standard deviation of the first 13 MFCCs, indicating variability in timbral features.
label: The target variable indicating the gender male(1) or female(0).
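
A few of the features above can be sketched in plain Python. This is only an illustration of the definitions, not the pipeline used to build the dataset: the function names, the test tone, and the naive DFT are assumptions made for the example, and the original extraction settings (frame size, windowing, library) are not stated in the post.

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def rms_energy(signal):
    """Root mean square of the samples, a rough loudness measure."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency, using a naive O(n^2) DFT."""
    n = len(signal)
    half = n // 2 + 1  # non-negative frequency bins only
    magnitudes = []
    for k in range(half):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    freqs = [k * sample_rate / n for k in range(half)]
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


# Hypothetical input: a pure 100 Hz tone sampled at 800 Hz (10 full periods).
sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(80)]

features = {
    "zero_crossing_rate": zero_crossing_rate(tone),
    "rms_energy": rms_energy(tone),
    "mean_spectral_centroid": spectral_centroid(tone, sample_rate),
}
print(features)
```

For a pure tone the centroid sits at the tone's frequency and the RMS energy is close to 1/sqrt(2) of the amplitude. Real voice recordings would normally be split into short frames, with the mean and standard deviation taken across frames.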

Lightning Strikes Dataset NOAA

2018 lightning strike data by National Oceanic and Atmospheric Administration

Dataset Description: NOAA Lightning Strikes Dataset
The NOAA (National Oceanic and Atmospheric Administration) Lightning Strikes dataset provides insights into lightning activity over a given region or time period. This dataset is a product of NOAA's weather monitoring and storm tracking systems, offering valuable information for meteorologists, researchers, and disaster management authorities.

Amazon reviews - Full
amazon_review_full_csv.tgz
613.9 MB
Amazon reviews - Full

Abstract:
34,686,770 Amazon reviews from 6,643,669 users on 2,441,053 products, from the Stanford Network Analysis Project (SNAP). This full dataset contains 600,000 training samples and 130,000 testing samples in each class.
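
The per-class figures above can be turned into split sizes with a little arithmetic. The sketch below assumes five classes (star ratings 1 to 5) and a CSV layout of class index, review title, review text; neither is stated in the post, so treat both as assumptions to check against the extracted files. The sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

TRAIN_PER_CLASS = 600_000  # from the post
TEST_PER_CLASS = 130_000   # from the post
NUM_CLASSES = 5            # assumption: one class per star rating, 1 to 5


def split_sizes(num_classes=NUM_CLASSES):
    """Total training and testing sample counts across all classes."""
    return TRAIN_PER_CLASS * num_classes, TEST_PER_CLASS * num_classes


def class_counts(csv_file):
    """Count rows per class, assuming the class index is the first column."""
    reader = csv.reader(csv_file)
    return Counter(int(row[0]) for row in reader if row)


# Invented rows in the assumed layout: class index, title, review text.
sample = io.StringIO(
    '"5","Great","Works well"\n'
    '"1","Bad","Broke, twice"\n'
    '"5","Nice","Good value"\n'
)

train_total, test_total = split_sizes()
counts = class_counts(sample)
print(train_total, test_total, dict(counts))
```

To check class balance on the real data, pass an open file handle for the extracted training CSV to `class_counts` instead of the in-memory sample.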

Liver Cancer Predictions

A Global Look at Liver Cancer Cases, Risk Factors, and Healthcare Access

About Dataset
This dataset contains information about liver cancer predictions in the 30 most populated countries. It includes factors such as age, gender, alcohol use, healthcare access, and screening availability. The data helps understand how liver cancer spreads, which groups are at higher risk, and how healthcare differences affect survival chances.
Oral Cancer Prediction Dataset – Top 30 Countries

📊 160,292 Records | 25 Factors | High-Risk Groups Analyzed

This dataset provides information on oral cancer cases from the 30 most populated countries. It considers key risk factors such as age, gender, tobacco/alcohol use, socioeconomic background, and diagnosis stage. The dataset helps researchers and healthcare experts understand trends in oral cancer and predict outcomes based on patient details.
Thyroid Cancer Risk Dataset

Assessing Thyroid Cancer Risk Through Key Health Indicators

This dataset contains 212,691 records related to thyroid cancer risk factors. It includes demographic information, clinical history, lifestyle factors, and key thyroid hormone levels to assess the probability of thyroid cancer. The dataset may be useful for machine learning models that aim to predict thyroid cancer risk from these indicators.

Column Descriptions:
Patient_ID (int): Unique identifier for each patient.
Age (int): Age of the patient.
Gender (object): Patient’s gender (Male/Female).
Country (object): Country of residence.
Ethnicity (object): Patient’s ethnic background.
Family_History (object): Whether the patient has a family history of thyroid cancer (Yes/No).
Radiation_Exposure (object): History of radiation exposure (Yes/No).
Iodine_Deficiency (object): Presence of iodine deficiency (Yes/No).
Smoking (object): Whether the patient smokes (Yes/No).
Obesity (object): Whether the patient is obese (Yes/No).
Diabetes (object): Whether the patient has diabetes (Yes/No).
TSH_Level (float): Thyroid-Stimulating Hormone level (µIU/mL).
T3_Level (float): Triiodothyronine level (ng/dL).
T4_Level (float): Thyroxine level (µg/dL).
Nodule_Size (float): Size of thyroid nodules (cm).
Thyroid_Cancer_Risk (object): Estimated risk of thyroid cancer (Low/Medium/High).
Diagnosis (object): Final diagnosis (Benign/Malignant).
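
As a rough sketch of how the columns above might be prepared for a model, the example below turns one record into a numeric feature list and a binary target. The encoding choices (Yes/No as 1/0, risk levels as 0/1/2, Malignant as 1) and the sample record are assumptions for illustration, not part of the dataset. Gender, Country, and Ethnicity are left out because they would need a categorical encoding.

```python
NUMERIC_COLUMNS = ["Age", "TSH_Level", "T3_Level", "T4_Level", "Nodule_Size"]
YES_NO_COLUMNS = [
    "Family_History",
    "Radiation_Exposure",
    "Iodine_Deficiency",
    "Smoking",
    "Obesity",
    "Diabetes",
]
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}  # assumed ordinal encoding


def encode_record(record):
    """Return (features, target) for one row given as a dict of column values."""
    features = [float(record[name]) for name in NUMERIC_COLUMNS]
    features += [1.0 if record[name] == "Yes" else 0.0 for name in YES_NO_COLUMNS]
    features.append(float(RISK_ORDER[record["Thyroid_Cancer_Risk"]]))
    target = 1 if record["Diagnosis"] == "Malignant" else 0
    return features, target


# Invented record that follows the column descriptions above.
example = {
    "Patient_ID": 1,
    "Age": 52,
    "Family_History": "Yes",
    "Radiation_Exposure": "No",
    "Iodine_Deficiency": "No",
    "Smoking": "No",
    "Obesity": "Yes",
    "Diabetes": "No",
    "TSH_Level": 2.4,
    "T3_Level": 1.1,
    "T4_Level": 8.3,
    "Nodule_Size": 1.7,
    "Thyroid_Cancer_Risk": "Medium",
    "Diagnosis": "Benign",
}

features, target = encode_record(example)
print(features, target)
```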
Hand Gesture Detection System

HandMimic - An Advanced Hand Gesture Recognition System

Problem Statement
As a data scientist at a leading home electronics company, my goal is to create an innovative gesture control feature for smart televisions. By utilizing a webcam mounted on the TV, the system will recognize five specific gestures, enabling users to interact with the TV hands-free, without needing a remote control.

The five gestures and their corresponding actions are:

Thumbs Up: Increases the volume.
Thumbs Down: Decreases the volume.
Left Swipe: Rewinds the content by 10 seconds.
Right Swipe: Jumps forward by 10 seconds.
Stop: Pauses the content.
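
The gesture-to-action mapping above could be wired to a player roughly as follows. This is a minimal sketch: the `TvState` class, the gesture label strings, and the one-unit volume step are hypothetical; only the 10-second seek comes from the post.

```python
from dataclasses import dataclass


@dataclass
class TvState:
    volume: int = 10
    position_s: int = 60
    paused: bool = False


SEEK_SECONDS = 10  # from the post
VOLUME_STEP = 1    # assumption: the post does not give a step size


def apply_gesture(state, gesture):
    """Update the TV state in place for one recognized gesture label."""
    if gesture == "thumbs_up":
        state.volume += VOLUME_STEP
    elif gesture == "thumbs_down":
        state.volume = max(0, state.volume - VOLUME_STEP)
    elif gesture == "left_swipe":
        state.position_s = max(0, state.position_s - SEEK_SECONDS)
    elif gesture == "right_swipe":
        state.position_s += SEEK_SECONDS
    elif gesture == "stop":
        state.paused = True
    else:
        raise ValueError(f"unknown gesture: {gesture}")
    return state


state = TvState()
for label in ["thumbs_up", "right_swipe", "stop"]:
    apply_gesture(state, label)
print(state)
```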

Machine learning algorithms will train the system to recognize these gestures in real-time using the webcam, providing seamless interaction and enhancing the overall user experience.

Objectives
The primary objective is to develop a gesture-based control feature for smart TVs, enabling users to adjust volume, skip, rewind, and pause content using five distinct gestures detected by a webcam. Machine learning will be employed to train the model to recognize these gestures instantly, offering a hands-free and intuitive TV experience.

Understanding the Dataset
The training dataset consists of several hundred videos, each categorized into one of five gesture classes. Each video lasts 2-3 seconds, divided into 30 frames (images). Captured by various individuals performing the gestures in front of a webcam, these videos simulate real-world smart TV use. The gestures include thumbs up, thumbs down, left swipe, right swipe, and stop, and serve as individual training samples for the gesture recognition model.

Generator
The generator will preprocess the video data by cropping, resizing, and normalizing it to ensure proper formatting before passing it to the model. The generator should efficiently process batches of video data, ensuring smooth training without errors.
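
A generator along these lines might look like the sketch below. It assumes each video is a list of grayscale frames stored as nested lists of 0-255 pixel values, and it covers only center cropping, normalization, and batching. Resizing is left out, and a real pipeline would normally use an array library rather than lists. All names are hypothetical.

```python
def center_crop(frame, size):
    """Crop a size x size square from the middle of a 2D frame."""
    height, width = len(frame), len(frame[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def normalize(frame):
    """Scale 0-255 pixel values into the range 0.0-1.0."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


def batch_generator(videos, labels, batch_size, crop_size):
    """Yield (batch_of_videos, batch_of_labels), preprocessing every frame."""
    for start in range(0, len(videos), batch_size):
        chunk = videos[start:start + batch_size]
        batch = [
            [normalize(center_crop(frame, crop_size)) for frame in video]
            for video in chunk
        ]
        yield batch, labels[start:start + batch_size]


# Tiny invented input: 3 videos, 2 frames each, 4x4 pixels, all white.
white_frame = [[255] * 4 for _ in range(4)]
videos = [[white_frame, white_frame] for _ in range(3)]
labels = [0, 3, 4]

batches = list(batch_generator(videos, labels, batch_size=2, crop_size=2))
print(len(batches), [len(batch) for batch, _ in batches])
```

The last batch is smaller when the number of videos is not a multiple of the batch size, so the training loop needs to accept a short final batch.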

Model
The objective is to create a model that trains efficiently with minimal inference time. The model’s architecture should be optimized to balance performance and speed, with fewer parameters leading to faster predictions. The model will be evaluated based on accuracy in recognizing gestures, starting with a small dataset to assess initial performance before scaling up.
Car Detection and Tracking Dataset

499 Images of Car Dataset with Text Annotation

About Dataset

This dataset contains 499 images, each with bounding box annotations for cars.
The annotations are provided in the YOLO text format, which includes class labels and bounding box coordinates.
This dataset is useful for object detection tasks such as vehicle recognition and traffic analysis.
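
In the YOLO text format, each annotation line holds a class index followed by the box center x, center y, width, and height, all normalized to the image size. The sketch below converts one such line to pixel corner coordinates. The function name, the sample line, and the 640x480 image size are made up for illustration.

```python
def yolo_to_pixels(line, image_width, image_height):
    """Convert 'class xc yc w h' (normalized) to (class_id, (x_min, y_min, x_max, y_max))."""
    class_id, x_center, y_center, width, height = line.split()
    x_center = float(x_center) * image_width
    y_center = float(y_center) * image_height
    width = float(width) * image_width
    height = float(height) * image_height
    x_min = x_center - width / 2
    y_min = y_center - height / 2
    x_max = x_center + width / 2
    y_max = y_center + height / 2
    return int(class_id), (x_min, y_min, x_max, y_max)


# Hypothetical annotation for a 640x480 image.
box = yolo_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480)
print(box)  # (0, (240.0, 120.0, 400.0, 360.0))
```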