Kaggle Data Hub – Telegram
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
Lightning Strikes Dataset NOAA

2018 lightning strike data by National Oceanic and Atmospheric Administration

Dataset Description: NOAA Lightning Strikes Dataset
The NOAA (National Oceanic and Atmospheric Administration) Lightning Strikes dataset provides insights into lightning activity over a given region or time period. This dataset is a product of NOAA's weather monitoring and storm tracking systems, offering valuable information for meteorologists, researchers, and disaster management authorities.

https://news.1rj.ru/str/datasets1
Amazon reviews - Full
amazon_review_full_csv.tgz
613.9 MB

Abstract:
34,686,770 Amazon reviews from 6,643,669 users on 2,441,053 products, from the Stanford Network Analysis Project (SNAP). This full dataset contains 600,000 training samples and 130,000 testing samples in each class.
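
A minimal sketch of how the per-class sample counts could be checked straight from the archive, without unpacking it to disk. The archive layout is an assumption, not something stated in the post: the code assumes a member whose name ends in train.csv and that the class label is the first CSV column. The function name count_labels is hypothetical.

```python
import csv
import io
import tarfile
from collections import Counter


def count_labels(tgz_file, member_suffix="train.csv"):
    """Count rows per class label in a CSV stored inside a .tgz archive.

    Assumptions (not confirmed by the post): the archive holds a member
    ending in `member_suffix`, and the label is the first column.
    """
    counts = Counter()
    with tarfile.open(fileobj=tgz_file, mode="r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(member_suffix)):
                continue
            raw = tar.extractfile(member)
            # Decode lazily so the full file never has to sit in memory.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if row:  # skip blank lines
                    counts[row[0]] += 1
    return counts


# Usage (assumes the archive has been downloaded locally):
#   with open("amazon_review_full_csv.tgz", "rb") as fh:
#       print(count_labels(fh))
```

If the assumptions hold, each class in the training file should show the 600,000 samples quoted above.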

#KaggleDatasets #DataScience #MachineLearning #DataAnalysis #DataVisualization #OpenData #DataCleaning #TextClassification #NLP #SentimentAnalysis #BigData #APIAutomation #DataLicensing #SocialMediaData #PythonIntegration #DataModeling #kaggle #ComputerVision #python #LLM #DeepLearning #Pytorch #HuggingFace #Dataset

Liver Cancer Predictions

A Global Look at Liver Cancer Cases, Risk Factors, and Healthcare Access

About Dataset
This dataset contains information about liver cancer predictions in the 30 most populated countries. It includes factors such as age, gender, alcohol use, healthcare access, and screening availability. The data helps understand how liver cancer spreads, which groups are at higher risk, and how healthcare differences affect survival chances.
Oral Cancer Prediction Dataset – Top 30 Countries

📊 160,292 Records | 25 Factors | High-Risk Groups Analyzed

This dataset provides information on oral cancer cases from the 30 most populated countries. It considers key risk factors such as age, gender, tobacco/alcohol use, socioeconomic background, and diagnosis stage. The dataset helps researchers and healthcare experts understand trends in oral cancer and predict outcomes based on patient details.
Thyroid Cancer Risk Dataset

Assessing Thyroid Cancer Risk Through Key Health Indicators

This dataset contains 212,691 records related to thyroid cancer risk factors. It includes demographic information, clinical history, lifestyle factors, and key thyroid hormone levels to assess the likelihood of thyroid cancer. The dataset may be useful for machine learning models that aim to predict thyroid cancer risk from these indicators.

Column Descriptions:
Patient_ID (int): Unique identifier for each patient.
Age (int): Age of the patient.
Gender (object): Patient’s gender (Male/Female).
Country (object): Country of residence.
Ethnicity (object): Patient’s ethnic background.
Family_History (object): Whether the patient has a family history of thyroid cancer (Yes/No).
Radiation_Exposure (object): History of radiation exposure (Yes/No).
Iodine_Deficiency (object): Presence of iodine deficiency (Yes/No).
Smoking (object): Whether the patient smokes (Yes/No).
Obesity (object): Whether the patient is obese (Yes/No).
Diabetes (object): Whether the patient has diabetes (Yes/No).
TSH_Level (float): Thyroid-Stimulating Hormone level (µIU/mL).
T3_Level (float): Triiodothyronine level (ng/dL).
T4_Level (float): Thyroxine level (µg/dL).
Nodule_Size (float): Size of thyroid nodules (cm).
Thyroid_Cancer_Risk (object): Estimated risk of thyroid cancer (Low/Medium/High).
Diagnosis (object): Final diagnosis (Benign/Malignant).
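
As a rough illustration of the schema above, the sketch below parses a few of the listed columns into typed records using only the standard library. The sample rows are made up for the example and are not taken from the dataset; the ThyroidRecord class and parse_row helper are hypothetical names, and only a subset of the columns is included to keep it short.

```python
import csv
import io
from dataclasses import dataclass

# Yes/No columns are stored as strings ("object" dtype) in the listing above.
YES_NO = {"Yes": True, "No": False}


@dataclass
class ThyroidRecord:
    patient_id: int
    age: int
    family_history: bool
    tsh_level: float      # µIU/mL
    nodule_size: float    # cm
    risk: str             # Low / Medium / High
    diagnosis: str        # Benign / Malignant


def parse_row(row: dict) -> ThyroidRecord:
    """Convert one CSV row (all strings) into a typed record."""
    return ThyroidRecord(
        patient_id=int(row["Patient_ID"]),
        age=int(row["Age"]),
        family_history=YES_NO[row["Family_History"]],
        tsh_level=float(row["TSH_Level"]),
        nodule_size=float(row["Nodule_Size"]),
        risk=row["Thyroid_Cancer_Risk"],
        diagnosis=row["Diagnosis"],
    )


# Made-up rows for illustration only.
SAMPLE_CSV = """Patient_ID,Age,Family_History,TSH_Level,Nodule_Size,Thyroid_Cancer_Risk,Diagnosis
1,52,No,2.1,1.4,Low,Benign
2,67,Yes,6.8,3.2,High,Malignant
"""

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
malignant_ids = [r.patient_id for r in records if r.diagnosis == "Malignant"]
print(malignant_ids)
```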
Hand Gesture Detection System

HandMimic - An Advanced Hand Gesture Recognition System

Problem Statement
As a data scientist at a leading home electronics company, my goal is to create an innovative gesture control feature for smart televisions. By utilizing a webcam mounted on the TV, the system will recognize five specific gestures, enabling users to interact with the TV hands-free, without needing a remote control.

The five gestures and their corresponding actions are:

Thumbs Up: Increases the volume.
Thumbs Down: Decreases the volume.
Left Swipe: Rewinds the content by 10 seconds.
Right Swipe: Jumps forward by 10 seconds.
Stop: Pauses the content.
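
The gesture-to-action mapping listed above could be sketched as a small dispatch function. This is only an illustration: TVState, apply_gesture, and the one-unit volume step are assumptions introduced here, while the 10-second skip comes from the list itself.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Gesture(Enum):
    THUMBS_UP = "thumbs_up"
    THUMBS_DOWN = "thumbs_down"
    LEFT_SWIPE = "left_swipe"
    RIGHT_SWIPE = "right_swipe"
    STOP = "stop"


SKIP_SECONDS = 10  # stated in the gesture list
VOLUME_STEP = 1    # assumption: the post does not give a step size


@dataclass(frozen=True)
class TVState:
    """Hypothetical playback state used only for this sketch."""
    volume: int = 10
    position_s: int = 0
    paused: bool = False


def apply_gesture(state: TVState, gesture: Gesture) -> TVState:
    """Return the new playback state after a recognized gesture."""
    if gesture is Gesture.THUMBS_UP:
        return replace(state, volume=state.volume + VOLUME_STEP)
    if gesture is Gesture.THUMBS_DOWN:
        return replace(state, volume=max(0, state.volume - VOLUME_STEP))
    if gesture is Gesture.LEFT_SWIPE:
        # Rewind, but never before the start of the content.
        return replace(state, position_s=max(0, state.position_s - SKIP_SECONDS))
    if gesture is Gesture.RIGHT_SWIPE:
        return replace(state, position_s=state.position_s + SKIP_SECONDS)
    return replace(state, paused=True)  # Gesture.STOP
```

Keeping the state immutable means each recognized gesture yields a new state, which makes the mapping easy to test separately from the recognition model.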

Machine learning algorithms will train the system to recognize these gestures in real-time using the webcam, providing seamless interaction and enhancing the overall user experience.

Objectives
The primary objective is to develop a gesture-based control feature for smart TVs, enabling users to adjust volume, skip, rewind, and pause content using five distinct gestures detected by a webcam. Machine learning will be employed to train the model to recognize these gestures instantly, offering a hands-free and intuitive TV experience.

Understanding the Dataset
The training dataset consists of several hundred videos, each categorized into one of five gesture classes. Each video lasts 2-3 seconds, divided into 30 frames (images). Captured by various individuals performing the gestures in front of a webcam, these videos simulate real-world smart TV use. The gestures include thumbs up, thumbs down, left swipe, right swipe, and stop, and serve as individual training samples for the gesture recognition model.

Generator
The generator will preprocess the video data by cropping, resizing, and normalizing it to ensure proper formatting before passing it to the model. The generator should efficiently process batches of video data, ensuring smooth training without errors.
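
A minimal sketch of such a generator, written with plain Python lists so it runs without extra libraries. It assumes grayscale frames stored as rows of 0-255 pixel values, a center crop to a square, and nearest-neighbor resizing; none of these choices are specified in the post, and a real pipeline would more likely use an array library and loop over the data for several epochs.

```python
from typing import Iterator, List, Sequence, Tuple

Frame = List[List[float]]   # rows of pixel values
Video = List[Frame]         # e.g. 30 frames per clip


def center_crop_square(frame: Frame) -> Frame:
    """Crop the largest centered square out of a frame."""
    h, w = len(frame), len(frame[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in frame[top:top + side]]


def resize_nearest(frame: Frame, size: int) -> Frame:
    """Resize to size x size by nearest-neighbor sampling."""
    h, w = len(frame), len(frame[0])
    return [[frame[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def normalize(frame: Frame) -> Frame:
    """Scale 0-255 pixel values into the range 0-1."""
    return [[px / 255.0 for px in row] for row in frame]


def preprocess_video(video: Video, size: int) -> Video:
    return [normalize(resize_nearest(center_crop_square(f), size)) for f in video]


def batch_generator(videos: Sequence[Video], labels: Sequence[int],
                    batch_size: int, size: int) -> Iterator[Tuple[List[Video], List[int]]]:
    """Yield (batch_of_videos, batch_of_labels) for one pass over the data.

    The last batch may be smaller than batch_size.
    """
    for start in range(0, len(videos), batch_size):
        chunk = videos[start:start + batch_size]
        yield ([preprocess_video(v, size) for v in chunk],
               list(labels[start:start + batch_size]))
```

Handling the shorter final batch explicitly is one way to avoid the shape errors that a fixed batch size can cause at the end of an epoch.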

Model
The objective is to create a model that trains efficiently with minimal inference time. The model’s architecture should be optimized to balance performance and speed, with fewer parameters leading to faster predictions. The model will be evaluated based on accuracy in recognizing gestures, starting with a small dataset to assess initial performance before scaling up.
Car Detection and Tracking Dataset

499 Images of Car Dataset with Text Annotation

About Dataset

This dataset contains 499 images, each with bounding box annotations for cars.
The annotations are provided in the YOLO text format, which includes class labels and bounding box coordinates.
This dataset is useful for object detection tasks such as vehicle recognition and traffic analysis.
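
For reference, a YOLO label file holds one line per object in the form "class x_center y_center width height", with the four box values normalized to the image size. The sketch below converts one such line to pixel corner coordinates; the function name parse_yolo_line and the example image size are illustrative, not part of the dataset.

```python
from typing import Tuple


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Tuple[int, Tuple[float, float, float, float]]:
    """Convert one YOLO label line to (class_id, (x_min, y_min, x_max, y_max)) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max)


# Example: a box centered in a 640x480 image, a quarter as wide and half as tall.
class_id, box = parse_yolo_line("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(class_id, box)
```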
DATAFLOW2025 - PRODUCT RECOMMENDATION

DATAFLOW2025: "MASTERING THE DATA WAVES"
OpenR1-Math-220k

OpenR1-Math-220k is a large-scale dataset for mathematical reasoning.

Dataset description

OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer.

The dataset consists of two splits:

default, with 94k problems, which achieves the best performance after SFT.
extended, with 131k samples, where we add data sources like cn_k12. This provides more reasoning traces, but we found the performance after SFT to be lower than with the default subset, likely because the questions from cn_k12 are less difficult than those from other sources.