Kaggle Data Hub – Telegram
Kaggle Data Hub
28.9K subscribers
808 photos
13 videos
309 files
1.07K links
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.

Admin: @HusseinSheikho || @Hussein_Sheikho
archive.zip
Please 👍 or ⭐️
Teeth Segmentation on dental X-ray images

The dataset consists of 598 images with a total of 15,318 polygons

About Dataset

Humans in the Loop is excited to publish a new open access dataset for teeth segmentation on dental radiology scans. The segmentation was done manually by 12 Humans in the Loop trainees in the Democratic Republic of Congo as part of their training, using the Panoramic radiography database published by Lopez et al. The dataset consists of 598 images with a total of 15,318 polygons, where each tooth is segmented with a different class.

https://news.1rj.ru/str/datasets1
archive.zip.002
241.1 MB
Teeth Segmentation on dental X-ray images

https://news.1rj.ru/str/datasets1 🐍
DeepGlobe Road Extraction Dataset

Road Extraction Dataset from DeepGlobe Challenge

Data

The training data for the Road Challenge contains 6,226 RGB satellite images, each 1024x1024 pixels.
The imagery has 50cm pixel resolution, collected by DigitalGlobe's satellite.
The dataset contains 1243 validation and 1101 test images (but no masks).

Label

Each satellite image is paired with a mask image for road labels. The mask is a grayscale image, with white standing for road pixel, and black standing for background.
File names for satellite images and the corresponding mask images are id_sat.jpg and id_mask.png, where id is a randomized integer.
Please note:
The values of the mask image may not be pure 0 and 255. When converting to labels, please binarize them at threshold 128.
The labels are not perfect due to the cost of annotating segmentation masks, especially in rural regions. In addition, we intentionally didn't annotate small roads within farmlands.
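
The binarization step and the file naming convention can be sketched in Python as below. This is only a minimal sketch: the function names `binarize_mask` and `mask_name_for` are hypothetical, and treating a pixel value of exactly 128 as road is an assumption, since the post does not say which side of the threshold is inclusive.

```python
import numpy as np


def binarize_mask(mask, threshold=128):
    """Turn a grayscale road mask into 0/1 labels (1 = road, 0 = background)."""
    mask = np.asarray(mask)
    # Some image readers return H x W x C even for grayscale files;
    # in that case keep a single channel.
    if mask.ndim == 3:
        mask = mask[..., 0]
    # Assumption: values equal to the threshold count as road.
    return (mask >= threshold).astype(np.uint8)


def mask_name_for(sat_name):
    """Map 'id_sat.jpg' to the matching 'id_mask.png' file name."""
    suffix = "_sat.jpg"
    if not sat_name.endswith(suffix):
        raise ValueError(f"unexpected satellite file name: {sat_name!r}")
    return sat_name[: -len(suffix)] + "_mask.png"


# Example with a tiny synthetic mask that is not pure 0/255.
noisy = np.array([[0, 3, 127], [128, 200, 255]], dtype=np.uint8)
labels = binarize_mask(noisy)
```

The mask can be loaded with any image library that yields an array; only the thresholding itself is taken from the note above.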

https://news.1rj.ru/str/datasets1 ⭐️
archive.zip
3.8 GB
DeepGlobe Road Extraction Dataset

https://news.1rj.ru/str/datasets1 🧡
CT KIDNEY DATASET: Normal-Cyst-Tumor and Stone

Dataset for automated kidney disease analysis

Content

The dataset was collected from PACS (Picture Archiving and Communication System) at different hospitals in Dhaka, Bangladesh, where patients had already been diagnosed with a kidney tumor, cyst, normal, or stone finding. Both coronal and axial cuts were selected from contrast and non-contrast studies, with protocols for the whole abdomen and urogram. The DICOM studies were then carefully selected, one diagnosis at a time, and from those we created a batch of DICOM images of the region of interest for each radiological finding. Following that, we removed each patient's information and metadata from the DICOM images and converted them to a lossless JPG image format. After the conversion, each image finding was verified again by a radiologist and a medical technologist to reconfirm the correctness of the data.

The resulting dataset contains 12,446 unique images: 3,709 cyst, 5,077 normal, 1,377 stone, and 2,283 tumor.
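
Because the four classes are noticeably imbalanced, a quick check of the totals and one possible set of class weights might look like the sketch below. The inverse-frequency formula (total / (number of classes × class count)) is a common heuristic chosen here for illustration, not something the dataset authors prescribe, and `class_weights` is a hypothetical helper name.

```python
# Per-class image counts as stated in the post.
counts = {"cyst": 3709, "normal": 5077, "stone": 1377, "tumor": 2283}

total = sum(counts.values())


def class_weights(class_counts):
    """Inverse-frequency weights: rarer classes get larger weights."""
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: n_total / (n_classes * c) for name, c in class_counts.items()}


weights = class_weights(counts)

for name, c in counts.items():
    share = 100.0 * c / total
    print(f"{name:>6}: {c:5d} images ({share:4.1f}%), weight {weights[name]:.2f}")
```

With these weights every class contributes the same total weight to a loss function, which is one simple way to keep the stone class from being drowned out by the normal class.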
archive.zip
1.5 GB
CT KIDNEY DATASET: Normal-Cyst-Tumor and Stone

https://news.1rj.ru/str/datasets1 🧠
ISIC 2019 Skin Lesion images for classification

25,331 images belonging to 8 classes for training models on classification

The dataset for ISIC 2019 contains 25,331 images available for the classification of dermoscopic images among nine diagnostic categories (eight lesion classes plus "None of the above"):

Melanoma
Melanocytic nevus
Basal cell carcinoma
Actinic keratosis
Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
Dermatofibroma
Vascular lesion
Squamous cell carcinoma
None of the above

https://news.1rj.ru/str/datasets1 🧠
archive.zip.003
1.3 GB
ISIC 2019 Skin Lesion images for classification

https://news.1rj.ru/str/datasets1 👑
Tuberculosis (TB) Prediction(Top 75 Countries)

About Dataset

This dataset includes 400,000 records with 22 variables that capture demographic, health, and socioeconomic factors influencing tuberculosis incidence across 70 countries. The data is designed to resemble real-world patterns observed in tuberculosis prevalence and healthcare indicators. It can be used for tasks such as descriptive analysis, machine learning, and public health research.

https://news.1rj.ru/str/datasets1 🏐
archive.zip
62.2 MB
Tuberculosis (TB) Prediction(Top 75 Countries)

https://news.1rj.ru/str/datasets1 🏐
STL-10 Image Recognition Dataset

Train models to recognize different animals and vehicles

Context

STL-10 is an image recognition dataset inspired by the CIFAR-10 dataset, with some improvements. With a corpus of 100,000 unlabeled images and 500 labeled training images per class, it is well suited to developing unsupervised feature learning, deep learning, and self-taught learning algorithms. Unlike CIFAR-10, the dataset has a higher resolution, which makes it a challenging benchmark for developing more scalable unsupervised learning methods.
Content

Data overview:

There are three files: train_images.zip, test_images.zip and unlabeled_images.zip
10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck
Images are 96x96 pixels, color
500 training images (10 pre-defined folds), 800 test images per class
100,000 unlabeled images for unsupervised learning. These examples are extracted from a similar but broader distribution of images. For instance, it contains other types of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.) in addition to the ones in the labeled set
Images were acquired from labeled examples on ImageNet
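
The split sizes implied by the overview can be worked out with a few lines of arithmetic. This sketch assumes that both the 500 training and 800 test images are counted per class, and that each image is stored as uncompressed 8-bit RGB; all variable names are illustrative.

```python
# Figures taken from the data overview above.
NUM_CLASSES = 10
TRAIN_PER_CLASS = 500       # assumption: the 500 figure is per class
TEST_PER_CLASS = 800
NUM_UNLABELED = 100_000
HEIGHT, WIDTH, CHANNELS = 96, 96, 3

labeled_train = NUM_CLASSES * TRAIN_PER_CLASS
labeled_test = NUM_CLASSES * TEST_PER_CLASS

# Raw size of one image when stored as 8-bit RGB (1 byte per channel value).
bytes_per_image = HEIGHT * WIDTH * CHANNELS

# Approximate in-memory size of the unlabeled split, in gigabytes (10^9 bytes).
unlabeled_gb = NUM_UNLABELED * bytes_per_image / 1e9

print(f"labeled train images: {labeled_train}")
print(f"labeled test images:  {labeled_test}")
print(f"bytes per image:      {bytes_per_image}")
print(f"unlabeled split, raw: {unlabeled_gb:.2f} GB")
```

The raw size estimate is useful mainly for deciding whether the unlabeled split fits in memory at once or needs to be streamed in batches.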

https://news.1rj.ru/str/datasets1 🆘
archive.zip
1.9 GB
STL-10 Image Recognition Dataset

https://news.1rj.ru/str/datasets1 💦
Skin Cancer MNIST: HAM10000

a large collection of multi-source dermatoscopic images of pigmented lesions

Overview
Another dataset, more interesting than digit classification, to get biology and medicine students more excited about machine learning and image processing.

https://news.1rj.ru/str/datasets1 🩵
The California Wildfire Data 🔥🔥🔥🔥

Structures Impacted by Wildland Fires in California!

Column Descriptions:
OBJECTID: A unique identifier for each record in the dataset.
DAMAGE: Indicates the level of fire damage to the structure (e.g., "No Damage", "Affected (1-9%)").
STREETNUMBER: The street number of the impacted structure.
STREETNAME: The name of the street where the impacted structure is located.
STREETTYPE: The type of street (e.g., "Road", "Lane").
STREETSUFFIX: Additional address information, such as apartment or building numbers (if applicable).
CITY: The city where the impacted structure is located.
STATE: The state abbreviation (e.g., "CA" for California).
ZIPCODE: The postal code of the impacted structure.
CALFIREUNIT: The CAL FIRE unit responsible for the area.
COUNTY: The county where the impacted structure is located.
COMMUNITY: The community or neighborhood of the structure.
INCIDENTNAME: The name of the fire incident that impacted the structure.
APN: The Assessor’s Parcel Number (APN) of the property.
ASSESSEDIMPROVEDVALUE: The assessed value of the improved property (e.g., structures, not just land).
YEARBUILT: The year the structure was built.
SITEADDRESS: The full address of the property, including city, state, and ZIP code.
GLOBALID: A globally unique identifier for each record.
Latitude: The latitude coordinate of the structure’s location.
Longitude: The longitude coordinate of the structure’s location.
UTILITYMISCSTRUCTUREDISTANCE: The distance between the main structure and any utility or miscellaneous structures (if recorded).
FIRENAME: An alternative or secondary name for the fire incident.
geometry: A geospatial representation of the location in a point format (e.g., "POINT (-13585927.697 4646740.750)").
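
The geometry column's point strings can be parsed without a GIS library, as in the sketch below. The magnitude of the example coordinates suggests Web Mercator (EPSG:3857) metres rather than degrees, but that is an assumption the post does not confirm; the helper names `parse_wkt_point` and `web_mercator_to_lonlat` are hypothetical.

```python
import math
import re

# Sphere radius used by Web Mercator (EPSG:3857), in metres.
EARTH_RADIUS_M = 6378137.0

_POINT_RE = re.compile(r"\s*POINT\s*\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*")


def parse_wkt_point(text):
    """Extract (x, y) floats from a string such as 'POINT (x y)'."""
    match = _POINT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator metres to (longitude, latitude) in degrees.

    Assumption: the input really is EPSG:3857; the dataset's coordinate
    reference system should be checked before relying on this.
    """
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


x, y = parse_wkt_point("POINT (-13585927.697 4646740.750)")
lon, lat = web_mercator_to_lonlat(x, y)
print(f"lon={lon:.4f}, lat={lat:.4f}")
```

Comparing the converted values with the Latitude and Longitude columns of the same row is a simple way to confirm or reject the coordinate system assumption.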

https://news.1rj.ru/str/datasets1 🎙