Kaggle Data Hub
archive.zip
Please 👍 or ⭐️
Teeth Segmentation on dental X-ray images
The dataset consists of 598 images with a total of 15,318 polygons
About Dataset
Humans in the Loop is excited to publish a new open-access dataset for teeth segmentation on dental radiology scans. The segmentation was done manually by 12 Humans in the Loop trainees in the Democratic Republic of the Congo as part of their training, using the Panoramic radiography database published by Lopez et al. The dataset consists of 598 images with a total of 15,318 polygons, where each tooth is segmented as a different class.
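Because each tooth is a separate polygon with its own class, a common preprocessing step is to rasterize the polygons into a per-pixel class mask. The post does not describe the annotation file format, so the `(class_id, [(x, y), ...])` input layout, the class ids 11 and 12, and the helper names below are hypothetical. This is a minimal pure-Python sketch using the even-odd (ray casting) point-in-polygon test at pixel centres. It is slow for full-size radiographs, where a library rasterizer such as Pillow's `ImageDraw.Draw.polygon` is the more practical choice.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: toggle on each edge a rightward ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(width: int, height: int,
              polygons: List[Tuple[int, Sequence[Point]]]) -> List[List[int]]:
    """Build a per-pixel class mask: 0 is background, later polygons overwrite earlier ones."""
    mask = [[0] * width for _ in range(height)]
    for class_id, polygon in polygons:
        for row in range(height):
            for col in range(width):
                # Test the pixel centre rather than its corner.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = class_id
    return mask


# Hypothetical annotations: (tooth class id, polygon vertices in pixel coordinates).
annotations = [
    (11, [(1, 1), (4, 1), (4, 5), (1, 5)]),
    (12, [(5, 1), (8, 1), (8, 5), (5, 5)]),
]
mask = rasterize(10, 6, annotations)
for row in mask:
    print(" ".join(f"{v:2d}" for v in row))
```

Overlapping polygons resolve in list order here. That is one simple convention, not necessarily how the dataset authors intended overlaps to be handled.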
https://news.1rj.ru/str/datasets1 ✅
DeepGlobe Road Extraction Dataset
Road Extraction Dataset from DeepGlobe Challenge
Data
The training data for the Road Challenge contains 6,226 RGB satellite images, each 1024x1024 pixels.
The imagery has a 50 cm pixel resolution and was collected by DigitalGlobe's satellite.
The dataset also contains 1,243 validation and 1,101 test images (without masks).
Label
Each satellite image is paired with a mask image for road labels. The mask is a grayscale image in which white marks road pixels and black marks background.
A satellite image and its corresponding mask are named id_sat.jpg and id_mask.png, where id is a randomized integer.
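Since images and masks share the `id` prefix, training pairs can be built by stripping the `_sat.jpg` and `_mask.png` suffixes and matching what remains. This is a minimal sketch. The directory names `train/` and `valid/` are hypothetical, and it assumes each mask sits in the same folder as its image. Keying on the full path minus the suffix, rather than on the bare id, avoids collisions if the same id appears in two folders.

```python
from typing import Dict, Iterable, List, Tuple

SAT_SUFFIX = "_sat.jpg"
MASK_SUFFIX = "_mask.png"


def pair_sat_and_masks(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair <id>_sat.jpg with <id>_mask.png, keyed on the path minus its suffix."""
    sats: Dict[str, str] = {}
    masks: Dict[str, str] = {}
    for path in paths:
        if path.endswith(SAT_SUFFIX):
            sats[path[: -len(SAT_SUFFIX)]] = path
        elif path.endswith(MASK_SUFFIX):
            masks[path[: -len(MASK_SUFFIX)]] = path
    # Validation and test images ship without masks, so they are skipped here.
    return [(sats[key], masks[key]) for key in sorted(sats) if key in masks]


files = [
    "train/104_sat.jpg", "train/104_mask.png",
    "train/2087_sat.jpg", "train/2087_mask.png",
    "valid/999_sat.jpg",
]
for sat, mask in pair_sat_and_masks(files):
    print(sat, "->", mask)
```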
Please note:
The mask values may not be exactly 0 and 255. When converting them to labels, please binarize them at a threshold of 128.
The labels are not perfect because of the cost of annotating segmentation masks, especially in rural regions. In addition, we intentionally did not annotate small roads within farmland.
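The binarization note above can be sketched as a single comparison per pixel. The notes do not say which side of 128 counts as road, so this sketch assumes values `>= 128` become road (1) and everything else background (0). Loading the PNG (for example with Pillow) is left out, and the mask is taken as a nested list of 8-bit values. With NumPy the same rule would be `(mask >= 128).astype(np.uint8)`.

```python
from typing import List

ROAD_THRESHOLD = 128  # threshold given in the dataset notes


def binarize_mask(gray: List[List[int]], threshold: int = ROAD_THRESHOLD) -> List[List[int]]:
    """Map 8-bit grayscale mask values to labels: 1 = road, 0 = background (assumes >=)."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


# Near-black and near-white values, as left by compression or resampling.
noisy = [[0, 3, 127], [128, 250, 255]]
print(binarize_mask(noisy))
```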
https://news.1rj.ru/str/datasets1 ⭐️
CT KIDNEY DATASET: Normal-Cyst-Tumor and Stone
A dataset for automated kidney disease analysis
Content
The dataset was collected from the PACS (Picture Archiving and Communication System) of several hospitals in Dhaka, Bangladesh, where patients had already been diagnosed with kidney tumor, cyst, normal, or stone findings. Both coronal and axial cuts were selected from contrast and non-contrast studies, using whole-abdomen and urogram protocols. The DICOM studies were then carefully selected, one diagnosis at a time, and from them we created a batch of DICOM images of the region of interest for each radiological finding. We then removed each patient's information and metadata from the DICOM images and converted them to a lossless JPG image format. After the conversion, each image finding was verified again by a radiologist and a medical technologist to confirm the correctness of the data.
The dataset contains 12,446 unique images: 3,709 cyst, 5,077 normal, 1,377 stone, and 2,283 tumor.
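The class counts above are imbalanced: stone images are roughly a quarter as common as normal ones. One common mitigation is to weight the loss by inverse class frequency, w_c = N / (K * n_c). This is the same formula scikit-learn uses for `class_weight="balanced"`. The sketch below uses only the counts stated in the description. Whether weighting, resampling, or neither works best for this data is an open choice, not something the dataset authors prescribe.

```python
from typing import Dict

# Per-class image counts as stated in the dataset description.
COUNTS: Dict[str, int] = {"cyst": 3709, "normal": 5077, "stone": 1377, "tumor": 2283}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """w_c = N / (K * n_c): classes rarer than average get weights above 1."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


weights = inverse_frequency_weights(COUNTS)
total = sum(COUNTS.values())
for label, n in COUNTS.items():
    print(f"{label:>6}: {n:5d} images ({n / total:.1%}), weight {weights[label]:.3f}")
```

By construction, sum(w_c * n_c) equals N, so the weighted loss keeps the same overall scale as the unweighted one.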
ISIC 2019 Skin Lesion images for classification
25,331 images belonging to 8 classes for training classification models
The ISIC 2019 dataset contains 25,331 images for classifying dermoscopic images among nine diagnostic categories; the training images cover the first eight, and "None of the above" is the ninth:
Melanoma
Melanocytic nevus
Basal cell carcinoma
Actinic keratosis
Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
Dermatofibroma
Vascular lesion
Squamous cell carcinoma
None of the above
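Only eight of the nine categories have training examples, so a classifier needs some rule for producing "None of the above". One common baseline is to reject when the highest softmax probability falls below a threshold. This is the maximum-softmax-probability heuristic, named here plainly as a swapped-in technique and not the challenge's prescribed method. The 0.5 threshold and the function names are hypothetical and would need tuning on held-out data. The model producing the eight logits is out of scope.

```python
import math
from typing import List, Sequence

# The eight diagnostic categories that appear in the training images.
TRAIN_CLASSES: List[str] = [
    "Melanoma", "Melanocytic nevus", "Basal cell carcinoma", "Actinic keratosis",
    "Benign keratosis", "Dermatofibroma", "Vascular lesion", "Squamous cell carcinoma",
]
UNKNOWN = "None of the above"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Return the most likely training class, or UNKNOWN if its probability is below threshold."""
    if len(logits) != len(TRAIN_CLASSES):
        raise ValueError(f"expected {len(TRAIN_CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TRAIN_CLASSES[best] if probs[best] >= threshold else UNKNOWN


print(predict_with_rejection([5.0, 0, 0, 0, 0, 0, 0, 0]))  # confident: first class
print(predict_with_rejection([0.0] * 8))                   # flat: rejected
```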
https://news.1rj.ru/str/datasets1 🧠
Tuberculosis (TB) Prediction(Top 75 Countries)
About Dataset
This dataset includes 400,000 records with 22 variables that capture demographic, health, and socioeconomic factors influencing tuberculosis incidence across 70 countries. The data is designed to resemble real-world patterns observed in tuberculosis prevalence and healthcare indicators. It can be used for tasks such as descriptive analysis, machine learning, and public health research.
https://news.1rj.ru/str/datasets1 🏐
STL-10 Image Recognition Dataset
Train models to recognize different animals and vehicles
Context
STL-10 is an image recognition dataset inspired by CIFAR-10, with some improvements. With a corpus of 100,000 unlabeled images and 500 labeled training images per class, it is well suited to developing unsupervised feature learning, deep learning, and self-taught learning algorithms. Its images have a higher resolution than CIFAR-10's (96x96 versus 32x32 pixels), which makes it a challenging benchmark for developing more scalable unsupervised learning methods.
Content
Data overview:
There are three files: train_images.zip, test_images.zip and unlabeled_images.zip
10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck
Images are 96x96 pixels, color
500 training images and 800 test images per class; the training set comes with 10 pre-defined folds
100,000 unlabeled images for unsupervised learning. These examples are extracted from a similar but broader distribution of images. For instance, it contains other types of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.) in addition to the ones in the labeled set
Images were acquired from labeled examples on ImageNet
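From the per-class figures above, the labeled split totals 5,000 training and 8,000 test images. In the original STL-10 release, the 10 folds are index lists into the training set. The usual protocol trains one model per fold, scores each on the full test set, and reports the mean. This sketch assumes the fold file layout of that original release: one line per fold of whitespace-separated, zero-based indices, as in its `fold_indices.txt`. The Kaggle archive may not include such a file. The `train_and_score` callback is a hypothetical stand-in for your own training loop.

```python
from typing import Callable, List, Sequence

NUM_CLASSES = 10
TRAIN_PER_CLASS = 500
TEST_PER_CLASS = 800
TOTAL_TRAIN = NUM_CLASSES * TRAIN_PER_CLASS  # 5,000 labeled training images
TOTAL_TEST = NUM_CLASSES * TEST_PER_CLASS    # 8,000 test images


def parse_fold_indices(text: str, num_train: int = TOTAL_TRAIN) -> List[List[int]]:
    """Parse one fold per non-empty line of whitespace-separated, zero-based indices."""
    folds = [[int(tok) for tok in line.split()] for line in text.splitlines() if line.strip()]
    for fold in folds:
        if any(not 0 <= i < num_train for i in fold):
            raise ValueError("fold index outside the labeled training set")
    return folds


def mean_fold_score(folds: Sequence[Sequence[int]],
                    train_and_score: Callable[[Sequence[int]], float]) -> float:
    """Train once per fold (via a caller-supplied callback) and average the test scores."""
    scores = [train_and_score(fold) for fold in folds]
    return sum(scores) / len(scores)


print(TOTAL_TRAIN, TOTAL_TEST)
```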
https://news.1rj.ru/str/datasets1 🆘
Skin Cancer MNIST: HAM10000
A large collection of multi-source dermatoscopic images of pigmented lesions
Overview
A more interesting alternative to digit classification (MNIST) for getting biology and medicine students excited about machine learning and image processing.
https://news.1rj.ru/str/datasets1 🩵
archive.zip.002
1.3 GB
Skin Cancer MNIST: HAM10000
#Datasets #Kaggle #MachineLearning #Python #ML #LLM #NLP #ComputerVision #GPT4
https://news.1rj.ru/str/datasets1 🩵
The California Wildfire Data 🔥 🔥 🔥 🔥
Structures Impacted by Wildland Fires in California!
Column Descriptions:
OBJECTID: A unique identifier for each record in the dataset.
DAMAGE: Indicates the level of fire damage to the structure (e.g., "No Damage", "Affected (1-9%)").
STREETNUMBER: The street number of the impacted structure.
STREETNAME: The name of the street where the impacted structure is located.
STREETTYPE: The type of street (e.g., "Road", "Lane").
STREETSUFFIX: Additional address information, such as apartment or building numbers (if applicable).
CITY: The city where the impacted structure is located.
STATE: The state abbreviation (e.g., "CA" for California).
ZIPCODE: The postal code of the impacted structure.
CALFIREUNIT: The CAL FIRE unit responsible for the area.
COUNTY: The county where the impacted structure is located.
COMMUNITY: The community or neighborhood of the structure.
INCIDENTNAME: The name of the fire incident that impacted the structure.
APN: The Assessor’s Parcel Number (APN) of the property.
ASSESSEDIMPROVEDVALUE: The assessed value of the improved property (e.g., structures, not just land).
YEARBUILT: The year the structure was built.
SITEADDRESS: The full address of the property, including city, state, and ZIP code.
GLOBALID: A globally unique identifier for each record.
Latitude: The latitude coordinate of the structure’s location.
Longitude: The longitude coordinate of the structure’s location.
UTILITYMISCSTRUCTUREDISTANCE: The distance between the main structure and any utility or miscellaneous structures (if recorded).
FIRENAME: An alternative or secondary name for the fire incident.
geometry: A geospatial representation of the location in a point format (e.g., "POINT (-13585927.697 4646740.750)").
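The description does not name the coordinate reference system of `geometry`. The magnitude of the sample values suggests Web Mercator (EPSG:3857) in metres, which is an assumption here. The record also carries `Latitude` and `Longitude` columns that can be used to cross-check it. This minimal sketch parses the WKT point with a regular expression and applies the inverse spherical Mercator formulas. The sample point converts to a location in Northern California, which is consistent with that assumption. For real work, `pyproj.Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)` is the more robust route.

```python
import math
import re
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)
_POINT_RE = re.compile(r"POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)")


def parse_point(wkt: str) -> Tuple[float, float]:
    """Extract (x, y) from a WKT string such as 'POINT (-13585927.697 4646740.750)'."""
    match = _POINT_RE.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a 2-D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Inverse spherical Mercator: metres (assumed EPSG:3857) -> degrees (lon, lat)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


lon, lat = web_mercator_to_lonlat(*parse_point("POINT (-13585927.697 4646740.750)"))
print(f"lon={lon:.4f}, lat={lat:.4f}")
```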
https://news.1rj.ru/str/datasets1 🎙
archive.zip
18.6 MB
The California Wildfire Data 🔥 🔥 🔥 🔥
#Datasets #Kaggle #MachineLearning #Python #ML #LLM #NLP #ComputerVision #GPT4
https://news.1rj.ru/str/datasets1 ⚠️