What I meant by an imbalanced dataset is one where the target classes are unevenly distributed (let's say 90% class A, 10% class B). A normal train_test_split() might lead to under-representation of rare categories in the test set, making model evaluation unreliable. Stratified splitting prevents one set from having too many samples of one class.
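To make the 90/10 example concrete, here is a minimal sketch that compares a plain random split with a stratified one. The labels and the placeholder feature column are synthetic and made up for illustration; only train_test_split and its stratify parameter come from Scikit-Learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, illustrative labels: 900 samples of class "A", 100 of class "B".
y = np.array(["A"] * 900 + ["B"] * 100)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

# Plain random split: the share of class B in the test set depends on the seed.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=0)

# Stratified split: class proportions are preserved in both sets.
_, _, y_train_strat, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

share_b_plain = np.mean(y_test_plain == "B")
share_b_strat = np.mean(y_test_strat == "B")
print(f"class B share, plain split:      {share_b_plain:.3f}")
print(f"class B share, stratified split: {share_b_strat:.3f}")
```

With stratify=y the test set keeps the 10% share of class B, while the plain split can drift above or below it depending on the random seed.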
StratifiedShuffleSplit in Scikit-Learn
The StratifiedShuffleSplit class in Scikit-Learn is used for splitting datasets into training and test sets while maintaining the same proportion of a specific category (strata) in both sets. It is particularly useful when working with imbalanced datasets to ensure that the train and test sets have a similar distribution of the target variable.
In the housing example below, it prevents the test set from being skewed toward high- or low-income areas, which could happen with a simple random split.
from sklearn.model_selection import StratifiedShuffleSplit
import pandas as pd

# Assumes `housing` is an already-loaded DataFrame with a "median_income" column.
# Creating income categories
housing["income_category"] = pd.cut(housing["median_income"],
                                    bins=[0., 1.5, 3.0, 4.5, 6., float("inf")],
                                    labels=[1, 2, 3, 4, 5])

# Stratified split based on income category
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in split.split(housing, housing["income_category"]):
    # split() yields positional indices, so use iloc rather than loc
    strat_train_set = housing.iloc[train_idx]
    strat_test_set = housing.iloc[test_idx]
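One way to check that the stratified split did its job is to compare the category proportions in the full data with those in the test set. The sketch below is self-contained, so it builds a synthetic stand-in called housing_demo (a hypothetical name, with made-up incomes drawn from a log-normal distribution) instead of the real housing data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the housing data (assumption: the real dataset is not loaded here).
rng = np.random.default_rng(42)
housing_demo = pd.DataFrame(
    {"median_income": rng.lognormal(mean=1.2, sigma=0.5, size=5000)}
)
housing_demo["income_category"] = pd.cut(
    housing_demo["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, float("inf")],
    labels=[1, 2, 3, 4, 5],
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(split.split(housing_demo, housing_demo["income_category"]))
demo_train = housing_demo.iloc[train_idx]
demo_test = housing_demo.iloc[test_idx]

# Compare category proportions in the full data and in the test set.
comparison = pd.DataFrame({
    "overall": housing_demo["income_category"].value_counts(normalize=True).sort_index(),
    "test": demo_test["income_category"].value_counts(normalize=True).sort_index(),
})
comparison["abs_diff"] = (comparison["overall"] - comparison["test"]).abs()
print(comparison)
```

The abs_diff column should be close to zero for every income category, which is what "maintaining the same proportion" means in practice.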
👍2
Python Data Cleaning Cookbook.pdf
3.4 MB
Python Data Cleaning Cookbook
Michael Walker, 2023
Forwarded from Dagmawi Babi
This is so impressive wtfruits!!! 🔥
Lucy — Multilingual AI Voice Assistant & Chatbot for Ethiopians
• https://www.linkedin.com/posts/zemenu_lucy-next-generation-ai-ugcPost-7298736880336920576-HD8B
I was impressed by the Amharic TTS, and also that when bro wrote Amharic in English letters ("endezih") it understood. Not to mention that it scrapes Telegram channels and understands major Ethiopian languages.
#CommunityShowcase #Zemenu #Lucy
@Dagmawi_Babi
🔥2
Forwarded from Mike's ML Forge (Mike)
I can't just see this and keep my mouth shut, so the thing is
How to Pick the Right Machine Learning Algorithm
One of the hardest parts of machine learning is choosing the right algorithm for the job. Different algorithms are suited for different types of problems. Here’s a simple way to break it down:
Step 1: What kind of problem are you solving?
Everything starts with understanding what you want to predict or classify. Your problem will fall into one of these categories:
1. Classification – When you need to categorize things (e.g., "Is this email spam or not?").
2. Regression – When you need to predict a number (e.g., "How much will a house cost?").
3. Clustering – When you want the computer to group things automatically without labels (e.g., "Group customers by similar behavior").
4. Dimensionality Reduction – When your data has too many features and you need to simplify it while keeping the important parts.
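As a rough illustration of the four categories, the sketch below fits one Scikit-Learn model per problem type on the built-in iris dataset. The specific estimators (LogisticRegression, LinearRegression, KMeans, PCA) are just common starting points chosen for this example, not the only or the best choice for each category.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = load_iris(return_X_y=True)

# 1. Classification: predict a category (the iris species).
clf = LogisticRegression(max_iter=1000).fit(X, y)

# 2. Regression: predict a number (petal width from the other three measurements).
reg = LinearRegression().fit(X[:, :3], X[:, 3])

# 3. Clustering: group samples without using the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 4. Dimensionality reduction: compress 4 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)

print(clf.predict(X[:1]), reg.predict(X[:1, :3]), clusters[:5], X_2d.shape)
```

Note that only the first two models use a target (y or a numeric column); clustering and dimensionality reduction work on the features alone.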
Forwarded from Oops, My Brain Did That
God
make me invisible to the eyes of others and draw me to you, help me turn away from this world and come to you, help me overcome my sins and be Christ-like like you
In your name I pray, amen 🫀☦️
❤4
What's your interest in Machine Learning, AI, and Data Science?
Anonymous Poll
19% I'm actively learning and practicing
69% I'm interested but haven't started yet
6% I'm curious but not sure if I want to learn
6% Not interested at all
Hey everyone! I know many of you are interested in ML, AI, and Data Science but haven’t started yet. I’m also learning, so let’s grow together and build a great community! 🚀
🤝2✍1
Beginner Roadmap to ML, AI & Data Science
💡 A step-by-step guide to help you start your journey
1️⃣ Learn Python Basics 🐍
Python is the most widely used language in AI & Data Science. Start by mastering:
✅ Variables, Data Types, Loops, Functions
✅ Object-Oriented Programming (OOP) basics
✅ File handling (reading CSV, JSON)
✅ Virtual environments & package management (pip, conda)
Recommended Resources:
freeCodeCamp’s Python Course
Python Crash Course by Eric Matthes (Book)
W3Schools Python Tutorials
🔗 Practice: Write small scripts & automation projects (e.g., a weather app, a to-do list manager).
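A first practice script could look something like the sketch below, which covers functions, loops and file handling in one go. It uses only the standard library; the file names, the to-do data and the helper csv_to_json are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_to_json(csv_path: Path, json_path: Path) -> list[dict]:
    """Read rows from a CSV file and write them out as a JSON list."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    return rows


# Build a small to-do list CSV in a temporary directory, then convert it.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "todo.csv"
    json_file = Path(tmp) / "todo.json"
    csv_file.write_text("task,done\nlearn pandas,no\nread a CSV,yes\n", encoding="utf-8")
    rows = csv_to_json(csv_file, json_file)
    reloaded = json.loads(json_file.read_text(encoding="utf-8"))

print(reloaded)
```

Everything read from a CSV this way is a string, so converting "yes"/"no" into real booleans would be a natural next exercise.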
2️⃣ Master Data Handling 📊
Data Science is all about working with data. Learn how to:
✅ Use pandas for data manipulation (DataFrames, Series, handling missing values)
✅ Use NumPy for numerical computations (arrays, linear algebra)
✅ Use Matplotlib & Seaborn for data visualization
Recommended Resources:
Kaggle’s Python & pandas courses
"Python for Data Analysis" by Wes McKinney (Book)
Real-world datasets on Kaggle & UCI Machine Learning Repository
🔗 Practice:
Analyze a dataset (e.g., FIFA player stats, Netflix movies)
Clean & visualize real-world messy data
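Here is a minimal sketch of what "cleaning messy data" can mean with pandas and NumPy. The tiny movie table is invented for this example; a real dataset from Kaggle would be loaded with pd.read_csv instead.

```python
import numpy as np
import pandas as pd

# Small made-up dataset with typical problems: missing values and inconsistent text.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genre": ["Drama", " drama", "Comedy", None],
    "rating": [7.5, np.nan, 6.0, 8.5],
})

clean = raw.copy()
# Normalize whitespace and capitalization, then label missing genres.
clean["genre"] = clean["genre"].str.strip().str.title().fillna("Unknown")
# Fill missing ratings with the median of the known ratings.
clean["rating"] = clean["rating"].fillna(clean["rating"].median())

# NumPy works directly on the underlying values.
mean_rating = np.mean(clean["rating"].to_numpy())
genre_counts = clean["genre"].value_counts()
print(clean)
print(genre_counts)
```

Filling with the median is only one option; dropping the rows or filling per genre may fit better depending on the data.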
3️⃣ Understand Math & Statistics 📏
Math is essential for ML, but you don’t need a PhD. Focus on:
✅ Linear Algebra (vectors, matrices, dot products)
✅ Probability & Statistics (mean, variance, distributions)
✅ Calculus Basics (derivatives & optimization)
Recommended Resources:
Khan Academy (Linear Algebra & Stats)
"The Elements of Statistical Learning" (Book)
3Blue1Brown (YouTube for math visualization)
🔗 Practice:
Use Python (NumPy, SciPy) to apply math concepts
Simulate probability distributions with Python
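As one possible way to practice these topics, the sketch below simulates a fair die with NumPy and compares the sample mean and variance with their theoretical values, then does a dot product and a matrix-vector product. The die example and the small vectors are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100,000 rolls of a fair six-sided die.
rolls = rng.integers(1, 7, size=100_000)
print("sample mean:", rolls.mean(), "(theory: 3.5)")
print("sample variance:", rolls.var(), "(theory: 35/12, about 2.917)")

# Linear algebra basics: dot product and matrix-vector product.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
dot = v @ w  # 1*4 + 2*5 + 3*6 = 32
Av = A @ v   # [1*1, 2*2] = [1, 4]
print(dot, Av)
```

Increasing the number of rolls should bring the sample statistics closer to the theoretical values, which is a nice hands-on view of the law of large numbers.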
4️⃣ Learn Machine Learning Basics 🤖
Start with Supervised & Unsupervised Learning concepts:
✅ Supervised Learning: Linear Regression, Decision Trees, Random Forests, SVM
✅ Unsupervised Learning: Clustering (K-Means, DBSCAN), PCA
✅ Evaluation Metrics: Accuracy, Precision-Recall, RMSE
Recommended Resources:
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" (Aurélien Géron)
Scikit-Learn Documentation & Tutorials
Kaggle micro-courses
🔗 Practice:
Predict house prices using Linear Regression
Classify iris flowers using Decision Trees
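The iris practice idea can be sketched in a few lines with Scikit-Learn. The max_depth=3 setting and the 80/20 split are arbitrary choices for this example, not recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A shallow tree is easy to inspect and less prone to overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Accuracy is a reasonable metric here because the three iris classes are balanced; on imbalanced data, precision and recall tell you more.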
5️⃣ Work on Real-World Projects 🚀
Apply what you’ve learned with real-world data.
✅ Kaggle datasets & competitions
✅ End-to-end projects: Data collection → Cleaning → Modeling → Deployment
Project Ideas:
🔹 Sentiment analysis of tweets (Text data)
🔹 Predict sales revenue (Regression)
🔹 Identify handwritten digits (Image classification)
Final Tips 🎯
🔹 Start small, build consistently
🔹 Work on real datasets (Kaggle, Google Dataset Search)
🔹 Engage with the community (Kaggle, Stack Overflow, LinkedIn)
👍3
I hope you guys know the basics of Python, so you can start with data handling. You can simply start from this: https://news.1rj.ru/str/MikeDevThoughts/134
Telegram
Mike's DevThoughts
https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/section-2-data-science-and-ml-tools/numpy-exercises.ipynb
👍2
Forwarded from Tech Nerd (Tech Nerd)
KiNFiSH Farms
How in the world would I answer this... these things are getting out of control
Imagine the CAPTCHA is so complicated that you need AI to help you prove you're not an AI 😁
@selfmadecoder
🤣5
I want to take a moment to thank someone who inspired my journey into machine learning and AI. He introduced me to the field, constantly encouraged me, and pushed me to think bigger. Shoutout to this guy. He's the most humble and hardworking person I know, and he continues to be a huge inspiration. I truly appreciate his guidance. Thank you for everything!
@kalkin_21
👏7👍1