Mike's ML Forge
Ofc I'm in the classroom 😁
I already filled in and cleaned the missing values, and I'm also doing some visualizations, man this is so fun 😅
Feature Encoding 101: Prepare Data For Machine Learning
It covers various feature encoding methods, which are important for turning all sorts of features into meaningful numerical representations.
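A minimal sketch of two common encoders, assuming a toy DataFrame with a hypothetical "city" column (not from the video):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data with a single categorical column
df = pd.DataFrame({"city": ["Addis Ababa", "Adama", "Addis Ababa", "Hawassa"]})

# Ordinal encoding: each category becomes one integer (implies an order)
ordinal = OrdinalEncoder()
print(ordinal.fit_transform(df[["city"]]))

# One-hot encoding: each category becomes its own 0/1 column (no implied order)
onehot = OneHotEncoder()
print(onehot.fit_transform(df[["city"]]).toarray())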
If you ever see me staring at a flower for too long… don’t interrupt. It’s a moment of deep appreciation stg😭
What I meant by an imbalanced dataset: for example, a dataset with an unequal distribution of a target class (say 90% class A, 10% class B). A normal train_test_split() might under-represent the rare class in the test set, making model evaluation unreliable; a stratified split prevents either set from ending up with too many samples of one class.
StratifiedShuffleSplit in Scikit-Learn
The StratifiedShuffleSplit class in Scikit-Learn is used for splitting datasets into training and test sets while maintaining the same proportion of a specific category (strata) in both sets. It is particularly useful when working with imbalanced datasets to ensure that the train and test sets have a similar distribution of the target variable.
In the housing example below, it prevents the test set from being skewed toward high- or low-income areas, which could happen with a simple random split.
from sklearn.model_selection import StratifiedShuffleSplit
import pandas as pd

# Create income categories (strata) from the continuous median_income column
housing["income_category"] = pd.cut(housing["median_income"],
                                    bins=[0., 1.5, 3.0, 4.5, 6., float("inf")],
                                    labels=[1, 2, 3, 4, 5])

# Stratified split based on income category
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in split.split(housing, housing["income_category"]):
    strat_train_set = housing.loc[train_idx]
    strat_test_set = housing.loc[test_idx]
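Quick sanity check (a sketch, reusing the housing DataFrame from the code above): the income-category proportions in the stratified test set should closely match those of the full dataset.

# Proportion of each income category: full data vs. stratified test set
print(housing["income_category"].value_counts(normalize=True).sort_index())
print(strat_test_set["income_category"].value_counts(normalize=True).sort_index())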
Python Data Cleaning Cookbook.pdf
3.4 MB
Python Data Cleaning Cookbook
Michael Walker, 2023
Forwarded from Dagmawi Babi
This is so impressive wtfruits!!! 🔥
Lucy — Multilingual AI Voice Assistant & Chatbot for Ethiopians
• https://www.linkedin.com/posts/zemenu_lucy-next-generation-ai-ugcPost-7298736880336920576-HD8B
I was impressed by the Amharic TTS, and also when bro wrote Amharic in English ("endezih") and it understood it. Not to mention it scrapes Telegram channels and understands major Ethiopian languages.
#CommunityShowcase #Zemenu #Lucy
@Dagmawi_Babi
Forwarded from Mike's ML Forge (Mike)
I can't just see this and keep my mouth shut, so here's the thing:
How to Pick the Right Machine Learning Algorithm
One of the hardest parts of machine learning is choosing the right algorithm for the job. Different algorithms are suited for different types of problems. Here’s a simple way to break it down:
Step 1: What kind of problem are you solving?
Everything starts with understanding what you want to predict or classify. Your problem will fall into one of these categories:
1. Classification – When you need to categorize things (e.g., "Is this email spam or not?").
2. Regression – When you need to predict a number (e.g., "How much will a house cost?").
3. Clustering – When you want the computer to group things automatically without labels (e.g., "Group customers by similar behavior").
4. Dimensionality Reduction – When you have too much data and need to simplify it while keeping the important parts (see the quick sketch after this list).
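A rough sketch of where you might start in Scikit-Learn for each problem type (the pairings below are my own illustrative picks, not a fixed rule):

from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# One common starting estimator per problem type (illustrative only)
starting_points = {
    "classification": LogisticRegression(),          # e.g. spam vs. not spam
    "regression": LinearRegression(),                 # e.g. predicting a house price
    "clustering": KMeans(n_clusters=3),                # e.g. grouping similar customers
    "dimensionality reduction": PCA(n_components=2),  # e.g. compressing many features
}

for problem, estimator in starting_points.items():
    print(problem, "->", type(estimator).__name__)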
Forwarded from Oops, My Brain Did That
God
Make me invisible to the eyes of others, draw me to You, help me turn away from this worldly planet and come to You, and help me overcome my sins and be Christ-like, like You.
In Your name I pray, amen 🫀☦️