Improving hyperparameters by hand means manually adjusting hyperparameter values through trial and error instead of using automated tuning methods like GridSearchCV or RandomizedSearchCV.
How It Works:
*Start with default hyperparameters.
*Train the model and evaluate performance.
*Adjust one hyperparameter at a time (e.g. max_features, n_estimators, max_depth).
*Retrain and compare results.
*Repeat until you find the best settings.
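A minimal sketch of that loop, assuming a RandomForestClassifier and a hypothetical X_train/X_test/y_train/y_test split — here only n_estimators is varied while everything else stays at its default:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical data: X_train, X_test, y_train, y_test are assumed to exist already.
# Change one hyperparameter at a time, retrain, and compare the scores.
for n_estimators in [100, 200, 500]:
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    print(f"n_estimators={n_estimators}: accuracy={score:.3f}")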
RandomizedSearchCV (Faster Alternative)
🎯 How it works:
Randomly selects a subset of hyperparameter combinations instead of trying all.
Still uses cross-validation to evaluate performance.
Saves time by focusing on random but diverse samples.
✅ Pros:
✔️ Much faster than GridSearchCV.
✔️ Works well when there are many hyperparameters.
❌ Cons:
❌ Might not find the absolute best combination (since it’s random).
❌ Less exhaustive compared to GridSearchCV.
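A minimal RandomizedSearchCV sketch, assuming a RandomForestClassifier and hypothetical X_train/y_train data — the parameter ranges are only illustrative:

from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical data: X_train and y_train are assumed to exist already.
param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 20),
    "max_features": ["sqrt", "log2"],
}

# n_iter=20 samples 20 random combinations; cv=5 evaluates each with 5-fold cross-validation.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    random_state=42,
)
random_search.fit(X_train, y_train)
print(random_search.best_params_, random_search.best_score_)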
GridSearchCV (Exhaustive Search)
🔍 How it works:
Tries every possible combination of hyperparameters from a predefined set.
Uses cross-validation to evaluate each combination.
Selects the best performing set.
✅ Pros:
✔️ Finds the best hyperparameters within the predefined grid, since it checks every option.
✔️ Ensures optimal tuning when the search space is small.
❌ Cons:
❌ Very slow if there are many parameters and values.
❌ Computationally expensive.
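For comparison, the same setup with GridSearchCV — again a sketch with hypothetical X_train/y_train and an illustrative grid, where every one of the 3 × 3 × 2 = 18 combinations gets evaluated:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical data: X_train and y_train are assumed to exist already.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2"],
}

# Every combination in param_grid is tried, each with 5-fold cross-validation.
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)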
Mike's ML Forge
Ofc I'm in the classroom 😁
I already filled and cleaned the missing values, also seeing some visualisations mannn this is so fun 😅
Feature Encoding 101: Prepare Data For Machine Learning
The video walks through various feature encoding methods, which are important for turning all sorts of features into meaningful numerical representations.
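As a small illustration (not from the video), one-hot vs. ordinal encoding on a made-up toy DataFrame:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Made-up toy data: one nominal feature (color) and one ordered feature (size).
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding for nominal categories (no natural order).
color_encoded = OneHotEncoder().fit_transform(df[["color"]]).toarray()

# Ordinal encoding for ordered categories, with the order given explicitly.
size_encoded = OrdinalEncoder(categories=[["small", "medium", "large"]]).fit_transform(df[["size"]])

print(color_encoded)
print(size_encoded)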
If you ever see me staring at a flower for too long… don’t interrupt. It’s a moment of deep appreciation stg😭
What I meant by an imbalanced dataset is, for example, a dataset with an unequal distribution of the target class (say 90% class A, 10% class B). With such data, a plain train_test_split() might under-represent the rare class in the test set, making model evaluation unreliable; stratified splitting prevents either set from getting too many samples of one class.
StratifiedShuffleSplit in Scikit-Learn
The StratifiedShuffleSplit class in Scikit-Learn is used for splitting datasets into training and test sets while maintaining the same proportion of a specific category (strata) in both sets. It is particularly useful when working with imbalanced datasets to ensure that the train and test sets have a similar distribution of the target variable.
It prevents the test set from being skewed toward high- or low-income areas, which could happen with a simple random split.
from sklearn.model_selection import StratifiedShuffleSplit
import pandas as pd

# Creating income categories
housing["income_category"] = pd.cut(housing["median_income"],
                                    bins=[0., 1.5, 3.0, 4.5, 6., float("inf")],
                                    labels=[1, 2, 3, 4, 5])

# Stratified split based on income category
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in split.split(housing, housing["income_category"]):
    strat_train_set = housing.loc[train_idx]
    strat_test_set = housing.loc[test_idx]
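A quick sanity check (assuming the housing DataFrame and split from the snippet above) is to compare the category proportions in the full data and in the stratified test set — they should be nearly identical:

# Proportions of each income category: full dataset vs. stratified test set
print(housing["income_category"].value_counts(normalize=True).sort_index())
print(strat_test_set["income_category"].value_counts(normalize=True).sort_index())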
Python Data Cleaning Cookbook.pdf
3.4 MB
Python Data Cleaning Cookbook
Michael Walker, 2023