So today let's talk about one of the most important real-world data science questions:
"How do I know which features matter and which ones don't?"
Let's break this down step by step, nice and clear.
🔗 1. How to Find Correlation Between Features
Correlation shows how strongly two variables move together.
In pandas, the easiest way:
python
import pandas as pd
# Load your dataset
df = pd.read_csv("your_data.csv")
# Get correlation matrix
correlation_matrix = df.corr(numeric_only=True)
# View it
print(correlation_matrix)
📊 Visualize it:
python
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()
Values close to 1 = strong positive correlation
Values close to -1 = strong negative correlation
Close to 0 = no linear relationship
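Quick sanity check on that last point: "close to 0" only rules out a linear relationship, not every relationship. Here's a minimal sketch on made-up data (the column names and values are purely illustrative, not from any real dataset):

```python
import numpy as np
import pandas as pd

# Made-up data: x is symmetric around 0 (-5, -4, ..., 5)
x = np.arange(-5, 6)
demo = pd.DataFrame({
    "x": x,
    "double_x": 2 * x,     # moves perfectly with x
    "neg_x": -x,           # moves perfectly against x
    "x_squared": x ** 2,   # fully determined by x, but not linearly
})

# Pearson correlation of every column with x
corr_with_x = demo.corr()["x"]
print(corr_with_x.round(3))
```

Here `x_squared` depends entirely on `x`, yet its correlation with `x` comes out at about 0, because the relationship is a curve and not a straight line. So a near-zero value is a reason to plot the feature, not to drop it right away.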
2. How to Know Which Columns Are Useful?
Here are several ways to decide which features matter:
A. Correlation with Target Variable (for regression)
Check how each feature correlates with the target (label).
df.corr(numeric_only=True)['target_column'].sort_values(ascending=False)
→ The ones with strong correlation (positive or negative) are often useful.
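One catch with a plain descending sort: strong negative correlations land at the bottom of the list, right where you'd expect the useless features. A small sketch that ranks by absolute value instead, using a hypothetical toy table where `price` stands in for `target_column` (all names and numbers here are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data; "price" plays the role of the target column
toy = pd.DataFrame({
    "size": [50, 60, 80, 100, 120, 150],
    "age": [30, 25, 20, 10, 5, 1],
    "door_number": [7, 2, 9, 1, 8, 3],
    "price": [100, 120, 155, 210, 240, 300],
})

# Correlation of each feature with the target (drop the target's own row)
target_corr = toy.corr(numeric_only=True)["price"].drop("price")

# Order features by strength of correlation, ignoring the sign
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)
print(ranked)
```

In this toy table `age` has a strong negative correlation with `price`, so it ranks right behind `size`, while the weakly correlated `door_number` drops to the end.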
I just packed my bag, hyped myself up to study my academic courses, and went to the library feeling like a scholar. Mann, guess what?
I forgot everything: notebooks, even my pen... just vibes
So here I am, sitting like a monk with no scrolls, and I started learning ML out of spite 😂
Sometimes forgetfulness leads to unexpected learning paths. Let’s roll with it😁
#MLChoseMe
Which should be done first?
Anonymous Quiz
Scaling the data: 10%
Handle missing values: 70%
Check correlation: 20%
It’s one of those nights where everything’s just… still. No chaos. No overthinking. Just peace. I needed this so bad and I don’t want it to end.
They say I’m strange for loving the night,
for walking alone with no one around,
for enjoying silence more than noise.
But I’m not strange. I just found peace where most people never look within.
Alrighttt the break is over lemme get back to tuning them hyperparameters
Hoping you find strength as you look to what was done on the cross
Mike's ML Forge
B. Feature Importance (for tree models)
Use models like RandomForest or XGBoost to get importance scores.
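import pandas as pd
from sklearn.ensemble import RandomForestRegressor
# X = feature columns (DataFrame), y = target column; both must already be defined
model = RandomForestRegressor(random_state=42)  # fixed seed so the scores are repeatable
model.fit(X, y)
# One importance score per feature; the scores sum to 1
importances = model.feature_importances_
# Show them in a DataFrame
feature_scores = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
feature_scores = feature_scores.sort_values(by='Importance', ascending=False)
print(feature_scores)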
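The snippet above needs your own `X` and `y`. If you just want to watch it work, here's a self-contained sketch on synthetic data. Everything in it is an assumption made up for the demo: the column names (`signal_strong`, `signal_weak`, `pure_noise`), the coefficients, and the seeds.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features: two that drive the target and one that is pure noise
rng = np.random.default_rng(0)
n = 500
X_demo = pd.DataFrame({
    "signal_strong": rng.normal(size=n),
    "signal_weak": rng.normal(size=n),
    "pure_noise": rng.normal(size=n),
})

# Target depends heavily on signal_strong, a little on signal_weak,
# and not at all on pure_noise
y_demo = (
    5 * X_demo["signal_strong"]
    + 1 * X_demo["signal_weak"]
    + rng.normal(scale=0.1, size=n)
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Importance scores as a labeled Series, highest first
demo_scores = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(demo_scores)
```

The scores are relative and sum to 1, so read them as a ranking, not as absolute effect sizes. On this toy setup `signal_strong` should come out on top and `pure_noise` at the bottom; on real data it's worth cross-checking the ranking with another method before dropping columns.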