Dealing with Missing Values
Real-world data is messy, and missing values are common.
Here's how to handle them cleanly:
# Check for nulls
df.isnull().sum()
# Drop rows with any missing values
df_clean = df.dropna()
# Fill missing values
df['age'] = df['age'].fillna(df['age'].median())
df['category'] = df['category'].fillna('Unknown')
# Forward or backward fill (great for time series)
df['value'] = df['value'].ffill()  # use .bfill() to fill backward instead
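To see these calls on concrete data, here is a minimal sketch. The DataFrame and its values are made up for illustration. It assigns each result back to the column rather than calling fillna with inplace=True on a column selection, since recent pandas versions warn that the latter may not update the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a gap in every column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "category": ["a", None, "b", "a"],
    "value": [1.0, np.nan, np.nan, 4.0],
})

# Count missing values per column before cleaning
null_counts = df.isnull().sum()

# Numeric gap -> median of the known ages; categorical gap -> a label
df["age"] = df["age"].fillna(df["age"].median())
df["category"] = df["category"].fillna("Unknown")

# Forward fill carries the last seen value down the column
df["value"] = df["value"].ffill()

print(null_counts)
print(df)
```

Note that ffill cannot fill a gap in the very first row, because there is no earlier value to carry forward.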
Next up 👉 Using GroupBy
Using GroupBy
GroupBy is where Pandas shines brightest. It summarizes data by categories in one line.
# Total sales by region
df.groupby('region')['sales'].sum()
# Multiple aggregations
df.groupby('region').agg({
'sales': 'sum',
'customer_id': 'nunique',
'order_date': 'max'
})
# Group by multiple columns
df.groupby(['region', 'product'])['sales'].mean()
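If you want readable column names in the result, named aggregation is one option. This is a small sketch on made-up order data. The output names total_sales and unique_customers are just illustrative choices.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "sales": [100, 150, 200, 50, 75],
    "customer_id": [1, 2, 3, 3, 4],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    unique_customers=("customer_id", "nunique"),
)

print(summary)
```

The result is indexed by region. Call summary.reset_index() if you'd rather have region as a regular column.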
Next up 👉 Sorting and Ranking
📚 Data Science Riddle - Evaluation
You're measuring performance on a dataset with heavy class imbalance. What metric is most reliable?
Anonymous Quiz
• Accuracy (20%)
• F1 Score (44%)
• Precision (17%)
• AUC (20%)
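To make the imbalance problem concrete, here is a small sketch with made-up labels: 95 negatives, 5 positives, and a "model" that always predicts the majority class. It is only an illustration of how accuracy and F1 can disagree, not a full evaluation recipe.

```python
# Hypothetical labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
# Guard against division by zero when nothing is predicted positive
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

The accuracy looks strong even though the model never finds a single positive case, which the F1 score makes visible.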
Sorting and Ranking
Order matters! Sort your data to find top performers or trends:
# Sort by one column
df.sort_values('sales', ascending=False)
# Sort by multiple columns
df.sort_values(['region', 'sales'], ascending=[True, False])
# Reset index after sorting
df = df.sort_values('sales', ascending=False).reset_index(drop=True)
# Add rank
df['sales_rank'] = df['sales'].rank(ascending=False)
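One thing to watch with rank: the default method is "average", so tied rows get fractional ranks. The sketch below compares three tie-handling methods on made-up data. The names and numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sales reps, two of them tied at 300
df = pd.DataFrame({
    "rep": ["Ann", "Bo", "Cy", "Di"],
    "sales": [300, 500, 300, 100],
})

# Default method="average": tied rows share the mean of their positions
df["rank_avg"] = df["sales"].rank(ascending=False)
# method="min": tied rows share the best position and the next rank is skipped
df["rank_min"] = df["sales"].rank(ascending=False, method="min")
# method="dense": like "min", but with no gap after a tie
df["rank_dense"] = df["sales"].rank(ascending=False, method="dense")

top = df.sort_values("sales", ascending=False).reset_index(drop=True)
print(top)
```

For leaderboards, "min" or "dense" usually reads more naturally than the default.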
Next up 👉 Merging and Joining Data
Merging and Joining Data
Working with multiple datasets? Combine them just like SQL:
# Inner join (default)
merged = pd.merge(df_sales, df_customers, on='customer_id')
# Left join
pd.merge(df_sales, df_customers, on='customer_id', how='left')
# Concatenate vertically
all_data = pd.concat([df_2023, df_2024], ignore_index=True)
# Join df1's 'date' column against df2's index
df1.join(df2, on='date')
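To see how the join type changes the result, here is a minimal sketch on two made-up tables. The indicator=True argument adds a _merge column showing where each row came from, which is handy for spotting unmatched keys.

```python
import pandas as pd

# Hypothetical tables: customer 4 has a sale but no customer record,
# customer 3 has a record but no sale
df_sales = pd.DataFrame({"customer_id": [1, 2, 4], "amount": [50, 75, 20]})
df_customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})

# Inner join keeps only keys present in both tables
inner = pd.merge(df_sales, df_customers, on="customer_id")

# Left join keeps every sale; unmatched customers get NaN in 'name'
left = pd.merge(df_sales, df_customers, on="customer_id",
                how="left", indicator=True)

print(inner)
print(left)
```

Filtering left for rows where _merge equals "left_only" gives you the sales with no matching customer.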
This wraps up our Data Manipulation Using Pandas Series.
Hit ❤️ if you liked this series. It will help us tailor more content based on what you like.
👉Join @datascience_bds for more
Part of the @bigdataspecialist family
📚 Data Science Riddle - Regularization
A linear model starts performing worse on unseen data while its training loss keeps decreasing. Which fix is most appropriate?
Anonymous Quiz
• Increase epochs (10%)
• Add L2 penalty (62%)
• Shuffle data again (16%)
• Raise learning rate (13%)
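For reference, here is what an L2 penalty does in the simplest possible case: one feature, no intercept, and the loss sum((y - w*x)^2) + lam * w^2. Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam). The data and the lam value below are made up for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form weight for 1-feature, no-intercept ridge regression."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = ridge_weight(xs, ys, lam=0.0)   # no penalty: ordinary least squares
w_ridge = ridge_weight(xs, ys, lam=14.0)  # penalty shrinks the weight toward 0

print(w_plain, w_ridge)
```

A larger lam pulls the weight closer to zero, trading a little training fit for a simpler model.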
Vector Databases: Searching by Meaning, Not Keywords
Traditional databases retrieve exact matches.
Vector databases retrieve conceptual similarity.
They store high-dimensional embeddings (mathematical representations of meaning) and search by finding the closest vectors in that space. This is how modern systems power semantic search, personalized recommendations, and AI memory retrieval.
Instead of asking “Does this word appear?”, you ask:
👉 “Is this idea close to what I’m looking for?”
It’s a shift from storing text to storing understanding.
And it’s becoming the backbone of LLM-powered applications.
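Here is a toy sketch of the "closest vectors" idea using cosine similarity. Everything in it is an assumption for illustration: the 3-dimensional vectors are hand-picked stand-ins for real embeddings (which come from an embedding model and have hundreds of dimensions), and the search scans every stored vector, whereas production vector databases typically use approximate nearest-neighbor indexes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical store: text -> hand-picked "embedding"
store = {
    "how to reset my password": [0.9, 0.1, 0.0],
    "best pizza recipes": [0.0, 0.2, 0.9],
    "account login problems": [0.8, 0.3, 0.1],
}

def search(query_vector, k=2):
    """Return the k stored texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

# Stands in for the embedding of a query like "can't sign in"
query = [0.85, 0.2, 0.05]
results = search(query)
print(results)
```

Neither top result has to share a keyword with the query. They rank highest only because their vectors point in a similar direction.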
📚 Data Science Riddle - Data Quality
Your dataset's numeric features contain silently corrupted values. What detection method helps?
Anonymous Quiz
• Min-max scaling (28%)
• Range validation (33%)
• Learning rate warmup (12%)
• Dropout masks (27%)
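As an illustration of what a range check looks like in practice, here is a minimal sketch. The feature names, allowed ranges, and rows are all hypothetical. In a real pipeline the bounds would come from domain knowledge about each feature.

```python
# Hypothetical allowed (low, high) range per numeric feature
valid_ranges = {"age": (0, 120), "temperature_c": (-90.0, 60.0)}

# Hypothetical rows, two of them with corrupted values
rows = [
    {"age": 34, "temperature_c": 21.5},
    {"age": -1, "temperature_c": 19.0},
    {"age": 52, "temperature_c": 999.0},
]

def find_out_of_range(rows, valid_ranges):
    """Return (row index, feature, value) for every value outside its range."""
    problems = []
    for i, row in enumerate(rows):
        for name, (low, high) in valid_ranges.items():
            value = row[name]
            # NaN also fails this comparison, so it gets flagged too
            if not (low <= value <= high):
                problems.append((i, name, value))
    return problems

problems = find_out_of_range(rows, valid_ranges)
print(problems)
```

A check like this only catches values that fall outside the bounds. Corrupted values that still look plausible need other methods.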
✅ Robotic Process Automation (RPA) Basics You Should Know 🤖⚙️
Robotic Process Automation (RPA) is a technology that uses software robots to automate repetitive, rule-based digital tasks normally performed by humans.
🔹 1. What is RPA?
RPA is a form of automation where software bots mimic human actions to perform structured and repetitive tasks across applications.
🔹 2. How RPA Works:
→ Bot logs into applications
→ Reads and processes data
→ Applies predefined rules
→ Performs actions like clicking, typing, copying
→ Completes tasks without human intervention
🔹 3. Common Use Cases:
• Invoice processing
• Data entry and migration
• Payroll and HR operations
• Customer support automation
• Report generation
🔹 4. Key Benefits of RPA:
• Reduces manual work
• Improves accuracy
• Increases productivity
• Works 24/7
• Faster business processes
🔹 5. Popular RPA Tools:
• UiPath
• Automation Anywhere
• Blue Prism
• Microsoft Power Automate
🔹 6. RPA vs Traditional Automation:
• RPA works at UI level
• No need to change existing systems
• Faster deployment
• Lower development cost
🔹 7. Industries Using RPA:
• Banking and finance
• Healthcare
• Insurance
• E-commerce
• Telecom
🔹 8. Limitations of RPA:
• Not suitable for unstructured data
• Depends on application stability
• Limited decision-making ability
• Breaks if UI changes
🔹 9. RPA + AI (Intelligent Automation):
• AI handles decision-making
• RPA handles execution
• Enables automation of complex processes
🔹 10. Future of RPA:
• More intelligent bots
• Integration with AI and ML
• End-to-end process automation
• Higher enterprise adoption
💡 Learning RPA helps you understand how automation is transforming modern businesses.
💬 Tap ❤️ for more!