Data science/ML/AI – Telegram
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
🧵 Thread Series on:

Mastering Pandas for Data Manipulation!


Pandas is the go-to library for handling tabular data in Python. Whether you're analyzing sales, surveys, or logs, start every project the same way:

import pandas as pd

# Load CSV
df = pd.read_csv('sales_data.csv')

# Quick look
df.head()     # First 5 rows
df.info()     # Structure & data types
df.describe() # Basic stats


Next up 👉 Selecting Columns & Rows
Selecting Columns & Rows

Need specific columns or rows? Pandas makes selection intuitive and fast:

# Single column (Series)
df['name']

# Multiple columns (DataFrame)
df[['name', 'age', 'sales']]

# Row selection with .loc (label-based)
df.loc[0:5]                    # Rows labelled 0 through 5 (end label included)
df.loc[df['sales'] > 1000]     # Conditional

# .iloc (position-based)
df.iloc[0:5, 1:4]              # Rows 0-4, columns 1-3
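
The difference between .loc and .iloc slicing trips up a lot of people, so here is a minimal sketch on a small made-up DataFrame (the values are hypothetical; the column names follow the examples above):

```python
import pandas as pd

# Hypothetical sample data with the default 0..6 integer index
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve", "Fay", "Gus"],
    "age": [31, 45, 28, 39, 52, 23, 36],
    "sales": [1200, 800, 1500, 950, 2000, 400, 1100],
})

by_label = df.loc[0:5]         # label slice: the row labelled 5 is included
by_position = df.iloc[0:5]     # position slice: stops before position 5
subset = df.iloc[0:5, 1:3]     # rows 0-4, columns at positions 1 and 2

print(len(by_label))      # 6
print(len(by_position))   # 5
print(list(subset.columns))
```

If the index is not the default 0, 1, 2, ... the two slices can return completely different rows, so pick the one that matches your intent.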


Next up 👉 Filtering and Querying
Filtering and Querying

Want to zoom in on specific data?

Filtering in Pandas is incredibly powerful. Check the code below:

# Multiple conditions
high_sales = df[(df['sales'] > 1000) & (df['region'] == 'West')]

# Using .query() – cleaner syntax!
high_performers = df.query("sales > 1000 and region == 'West'")

# Find missing values
df[df['email'].isna()]

# Contains substring (na=False treats missing values as non-matches)
df[df['product'].str.contains('Pro', case=False, na=False)]
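
To see these filters side by side, here is a small sketch on made-up data. The `threshold` variable and the sample rows are assumptions for illustration; the `@` prefix is how .query() refers to a local Python variable:

```python
import pandas as pd

# Hypothetical sample data, including one missing product name
df = pd.DataFrame({
    "product": ["Widget Pro", "Widget", None, "pro kit"],
    "region": ["West", "West", "East", "West"],
    "sales": [1500, 900, 2000, 1200],
})

threshold = 1000  # hypothetical cutoff

# A boolean mask and .query() select the same rows
mask_rows = df[(df["sales"] > threshold) & (df["region"] == "West")]
query_rows = df.query("sales > @threshold and region == 'West'")

# na=False makes the missing product name count as "no match"
pro_rows = df[df["product"].str.contains("Pro", case=False, na=False)]

print(mask_rows.equals(query_rows))
print(pro_rows)
```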


Next up 👉 Adding and Removing Columns
Adding and Removing Columns

DataFrames are flexible! Easily create new columns or remove unnecessary ones:

# Add new column
df['revenue'] = df['sales'] * df['price']

# From existing columns
df['full_name'] = df['first_name'] + ' ' + df['last_name']

# Drop columns
df.drop(columns=['temp_col'], inplace=True)

# Or create a new DF without modifying original
clean_df = df.drop(columns=['old_col1', 'old_col2'])


Next up 👉 Dealing with Missing Values
Dealing with Missing Values

Real-world data is messy, and missing values are common.

Here's how to handle them cleanly:

# Check for nulls
df.isnull().sum()

# Drop rows with any missing values
df_clean = df.dropna()

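# Fill missing values (assign back: inplace=True on a selected column may not update df in recent pandas)
df['age'] = df['age'].fillna(df['age'].median())
df['category'] = df['category'].fillna('Unknown')

# Forward or backward fill (great for time series)
df['value'] = df['value'].ffill()   # use .bfill() to fill backward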
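
Here is a minimal sketch of what each fill strategy actually produces, using a tiny made-up DataFrame (the values and the `value_ffill` / `value_bfill` column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in every column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "category": ["A", None, "B", None],
    "value": [1.0, np.nan, np.nan, 4.0],
})

# Assign results back to the column so the DataFrame is really updated
df["age"] = df["age"].fillna(df["age"].median())   # median of 25, 31, 40
df["category"] = df["category"].fillna("Unknown")
df["value_ffill"] = df["value"].ffill()   # carry the last known value forward
df["value_bfill"] = df["value"].bfill()   # pull the next known value backward

print(df)
```

Forward fill suits readings that stay valid until the next measurement; backward fill is the mirror image, so check which one makes sense for your data before using either.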


Next up 👉 Using GroupBy
Using GroupBy

GroupBy is where Pandas shines brightest. It summarizes data by categories in one line.

# Total sales by region
df.groupby('region')['sales'].sum()

# Multiple aggregations
df.groupby('region').agg({
    'sales': 'sum',
    'customer_id': 'nunique',
    'order_date': 'max'
})

# Group by multiple columns
df.groupby(['region', 'product'])['sales'].mean()
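
An alternative to the dict style is named aggregation, which lets you choose the output column names. A small sketch on made-up data (the sample rows and the `total_sales` / `unique_customers` names are hypothetical):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "region": ["West", "West", "East", "East", "East"],
    "customer_id": [1, 2, 3, 3, 4],
    "sales": [100, 200, 50, 70, 30],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    unique_customers=("customer_id", "nunique"),
)

print(summary)
```

The result is indexed by region; call .reset_index() if you want region back as a regular column.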


Next up 👉 Sorting and Ranking
📚 Data Science Riddle - Evaluation

You're measuring performance on a dataset with heavy class imbalance. What metric is most reliable?
Anonymous Quiz
Accuracy: 19%
F1 Score: 46%
Precision: 15%
AUC: 20%
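
A quick sketch of why class imbalance makes this question matter. It does not settle the quiz; it only compares accuracy and F1 for a hypothetical model that always predicts the majority class, on made-up labels:

```python
# Hypothetical labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that always predicts the majority class

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

print(accuracy)  # 0.95
print(f1)        # 0.0
```

The model never finds a single positive case, yet accuracy alone makes it look strong.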
Sorting and Ranking

Order matters! Sort your data to find top performers or trends:

# Sort by one column
df.sort_values('sales', ascending=False)

# Sort by multiple columns
df.sort_values(['region', 'sales'], ascending=[True, False])

# Reset index after sorting
df = df.sort_values('sales', ascending=False).reset_index(drop=True)

# Add rank
df['sales_rank'] = df['sales'].rank(ascending=False)
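
One thing to watch with .rank() is how ties are handled. A minimal sketch on a made-up Series (the values are hypothetical):

```python
import pandas as pd

sales = pd.Series([300, 500, 500, 100])

# Default method="average": tied values share the mean of their ranks
avg_rank = sales.rank(ascending=False)

# method="min": tied values all get the lowest rank in the tie
min_rank = sales.rank(ascending=False, method="min")

# method="dense": like "min", but with no gaps after a tie
dense_rank = sales.rank(ascending=False, method="dense")

print(list(avg_rank))    # [3.0, 1.5, 1.5, 4.0]
print(list(min_rank))    # [3.0, 1.0, 1.0, 4.0]
print(list(dense_rank))  # [2.0, 1.0, 1.0, 3.0]
```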


Next up 👉 Merging and Joining Data
Merging and Joining Data

Working with multiple datasets? Combine them just like SQL:

# Inner join (default)
merged = pd.merge(df_sales, df_customers, on='customer_id')

# Left join
pd.merge(df_sales, df_customers, on='customer_id', how='left')

# Concatenate vertically
all_data = pd.concat([df_2023, df_2024], ignore_index=True)

# Join df1's 'date' column against df2's index
df1.join(df2, on='date')
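
To make the inner vs. left difference concrete, here is a minimal sketch with two tiny made-up tables (all values are hypothetical). The `indicator=True` option adds a `_merge` column showing where each row came from:

```python
import pandas as pd

# Hypothetical data: customer 5 has a sale but no customer record
df_sales = pd.DataFrame({"customer_id": [1, 2, 5], "sales": [100, 200, 300]})
df_customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# Inner join keeps only customer_ids present in both tables
inner = pd.merge(df_sales, df_customers, on="customer_id")

# Left join keeps every sale; unmatched customers get NaN for name
left = pd.merge(df_sales, df_customers, on="customer_id", how="left", indicator=True)

print(len(inner))  # 2
print(left)
```

Checking the `_merge` column after a left join is a cheap way to spot keys that failed to match.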


This wraps up our series on data manipulation with Pandas.

Hit ❤️ if you liked this series. It will help us tailor more content based on what you like.

👉 Join @datascience_bds for more
Part of the @bigdataspecialist family
SQL Joins Explained Visually
📚 Data Science Riddle - Regularization

A linear model starts performing worse on unseen data while its training loss keeps decreasing. Which fix is most appropriate?
Anonymous Quiz
Increase epochs: 12%
Add L2 penalty: 60%
Shuffle data again: 13%
Raise learning rate: 15%
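
For anyone curious what an L2 penalty does mechanically, here is a minimal sketch using closed-form least squares on random made-up data. The `fit_linear` helper and the penalty strength of 10.0 are assumptions for illustration, not a recommendation:

```python
import numpy as np

# Hypothetical data: only the first feature truly matters, plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=20)

def fit_linear(X, y, l2=0.0):
    """Closed-form least squares with an optional L2 (ridge) penalty."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(n_features), X.T @ y)

w_plain = fit_linear(X, y)            # no penalty
w_ridge = fit_linear(X, y, l2=10.0)   # penalty shrinks the weights

print(np.linalg.norm(w_plain))
print(np.linalg.norm(w_ridge))
```

The penalized weights have a smaller norm, which is the mechanism by which L2 regularization limits how closely a model can fit noise in the training set.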
Vector Databases: Searching by Meaning, Not Keywords

Traditional databases retrieve exact matches.
Vector databases retrieve items by conceptual similarity.

They store high-dimensional embeddings (mathematical representations of meaning) and search by finding the closest vectors in that space. This is how modern systems power semantic search, personalized recommendations, and AI memory retrieval.

Instead of asking “Does this word appear?”, you ask:
👉 “Is this idea close to what I’m looking for?”

It’s a shift from storing text to storing understanding.
And it’s becoming the backbone of LLM-powered applications.
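
The core idea can be sketched in a few lines with NumPy. This is a toy brute-force search, not how a production vector database is implemented; the 3-dimensional vectors, the document texts, and the `search` helper are all made up for illustration (real embeddings come from a model and have hundreds of dimensions):

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings" keyed by document text
documents = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.9, 0.1]),
    "return an item": np.array([0.8, 0.2, 0.1]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vector, top_k=2):
    """Return the top_k document texts closest to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in documents.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Stands in for the embedding of "how do I get my money back?"
query = np.array([0.85, 0.15, 0.05])
results = search(query)
print(results)
```

The query shares no keywords with the stored texts, yet the two refund-related documents come back because their vectors point in a similar direction. Real vector databases add approximate nearest-neighbor indexes so this works without comparing against every stored vector.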