Data science/ML/AI – Telegram
13K subscribers
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
The Simplest Machine Learning Cheatsheet
📚 Data Science Riddle

A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
56%
Add indexes
17%
Use aliases
16%
Add DISTINCT
11%
Increase RAM
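
Most voters picked "Add indexes". Here is a minimal sketch of why, using Python's built-in sqlite3 module. The table, column, and index names are made up for illustration, and the exact wording of the query plan depends on your SQLite version.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet: SQLite scans the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # now SQLite can search through the index

print("before:", plan_before)
print("after: ", plan_after)
```

Aliases and DISTINCT do not change how many rows get read, and more RAM only hides the scan. The index is what lets the engine jump straight to the matching rows.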
Everything You Need to Know About Databricks
📚 Data Science Riddle

You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
54%
Box plot
29%
Heatmap
9%
Line chart
7%
Area plot
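
A box plot conventionally draws its whiskers at 1.5 × IQR beyond the quartiles and marks anything outside as an extreme value. A small sketch of that rule with NumPy, on made-up numbers:

```python
import numpy as np

# Hypothetical measurements with one obvious extreme value
values = np.array([12, 13, 13, 14, 15, 15, 16, 17, 18, 95], dtype=float)

# Quartiles and interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Whisker limits used by the common 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Points a box plot would draw as individual outlier markers
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```

Only the value 95 falls outside the limits here, which is exactly the point a box plot would show as a lone dot.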
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
If you want to become a Data Scientist, this is the path to follow.
📚 Data Science Riddle

You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
30%
Checkpoints
18%
Contracts
40%
Indexes
13%
Sharding
🛠️ Running Code in Jupyter Notebooks

Jupyter Notebooks let you write & run code interactively.
Here’s a quick guide to make your workflow smoother:

▶️ Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.

⌨️ Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place

🔄 Kernel Management
- Interrupt the kernel if code hangs.
- Restart kernel to reset memory & variables.

🖥️ Output Handling
- Results & errors appear directly under the cell.
- Long-running code outputs appear as they’re generated.
- Large outputs can be scrolled or collapsed for clarity.

💡 Pro Tip:
Always “Restart & Run All” before sharing or saving a notebook.
This ensures reproducibility and clean results.

📚 Data Science Riddle

You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
21%
Distributed FS
10%
Cold storage
21%
Object Storage
48%
Local SSD
6 Must-Know Data Engineering Tools For Beginners
📚 Data Science Riddle

A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
25%
Encode it differently
21%
Scale it
11%
Drop the feature
43%
Check interaction effects
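
Why "check interaction effects" is worth a look: a feature can appear useless on its own and still matter a lot in combination. A toy sketch with synthetic data, where the target is assumed to depend only on the product of two features:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Two independent synthetic features
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Assumed target: driven purely by the interaction x1 * x2, plus a little noise
y = x1 * x2 + 0.1 * rng.normal(size=n)

# On its own, x1 shows almost no linear relationship with y
corr_alone = np.corrcoef(x1, y)[0, 1]

# The interaction term explains nearly everything
corr_interaction = np.corrcoef(x1 * x2, y)[0, 1]

print(f"x1 alone:  {corr_alone:.3f}")
print(f"x1 * x2:   {corr_interaction:.3f}")
```

A single-feature importance score would rank x1 near the bottom here, even though dropping it would destroy the signal.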
Advanced Data Science on Spark.pdf
1.8 MB
From Stanford University. Covers Spark for ML, graph processing (GraphFrames), and integration with Hadoop.
📚 Data Science Riddle

Your estimate has high variance. Best fix?
Anonymous Quiz
57%
Increase sample size
26%
Change confidence level
8%
Reduce bin count
8%
Switch to bootstrap
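
The spread of a sample mean shrinks roughly with the square root of the sample size. A quick simulation sketch; the population here is synthetic and the sizes 25 and 400 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic skewed population (assumption for illustration)
population = rng.exponential(scale=10.0, size=100_000)


def spread_of_mean(n, repeats=2_000):
    # Draw many samples of size n and measure how much their means vary
    samples = rng.choice(population, size=(repeats, n), replace=True)
    return samples.mean(axis=1).std()


se_small = spread_of_mean(25)
se_large = spread_of_mean(400)

print(f"n=25:  spread of the mean = {se_small:.2f}")
print(f"n=400: spread of the mean = {se_large:.2f}")
print(f"ratio = {se_small / se_large:.1f}")
```

Going from 25 to 400 observations is 16 times more data, and the spread drops by a factor of about 4, in line with the square-root rule.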
The Difference Between Model Accuracy and Business Accuracy

A model can be 95% accurate…
yet deliver 0% business value.

Why?
Because data science metrics ≠ business metrics.

📌 Examples:
- A fraud model catches small frauds but misses the large ones
- A churn model predicts already obvious churners
- A recommendation model boosts clicks but reduces revenue

Always align ML metrics with business KPIs.
Otherwise, your “great model” is just a great illusion.
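
The fraud example can be put into numbers. This is a sketch on invented data: 1,000 transactions, 18 small frauds of 10 each and 2 large frauds of 5,000 each, with a hypothetical model that flags only the small ones.

```python
import numpy as np

n = 1_000
amounts = np.full(n, 50.0)          # ordinary transactions
is_fraud = np.zeros(n, dtype=bool)

# Invented fraud cases: 18 small ones and 2 large ones
is_fraud[:20] = True
amounts[:18] = 10.0
amounts[18:20] = 5_000.0

# Hypothetical model: catches every small fraud, misses both large ones
predicted = np.zeros(n, dtype=bool)
predicted[:18] = True

# Classic ML metric
accuracy = (predicted == is_fraud).mean()

# Business metric: share of fraudulent money actually caught
fraud_value = amounts[is_fraud].sum()
caught_value = amounts[is_fraud & predicted].sum()
value_recall = caught_value / fraud_value

print(f"accuracy:           {accuracy:.1%}")
print(f"fraud value caught: {value_recall:.1%}")
```

Accuracy comes out at 99.8% while less than 2% of the fraudulent money is caught. Same model, two very different stories.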
📚 Data Science Riddle

Your model's loss fluctuates but doesn't decrease overall. What's the most likely issue?
Anonymous Quiz
27%
Gradient exploding
41%
Weak regularization
19%
Small batch size
13%
Slow optimizer
Complete AI (Artificial Intelligence) Roadmap 🤖🚀 

1️⃣ Basics of AI 
🔹 What is AI? 
🔹 Types: Narrow AI vs General AI 
🔹 AI vs ML vs DL 
🔹 Real-world applications 

2️⃣ Python for AI
🔹 Python syntax & libraries 
🔹 NumPy, Pandas for data handling 
🔹 Matplotlib, Seaborn for visualization 

3️⃣ Math Foundation
🔹 Linear Algebra: Vectors, Matrices 
🔹 Probability & Statistics 
🔹 Calculus basics 
🔹 Optimization techniques 

4️⃣ Machine Learning (ML)
🔹 Supervised vs Unsupervised 
🔹 Regression, Classification, Clustering 
🔹 Scikit-learn for ML 
🔹 Model evaluation metrics 

5️⃣ Deep Learning (DL)
🔹 Neural Networks basics 
🔹 Activation functions, backpropagation 
🔹 TensorFlow / PyTorch 
🔹 CNNs, RNNs, LSTMs 

6️⃣ NLP (Natural Language Processing)
🔹 Text cleaning & tokenization 
🔹 Word embeddings (Word2Vec, GloVe) 
🔹 Transformers & BERT 
🔹 Chatbots & summarization 

7️⃣ Computer Vision
🔹 Image processing basics 
🔹 OpenCV for CV tasks 
🔹 Object detection, image classification 
🔹 CNN architectures (ResNet, YOLO) 

8️⃣ Model Deployment
🔹 Streamlit / Flask APIs 
🔹 Docker for containerization 
🔹 Deploy on cloud: Render, Hugging Face, AWS 

9️⃣ Tools & Ecosystem
🔹 Git & GitHub 
🔹 Jupyter Notebooks
🔹 DVC, MLflow (for tracking models) 

🔟 Build AI Projects
🔹 Chatbot, Face recognition 
🔹 Spam classifier, Stock prediction 
🔹 Language translator, Object detector 
📚 Data Science Riddle - CNN Kernels

Which convolution increases channel depth but not spatial size?
Anonymous Quiz
9%
1x1 convolution
29%
3x3 convolution
46%
Depthwise convolution
16%
Transposed convolution
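
A 1x1 convolution mixes channels at each pixel independently, so it can change the channel depth while height and width stay the same. A NumPy sketch with arbitrary shapes (3 input channels, 16 output channels, 8x8 image) and no bias term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input feature map: (channels_in, height, width)
image = rng.normal(size=(3, 8, 8))

# 1x1 kernel weights: (channels_out, channels_in); the 1x1 spatial dims are dropped
weights = rng.normal(size=(16, 3))

# At every pixel, the output channels are a linear mix of the input channels
output = np.einsum("oc,chw->ohw", weights, image)

print(image.shape, "->", output.shape)
```

The spatial size stays 8x8 and only the channel dimension grows from 3 to 16.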
Normalization vs Standardization: Why They’re Not the Same

People treat these two as interchangeable. They’re not.

👉 Normalization (Min-Max scaling):
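Rescales values to the 0–1 range.
Useful when features need a fixed, bounded range (pixel values, distance-based models such as k-NN).

👉 Standardization (Z-score):
Rescales data to mean 0 and standard deviation 1.
Useful when a method expects centered, comparably scaled features (regularized linear/logistic regression, PCA).

🔑 Key idea:
Normalization fixes the range, so a single outlier can squeeze every other value toward 0.
Standardization fixes the mean and spread, and leaves the range unbounded.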

Pick the wrong one, and your model’s geometry becomes distorted.
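
Both transforms are one-liners in NumPy. A small sketch on made-up values that include one outlier, so the difference is easy to see:

```python
import numpy as np

# Made-up feature values with one large outlier
x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])

# Normalization (min-max): maps the smallest value to 0 and the largest to 1
normalized = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): mean 0, standard deviation 1
standardized = (x - x.mean()) / x.std()

print("normalized:  ", normalized.round(3))
print("standardized:", standardized.round(3))
```

After min-max scaling, the four ordinary values are crowded near 0 because the outlier defines the top of the range. The z-scores are not bounded to any fixed interval.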
Hey everyone 👋

Tomorrow we are kicking off a new short & free series called:

📊 Data Importing Series 📊

We’ll go through all the real ways to pull data into Python:
→ CSV, Excel, JSON and more
→ Databases & SQL
→ APIs, Google Sheets, even PDFs

Short lessons, ready-to-copy code, zero boring theory.

First part drops tomorrow.
Turn on notifications so you don’t miss it 🔔

Who’s excited? React with a 🔥 if you are.
Loading a CSV file in Python

CSV stands for Comma-Separated Values, the most common format for tabular data everywhere.
With pandas, turning a CSV into a powerful, queryable DataFrame takes just a few clear lines.

# Import the pandas library
import pandas as pd

# Specify the path to your CSV file
filename = "data.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(filename)

# Check the first five rows
df.head()
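
Real CSV files are often messier than the default case. A sketch of a few read_csv options that come up a lot: sep, usecols and parse_dates. The data below is invented and read from an in-memory string so the snippet runs without a file on disk.

```python
import io
import pandas as pd

# Invented semicolon-separated data standing in for a file on disk
raw = io.StringIO(
    "id;signup_date;country;score\n"
    "1;2024-01-05;KE;7.5\n"
    "2;2024-02-11;DE;8.1\n"
    "3;2024-03-20;BR;6.9\n"
)

df = pd.read_csv(
    raw,
    sep=";",                                  # delimiter other than a comma
    usecols=["id", "signup_date", "score"],   # load only the columns you need
    parse_dates=["signup_date"],              # convert this column to datetime
)

print(df.dtypes)
```

To use it on a real file, pass the file path instead of the StringIO object.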


Next up ➡️ Loading an Excel file in Python

👉Join @datascience_bds for more
Part of the @bigdataspecialist family
Loading an Excel file in Python

Excel files are often packed with title rows, logos, merged cells, and multiple sheets, but pandas can still read them.
With just a few extra parameters, you can skip junk rows, pick a specific sheet, limit the number of rows, etc.

# Import the pandas library 
import pandas as pd

# Specify the path to your Excel file (.xlsx or .xls)
filename = "data.xlsx"

# Read the Excel file into a DataFrame
# Common options you'll use all the time:
df = pd.read_excel(
    filename,
    sheet_name=0,              # 0 = first sheet
    header=0,                  # Row to use as column names (0-indexed, counted after skipped rows)
    skiprows=4,                # Skip the first 4 rows of the sheet
    nrows=1000,                # Load only the first 1000 data rows
)

# Check the first five rows
df.head()
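
How skiprows, header and nrows work together is easy to get wrong. The sketch below uses read_csv on an in-memory string as a stand-in, so it runs without a workbook on disk. pandas documents parameters with the same names for read_excel, but the data and the number of junk rows here are invented.

```python
import io
import pandas as pd

# Invented export: three junk lines, then the real header and data
raw = io.StringIO(
    "Quarterly report\n"
    "Generated by finance\n"
    "Confidential\n"
    "region,sales\n"
    "north,100\n"
    "south,150\n"
    "east,120\n"
)

df = pd.read_csv(
    raw,
    skiprows=3,   # drop the three junk lines first
    header=0,     # the first remaining line holds the column names
    nrows=2,      # then read only two data rows
)

print(df)
```

The header position is counted after the skipped rows, and nrows counts data rows only, so the result has the columns region and sales with the north and south rows.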


Next up ➡️ Loading a text file in Python
