Data science/ML/AI – Telegram
Data science/ML/AI
13K subscribers
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
Data Structures in R
The RAG Developer Stack 2025 - Build Intelligent AI That Thinks, Remembers & Acts
📚 Data Science Riddle

Which algorithm is most sensitive to feature scaling?
Anonymous Quiz
- Decision Tree: 25%
- Random Forest: 24%
- KNN: 36%
- Naive Bayes: 15%
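A quick way to see why scaling matters: KNN ranks neighbours by Euclidean distance, so a feature on a large scale (here, a hypothetical income in dollars) can drown out one on a small scale (age in years). The sketch below uses made-up points and plain NumPy z-score standardization. For a 59-year-old earning $50,200, the raw distances pick the 25-year-old (the income gap is smaller), while after scaling the 60-year-old is nearest. Tree-based models split on one feature at a time, so rescaling a feature does not change their splits.

```python
import numpy as np

# Hypothetical toy data: [age in years, annual income in dollars]
X = np.array([
    [25, 50_000],
    [60, 51_000],
    [40, 90_000],
], dtype=float)
query = np.array([59, 50_200], dtype=float)

def nearest_index(points, q):
    """Return the index of the point closest to q (Euclidean distance)."""
    dists = np.linalg.norm(points - q, axis=1)
    return int(np.argmin(dists))

# Raw features: income differences (hundreds of dollars) swamp age differences.
raw_nn = nearest_index(X, query)

# Standardize each column to zero mean and unit variance (z-scores),
# using statistics computed from the reference points only.
mu, sigma = X.mean(axis=0), X.std(axis=0)
scaled_nn = nearest_index((X - mu) / sigma, (query - mu) / sigma)

print("nearest without scaling:", raw_nn)
print("nearest with scaling:   ", scaled_nn)
```

In a real pipeline the mean and standard deviation should come from the training split only (scikit-learn's `StandardScaler` fitted on training data is one common way to do this).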
Great Packages for R
Big Data 5V
📚 Data Science Riddle

Why does bagging reduce variance?
Anonymous Quiz
- Uses deeper trees: 13%
- Averages multiple models: 50%
- Penalizes weights: 29%
- Learns sequentially: 9%
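The "averages multiple models" intuition can be checked with a small simulation. This is an idealized sketch, assuming each model's prediction is the true value plus independent Gaussian noise; real bagged trees are trained on overlapping bootstrap samples, so their errors are correlated and the reduction is smaller than shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_VALUE = 3.0   # the quantity every model tries to predict
NOISE_SD = 1.0     # spread of a single high-variance model's prediction
N_MODELS = 25      # ensemble size B
N_TRIALS = 10_000  # repetitions used to estimate the variance

# Idealized assumption: each model's error is independent Gaussian noise.
single = TRUE_VALUE + rng.normal(0, NOISE_SD, size=N_TRIALS)
ensemble = (TRUE_VALUE + rng.normal(0, NOISE_SD, size=(N_TRIALS, N_MODELS))).mean(axis=1)

print(f"variance of one model:           {single.var():.3f}")    # close to 1.0
print(f"variance of {N_MODELS}-model average: {ensemble.var():.3f}")  # close to 1/25 = 0.04
```

With pairwise correlation rho between models of variance sigma², the variance of the average is rho·sigma² + (1 - rho)·sigma²/B, which is why random forests also decorrelate trees by sampling features at each split.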
📊 Infographic Elements That Every Data Person Should Master 🚀

After years of working with data, I can tell you one thing:
👉 The chart you choose is as important as the data itself.

Here’s your quick visual toolkit 👇

🔹 Timelines

* Sequential: great for processes
* Scaled: best for real dates/events

🔹 Circular Charts

* Donut 🍩 & Pie 🥧 for proportions
* Radial 🌌 for progress or cycles
* Venn 🎯 when you want to show overlaps

🔹 Creative Comparisons

* Bubble 🫧 & Area 🔵 for impact by size
* Dot Matrix 🔴 for colorful distributions
* Pictogram 👥 when storytelling matters most

🔹 Classic Must-Haves

* Bar 📊 & Histogram 📏 (clear, reliable)
* Line 📈 for trends
* Area 🌊 & Stacked Area for the “big picture”

🔹 Advanced Tricks

* Stacked Bar 🏗 when categories add up
* Span 📐 for ranges
* Arc 🌈 for relationships

💡 Pro tip from experience:
If your audience doesn’t “get it” in 3 seconds, change the chart. The best visualizations speak louder than numbers.
Most Common Data Science Skills in Job Postings
Machine Learning Cheatsheet
📚 Data Science Riddle

Which metric is best for imbalanced classification?
Anonymous Quiz
- Accuracy: 20%
- Precision: 18%
- Recall: 18%
- F1-Score: 44%
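Why accuracy misleads on imbalanced data, in a few lines of plain Python. The test set and both "models" below are hypothetical: 5 positives among 100 samples. A model that always predicts the majority class scores 95% accuracy but never finds a positive, while a model that actually catches most positives has lower accuracy but a much better F1.

```python
def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Define precision, recall and F1 as 0 when their denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test set: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A always predicts the majority (negative) class.
always_negative = classification_scores(y_true, [0] * 100)

# Model B catches 4 of the 5 positives but raises 6 false alarms.
model_b = classification_scores(y_true, [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89)

print(always_negative)  # accuracy 0.95, but recall and F1 are 0.0
print(model_b)          # accuracy 0.93, F1 about 0.53
```

F1 balances precision and recall; if one error type is clearly costlier, it can make sense to weight that metric directly instead.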
SQL JOINS
Introduction To Linear Regression
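A minimal sketch of ordinary least squares linear regression with NumPy. The data are synthetic (generated from y = 1 + 2x plus Gaussian noise, an assumption for illustration), so the fitted intercept and slope should land close to 1 and 2.

```python
import numpy as np

# Synthetic data generated from y = 1 + 2x plus small Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||y - X @ beta||^2.
# lstsq solves this without explicitly inverting X^T X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Coefficient of determination (R^2): share of variance explained.
y_hat = X @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"intercept={intercept:.2f}, slope={slope:.2f}, R^2={r2:.3f}")
```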
📚 Data Science Riddle

A dataset has 20% missing values in a critical column. What's the most practical choice?
Anonymous Quiz
- Drop all rows: 6%
- Fill with mean/median: 49%
- Use model-based imputation: 41%
- Ignore missing data: 5%
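The two most popular answers side by side, as a small NumPy sketch on hypothetical data where 2 of 10 values (20%) are missing. Median imputation fills every gap with one constant; the model-based version (here a simple linear fit on another column, which is an assumption, not the only choice) uses the relationship between columns to predict each missing value.

```python
import numpy as np

# Hypothetical data: x is fully observed, y has 20% missing values (NaN).
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([2.1, 4.0, np.nan, 8.1, 9.9, np.nan, 14.2, 15.8, 18.1, 20.0])
missing = np.isnan(y)

# Option 1: median imputation (robust to outliers, ignores other columns).
y_median = np.where(missing, np.nanmedian(y), y)

# Option 2: simple model-based imputation. Fit y ~ a*x + b on the observed
# rows, then predict the missing entries (the linear model is an assumption).
a, b = np.polyfit(x[~missing], y[~missing], deg=1)
y_model = np.where(missing, a * x + b, y)

# A missingness indicator lets a downstream model learn
# whether "was missing" itself carries signal.
was_missing = missing.astype(int)

print("median fill:", y_median[missing])
print("model fill: ", y_model[missing])
```

Median filling is fast and simple but shrinks the column's variance; model-based imputation preserves relationships between features at the cost of extra complexity and a risk of leakage if fitted on test data.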
ML models don’t all think alike 🤖

❇️ Naive Bayes = probability
❇️ KNN = proximity
❇️ Discriminant Analysis = decision boundaries

Different paths, same goal: accurate classification.

Which one do you reach for first?
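To make the three "ways of thinking" concrete, here is a from-scratch NumPy sketch on a hypothetical two-class toy dataset: Gaussian Naive Bayes picks the class with the highest posterior probability, KNN takes a majority vote among the closest points, and linear discriminant analysis (LDA) scores each class with a linear function, so the boundary between classes is a straight line. These are simplified illustrations; scikit-learn provides `GaussianNB`, `KNeighborsClassifier` and `LinearDiscriminantAnalysis` for real use.

```python
import numpy as np

# Hypothetical toy data: two well-separated classes in 2-D.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [3, 3], [4, 3], [3, 4], [4, 4]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classes = np.unique(y)

def naive_bayes(x):
    """Probability: highest log-posterior, treating features as
    independent Gaussians given the class."""
    scores = []
    for c in classes:
        Xc = X[y == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores.append(np.log(len(Xc) / len(X)) + log_lik)
    return int(classes[np.argmax(scores)])

def knn(x, k=3):
    """Proximity: majority vote among the k closest training points."""
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return int(np.bincount(y[nearest]).argmax())

def lda(x):
    """Decision boundary: linear discriminant with a covariance matrix
    pooled across classes, so class boundaries are linear."""
    means = [X[y == c].mean(axis=0) for c in classes]
    scatter = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    inv_cov = np.linalg.inv(scatter / (len(X) - len(classes)))
    scores = [x @ inv_cov @ m - 0.5 * m @ inv_cov @ m + np.log(np.mean(y == c))
              for c, m in zip(classes, means)]
    return int(classes[np.argmax(scores)])

for point in ([1.2, 1.5], [3.2, 2.9]):
    p = np.array(point)
    print(point, "NB:", naive_bayes(p), "KNN:", knn(p), "LDA:", lda(p))
```

On easy data like this all three agree; they start to diverge when features are correlated (hurting Naive Bayes), scales differ (hurting KNN), or the true boundary is curved (hurting LDA).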
📚 Data Science Riddle

In a medical diagnosis project, what's most important?
Anonymous Quiz
- High precision: 34%
- High recall: 15%
- High accuracy: 37%
- High F1-score: 14%
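In screening settings a missed case (false negative) is often costlier than a false alarm, which is the usual argument for prioritizing recall. This plain-Python sketch uses hypothetical model scores and labels to show the trade-off: lowering the decision threshold raises recall at the expense of precision.

```python
# Hypothetical model scores (probability of disease) and true labels (1 = sick).
scores = [0.95, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

def precision_recall(threshold):
    """Precision and recall when predicting 'sick' for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.75, 0.5, 0.35):
    p, r = precision_recall(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```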
Important LLM Terms

🔹 Transformer Architecture
🔹 Attention Mechanism
🔹 Pre-training
🔹 Fine-tuning
🔹 Parameters
🔹 Self-Attention
🔹 Embeddings
🔹 Context Window
🔹 Masked Language Modeling (MLM)
🔹 Causal Language Modeling (CLM)
🔹 Multi-Head Attention
🔹 Tokenization
🔹 Zero-Shot Learning
🔹 Few-Shot Learning
🔹 Transfer Learning
🔹 Overfitting
🔹 Inference
🔹 Language Model Decoding
🔹 Hallucination
🔹 Latency
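Several of these terms (self-attention, embeddings, context window, causal language modeling) meet in one formula: scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a minimal single-head NumPy sketch; the random "embeddings" and projection matrices are stand-ins for learned weights, not anything from a real model.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V, with Q, K, V projected from the same X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity matrix
    if causal:
        # Causal (CLM-style) mask: token i may only attend to tokens 0..i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8               # seq_len plays the role of the context window
X = rng.normal(size=(seq_len, d_model))       # stand-in for token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, w = self_attention(X, Wq, Wk, Wv, causal=True)
print(np.round(w, 2))  # lower-triangular attention weights
```

Multi-head attention runs several such heads in parallel with separate weight matrices and concatenates their outputs; masked language modeling (MLM) instead lets every token attend in both directions.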
Cheatsheet: Bayes Theorem and Classifier
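A worked example of Bayes' theorem, P(H|E) = P(E|H)·P(H) / P(E), for a binary hypothesis. The numbers are hypothetical: a condition with 1% prevalence, a test that detects 90% of true cases, and a 5% false-positive rate. Even after a positive result, the posterior probability is only about 15%, because true cases are rare. A Naive Bayes classifier applies the same rule per class, multiplying per-feature likelihoods under an independence assumption.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H:
    P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|not H) P(not H)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```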
Why Is Kafka Called Kafka?

Here’s a fun fact that surprises a lot of people.

The “Kafka” you use for real-time data pipelines is… named after the novelist Franz Kafka.

Why? Jay Kreps, one of its co-creators, once explained it simply:

- He liked Franz Kafka's work.
- The name sounded cool for an open-source project.
- Kafka is a system optimized for writing, so naming it after a writer made sense.

That last point is the key one.
Apache Kafka is all about writing: streams of events, logs, and data in motion.
So the name stuck.

Today, millions of engineers across the globe talk about “Kafka” every single day… and most don’t realize they’re also invoking a 20th-century novelist.

It's funny how small choices like naming your project can shape how the world remembers it.