Data science/ML/AI – Telegram
Data science/ML/AI
13K subscribers
508 photos
1 video
98 files
314 links
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
Regression Analysis Cheatsheet
Linear Regression.pdf
834.6 KB
Covers the basics of linear regression for modeling numerical data, including assumptions and applications in genetics. From the University of Washington.
📚 Data Science Riddle

In a real-world NLP project, your model performs poorly on new slang abbreviations. What's the fix?
Anonymous Quiz
7%
Add more layers
72%
Use contextual embeddings like BERT
13%
Tune dropout
8%
Increase token length
Top 6 Data Concepts
📚 Data Science Riddle

A data engineer complains that your model training job is failing in production due to schema mismatch. What's the root fix?
Anonymous Quiz
12%
Cast data types in code
16%
Skip invalid rows
21%
Retrain with old schema
52%
Use a schema registry
K-Means Clustering
Covariance vs. Correlation: Same Family, Different Story

People use them interchangeably, but they measure different things.

Covariance tells you the direction of the relationship (positive or negative), but its magnitude depends on the units of the variables.
Correlation goes further: it is covariance divided by both standard deviations, so it also tells you the strength of the linear relationship, normalized between -1 and 1.

So while covariance can be 2345.67, correlation says 0.92: clear, interpretable, scale-free.
Covariance shows movement, correlation shows consistency.
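
A minimal sketch of the difference, using only the standard library. The height and weight values are made up for illustration, and the helper names `covariance` and `correlation` are just for this example. Changing the unit of one variable rescales the covariance but should leave the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # variance is covariance with itself
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical data: height in metres, weight in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 72.0, 80.0, 90.0]

# The same heights expressed in centimetres.
height_cm = [h * 100 for h in height_m]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print(f"covariance  (m):  {cov_m:.3f}")
print(f"covariance  (cm): {cov_cm:.3f}")   # 100x larger, same relationship
print(f"correlation (m):  {corr_m:.4f}")
print(f"correlation (cm): {corr_cm:.4f}")  # unchanged by the unit switch
```

The covariance changes by a factor of 100 just because the unit changed, which is why its raw magnitude is hard to interpret on its own.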
📚 Data Science Riddle

You're processing a dataset with frequent schema evolution. Which format handles it most gracefully?
Anonymous Quiz
10%
ORC
13%
Avro
57%
CSV
19%
Parquet
Eigenvalues & Eigenvectors — Why PCA Actually Works

You’ve heard of PCA. But what’s really happening underneath?

PCA finds the directions (vectors) where your data varies the most.

Those directions are the eigenvectors of the covariance matrix, and the eigenvalues tell you how much variance each one captures.

You’re basically rotating your data to find its “natural axes.”

PCA isn’t only compression: it’s discovering how your data wants to be seen.
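
The steps above can be sketched with NumPy. This is an illustration under assumptions, not a production PCA: the data is synthetic (a cloud stretched along one axis and rotated by 30 degrees), and the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic 2-D data: wide along one axis, narrow along the other,
# then rotated by 30 degrees so the "natural axes" are hidden.
n = 500
raw = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
X = raw @ rotation.T

# 1. Centre the data.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix (features are in columns, hence rowvar=False).
cov = np.cov(Xc, rowvar=False)

# 3. Eigen-decomposition. eigh is meant for symmetric matrices and
#    returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort so the direction with the most variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Rotate the data onto its principal axes.
scores = Xc @ eigenvectors

# Share of total variance captured by each axis.
explained = eigenvalues / eigenvalues.sum()
print("explained variance ratio:", np.round(explained, 3))
print("first principal direction:", np.round(eigenvectors[:, 0], 3))
```

After the rotation, the covariance matrix of `scores` is diagonal, with the eigenvalues on the diagonal: the new axes are uncorrelated, and each eigenvalue is the variance along its axis.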
📚 Data Science Riddle

Your Spark job fails due to executor memory pressure. Most effective optimization?
Anonymous Quiz
14%
Broadcast variables
29%
Larger cluster
41%
More shuffle partitions
16%
Persist fewer objects
BigDataAnalytics-Lecture.pdf
10.2 MB
Notes on HDFS, MapReduce, YARN, Hadoop vs. traditional systems, and much more, from Columbia University.
📚 Data Science Riddle

You fit a forecasting model and residuals show increasing variance. What is needed?
Anonymous Quiz
20%
Differencing
46%
Smoothing
27%
Decomposition
7%
Box-Cox
4 Pillars of Data Science
AI vs Machine Learning vs Deep Learning vs Generative AI
📚 Data Science Riddle

A numeric feature has many repeated exact values with occasional jumps. What type of variable is this?
Anonymous Quiz
28%
Discrete
22%
Ordinal
17%
Continuous
33%
Interval
Machine Learning Notes.pdf
226.8 KB
Stanford CS lecture notes covering supervised and unsupervised algorithms, neural networks, and SVMs, with math proofs and Python pseudocode.
Kafka 101
📚 Data Science Riddle

Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
6%
Loss curves
12%
Batch shapes
61%
Random seeds
22%
Metric choice
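
The poll does not mark a correct answer. Taking the most-voted option, random seeds, as the likely intended one, here is a small sketch of how an unseeded shuffle makes two runs diverge. The helper `split_rows` and the names in it are hypothetical, and seeds are not the only possible cause (library versions and hardware nondeterminism can matter too).

```python
import random

def split_rows(rows, seed=None, test_fraction=0.2):
    """Shuffle rows and split them; `seed` controls the shuffle order."""
    rng = random.Random(seed)  # private generator, global state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))

# Same seed: both team members get the identical split.
alice_train, alice_test = split_rows(rows, seed=42)
bob_train, bob_test = split_rows(rows, seed=42)
print("seeded runs match:", alice_test == bob_test)

# No seed: the generator is seeded from the OS, so the two splits
# will usually differ from run to run.
carol_train, carol_test = split_rows(rows)
dave_train, dave_test = split_rows(rows)
print("unseeded runs match:", carol_test == dave_test)
```

Anything downstream of the split (training data, metrics) inherits that difference, so fixing and sharing the seed is a cheap first check.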
The Simplest Machine Learning Cheatsheet
📚 Data Science Riddle

A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
56%
Add indexes
17%
Use aliases
16%
Add DISTINCT
11%
Increase RAM
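
The poll does not mark a correct answer. Assuming the most-voted option, adding an index, is the intended one, this sketch uses Python's built-in `sqlite3` module to show the query plan before and after. The `orders` table, its columns, and the index name are made up for this example, and other database engines report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

query = "SELECT amount FROM orders WHERE customer_id = ?"

def query_plan(sql, params):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the detail text

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(query, (42,))
print("before:", plan_before)

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

# With the index, the plan becomes a search on that index.
plan_after = query_plan(query, (42,))
print("after: ", plan_after)

matches = conn.execute(query, (42,)).fetchall()
print("rows returned:", len(matches))
```

The query and its result are identical in both cases; only the access path changes, which is what makes an index a targeted fix for a scan-bound query.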
Everything You Need to Know About Databricks