Conceptual Modeling For ETL Processes.pdf
460.5 KB
Discusses modeling ETL workflows for data warehousing, including data sources and transformations, from Drexel University.
📚 Data Science Riddle
During EDA (Exploratory Data Analysis), what's the main reason we use box plots?
Anonymous Quiz
To visualize distributions (22%)
To detect outliers (64%)
To see correlations (10%)
To test normality (4%)
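The outlier rule behind most box plots is Tukey's fences: points more than 1.5 × IQR below the first quartile or above the third quartile are drawn individually beyond the whiskers. A minimal sketch of that rule using only the standard library; the helper name `iqr_outliers` and the sample values are hypothetical, and individual plotting libraries may compute quartiles slightly differently:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 mirrors the usual box-plot whisker convention.
    """
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one obvious outlier.
data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 95]
print(iqr_outliers(data))  # [95]
```

A box plot shows the same information visually, which is why it is the quick outlier check during EDA rather than a full distribution view.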
Hey everyone 👋
Some time ago, I asked whether I should start a Data Science educational series, and since 96% of you said yes, I began creating it.
But many of you also asked for real, hands-on experience with projects, not just lessons. So I decided to shift gears. It’s now becoming a full practical coding course! 💻
My goal is to help you build skills that get you job-ready, not just teach theory. It’s taking a bit longer, but I promise it’ll be worth it.
Thank you all for your support and patience ❤️
I’ll let you know as soon as we’re ready to start!
📚 Data Science Riddle
Your batch ETL job runs slower each week despite no code change. What's your first suspect?
Anonymous Quiz
Code inefficiency (11%)
Schema mismatch (20%)
Data volume growth (64%)
Resource throttling (5%)
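One way to check the "data volume growth" suspect before touching the code is to compare runtime with throughput: if runtime climbs while rows processed per second stays roughly flat, the job is simply handling more data. The sketch below assumes you log rows read and wall-clock seconds per run; `RunLog`, the 15% tolerance, and the sample history are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunLog:
    week: int
    rows: int       # rows read by the job in this run (hypothetical log field)
    seconds: float  # wall-clock runtime of the run


def throughput_trend(runs):
    """Rows per second for each run, ordered by week."""
    return [r.rows / r.seconds for r in sorted(runs, key=lambda r: r.week)]


def looks_like_volume_growth(runs, tolerance=0.15):
    """Heuristic: runtime grew, but throughput stayed within `tolerance`
    (relative) of the first run, so the job is just processing more data."""
    ordered = sorted(runs, key=lambda r: r.week)
    rates = throughput_trend(ordered)
    runtime_grew = ordered[-1].seconds > ordered[0].seconds
    baseline = rates[0]
    stable_rate = all(abs(rate - baseline) / baseline <= tolerance for rate in rates)
    return runtime_grew and stable_rate


# Hypothetical history: rows and runtime grow together week over week.
history = [
    RunLog(week=1, rows=1_000_000, seconds=100.0),
    RunLog(week=2, rows=1_200_000, seconds=121.0),
    RunLog(week=3, rows=1_450_000, seconds=147.0),
]
print(looks_like_volume_growth(history))  # True
```

If throughput itself drops, the slowdown points elsewhere (throttling, skew, a degraded source), and the heuristic returns False.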
🚨 When & How Jupyter Notebooks Fail (And What To Use Instead)
Hey Data Folks! 👩‍💻👨‍💻
Let’s talk about Jupyter Notebooks — powerful for exploration, but risky in production. Here’s why:
❌ Problems with Notebooks:
1. Out-of-order execution → hidden bugs.
2. Code changes after execution → inconsistent results.
3. Data leakage → sensitive info in outputs.
4. Security risks → tokens/keys exposed.
5. Hard to apply engineering practices → no modular code, testing, CI/CD.
6. Collaboration pain → merge conflicts, JSON issues.
7. Reproducibility issues → missing dependencies, versions.
✅ When They’re Useful:
- Quick data exploration & prototyping.
- Knowledge sharing (clean, runnable from top to bottom).
- Teaching / hands-on tutorials (with solution notebooks).
🔧 What to Use Instead:
- For production code → .py files + IDEs.
- For workflows → template repos & reproducible setups.
- For deployment → MLOps tools, pipelines, automation.
💡 Key Takeaways:
- Use notebooks for exploration & teaching.
- Use structured code + pipelines for production & deployment.
- Always document dependencies, keep notebooks clean, never commit secrets!
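Two of the problems above, out-of-order execution and sensitive data left in outputs, can be checked mechanically because an `.ipynb` file is plain JSON: the nbformat v4 layout stores a `cells` list whose code cells carry `execution_count` and `outputs`. A minimal sketch assuming that layout; the helper names and the in-memory notebook are hypothetical, and in a real project you would `json.load()` the file and run checks like these in a pre-commit hook or CI.

```python
import json


def code_cells(nb):
    """All code cells in a notebook dict (nbformat v4 layout assumed)."""
    return [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]


def ran_top_to_bottom(nb):
    """True if code cells were executed exactly once, in order 1, 2, 3, ...
    i.e. the saved notebook looks like a clean "Restart & Run All"."""
    counts = [c.get("execution_count") for c in code_cells(nb)]
    return counts == list(range(1, len(counts) + 1))


def strip_outputs(nb):
    """Return a copy with outputs and execution counts cleared, so data
    samples or printed secrets are not committed to version control."""
    clean = json.loads(json.dumps(nb))  # deep copy of the JSON structure
    for cell in code_cells(clean):
        cell["outputs"] = []
        cell["execution_count"] = None
    return clean


# Hypothetical notebook whose cells were run out of order.
nb = {
    "cells": [
        {"cell_type": "markdown", "source": "# EDA"},
        {"cell_type": "code", "execution_count": 3, "source": "x = 1", "outputs": []},
        {"cell_type": "code", "execution_count": 1, "source": "print(x)",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}]},
    ]
}
print(ran_top_to_bottom(nb))                     # False
print(strip_outputs(nb)["cells"][2]["outputs"])  # []
```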
List of AI Project Ideas 👨🏻‍💻
Beginner Projects
🔹 Sentiment Analyzer
🔹 Image Classifier
🔹 Spam Detection System
🔹 Face Detection
🔹 Chatbot (Rule-based)
🔹 Movie Recommendation System
🔹 Handwritten Digit Recognition
🔹 Speech-to-Text Converter
🔹 AI-Powered Calculator
🔹 AI Hangman Game
Intermediate Projects
🔸 AI Virtual Assistant
🔸 Fake News Detector
🔸 Music Genre Classification
🔸 AI Resume Screener
🔸 Style Transfer App
🔸 Real-Time Object Detection
🔸 Chatbot with Memory
🔸 Autocorrect Tool
🔸 Face Recognition Attendance System
🔸 AI Sudoku Solver
Advanced Projects
🔺 AI Stock Predictor
🔺 AI Writer (GPT-based)
🔺 AI-powered Resume Builder
🔺 Deepfake Generator
🔺 AI Lawyer Assistant
🔺 AI-Powered Medical Diagnosis
🔺 AI-based Game Bot
🔺 Custom Voice Cloning
🔺 Multi-modal AI App
🔺 AI Research Paper Summarizer
📚 Data Science Riddle
You discover your regression model performs poorly on recent data. The relationships between variables have shifted. What's this called?
Anonymous Quiz
Model Overfitting (39%)
Concept Drift (39%)
Sampling Error (10%)
Data Leakage (13%)
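Concept drift means the relationship between inputs and target, P(y | x), has changed, not just the inputs themselves. A simple diagnostic, sketched here for a single numeric feature with hypothetical data, is to refit the model on an old window and a recent window and compare the fitted slopes; the 25% threshold is an arbitrary assumption, and in practice you would also track prediction error on fresh labeled data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def slope_shifted(old_slope, new_slope, threshold=0.25):
    """Flag drift when the slope moved by more than `threshold` (relative)."""
    return abs(new_slope - old_slope) / abs(old_slope) > threshold


# Hypothetical windows: same x values, but y now responds much less to x.
xs = [1, 2, 3, 4, 5, 6]
old_ys = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]  # roughly y = 2x
new_ys = [1.0, 1.6, 2.4, 3.0, 3.5, 4.1]   # roughly y = 0.6x + 0.4

old_slope, new_slope = ols_slope(xs, old_ys), ols_slope(xs, new_ys)
print(round(old_slope, 2), round(new_slope, 2), slope_shifted(old_slope, new_slope))
```

Because the input range is identical in both windows, the change can only come from the x-to-y relationship, which is what separates concept drift from a plain shift in the input data.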
Regularization: The Art of Keeping Models Humble
Overfitting is the “ego problem” of models. They memorize training data and forget how to generalize.
Regularization is how we humble them.
➡️ L1 (Lasso): Penalizes the absolute size of weights and can push some of them exactly to zero → performs feature selection.
➡️ L2 (Ridge): Penalizes squared weights, shrinking all of them toward zero without eliminating any → smoother, more stable fits.
➡️ Dropout: Randomly drops neurons during training → prevents co-adaptation between neurons.
It’s not about punishment; it’s about discipline.
Regularization teaches models to focus on patterns, not exceptions.
💭 Remember: The best models don’t just fit data. They respect uncertainty.
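The difference between L1 and L2 is easiest to see under a simplifying assumption: an orthonormal design (XᵀX = I). Then, for the objective ½‖y − Xβ‖² + λ‖β‖₁, the lasso soft-thresholds each ordinary least-squares weight, while for ½‖y − Xβ‖² + (λ/2)‖β‖² ridge divides every weight by (1 + λ). The dropout helper uses the common "inverted dropout" formulation, which rescales surviving activations during training. All weights and inputs below are hypothetical, and real libraries solve the general (non-orthonormal) case numerically.

```python
import random


def lasso_orthonormal(ols_weights, lam):
    """L1 under an orthonormal design: soft-threshold each OLS weight.
    Weights with |w| <= lam become exactly 0 (feature selection)."""
    return [
        (abs(w) - lam) * (1 if w > 0 else -1) if abs(w) > lam else 0.0
        for w in ols_weights
    ]


def ridge_orthonormal(ols_weights, lam):
    """L2 under an orthonormal design: shrink every weight by 1 / (1 + lam).
    No weight becomes exactly 0."""
    return [w / (1.0 + lam) for w in ols_weights]


def inverted_dropout(activations, p, rng, training=True):
    """Zero each activation with probability p during training and scale
    survivors by 1 / (1 - p) so the expected activation is unchanged.
    At inference time the layer passes inputs through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


ols = [3.0, -0.4, 0.2, -2.5]  # hypothetical unregularized weights
print(lasso_orthonormal(ols, lam=0.5))  # [2.5, 0.0, 0.0, -2.0]
print(ridge_orthonormal(ols, lam=0.5))
print(inverted_dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0)))
```

The small weights (-0.4 and 0.2) vanish under L1 but only shrink under L2, which is exactly the "feature selection vs. smoothing" contrast in the post.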
Explaining LLMs By BigData Specialist.pdf
4.3 MB
This is our latest post from our Instagram page, saved as a PDF.
If you want a very comprehensive breakdown of what LLMs are and how they actually work, you might want to check it out.
Here's our Instagram post: Explaining LLMs
📚 Data Science Riddle
Why might your SQL join explode the number of rows unexpectedly?
Anonymous Quiz
Index missing (20%)
Wrong join key (40%)
Duplicate keys (33%)
Slow query optimizer (8%)
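Duplicate keys make a join multiply rows: if a key appears m times on the left and n times on the right, the join returns m × n rows for that key. A minimal sketch with Python's built-in `sqlite3` module; the `orders` and `addresses` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE addresses (customer_id INTEGER, city TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
    -- customer 10 has two address rows, so the join key is not unique here
    INSERT INTO addresses VALUES (10, 'Berlin'), (10, 'Paris'), (20, 'Rome');
""")


def count(sql):
    """Run a COUNT(*) query and return the single integer result."""
    return cur.execute(sql).fetchone()[0]


orders = count("SELECT COUNT(*) FROM orders")
joined = count("""
    SELECT COUNT(*) FROM orders o
    JOIN addresses a ON o.customer_id = a.customer_id
""")
# Keys that appear more than once on the right-hand side of the join.
dupes = cur.execute("""
    SELECT customer_id, COUNT(*) FROM addresses
    GROUP BY customer_id HAVING COUNT(*) > 1
""").fetchall()
print(orders, joined, dupes)  # 3 5 [(10, 2)]
```

Customer 10 contributes 2 orders × 2 addresses = 4 rows and customer 20 contributes 1, so three orders become five joined rows. Running the `GROUP BY ... HAVING COUNT(*) > 1` check on join keys before joining is a cheap way to catch this.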
Database Querying Using SQL.pdf
136.4 KB
Notes on SQL for data management and analysis, including queries and integration with R, from the University of South Carolina.
📚 Data Science Riddle
A business team wants interpretable insights, not just predictions. What's the best model to start with?
Anonymous Quiz
Random Forest (32%)
Logistic Regression (36%)
XGBoost (12%)
Deep Neural Net (19%)
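Logistic regression is easy to explain because each coefficient has a direct reading: a one-unit increase in a feature multiplies the odds of the positive class by exp(coefficient), holding the other features fixed. The sketch below checks that identity numerically for a hypothetical fitted churn model; the intercept, coefficient, and "support tickets" feature are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model: churn probability vs. number of support tickets.
intercept, coef_tickets = -2.0, 0.7


def churn_probability(tickets):
    return sigmoid(intercept + coef_tickets * tickets)


def odds(p):
    return p / (1.0 - p)


# One extra ticket multiplies the odds of churn by exp(coef), whatever the start.
ratio_from_model = odds(churn_probability(3)) / odds(churn_probability(2))
print(round(ratio_from_model, 4), round(math.exp(coef_tickets), 4))
```

That one-line interpretation ("each extra ticket roughly doubles the odds of churn") is what a business team can act on, and it is much harder to extract from a random forest, XGBoost, or a neural net.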
Forwarded from Cool GitHub repositories
lerobot
This is an end-to-end library for robot learning. It handles the entire pipeline from loading and processing robotics datasets to training policies and deploying them in simulation or on real hardware.
Creator: huggingface
Stars ⭐️: 19,000
Forked by: 3,000
GitHub Repo:
https://github.com/huggingface/lerobot
#robotics #AI
➖➖➖➖➖➖➖➖➖➖➖➖➖➖
Join @github_repositories_bds for more cool repositories. This channel belongs to @bigdataspecialist group