#96. How do you know if your sample is representative of the population?
A: The best way is through proper sampling techniques. Random sampling is the gold standard, where every member of the population has an equal chance of being selected. You can also use stratified sampling, where you divide the population into subgroups (strata) and then take a random sample from each subgroup to ensure all groups are represented proportionally.
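As a rough illustration of the stratified approach described above, here is a minimal sketch using only Python's standard library. The `stratified_sample` helper, the `region` field, and the toy population are hypothetical and exist only for this example. It draws the same fraction from every stratum, so each group appears in the sample in roughly its population proportion.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction at random from every stratum.

    `records` is a list of dicts, `key` names the field that defines the
    strata, and `fraction` is the share of each stratum to keep.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for row in records:
        strata[row[key]].append(row)

    sample = []
    for group in strata.values():
        # Keep at least one row so a small stratum is never dropped entirely
        # (this slightly over-represents very small strata).
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 80 rows from "north", 20 from "south".
population = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]
sample = stratified_sample(population, key="region", fraction=0.1)

counts = defaultdict(int)
for row in sample:
    counts[row["region"]] += 1
print(dict(counts))
```

With a 10% fraction the sample keeps the 80/20 split of the population, which a small simple random sample would not guarantee.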
#97. What is your favorite data visualization and why?
A: "I find the box plot to be incredibly powerful and efficient. In a single, compact chart, it visualizes the distribution of data, showing the median, quartiles (25th and 75th percentiles), and potential outliers. It's excellent for comparing distributions across multiple categories and is much more informative than a simple bar chart of means."
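To make the box plot's ingredients concrete, the sketch below computes the values such a chart draws. It assumes the common 1.5 × IQR rule for flagging outliers and the "inclusive" quartile method from Python's `statistics` module; plotting libraries may use slightly different quartile conventions. The `box_plot_summary` helper and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot draws, using the 1.5 * IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme points that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical order values; 95 sits far from the rest.
orders = [12, 15, 14, 10, 18, 20, 16, 95, 13]
summary = box_plot_summary(orders)
print(summary)
```

A bar chart of the mean would report about 23.7 for these values, while the box plot summary shows that the typical order lies between 13 and 18 and that a single value is pulling the mean up.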
#98. What is survivorship bias?
A: Survivorship bias is a logical error where you concentrate on the people or things that "survived" some process and inadvertently overlook those that did not because of their lack of visibility. A classic example is analyzing the habits of successful startup founders without considering the thousands who failed, which can lead to flawed conclusions about what it takes to succeed.
#99. You are given two datasets. How would you figure out if they can be joined?
A: I would first inspect the columns in both datasets to look for a common key or field. This field should ideally be a unique identifier (like user_id or product_id). I would check that the data types of these key columns are the same. Then, I would check the overlap of values between the key columns to understand how many records would match in a join.
#100. Why do you want to be a data analyst?
A: "I am passionate about being a data analyst because I enjoy the process of transforming raw data into actionable insights that can drive real business decisions. I love the blend of technical skills like SQL and Python with the problem-solving and storytelling aspects of the role. I find it incredibly rewarding to uncover hidden patterns and help a company grow by making data-informed choices."
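Returning to question #99, the checks described there (a shared key, matching data types, uniqueness, and overlapping values) can be sketched in plain Python. The `join_key_report` helper and the two small tables are hypothetical; with pandas the same checks would typically be done on DataFrame columns instead.

```python
def join_key_report(left_rows, right_rows, key):
    """Summarise whether two lists of dicts can be joined on `key`."""
    left_keys = [row[key] for row in left_rows]
    right_keys = [row[key] for row in right_rows]
    left_set, right_set = set(left_keys), set(right_keys)
    return {
        # Mismatched types (for example "1" vs 1) silently prevent matches.
        "left_types": sorted({type(k).__name__ for k in left_keys}),
        "right_types": sorted({type(k).__name__ for k in right_keys}),
        # A key is unique on a side when no value repeats there.
        "left_unique": len(left_keys) == len(left_set),
        "right_unique": len(right_keys) == len(right_set),
        # Overlap of distinct key values between the two sides.
        "shared_keys": len(left_set & right_set),
        "left_only": len(left_set - right_set),
        "right_only": len(right_set - left_set),
    }


# Hypothetical tables sharing a user_id column.
users = [
    {"user_id": 1, "name": "Ana"},
    {"user_id": 2, "name": "Ben"},
    {"user_id": 3, "name": "Cy"},
]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 4, "total": 8},
]

report = join_key_report(users, orders, key="user_id")
print(report)
```

Here the key is unique in `users` but repeated in `orders`, which points to a one-to-many join, and only one distinct `user_id` appears on both sides, so an inner join would drop most users.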
━━━━━━━━━━━━━━━
By: @DataScienceT ✨
🔹 Title: Continuous Autoregressive Language Models
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27688
• PDF: https://arxiv.org/pdf/2510.27688
• Project Page: https://shaochenze.github.io/blog/2025/CALM/
• Github: https://shaochenze.github.io/blog/2025/CALM
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27492
• PDF: https://arxiv.org/pdf/2510.27492
• Project Page: https://thinkmorph.github.io/
• Github: https://github.com/ThinkMorph/ThinkMorph
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/ThinkMorph/Jigsaw_Assembly
• https://huggingface.co/datasets/ThinkMorph/Visual_Search
• https://huggingface.co/datasets/ThinkMorph/Chart_Refocus
• https://huggingface.co/datasets/ThinkMorph/Spatial_Navigation
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27684
• PDF: https://arxiv.org/pdf/2510.27684
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27623
• PDF: https://arxiv.org/pdf/2510.27623
• Project Page: https://zqs1943.github.io/BEAT/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27607
• PDF: https://arxiv.org/pdf/2510.27607
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27266
• PDF: https://arxiv.org/pdf/2510.27266
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: The Denario project: Deep knowledge AI agents for scientific discovery
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26887
• PDF: https://arxiv.org/pdf/2510.26887
• Github: https://github.com/AstroPilot-AI/Denario
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
🔹 Publication Date: Published on Oct 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25602
• PDF: https://arxiv.org/pdf/2510.25602
• Github: https://github.com/ChenMnZ/INT_vs_FP
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
🔹 Publication Date: Published on Oct 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24411
• PDF: https://arxiv.org/pdf/2510.24411
• Github: https://github.com/OS-Copilot/OS-Sentinel
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Defeating the Training-Inference Mismatch via FP16
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26788
• PDF: https://arxiv.org/pdf/2510.26788
• Github: https://github.com/sail-sg/Precision-RL
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Higher-order Linear Attention
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27258
• PDF: https://arxiv.org/pdf/2510.27258
• Project Page: https://yifanzhang-pro.github.io/HLA
• Github: https://github.com/yifanzhang-pro/HLA
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27606
• PDF: https://arxiv.org/pdf/2510.27606
• Github: https://github.com/InternLM/Spatial-SSRL
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
🔹 Publication Date: Published on Oct 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25889
• PDF: https://arxiv.org/pdf/2510.25889
• Project Page: https://rlinf.readthedocs.io/en/latest/rst_source/examples/pi0.html
• Github: https://github.com/RLinf/RLinf
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
🔹 Title: Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27224
• PDF: https://arxiv.org/pdf/2510.27224
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
Forwarded from Kaggle Data Hub
Unlock premium learning without spending a dime! ⭐️ @DataScienceC is the first Telegram channel dishing out free Udemy coupons daily—grab courses on data science, coding, AI, and beyond. Join the revolution and boost your skills for free today! 📕
What topic are you itching to learn next?😊
https://news.1rj.ru/str/DataScienceC 🌟
🔹 Title: Agent Lightning: Train ANY AI Agents with Reinforcement Learning
📝 Summary:
Agent Lightning is a flexible RL framework for training LLMs in any AI agent. It uniquely decouples agent execution from training, allowing seamless integration with diverse existing agents with minimal code changes. This enables robust training for complex interactions and shows stable performan...
🔹 Publication Date: Published on Aug 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03680
• PDF: https://arxiv.org/pdf/2508.03680
• Project Page: https://www.microsoft.com/en-us/research/project/agent-lightning/
• Github: https://github.com/microsoft/agent-lightning
==================================
🔹 Title: Kimi Linear: An Expressive, Efficient Attention Architecture
📝 Summary:
Kimi Linear is a new hybrid linear attention architecture that, for the first time, outperforms full attention across various contexts. It achieves superior performance and efficiency, reducing KV cache and increasing decoding throughput, making it a powerful drop-in replacement.
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26692
• PDF: https://arxiv.org/pdf/2510.26692
• Github: https://github.com/MoonshotAI/Kimi-Linear
🔹 Models citing this paper:
• https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
• https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base
• https://huggingface.co/aiqtech/Kimi-Linear-48B-A3B-Instruct
==================================
🔹 Title: Emu3.5: Native Multimodal Models are World Learners
📝 Summary:
Emu3.5 is a multimodal world model natively predicting vision and language states. Trained on vast video data, it uses Discrete Diffusion Adaptation for 20x faster image inference. It excels at multimodal generation, world modeling, and performs competitively.
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26583
• PDF: https://arxiv.org/pdf/2510.26583
• Project Page: https://emu.world/
• Github: https://github.com/baaivision/Emu3.5
🔹 Models citing this paper:
• https://huggingface.co/BAAI/Emu3.5
• https://huggingface.co/BAAI/Emu3.5-Image
• https://huggingface.co/BAAI/Emu3.5-VisionTokenizer
==================================
🔹 Title: olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
📝 Summary:
olmOCR is an open-source toolkit using a fine-tuned vision language model to convert diverse PDFs into clean, structured plain text. It preserves formatting like tables and equations, and is optimized for cost-effective large-scale batch processing, unlocking tokens for language model training.
🔹 Publication Date: Published on Feb 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/davanstrien/test-olmocr2
• https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
• https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297
==================================