ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Step-DeepResearch Technical Report

📝 Summary:
Step-DeepResearch is an end-to-end agent for deep research, built using a data synthesis strategy and progressive training. It achieves expert-level capabilities, outperforming existing models and rivaling SOTA closed-source models while remaining cost-efficient. It also introduces ADR-Bench for realistic Chines...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20491
• PDF: https://arxiv.org/pdf/2512.20491

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #MachineLearning #DeepResearch #AIagent #SOTA
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

📝 Summary:
This paper decomposes LLM policies into internal layer policies and internal modular policies, revealing distinct reasoning patterns across layers: early layers explore, while top layers refine. Motivated by this, Bottom-up Policy Optimization (BuPO) is proposed to optimize internal layer policies for superior...
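
One common way to read a next-token distribution out of an intermediate layer is the "logit lens": project that layer's hidden state through the model's final unembedding matrix and apply a softmax. The sketch below assumes this reading of "internal layer policy"; the summary does not say exactly how BuPO defines it. The random matrices are placeholders rather than a real model, and per-layer entropy is included as one simple, assumed way to compare how exploratory each layer's distribution is.

```python
import numpy as np

def layer_policies(hidden_states: np.ndarray, unembed: np.ndarray) -> np.ndarray:
    """Logit-lens-style per-layer policies (an assumption, not BuPO's definition).

    hidden_states: (num_layers, d_model) hidden state of the last position at each layer.
    unembed: (d_model, vocab) final unembedding matrix, assumed shared by all layers.
    Returns a (num_layers, vocab) array whose rows are softmax distributions.
    """
    logits = hidden_states @ unembed                       # (num_layers, vocab)
    logits = logits - logits.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)       # row-wise softmax

def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row; higher means a more spread-out policy."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
num_layers, d_model, vocab = 4, 16, 50
hidden = rng.normal(size=(num_layers, d_model))   # placeholder hidden states
W_U = rng.normal(size=(d_model, vocab))           # placeholder unembedding matrix
pi = layer_policies(hidden, W_U)
print(pi.shape)      # (4, 50)
print(entropy(pi))   # one entropy value per layer
```

Many implementations apply the model's final normalization layer before the unembedding projection; it is omitted here to keep the sketch minimal.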

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19673
• PDF: https://arxiv.org/pdf/2512.19673

#LLM #PolicyOptimization #DeepLearning #AIResearch #NLP
SAM Audio: Segment Anything in Audio

📝 Summary:
SAM Audio is a foundation model for general audio separation. It unifies text, visual, and temporal-span prompts, achieving state-of-the-art performance across diverse audio types. It also introduces a new real-world separation benchmark.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18099
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio

🔹 Models citing this paper:
https://huggingface.co/facebook/sam-audio-large
https://huggingface.co/facebook/sam-audio-small
https://huggingface.co/facebook/sam-audio-base

🔹 Spaces citing this paper:
https://huggingface.co/spaces/lpeterl/sam-audio-webui
https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
https://huggingface.co/spaces/chippie1/SAM-Audio-Demo

#AudioSeparation #FoundationModels #AI #DeepLearning #SAMAudio
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

📝 Summary:
QuantiPhy is a benchmark that quantitatively assesses state-of-the-art vision perception models' ability to reason about physical properties such as size, velocity, and acceleration from video observa...
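
To make the target quantities concrete: in the simplest case, velocity and acceleration follow from per-frame positions by finite differences. The sketch below is not part of QuantiPhy; it assumes 1-D motion with no perspective effects, a known frame rate, and a hypothetical `metres_per_pixel` calibration (e.g. derived from an object of known size).

```python
import numpy as np

def kinematics(positions_px, fps, metres_per_pixel):
    """Estimate velocity (m/s) and acceleration (m/s^2) from per-frame 1-D positions.

    positions_px: position of the object in pixels at each frame.
    fps: frames per second, so the time step is 1 / fps.
    metres_per_pixel: hypothetical calibration factor converting pixels to metres.
    """
    x = np.asarray(positions_px, dtype=float) * metres_per_pixel  # pixels -> metres
    dt = 1.0 / fps
    velocity = np.gradient(x, dt)              # central differences inside, one-sided at the ends
    acceleration = np.gradient(velocity, dt)   # differentiate the velocity estimate again
    return velocity, acceleration

fps = 30.0
t = np.arange(10) / fps
true_x_m = 0.5 * 2.0 * t**2          # synthetic motion: starts at rest, constant 2 m/s^2
positions_px = true_x_m / 0.01       # pretend 1 pixel = 1 cm
v, a = kinematics(positions_px, fps, metres_per_pixel=0.01)
print(a[2:-2])   # interior values are 2.0 up to floating-point rounding
```

The first and last two acceleration estimates are biased because `np.gradient` falls back to one-sided differences at the array ends, which is one reason frame-level physical estimates from video are error-prone.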

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19526
• PDF: https://arxiv.org/pdf/2512.19526

🔹 Datasets citing this paper:
https://huggingface.co/datasets/PaulineLi/QuantiPhy-validation

#AI #DataScience #MachineLearning #HuggingFace #Research
SpatialTree: How Spatial Abilities Branch Out in MLLMs

📝 Summary:
SpatialTree introduces a 4-level cognitive hierarchy and benchmark for evaluating MLLM spatial abilities. It reveals distinct skill dependencies and strong cross-level transfer from low- to high-level abilities. A novel auto-think strategy consistently enhances performance across all spatial levels.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20617
• PDF: https://arxiv.org/pdf/2512.20617
• Project Page: https://spatialtree.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research
SemanticGen: Video Generation in Semantic Space

📝 Summary:
SemanticGen addresses slow convergence and computational costs in video generation by using a two-stage diffusion model approach that first generates semantic features and then VAE latents, leading to...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20619
• PDF: https://arxiv.org/pdf/2512.20619
• Project Page: https://jianhongbai.github.io/SemanticGen/

#AI #DataScience #MachineLearning #HuggingFace #Research
Reinforcement Learning for Self-Improving Agent with Skill Library

📝 Summary:
A novel RL framework, SAGE, enhances LLM-based agents' self-improvement capabilities by systematically incorporating skills from a skill library, leading to better performance and efficiency in new en...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17102
• PDF: https://arxiv.org/pdf/2512.17102

#AI #DataScience #MachineLearning #HuggingFace #Research
Active Intelligence in Video Avatars via Closed-loop World Modeling

📝 Summary:
Video avatars currently lack agency for autonomous goal pursuit. ORCA introduces a framework for active intelligence, using a closed-loop Observe-Think-Act-Reflect cycle and a dual-system architecture for strategic reasoning and action. It enables robust, goal-directed task completion, transformi...
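
The value of a closed loop is that each decision is based on the observed outcome of the previous action, so a failed action is noticed and retried instead of silently assumed. The toy sketch below illustrates only that Observe-Think-Act-Reflect control flow; `ToyWorld`, `run_loop`, and the failing-step mechanism are hypothetical, and the summary's dual-system split (strategic reasoning vs. action) is collapsed into a single function.

```python
from dataclasses import dataclass, field

@dataclass
class ToyWorld:
    """Hypothetical environment: an avatar at an integer position.

    Each action is one step; at step indices listed in `blocked_steps`
    the action silently fails and the position does not change.
    """
    position: int = 0
    step: int = 0
    blocked_steps: set = field(default_factory=set)

    def apply(self, move: int) -> None:
        if self.step not in self.blocked_steps:
            self.position += move
        self.step += 1

def run_loop(world: ToyWorld, goal: int, max_steps: int = 20) -> list:
    """Run a closed Observe-Think-Act-Reflect loop until the goal is observed."""
    log = []
    for _ in range(max_steps):
        observed = world.position                   # Observe the current state
        if observed == goal:
            log.append(("done", observed))
            break
        move = 1 if goal > observed else -1         # Think: choose an action toward the goal
        expected = observed + move
        world.apply(move)                           # Act
        outcome = world.position                    # Reflect: compare outcome with expectation
        log.append(("ok" if outcome == expected else "retry", outcome))
    return log

world = ToyWorld(blocked_steps={1})
print(run_loop(world, goal=3))
```

An open-loop controller would issue three moves and stop one short of the goal after the blocked step; the closed loop re-observes, records a `"retry"`, and still reaches position 3.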

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20615
• PDF: https://arxiv.org/pdf/2512.20615
• Project Page: https://xuanhuahe.github.io/ORCA/

#AI #DataScience #MachineLearning #HuggingFace #Research
FaithLens: Detecting and Explaining Faithfulness Hallucination

📝 Summary:
FaithLens is a cost-efficient model for detecting and explaining faithfulness hallucinations in LLM outputs. It uses synthesized training data and rule-based reinforcement learning. FaithLens outperforms advanced models like GPT-4.1 on 12 tasks while providing high-quality explanations.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20182
• PDF: https://arxiv.org/pdf/2512.20182
• Github: https://github.com/S1s-Z/FaithLens

#AI #DataScience #MachineLearning #HuggingFace #Research
Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

📝 Summary:
A multi-perspective validation framework using LLMs for thematic analysis combines ensemble validation with Cohen's Kappa and cosine similarity to enhance reliability and extract consensus themes from...
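
The two reliability metrics named above have standard definitions: Cohen's kappa corrects raw agreement between two coders for agreement expected by chance, κ = (p_o − p_e) / (1 − p_e), and cosine similarity compares two embedding vectors. The sketch below shows only these textbook formulas; the theme labels are made-up toy values, the vectors stand in for sentence embeddings from some embedding model (an assumption), and how the paper combines or thresholds the two metrics is not stated in the summary.

```python
from collections import Counter
import math

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # p_o
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)  # p_e
    if expected == 1.0:
        # Both raters used one identical category for every item: kappa is 0/0,
        # treated here as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

# Toy theme codes from two hypothetical LLM coders over six excerpts.
coder_1 = ["cost", "cost", "trust", "trust", "access", "cost"]
coder_2 = ["cost", "trust", "trust", "trust", "access", "cost"]
print(cohens_kappa(coder_1, coder_2))  # 17/23, about 0.739
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))  # about 0.5
```

Kappa measures agreement on exact labels, while cosine similarity can credit two coders whose theme descriptions are worded differently but mean nearly the same thing, which is why the two are complementary.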

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20352
• PDF: https://arxiv.org/pdf/2512.20352
• Project Page: https://azalab-llm-tool.vercel.app/
• Github: https://github.com/NileshArnaiya/LLM-Thematic-Analysis-Tool

#AI #DataScience #MachineLearning #HuggingFace #Research
INTELLECT-3: Technical Report

📝 Summary:
INTELLECT-3, a large Mixture-of-Experts model trained with reinforcement learning, achieves top performance across various benchmarks and is supported by an open-source RL infrastructure framework. AI...
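
In a Mixture-of-Experts layer, a router scores each token against every expert, only the top-k experts are run, and their outputs are mixed with weights renormalized over the selected experts. The sketch shows this generic top-k gating; the summary does not describe INTELLECT-3's router, so the scalar input, the four toy "experts", the logits, and k = 2 are all placeholders.

```python
import math

def top_k_gate(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)            # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in top_k_gate(router_logits, k))

experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.5, 2.0, -1.0, 1.0]   # in a real model these come from a learned router per token
print(top_k_gate(logits, k=2))   # experts 1 and 3 are selected
print(moe_forward(3.0, experts, logits, k=2))
```

Because only k experts run per token, total parameter count can grow with the number of experts while per-token compute stays roughly proportional to k.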

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16144
• PDF: https://arxiv.org/pdf/2512.16144

#AI #DataScience #MachineLearning #HuggingFace #Research
MemEvolve: Meta-Evolution of Agent Memory Systems

📝 Summary:
MemEvolve, a meta-evolutionary framework, enhances self-evolving memory systems by jointly evolving agents' experiential knowledge and memory architecture, leading to improved performance and generali...

🔹 Publication Date: Published on Dec 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18746
• PDF: https://arxiv.org/pdf/2512.18746

#AI #DataScience #MachineLearning #HuggingFace #Research
LongVideoAgent: Multi-Agent Reasoning with Long Videos

📝 Summary:
A multi-agent framework with a master LLM, grounding agent, and vision agent enhances long-video QA by improving temporal grounding and extracting visual details. This RL-trained system outperforms non-agent baselines on new datasets.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20618
• PDF: https://arxiv.org/pdf/2512.20618
• Project Page: https://longvideoagent.github.io/

#MultiAgentSystems #LLM #VideoUnderstanding #ComputerVision #AI
Toxicity Ahead: Forecasting Conversational Derailment on GitHub

📝 Summary:
A novel LLM framework uses a two-step prompting pipeline to predict conversational derailment on GitHub. It generates Summaries of Conversation Dynamics to forecast toxicity, achieving high F1-scores and outperforming baselines for proactive moderation.
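
The summary reports results as F1-scores. For a binary "will this conversation derail?" forecast, F1 is the harmonic mean of precision and recall on the positive (derailing) class. The labels below are made-up toy values, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged derailments
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed derailments
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined); report F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1 = conversation later derailed into toxicity, 0 = it did not (toy labels).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75 (3 true positives, 1 false positive, 1 false negative)
```

For proactive moderation F1 is a natural choice because derailing threads are usually a minority, and plain accuracy would reward a model that never raises an alert.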

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15031
• PDF: https://arxiv.org/pdf/2512.15031

#LLM #ToxicityDetection #ContentModeration #GitHub #MachineLearning
Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

📝 Summary:
Simulstream is an open-source toolkit for evaluating and demonstrating streaming speech-to-text translation. It supports long-form audio, incremental decoding, and re-translation, and offers an interactive demo interface.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17648
• PDF: https://arxiv.org/pdf/2512.17648
• Project Page: https://pypi.org/project/simulstream/

#SpeechToText #MachineTranslation #NLP #OpenSource #StreamingAI
Scaling Laws for Code: Every Programming Language Matters

📝 Summary:
This paper explores scaling laws for multilingual code pre-training, finding that interpreted languages benefit more from scaling. It proposes an optimal token allocation strategy across programming languages based on their utility and synergy, which outperforms a uniform allocation.
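
Scaling-law studies typically fit a power law such as L(N) = a · N^(−b) to loss versus scale; a larger exponent b means loss falls faster as scale grows, which is one way to read "benefits more from scaling". A common quick fit is linear regression in log-log space. The pure power-law form (no irreducible-loss term) and the synthetic curves below are assumptions for illustration; the paper's actual functional form and fitted values are not given in the summary.

```python
import numpy as np

def fit_power_law(scale, loss):
    """Fit loss = a * scale**(-b) via least squares on log(loss) = log(a) - b * log(scale)."""
    slope, intercept = np.polyfit(np.log(scale), np.log(loss), deg=1)  # highest degree first
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free curves for two hypothetical languages (not data from the paper).
tokens = np.array([1e9, 2e9, 4e9, 8e9, 16e9])
loss_lang_a = 5.0 * tokens ** -0.08
loss_lang_b = 5.0 * tokens ** -0.05
a_a, b_a = fit_power_law(tokens, loss_lang_a)
a_b, b_b = fit_power_law(tokens, loss_lang_b)
print(round(b_a, 3), round(b_b, 3))  # 0.08 0.05
```

With real, noisy losses one would usually fit a form with an irreducible-loss offset, L(N) = E + a · N^(−b), using nonlinear least squares, since the log-log trick only linearizes the pure power law.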

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13472
• PDF: https://arxiv.org/pdf/2512.13472

#CodeAI #MachineLearning #ProgrammingLanguages #ScalingLaws #LLMs
FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks

📝 Summary:
FlipVQA-Miner automates high-quality QA and VQA extraction from textbooks. It combines layout-aware OCR with LLM-based semantic parsing. This provides accurate, real-world data for LLM training, avoiding synthetic samples and improving reasoning.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16216
• PDF: https://arxiv.org/pdf/2511.16216
• Github: https://github.com/OpenDCAI/DataFlow

#VQA #LLM #OCR #DataExtraction #AIResearch
🚀 Master Data Science & Programming!

Unlock your potential with this curated list of Telegram channels. Whether you need books, datasets, interview prep, or project ideas, we have the perfect resource for you. Join the community today!


🔰 Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.
https://news.1rj.ru/str/CodeProgrammer

🔖 Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.
https://news.1rj.ru/str/DataScienceM

🧠 Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, Data Structures, Algorithms, and DSA – perfect for learning, coding, and mastering key programming skills.
https://news.1rj.ru/str/DataScience4

🎯 PyData Careers | Quiz
Python Data Science jobs, interview tips, and career insights for aspiring professionals.
https://news.1rj.ru/str/DataScienceQ

💾 Kaggle Data Hub
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.
https://news.1rj.ru/str/datasets1

🧑‍🎓 Udemy Coupons | Courses
The first channel on Telegram that offers free Udemy coupons.
https://news.1rj.ru/str/DataScienceC

😀 ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
https://news.1rj.ru/str/DataScienceT

💬 Data Science Chat
An active community group for discussing data challenges and networking with peers.
https://news.1rj.ru/str/DataScience9

🐍 Python Arab | بايثون عربي
The largest Arabic-speaking group for Python developers to share knowledge and help.
https://news.1rj.ru/str/PythonArab

🖊 Data Science Jupyter Notebooks
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
https://news.1rj.ru/str/DataScienceN

📺 Free Online Courses | Videos
Free online courses covering data science, machine learning, analytics, programming, and essential skills for learners.
https://news.1rj.ru/str/DataScienceV

📈 Data Analytics
Dive into the world of Data Analytics – uncover insights, explore trends, and master data-driven decision making.
https://news.1rj.ru/str/DataAnalyticsX

🎧 Learn Python Hub
Master Python with step-by-step courses – from basics to advanced projects and practical applications.
https://news.1rj.ru/str/Python53

⭐️ Research Papers
Professional Academic Writing & Simulation Services
https://news.1rj.ru/str/DataScienceY

━━━━━━━━━━━━━━━━━━
Admin: @HusseinSheikho