ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
3.99K photos
226 videos
23 files
4.29K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

📝 Summary:
Skywork-R1V4 is a 30B multimodal agentic model that unifies image manipulation and deep search with interleaved reasoning. It achieves state-of-the-art performance in perception and multimodal search using only supervised fine-tuning, demonstrating advanced agentic intelligence without reinforcement learning.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02395
• PDF: https://arxiv.org/pdf/2512.02395
• Project Page: https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html
• Github: https://github.com/SkyworkAI/Skywork-R1V/tree/main/r1v4

🔹 Models citing this paper:
https://huggingface.co/Skywork/R1V4

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #MultimodalAI #AgenticAI #DeepLearning #ComputerVision
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

📝 Summary:
CUDA-L2 uses LLMs and reinforcement learning to optimize half-precision general matrix multiply (HGEMM) CUDA kernels. It significantly outperforms strong baselines such as cuBLAS and torch.matmul, achieving up to a 28.7% speedup in server mode. This demonstrates that AI can improve even highly optimized kernels.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02551
• PDF: https://arxiv.org/pdf/2512.02551
• Github: https://github.com/deepreinforce-ai/CUDA-L2

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#CUDA #ReinforcementLearning #LLM #MatrixMultiplication #AI
Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models

📝 Summary:
Masked diffusion language models (MDLMs) show a locality bias and poor context comprehension because appended mask tokens act as distractors. The paper introduces a mask-agnostic loss function that improves MDLM robustness by mitigating the masks' distracting effect.

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21338
• PDF: https://arxiv.org/pdf/2511.21338

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LanguageModels #DiffusionModels #NLP #ContextComprehension #AIResearch
🚀 Master Data Science & Programming!

Unlock your potential with this curated list of Telegram channels. Whether you need books, datasets, interview prep, or project ideas, we have the perfect resource for you. Join the community today!


🔰 Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.
https://news.1rj.ru/str/CodeProgrammer

🔖 Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.
https://news.1rj.ru/str/DataScienceM

🧠 Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, Data Structures, Algorithms, and DSA – perfect for learning, coding, and mastering key programming skills.
https://news.1rj.ru/str/DataScience4

🎯 PyData Careers | Quiz
Python Data Science jobs, interview tips, and career insights for aspiring professionals.
https://news.1rj.ru/str/DataScienceQ

💾 Kaggle Data Hub
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.
https://news.1rj.ru/str/datasets1

🧑‍🎓 Udemy Coupons | Courses
The first Telegram channel offering free Udemy coupons.
https://news.1rj.ru/str/DataScienceC

😀 ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
https://news.1rj.ru/str/DataScienceT

💬 Data Science Chat
An active community group for discussing data challenges and networking with peers.
https://news.1rj.ru/str/DataScience9

🐍 Python Arab| بايثون عربي
The largest Arabic-speaking group for Python developers to share knowledge and help.
https://news.1rj.ru/str/PythonArab

🖊 Data Science Jupyter Notebooks
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
https://news.1rj.ru/str/DataScienceN

📺 Free Online Courses | Videos
Free online courses covering data science, machine learning, analytics, programming, and essential skills for learners.
https://news.1rj.ru/str/DataScienceV

📈 Data Analytics
Dive into the world of Data Analytics – uncover insights, explore trends, and master data-driven decision making.
https://news.1rj.ru/str/DataAnalyticsX

🎧 Learn Python Hub
Master Python with step-by-step courses – from basics to advanced projects and practical applications.
https://news.1rj.ru/str/Python53

⭐️ Research Papers
Professional Academic Writing & Simulation Services
https://news.1rj.ru/str/DataScienceY

━━━━━━━━━━━━━━━━━━
Admin: @HusseinSheikho
Mistral released Ministral 3, a new line of reasoning and instruct models!

Ministral 3 is available in 3B, 8B, and 14B sizes, with vision support and top performance in its class.

The 14B model can be run locally on a machine with 24 GB RAM.
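As a rough sanity check on the 24 GB claim (an illustrative back-of-envelope estimate, not a figure from the Unsloth docs), weight-only memory for an N-billion-parameter model at a given quantization width is:

```python
def weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory footprint in GB (ignores KV cache and activations)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 14B model in a 4-bit GGUF quant needs about 7 GB for weights alone,
# leaving headroom for context on a 24 GB machine.
print(weight_gb(14, 4))  # 7.0
```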

Guide + notebook: https://docs.unsloth.ai/new/ministral-3

GGUF builds: https://huggingface.co/collections/unsloth/ministral-3

👉 https://news.1rj.ru/str/DataScienceT
Qwen3-VL Technical Report

📝 Summary:
Qwen3-VL is a highly capable vision-language model, achieving superior performance across multimodal benchmarks. It supports 256K interleaved contexts and offers strong text understanding, robust long-context comprehension, and advanced multimodal reasoning through key architectural upgrades.

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21631
• PDF: https://arxiv.org/pdf/2511.21631

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionLanguageModel #MultimodalAI #AI #DeepLearning #LLM
ViDiC: Video Difference Captioning

📝 Summary:
The ViDiC task and ViDiC-1K dataset evaluate MLLMs' ability to describe differences between video pairs, overcoming static image captioning limits. It assesses motion and event evolution, finding significant performance gaps in current models for comparative video understanding.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03405
• PDF: https://arxiv.org/pdf/2512.03405
• Project Page: https://vidic-1k.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoCaptioning #MLLM #VideoUnderstanding #ComputerVision #AIResearch
PretrainZero: Reinforcement Active Pretraining

📝 Summary:
PretrainZero is a reinforcement active learning framework that pretrains large models on unlabeled general corpora using RL. It significantly improves general reasoning abilities and benchmark performance, breaking the verification data-wall for artificial general intelligence.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03442
• PDF: https://arxiv.org/pdf/2512.03442

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ReinforcementLearning #ActiveLearning #AGI #Pretraining #MachineLearning
OneThinker: All-in-one Reasoning Model for Image and Video

📝 Summary:
OneThinker is an all-in-one model unifying image and video understanding across diverse tasks such as QA, captioning, and tracking. It employs a new training corpus and an RL method for balanced optimization, achieving strong performance and knowledge transfer across 31 benchmarks.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03043
• PDF: https://arxiv.org/pdf/2512.03043
• Github: https://github.com/tulerfeng/OneThinker

🔹 Models citing this paper:
https://huggingface.co/OneThink/OneThinker-8B
https://huggingface.co/OneThink/OneThinker-SFT-Qwen3-8B

Datasets citing this paper:
https://huggingface.co/datasets/OneThink/OneThinker-train-data
https://huggingface.co/datasets/OneThink/OneThinker-eval

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #ComputerVision #MultimodalAI #DeepLearning #VideoUnderstanding
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

📝 Summary:
PRIS adaptively revises prompts during text-to-visual generation inference to enhance user intent alignment. It reviews visual failures and redesigns prompts using fine-grained feedback, proving that jointly scaling prompts and visuals improves accuracy and quality.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03534
• PDF: https://arxiv.org/pdf/2512.03534
• Project Page: https://subin-kim-cv.github.io/PRIS

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#PromptEngineering #TextToImage #GenerativeAI #DeepLearning #AIResearch
Thinking with Programming Vision: Towards a Unified View for Thinking with Images

📝 Summary:
CodeVision enhances MLLMs' robustness and tool-based reasoning by generating code for image operations. It overcomes brittleness and improves performance through supervised fine-tuning and reinforcement learning, enabling flexible tool composition and error recovery.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03746
• PDF: https://arxiv.org/pdf/2512.03746

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#CodeVision #MLLM #ComputerVision #AIResearch #DeepLearning
RELIC: Interactive Video World Model with Long-Horizon Memory

📝 Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04040
• PDF: https://arxiv.org/pdf/2512.04040
• Project Page: https://relic-worldmodel.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL), a two-phase RL framework that enables vision-language models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

📝 Summary:
This paper proposes stable rank, an intrinsic quality signal from LLM representations, to improve alignment without external supervision. Stable rank measures effective dimensionality and is used as a reward in SR-GRPO, boosting LLM performance on reasoning tasks.
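Stable rank has a standard closed form, the squared Frobenius norm over the squared spectral norm. A minimal sketch of computing it for a matrix of hidden representations (generic NumPy, not the paper's code):

```python
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """||A||_F^2 / ||A||_2^2: effective dimensionality, between 1 and rank(A)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float((s ** 2).sum() / (s[0] ** 2))

# An identity matrix spreads energy across all directions equally...
print(stable_rank(np.eye(4)))                          # 4.0
# ...while a rank-1 matrix collapses to a single direction.
print(stable_rank(np.outer(np.ones(4), np.ones(4))))   # 1.0
```

Because it is computed from the model's own representations, a quantity like this can serve as a reward signal without any external judge or preference data.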

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02807
• PDF: https://arxiv.org/pdf/2512.02807

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#StableRank #LLMAlignment #LargeLanguageModels #AIResearch #DeepLearning
CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

📝 Summary:
CookAnything is a diffusion framework generating coherent, multi-step recipe image sequences from instructions. It uses step-wise regional control, flexible positional encoding, and cross-step consistency for consistent, high-quality visual synthesis.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03540
• PDF: https://arxiv.org/pdf/2512.03540
• Github: https://github.com/zhangdaxia22/CookAnything

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#CookAnything #ImageGeneration #DiffusionModels #AI #RecipeGeneration
AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs

📝 Summary:
AlignBench is a new benchmark for fine-grained image-text alignment, using detailed synthetic image-caption pairs. It reveals that CLIP-based models struggle with compositional reasoning and shows detector self-preference.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20515
• PDF: https://arxiv.org/pdf/2511.20515
• Project Page: https://dahlian00.github.io/AlignBench/

Datasets citing this paper:
https://huggingface.co/datasets/omron-sinicx/AlignBench

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ImageTextAlignment #MultimodalAI #ComputerVision #Benchmarking #CLIPModels
SkillFactory: Self-Distillation For Learning Cognitive Behaviors

📝 Summary:
SkillFactory fine-tunes models to learn cognitive skills using self-generated data before reinforcement learning. This self-distillation method enhances robustness and generalization post-RL, enabling models to effectively utilize acquired cognitive skills.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04072
• PDF: https://arxiv.org/pdf/2512.04072

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#SelfDistillation #ReinforcementLearning #CognitiveAI #MachineLearning #AIResearch
In-Context Representation Hijacking

📝 Summary:
Doublespeak is an in-context attack that hijacks LLM representations. It replaces harmful keywords with benign ones in examples, making LLMs interpret innocuous prompts as harmful, bypassing safety. This highlights a need for representation-level alignment.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03771
• PDF: https://arxiv.org/pdf/2512.03771

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AISafety #AIsecurity #InContextLearning #RepresentationLearning
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
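A generic sketch of how the two compression axes can be combined (an illustrative decomposition under our own assumptions, not UniQL's actual algorithm): keep a small low-rank factor in full precision and uniformly quantize the residual.

```python
import numpy as np

def lowrank_plus_quant(W: np.ndarray, rank: int, bits: int = 4):
    """Split W into a full-precision low-rank part and a quantized residual."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # truncated SVD, best rank-k fit
    R = W - L                                       # residual to quantize
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit symmetric
    scale = max(np.abs(R).max() / qmax, 1e-12)
    Q = np.clip(np.round(R / scale), -qmax - 1, qmax).astype(np.int8)
    return L, Q, scale

def reconstruct(L, Q, scale):
    return L + Q.astype(np.float64) * scale

# A weight matrix with genuine low-rank structure plus small noise
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64)) \
    + 0.01 * rng.standard_normal((64, 64))
L, Q, scale = lowrank_plus_quant(W, rank=8)
err = np.linalg.norm(W - reconstruct(L, Q, scale)) / np.linalg.norm(W)
print(err < 0.01)
```

The design intuition: the low-rank factor absorbs the structured, high-energy directions that uniform quantization handles worst, so the residual quantizes with much less error than quantizing W directly.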

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning