ML Research Hub – Telegram
ML Research Hub
32.6K subscribers
3.93K photos
217 videos
23 files
4.22K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🤖🧠 Distil-Whisper: Faster, Smaller, and Smarter Speech Recognition by Hugging Face

🗓️ 08 Dec 2025
📚 AI News & Trends

The evolution of Automatic Speech Recognition (ASR) has reshaped how humans interact with technology. From dictation tools and live transcription to smart assistants and media captioning, ASR technology continues to bridge the gap between speech and digital communication. However, real-time, high-accuracy transcription has often come at the cost of heavy computational requirements, until now. Enter ...

#DistilWhisper #FasterSpeechRecognition #SmallerModels #HuggingFace #ASRTechnology #RealTimeTranscription
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

📝 Summary:
Colon-X introduces ColonR1, a novel reasoning-centric model for intelligent colonoscopy. It achieves 56.61% accuracy, outperforming traditional methods by 25.22% under data scarcity, by leveraging new comprehensive multimodal datasets.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03667
• PDF: https://arxiv.org/pdf/2512.03667
• Github: https://github.com/ai4colonoscopy/Colon-X

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems

📝 Summary:
DoVer is an intervention-driven debugging approach for LLM multi-agent systems. It validates failure hypotheses and measures progress via targeted interventions, improving reliability. DoVer converts 18-49% of failed tasks into successes, offering an outcome-oriented debugging method.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06749
• PDF: https://arxiv.org/pdf/2512.06749
• Project Page: https://aka.ms/DoVer

#LLM #MultiAgentSystems #Debugging #AI #Research
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

📝 Summary:
The paper proposes a method to enhance Rotary Position Embeddings by utilizing both the real and imaginary components of the complex-valued dot product, improving long-context modeling in Large Langua...
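
For readers unfamiliar with the "real and imaginary components" mentioned above, here is a minimal sketch. It assumes the usual complex-number view of RoPE, where each pair of features is treated as one complex number. The function names, vector size, and base frequency are illustrative assumptions, and the code does not reproduce the paper's method. It only shows that the standard RoPE score is the real part of a complex inner product, and that an imaginary part with the same relative-position property also exists.

```python
import numpy as np

def rope_complex(x, pos, base=10000.0):
    """View a real vector of even length d as d/2 complex numbers and
    rotate the j-th one by the angle pos * theta_j (standard RoPE)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(d // 2) * 2.0 / d)
    xc = x[0::2] + 1j * x[1::2]
    return xc * np.exp(1j * pos * theta)

def as_real(zc):
    """Interleave real and imaginary parts back into a real vector."""
    out = np.empty(2 * zc.shape[-1])
    out[0::2], out[1::2] = zc.real, zc.imag
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

def complex_score(m, n):
    """Complex inner product between a query at position m and a key at n."""
    z = np.sum(rope_complex(q, m) * np.conj(rope_complex(k, n)))
    return z.real, z.imag

re1, im1 = complex_score(5, 2)      # relative offset 3
re2, im2 = complex_score(105, 102)  # same offset, shifted positions

# The ordinary RoPE attention score (a real dot product of rotated vectors)
real_dot = as_real(rope_complex(q, 5)) @ as_real(rope_complex(k, 2))
print(re1, real_dot)   # these agree: standard RoPE keeps only the real part
print(im1, im2)        # the imaginary part also depends only on the offset
```

How the imaginary component is actually fed back into attention is specific to the paper and its repository linked below.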

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07525
• PDF: https://arxiv.org/pdf/2512.07525
• Github: https://github.com/OpenMOSS/rope_pp

#AI #DataScience #MachineLearning #HuggingFace #Research
LongCat-Image Technical Report

📝 Summary:
LongCat-Image is a bilingual open-source foundation model for image generation that addresses multilingual text rendering, photorealism, and deployment efficiency through rigorous data curation, compa...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07584
• PDF: https://arxiv.org/pdf/2512.07584
• Project Page: https://longcat.chat/
• Github: https://github.com/meituan-longcat/LongCat-Image

#AI #DataScience #MachineLearning #HuggingFace #Research
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

📝 Summary:
EgoEdit is a real-time, instruction-following egocentric video editor that addresses challenges in handling egomotion and hand-object interactions, outperforming existing methods on egocentric editing...

🔹 Publication Date: Published on Dec 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06065
• PDF: https://arxiv.org/pdf/2512.06065
• Project Page: https://snap-research.github.io/EgoEdit/
• Github: https://github.com/snap-research/EgoEdit

#AI #DataScience #MachineLearning #HuggingFace #Research
Scaling Zero-Shot Reference-to-Video Generation

📝 Summary:
Saber is a scalable zero-shot framework for reference-to-video generation that uses video-text pairs to learn identity-consistent representations and outperforms models trained with explicit reference...

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06905
• PDF: https://arxiv.org/pdf/2512.06905
• Project Page: https://franciszzj.github.io/Saber/

#AI #DataScience #MachineLearning #HuggingFace #Research
Rethinking Training Dynamics in Scale-wise Autoregressive Generation

📝 Summary:
Self-Autoregressive Refinement (SAR) improves the quality of autoregressive generative models by addressing exposure bias through Stagger-Scale Rollout and Contrastive Student-Forcing Loss, leading to...

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06421
• PDF: https://arxiv.org/pdf/2512.06421
• Project Page: https://gengzezhou.github.io/SAR/

#AI #DataScience #MachineLearning #HuggingFace #Research
Embodied Referring Expression Comprehension in Human-Robot Interaction

📝 Summary:
A large-scale dataset and multimodal model improve embodied interaction comprehension in robots by addressing perspective bias and enhancing multimodal signal integration.

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06558
• PDF: https://arxiv.org/pdf/2512.06558

#AI #DataScience #MachineLearning #HuggingFace #Research
Unified Video Editing with Temporal Reasoner

📝 Summary:
VideoCoF, a Chain-of-Frames approach, improves video editing precision and instruction-to-region mapping by using reasoning tokens without requiring user-provided masks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07469
• PDF: https://arxiv.org/pdf/2512.07469
• Project Page: https://videocof.github.io/
• Github: https://github.com/knightyxp/VideoCoF

🔹 Models citing this paper:
https://huggingface.co/XiangpengYang/VideoCoF

#AI #DataScience #MachineLearning #HuggingFace #Research
Relational Visual Similarity

📝 Summary:
Vision-Language models fine-tuned on anonymized image captions can capture relational similarity between images, a capability lacking in current visual similarity metrics.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07833
• PDF: https://arxiv.org/pdf/2512.07833
• Project Page: https://thaoshibe.github.io/relsim/
• Github: https://github.com/thaoshibe/relsim

#AI #DataScience #MachineLearning #HuggingFace #Research
Group Representational Position Encoding

📝 Summary:
GRAPE is a unified positional encoding framework that combines multiplicative rotations and additive logit biases, extending existing methods like RoPE and ALiBi.
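
To make the two ingredients named above concrete, the sketch below combines a RoPE-style multiplicative rotation with an ALiBi-style additive distance bias in one set of attention logits. This is an illustrative toy under stated assumptions (single head, no causal mask, a hypothetical `slope` value, hypothetical function names), not GRAPE's own formulation, which is defined in the paper.

```python
import numpy as np

def rotate(x, base=10000.0):
    """RoPE-style rotation of x with shape (T, d): feature pair j of the
    token at position t is rotated by the angle t * theta_j."""
    T, d = x.shape
    theta = base ** (-np.arange(d // 2) * 2.0 / d)
    ang = np.arange(T)[:, None] * theta[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def attention_logits(q, k, slope=0.5):
    """Logits = rotated dot product (multiplicative position signal)
    plus a linear distance penalty (additive position signal)."""
    T, d = q.shape
    rotary = rotate(q) @ rotate(k).T / np.sqrt(d)
    dist = np.arange(T)[:, None] - np.arange(T)[None, :]  # query index - key index
    bias = -slope * np.maximum(dist, 0)  # penalize keys further in the past
    return rotary + bias

rng = np.random.default_rng(0)
q, k = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))
no_bias = attention_logits(q, k, slope=0.0)   # rotation only
with_bias = attention_logits(q, k, slope=0.5)  # rotation plus distance bias
print((with_bias - no_bias)[3, 1])  # bias contributed for a key 2 steps back
```

Setting `slope=0.0` recovers the rotation-only logits, which is one way to see the two mechanisms as separable terms.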

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07805
• PDF: https://model-architectures.github.io/GRAPE/GRAPE.pdf
• Project Page: https://model-architectures.github.io/GRAPE/

#AI #DataScience #MachineLearning #HuggingFace #Research
Voxify3D: Pixel Art Meets Volumetric Rendering

📝 Summary:
Voxify3D is a two-stage framework that combines 3D mesh optimization with 2D pixel art supervision to generate high-quality voxel art with semantic preservation, pixel-art aesthetics, and discrete col...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07834
• PDF: https://arxiv.org/pdf/2512.07834
• Project Page: https://yichuanh.github.io/Voxify-3D/

#AI #DataScience #MachineLearning #HuggingFace #Research
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

📝 Summary:
A controlled experimental framework isolates and evaluates the contributions of pre-training, mid-training, and reinforcement learning in improving language model reasoning, demonstrating the necessit...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07783
• PDF: https://arxiv.org/pdf/2512.07783
• Github: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning

#AI #DataScience #MachineLearning #HuggingFace #Research
Vector Quantization using Gaussian Variational Autoencoder

📝 Summary:
Gaussian Quant (GQ) converts Gaussian VAE to VQ-VAE without training, outperforming previous VQ-VAEs and Gaussian VAE discretization methods across different architectures.
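
As background on what "vector quantization" of a continuous latent means, here is a generic nearest-neighbor codebook lookup. It is a sketch, not the GQ procedure: the randomly drawn Gaussian codebook, its size, and the `quantize` helper are assumptions made purely for illustration.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, d) to its nearest codebook entry (K, d).
    Returns the chosen indices (N,) and the quantized vectors (N, d)."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))  # assumed: 16 random codes of dim 4
z = rng.standard_normal((5, 4))          # stand-in for continuous VAE latents

idx, z_q = quantize(z, codebook)
print(idx)                                # discrete tokens, one per latent
print(np.linalg.norm(z - z_q, axis=1))    # quantization error per latent
```

The indices are the discrete representation; the gap between `z` and `z_q` is the quantization error that any such scheme has to keep small.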

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06609
• PDF: https://arxiv.org/pdf/2512.06609
• Github: https://github.com/Stability-AI/generative-models

🔹 Models citing this paper:
https://huggingface.co/xutongda/GQModel

#AI #DataScience #MachineLearning #HuggingFace #Research
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

📝 Summary:
VideoVLA uses a multi-modal Diffusion Transformer to predict actions and visual outcomes from language and image inputs, enabling strong generalization in robotic manipulation tasks.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06963
• PDF: https://arxiv.org/pdf/2512.06963
• Project Page: https://videovla-nips2025.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

📝 Summary:
NPR is a teacher-free framework enabling LLMs to perform genuine parallel reasoning. It uses self-distilled training and a new optimization algorithm. This achieves significant performance gains and speedups on reasoning benchmarks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07461
• PDF: https://arxiv.org/pdf/2512.07461
• Project Page: https://bigai-nlco.github.io/Native-Parallel-Reasoner

#AI #DataScience #MachineLearning #HuggingFace #Research
Distribution Matching Variational AutoEncoder

📝 Summary:
DMVAE explicitly aligns VAE latent distributions with arbitrary reference distributions, generalizing beyond fixed priors. This improves modeling efficiency and image synthesis fidelity, with SSL-derived distributions showing excellent balance.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07778
• PDF: https://arxiv.org/pdf/2512.07778
• Github: https://github.com/sen-ye/dmvae

#VAE #DeepLearning #GenerativeAI #ImageSynthesis #ArtificialIntelligence
Multi-view Pyramid Transformer: Look Coarser to See Broader

📝 Summary:
MVP is a scalable multi-view transformer that reconstructs large 3D scenes from many images. It uses a dual hierarchy of local-to-global inter-view and fine-to-coarse intra-view processing. This achieves efficient, state-of-the-art 3D scene reconstruction quality.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07806
• PDF: https://arxiv.org/pdf/2512.07806
• Project Page: https://gynjn.github.io/MVP/
• Github: https://github.com/Gynjn/MVP

#3DReconstruction #ComputerVision #Transformers #DeepLearning #AI
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo

#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch
ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

📝 Summary:
ReCamDriving generates camera-controlled novel-trajectory videos using dense 3DGS renderings and a two-stage training approach, achieving state-of-the-art results in controllability and consistency.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03621
• PDF: https://arxiv.org/pdf/2512.03621
• Project Page: https://recamdriving.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research