ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Scaling Zero-Shot Reference-to-Video Generation

📝 Summary:
Saber is a scalable zero-shot framework for reference-to-video generation that uses video-text pairs to learn identity-consistent representations and outperforms models trained with explicit reference...

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06905
• PDF: https://arxiv.org/pdf/2512.06905
• Project Page: https://franciszzj.github.io/Saber/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Rethinking Training Dynamics in Scale-wise Autoregressive Generation

📝 Summary:
Self-Autoregressive Refinement (SAR) improves the quality of autoregressive generative models by addressing exposure bias through Stagger-Scale Rollout and Contrastive Student-Forcing Loss, leading to...

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06421
• PDF: https://arxiv.org/pdf/2512.06421
• Project Page: https://gengzezhou.github.io/SAR/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Embodied Referring Expression Comprehension in Human-Robot Interaction

📝 Summary:
A large-scale dataset and multimodal model improve embodied interaction comprehension in robots by addressing perspective bias and enhancing multimodal signal integration.

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06558
• PDF: https://arxiv.org/pdf/2512.06558

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Unified Video Editing with Temporal Reasoner

📝 Summary:
VideoCoF, a Chain-of-Frames approach, improves video editing precision and instruction-to-region mapping by using reasoning tokens without requiring user-provided masks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07469
• PDF: https://arxiv.org/pdf/2512.07469
• Project Page: https://videocof.github.io/
• Github: https://github.com/knightyxp/VideoCoF

🔹 Models citing this paper:
https://huggingface.co/XiangpengYang/VideoCoF

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Relational Visual Similarity

📝 Summary:
Vision-Language models fine-tuned on anonymized image captions can capture relational similarity between images, a capability lacking in current visual similarity metrics.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07833
• PDF: https://arxiv.org/pdf/2512.07833
• Project Page: https://thaoshibe.github.io/relsim/
• Github: https://github.com/thaoshibe/relsim

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Group Representational Position Encoding

📝 Summary:
GRAPE is a unified positional encoding framework that combines multiplicative rotations and additive logit biases, extending existing methods like RoPE and ALiBi.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07805
• PDF: https://model-architectures.github.io/GRAPE/GRAPE.pdf
• Project Page: https://model-architectures.github.io/GRAPE/
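
To make the two ingredients named in the summary concrete, here is a minimal sketch of one attention logit that uses a RoPE-style multiplicative rotation together with an ALiBi-style additive bias. It is not the GRAPE formulation, which the summary does not spell out. The function names, the `base` and `slope` defaults, and the symmetric `abs` distance are assumptions chosen for illustration.

```python
import math

def rotate_pairs(vec, position, base=10000.0):
    """RoPE-style rotation: rotate each consecutive pair by position * theta."""
    dim = len(vec)
    assert dim % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)          # per-pair frequency (assumed schedule)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def attention_logit(q, k, q_pos, k_pos, slope=0.5):
    """Scaled dot product of rotated q and k, plus a linear distance penalty."""
    q_rot = rotate_pairs(q, q_pos)
    k_rot = rotate_pairs(k, k_pos)
    dot = sum(a * b for a, b in zip(q_rot, k_rot))
    bias = -slope * abs(q_pos - k_pos)      # ALiBi-style additive logit bias
    return dot / math.sqrt(len(q)) + bias
```

Both terms depend only on the offset between the two positions, so shifting the query and key by the same amount leaves the logit unchanged.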

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Voxify3D: Pixel Art Meets Volumetric Rendering

📝 Summary:
Voxify3D is a two-stage framework that combines 3D mesh optimization with 2D pixel art supervision to generate high-quality voxel art with semantic preservation, pixel-art aesthetics, and discrete col...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07834
• PDF: https://arxiv.org/pdf/2512.07834
• Project Page: https://yichuanh.github.io/Voxify-3D/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

📝 Summary:
A controlled experimental framework isolates and evaluates the contributions of pre-training, mid-training, and reinforcement learning in improving language model reasoning, demonstrating the necessit...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07783
• PDF: https://arxiv.org/pdf/2512.07783
• Github: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Vector Quantization using Gaussian Variational Autoencoder

📝 Summary:
Gaussian Quant (GQ) converts Gaussian VAE to VQ-VAE without training, outperforming previous VQ-VAEs and Gaussian VAE discretization methods across different architectures.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06609
• PDF: https://arxiv.org/pdf/2512.06609
• Github: https://github.com/Stability-AI/generative-models

🔹 Models citing this paper:
https://huggingface.co/xutongda/GQModel
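
For readers new to the term, the sketch below shows the generic vector-quantization lookup that VQ-VAEs rely on: a continuous latent vector is replaced by its nearest codebook entry. It does not show GQ's training-free conversion, which the summary does not describe. The function name and the toy codebook are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the nearest entry by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]

# Toy codebook with three 2-D codewords (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, codeword = quantize([0.9, 1.2], codebook)
```

The discrete `index` is what a downstream autoregressive model would consume, while `codeword` is what the decoder would see.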

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

📝 Summary:
VideoVLA uses a multi-modal Diffusion Transformer to predict actions and visual outcomes from language and image inputs, enabling strong generalization in robotic manipulation tasks.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06963
• PDF: https://arxiv.org/pdf/2512.06963
• Project Page: https://videovla-nips2025.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

📝 Summary:
NPR is a teacher-free framework enabling LLMs to perform genuine parallel reasoning. It uses self-distilled training and a new optimization algorithm. This achieves significant performance gains and speedups on reasoning benchmarks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07461
• PDF: https://arxiv.org/pdf/2512.07461
• Project Page: https://bigai-nlco.github.io/Native-Parallel-Reasoner

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Distribution Matching Variational AutoEncoder

📝 Summary:
DMVAE explicitly aligns VAE latent distributions with arbitrary reference distributions, generalizing beyond fixed priors. This improves modeling efficiency and image synthesis fidelity, with SSL-derived distributions showing excellent balance.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07778
• PDF: https://arxiv.org/pdf/2512.07778
• Github: https://github.com/sen-ye/dmvae

==================================

#VAE #DeepLearning #GenerativeAI #ImageSynthesis #ArtificialIntelligence
Multi-view Pyramid Transformer: Look Coarser to See Broader

📝 Summary:
MVP is a scalable multi-view transformer that reconstructs large 3D scenes from many images. It uses a dual hierarchy of local-to-global inter-view and fine-to-coarse intra-view processing. This achieves efficient, state-of-the-art 3D scene reconstruction quality.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07806
• PDF: https://arxiv.org/pdf/2512.07806
• Project Page: https://gynjn.github.io/MVP/
• Github: https://github.com/Gynjn/MVP

==================================

#3DReconstruction #ComputerVision #Transformers #DeepLearning #AI
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo

==================================

#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch
ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

📝 Summary:
ReCamDriving generates camera-controlled novel-trajectory videos using dense 3DGS renderings and a two-stage training approach, achieving state-of-the-art results in controllability and consistency.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03621
• PDF: https://arxiv.org/pdf/2512.03621
• Project Page: https://recamdriving.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning

📝 Summary:
VG-Refiner improves visual reasoning by addressing unreliable tool outputs. It uses a two-stage think-rethink mechanism and refinement reward to correct poor tool results. This significantly improves accuracy and correction ability in referring and grounding tasks.

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06373
• PDF: https://arxiv.org/pdf/2512.06373
• Github: https://github.com/VoyageWang/VG-Refiner

==================================

#VisualReasoning #ReinforcementLearning #ComputerVision #AIResearch #MachineLearning
Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning

📝 Summary:
DoGe is a framework that addresses data scarcity in vision-language models. It decouples context learning from problem solving, using a curriculum to improve reward signals and data diversity. This enhances generalization and performance.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06835
• PDF: https://arxiv.org/pdf/2512.06835

==================================

#VisionLanguage #DataScarcity #MachineLearning #AIResearch #DeepLearning
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

📝 Summary:
GLM-4.1V-Thinking is a vision-language model using a reasoning-centric training framework. It achieves state-of-the-art multimodal reasoning across various tasks like STEM and long document understanding. The model outperforms larger models and competes with closed-source systems like GPT-4o.

🔹 Publication Date: Published on Jul 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.01006
• Explainer: https://arxivexplained.com/papers/glm-41v-thinking-towards-versatile-multimodal-reasoning-with-scalable-reinforcement-learning
• PDF: https://arxiv.org/pdf/2507.01006
• Github: https://github.com/THUDM/GLM-4.1V-Thinking

🔹 Models citing this paper:
https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking
https://huggingface.co/zai-org/GLM-4.5V
https://huggingface.co/zai-org/GLM-4.6V-Flash

Spaces citing this paper:
https://huggingface.co/spaces/zai-org/GLM-4.1V-9B-Thinking-Demo
https://huggingface.co/spaces/zai-org/GLM-4.1V-9B-Thinking-API-Demo
https://huggingface.co/spaces/akhaliq/anycoder

==================================

#GLM41VThinking #MultimodalAI #VisionLanguageModels #ReinforcementLearning #AIResearch
Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

📝 Summary:
Reinforcement Learning enhances decoding-based regression by introducing sequence-level rewards. This overcomes token-level limitations, improving precision and generalization. It establishes a robust and accurate paradigm for numerical prediction.

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06533
• PDF: https://arxiv.org/pdf/2512.06533
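
A small sketch of why a sequence-level reward can matter when a number is decoded digit by digit. Two predictions can each miss exactly one token yet differ enormously in numerical error, and only a score computed on the decoded value sees that. The reward used here (negative absolute error) and both function names are assumptions for illustration, not the paper's definition.

```python
def token_mismatches(pred_tokens, target_tokens):
    """Token-level view: count positions where the predicted digit is wrong."""
    return sum(p != t for p, t in zip(pred_tokens, target_tokens))

def sequence_reward(pred_tokens, target_value):
    """Sequence-level view: decode the whole number, then score its error."""
    predicted_value = float("".join(pred_tokens))
    return -abs(predicted_value - target_value)

target = list("1234")
far_off = list("9234")    # wrong leading digit
near_miss = list("1235")  # wrong trailing digit

# Both predictions have one wrong token, but very different rewards.
mismatch_counts = (token_mismatches(far_off, target), token_mismatches(near_miss, target))
rewards = (sequence_reward(far_off, 1234.0), sequence_reward(near_miss, 1234.0))
```

Under these assumptions the token-level count cannot tell the two predictions apart, while the sequence-level reward ranks the near miss far above the prediction with the wrong leading digit.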

==================================

#ReinforcementLearning #MachineLearning #Regression #DataScience #AI
DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue

📝 Summary:
DZ-TDPO addresses state inertia in long-context dialogue using dynamic KL constraints and temporal attention bias. It achieves state-of-the-art win rates and robust zero-shot generalization, resolving user intent conflicts while preserving model capabilities.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03704
• PDF: https://arxiv.org/pdf/2512.03704
• Github: https://github.com/lyj20071013/DZ-TDPO

==================================

#DialogueSystems #NLP #MachineLearning #StateTracking #LongContext