ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Qwen3-TTS Technical Report

📝 Summary:
The Qwen3-TTS series presents advanced multilingual text-to-speech models with voice cloning and controllable speech generation capabilities, utilizing dual-track LM architecture and specialized speec...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15621
• PDF: https://arxiv.org/pdf/2601.15621
• Github: https://github.com/QwenLM/Qwen3-TTS

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

📝 Summary:
An advanced vision encoder named OpenVision 3 learns a unified visual representation for both image understanding and generation by combining VAE-compressed image latents with ViT architecture and joi...

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15369
• PDF: https://arxiv.org/pdf/2601.15369
• Project Page: https://ucsc-vlaa.github.io/OpenVision3/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

📝 Summary:
Terminal-Bench 2.0 presents a challenging benchmark with 89 terminal-based tasks to evaluate AI agents' capabilities in real-world scenarios.

🔹 Publication Date: Published on Jan 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11868
• PDF: https://arxiv.org/pdf/2601.11868
• Github: https://github.com/laude-institute/terminal-bench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

📝 Summary:
A novel fine-grained composed image retrieval benchmark is introduced through image editing techniques, revealing significant capability gaps in existing multimodal models and exposing limitations of ...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16125
• PDF: https://arxiv.org/pdf/2601.16125
• Github: https://github.com/SighingSnow/edir

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

📝 Summary:
Vision-language-action (VLA) models generalize poorly because of Information Collapse, where the language input is ignored. BayesianVLA applies a Bayesian decomposition with latent action queries and optimizes conditional pointwise mutual information (PMI) to penalize vision shortcuts, significantly improving out-of-distribution generalization.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15197
• PDF: https://arxiv.org/pdf/2601.15197
• Github: https://github.com/ZGC-EmbodyAI/BayesianVLA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
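
A note on the "conditional PMI" mentioned in the BayesianVLA summary above: the textbook definition of conditional pointwise mutual information is sketched below. The symbols are chosen here for illustration only (a for the action, ℓ for the language instruction, v for the visual observation), and the paper's actual training objective may differ from this plain definition.

```latex
\mathrm{PMI}(a;\ell \mid v)
  \;=\; \log\frac{p(a,\ell \mid v)}{p(a \mid v)\,p(\ell \mid v)}
  \;=\; \log\frac{p(a \mid v,\ell)}{p(a \mid v)}
```

Under this definition, a policy that ignores language has p(a | v, ℓ) = p(a | v), so the quantity is zero. Rewarding a larger value therefore discourages actions that can be predicted from vision alone.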
SAMTok: Representing Any Mask with Two Words

📝 Summary:
SAMTok enables pixel-wise capabilities in multi-modal LLMs through discrete mask tokenization and standard training methods, achieving state-of-the-art performance on various vision-language tasks.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16093
• PDF: https://arxiv.org/pdf/2601.16093
• Project Page: https://github.com/bytedance/Sa2VA/tree/main/projects/samtok

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

📝 Summary:
Representation Autoencoders (RAEs) demonstrate superior performance over VAEs in large-scale text-to-image generation, showing improved stability, faster convergence, and better quality while enabling...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16208
• PDF: https://arxiv.org/pdf/2601.16208
• Project Page: https://rae-dit.github.io/scale-rae/
• Github: https://github.com/ZitengWangNYU/Scale-RAE

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Wigner's Friend as a Circuit: Inter-Branch Communication Witness Benchmarks on Superconducting Quantum Hardware

📝 Summary:
Implementation and benchmarking of quantum circuits for estimating operational inter-branch communication witnesses on IBM Quantum hardware demonstrates visibility and coherence witness measurements u...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16004
• PDF: https://arxiv.org/pdf/2601.16004
• Github: https://github.com/christopher-altman/ibm-qml-kernel

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

📝 Summary:
A pretrained video model is adapted into a robot policy through single-stage post-training, enabling direct action generation and planning capabilities without architectural modifications.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16163
• PDF: https://arxiv.org/pdf/2601.16163

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

📝 Summary:
HERMES enables real-time streaming video understanding by reusing a compact KV cache as hierarchical memory. It provides 10x faster response times and superior accuracy, even with greatly reduced video token input, improving efficiency in resource-constrained settings.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14724
• PDF: https://arxiv.org/pdf/2601.14724
• Project Page: https://hermes-streaming.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VIOLA: Towards Video In-Context Learning with Minimal Annotations

📝 Summary:
VIOLA enables effective multimodal large language model adaptation in low-resource video domains using minimal expert annotations and abundant unlabeled data. It uses density-uncertainty sampling and confidence-aware retrieval to maximize efficiency and leverage unlabeled data, significantly outp...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15549
• PDF: https://arxiv.org/pdf/2601.15549

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
360Anything: Geometry-Free Lifting of Images and Videos to 360°

📝 Summary:
360Anything is a geometry-free framework using diffusion transformers to lift perspective images and videos to 360 panoramas without camera metadata. It achieves state-of-the-art results and uses circular latent encoding to eliminate seam artifacts.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16192
• PDF: https://arxiv.org/pdf/2601.16192
• Project Page: https://360anything.github.io/

==================================

#ComputerVision #DiffusionModels #360Photography #ImageProcessing #DeepLearning
Numba-Accelerated 2D Diffusion-Limited Aggregation: Implementation and Fractal Characterization

📝 Summary:
This paper details a Numba-accelerated Python framework for 2D diffusion-limited aggregation (DLA) simulations. It confirms a fractal dimension of 1.71 in dilute regimes and reveals a crossover to compact growth with dimension 1.87 in high-density environments. The framework is released as an open-source testbed.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15440
• PDF: https://arxiv.org/pdf/2601.15440
• Project Page: https://pypi.org/project/dla-ideal-solver/
• Github: https://github.com/sandyherho/dla-ideal-solver

==================================

#DLA #Fractals #ScientificComputing #Python #Simulations
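
For readers new to DLA, here is a minimal sketch of the kind of simulation the post above describes. It is plain Python, not Numba, and it does not use the dla-ideal-solver API. The function names, launch and kill radii, and the mass-radius estimator are all assumptions made for illustration, not the paper's method.

```python
import math
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-neighbour lattice moves


def grow_dla(n_particles, size=101, seed=0):
    """Grow an on-lattice 2D DLA cluster; returns the set of occupied (x, y) sites."""
    rng = random.Random(seed)
    c = size // 2
    cluster = {(c, c)}  # single seed particle at the lattice centre
    r_max = 0.0         # distance of the farthest stuck particle from the seed
    while len(cluster) < n_particles:
        r_launch = r_max + 5  # hypothetical launch margin
        if r_launch >= c - 1:
            raise ValueError("lattice too small for this many particles")
        # Launch a walker on a circle just outside the cluster.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = c + int(round(r_launch * math.cos(theta)))
        y = c + int(round(r_launch * math.sin(theta)))
        while True:
            dx, dy = rng.choice(STEPS)
            x, y = x + dx, y + dy
            d = math.hypot(x - c, y - c)
            # Discard walkers that drift too far or reach the border, then relaunch.
            if d > r_launch + 10 or not (0 < x < size - 1 and 0 < y < size - 1):
                break
            # Stick as soon as any 4-neighbour is already occupied.
            if any((x + ex, y + ey) in cluster for ex, ey in STEPS):
                cluster.add((x, y))
                r_max = max(r_max, d)
                break
    return cluster


def mass_radius_slope(cluster, centre, radii):
    """Least-squares slope of log N(<=r) against log r, a rough dimension estimate."""
    xs, ys = [], []
    for r in radii:
        n = sum(1 for (x, y) in cluster
                if math.hypot(x - centre[0], y - centre[1]) <= r)
        xs.append(math.log(r))
        ys.append(math.log(n))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

A typical call would be `cluster = grow_dla(300, seed=42)` followed by `mass_radius_slope(cluster, (50, 50), [4, 8, 16])`. Clusters this small give noisy slopes, so values near the reported 1.71 should only be expected from much larger runs, which is where an accelerated implementation pays off.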
VideoMaMa: Mask-Guided Video Matting via Generative Prior

📝 Summary:
VideoMaMa uses pretrained video diffusion models to convert coarse masks into accurate alpha mattes, achieving zero-shot generalization. This enabled a scalable pseudo-labeling pipeline to create the large MA-V dataset, significantly improving real-world video matting performance.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14255
• PDF: https://arxiv.org/pdf/2601.14255
• Project Page: https://cvlab-kaist.github.io/VideoMaMa/

==================================

#VideoMatting #ComputerVision #DeepLearning #DiffusionModels #AIResearch
Towards Automated Kernel Generation in the Era of LLMs

📝 Summary:
This survey explores how large language models and agent systems are automating kernel generation and optimization, a critical yet non-scalable process for AI systems. It provides a structured overview of existing approaches, datasets, and benchmarks, aiming to unify this fragmented field and out...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15727
• PDF: https://arxiv.org/pdf/2601.15727
• Github: https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation

==================================

#LLMs #KernelGeneration #AI #Automation #CodeGeneration
LLM Prompt Evaluation for Educational Applications

📝 Summary:
This study presents a systematic framework using tournament-style testing and Glicko2 ratings to evaluate LLM prompts for education. A prompt emphasizing metacognitive learning strategies outperformed others, demonstrating evidence-based prompt development.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16134
• PDF: https://arxiv.org/pdf/2601.16134

==================================

#LLM #Education #PromptEngineering #AIinEducation #Metacognition
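
To make the Glicko2 ratings in the post above concrete, here is a sketch of a single-player Glicko-2 update for one rating period, following Glickman's published algorithm. Treating each prompt as a "player" and each judged head-to-head comparison as a "game" is an assumption made here. The summary does not say how the study batches matches or which parameters it uses, and the function name and the default `tau` are illustrative.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between the rating scale and the internal scale


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """Update one player after a rating period.

    results: list of (opp_rating, opp_rd, score) with score 1, 0.5 or 0.
    Returns the new (rating, rd, vol).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games played: only the rating deviation grows.
        return rating, SCALE * math.sqrt(phi ** 2 + vol ** 2), vol

    v_inv, delta_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = _expected(mu, mu_j, phi_j)
        v_inv += _g(phi_j) ** 2 * e * (1.0 - e)
        delta_sum += _g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois (regula falsi) iteration.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        top = ex * (delta ** 2 - phi ** 2 - v - ex)
        return top / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Fold the new volatility into the deviation, then update the rating.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol
```

As a sanity check, Glickman's worked example should land close to his reported rating of about 1464, deviation of about 151.5 and volatility of about 0.05999. His example is a 1500-rated player with deviation 200 and volatility 0.06 who beats a 1400/30 opponent and loses to 1550/100 and 1700/300 opponents. The call is `glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])`.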
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

📝 Summary:
Open-Sora 2.0 is a commercial-level video generation model trained for only $200k. It achieves performance comparable to top models. This open-source project aims to democratize access and foster innovation in video generation.

🔹 Publication Date: Published on Mar 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.09642
• PDF: https://arxiv.org/pdf/2503.09642
• Github: https://github.com/hpcaitech/open-sora

🔹 Models citing this paper:
https://huggingface.co/hpcai-tech/Open-Sora-v2
https://huggingface.co/Compumacy/OPensora

Spaces citing this paper:
https://huggingface.co/spaces/zumwaltboi/Sora2_test
https://huggingface.co/spaces/AverageAiLiker/vidsora-magic-wand
https://huggingface.co/spaces/AverageAiLiker/bot-tks1p3jy

==================================

#VideoGeneration #OpenSora #GenerativeAI #DeepLearning #OpenSource
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

📝 Summary:
EvoCUA introduces an evolutionary computer-use agent that combines autonomous task generation with policy optimization. This scalable approach achieves a new state-of-the-art 56.7% success rate on the OSWorld benchmark, demonstrating a robust path for advancing native agent capabilities.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15876
• PDF: https://arxiv.org/pdf/2601.15876
• Github: https://github.com/meituan/EvoCUA

🔹 Models citing this paper:
https://huggingface.co/meituan/EvoCUA-32B-20260105
https://huggingface.co/meituan/EvoCUA-8B-20260105

==================================

#AI #Agents #MachineLearning #ReinforcementLearning #EvolutionaryAlgorithms
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

📝 Summary:
ActionMesh extends 3D diffusion models with a temporal axis to generate high-quality, rig-free animated 3D meshes. This 'temporal 3D diffusion' framework quickly creates topology-consistent animations from various inputs like video or text, achieving state-of-the-art results.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16148
• PDF: https://remysabathier.github.io/actionmesh/actionmesh_2026.pdf
• Project Page: https://remysabathier.github.io/actionmesh/
• Github: https://github.com/facebookresearch/actionmesh

🔹 Models citing this paper:
https://huggingface.co/facebook/ActionMesh

Spaces citing this paper:
https://huggingface.co/spaces/facebook/ActionMesh

==================================

#3DAnimation #DiffusionModels #ComputerGraphics #DeepLearning #3DModeling
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

📝 Summary:
VLMs struggle to estimate task progress from partial views. ProgressLM-3B, a new training-based model, shows consistent improvements in progress reasoning across disjoint tasks, addressing this limitation.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15224
• PDF: https://arxiv.org/pdf/2601.15224
• Project Page: https://progresslm.github.io/ProgressLM/
• Github: https://github.com/ProgressLM/ProgressLM

🔹 Models citing this paper:
https://huggingface.co/Raymond-Qiancx/ProgressLM-3B-SFT
https://huggingface.co/Raymond-Qiancx/ProgressLM-3B-RL

Datasets citing this paper:
https://huggingface.co/datasets/Raymond-Qiancx/ProgressLM-Dataset

==================================

#VLM #ProgressReasoning #AI #MachineLearning #DeepLearning