ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

📝 Summary:
RePlan, a plan-then-execute framework, enhances instruction-based image editing by combining a vision-language planner with a diffusion editor, achieving superior performance in complex and intricate ...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16864
• PDF: https://arxiv.org/pdf/2512.16864
• Project Page: https://replan-iv-edit.github.io/
• Github: https://github.com/dvlab-research/RePlan

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
AdaTooler-V: Adaptive Tool-Use for Images and Videos

📝 Summary:
AdaTooler-V, a multimodal large language model, adaptively uses vision tools based on reinforcement learning, improving performance and reducing unnecessary tool invocations in visual reasoning tasks....

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16918
• PDF: https://arxiv.org/pdf/2512.16918
• Github: https://github.com/CYWang735/AdaTooler-V

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

📝 Summary:
N3D-VLM integrates native 3D perception and reasoning in vision-language models, enabling precise 3D localization and spatial understanding with a large-scale dataset.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16561
• PDF: https://arxiv.org/pdf/2512.16561
• Github: https://github.com/W-Ted/N3D-VLM

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

📝 Summary:
WorldCanvas generates coherent, controllable world events by integrating text, trajectories, and reference images. This multimodal approach surpasses text-only or image-to-video methods, creating videos with preserved object identity and temporal consistency. It advances world models from passive...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16924
• PDF: https://arxiv.org/pdf/2512.16924
• Project Page: https://worldcanvas.github.io/
• Github: https://github.com/pPetrichor/WorldCanvas

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

📝 Summary:
Log-linear Sparse Attention (LLSA) improves the efficiency of diffusion transformers by reducing computational costs for long token sequences through a hierarchical structure, enhancing training speed...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16615
• PDF: https://arxiv.org/pdf/2512.16615
• Github: https://github.com/SingleZombie/LLSA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Coupled Variational Reinforcement Learning for Language Model General Reasoning

📝 Summary:
CoVRL, a hybrid approach combining variational inference and reinforcement learning, enhances language model reasoning by coupling prior and posterior distributions, improving performance and coherenc...

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12576
• PDF: https://arxiv.org/pdf/2512.12576

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

📝 Summary:
VenusBench-GD is a comprehensive, multi-platform GUI grounding benchmark with a hierarchical evaluation. It reveals that general models excel at basic tasks, but specialized models are still better at advanced tasks, despite overfitting.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16501
• PDF: https://arxiv.org/pdf/2512.16501
• Project Page: https://ui-venus.github.io/VenusBench-GD/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/inclusionAI/VenusBench-GD

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

📝 Summary:
REGLUE, a unified latent diffusion framework, enhances image synthesis by jointly modeling VAE latents, patch-level VFM semantics, and global tokens, improving semantic supervision and convergence.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16636
• PDF: https://arxiv.org/pdf/2512.16636

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction

📝 Summary:
FlashPortrait is a diffusion-based video transformer for long-portrait animation that ensures ID consistency and achieves 6x acceleration through a dynamic sliding-window scheme and higher-order laten...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16900
• PDF: https://arxiv.org/pdf/2512.16900
• Project Page: https://francis-rings.github.io/FlashPortrait/
• Github: https://github.com/Francis-Rings/FlashPortrait

🔹 Models citing this paper:
https://huggingface.co/FrancisRing/FlashPortrait

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

📝 Summary:
Insight Miner, a large-scale multimodal model, generates high-quality time-series descriptions using a novel agentic workflow and outperforms existing models with the help of the TS-Insights dataset. ...

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11251
• PDF: https://arxiv.org/pdf/2512.11251

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zhykoties/time-series-language-alignment

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation

📝 Summary:
A novel feed-forward framework, Make-It-Poseable, reformulates character posing as a latent-space transformation problem, using a latent posing transformer and dense pose representation to achieve sup...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16767
• PDF: https://arxiv.org/pdf/2512.16767
• Project Page: https://jasongzy.github.io/Make-It-Poseable/
• Github: https://github.com/jasongzy/Make-It-Poseable

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

📝 Summary:
MMRB2 is a new benchmark for multimodal reward models, evaluating them on interleaved image and text tasks using 4,000 expert-annotated preferences. It shows that top models like Gemini 3 Pro achieve 75-80% accuracy, still below human performance, leaving room for improvement.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16899
• PDF: https://arxiv.org/pdf/2512.16899
• Github: https://github.com/facebookresearch/MMRB2/tree/main

==================================

#MultimodalAI #RewardModels #AIbenchmark #MachineLearning #AIResearch
Vibe Spaces for Creatively Connecting and Expressing Visual Concepts

📝 Summary:
Vibe Blending uses Vibe Space, a hierarchical graph manifold, to create coherent and creative image hybrids. It learns geodesics in feature spaces, outperforming current methods in creativity and coherence as rated by humans.

🔹 Publication Date: Published on Dec 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14884
• PDF: https://arxiv.org/pdf/2512.14884
• Project Page: https://huzeyann.github.io/VibeSpace-webpage/
• Github: https://github.com/huzeyann/VibeSpace

==================================

#ImageGeneration #ComputerVision #AI #MachineLearning #CreativeAI
FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering

📝 Summary:
FrameDiffuser is an autoregressive neural rendering framework. It generates temporally consistent, photorealistic frames using G-buffer data and its own previous output. This achieves interactive speed and high quality compared to prior methods.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16670
• PDF: https://arxiv.org/pdf/2512.16670

==================================

#NeuralRendering #DiffusionModels #ComputerGraphics #RealtimeRendering #DeepLearning
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649

==================================

#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving. They aim for interpretable and human-aligned policies, addressing the limitations of prior systems. This paper characterizes VLA paradigms, datasets, and future challenges.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad

==================================

#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

📝 Summary:
This paper benchmarks SpeechLLMs against cascaded systems for speech-to-text translation. It finds that cascaded systems are more reliable overall, while SpeechLLMs match them only in select cases. Integrating an LLM is essential for high-quality speech translation.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16378
• PDF: https://arxiv.org/pdf/2512.16378
• Github: https://github.com/sarapapi/hearing2translate

==================================

#SpeechTranslation #LLMs #NLP #AIResearch #DeepLearning