ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

📝 Summary:
PosterCopilot enhances professional graphic design by training LMMs with a three-stage strategy for geometrically accurate and aesthetically superior layouts. This framework enables controllable, iterative, layer-specific editing, improving on existing automated design methods.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04082
• PDF: https://arxiv.org/pdf/2512.04082
• Project Page: https://postercopilot.github.io/
• Github: https://github.com/JiazheWei/PosterCopilot

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#GraphicDesign #AI #ComputationalDesign #LayoutDesign #DesignAutomation
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

📝 Summary:
Large Multimodal Models struggle with long video understanding due to context limits. The DIG framework adapts frame selection to query types, using efficient uniform sampling for global queries and specialized selection for localized ones. This approach significantly improves LMM performance on ...
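
🔹 Illustrative sketch: a toy Python version of query-type-adaptive frame selection. It assumes the query type ("global" or "localized") and per-frame relevance scores are already supplied by some upstream model. Both inputs and the top-k rule for localized queries are hypothetical stand-ins, not DIG's actual selection pipeline or the repository's interface.

```python
from typing import List, Sequence


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Pick k evenly spaced frame indices (midpoint of each of k equal segments)."""
    k = min(k, num_frames)
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def select_frames(query_type: str, relevance: Sequence[float], k: int) -> List[int]:
    """Uniform sampling for global queries; top-k relevant frames for localized ones."""
    if query_type == "global":
        return uniform_sample(len(relevance), k)
    if query_type != "localized":
        raise ValueError(f"unknown query type: {query_type!r}")
    # Rank frames by (hypothetical) query-frame relevance, keep the best k,
    # then restore temporal order before passing them to the LMM.
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(ranked[:k])


scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.0, 0.4]
print(select_frames("global", scores, 4))     # [1, 3, 5, 7]
print(select_frames("localized", scores, 2))  # [1, 3]
```

Uniform sampling is cheap and covers the whole timeline, which suits questions about the video as a whole; relevance-ranked selection spends the frame budget where a localized question actually points.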

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04000
• PDF: https://arxiv.org/pdf/2512.04000
• Github: https://github.com/Jialuo-Li/DIG

==================================

#VideoUnderstanding #LMMs #MultimodalAI #DeepLearning #ComputerVision
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

📝 Summary:
Pyramid Sparse Attention (PSA) introduces multi-level pooled key-value representations to overcome the information loss of traditional sparse attention. It dynamically retains critical information, improving efficiency and performance for video understanding and generation tasks.
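
🔹 Illustrative sketch: a toy single-query NumPy version of the general idea. Less important key-value blocks are kept at a coarser, mean-pooled resolution instead of being dropped, as plain top-k sparse attention would do. The block size, the number of levels, the mean-key importance score, and the rank-to-level rule are all assumptions for illustration. This is not PSA's kernel or its actual selection policy.

```python
from typing import Tuple

import numpy as np


def pool(x: np.ndarray, width: int) -> np.ndarray:
    """Mean-pool groups of `width` consecutive rows (len(x) must be divisible by width)."""
    return x.reshape(-1, width, x.shape[-1]).mean(axis=1)


def assign_levels(q: np.ndarray, k: np.ndarray, block: int, num_levels: int) -> np.ndarray:
    """Give each KV block a pooling level: important blocks fine (0), unimportant coarse."""
    n_blocks = k.shape[0] // block
    # Cheap importance estimate: the query against each block's mean key.
    scores = np.array([q @ k[b * block:(b + 1) * block].mean(axis=0) for b in range(n_blocks)])
    levels = np.empty(n_blocks, dtype=int)
    for rank, b in enumerate(np.argsort(-scores)):  # most important block first
        levels[b] = min(rank * num_levels // n_blocks, num_levels - 1)
    return levels


def pyramid_sparse_attention(
    q: np.ndarray, k: np.ndarray, v: np.ndarray, block: int, levels: np.ndarray
) -> Tuple[np.ndarray, int]:
    """Single-query attention where block b is seen through 2**levels[b]-token mean pools."""
    keys, vals = [], []
    for b, lvl in enumerate(levels):
        width = 2 ** int(lvl)
        keys.append(pool(k[b * block:(b + 1) * block], width))
        vals.append(pool(v[b * block:(b + 1) * block], width))
    K, V = np.concatenate(keys), np.concatenate(vals)
    logits = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, K.shape[0]  # output vector and number of attended KV entries


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal(8), rng.standard_normal((32, 8)), rng.standard_normal((32, 8))
out, n_kv = pyramid_sparse_attention(q, k, v, block=8, levels=assign_levels(q, k, 8, 3))
print(out.shape, n_kv)  # (8,) 22, i.e. 8 + 8 + 4 + 2 KV entries instead of 32
```

With every block at level 0 the function reduces to dense attention, which makes a handy sanity check. Raising the level of low-scoring blocks shrinks the KV set while still letting the query see a summary of every block.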

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04025
• PDF: https://arxiv.org/pdf/2512.04025
• Project Page: https://ziplab.co/PSA/
• Github: https://github.com/ziplab/Pyramid-Sparse-Attention

==================================

#SparseAttention #VideoUnderstanding #VideoGeneration #DeepLearning #ComputerVision
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

📝 Summary:
4DLangVGGT is a new Transformer framework for 4D scene understanding. It integrates geometry and language to enable scalable, open-vocabulary semantic fields, improving generalization and efficiency over prior methods.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05060
• PDF: https://arxiv.org/pdf/2512.05060
• Project Page: https://hustvl.github.io/4DLangVGGT/

==================================

#4DSceneUnderstanding #Transformer #ComputerVision #DeepLearning #AI
SIMA 2: A Generalist Embodied Agent for Virtual Worlds

📝 Summary:
SIMA 2 is a Gemini-based embodied agent for 3D virtual worlds. It reasons about goals, handles complex instructions, and autonomously learns new skills. It substantially narrows the gap with human performance and supports a path toward continuously learning agents.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04797
• PDF: https://arxiv.org/pdf/2512.04797

==================================

#EmbodiedAI #AI #VirtualWorlds #ReinforcementLearning #AIagents
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

📝 Summary:
Reward Forcing improves streaming video generation with EMA-Sink, which keeps updating the context (sink) tokens so that generation does not stay anchored to static initial frames. It also introduces Rewarded Distribution Matching Distillation to prioritize dynamic content, enhancing motion quality and achieving state-of-the-art performance.
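
🔹 Illustrative sketch: one plausible reading of an EMA-updated sink. Tokens evicted from a sliding-window context are blended into a persistent sink token with an exponential moving average, instead of the sink staying frozen on the first frame. The single-token sink, the `window` and `decay` parameters, and the update rule are assumptions; see the paper for how EMA-Sink is actually defined.

```python
from typing import List, Optional

import numpy as np


class EmaSinkCache:
    """Toy sliding-window context with one EMA-updated sink token (hypothetical design)."""

    def __init__(self, window: int, decay: float):
        self.window = window  # number of most recent tokens kept verbatim
        self.decay = decay    # EMA weight on the old sink value
        self.sink: Optional[np.ndarray] = None
        self.recent: List[np.ndarray] = []

    def append(self, token: np.ndarray) -> None:
        self.recent.append(token)
        if len(self.recent) > self.window:
            evicted = self.recent.pop(0)
            if self.sink is None:
                # The first evicted token (the initial frame) seeds the sink.
                self.sink = evicted.astype(float)
            else:
                # Later evictions are blended in, so the sink drifts with the video.
                self.sink = self.decay * self.sink + (1.0 - self.decay) * evicted

    def context(self) -> np.ndarray:
        tokens = ([self.sink] if self.sink is not None else []) + self.recent
        return np.stack(tokens)


cache = EmaSinkCache(window=2, decay=0.5)
for value in [1.0, 2.0, 3.0, 4.0]:
    cache.append(np.array([value]))
print(cache.context().ravel().tolist())  # [1.5, 3.0, 4.0]
```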

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04678
• PDF: https://arxiv.org/pdf/2512.04678
• Project Page: https://reward-forcing.github.io/

🔹 Models citing this paper:
https://huggingface.co/JaydenLu666/Reward-Forcing-T2V-1.3B

==================================

#VideoGeneration #GenerativeAI #DeepLearning #ComputerVision #AIResearch
SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization

📝 Summary:
SeeNav-Agent improves Vision-Language Navigation with dual-view visual prompts, reducing perception errors and enhancing spatial understanding. It also uses SRGPO, a step-level reinforcement fine-tuning method, to boost planning and achieve higher success rates for VLN agents.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02631
• PDF: https://arxiv.org/pdf/2512.02631
• Github: https://github.com/WzcTHU/SeeNav-Agent

==================================

#VisionLanguageNavigation #AI #ReinforcementLearning #ComputerVision #DeepLearning
Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

📝 Summary:
Splannequin improves frozen 3D scenes from monocular videos by fixing artifacts in dynamic Gaussian splatting. It uses temporal anchoring for hidden or defective Gaussians to resolve ghosting and blur from sparse supervision. This boosts visual quality for high-fidelity, user-selectable frozen-ti...

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05113
• PDF: https://arxiv.org/pdf/2512.05113
• Project Page: https://chien90190.github.io/splannequin/

==================================

#ComputerVision #3DReconstruction #GaussianSplatting #NeuralRendering #DeepLearning
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

📝 Summary:
Training autonomous LLM agents requires scalable, high-quality interactive environments. The Nex ecosystem provides NexAU for complexity, NexA4A for diversity, and NexGAP for fidelity in environment construction. Nex-N1, trained using this infrastructure, outperforms SOTA models on agentic tasks.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04987
• PDF: https://arxiv.org/pdf/2512.04987
• Github: https://github.com/nex-agi/Nex-N1

==================================

#LLMAgents #LargeLanguageModels #AI #AISimulation #AIResearch
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

📝 Summary:
Semantic-First Diffusion (SFD) denoises semantic and texture latents asynchronously for image generation. Because semantics form first, they provide clearer guidance for texture refinement. SFD speeds up convergence by up to 100x and improves image quality.
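
🔹 Illustrative sketch: a toy noise schedule in which the semantic latent runs a fixed number of steps ahead of the texture latent. This is one simple way to realize "semantics first" asynchronous denoising. The linear schedule and the `lead` offset are assumptions for illustration, not the schedule SFD uses.

```python
from typing import Tuple


def async_timesteps(step: int, num_steps: int, lead: int) -> Tuple[float, float]:
    """Noise levels (t_semantic, t_texture) in [0, 1] at a sampling step; 1 = pure noise.

    The semantic latent starts denoising immediately; the texture latent waits
    `lead` steps, so it is always conditioned on a cleaner semantic latent.
    Total sampling steps: num_steps + lead.
    """
    def level(progress: int) -> float:
        # Linear schedule from 1 (noise) to 0 (clean), clamped to [0, 1].
        return min(1.0, max(0.0, 1.0 - progress / num_steps))

    return level(step), level(step - lead)


for s in (0, 3, 10, 13):
    print(s, async_timesteps(s, num_steps=10, lead=3))
```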

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04926
• PDF: https://arxiv.org/pdf/2512.04926

==================================

#DiffusionModels #ImageGeneration #SemanticAI #GenerativeAI #DeepLearning
SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

📝 Summary:
SignRoundV2 is a post-training quantization framework for LLMs. It combines a sensitivity metric for bit allocation with pre-tuning of quantization scales, achieving competitive accuracy even at 2-bit precision and closing the gap with full-precision models.
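
🔹 Illustrative sketch: to make "2-bit" concrete, here is a plain min-max round-to-nearest (RTN) group quantizer, the kind of baseline SignRound-style methods improve on, plus a toy rule that spends extra bits on the most sensitive layers. Neither function is SignRoundV2: its sensitivity metric and scale pre-tuning are not reproduced, and the sensitivity scores below are made-up inputs.

```python
from typing import List, Sequence, Tuple

import numpy as np


def quantize_rtn(w: np.ndarray, bits: int, group_size: int) -> Tuple[np.ndarray, np.ndarray]:
    """Asymmetric min-max round-to-nearest quantization per group (a baseline, not SignRoundV2)."""
    qmax = 2 ** bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / qmax  # guard against constant groups
    zero = np.round(-lo / scale)              # integer zero-point
    codes = np.clip(np.round(g / scale) + zero, 0, qmax)
    dequant = (codes - zero) * scale
    return dequant.reshape(w.shape), codes.astype(np.uint8)


def allocate_bits(sensitivity: Sequence[float], low: int, high: int, avg_budget: float) -> List[int]:
    """Hypothetical rule: give `high` bits to the most sensitive layers within an average budget."""
    n = len(sensitivity)
    n_high = int((avg_budget - low) * n / (high - low))
    ranked = sorted(range(n), key=lambda i: sensitivity[i], reverse=True)
    bits = [low] * n
    for i in ranked[:n_high]:
        bits[i] = high
    return bits


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64))
for b in (2, 4):
    deq, codes = quantize_rtn(w, bits=b, group_size=32)
    print(b, "bits, max code", int(codes.max()), "MSE", float(np.mean((w - deq) ** 2)))
print(allocate_bits([0.1, 0.9, 0.3, 0.2], low=2, high=4, avg_budget=2.5))  # [2, 4, 2, 2]
```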

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04746
• PDF: https://arxiv.org/pdf/2512.04746

==================================

#LLMs #Quantization #DeepLearning #AI #MachineLearning
TV2TV: A Unified Framework for Interleaved Language and Video Generation

📝 Summary:
TV2TV is a unified framework for interleaved language and video generation, using a Mixture-of-Transformers. It learns to 'think in words' before 'acting in pixels,' enhancing visual quality, controllability, and prompt alignment. The model shows strong performance on video game and natural video...

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05103
• PDF: https://arxiv.org/pdf/2512.05103

==================================

#VideoGeneration #GenerativeAI #MultimodalAI #Transformers #AI
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

📝 Summary:
DAComp is a benchmark with 210 tasks for data engineering and analysis workflows. It reveals significant deficiencies in state-of-the-art agents, with success rates under 20% for engineering and below 40% for analysis, highlighting critical gaps.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04324
• PDF: https://arxiv.org/pdf/2512.04324
• Project Page: https://da-comp.github.io/

==================================

#DataAgents #Benchmarking #DataEngineering #DataAnalysis #AIResearch
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

📝 Summary:
GRPO in tool-integrated RL collapses due to Lazy Likelihood Displacement (LLD), a systematic drop in response likelihoods. LLDS regularization addresses this by preserving likelihoods, stabilizing training, preventing gradient explosion, and substantially improving performance.
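
🔹 Illustrative sketch: the group-relative advantage at the core of GRPO, where each sampled response's reward is standardized against the other responses for the same prompt. It is paired with a simple hypothetical monitor for the kind of likelihood drop the paper calls LLD. The monitor is only a diagnostic idea; it is not the paper's LLDS regularizer.

```python
from typing import Sequence

import numpy as np


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: standardize rewards within one prompt's group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def likelihood_drop_rate(logp_before: Sequence[float], logp_after: Sequence[float]) -> float:
    """Hypothetical diagnostic: share of responses whose sequence log-likelihood fell after an update."""
    before = np.asarray(logp_before, dtype=float)
    after = np.asarray(logp_after, dtype=float)
    return float(np.mean(after < before))


print(group_relative_advantages([1, 0, 0, 1]))  # close to +1/-1 per response
print(group_relative_advantages([1, 1, 1, 1]))  # all zeros: no learning signal from this group
print(likelihood_drop_rate([-12.0, -8.5, -20.1], [-12.4, -8.1, -22.0]))  # 2 of 3 got less likely
```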

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04220
• PDF: https://arxiv.org/pdf/2512.04220

==================================

#ReinforcementLearning #MachineLearning #AI #DeepLearning #AIResearch
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

📝 Summary:
Stable Video Infinity (SVI) generates infinite-length videos with high temporal consistency and controllable storylines. It introduces Error-Recycling Fine-Tuning, which teaches the Diffusion Transformer to correct its self-generated errors and thereby addresses the training-test discrepancy.
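
🔹 Illustrative sketch: a toy version of the general idea of training on self-generated errors. Residuals between the model's outputs and the ground truth are stored in a small bank and replayed onto clean training inputs, so training sees the kind of drift the model meets at test time. The FIFO bank, the `capacity` and `prob` parameters, and the additive injection are hypothetical simplifications, not SVI's Error-Recycling Fine-Tuning.

```python
import random
from collections import deque
from typing import Deque, List, Sequence


class ErrorBank:
    """Toy FIFO bank of model errors, replayed onto clean training inputs (hypothetical)."""

    def __init__(self, capacity: int):
        self.errors: Deque[List[float]] = deque(maxlen=capacity)

    def record(self, predicted: Sequence[float], target: Sequence[float]) -> None:
        # Store the residual the model actually made on one of its own outputs.
        self.errors.append([p - t for p, t in zip(predicted, target)])

    def corrupt(self, clean: Sequence[float], rng: random.Random, prob: float = 0.5) -> List[float]:
        # With probability `prob`, add a stored error so training inputs resemble
        # the drifted inputs the model sees when conditioning on its own outputs.
        if not self.errors or rng.random() >= prob:
            return list(clean)
        error = rng.choice(self.errors)
        return [c + e for c, e in zip(clean, error)]


bank = ErrorBank(capacity=2)
bank.record(predicted=[1.0, 2.0], target=[0.5, 2.0])
print(bank.corrupt([0.0, 0.0], random.Random(0), prob=1.0))  # [0.5, 0.0]
```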

🔹 Publication Date: Published on Oct 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.09212
• PDF: https://arxiv.org/pdf/2510.09212
• Project Page: https://stable-video-infinity.github.io/homepage/
• Github: https://github.com/vita-epfl/Stable-Video-Infinity

🔹 Models citing this paper:
https://huggingface.co/vita-video-gen/svi-model

🔹 Datasets citing this paper:
https://huggingface.co/datasets/vita-video-gen/svi-benchmark

==================================

#VideoGeneration #AI #DiffusionModels #DeepLearning #ComputerVision
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

📝 Summary:
PaperDebugger is an in-editor, multi-agent academic writing assistant that integrates large language models directly into LaTeX environments. It allows deep interaction with document state and revision history for enhanced writing, review, and editing workflows.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02589
• PDF: https://arxiv.org/pdf/2512.02589
• Project Page: https://www.paperdebugger.com/
• Github: https://github.com/PaperDebugger/PaperDebugger

==================================

#AcademicWriting #LLM #MultiAgentSystems #ResearchTools #AI
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

📝 Summary:
DynamicVerse introduces a 4D world modeling framework for dynamic real-world videos, overcoming existing dataset limitations. It integrates large vision, geometric, and multimodal models to create a vast dataset with metric-scale annotations. This approach achieves superior performance in depth, ...

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03000
• PDF: https://arxiv.org/pdf/2512.03000
• Project Page: https://dynamic-verse.github.io/

==================================

#4DModeling #MultimodalAI #ComputerVision #DeepLearning #AIResearch
DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

📝 Summary:
DraCo is a novel text-to-image generation method that uses interleaved reasoning with both textual and visual content. It generates low-resolution drafts, verifies semantic alignment, and refines images to address coarse textual planning and rare attribute generation. DraCo significantly outperfo...
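
🔹 Illustrative sketch: a generic draft-verify-refine control loop matching the steps described above. All three callables are hypothetical placeholders for model calls, not DraCo's API. In the usage example an "image" is just the set of concepts it depicts.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # prompt type
D = TypeVar("D")  # draft / image type


def draft_verify_refine(
    prompt: P,
    draft_fn: Callable[[P], D],
    verify_fn: Callable[[P, D], List[str]],
    refine_fn: Callable[[P, D, List[str]], D],
    max_rounds: int = 3,
) -> D:
    """Generate a cheap draft, check it against the prompt, and refine until it passes."""
    draft = draft_fn(prompt)               # e.g. a low-resolution preview
    for _ in range(max_rounds):
        issues = verify_fn(prompt, draft)  # semantic mismatches found in the draft
        if not issues:
            break
        draft = refine_fn(prompt, draft, issues)
    return draft


# Stub "models" that represent an image as the set of concepts it depicts.
prompt = {"red", "cube", "glass"}
result = draft_verify_refine(
    prompt,
    draft_fn=lambda p: {"cube"},
    verify_fn=lambda p, d: sorted(p - d),
    refine_fn=lambda p, d, issues: d | set(issues[:1]),  # fix one issue per round
)
print(sorted(result))  # ['cube', 'glass', 'red']
```

Capping the loop with `max_rounds` bounds the cost when the verifier keeps finding issues; the cheap draft is what makes repeated verification affordable.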

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05112
• PDF: https://arxiv.org/pdf/2512.05112
• Github: https://github.com/CaraJ7/DraCo

==================================

#TextToImage #GenerativeAI #DeepLearning #ComputerVision #AI
BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

📝 Summary:
This paper presents a video diffusion framework that decouples scene dynamics from camera pose. This enables precise 4D control over time and viewpoint for high-quality video generation, outperforming prior models in controllability.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05076
• PDF: https://arxiv.org/pdf/2512.05076

==================================

#VideoGeneration #DiffusionModels #GenerativeAI #ComputerVision #AICameraControl
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

📝 Summary:
Live Avatar uses a 14-billion-parameter diffusion model to achieve real-time, high-fidelity, infinite-length audio-driven avatar generation. It employs Timestep-forcing Pipeline Parallelism and Rolling Sink Frame Mechanism for efficiency and consistency, reaching 20 FPS on 5 H800 GPUs.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677

==================================

#LiveAvatar #GenerativeAI #RealtimeAI #DiffusionModels #AvatarGeneration
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

📝 Summary:
Standard diffusion corrupts the phase of an image's Fourier spectrum, destroying its spatial structure. This paper introduces Phase-Preserving Diffusion (φ-PD), which preserves phase to enable structure-aligned generation for tasks like re-rendering. It adds no extra cost and significantly improves sim-to-real enhancement.
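
🔹 Illustrative sketch: a NumPy toy showing what "preserving phase" means in the Fourier domain. The sample takes its magnitude spectrum from Gaussian noise but keeps the input image's phase, so the phase-carried structure survives while magnitudes are randomized. This only illustrates the concept under that assumption; it is not the φ-PD forward process from the paper.

```python
import numpy as np


def phase_preserving_noise(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Gaussian-noise magnitude spectrum combined with the image's Fourier phase (toy)."""
    image_f = np.fft.fft2(image)
    noise_f = np.fft.fft2(rng.standard_normal(image.shape))
    mixed = np.abs(noise_f) * np.exp(1j * np.angle(image_f))
    # Both factors keep the Hermitian symmetry of a real signal's spectrum,
    # so the inverse transform is real up to floating-point error.
    return np.fft.ifft2(mixed).real


def unit_phase(x: np.ndarray) -> np.ndarray:
    """Fourier spectrum normalized to unit magnitude, i.e. phase only."""
    spectrum = np.fft.fft2(x)
    return spectrum / np.abs(spectrum)


rng = np.random.default_rng(0)
image = rng.random((16, 16))
noisy = phase_preserving_noise(image, rng)
print(np.allclose(unit_phase(noisy), unit_phase(image), atol=1e-6))  # True
```

Ordinary Gaussian noise randomizes both magnitude and phase; fixing the phase is what keeps edges and layout anchored in place.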

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05106
• PDF: https://arxiv.org/pdf/2512.05106

==================================

#DiffusionModels #GenerativeAI #ComputerVision #DeepLearning #AIResearch