✨MMGR: Multi-Modal Generative Reasoning
📝 Summary:
MMGR is a new benchmark assessing video and image model reasoning across physical, logical, and spatial domains. It uncovers major performance gaps, showing models struggle with abstract reasoning and planning, often prioritizing visual plausibility over true causal correctness.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14691
• PDF: https://arxiv.org/pdf/2512.14691
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
📝 Summary:
RoboTracer, a 3D-aware visual language model, enhances spatial tracing by combining supervised and reinforcement fine-tuning with a universal spatial encoder and regression-supervised decoder, achievi...
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13660
• PDF: https://arxiv.org/pdf/2512.13660
• Project Page: https://zhoues.github.io/RoboTracer/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨RecGPT-V2 Technical Report
📝 Summary:
RecGPT-V2 enhances recommender systems by integrating a Hierarchical Multi-Agent System, Hybrid Representation Inference, Meta-Prompting, constrained reinforcement learning, and an Agent-as-a-Judge fr...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14503
• PDF: https://arxiv.org/pdf/2512.14503
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
📝 Summary:
MemFlow dynamically updates a memory bank by retrieving relevant historical frames for each video chunk, ensuring narrative coherence and generation efficiency with minimal computational overhead.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14699
• PDF: https://arxiv.org/pdf/2512.14699
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
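To make the retrieval idea in the MemFlow summary concrete, here is a minimal sketch, not the paper's implementation. It assumes each historical frame and each incoming chunk is already represented by an embedding vector, and it ranks history by cosine similarity. The names `cosine` and `retrieve_memory`, the top-k selection, and the toy vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_memory(history, chunk_query, k=2):
    """Return the indices of the k history frames most similar to the query, best first.

    Assumption: similarity-ranked top-k stands in for whatever relevance
    criterion the actual method uses.
    """
    ranked = sorted(
        range(len(history)),
        key=lambda i: cosine(history[i], chunk_query),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings for three past frames and one incoming chunk (made-up values).
history = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.1]

memory = retrieve_memory(history, query, k=2)
print(memory)  # indices of the frames kept in the memory bank for this chunk
```

Calling `retrieve_memory` once per chunk, rather than keeping every past frame, is what keeps the memory small in this sketch.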
✨A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
📝 Summary:
A4-Agent, a training-free framework, decouples affordance prediction into three stages using specialized pre-trained models to enhance generalization and performance in real-world settings.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14442
• PDF: https://arxiv.org/pdf/2512.14442
• Project Page: https://zixinzhang02.github.io/A4-Agent-page/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
📝 Summary:
AR-to-dLM conversion enhances diffusion language models' efficiency and speed while maintaining task accuracy through refined attention patterns and token masking strategies.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14067
• PDF: https://arxiv.org/pdf/2512.14067
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation
📝 Summary:
Four abliteration tools are evaluated for their effectiveness in removing refusal representations from large language models, with findings showing variability in capability preservation and distribut...
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13655
• PDF: https://arxiv.org/pdf/2512.13655
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
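For readers new to the term, "removing refusal representations" is commonly described as directional ablation: estimate a refusal direction as the difference between mean activations on two prompt sets, then project that direction out of a hidden state. The sketch below illustrates that general formulation on toy vectors. It is an assumption that this matches any of the four tools evaluated in the paper, and the function names and numbers are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful, harmless):
    """Unit vector along (mean harmful activation - mean harmless activation)."""
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return [x / norm for x in diff]


def ablate(hidden, direction):
    """Remove the component of `hidden` along the unit vector `direction`."""
    proj = sum(h * d for h, d in zip(hidden, direction))
    return [h - proj * d for h, d in zip(hidden, direction)]


# Toy 2-D activations (made-up values, not real model activations).
harmful = [[2.0, 1.0], [4.0, 1.0]]
harmless = [[1.0, 1.0], [1.0, 1.0]]

r = refusal_direction(harmful, harmless)
cleaned = ablate([5.0, 3.0], r)
print(r, cleaned)
```

After ablation the hidden state has no component left along `r`, while everything orthogonal to it is untouched. How well that preserves other capabilities is the kind of question the paper's evaluation addresses.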
✨WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
📝 Summary:
WorldPlay is a streaming video diffusion model that achieves real-time, interactive world modeling with long-term geometric consistency by using a Dual Action Representation, Reconstituted Context Mem...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14614
• PDF: https://arxiv.org/pdf/2512.14614
• Project Page: https://3d-models.hunyuan.tencent.com/world/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
📝 Summary:
ShowTable is a new pipeline that combines MLLMs and diffusion models to generate high-fidelity, creative infographics from data tables. It excels in multi-modal reasoning, generation, and error correction, outperforming existing methods for complex table visualization tasks.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13303
• PDF: https://arxiv.org/pdf/2512.13303
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SS4D: Native 4D Generative Model via Structured Spacetime Latents
📝 Summary:
SS4D synthesizes dynamic 3D objects from monocular video using a native 4D generative model with structured spacetime latents, ensuring high fidelity, temporal coherence, and structural consistency.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14284
• PDF: https://arxiv.org/pdf/2512.14284
• Project Page: https://lizb6626.github.io/SS4D/
• Github: https://github.com/Lizb6626/SS4D/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨RePo: Language Models with Context Re-Positioning
📝 Summary:
RePo, a novel context re-positioning mechanism in LLMs, reduces extraneous cognitive load by differentiably assigning token positions, enhancing performance on noisy and long contexts without compromi...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14391
• PDF: https://arxiv.org/pdf/2512.14391
🔹 Models citing this paper:
• https://huggingface.co/SakanaAI/RePo-OLMo2-1B-stage2-L5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
📝 Summary:
Sparse-LaViDa accelerates Masked Discrete Diffusion Models by dynamically truncating masked tokens during inference, maintaining quality and achieving up to a 2x speedup across various tasks.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14008
• PDF: https://arxiv.org/pdf/2512.14008
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
📝 Summary:
A framework aggregates weak predictions to recover semantic structure, enabling coherent SVG animations and improving VLM interactions with vector graphics.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14336
• PDF: https://arxiv.org/pdf/2512.14336
• Project Page: https://yeolj00.github.io/personal-projects/vector-prism/
• Github: https://github.com/YeolJ00/vector-prism
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
📝 Summary:
EVOLVE-VLA, a test-time training framework for Vision-Language-Action models, enables continuous adaptation through environmental interaction with minimal task-specific demonstrations, achieving signi...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14666
• PDF: https://arxiv.org/pdf/2512.14666
• Project Page: https://showlab.github.io/EVOLVE-VLA/
• Github: https://github.com/showlab/EVOLVE-VLA
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨TAT: Task-Adaptive Transformer for All-in-One Medical Image Restoration
📝 Summary:
A task-adaptive Transformer (TAT) framework addresses challenges in medical image restoration by dynamically adjusting task-specific weights and loss balances, achieving state-of-the-art performance a...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14550
• PDF: https://arxiv.org/pdf/2512.14550
• Github: https://github.com/Yaziwel/TAT
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
📝 Summary:
Zoom-Zero, a coarse-to-fine framework, enhances grounded video question answering by improving temporal grounding and answer accuracy through a zoom-in accuracy reward and token-selective credit assig...
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14273
• PDF: https://arxiv.org/pdf/2512.14273
• Project Page: https://xiaoqian-shen.github.io/Zoom-Zero/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
📝 Summary:
A novel vision-language model framework improves task success rates for mobile GUI agents by using semantic world models instead of pixel-based predictions.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14014
• PDF: https://arxiv.org/pdf/2512.14014
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research