✨RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
📝 Summary:
RePlan, a plan-then-execute framework, enhances instruction-based image editing by combining a vision-language planner with a diffusion editor, achieving superior performance in complex and intricate ...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16864
• PDF: https://arxiv.org/pdf/2512.16864
• Project Page: https://replan-iv-edit.github.io/
• Github: https://github.com/dvlab-research/RePlan
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨AdaTooler-V: Adaptive Tool-Use for Images and Videos
📝 Summary:
AdaTooler-V, a multimodal large language model, adaptively uses vision tools based on reinforcement learning, improving performance and reducing unnecessary tool invocations in visual reasoning tasks....
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16918
• PDF: https://arxiv.org/pdf/2512.16918
• Github: https://github.com/CYWang735/AdaTooler-V
#AI #DataScience #MachineLearning #HuggingFace #Research
✨N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
📝 Summary:
N3D-VLM integrates native 3D perception and reasoning in vision-language models, enabling precise 3D localization and spatial understanding with a large-scale dataset.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16561
• PDF: https://arxiv.org/pdf/2512.16561
• Github: https://github.com/W-Ted/N3D-VLM
#AI #DataScience #MachineLearning #HuggingFace #Research
✨The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
📝 Summary:
WorldCanvas generates coherent, controllable world events by integrating text, trajectories, and reference images. This multimodal approach surpasses text-only or image-to-video methods, creating videos with preserved object identity and temporal consistency. It advances world models from passive...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16924
• PDF: https://arxiv.org/pdf/2512.16924
• Project Page: https://worldcanvas.github.io/
• Github: https://github.com/pPetrichor/WorldCanvas
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
📝 Summary:
Log-linear Sparse Attention (LLSA) improves the efficiency of diffusion transformers by reducing computational costs for long token sequences through a hierarchical structure, enhancing training speed...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16615
• PDF: https://arxiv.org/pdf/2512.16615
• Github: https://github.com/SingleZombie/LLSA
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Coupled Variational Reinforcement Learning for Language Model General Reasoning
📝 Summary:
CoVRL, a hybrid approach combining variational inference and reinforcement learning, enhances language model reasoning by coupling prior and posterior distributions, improving performance and coherenc...
🔹 Publication Date: Published on Dec 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12576
• PDF: https://arxiv.org/pdf/2512.12576
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
📝 Summary:
VenusBench-GD is a comprehensive, multi-platform GUI grounding benchmark with a hierarchical evaluation. It reveals that general models excel at basic tasks, while specialized models are still better at advanced tasks, despite overfitting.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16501
• PDF: https://arxiv.org/pdf/2512.16501
• Project Page: https://ui-venus.github.io/VenusBench-GD/
✨ Datasets citing this paper:
• https://huggingface.co/datasets/inclusionAI/VenusBench-GD
#AI #DataScience #MachineLearning #HuggingFace #Research
✨REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion
📝 Summary:
REGLUE, a unified latent diffusion framework, enhances image synthesis by jointly modeling VAE latents, patch-level VFM semantics, and global tokens, improving semantic supervision and convergence.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16636
• PDF: https://arxiv.org/pdf/2512.16636
#AI #DataScience #MachineLearning #HuggingFace #Research
✨FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
📝 Summary:
FlashPortrait is a diffusion-based video transformer for long-portrait animation that ensures ID consistency and achieves 6x acceleration through a dynamic sliding-window scheme and higher-order laten...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16900
• PDF: https://arxiv.org/pdf/2512.16900
• Project Page: https://francis-rings.github.io/FlashPortrait/
• Github: https://github.com/Francis-Rings/FlashPortrait
🔹 Models citing this paper:
• https://huggingface.co/FrancisRing/FlashPortrait
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language
📝 Summary:
Insight Miner, a large-scale multimodal model, generates high-quality time-series descriptions using a novel agentic workflow and outperforms existing models with the help of the TS-Insights dataset. ...
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11251
• PDF: https://arxiv.org/pdf/2512.11251
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zhykoties/time-series-language-alignment
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation
📝 Summary:
A novel feed-forward framework, Make-It-Poseable, reformulates character posing as a latent-space transformation problem, using a latent posing transformer and dense pose representation to achieve sup...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16767
• PDF: https://arxiv.org/pdf/2512.16767
• Project Page: https://jasongzy.github.io/Make-It-Poseable/
• Github: https://github.com/jasongzy/Make-It-Poseable
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
📝 Summary:
MMRB2 is a new benchmark for multimodal reward models, evaluating them on interleaved image and text tasks using 4,000 expert-annotated preferences. It shows top models like Gemini 3 Pro achieve 75-80% accuracy, still below human performance, highlighting areas for improvement in these models.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16899
• PDF: https://arxiv.org/pdf/2512.16899
• Github: https://github.com/facebookresearch/MMRB2/tree/main
#MultimodalAI #RewardModels #AIbenchmark #MachineLearning #AIResearch
✨Vibe Spaces for Creatively Connecting and Expressing Visual Concepts
📝 Summary:
Vibe Blending uses Vibe Space, a hierarchical graph manifold, to create coherent and creative image hybrids. It learns geodesics in feature spaces, outperforming current methods in creativity and coherence as rated by humans.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14884
• PDF: https://arxiv.org/pdf/2512.14884
• Project Page: https://huzeyann.github.io/VibeSpace-webpage/
• Github: https://github.com/huzeyann/VibeSpace
#ImageGeneration #ComputerVision #AI #MachineLearning #CreativeAI
✨FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering
📝 Summary:
FrameDiffuser is an autoregressive neural rendering framework. It generates temporally consistent, photorealistic frames using G-buffer data and its own previous output. This achieves interactive speed and high quality compared to prior methods.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16670
• PDF: https://arxiv.org/pdf/2512.16670
#NeuralRendering #DiffusionModels #ComputerGraphics #RealtimeRendering #DeepLearning
✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
✨Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving. They aim for interpretable and human-aligned policies, addressing prior system limitations. This paper characterizes VLA paradigms, datasets, and future challenges.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad
#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
✨Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
📝 Summary:
This paper benchmarks SpeechLLMs against cascaded systems for speech-to-text translation. It finds cascaded systems are more reliable overall, while SpeechLLMs match them only in select cases. Integrating an LLM is essential for high-quality speech translation.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16378
• PDF: https://arxiv.org/pdf/2512.16378
• Github: https://github.com/sarapapi/hearing2translate
#SpeechTranslation #LLMs #NLP #AIResearch #DeepLearning