✨START: Spatial and Textual Learning for Chart Understanding
📝 Summary:
START enhances multimodal large language models by integrating spatial and textual learning through chart-element grounding and chart-to-code generation, improving chart understanding and performance ...
🔹 Publication Date: Published on Dec 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07186
• PDF: https://arxiv.org/pdf/2512.07186
• Github: https://github.com/dragonlzm/START
🔹 Models citing this paper:
• https://huggingface.co/zhuomingliu/START
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zhuomingliu/CS-Bench
• https://huggingface.co/datasets/zhuomingliu/START-Dataset
• https://huggingface.co/datasets/zhuomingliu/START_eval
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Memory in the Age of AI Agents
📝 Summary:
This survey provides an updated overview of agent memory research, distinguishing its forms, functions, and dynamics, and highlights emerging research directions.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13564
• PDF: https://arxiv.org/pdf/2512.13564
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
📝 Summary:
ReFusion, a novel masked diffusion model, improves performance and efficiency by using slot-based parallel decoding, achieving superior results compared to autoregressive models and traditional masked...
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13586
• PDF: https://arxiv.org/pdf/2512.13586
• Github: https://github.com/ML-GSAI/ReFusion
🔹 Models citing this paper:
• https://huggingface.co/GSAI-ML/ReFusion
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos
📝 Summary:
A Spatial-Aware VLA Pretraining paradigm improves 3D spatial understanding in robots by aligning 2D visual inputs with 3D actions using a dual-encoder architecture with a 3D visual encoder.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13080
• PDF: https://arxiv.org/pdf/2512.13080
• Project Page: https://beingbeyond.github.io/VIPA-VLA/
• Github: https://beingbeyond.github.io/VIPA-VLA
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
📝 Summary:
VG-AVS, a task and framework, fine-tunes VLMs to select the most informative next viewpoint for visual question answering, enhancing performance and generalization.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13250
• PDF: https://arxiv.org/pdf/2512.13250
• Project Page: https://active-view-selection.github.io
• Github: https://github.com/KAIST-Visual-AI-Group/VG-AVS
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LitePT: Lighter Yet Stronger Point Transformer
📝 Summary:
LitePT combines early convolutions and deep attention for 3D point clouds, using PointROPE positional encoding. This new model is highly efficient, outperforming state-of-the-art while using fewer resources.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13689
• PDF: https://arxiv.org/pdf/2512.13689
• Project Page: https://litept.github.io/
• Github: https://github.com/prs-eth/LitePT
🔹 Models citing this paper:
• https://huggingface.co/yuanwenyue/LitePT
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer
📝 Summary:
AEGIS, a Vision-Language-Safe Action architecture with a plug-and-play safety constraint layer using control barrier functions, enhances safety and performance in robotic manipulation tasks.
🔹 Publication Date: Published on Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11891
• PDF: https://arxiv.org/pdf/2512.11891
• Github: https://vlsa-aegis.github.io
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
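For readers new to control barrier functions (the mechanism named in the AEGIS summary above), here is a minimal textbook sketch, not the paper's safety layer. It assumes a 1D system x' = u with the safe set x <= x_max, so the barrier is h(x) = x_max - x and the CBF condition h' >= -alpha * h reduces to u <= alpha * (x_max - x). The function name `cbf_filter` and all numeric values are hypothetical.

```python
def cbf_filter(u_nominal, x, x_max, alpha=1.0):
    """Clip a nominal command so the barrier h(x) = x_max - x stays non-negative.

    For x' = u, the condition h' >= -alpha * h becomes u <= alpha * (x_max - x).
    """
    u_bound = alpha * (x_max - x)
    return min(u_nominal, u_bound)


# Simulate with forward Euler; the nominal controller always pushes toward the limit.
x, x_max, dt = 0.0, 1.0, 0.05
for _ in range(200):
    u = cbf_filter(u_nominal=2.0, x=x, x_max=x_max, alpha=1.0)
    x += dt * u  # with alpha * dt <= 1 the discrete step cannot overshoot x_max

print(x <= x_max)
```

The filter leaves the nominal command untouched whenever it already satisfies the bound, which is what makes this kind of layer "plug-and-play" in spirit: it only intervenes near the edge of the safe set.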
✨Towards Interactive Intelligence for Digital Humans
📝 Summary:
Interactive Intelligence, realized through the Mio framework, enables advanced digital humans with personality, adaptive interactions, and self-evolution, surpassing current benchmarks.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13674
• PDF: https://arxiv.org/pdf/2512.13674
• Project Page: https://shandaai.github.io/project_mio_page/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#DigitalHumans #InteractiveAI #ArtificialIntelligence #AIResearch #VirtualAgents
✨DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders
📝 Summary:
DiffusionBrowser is a lightweight decoder for interactive video previews during diffusion model denoising. It enables fast multi-modal previews, enhancing user control and revealing how video details are composed internally.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13690
• PDF: https://arxiv.org/pdf/2512.13690
• Github: https://susunghong.github.io/DiffusionBrowser
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨RecTok: Reconstruction Distillation along Rectified Flow
📝 Summary:
RecTok improves diffusion models by enriching forward flow semantics and enhancing reconstruction, achieving state-of-the-art results with high-dimensional visual tokenizers.
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13421
• PDF: https://arxiv.org/pdf/2512.13421
• Project Page: https://shi-qingyu.github.io/rectok.github.io/
• Github: https://github.com/Shi-qingyu/RecTok
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Self-Supervised Prompt Optimization
📝 Summary:
A self-supervised framework optimizes prompts for both closed and open-ended tasks by evaluating LLM outputs without external references, reducing costs and required data.
🔹 Publication Date: Published on Feb 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.06855
• PDF: https://arxiv.org/pdf/2502.06855
• Github: https://github.com/geekan/metagpt
✨ Spaces citing this paper:
• https://huggingface.co/spaces/XiangJinYu/SPO
• https://huggingface.co/spaces/tang-x/SPO
• https://huggingface.co/spaces/ositamiles/SPO
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨DeepSeek-V3 Technical Report
📝 Summary:
DeepSeek-V3 is a parameter-efficient Mixture-of-Experts language model using MLA and DeepSeekMoE architectures, achieving high performance with efficient training and minimal computational cost.
🔹 Publication Date: Published on Dec 27, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2412.19437
• PDF: https://arxiv.org/pdf/2412.19437
• Github: https://github.com/deepseek-ai/deepseek-v3
🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-V3
• https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
• https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
✨ Spaces citing this paper:
• https://huggingface.co/spaces/nanotron/ultrascale-playbook
• https://huggingface.co/spaces/weege007/ultrascale-playbook
• https://huggingface.co/spaces/Ki-Seki/ultrascale-playbook-zh-cn
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
arXiv.org
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training,...
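To put the two parameter counts quoted in the abstract above in proportion, the quick arithmetic below computes the share of weights active per token. It uses only the 671B and 37B figures from the text; the variable names are illustrative.

```python
# Figures quoted in the DeepSeek-V3 abstract, in billions of parameters.
total_params_b = 671
active_params_b = 37

# Share of the model's parameters that are activated for each token.
active_fraction = active_params_b / total_params_b
print(f"{active_fraction:.1%}")  # prints 5.5%
```

Roughly one parameter in eighteen is used per token, which is the sense in which a sparse Mixture-of-Experts model can be large in capacity yet comparatively cheap per forward pass.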
✨DeepSeek-OCR: Contexts Optical Compression
📝 Summary:
DeepSeek-OCR uses optical 2D mapping to compress long contexts, achieving high OCR precision with reduced vision tokens and demonstrating practical value in document processing.
🔹 Publication Date: Published on Oct 21
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/deepseek-ocr-contexts-optical-compression
• PDF: https://arxiv.org/pdf/2510.18234
• Github: https://github.com/deepseek-ai/DeepSeek-OCR
🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-OCR
• https://huggingface.co/unsloth/DeepSeek-OCR
• https://huggingface.co/Jalea96/DeepSeek-OCR-bnb-4bit-NF4
✨ Spaces citing this paper:
• https://huggingface.co/spaces/merterbak/DeepSeek-OCR-Demo
• https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO
• https://huggingface.co/spaces/prithivMLmods/Super-OCRs-Demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
Arxivexplained
DeepSeek-OCR: Contexts Optical Compression - Explained Simply
By Haoran Wei, Yaofeng Sun, Yukun Li. # DeepSeek-OCR: A Game-Changer for Processing Text-Heavy Documents
**The Problem:** Current AI syst...
✨Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
📝 Summary:
mmGRPO, a multi-module extension of GRPO, enhances accuracy in modular AI systems by optimizing LM calls and prompts across various tasks.
🔹 Publication Date: Published on Aug 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04660
• PDF: https://arxiv.org/pdf/2508.04660
• Project Page: https://dspy.ai
• Github: https://github.com/stanfordnlp/dspy
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
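As background for the mmGRPO post above, the sketch below shows the group-relative advantage at the core of standard GRPO: each sampled completion's reward is normalized against the other completions drawn for the same prompt. This illustrates plain GRPO only, not the multi-module extension; the function name, the use of the population standard deviation, and the `eps` term are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored 1.0 (correct) or 0.0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because the baseline comes from the group itself, no separate value network is needed: completions that beat their siblings get positive advantages and the rest get negative ones.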
✨Directional Textual Inversion for Personalized Text-to-Image Generation
📝 Summary:
Directional Textual Inversion (DTI) enhances text-to-image personalization by fixing learned token magnitudes and optimizing only their direction. This prevents norm inflation issues of standard Textual Inversion, improving prompt conditioning and enabling smooth interpolation. DTI offers better te...
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13672
• PDF: https://arxiv.org/pdf/2512.13672
• Project Page: https://kunheek.github.io/dti
• Github: https://github.com/kunheek/dti
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#TextualInversion #TextToImage #GenerativeAI #DeepLearning #AI
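The idea in the DTI summary above (hold the token embedding's magnitude fixed, learn only its direction) can be sketched in a few lines. This is a simplified stand-in that takes a plain gradient step and re-projects onto the unit sphere; it is not the authors' optimizer, and the helper names, the gradient vector, and the magnitude value are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def directional_step(direction, grad, lr):
    """Take a gradient step on the direction, then project back onto the unit sphere."""
    stepped = [d - lr * g for d, g in zip(direction, grad)]
    return normalize(stepped)


def embedding(direction, magnitude):
    """Recombine the learned direction with the fixed magnitude."""
    return [magnitude * d for d in direction]


fixed_magnitude = 0.4                    # hypothetical value, never updated
direction = normalize([1.0, 2.0, 2.0])   # the only learnable part
grad = [0.5, -1.0, 0.25]                 # made-up gradient for illustration

direction = directional_step(direction, grad, lr=0.1)
token = embedding(direction, fixed_magnitude)
token_norm = math.sqrt(sum(x * x for x in token))
print(token_norm)
```

However many updates are applied, the token's norm stays at the fixed magnitude, which is how this parameterization rules out the norm inflation the summary attributes to standard Textual Inversion.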
✨One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
📝 Summary:
One-to-All Animation is a unified framework for high-fidelity character animation and image pose transfer. It tackles misaligned and partially visible references using self-supervised outpainting, a robust reference extractor, and identity-robust pose control to outperform existing methods.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22940
• PDF: https://arxiv.org/pdf/2511.22940
• Project Page: https://ssj9596.github.io/one-to-all-animation-project/
• Github: https://github.com/ssj9596/One-to-All-Animation
🔹 Models citing this paper:
• https://huggingface.co/MochunniaN1/One-to-All-14b
• https://huggingface.co/MochunniaN1/One-to-All-1.3b_2
• https://huggingface.co/MochunniaN1/One-to-All-1.3b_1
✨ Datasets citing this paper:
• https://huggingface.co/datasets/MochunniaN1/One-to-All-sub
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#CharacterAnimation #PoseTransfer #ComputerVision #AI #DeepLearning
arXiv.org
One-to-All Animation: Alignment-Free Character Animation and Image...
Recent advances in diffusion models have greatly improved pose-driven character animation. However, existing methods are limited to spatially aligned reference-pose pairs with matched skeletal...
✨What matters for Representation Alignment: Global Information or Spatial Structure?
📝 Summary:
Representation alignment enhances generative training by transferring spatial structure from pretrained vision encoders to diffusion models; this spatial structure matters more than the encoder's global semantic performance.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10794
• PDF: https://arxiv.org/pdf/2512.10794
• Project Page: https://end2end-diffusion.github.io/irepa
• Github: https://github.com/end2end-diffusion/irepa
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
📝 Summary:
DrivePI is a new spatial-aware 4D MLLM for autonomous driving, unifying understanding, 3D perception, prediction, and planning. It integrates point clouds, images, and language instructions, achieving state-of-the-art performance by outperforming existing VLA and specialized VA models.
🔹 Publication Date: Published on Dec 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12799
• PDF: https://arxiv.org/pdf/2512.12799
• Github: https://github.com/happinesslz/DrivePI
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AutonomousDriving #MLLM #ComputerVision #DeepLearning #AI