✨EvoVLA: Self-Evolving Vision-Language-Action Model
📝 Summary:
EvoVLA is a self-supervised VLA framework tackling stage hallucination in long-horizon robotic manipulation. It uses triplet contrastive learning, pose-based exploration, and memory to prevent shortcuts. EvoVLA significantly improves success, sample efficiency, and reduces hallucination in sim an...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/
• Github: https://aigeeksgroup.github.io/EvoVLA/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
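The summary names triplet contrastive learning but does not spell out the objective. As a hedged sketch (the margin, embeddings, and distance choice here are illustrative, not EvoVLA's), a generic triplet loss over stage embeddings looks like:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet loss: pull the anchor embedding toward the
    positive example and push it at least `margin` farther from the
    negative. Values are illustrative, not EvoVLA's."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to positive
    d_neg = np.linalg.norm(anchor - negative)  # distance to negative
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])
p = np.array([0.1, 0.9])   # close to the anchor
n = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: positive already closer by more than margin
```

In a stage-grounding setting, the anchor would be the current observation embedding and the positive/negative would be embeddings of the correct and incorrect task stages.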
💸 PacketSDK: A New Way to Make Revenue From Your Apps
Whether your app runs on desktop, mobile, TV, or Unity, and no matter which monetization tools you already use, PacketSDK can bring you additional revenue!
● Working Principle: Convert your app's active users into profits 👥→💵
● Product Features: Ad-free monetization 🚫, no user interference
● Additional Revenue: Fully compatible with your existing ad SDKs
● CCPA & GDPR: Based on user consent, no collection of any personal data 🔒
● Easy Integration: Only a few simple steps, taking approximately 30 minutes
Join us: https://www.packetsdk.com/?utm-source=SyWayQNK
Contact us & estimated income:
Telegram: @Packet_SDK
WhatsApp: https://wa.me/85256440384
Teams: https://teams.live.com/l/invite/FBA_1zP2ehmA6Jn4AI
⏰ Join early, earn early!
✨One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
📝 Summary:
One4D is a unified framework for 4D generation and reconstruction, producing synchronized RGB frames and pointmaps. It uses Unified Masked Conditioning for varying input sparsities and Decoupled LoRA Control to achieve high-quality results across diverse tasks.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18922
• PDF: https://arxiv.org/pdf/2511.18922
• Project Page: https://mizhenxing.github.io/One4D
• Github: https://mizhenxing.github.io/One4D
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#4DGeneration #4DReconstruction #ComputerVision #DeepLearning #GenerativeAI
✨Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
📝 Summary:
ReVeL converts multiple-choice questions to verifiable open-form questions to address unreliable MCQA metrics and answer guessing. This framework improves data efficiency and robustness for multimodal language models, revealing significant score inflation in MCQA benchmarks.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17405
• PDF: https://arxiv.org/pdf/2511.17405
• Github: https://flageval-baai.github.io/ReVeL/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#OpenQA #VisionLanguage #LanguageModels #AIEvaluation #MachineLearning
✨Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus
📝 Summary:
Benchmarking LLMs on subjective tasks like emotional intelligence is challenging. The Language Model Council (LMC) uses a democratic process with 20 LLMs to formulate, administer, and evaluate tests. This yields more robust, less biased rankings that align better with human leaderboards.
🔹 Publication Date: Published on Jun 12, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.08598
• PDF: https://arxiv.org/pdf/2406.08598
• Github: https://github.com/llm-council/llm-council
✨ Datasets citing this paper:
• https://huggingface.co/datasets/llm-council/emotional_application
✨ Spaces citing this paper:
• https://huggingface.co/spaces/llm-council/llm-council
• https://huggingface.co/spaces/llm-council/sandbox
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #Benchmarking #AIEvaluation #FoundationModels #ConsensusAI
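The council's "democratic process" can be illustrated with a simple mean-rank (Borda-style) aggregation over per-judge rankings; whether LMC uses exactly this rule is an assumption:

```python
from statistics import mean

def council_rank(judge_rankings):
    """Aggregate per-judge rankings into one consensus ordering by
    mean rank position (lower is better). A Borda-style sketch; the
    paper's exact aggregation may differ."""
    models = judge_rankings[0]
    avg = {m: mean(r.index(m) for r in judge_rankings) for m in models}
    return sorted(models, key=lambda m: avg[m])

judges = [
    ["A", "B", "C"],   # each judge submits a full ranking
    ["B", "A", "C"],
    ["A", "C", "B"],
]
print(council_rank(judges))  # ['A', 'B', 'C']
```

With 20 judges as in the paper, outlier preferences from any single judge get averaged out, which is the intuition behind the more robust rankings.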
✨SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
📝 Summary:
SteadyDancer is an Image-to-Video framework that solves identity drift and motion control challenges in human image animation. It achieves robust first-frame preservation via condition reconciliation, adaptive pose, and hierarchical training, outperforming others while using fewer resources.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19320
• PDF: https://arxiv.org/pdf/2511.19320
• Project Page: https://mcg-nju.github.io/steadydancer-web
• Github: https://github.com/MCG-NJU/SteadyDancer
🔹 Models citing this paper:
• https://huggingface.co/MCG-NJU/SteadyDancer-14B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/MCG-NJU/X-Dance
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HumanImageAnimation #ImageToVideo #FirstFramePreservation #GenerativeAI #ComputerVision
✨GigaWorld-0: World Models as Data Engine to Empower Embodied AI
📝 Summary:
GigaWorld-0 is a unified world model framework that generates high-quality, diverse, and physically plausible VLA data by integrating video and 3D modeling. This synthetic data enables embodied AI models to achieve strong real-world performance on physical robots without any real-world training.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19861
• PDF: https://arxiv.org/pdf/2511.19861
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#EmbodiedAI #WorldModels #SyntheticData #AI #Robotics
✨Unified all-atom molecule generation with neural fields
📝 Summary:
FuncBind uses neural fields and computer vision models to generate diverse all-atom molecules across various systems, from small molecules to antibodies. This modality-agnostic framework achieves competitive performance in structure-conditioned molecular design and can generate novel binders.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15906
• PDF: https://arxiv.org/pdf/2511.15906
• Github: https://github.com/prescient-design/funcbind/
🔹 Models citing this paper:
• https://huggingface.co/mkirchmeyer/funcbind
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MoleculeGeneration #NeuralFields #DrugDiscovery #AIforScience #ComputationalChemistry
✨Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
📝 Summary:
UniSandbox evaluates Unified Multimodal Models, revealing a gap between understanding and generation in reasoning and knowledge transfer. Chain-of-Thought and self-training effectively bridge this gap, providing insights for future model design.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20561
• PDF: https://arxiv.org/pdf/2511.20561
• Github: https://github.com/PKU-YuanGroup/UniSandBox
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #AIUnderstanding #ChainOfThought #LLMs #AIResearch
✨MedSAM3: Delving into Segment Anything with Medical Concepts
📝 Summary:
MedSAM-3 is a text-promptable medical segmentation model fine-tuned on SAM 3 using semantic conceptual labels. It enables precise, open-vocabulary text-based segmentation of anatomical structures and integrates MLLMs for advanced reasoning. This approach significantly outperforms existing models ...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19046
• PDF: https://arxiv.org/pdf/2511.19046
• Github: https://github.com/Joey-S-Liu/MedSAM3
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MedicalAI #ImageSegmentation #DeepLearning #MLLMs #FoundationModels
✨HunyuanOCR Technical Report
📝 Summary:
HunyuanOCR is a lightweight Vision-Language Model for OCR with a unified end-to-end architecture (ViT + LLM). It achieves state-of-the-art performance across diverse tasks, outperforming larger models and commercial APIs, powered by data-driven and RL strategies.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19575
• PDF: https://arxiv.org/pdf/2511.19575
• Github: https://github.com/Tencent-Hunyuan/HunyuanOCR
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#OCR #VisionLanguageModel #LLM #AI #MachineLearning
✨iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
📝 Summary:
iMontage repurposes pre-trained video models to generate high-quality, diverse image sets. It uses a unified framework and minimal adaptation, combining temporal coherence with image diversity for natural transitions and expanded dynamics across many tasks.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20635
• PDF: https://arxiv.org/pdf/2511.20635
• Project Page: https://kr1sjfu.github.io/iMontage-web/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #AIMethods #VideoModels
✨PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding
📝 Summary:
PhysChoreo generates physically realistic and controllable videos from a single image. It reconstructs part-aware physical properties and simulates dynamic behavior, outperforming existing methods.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20562
• PDF: https://arxiv.org/pdf/2511.20562
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #PhysicalSimulation #ComputerVision #DeepLearning #AIResearch
✨Fara-7B: An Efficient Agentic Model for Computer Use
📝 Summary:
FaraGen creates synthetic datasets for computer use agents, solving a data scarcity problem. This data trains Fara-7B, a small on-device model that perceives computers via screenshots and outperforms larger models on diverse web tasks.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19663
• PDF: https://arxiv.org/pdf/2511.19663
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #OnDeviceAI #SyntheticData #MachineLearning #ComputerVision
✨Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
📝 Summary:
Agent0-VL is a self-evolving vision-language agent that integrates tool usage into both reasoning and self-evaluation. It uses a Solver and Verifier in a self-evolving cycle for continuous improvement without human annotation or external rewards, achieving a 12.5% performance gain.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19900
• PDF: https://arxiv.org/pdf/2511.19900
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #VisionLanguage #SelfEvolvingAI #ToolAugmentedAI #AIResearch
✨Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
📝 Summary:
VISTA-Gym is a scalable training environment that enhances vision-language models' (VLMs) tool-integrated visual reasoning using reinforcement learning. It unifies diverse multimodal tasks and provides standardized visual tools. VISTA-R1, trained with VISTA-Gym, significantly outperforms leading basel...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19773
• PDF: https://arxiv.org/pdf/2511.19773
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VLMs #ReinforcementLearning #ToolIntegratedAI #MultimodalAI #AIResearch
✨UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
📝 Summary:
Video diffusion transformers struggle with video length extrapolation due to attention dispersion, causing quality degradation and repetition. UltraViCo suppresses attention for tokens beyond the training window, improving quality and reducing repetition. This extends the extrapolation limit from...
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20123
• PDF: https://arxiv.org/pdf/2511.20123
• Project Page: https://thu-ml.github.io/UltraViCo.github.io/
• Github: https://github.com/thu-ml/DiT-Extrapolation
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoAI #DiffusionModels #Transformers #GenerativeAI #DeepLearning
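One concrete way to suppress attention for tokens beyond the training window is to subtract a constant from their attention logits, which scales their unnormalized softmax weight by a factor alpha. This sketch illustrates the mechanism; the exact rule and factor in UltraViCo are assumptions here:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def suppress_beyond_window(logits, train_len, alpha=0.5):
    """Attenuate attention to key positions past the training window by
    subtracting log(1/alpha) from their logits, which scales their
    unnormalized softmax weight by alpha. Illustrative, not the
    paper's exact formulation."""
    out = logits.copy()
    out[..., train_len:] -= np.log(1.0 / alpha)
    return out

logits = np.zeros(8)   # uniform attention over 8 key positions
w = softmax(suppress_beyond_window(logits, train_len=6))
# the two extrapolated keys (6, 7) now get half the weight of in-window keys
```

Keeping attention mass concentrated on in-window positions is what counters the attention dispersion the summary describes.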
✨ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding
📝 Summary:
ReDirector presents a camera-controlled video retake generation method using Rotary Camera Encoding (RoCE). This novel camera-conditioned RoPE phase shift improves dynamic object localization and static background preservation across variable-length videos and diverse camera trajectories.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19827
• PDF: https://arxiv.org/pdf/2511.19827
• Project Page: https://byeongjun-park.github.io/ReDirector/
• Github: https://byeongjun-park.github.io/ReDirector/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #ComputerVision #AIResearch #CameraControl #VideoEditing
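The phase-shift idea builds on standard rotary position embeddings (RoPE), which rotate channel pairs by a position-dependent angle. Adding an offset to that angle is the hook RoCE exploits; how the offset is derived from camera parameters is not specified in the summary, so this sketch treats it as a free parameter:

```python
import numpy as np

def rope_with_phase(x, pos, phase=0.0, base=10000.0):
    """Standard RoPE applied to channel pairs, extended with an
    additive phase offset. Conditioning that offset on camera
    parameters is the RoCE idea; the camera-to-phase mapping is
    an assumption and is left as a scalar here."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)  # per-pair rotation rate
    ang = pos * freqs + phase                  # rotation angle per pair
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.array([1.0, 0.0, 0.0, 1.0])
out = rope_with_phase(q, pos=3, phase=0.7)
# pure rotation: the vector norm is preserved
```

Because the transform is a rotation, it changes relative phases between query and key without altering feature magnitudes.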
✨VQ-VA World: Towards High-Quality Visual Question-Visual Answering
📝 Summary:
VQ-VA World introduces a data-centric framework and benchmark for Visual Question-Visual Answering, generating images from visual questions. This significantly improves open-source models, narrowing the performance gap with proprietary systems.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20573
• PDF: https://arxiv.org/pdf/2511.20573
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VQA #GenerativeAI #DataCentricAI #ComputerVision #MachineLearning
✨Soft Adaptive Policy Optimization
📝 Summary:
SAPO improves RL training stability for LLMs. It uses a smooth adaptive gate to attenuate off-policy updates, unlike hard clipping. This selectively down-weights problematic tokens, leading to improved training stability and higher performance.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20347
• PDF: https://arxiv.org/pdf/2511.20347
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ReinforcementLearning #LLMs #PolicyOptimization #DeepLearning #AI
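The contrast the summary draws — a smooth gate instead of hard clipping — can be sketched by comparing PPO-style clipping of the importance ratio with a sigmoid gate that fades out far-off-policy updates. The gate shape and hyperparameters below are illustrative guesses, not SAPO's actual formulation:

```python
import math

def hard_clip_weight(ratio, eps=0.2):
    """PPO-style hard clipping of the importance ratio."""
    return min(max(ratio, 1.0 - eps), 1.0 + eps)

def soft_gate_weight(ratio, tau=0.2, k=20.0):
    """Smoothly attenuate updates whose importance ratio drifts far
    from 1, instead of clipping them at a hard boundary. Gate shape
    and hyperparameters are illustrative, not SAPO's."""
    gate = 1.0 / (1.0 + math.exp(k * (abs(ratio - 1.0) - tau)))
    return ratio * gate

print(soft_gate_weight(1.0))  # near-on-policy token: almost full weight
print(soft_gate_weight(2.0))  # far-off-policy token: almost fully gated off
```

Unlike the clip, the gate is differentiable everywhere and selectively down-weights individual problematic tokens rather than flattening gradients at the clip boundary.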