🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI
🗓️ 04 Nov 2025
📚 AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
✨ Title: AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
📝 Summary:
AthenaBench strengthens the evaluation of LLMs on cyber threat intelligence (CTI), revealing that current models, even top proprietary ones, show limited reasoning on tasks such as threat attribution and risk mitigation. This highlights fundamental LLM weaknesses and the need for CTI-specific AI.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01144
• PDF: https://arxiv.org/pdf/2511.01144
• GitHub: https://github.com/Athena-Software-Group/athenabench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation
📝 Summary:
D2D transforms non-differentiable object detectors into differentiable critics for text-to-image generation, leveraging their superior counting ability to substantially improve object numeracy while leaving image quality largely intact.
🔹 Publication Date: Published on Oct 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19278
• PDF: https://arxiv.org/pdf/2510.19278
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
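To make the core idea concrete, here is a minimal sketch of a differentiable counting critic. The real D2D wraps an off-the-shelf detector; the small conv head, the class name, and the loss below are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SoftCountCritic(nn.Module):
    """Differentiable surrogate for a detector's object count (hypothetical sketch)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # Stand-in for a detector's per-location objectness logits.
        self.objectness = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        logits = self.objectness(image)         # (B, 1, H, W) objectness logits
        soft_hits = torch.sigmoid(logits)       # confidences in [0, 1], differentiable
        return soft_hits.flatten(1).sum(dim=1)  # soft object count per image

critic = SoftCountCritic()
image = torch.rand(2, 3, 64, 64, requires_grad=True)  # stand-in for generator output
target_count = torch.tensor([3.0, 5.0])               # counts requested by the prompt
loss = nn.functional.l1_loss(critic(image), target_count)
loss.backward()   # gradients flow back toward the image generator
```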
✨ Title: LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
📝 Summary:
LiveSecBench is a dynamic safety benchmark for Chinese LLMs, continuously updated to reflect new threats. It evaluates models across six critical dimensions tailored to Chinese legal and social frameworks. This benchmark offers a current landscape of AI safety in China, with a public leaderboard.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02366
• PDF: https://arxiv.org/pdf/2511.02366
• Project Page: https://livesecbench.intokentech.cn/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
📝 Summary:
TWIST2 is a portable, mocap-free system for efficient humanoid data collection using VR and egocentric vision. It enables whole-body human-to-humanoid control and a hierarchical visuomotor policy for the autonomous execution of complex skills.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02832
• PDF: https://arxiv.org/pdf/2511.02832
• Project Page: https://yanjieze.com/TWIST2/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: iFlyBot-VLA Technical Report
📝 Summary:
iFlyBot-VLA is a large vision-language-action (VLA) model that uses a latent action model and a dual-level action representation. This enhances 3D perception and reasoning, achieving superior performance across diverse manipulation tasks.
🔹 Publication Date: Published on Nov 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01914
• PDF: https://arxiv.org/pdf/2511.01914
• Project Page: https://xuwenjie401.github.io/iFlyBot-VLA.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
📝 Summary:
This paper introduces VidEmo, a new video emotion foundation model that uses an affective cues-guided reasoning framework. It is trained on the Emo-CFG dataset and achieves competitive performance in emotion understanding and face perception tasks.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02712
• PDF: https://arxiv.org/pdf/2511.02712
• Project Page: https://zzcheng.top/VidEmo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
📝 Summary:
ChartM^3 is a new automated, code-driven pipeline that generates diverse datasets for complex chart understanding via retrieval-augmented generation (RAG) and chain-of-thought (CoT) prompting. This improves the reasoning and generalization of multimodal LLMs, enabling smaller models to match larger ones in complex chart comprehension.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02415
• PDF: https://arxiv.org/pdf/2511.02415
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
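The "code-driven" part is the key trick: because each chart is rendered from code, exact ground truth comes for free. Below is a minimal single-sample sketch of that idea using matplotlib; the real pipeline layers RAG and CoT prompting on top, and the field names here are illustrative.

```python
import random
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def make_chart_sample(path: str = "chart.png"):
    """Render one chart from code and emit a multi-step QA pair with exact ground truth."""
    categories = ["Q1", "Q2", "Q3", "Q4"]
    values = [random.randint(10, 100) for _ in categories]
    plt.figure()
    plt.bar(categories, values)
    plt.title("Quarterly sales")
    plt.savefig(path)
    plt.close()
    # Multi-step question: read two bars, then subtract.
    hi, lo = max(values), min(values)
    question = "What is the difference between the highest and lowest quarterly sales?"
    reasoning = f"Highest bar is {hi}, lowest is {lo}; {hi} - {lo} = {hi - lo}."
    return {"image": path, "question": question, "reasoning": reasoning, "answer": hi - lo}

print(make_chart_sample())
```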
✨ Title: Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning
📝 Summary:
DiMoDE introduces a discriminative treatment of motion components for robust joint depth and ego-motion learning. By leveraging geometric constraints and reforming the learning process, it improves accuracy and achieves state-of-the-art performance.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01502
• PDF: https://arxiv.org/pdf/2511.01502
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
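For context, joint depth and ego-motion methods in this line are typically trained with a self-supervised view-synthesis objective: predicted depth and pose warp one frame into another, and the photometric error supervises both networks. The sketch below shows only that standard baseline loss; DiMoDE's discriminative handling of motion components sits on top of it and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def reprojection_loss(target, source, depth, pose, K):
    """Standard photometric view-synthesis loss for joint depth/ego-motion learning.

    target, source: (B, 3, H, W) frames; depth: (B, 1, H, W) for the target view;
    pose: (B, 4, 4) relative camera motion; K: (B, 3, 3) intrinsics.
    """
    B, _, H, W = target.shape
    # Homogeneous pixel grid of the target view.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)              # (B, 3, H*W)
    # Back-project with predicted depth, move by predicted pose, re-project.
    cam = (torch.linalg.inv(K) @ pix) * depth.view(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)  # homogeneous 3D points
    cam = (pose @ cam)[:, :3]
    pix2 = K @ cam
    pix2 = pix2[:, :2] / pix2[:, 2:].clamp(min=1e-6)
    # Sample the source frame at the re-projected locations.
    gx = 2 * pix2[:, 0] / (W - 1) - 1
    gy = 2 * pix2[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()
```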
✨ Title: VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
📝 Summary:
VCode introduces a benchmark for generating SVG code from images, preserving symbolic meaning for visual reasoning. Frontier VLMs struggle with this visual-centric task. VCoder, an agentic framework, improves performance using iterative revision and visual tools.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02778
• PDF: https://arxiv.org/pdf/2511.02778
• Project Page: https://csu-jpg.github.io/VCode/
• GitHub: https://github.com/CSU-JPG/VCode
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VCode #MultimodalAI #SVG #VisualReasoning #VLMs
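As a rough illustration of the agentic loop, here is a hedged sketch: draft SVG, render it, measure visual disagreement with the input, and revise. `propose_svg` and `revise_svg` stand in for VLM calls; none of this is VCoder's actual code or API.

```python
import io
import cairosvg                      # pip install cairosvg
from PIL import Image, ImageChops

def render(svg_code: str) -> Image.Image:
    """Rasterize SVG so it can be compared against the input image."""
    png_bytes = cairosvg.svg2png(bytestring=svg_code.encode())
    return Image.open(io.BytesIO(png_bytes)).convert("RGB")

def pixel_error(a: Image.Image, b: Image.Image) -> float:
    """Mean absolute pixel difference in [0, 1]; crude stand-in for a visual tool."""
    diff = ImageChops.difference(a, b.resize(a.size))
    return sum(sum(px) for px in diff.getdata()) / (a.width * a.height * 3 * 255)

def vcoder_loop(image, propose_svg, revise_svg, steps=3, tol=0.05):
    svg = propose_svg(image)                   # initial VLM draft
    for _ in range(steps):
        err = pixel_error(image, render(svg))  # visual feedback
        if err < tol:
            break
        svg = revise_svg(image, svg, err)      # iterative revision
    return svg
```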
✨ Title: When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
📝 Summary:
MIRA is a new benchmark for evaluating models that use intermediate visual images to enhance reasoning. It includes 546 multimodal problems requiring models to generate and utilize visual cues. Experiments show models achieve a 33.7% performance gain with visual cues compared to text-only prompts...
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02779
• PDF: https://arxiv.org/pdf/2511.02779
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisualReasoning #ChainOfThought #MultimodalAI #AIBenchmark #ComputerVision
✨ Title: When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
📝 Summary:
A new framework explains how MLLMs resolve cross-modal conflicts by decomposing modality following into relative reasoning uncertainty and inherent modality preference. Modality following decreases as relative uncertainty grows, while inherent preference is read off at the balance point where the two uncertainties are equal, yielding mechanistic insight.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02243
• PDF: https://arxiv.org/pdf/2511.02243
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MLLMs #MultimodalAI #LLM #DeepLearning #AIResearch
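A toy illustration of that decomposition. The logistic form and the constant k are my assumptions, not the paper's model; it only shows how a following rate can fall with relative uncertainty while the balance point isolates inherent preference.

```python
import numpy as np

def p_follow_vision(u_vision, u_text, bias, k=4.0):
    """Toy (assumed) model of the probability of following the visual input.

    Following drops as the vision branch's relative reasoning uncertainty grows;
    `bias` plays the role of inherent modality preference.
    """
    relative_uncertainty = u_vision - u_text
    return 1.0 / (1.0 + np.exp(k * relative_uncertainty - bias))

# At the balance point the uncertainties cancel, isolating inherent preference:
print(p_follow_vision(0.3, 0.3, bias=1.0))  # ~0.73 > 0.5 -> inherent vision preference
```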
✨ Title: Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
📝 Summary:
LLMs trained for step-by-step reasoning tend to become verbose, in part because RLVR (reinforcement learning with verifiable rewards) pipelines often filter out easy problems. This work shows that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. This approach roughly halves output verbosity while maintaining accuracy, wi...
🔹 Publication Date: Published on Nov 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01937
• PDF: https://arxiv.org/pdf/2511.01937
• GitHub: https://github.com/MBZUAI-Paris/Frugal-AI-Math
🔹 Models citing this paper:
• https://huggingface.co/MBZUAI-Paris/Frugal-Math-4B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/MBZUAI-Paris/frugal-maths-data-split-v1
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #ReinforcementLearning #FrugalAI #MathematicalReasoning
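A minimal sketch of the sampling idea, assuming each problem carries a pass-rate estimate from earlier rollouts. The weighting band and the 1.5x factor are illustrative choices, not the paper's numbers.

```python
import random

def sample_weight(pass_rate: float) -> float:
    """Keep everything; modestly up-weight moderately easy problems.

    Common RLVR filtering would drop near-solved items entirely; here moderately
    easy ones get extra weight so their short, correct traces act as an implicit
    length regularizer.
    """
    if 0.5 <= pass_rate <= 0.9:   # "moderately easy" band (illustrative)
        return 1.5                # modest up-weighting (illustrative)
    return 1.0

problems = [{"id": i, "pass_rate": random.random()} for i in range(1000)]
weights = [sample_weight(p["pass_rate"]) for p in problems]
batch = random.choices(problems, weights=weights, k=32)  # weighted RLVR batch
```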
✨ Title: BRAINS: A Retrieval-Augmented System for Alzheimer's Detection and Monitoring
📝 Summary:
BRAINS is an LLM-based system for Alzheimer's detection and monitoring. It integrates cognitive assessments and a case retrieval module for risk assessment and disease severity classification. Evaluations demonstrate its effectiveness as a scalable, explainable, early-stage detection tool.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02490
• PDF: https://arxiv.org/pdf/2511.02490
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Alzheimers #LLM #AI #MedicalAI #EarlyDetection
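A toy sketch of a case-retrieval step like the one described: embed a new cognitive assessment, retrieve the nearest past cases, and hand both to an LLM prompt. The hash-based embedding, case fields, and prompt shape are all stand-ins for whatever BRAINS actually uses.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding (hash-seeded); a real system would use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

case_bank = [
    {"summary": "MMSE 21/30, recall deficit, diagnosed mild AD", "severity": "mild"},
    {"summary": "MMSE 28/30, normal recall, no impairment", "severity": "none"},
]
vectors = np.stack([embed(c["summary"]) for c in case_bank])

def retrieve(assessment: str, k: int = 1):
    scores = vectors @ embed(assessment)   # cosine similarity (unit vectors)
    return [case_bank[i] for i in np.argsort(-scores)[:k]]

query = "MMSE 22/30 with delayed recall problems"
prompt = f"Assessment: {query}\nSimilar cases: {retrieve(query)}\nEstimate severity."
print(prompt)
```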
✨ Title: Kimi Linear: An Expressive, Efficient Attention Architecture
📝 Summary:
Kimi Linear is a new hybrid linear attention architecture that outperforms full attention in both quality and efficiency across diverse scenarios. It combines Kimi Delta Attention with Multi-Head Latent Attention, cutting the KV cache by up to 75% and boosting decoding throughput by up to 6x.
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26692
• PDF: https://arxiv.org/pdf/2510.26692
• GitHub: https://github.com/MoonshotAI/Kimi-Linear
🔹 Models citing this paper:
• https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
• https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base
• https://huggingface.co/aiqtech/Kimi-Linear-48B-A3B-Instruct
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Speedofmastery/orynxml-agents
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AttentionMechanisms #LLM #AIResearch #DeepLearning #ModelEfficiency
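The 75% figure follows directly from the layer mix, assuming the roughly 3:1 ratio of linear-attention (KDA) layers to full-attention (MLA) layers the paper reports: only the full-attention layers keep a cache that grows with sequence length.

```python
# Back-of-the-envelope for the KV-cache claim under a 3:1 hybrid layer mix.
def kv_cache_ratio(linear_per_full: int = 3) -> float:
    """Fraction of a full-attention model's KV cache the hybrid still needs."""
    total = linear_per_full + 1
    return 1 / total   # linear layers keep O(1) state, not O(seq_len)

print(f"hybrid KV cache: {kv_cache_ratio():.0%} of full attention")  # 25% -> 75% saved
```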
✨ Title: PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
📝 Summary:
PaddleOCR-VL is a new 0.9B-parameter vision-language model for document parsing. Built on a NaViT-style visual encoder and ERNIE-4.5, it achieves state-of-the-art performance across 109 languages with minimal resources and fast inference, making it well suited to practical deployment.
🔹 Publication Date: Published on Oct 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14528
• PDF: https://arxiv.org/pdf/2510.14528
• GitHub: https://github.com/PaddlePaddle/PaddleOCR
🔹 Models citing this paper:
• https://huggingface.co/PaddlePaddle/PaddleOCR-VL
• https://huggingface.co/PaddlePaddle/PP-DocLayoutV2
• https://huggingface.co/lvyufeng/PaddleOCR-VL-0.9B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL_Online_Demo
• https://huggingface.co/spaces/markobinario/PaddleOCR-VL_Online_Demo
• https://huggingface.co/spaces/waytoAGI/PaddleOCR-VL_Online_Demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#OCR #VisionLanguageModel #DocumentAI #DeepLearning #AI
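A hypothetical quick-start, assuming the Hugging Face checkpoint loads through transformers' trust_remote_code path; the exact entry point may differ, so treat this as a sketch and check the model card for the supported API.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "PaddlePaddle/PaddleOCR-VL"   # checkpoint linked above
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("invoice.png")        # any document image
inputs = processor(images=image, text="Parse this document.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```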
✨ Title: Emu3.5: Native Multimodal Models are World Learners
📝 Summary:
Emu3.5 is a large-scale multimodal world model that predicts the next state across vision and language. It uses reinforcement learning for post-training and Discrete Diffusion Adaptation for efficient inference, delivering strong performance in multimodal tasks and world exploration.
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26583
• PDF: https://arxiv.org/pdf/2510.26583
• Project Page: https://emu.world/
• Github: https://github.com/baaivision/Emu3.5
🔹 Models citing this paper:
• https://huggingface.co/BAAI/Emu3.5
• https://huggingface.co/BAAI/Emu3.5-Image
• https://huggingface.co/BAAI/Emu3.5-VisionTokenizer
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #WorldModels #ReinforcementLearning #ComputerVision #NLP