🔹 Title: villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
🔹 Publication Date: Published on Jul 31
🔹 Abstract: The ViLLA framework enhances VLA models by incorporating latent actions, improving performance in both simulated and real-world robot manipulation tasks. (AI-generated summary)
Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments, including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.
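The core idea behind latent actions, compressing the visual change between two frames into a compact discrete code that can supervise VLA pre-training, can be illustrated with a toy sketch. This is not villa-X's implementation; the architecture, codebook size, and input resolution below are illustrative assumptions only.
```python
# Hedged sketch (not the paper's code): a toy latent-action encoder that maps a
# pair of frames (o_t, o_{t+k}) to a discrete code summarizing the visual change.
import torch
import torch.nn as nn

class ToyLatentActionEncoder(nn.Module):
    def __init__(self, codebook_size: int = 32, dim: int = 64):
        super().__init__()
        # Encode the stacked frame pair (6 channels) into a feature vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        # Learnable codebook of discrete latent actions (VQ-style lookup).
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))

    def forward(self, frame_t: torch.Tensor, frame_tk: torch.Tensor):
        z = self.backbone(torch.cat([frame_t, frame_tk], dim=1))
        dists = torch.cdist(z, self.codebook)   # (B, codebook_size)
        idx = dists.argmin(dim=-1)              # discrete latent-action id
        z_q = self.codebook[idx]                # quantized embedding
        return idx, z_q

if __name__ == "__main__":
    enc = ToyLatentActionEncoder()
    o_t, o_tk = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    action_ids, action_embs = enc(o_t, o_tk)
    print(action_ids.shape, action_embs.shape)  # torch.Size([2]) torch.Size([2, 64])
```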
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23682
• PDF: https://arxiv.org/pdf/2507.23682
• Project Page: https://microsoft.github.io/villa-x/
• Github: https://github.com/microsoft/villa-x/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
Today we will try to launch bots again
Although many selfish people have reacted negatively, it's okay; I will continue.
🔹 Title: A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
🔹 Publication Date: Published on Aug 2
🔹 Abstract: A benchmark and model for 3D occupancy grounding using natural language and voxel-level annotations improve object perception in autonomous driving. (AI-generated summary)
Visual grounding aims to identify objects or regions in a scene based on natural language descriptions, which is essential for spatially aware perception in autonomous driving. However, existing visual grounding tasks typically depend on bounding boxes that often fail to capture fine-grained details. Not all voxels within a bounding box are occupied, resulting in inaccurate object representations. To address this, we introduce a benchmark for 3D occupancy grounding in challenging outdoor scenes. Built on the nuScenes dataset, it integrates natural language with voxel-level occupancy annotations, offering more precise object perception than the traditional grounding task. Moreover, we propose GroundingOcc, an end-to-end model designed for 3D occupancy grounding through multi-modal learning. It combines visual, textual, and point cloud features to predict object location and occupancy information from coarse to fine. Specifically, GroundingOcc comprises a multimodal encoder for feature extraction, an occupancy head for voxel-wise predictions, and a grounding head to refine localization. Additionally, a 2D grounding module and a depth estimation module enhance geometric understanding, thereby boosting model performance. Extensive experiments on the benchmark demonstrate that our method outperforms existing baselines on 3D occupancy grounding. The dataset is available at https://github.com/RONINGOD/GroundingOcc.
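As a rough illustration of the coarse-to-fine idea (fuse image, text, and point-cloud features, predict a coarse location, then emit voxel-level occupancy), here is a minimal sketch. It is not GroundingOcc's actual architecture; all module choices, feature dimensions, and grid sizes are assumptions for illustration.
```python
# Hedged sketch: coarse box prediction plus fine per-voxel occupancy from fused
# multimodal features. Shapes and modules are illustrative assumptions only.
import torch
import torch.nn as nn

class ToyOccupancyGrounder(nn.Module):
    def __init__(self, dim: int = 128, grid: int = 16):
        super().__init__()
        self.grid = grid
        self.img_proj = nn.Linear(512, dim)   # assume precomputed image features
        self.txt_proj = nn.Linear(300, dim)   # assume precomputed text features
        self.pts_proj = nn.Linear(3, dim)     # raw xyz points
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.box_head = nn.Linear(dim, 6)          # coarse: (cx, cy, cz, dx, dy, dz)
        self.occ_head = nn.Linear(dim, grid ** 3)  # fine: voxel occupancy logits

    def forward(self, img_feat, txt_feat, points):
        tokens = torch.cat(
            [self.img_proj(img_feat), self.txt_proj(txt_feat), self.pts_proj(points)],
            dim=1,
        )
        fused = self.fuse(tokens).mean(dim=1)      # pooled multimodal feature
        box = self.box_head(fused)                 # coarse localization
        occ = self.occ_head(fused).view(-1, self.grid, self.grid, self.grid)
        return box, occ.sigmoid()                  # per-voxel occupancy probabilities

if __name__ == "__main__":
    model = ToyOccupancyGrounder()
    box, occ = model(torch.rand(1, 4, 512), torch.rand(1, 8, 300), torch.rand(1, 256, 3))
    print(box.shape, occ.shape)  # torch.Size([1, 6]) torch.Size([1, 16, 16, 16])
```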
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01197
• PDF: https://arxiv.org/pdf/2508.01197
• Github: https://github.com/RONINGOD/GroundingOcc
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Intern-S1: A Scientific Multimodal Foundation Model
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15763
• PDF: https://arxiv.org/pdf/2508.15763
• Github: https://github.com/InternLM/Intern-S1
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Mobile-Agent-v3: Foundamental Agents for GUI Automation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15760
• PDF: https://arxiv.org/pdf/2508.15760
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Deep Think with Confidence
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15260
• PDF: https://arxiv.org/pdf/2508.15260
• Project Page: https://jiaweizzhao.github.io/deepconf/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Waver: Wave Your Way to Lifelike Video Generation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15761
• PDF: https://arxiv.org/pdf/2508.15761
• Github: https://github.com/FoundationVision/Waver
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15769
• PDF: https://arxiv.org/pdf/2508.15769
• Project Page: https://mengmouxu.github.io/SceneGen/
• Github: https://github.com/Mengmouxu/SceneGen
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: A Survey on Large Language Model Benchmarks
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15361
• PDF: https://arxiv.org/pdf/2508.15361
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15767
• PDF: https://arxiv.org/pdf/2508.15767
• Project Page: https://jindapark.github.io/projects/atlas/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15126
• PDF: https://arxiv.org/pdf/2508.15126
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Visual Autoregressive Modeling for Instruction-Guided Image Editing
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15772
• PDF: https://arxiv.org/pdf/2508.15772
• Project Page: https://huggingface.co/HiDream-ai/VAREdit
• Github: https://github.com/HiDream-ai/VAREdit
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-1024
• https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-512
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15752
• PDF: https://arxiv.org/pdf/2508.15752
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14892
• PDF: https://arxiv.org/pdf/2508.14892
• Project Page: https://hustvl.github.io/Snap-Snap/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15641
• PDF: https://arxiv.org/pdf/2508.15641
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15418
• PDF: https://arxiv.org/pdf/2508.15418
• Github: https://github.com/EIT-NLP/LLaSO
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/YirongSun/LLaSO-Instruct
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15202
• PDF: https://arxiv.org/pdf/2508.15202
• Project Page: https://github.com/aliyun/qwen-dianjin
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: INTIMA: A Benchmark for Human-AI Companionship Behavior
🔹 Publication Date: Published on Aug 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09998
• PDF: https://arxiv.org/pdf/2508.09998
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Investigating Hallucination in Conversations for Low Resource Languages
🔹 Publication Date: Published on Jul 30
🔹 Abstract: LLMs generate fewer hallucinations in Mandarin than in Hindi and Farsi across multiple models. (AI-generated summary)
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem typically referred to as 'hallucination'. Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.
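As a minimal illustration of the per-language, per-model aggregation such a study reports, here is a sketch; the field names and example rows are made up for illustration and are not the paper's dataset.
```python
# Hedged sketch: tally hallucination rates per (model, language) from annotated rows.
from collections import defaultdict

annotations = [  # hypothetical annotation records
    {"model": "GPT-4o", "language": "Mandarin", "hallucinated": False},
    {"model": "GPT-4o", "language": "Hindi", "hallucinated": True},
    {"model": "GPT-4o", "language": "Farsi", "hallucinated": True},
]

counts = defaultdict(lambda: [0, 0])  # (hallucinated, total) per (model, language)
for row in annotations:
    key = (row["model"], row["language"])
    counts[key][0] += int(row["hallucinated"])
    counts[key][1] += 1

for (model, lang), (bad, total) in sorted(counts.items()):
    print(f"{model} / {lang}: hallucination rate = {bad / total:.2%}")
```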
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.22720
• PDF: https://arxiv.org/pdf/2507.22720
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
🔹 Publication Date: Published on Aug 4
🔹 Abstract: A modular training framework accelerates the development of omni-modal LLMs through efficient 3D parallelism and flexible configuration. (AI-generated summary)
Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, incurring limited scalability and substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework that accelerates the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omni-modal LLMs. VeOmni also features a flexible configuration interface supporting seamless integration of new modalities with minimal code change. Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained with over 2,800 tokens/sec/GPU throughput and scaled to 160K context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.
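A minimal sketch of the "model-centric recipe" idea described above: the model is defined with no parallelism logic, and a separate recipe (plain data) states which strategy applies to which submodule. This is not VeOmni's API; the strategy names and the apply_recipe helper are hypothetical.
```python
# Hedged sketch: keep the model definition free of parallel logic and apply a
# parallelism plan externally. Strategy names below are illustrative only.
import torch.nn as nn

def build_model() -> nn.Module:
    # Plain model definition: no knowledge of sharding or communication.
    return nn.Sequential(
        nn.Embedding(1000, 256),
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        nn.Linear(256, 1000),
    )

# A "recipe" is just data: which parallel strategy to apply to which submodule.
RECIPE = {
    "0": "replicate",        # embedding: data parallel only
    "1": "tensor_parallel",  # transformer block: shard attention/MLP weights
    "2": "fully_shard",      # output projection: FSDP-style sharding
}

def apply_recipe(model: nn.Module, recipe: dict) -> nn.Module:
    # A real framework would wrap each submodule with the chosen strategy;
    # here we only report the plan to show the decoupling.
    for name, module in model.named_children():
        strategy = recipe.get(name, "replicate")
        print(f"module {name} ({type(module).__name__}) -> {strategy}")
    return model

if __name__ == "__main__":
    apply_recipe(build_model(), RECIPE)
```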
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02317
• PDF: https://arxiv.org/pdf/2508.02317
• Github: https://github.com/ByteDance-Seed/VeOmni
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT