🔹 Title: DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14460
• PDF: https://arxiv.org/pdf/2508.14460
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
🔹 Publication Date: Published on Aug 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14187
• PDF: https://arxiv.org/pdf/2508.14187
• Project Page: https://ashiq24.github.io/local-scale-equivariance/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10137
• PDF: https://arxiv.org/pdf/2508.10137
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
🔹 Publication Date: Published on Aug 5
🔹 Abstract: CoTox, a framework integrating LLMs with chain-of-thought reasoning, enhances multi-toxicity prediction by incorporating chemical structure data, biological pathways, and gene ontology terms, improving interpretability and predictive performance in drug development. AI-generated summary: Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability and their ability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and a transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates LLMs with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and gene ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning models. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with a drug and incorporate the resulting biological context into the CoTox framework. This approach allows CoTox to generate toxicity predictions aligned with physiological responses, as shown in a case study. This result highlights the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompts used in this work are available at https://github.com/dmis-lab/CoTox. (A minimal prompt-assembly sketch follows this post.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03159
• PDF: https://arxiv.org/pdf/2508.03159
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14111
• PDF: https://arxiv.org/pdf/2508.14111
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation
🔹 Publication Date: Published on Aug 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13745
• PDF: https://arxiv.org/pdf/2508.13745
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
🔹 Publication Date: Published on Aug 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13680
• PDF: https://arxiv.org/pdf/2508.13680
• Project Page: https://vi-exam.github.io
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/anvo25/viexam
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11408
• PDF: https://arxiv.org/pdf/2508.11408
• Github: https://github.com/modelscope/Trinity-RFT
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14704
• PDF: https://arxiv.org/pdf/2508.14704
• Project Page: https://mcp-universe.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14568
• PDF: https://arxiv.org/pdf/2508.14568
• Github: https://github.com/KULeuven-COSIC/leuvenshtein
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: VeriGUI: Verifiable Long-Chain GUI Dataset
🔹 Publication Date: Published on Aug 6
🔹 Abstract: VeriGUI is a novel dataset for evaluating GUI agents in long-horizon tasks, emphasizing long-chain complexity and subtask-level verifiability. AI-generated summary: Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents. (A minimal data-structure sketch of subtask-level verifiability follows this post.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04026
• PDF: https://arxiv.org/pdf/2508.04026
• Github: https://github.com/VeriGUI-Team/VeriGUI
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
🔹 Publication Date: Published on Aug 6
🔹 Abstract: SEAgent, an agentic self-evolving framework, enables computer-use agents to autonomously master novel software through experiential learning and a curriculum of tasks, achieving superior performance compared to existing methods. AI-generated summary: Repurposing large vision-language models (LVLMs) as computer-use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly in scenarios lacking human annotations. To address this challenge, we propose SEAgent, an agentic self-evolving framework enabling CUAs to autonomously evolve through interactions with unfamiliar software. Specifically, SEAgent empowers computer-use agents to autonomously master novel software environments via experiential learning, where agents explore new software, learn through iterative trial and error, and progressively tackle auto-generated tasks organized from simple to complex. To achieve this goal, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that produces increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, comprising adversarial imitation of failure actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates individual experiential insights from specialist agents, facilitating the development of a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately achieves performance surpassing ensembles of individual specialist agents on their specialized software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2 percentage points in success rate, from 11.3% to 34.5%, over a competitive open-source CUA, i.e., UI-TARS. (A minimal sketch of the group-relative advantage behind GRPO follows this post.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04700
• PDF: https://arxiv.org/pdf/2508.04700
• Github: https://github.com/SunzeY/SEAgent
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Zery/WSM-7B-AgentRewardBench
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
🔹 Publication Date: Published on Jul 31
🔹 Abstract: The ViLLA framework enhances VLA models by incorporating latent actions, improving performance in both simulated and real-world robot manipulation tasks. AI-generated summary: Vision-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of the visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments, including SIMPLER and LIBERO, as well as on two real-world robot setups covering gripper and dexterous-hand manipulation. We believe the ViLLA paradigm holds significant promise and that our villa-X provides a strong foundation for future research. (A minimal latent-action encoder sketch follows this post.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23682
• PDF: https://arxiv.org/pdf/2507.23682
• Project Page: https://microsoft.github.io/villa-x/
• Github: https://github.com/microsoft/villa-x/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
Today we will try to launch the bots again.
Although many selfish people have reacted negatively, that's okay; I will continue.
🔹 Title: A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
🔹 Publication Date: Published on Aug 2
🔹 Abstract: A benchmark and model for 3D occupancy grounding using natural language and voxel-level annotations improve object perception in autonomous driving. AI-generated summary: Visual grounding aims to identify objects or regions in a scene based on natural language descriptions, which is essential for spatially aware perception in autonomous driving. However, existing visual grounding tasks typically depend on bounding boxes that often fail to capture fine-grained details: not all voxels within a bounding box are occupied, resulting in inaccurate object representations. To address this, we introduce a benchmark for 3D occupancy grounding in challenging outdoor scenes. Built on the nuScenes dataset, it integrates natural language with voxel-level occupancy annotations, offering more precise object perception than the traditional grounding task. Moreover, we propose GroundingOcc, an end-to-end model designed for 3D occupancy grounding through multi-modal learning. It combines visual, textual, and point cloud features to predict object location and occupancy information from coarse to fine. Specifically, GroundingOcc comprises a multimodal encoder for feature extraction, an occupancy head for voxel-wise predictions, and a grounding head to refine localization. Additionally, a 2D grounding module and a depth estimation module enhance geometric understanding, thereby boosting model performance. Extensive experiments on the benchmark demonstrate that our method outperforms existing baselines on 3D occupancy grounding. The dataset is available at https://github.com/RONINGOD/GroundingOcc. (A minimal coarse-to-fine forward-pass sketch follows this post.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01197
• PDF: https://arxiv.org/pdf/2508.01197
• Github: https://github.com/RONINGOD/GroundingOcc
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Intern-S1: A Scientific Multimodal Foundation Model
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15763
• PDF: https://arxiv.org/pdf/2508.15763
• Github: https://github.com/InternLM/Intern-S1
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Mobile-Agent-v3: Foundamental Agents for GUI Automation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15760
• PDF: https://arxiv.org/pdf/2508.15760
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Deep Think with Confidence
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15260
• PDF: https://arxiv.org/pdf/2508.15260
• Project Page: https://jiaweizzhao.github.io/deepconf/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Waver: Wave Your Way to Lifelike Video Generation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15761
• PDF: https://arxiv.org/pdf/2508.15761
• Github: https://github.com/FoundationVision/Waver
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15769
• PDF: https://arxiv.org/pdf/2508.15769
• Project Page: https://mengmouxu.github.io/SceneGen/
• Github: https://github.com/Mengmouxu/SceneGen
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT