🔹 Title: SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
🔹 Publication Date: Published on Aug 6
🔹 Abstract: SEAgent, an agentic self-evolving framework, enables computer-use agents to autonomously master novel software through experiential learning and a curriculum of tasks, achieving superior performance compared to existing methods. AI-generated summary: Repurposing large vision-language models (LVLMs) as computer-use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly in scenarios lacking human annotations. To address this challenge, we propose SEAgent, an agentic self-evolving framework enabling CUAs to autonomously evolve through interactions with unfamiliar software. Specifically, SEAgent empowers computer-use agents to autonomously master novel software environments via experiential learning, where agents explore new software, learn through iterative trial and error, and progressively tackle auto-generated tasks organized from simple to complex. To achieve this goal, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that produces increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, comprising adversarial imitation of failure actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates individual experiential insights from specialist agents, facilitating the development of a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately achieves performance surpassing ensembles of individual specialist agents on their specialized software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2% in success rate, from 11.3% to 34.5%, over a competitive open-source CUA, i.e., UI-TARS. (A sketch of the self-evolution loop appears after the listings below.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04700
• PDF: https://arxiv.org/pdf/2508.04700
• Project Page: https://github.com/SunzeY/SEAgent
• Github: https://github.com/SunzeY/SEAgent
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Zery/WSM-7B-AgentRewardBench
🔹 Spaces citing this paper:
No spaces found
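To make the training loop described in the abstract concrete, here is a minimal, hedged Python sketch of a SEAgent-style self-evolution cycle: a curriculum generator proposes tasks, the agent rolls them out, a world state model judges each step, and the policy is updated from successes and failures. All class and function names (CurriculumGenerator, WorldStateModel, update_policy, etc.) are illustrative stand-ins rather than the authors' released API, and the GRPO and adversarial-imitation updates are represented only by a placeholder.

```python
# Hedged sketch of a SEAgent-style self-evolution loop (names are illustrative, not the paper's API).
import random
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)         # (state, action) pairs
    step_success: list = field(default_factory=list)  # per-step judgments

class CurriculumGenerator:
    """Proposes auto-generated tasks, progressing from simple to complex."""
    def __init__(self):
        self.difficulty = 1
    def propose(self, n=4):
        tasks = [f"task_d{self.difficulty}_{i}" for i in range(n)]
        self.difficulty += 1
        return tasks

class WorldStateModel:
    """Step-wise trajectory assessor: labels each step success/failure."""
    def judge(self, traj):
        traj.step_success = [random.random() > 0.5 for _ in traj.steps]  # placeholder judgment
        return all(traj.step_success)

def rollout(policy, task, horizon=5):
    traj = Trajectory(task=task)
    state = f"{task}:init"
    for t in range(horizon):
        action = policy(state)
        traj.steps.append((state, action))
        state = f"{task}:step{t+1}"
    return traj

def update_policy(policy, successes, failures):
    # Placeholder for the two learning signals named in the abstract:
    # GRPO-style reinforcement on successful trajectories, and adversarial
    # imitation that pushes the policy away from failed actions.
    print(f"GRPO update on {len(successes)} successes, "
          f"adversarial imitation on {len(failures)} failure steps")
    return policy

def self_evolve(policy, wsm, curriculum, phases=3):
    for _ in range(phases):
        tasks = curriculum.propose()
        successes, failures = [], []
        for traj in (rollout(policy, t) for t in tasks):
            if wsm.judge(traj):
                successes.append(traj)
            else:
                failures.extend(s for s, ok in zip(traj.steps, traj.step_success) if not ok)
        policy = update_policy(policy, successes, failures)
    return policy

if __name__ == "__main__":
    dummy_policy = lambda state: f"click({hash(state) % 100})"
    self_evolve(dummy_policy, WorldStateModel(), CurriculumGenerator())
```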
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
🔹 Publication Date: Published on Jul 31
🔹 Abstract: The ViLLA framework enhances VLA models by incorporating latent actions, improving performance in both simulated and real-world robot manipulation tasks. AI-generated summary: Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research. (A sketch of the latent-action idea appears after the listings below.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23682
• PDF: https://arxiv.org/pdf/2507.23682
• Project Page: https://microsoft.github.io/villa-x/
• Github: https://github.com/microsoft/villa-x/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
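As a rough illustration of the latent-action idea in the abstract, the sketch below encodes the visual change between two frames into a compact latent vector and conditions a toy policy on it. The module names, layer sizes, and concatenation-based conditioning are assumptions made for illustration only, not the villa-X architecture.

```python
# Hedged sketch of latent-action modeling: encode the change between two frames
# into a latent code and condition a policy on it. Sizes/names are illustrative.
import torch
import torch.nn as nn

class LatentActionEncoder(nn.Module):
    """Encodes a pair of frames (before, after) into a compact latent action."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, latent_dim)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)  # stack the two frames on channels
        return self.head(self.backbone(x))

class LatentConditionedPolicy(nn.Module):
    """Toy policy head mapping (language embedding, latent action) to a robot action."""
    def __init__(self, lang_dim=64, latent_dim=32, action_dim=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(lang_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, lang_emb, latent_action):
        return self.mlp(torch.cat([lang_emb, latent_action], dim=-1))

if __name__ == "__main__":
    enc, policy = LatentActionEncoder(), LatentConditionedPolicy()
    f_t, f_t1 = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    z = enc(f_t, f_t1)                      # latent action from visual change
    action = policy(torch.randn(2, 64), z)  # condition the policy on it
    print(z.shape, action.shape)            # torch.Size([2, 32]) torch.Size([2, 7])
```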
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
Today we will try to launch bots again
Although many selfish people have reacted negatively, it's okay; I will continue.
🔹 Title: A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
🔹 Publication Date: Published on Aug 2
🔹 Abstract: A benchmark and model for 3D occupancy grounding using natural language and voxel-level annotations improve object perception in autonomous driving. AI-generated summary: Visual grounding aims to identify objects or regions in a scene based on natural language descriptions, which is essential for spatially aware perception in autonomous driving. However, existing visual grounding tasks typically depend on bounding boxes that often fail to capture fine-grained details. Not all voxels within a bounding box are occupied, resulting in inaccurate object representations. To address this, we introduce a benchmark for 3D occupancy grounding in challenging outdoor scenes. Built on the nuScenes dataset, it integrates natural language with voxel-level occupancy annotations, offering more precise object perception than the traditional grounding task. Moreover, we propose GroundingOcc, an end-to-end model designed for 3D occupancy grounding through multi-modal learning. It combines visual, textual, and point cloud features to predict object location and occupancy information from coarse to fine. Specifically, GroundingOcc comprises a multimodal encoder for feature extraction, an occupancy head for voxel-wise predictions, and a grounding head to refine localization. Additionally, a 2D grounding module and a depth estimation module enhance geometric understanding, thereby boosting model performance. Extensive experiments on the benchmark demonstrate that our method outperforms existing baselines on 3D occupancy grounding. The dataset is available at https://github.com/RONINGOD/GroundingOcc. (A sketch of the coarse-to-fine pipeline appears after the listings below.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01197
• PDF: https://arxiv.org/pdf/2508.01197
• Github: https://github.com/RONINGOD/GroundingOcc
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
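The minimal sketch below mirrors the coarse-to-fine pipeline named in the abstract: a multimodal encoder fuses image, text, and point-cloud features, an occupancy head predicts voxel-wise occupancy, and a grounding head refines the object's localization. All shapes, layer sizes, and the concatenation-based fusion are assumptions for illustration, not the paper's design.

```python
# Hedged sketch of a multimodal encoder -> occupancy head -> grounding head pipeline.
# Feature dims, grid size, and fusion are illustrative assumptions.
import torch
import torch.nn as nn

class GroundingOccSketch(nn.Module):
    def __init__(self, img_dim=256, txt_dim=128, pts_dim=128, grid=(16, 16, 4)):
        super().__init__()
        self.grid = grid
        fused = img_dim + txt_dim + pts_dim
        self.encoder = nn.Sequential(nn.Linear(fused, 256), nn.ReLU())
        # Occupancy head: voxel-wise occupancy logits over the whole grid (coarse stage).
        self.occ_head = nn.Linear(256, grid[0] * grid[1] * grid[2])
        # Grounding head: refines a box for the referred object (fine stage).
        self.grounding_head = nn.Linear(256, 7)  # (x, y, z, w, l, h, yaw)

    def forward(self, img_feat, txt_feat, pts_feat):
        h = self.encoder(torch.cat([img_feat, txt_feat, pts_feat], dim=-1))
        occ = self.occ_head(h).view(-1, *self.grid)  # coarse voxel occupancy
        box = self.grounding_head(h)                 # fine localization
        return occ, box

if __name__ == "__main__":
    model = GroundingOccSketch()
    occ, box = model(torch.randn(2, 256), torch.randn(2, 128), torch.randn(2, 128))
    print(occ.shape, box.shape)  # torch.Size([2, 16, 16, 4]) torch.Size([2, 7])
```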
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Intern-S1: A Scientific Multimodal Foundation Model
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15763
• PDF: https://arxiv.org/pdf/2508.15763
• Github: https://github.com/InternLM/Intern-S1
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Mobile-Agent-v3: Foundamental Agents for GUI Automation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15760
• PDF: https://arxiv.org/pdf/2508.15760
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Deep Think with Confidence
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15260
• PDF: https://arxiv.org/pdf/2508.15260
• Project Page: https://jiaweizzhao.github.io/deepconf/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Waver: Wave Your Way to Lifelike Video Generation
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15761
• PDF: https://arxiv.org/pdf/2508.15761
• Github: https://github.com/FoundationVision/Waver
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15769
• PDF: https://arxiv.org/pdf/2508.15769
• Project Page: https://mengmouxu.github.io/SceneGen/
• Github: https://github.com/Mengmouxu/SceneGen
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: A Survey on Large Language Model Benchmarks
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15361
• PDF: https://arxiv.org/pdf/2508.15361
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15767
• PDF: https://arxiv.org/pdf/2508.15767
• Project Page: https://jindapark.github.io/projects/atlas/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15126
• PDF: https://arxiv.org/pdf/2508.15126
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Visual Autoregressive Modeling for Instruction-Guided Image Editing
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15772
• PDF: https://arxiv.org/pdf/2508.15772
• Project Page: https://huggingface.co/HiDream-ai/VAREdit
• Github: https://github.com/HiDream-ai/VAREdit
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-1024
• https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-512
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15752
• PDF: https://arxiv.org/pdf/2508.15752
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
🔹 Publication Date: Published on Aug 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14892
• PDF: https://arxiv.org/pdf/2508.14892
• Project Page: https://hustvl.github.io/Snap-Snap/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15641
• PDF: https://arxiv.org/pdf/2508.15641
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15418
• PDF: https://arxiv.org/pdf/2508.15418
• Github: https://github.com/EIT-NLP/LLaSO
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/YirongSun/LLaSO-Instruct
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15202
• PDF: https://arxiv.org/pdf/2508.15202
• Project Page: https://github.com/aliyun/qwen-dianjin
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: INTIMA: A Benchmark for Human-AI Companionship Behavior
🔹 Publication Date: Published on Aug 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09998
• PDF: https://arxiv.org/pdf/2508.09998
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Investigating Hallucination in Conversations for Low Resource Languages
🔹 Publication Date: Published on Jul 30
🔹 Abstract: LLMs generate fewer hallucinations in Mandarin compared to Hindi and Farsi across multiple models. AI-generated summary: Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem typically referred to as 'hallucination'. Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi. (A sketch of the per-language tallying appears after the listings below.)
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.22720
• PDF: https://arxiv.org/pdf/2507.22720
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
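A small sketch of the kind of per-model, per-language bookkeeping such a study implies: tally hallucinated responses and report rates. The judge function and sample data below are placeholders, not the paper's evaluation protocol.

```python
# Hedged sketch of per-language hallucination-rate bookkeeping. Data and judge are placeholders.
from collections import defaultdict

LANGUAGES = ["Mandarin", "Hindi", "Farsi"]
MODELS = ["GPT-3.5", "GPT-4o", "Llama-3.1", "Gemma-2.0", "DeepSeek-R1", "Qwen-3"]

def judge_response(response: str) -> bool:
    """Stand-in for human/automatic fact checking: True means hallucinated."""
    return "unverified" in response  # placeholder rule

def hallucination_rates(responses):
    """responses: list of (model, language, text) tuples."""
    counts = defaultdict(lambda: [0, 0])  # (model, lang) -> [hallucinated, total]
    for model, lang, text in responses:
        tally = counts[(model, lang)]
        tally[0] += judge_response(text)
        tally[1] += 1
    return {k: h / t for k, (h, t) in counts.items() if t}

if __name__ == "__main__":
    fake = [(m, l, "unverified claim" if (m, l) != ("GPT-4o", "Mandarin") else "ok")
            for m in MODELS for l in LANGUAGES]
    for (model, lang), rate in sorted(hallucination_rates(fake).items()):
        print(f"{model:12s} {lang:9s} hallucination rate = {rate:.2f}")
```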
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT