✨Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
📝 Summary:
Agent0 is a self-evolving framework that trains LLM agents without human data. It uses two competing agents and tool integration in a multi-step co-evolution process. This significantly boosts reasoning capabilities, improving math by 18% and general reasoning by 24% on benchmarks.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16043
• PDF: https://arxiv.org/pdf/2511.16043
• Github: https://github.com/aiming-lab/Agent0
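💻 Code sketch: a toy, self-contained illustration of the co-evolution idea — a proposer generates tasks near the solver's frontier while a tool-integrated solver answers them. The agents, tool, and reward rule below are illustrative stubs, not Agent0's actual training loop.
```python
import random

def python_tool(expr: str) -> float:
    """Tool call: evaluate an arithmetic expression, as a tool-integrated solver might."""
    return eval(expr, {"__builtins__": {}})  # restricted eval; toy domain only

def proposer(difficulty: int) -> str:
    """Curriculum agent (stub): emits an addition task whose length tracks difficulty."""
    return "+".join(str(random.randint(1, 10)) for _ in range(difficulty + 2))

def solver(task: str, use_tool: bool) -> float:
    """Solver agent (stub): exact with the tool, imperfect without it."""
    if use_tool:
        return python_tool(task)
    return sum(int(t) for t in task.split("+")[:3])  # "no-tool" guess: first 3 terms

difficulty = 0
for step in range(200):
    task = proposer(difficulty)
    truth = python_tool(task)
    answer = solver(task, use_tool=random.random() < 0.8)
    reward = 1.0 if abs(answer - truth) < 1e-9 else 0.0
    # Co-evolution signal: raise difficulty after a success, lower it after a
    # failure, so proposed tasks stay near the solver's current frontier.
    difficulty = max(0, difficulty + (1 if reward else -1))
print("final curriculum difficulty:", difficulty)
```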
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMAgents #SelfEvolvingAI #ToolIntegration #AIResearch #Reasoning
✨MobiAgent: A Systematic Framework for Customizable Mobile Agents
📝 Summary:
MobiAgent is a comprehensive mobile agent system designed to improve real-world task execution accuracy and efficiency. It uses MobiMind models, the AgentRR framework, and MobiFlow benchmarking, plus an AI-assisted data collection pipeline. MobiAgent achieves state-of-the-art performance in mobile agent tasks.
🔹 Publication Date: Published on Aug 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00531
• PDF: https://arxiv.org/pdf/2509.00531
• Github (APK release): https://github.com/IPADS-SAI/MobiAgent/releases/download/v1.0/Mobiagent.apk
🔹 Models citing this paper:
• https://huggingface.co/IPADS-SAI/MobiMind-Grounder-3B
• https://huggingface.co/IPADS-SAI/MobiMind-Decider-7B
• https://huggingface.co/IPADS-SAI/MobiMind-Mixed-7B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MobileAgents #AI #DeepLearning #Robotics #Automation
✨Code2Video: A Code-centric Paradigm for Educational Video Generation
📝 Summary:
Code2Video is a code-centric agent framework generating educational videos via executable Python code. It uses three collaborative agents to improve coherence and interpretability, outperforming direct code generation by 40% and matching human-crafted tutorials.
🔹 Publication Date: Published on Oct 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.01174
• PDF: https://arxiv.org/pdf/2510.01174
• Project Page: https://showlab.github.io/Code2Video/
• Github: https://github.com/showlab/code2video
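💻 Code sketch: a minimal three-agent loop in the spirit of the paper — a planner outlines scenes, a coder emits executable Python, a critic checks the result. The agents are plain functions and "rendering" is simulated; the real system drives an actual renderer.
```python
def planner(topic: str) -> list:
    """Planner agent (stub): break a topic into ordered scenes."""
    return [f"Introduce {topic}", f"Show one worked example of {topic}"]

def coder(scene: str) -> str:
    """Coder agent (stub): emit executable Python that 'renders' a scene."""
    return f'frames.append("SCENE: {scene}")'

def critic(frames: list, plan: list) -> bool:
    """Critic agent (stub): accept only if every planned scene produced a frame."""
    return len(frames) == len(plan)

def generate_video(topic: str) -> list:
    plan, frames = planner(topic), []
    for scene in plan:
        exec(coder(scene), {"frames": frames})  # run the generated code
    assert critic(frames, plan), "rejected; a real loop would revise and retry"
    return frames

print(generate_video("binary search"))
```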
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #VideoGeneration #EducationalTech #CodeGeneration #DeepLearning
✨Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics
📝 Summary:
Enterprise Deep Research (EDR) is a multi-agent system for automated report generation and real-time data analysis in enterprises. It integrates specialized agents, tools, and a reflection mechanism for adaptive research. EDR outperforms state-of-the-art systems on open benchmarks without human steering.
🔹 Publication Date: Published on Oct 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.17797
• PDF: https://arxiv.org/pdf/2510.17797
• Github: https://github.com/SalesforceAIResearch/enterprise-deep-research
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Salesforce/EDR-200
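💻 Code sketch: a minimal plan-execute-reflect research loop. The tool names (web_search, summarize) and the reflection heuristic are placeholders, not the EDR API.
```python
def web_search(query: str) -> str:
    return f"[stub results for: {query}]"          # stand-in for a search tool

def summarize(findings: list) -> str:
    return " | ".join(findings)                    # stand-in for report writing

def reflect(findings: list, goal: str):
    """Reflection step (heuristic): ask a follow-up if coverage looks thin."""
    return f"{goal} limitations" if len(findings) < 2 else None

def deep_research(goal: str, max_rounds: int = 3) -> str:
    findings, query = [], goal
    for _ in range(max_rounds):
        findings.append(web_search(query))
        query = reflect(findings, goal)            # adapt the plan each round
        if query is None:
            break
    return summarize(findings)

print(deep_research("enterprise churn drivers"))
```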
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultiAgentSystems #EnterpriseAI #DataAnalytics #AIResearch #AutomatedReporting
✨Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
📝 Summary:
Hulu-Med is a transparent medical vision-language model unifying diverse data modalities like text, 2D/3D images, and video. It achieves state-of-the-art performance across 30 clinical benchmarks with efficient training, promoting accessible AI.
🔹 Publication Date: Published on Oct 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.08668
• PDF: https://arxiv.org/pdf/2510.08668
• Github: https://github.com/ZJUI-AI4H/Hulu-Med
🔹 Models citing this paper:
• https://huggingface.co/ZJU-AI4H/Hulu-Med-32B
• https://huggingface.co/ZJU-AI4H/Hulu-Med-7B
• https://huggingface.co/ZJU-AI4H/Hulu-Med-14B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MedicalAI #VisionLanguageModel #MultimodalAI #HealthcareAI #AIResearch
✨GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
📝 Summary:
GraphGen is a framework that enhances synthetic data generation for LLMs by constructing fine-grained knowledge graphs. It targets high-value knowledge gaps and uses multi-hop sampling and style-controlled generation to create diverse and accurate QA pairs. This approach outperforms conventional methods.
🔹 Publication Date: Published on May 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.20416
• PDF: https://arxiv.org/pdf/2505.20416
• Project Page: https://huggingface.co/spaces/chenzihong/GraphGen
• Github: https://github.com/open-sciencelab/GraphGen
✨ Datasets citing this paper:
• https://huggingface.co/datasets/chenzihong/GraphGen-Data
✨ Spaces citing this paper:
• https://huggingface.co/spaces/chenzihong/GraphGen
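💻 Code sketch: multi-hop sampling over a small knowledge graph to seed QA pairs, assuming a networkx graph; the toy triples and question template are illustrative, not GraphGen's actual prompts.
```python
import random
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("aspirin", "COX-1", relation="inhibits")
kg.add_edge("COX-1", "prostaglandins", relation="produces")
kg.add_edge("prostaglandins", "inflammation", relation="mediates")

def sample_path(g: nx.DiGraph, hops: int) -> list:
    """Walk up to `hops` edges from a random start node, collecting triples."""
    node = random.choice([n for n in g if g.out_degree(n) > 0])
    path = []
    for _ in range(hops):
        succs = list(g.successors(node))
        if not succs:
            break
        nxt = random.choice(succs)
        path.append((node, g[node][nxt]["relation"], nxt))
        node = nxt
    return path

def path_to_qa(path: list) -> dict:
    """Turn a relation chain into a (multi-hop) question-answer pair."""
    head, tail = path[0][0], path[-1][2]
    chain = " -> ".join(f"{h} {r} {t}" for h, r, t in path)
    return {"question": f"Via which chain does {head} affect {tail}?",
            "answer": chain}

print(path_to_qa(sample_path(kg, hops=2)))
```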
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMs #KnowledgeGraphs #SyntheticData #FineTuning #NLP
✨Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
📝 Summary:
Skywork R1V is a multimodal reasoning model that efficiently extends large language models to visual tasks. It achieves this via efficient cross-modal transfer with a lightweight visual projector, enhanced visual-text alignment, and adaptive Chain-of-Thought optimization, delivering competitive benchmark performance.
🔹 Publication Date: Published on Apr 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.05599
• PDF: https://arxiv.org/pdf/2504.05599
• Project Page: https://huggingface.co/papers?q=lightweight%20visual%20projector
• Github: https://github.com/SkyworkAI/Skywork-R1V
🔹 Models citing this paper:
• https://huggingface.co/Skywork/Skywork-R1V-38B
• https://huggingface.co/Skywork/Skywork-R1V2-38B
• https://huggingface.co/Skywork/Skywork-R1V2-38B-AWQ
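💻 Code sketch: the lightweight visual projector named in the project link above, sketched as an MLP that maps frozen vision features into the LLM embedding space. Dimensions are illustrative, not Skywork R1V's actual configuration.
```python
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """MLP bridge: vision encoder features -> LLM token embeddings."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim);
        # the projected tokens are concatenated with text embeddings downstream.
        return self.net(patch_feats)

proj = VisualProjector()
print(proj(torch.randn(2, 256, 1024)).shape)  # torch.Size([2, 256, 4096])
```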
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #ChainOfThought #LLMs #ComputerVision #AIResearch
✨OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
📝 Summary:
OpenMMReasoner introduces a two-stage SFT+RL training approach with rigorous data curation. This method significantly enhances multimodal reasoning, improving performance by 11.6% over baselines across nine benchmarks.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16334
• PDF: https://arxiv.org/pdf/2511.16334
• Project Page: https://evolvinglmms-lab.github.io/OpenMMReasoner/
• Github: https://github.com/EvolvingLMMs-Lab/OpenMMReasoner
🔹 Models citing this paper:
• https://huggingface.co/OpenMMReasoner/OpenMMReasoner-RL
• https://huggingface.co/OpenMMReasoner/OpenMMReasoner-ColdStart
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenMMReasoner/OpenMMReasoner-SFT-874K
• https://huggingface.co/datasets/OpenMMReasoner/OpenMMReasoner-RL-74K
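💻 Code sketch: the two-stage recipe on a toy policy to make it concrete — cross-entropy SFT on curated demonstrations, then REINFORCE against a verifiable reward. The tabular "model" is illustrative; the paper trains large multimodal models.
```python
import torch

logits = torch.zeros(4, requires_grad=True)          # toy policy over 4 answers
opt = torch.optim.SGD([logits], lr=0.5)

# Stage 1: SFT cold start -- imitate a curated demonstration.
demo = torch.tensor([2])
for _ in range(50):
    loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), demo)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: RL -- REINFORCE with a verifiable reward (1 if the answer checks out).
correct = 2
for _ in range(50):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if action.item() == correct else 0.0
    loss = -reward * dist.log_prob(action)
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, -1))  # probability mass concentrates on answer 2
```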
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #ReinforcementLearning #LLMs #AIResearch #DeepLearning
✨GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
📝 Summary:
GeoVista is a new agentic model for geolocalization that integrates tool invocation and reinforcement learning. It achieves high performance on the new GeoBench benchmark, surpassing open-source models and matching closed-source models.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15705
• PDF: https://arxiv.org/pdf/2511.15705
• Project Page: https://ekonwang.github.io/geo-vista/
• Github: https://github.com/ekonwang/GeoVista
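💻 Code sketch: an agentic tool loop for geolocalization with two stub tools (zoom, web search). The tool names, policy, and stopping rule are hypothetical, not GeoVista's interface.
```python
def zoom(image: str, region: str) -> str:
    return f"{image}[crop:{region}]"                 # stand-in for image zoom

def web_search(clue: str) -> str:
    return f"[stub pages about {clue}]"              # stand-in for web retrieval

def policy(observation: str, step: int):
    """Stand-in for the RL-trained policy choosing the next tool call."""
    if step == 0:
        return ("zoom", "street sign")
    if step == 1:
        return ("search", "street sign text")
    return ("answer", "Lisbon, Portugal")            # dummy final guess

obs = "photo.jpg"
for step in range(5):
    action, arg = policy(obs, step)
    if action == "zoom":
        obs = zoom(obs, arg)
    elif action == "search":
        obs = obs + " + " + web_search(arg)
    else:
        print("predicted location:", arg)
        break
```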
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Geolocalization #AI #ReinforcementLearning #ComputerVision #AIAgents
✨SAM 3: Segment Anything with Concepts
📝 Summary:
SAM 3 is a unified model achieving state-of-the-art in promptable concept segmentation and tracking. It uses concept prompts for detecting, segmenting, and tracking objects, doubling accuracy over existing systems. The model and a new benchmark are open sourced.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16719
• PDF: https://arxiv.org/pdf/2511.16719
• Project Page: https://ai.meta.com/sam3/
• Github: https://github.com/facebookresearch/sam3
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ComputerVision #ImageSegmentation #ObjectTracking #AI #DeepLearning
✨RynnVLA-002: A Unified Vision-Language-Action and World Model
📝 Summary:
RynnVLA-002 unifies a Vision-Language-Action and world model, enabling joint learning of environmental dynamics and action planning. This mutual enhancement leads to superior performance, achieving 97.4% success in simulation and a 50% boost in real-world robot tasks.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17502
• PDF: https://arxiv.org/pdf/2511.17502
• Github: https://github.com/alibaba-damo-academy/RynnVLA-002
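💻 Code sketch: the joint-objective idea in miniature — one backbone feeds both an action head and a world-model head, and their losses are summed so each task shapes the shared representation. Shapes and heads are illustrative, not the actual RynnVLA-002 architecture.
```python
import torch
import torch.nn as nn

backbone = nn.Linear(64, 128)        # stand-in for the VLA trunk
action_head = nn.Linear(128, 7)      # e.g., a 7-DoF action
world_head = nn.Linear(128, 64)      # predicts next-observation features

obs, next_obs = torch.randn(8, 64), torch.randn(8, 64)
target_action = torch.randn(8, 7)

h = backbone(obs)
loss_action = nn.functional.mse_loss(action_head(h), target_action)
loss_world = nn.functional.mse_loss(world_head(h), next_obs)
(loss_action + loss_world).backward()   # joint learning couples the two tasks
print(float(loss_action), float(loss_world))
```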
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisionLanguageAction #WorldModels #Robotics #AI #DeepLearning
✨Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination
📝 Summary:
Video-R4 is a video reasoning LMM that improves text-rich video QA through iterative visual rumination. It simulates human behavior by iteratively selecting, zooming, and re-encoding frames to update its reasoning. This approach achieves state-of-the-art results on various QA tasks.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17490
• PDF: https://arxiv.org/pdf/2511.17490
• Project Page: https://yunlong10.github.io/Video-R4/
• Github: https://github.com/yunlong10/Video-R4
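💻 Code sketch: iterative "visual rumination" on dummy frames — select a frame, zoom into a region, re-encode the crop, and fold it into a running state. Selection and encoding are toy heuristics, not Video-R4's learned components.
```python
import numpy as np

frames = [np.random.rand(224, 224) for _ in range(16)]

def select(frames: list) -> int:
    return int(np.argmax([f.mean() for f in frames]))   # toy: brightest frame

def zoom(frame: np.ndarray, cy: int = 112, cx: int = 112, size: int = 64):
    return frame[cy - size // 2: cy + size // 2, cx - size // 2: cx + size // 2]

def encode(patch: np.ndarray) -> float:
    return float(patch.mean())                           # stand-in encoder

state = 0.0
for _ in range(3):                      # a few rumination rounds
    patch = zoom(frames[select(frames)])
    state = 0.5 * state + 0.5 * encode(patch)            # update reasoning state
print("final state:", state)
```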
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoReasoning #LMM #MultimodalAI #DeepLearning #VideoQA
✨WorldGen: From Text to Traversable and Interactive 3D Worlds
📝 Summary:
WorldGen transforms text prompts into interactive 3D worlds. It combines LLM reasoning with procedural and diffusion-based 3D generation to efficiently create coherent, navigable environments for gaming and simulation.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16825
• PDF: https://arxiv.org/pdf/2511.16825
• Project Page: https://www.meta.com/blog/worldgen-3d-world-generation-reality-labs-generative-ai-research/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#3DGeneration #GenerativeAI #LLMs #VirtualWorlds #AIResearch
✨Planning with Sketch-Guided Verification for Physics-Aware Video Generation
📝 Summary:
SketchVerify improves video motion planning by iteratively refining candidate trajectories using lightweight sketch-based verification. This training-free method enhances physical realism and consistency more efficiently than full video generation.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17450
• PDF: https://arxiv.org/pdf/2511.17450
• Project Page: https://sketchverify.github.io/
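💻 Code sketch: propose-then-verify in miniature — sample candidate trajectories, score each with a cheap plausibility check, keep the best. The paper's verifier inspects rendered sketches; the smoothness heuristic below merely stands in for it.
```python
import numpy as np

def propose(n: int = 8, steps: int = 20) -> list:
    """Candidate 1-D trajectories: random-walk positions over time."""
    return [np.cumsum(np.random.randn(steps)) for _ in range(n)]

def verify(traj: np.ndarray) -> float:
    """Lower = more plausible: penalize large accelerations (2nd differences)."""
    return float(np.abs(np.diff(traj, n=2)).mean())

candidates = propose()
best = min(candidates, key=verify)      # training-free selection step
print("best plausibility score:", verify(best))
```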
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #MotionPlanning #AI #ComputerVision #PhysicsSimulation
✨VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation
📝 Summary:
VLA-4D enhances robotic manipulation by integrating 4D spatial-temporal awareness into visual and action representations. This enables smoother and more coherent robot control for complex tasks by embedding time into 3D positions and extending action planning with temporal information.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17199
• PDF: https://arxiv.org/pdf/2511.17199
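💻 Code sketch: one way to "embed time into 3D positions" — sinusoidal features over (x, y, z, t), giving each token a 4D position code. Dimensions and frequencies are illustrative, not the paper's exact design.
```python
import torch

def embed_4d(xyzt: torch.Tensor, dim_per_axis: int = 16) -> torch.Tensor:
    """xyzt: (N, 4) -> (N, 4 * dim_per_axis) sinusoidal position embedding."""
    freqs = torch.exp(torch.arange(0, dim_per_axis, 2) * (-4.0 / dim_per_axis))
    parts = []
    for axis in range(4):                        # x, y, z, then t
        v = xyzt[:, axis : axis + 1] * freqs     # (N, dim_per_axis // 2)
        parts.append(torch.cat([v.sin(), v.cos()], dim=-1))
    return torch.cat(parts, dim=-1)

points = torch.rand(5, 4)              # three spatial coordinates + a timestamp
print(embed_4d(points).shape)          # torch.Size([5, 64])
```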
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #AI #VLAModels #SpatialTemporalAI #RobotManipulation
✨OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
📝 Summary:
OmniScientist is a framework that encodes the social and collaborative structure of human scientific research into AI workflows. It provides a structured knowledge system, collaborative protocols, and an evaluation platform, fostering a co-evolving ecosystem of human and AI scientists.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16931
• PDF: https://arxiv.org/pdf/2511.16931
• Project Page: https://omniscientist.ai/chat
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #ScientificDiscovery #HumanAICollaboration #ResearchFramework
✨O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
📝 Summary:
O-Mem, an active user profiling framework, improves LLM agent consistency and personalization. It updates user profiles and outperforms prior SOTA on LoCoMo and PERSONAMEM benchmarks, also boosting response efficiency.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13593
• PDF: https://arxiv.org/pdf/2511.13593
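💻 Code sketch: a minimal active user-profile memory — extract attributes from each turn, merge them into a profile, and retrieve entries by keyword overlap. The extractor and retriever are toy stand-ins for O-Mem's design.
```python
class ProfileMemory:
    def __init__(self):
        self.profile = {}

    def update(self, utterance: str) -> None:
        """Toy extractor: 'my X is Y' -> profile[X] = Y."""
        words = utterance.lower().split()
        if "my" in words and "is" in words:
            i, j = words.index("my"), words.index("is")
            if i + 1 < j:
                self.profile[" ".join(words[i + 1:j])] = " ".join(words[j + 1:])

    def retrieve(self, query: str) -> dict:
        """Return profile entries sharing a keyword with the query."""
        q = set(query.lower().split())
        return {k: v for k, v in self.profile.items()
                if q & set(k.split()) or q & set(v.split())}

mem = ProfileMemory()
mem.update("My favorite language is Python")
mem.update("My city is Zurich")
print(mem.retrieve("what language should I use?"))  # {'favorite language': 'python'}
```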
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMAgents #Personalization #AIMemory #GenerativeAI #UserProfiling
✨Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
📝 Summary:
Multi-Faceted Attack (MFA) reveals cross-model safety vulnerabilities in defense-equipped Vision-Language Models. It uses an Attention-Transfer Attack to hide harmful instructions and bypass filters, exploiting shared visual representations for high success rates. MFA challenges the robustness of current defenses.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16110
• PDF: https://arxiv.org/pdf/2511.16110
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisionLanguageModels #AISecurity #AdversarialAttacks #AIvulnerabilities #MachineLearning
✨Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
📝 Summary:
Mantis is a VLA framework with Disentangled Visual Foresight (DVF) and a diffusion Transformer. DVF decouples visual foresight from the backbone, improving action prediction, comprehension, and reasoning while reducing training complexity. Mantis achieves high success rates and strong instruction-following.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16175
• PDF: https://arxiv.org/pdf/2511.16175
• Github: https://github.com/zhijie-group/Mantis
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #ComputerVision #Robotics #VLAModels #DeepLearning