✨SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis
📝 Summary:
SyncMV4D generates realistic and consistent multi-view 3D Hand-Object Interaction videos and 4D motions. It unifies visual priors, motion dynamics, and multi-view geometry, using a joint diffusion model and a point aligner for robust generation.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19319
• PDF: https://arxiv.org/pdf/2511.19319
• Project Page: https://droliven.github.io/SyncMV4D/
• Github: https://droliven.github.io/SyncMV4D/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#HandObjectInteraction #DiffusionModels #3DGeneration #ComputerVision #GenerativeAI
✨Continuous Thought Machines
📝 Summary:
The Continuous Thought Machine (CTM) reintroduces neural timing and synchronization to deep learning for complex sequential reasoning and biologically plausible AI. It uses neuron-level temporal processing and synchronization as a latent representation, performing well on diverse tasks with adaptive...
🔹 Publication Date: Published on May 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.05522
• PDF: https://arxiv.org/pdf/2505.05522
• Github: https://github.com/SakanaAI/continuous-thought-machines
🔹 Models citing this paper:
• https://huggingface.co/SakanaAI/ctm-imagenet
• https://huggingface.co/SakanaAI/ctm-maze-large
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Uday/ctm-energy-based-halting
==================================
#AI #DeepLearning #NeuralNetworks #BiologicallyInspiredAI #TemporalAI
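The synchronization-as-latent idea above can be sketched in a few lines. This is a toy illustration, not the official CTM implementation: it treats the time-averaged pairwise co-activation of neurons as the representation.

```python
import numpy as np

def synchronization_latent(z_history):
    """Pairwise temporal synchronization as a latent representation - a
    toy sketch of the CTM idea that *when* neurons co-activate carries
    information (not the official implementation).

    z_history: (T, N) post-activations of N neurons over T internal ticks.
    """
    T, N = z_history.shape
    sync = z_history.T @ z_history / T       # (N, N) time-averaged co-activation
    iu = np.triu_indices(N)                  # keep upper triangle incl. diagonal
    return sync[iu]                          # flat latent of N*(N+1)/2 pairs

rng = np.random.default_rng(0)
z = rng.standard_normal((16, 8))             # 16 ticks, 8 neurons
latent = synchronization_latent(z)
print(latent.shape)                          # (36,)
```

The latent grows quadratically in neuron count, which is why the paper treats which pairs to track as a design choice.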
✨LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
📝 Summary:
LucidFlux is a caption-free universal image restoration framework using a large diffusion transformer. It employs a dual-branch conditioner and adaptive modulation for robust restoration, avoiding text prompts by using SigLIP features. This approach outperforms existing methods by intelligently c...
🔹 Publication Date: Published on Sep 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22414
• PDF: https://arxiv.org/pdf/2509.22414
• Project Page: https://w2genai-lab.github.io/LucidFlux/
• Github: https://github.com/W2GenAI-Lab/LucidFlux
🔹 Models citing this paper:
• https://huggingface.co/W2GenAI/LucidFlux
==================================
#ImageRestoration #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
✨Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models
📝 Summary:
QTSplus is a query-aware token selector for long-video multimodal language models. It dynamically selects the most important visual tokens based on a text query, significantly compressing vision data and reducing latency. This method maintains overall accuracy and enhances temporal understanding ...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11910
• HF Collection: https://huggingface.co/collections/AlpachinoNLP/qtsplus
• PDF: https://arxiv.org/pdf/2511.11910
• Project Page: https://qtsplus.github.io/
• Github: https://github.com/Siyou-Li/QTSplus
🔹 Models citing this paper:
• https://huggingface.co/AlpachinoNLP/QTSplus-3B
• https://huggingface.co/AlpachinoNLP/QTSplus-3B-FT
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AlpachinoNLP/QTSplus-3B
==================================
#MultimodalAI #VideoAI #LLM #Tokenization #ComputerVision
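The query-aware selection step can be sketched as scoring visual tokens against a pooled text-query embedding and keeping the top-k in temporal order. This is an illustrative sketch, not the official QTSplus module:

```python
import numpy as np

def select_tokens(vision_tokens, query_emb, keep):
    """Query-aware token selection (illustrative sketch, not the official
    QTSplus module): score each visual token by scaled dot-product
    similarity with a pooled text-query embedding, keep the top-`keep`
    tokens, and restore their original temporal order.
    """
    scores = vision_tokens @ query_emb / np.sqrt(vision_tokens.shape[-1])
    idx = np.sort(np.argsort(scores)[-keep:])   # top-k, re-sorted by time
    return vision_tokens[idx], idx

rng = np.random.default_rng(0)
tokens = rng.standard_normal((1000, 64))        # 1000 visual tokens
query = rng.standard_normal(64)                 # pooled text-query embedding
kept, idx = select_tokens(tokens, query, keep=100)
print(kept.shape)                               # (100, 64): 10x compression
```

Keeping the surviving tokens in their original order is what preserves temporal structure for the downstream language model.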
✨DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
📝 Summary:
RLER is introduced to train deep research models for long-form tasks by using rubrics that co-evolve with the policy model, enabling DR Tulu-8B to outperform open models and match proprietary systems while being more cost-effective.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19399
• PDF: https://arxiv.org/pdf/2511.19399
• Project Page: https://github.com/rlresearch/dr-tulu
• Github: https://github.com/rlresearch/dr-tulu
🔹 Models citing this paper:
• https://huggingface.co/rl-research/DR-Tulu-8B
• https://huggingface.co/rl-research/DR-Tulu-SFT-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/rl-research/dr-tulu-sft-data
• https://huggingface.co/datasets/rl-research/dr-tulu-rl-data
==================================
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #MachineLearning
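The rubric-as-reward idea can be sketched as scoring a response against weighted criteria. This toy version uses keyword matching; the paper uses an LLM judge and rubrics that co-evolve with the policy, which is not shown here:

```python
def rubric_reward(response, rubric):
    """Score a long-form answer against a rubric (toy sketch of the
    rubric-as-reward idea; RLER itself uses an LLM judge and evolving
    rubrics). Each rubric item is (criterion_keywords, weight); a
    criterion counts as satisfied if any keyword appears in the response.
    """
    total = sum(w for _, w in rubric)
    score = sum(w for kws, w in rubric
                if any(k in response.lower() for k in kws))
    return score / total if total else 0.0

rubric = [({"citation", "source"}, 2.0),   # attribution matters more
          ({"limitations"}, 1.0)]
print(rubric_reward("Cites a source and discusses limitations.", rubric))  # 1.0
```

The normalized score slots directly into a policy-gradient objective as the scalar reward for that rollout.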
✨HunyuanVideo 1.5 Technical Report
📝 Summary:
HunyuanVideo 1.5 is a lightweight, open-source video generation model achieving state-of-the-art visual quality and motion coherence. It employs an advanced DiT architecture with SSTA and an efficient video super-resolution network, enabling high-quality video creation on consumer GPUs.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18870
• PDF: https://arxiv.org/pdf/2511.18870
• Github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
==================================
#VideoGeneration #AI #DeepLearning #OpenSource #DiffusionModels
✨Flow Map Distillation Without Data
📝 Summary:
This paper introduces a data-free framework for flow map distillation, eliminating the need for external datasets. By sampling only from the prior distribution, it avoids data mismatch risks and achieves state-of-the-art fidelity with minimal sampling steps, surpassing all data-based alternatives.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19428
• PDF: https://arxiv.org/pdf/2511.19428
==================================
#FlowMapDistillation #DataFreeLearning #MachineLearning #DeepLearning #AIResearch
✨Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
📝 Summary:
Chain-of-Visual-Thought (COVT) enables VLMs to improve dense visual perception by reasoning through continuous visual tokens. These tokens capture rich perceptual cues like 2D appearance and 3D geometry from lightweight vision experts. COVT consistently boosts VLM performance on diverse benchmarks,...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19418
• PDF: https://arxiv.org/pdf/2511.19418
• Project Page: https://wakalsprojectpage.github.io/comt-website/
==================================
#VLMs #ComputerVision #AI #MachineLearning #VisualReasoning
✨MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
📝 Summary:
VLMs struggle with physics-driven video reasoning. This paper introduces MASS, a method that injects spatial-temporal signals and motion tracking into VLMs, along with the MASS-Bench dataset. MASS significantly improves VLM performance on physics tasks, outperforming baselines and achieving state-of-the-art...
🔹 Publication Date: Published on Nov 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18373
• PDF: https://arxiv.org/pdf/2511.18373
==================================
#VLMs #PhysicsAI #ComputerVision #AIResearch #MachineLearning
✨Pillar-0: A New Frontier for Radiology Foundation Models
📝 Summary:
Pillar-0 is a new radiology foundation model pretrained on diverse CT/MRI scans, utilizing RATE for scalable label extraction. It significantly outperforms existing models across various radiology tasks and extends to new applications like lung cancer risk prediction and brain hemorrhage detection.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17803
• PDF: https://arxiv.org/pdf/2511.17803
• Github: https://github.com/YalaLab/rate-evals
==================================
#Radiology #FoundationModels #AI #MedicalImaging #MachineLearning
✨Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
📝 Summary:
Training multi-agent systems with distinct LLMs faces optimization challenges. M-GRPO, a hierarchical GRPO extension, addresses this by aligning heterogeneous trajectories and decoupling agent training. This improves stability and sample efficiency for tool-augmented reasoning tasks.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13288
• PDF: https://arxiv.org/pdf/2511.13288
==================================
#MultiAgentSystems #ReinforcementLearning #DeepLearning #LLM #AI
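The base operator M-GRPO extends can be sketched directly: GRPO replaces a learned critic with group-relative advantages, normalizing each trajectory's reward by its group's statistics. The hierarchical alignment and agent decoupling that M-GRPO adds are not shown:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantage normalization at the heart of GRPO:
    each trajectory's reward is centered and scaled by its group's
    statistics, so no learned value function is needed. (Sketch of the
    base operator that M-GRPO extends hierarchically.)
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)    # epsilon guards zero-variance groups

adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)   # approximately [ 1. -1.  1. -1.]
```

Because the advantages are computed per group of rollouts for the same prompt, rewards only need to be comparable within a group, not across prompts.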
✨Fidelity-Aware Recommendation Explanations via Stochastic Path Integration
📝 Summary:
SPINRec improves recommendation explanation fidelity by using stochastic path integration and baseline sampling, capturing both observed and unobserved interactions. It consistently outperforms prior methods, setting a new benchmark for faithful explainability in recommender systems.
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18047
• PDF: https://arxiv.org/pdf/2511.18047
==================================
#RecommenderSystems #ExplainableAI #MachineLearning #AI #DataScience
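The stochastic-path idea can be sketched as integrated-gradients attribution averaged over several sampled baselines. This is an illustrative sketch; the paper samples baselines from observed user profiles, while here they are arbitrary:

```python
import numpy as np

def stochastic_path_integration(grad_fn, x, baselines, steps=50):
    """Integrated-gradients-style attribution averaged over sampled
    baselines (sketch of the idea behind SPINRec; the baseline-sampling
    scheme here is a placeholder). grad_fn(z) returns the model
    gradient at input z.
    """
    attributions = np.zeros_like(x, dtype=float)
    for b in baselines:
        # midpoint Riemann approximation of the path integral from b to x
        alphas = (np.arange(steps) + 0.5) / steps
        path_grads = np.mean([grad_fn(b + a * (x - b)) for a in alphas], axis=0)
        attributions += (x - b) * path_grads
    return attributions / len(baselines)

w = np.array([2.0, -1.0])
grad_fn = lambda z: w                       # linear model f(z) = w @ z
x = np.array([1.0, 1.0])
baselines = [np.zeros(2), np.array([1.0, 0.0])]
attr = stochastic_path_integration(grad_fn, x, baselines)
print(attr)   # [ 1. -1.]: mean of w * (x - b) over the two baselines
```

For a linear model the path integral is exact, which makes the expected attribution easy to verify by hand; averaging over baselines is what captures both observed and unobserved interactions.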
✨Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems
📝 Summary:
A Sparse Autoencoder extracts interaction-aware monosemantic concepts from recommender embeddings. Its prediction-aware training aligns these with model predictions, enabling controllable personalization and interpretability.
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18024
• PDF: https://arxiv.org/pdf/2511.18024
==================================
#RecommenderSystems #DeepLearning #AI #Interpretability #Personalization
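A sparse autoencoder of the kind used to mine monosemantic concepts can be sketched as a ReLU encoder into an overcomplete code with an L1 sparsity penalty. The paper's prediction-aware term, which aligns concepts with the recommender's scores, is omitted here:

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, l1=1e-3):
    """One forward pass of a sparse autoencoder (illustrative sketch;
    the paper adds a prediction-aware loss term that is omitted here).
    A sparse ReLU code over an overcomplete dictionary encourages each
    active unit to behave like a single interpretable concept.
    """
    h = np.maximum(0.0, x @ W_enc + b_enc)   # sparse ReLU code
    x_hat = h @ W_dec                        # linear decoder
    loss = np.mean((x - x_hat) ** 2) + l1 * np.abs(h).mean()  # recon + sparsity
    return h, x_hat, loss

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))             # a batch of item embeddings
W_enc = rng.standard_normal((16, 64)) * 0.1  # overcomplete: 64 candidate concepts
W_dec = rng.standard_normal((64, 16)) * 0.1
h, x_hat, loss = sae_forward(x, W_enc, np.zeros(64), W_dec)
print(h.shape, x_hat.shape)                  # (4, 64) (4, 16)
```

The overcomplete code (64 units for 16 input dimensions) plus the L1 term is what pushes each concept toward firing for one coherent pattern.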
✨MIST: Mutual Information Via Supervised Training
📝 Summary:
MIST is a data-driven neural network that estimates mutual information. Trained on synthetic data, it uses attention and quantile regression for uncertainty. It outperforms classical methods, offering faster and more reliable MI estimation.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18945
• PDF: https://arxiv.org/pdf/2511.18945
• Github: https://github.com/grgera/mist
==================================
#MutualInformation #NeuralNetworks #MachineLearning #DeepLearning #DataScience
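The supervised-training setup rests on synthetic distributions with known mutual information. A sketch of the kind of ground truth involved (not MIST's actual training pipeline): for a bivariate Gaussian with correlation rho, MI has a closed form, so sampled pairs come with exact labels.

```python
import numpy as np

def gaussian_mi(rho):
    """Closed-form mutual information (in nats) of a bivariate Gaussian
    with correlation rho - the kind of synthetic ground truth a
    supervised MI estimator can be trained against."""
    return -0.5 * np.log(1.0 - rho ** 2)

def sample_pair(rho, n, rng):
    """Draw n correlated Gaussian samples with correlation rho."""
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return x, y

rng = np.random.default_rng(0)
x, y = sample_pair(0.8, 100_000, rng)
print(round(gaussian_mi(0.8), 3))                     # 0.511
print(round(gaussian_mi(np.corrcoef(x, y)[0, 1]), 3)) # close to 0.511
```

Generating many (samples, true-MI) pairs like this turns MI estimation into an ordinary supervised regression problem, which is the framing the paper exploits.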
✨AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser
📝 Summary:
This paper introduces MinerU-HTML, a novel language model-based HTML parser that semantically extracts web content, preserving structure better than heuristic methods. It constructs the 7.3T AICC corpus, demonstrating that models trained on AICC significantly outperform those from other parsers, ...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16397
• PDF: https://arxiv.org/pdf/2511.16397
✨ Datasets citing this paper:
• https://huggingface.co/datasets/opendatalab/AICC
==================================
#AI #HTMLParsing #Corpus #LanguageModels #WebData
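For contrast with the model-based parser, here is the kind of heuristic extraction MinerU-HTML is designed to beat: keep visible text, drop tags that usually hold boilerplate. The tag skip-list is an assumption for illustration.

```python
from html.parser import HTMLParser

class MainTextExtractor(HTMLParser):
    """A heuristic extractor of the kind MinerU-HTML improves on: keep
    visible text, drop script/style/nav boilerplate. (The paper's parser
    instead uses a language model to classify content blocks.)"""
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = ("<nav>menu</nav><article><h1>Title</h1>"
        "<p>Body text.</p></article><script>x=1</script>")
p = MainTextExtractor()
p.feed(html)
print(" ".join(p.chunks))   # Title Body text.
```

A fixed skip-list misclassifies anything the markup does not flag, which is exactly the failure mode a semantic, model-based parser addresses.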
✨UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
📝 Summary:
UI-S1 introduces Semi-online RL for GUI automation, simulating online RL on offline trajectories to overcome current method limitations. It achieved SOTA performance on dynamic benchmarks, bridging offline training efficiency and online reasoning.
🔹 Publication Date: Published on Sep 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.11543
• PDF: https://arxiv.org/pdf/2509.11543
• Github: https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1
🔹 Models citing this paper:
• https://huggingface.co/mPLUG/UI-S1-7B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/mPLUG/UI_S1_dataset
==================================
#ReinforcementLearning #GUIAutomation #AI #MachineLearning #IntelligentAgents
✨PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
📝 Summary:
PC-Agent is a hierarchical multi-agent framework improving MLLM-based GUI agents for complex PC tasks. It uses an Active Perception Module and a hierarchical decision-making architecture with Manager, Progress, and Decision agents. A Reflection agent provides feedback. It achieved a 32% task success...
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.14282
• PDF: https://arxiv.org/pdf/2502.14282
• Github: https://github.com/X-PLUG/MobileAgent/tree/main/PC-Agent
✨ Spaces citing this paper:
• https://huggingface.co/spaces/junyangwang0410/PC-Agent
==================================
#MultiAgentSystems #AIAgents #MLLMs #PCAutomation #DeepLearning
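The Manager/Decision/Reflection division of labor can be sketched as a control loop. All four callables below are hypothetical stand-ins for the paper's MLLM-backed agents; only the loop structure is the point:

```python
def run_pc_agent(instruction, decompose, decide, reflect):
    """Toy control loop in the spirit of PC-Agent's hierarchy: a Manager
    decomposes the instruction, a Decision agent picks GUI actions, and
    a Reflection agent verifies each step (the callables are
    hypothetical stand-ins for MLLM-backed agents).
    """
    completed = []                                   # Progress agent's ledger
    for subtask in decompose(instruction):           # Manager: plan subtasks
        for _ in range(3):                           # bounded retries
            action = decide(subtask)                 # Decision: next GUI action
            ok, hint = reflect(subtask, action)      # Reflection: verify outcome
            if ok:
                completed.append(subtask)
                break
            subtask = f"{subtask} (hint: {hint})"    # retry with feedback
    return completed

# Stub agents standing in for MLLM calls:
decompose = lambda s: ["open browser", "search query"]
decide = lambda sub: f"click:{sub}"
reflect = lambda sub, act: (True, "")
print(run_pc_agent("look something up", decompose, decide, reflect))
# ['open browser', 'search query']
```

Feeding the Reflection agent's hint back into the retried subtask is the loop-level analogue of the paper's feedback mechanism.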
✨Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
📝 Summary:
GUI automation faces critical errors. This paper introduces GUI-Critic-R1, a pre-operative critic model using Suggestion-aware Gradient Relative Policy Optimization, to provide feedback and diagnose errors before execution. It significantly improves critic accuracy, enhancing automation reliability...
🔹 Publication Date: Published on Jun 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.04614
• PDF: https://arxiv.org/pdf/2506.04614
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Models citing this paper:
• https://huggingface.co/BonnieOne/GUI-Critic-R1
==================================
#GUIAutomation #ErrorDiagnosis #AI #MachineLearning #SoftwareTesting
✨Mobile-Agent-v3: Foundamental Agents for GUI Automation
📝 Summary:
GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models achieving state-of-the-art performance on GUI benchmarks. GUI-Owl introduces large-scale environment infrastructure, diverse agent capabilities, and scalable reinforcement learning, with Mobile-Agent-v3 further improving these results.
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Models citing this paper:
• https://huggingface.co/mPLUG/GUI-Owl-7B
• https://huggingface.co/mPLUG/GUI-Owl-32B
• https://huggingface.co/mPLUG/GUI-Owl-7B-Desktop-RL
==================================
#GUIAgent #Automation #ReinforcementLearning #AIResearch #OpenSourceAI