✨VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation
📝 Summary:
VLA-4D enhances robotic manipulation by integrating 4D spatial-temporal awareness into visual and action representations. This enables smoother and more coherent robot control for complex tasks by embedding time into 3D positions and extending action planning with temporal information.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17199
• PDF: https://arxiv.org/pdf/2511.17199
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#Robotics #AI #VLAModels #SpatialTemporalAI #RobotManipulation
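As a rough illustration of what "embedding time into 3D positions" could look like, the sketch below appends a timestamp to an (x, y, z) point and applies a sinusoidal encoding to each of the four coordinates. This is not the paper's implementation: the function name `encode_4d`, the number of frequency bands, and the choice of a sinusoidal encoding are all assumptions made for illustration.

```python
import math

def encode_4d(x, y, z, t, num_bands=4):
    """Hypothetical sinusoidal encoding of a space-time point (x, y, z, t).

    Each coordinate is mapped to sin/cos pairs at num_bands frequencies
    (pi, 2*pi, 4*pi, ...), so the output has 4 * 2 * num_bands values.
    """
    features = []
    for coord in (x, y, z, t):
        for band in range(num_bands):
            freq = (2 ** band) * math.pi
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features

# The same 3D point observed at two different times: the spatial part of the
# code is identical, while the temporal part differs.
early = encode_4d(0.1, 0.2, 0.3, t=0.0)
late = encode_4d(0.1, 0.2, 0.3, t=0.5)
print(len(early), early != late)
```

The point of the toy example is only that a time coordinate lets a model tell apart observations that share a 3D position.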
✨OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
📝 Summary:
OmniScientist is a framework that builds the social and collaborative aspects of human scientific research into AI workflows. It provides a structured knowledge system, collaborative protocols, and an evaluation platform, fostering a co-evolving ecosystem of human and AI scientists.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16931
• PDF: https://arxiv.org/pdf/2511.16931
• Project Page: https://omniscientist.ai/chat
==================================
#AI #DataScience #ScientificDiscovery #HumanAICollaboration #ResearchFramework
✨O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents
📝 Summary:
O-Mem, an active user profiling framework, improves LLM agent consistency and personalization. It updates user profiles and outperforms prior SOTA on LoCoMo and PERSONAMEM benchmarks, also boosting response efficiency.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13593
• PDF: https://arxiv.org/pdf/2511.13593
==================================
#LLMAgents #Personalization #AIMemory #GenerativeAI #UserProfiling
✨Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
📝 Summary:
Multi-Faceted Attack (MFA) reveals cross-model safety vulnerabilities in defense-equipped Vision-Language Models. It uses an Attention-Transfer Attack to hide harmful instructions and bypass filters, exploiting shared visual representations for high success rates. MFA challenges the robustness of curr...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16110
• PDF: https://arxiv.org/pdf/2511.16110
==================================
#VisionLanguageModels #AISecurity #AdversarialAttacks #AIvulnerabilities #MachineLearning
✨Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
📝 Summary:
Mantis is a VLA framework with Disentangled Visual Foresight (DVF) and a diffusion Transformer. DVF decouples visual foresight from the backbone, improving action prediction, comprehension, and reasoning while reducing training complexity. Mantis achieves high success rates and strong instruction-f...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16175
• PDF: https://arxiv.org/pdf/2511.16175
• Github: https://github.com/zhijie-group/Mantis
==================================
#AI #ComputerVision #Robotics #VLAModels #DeepLearning
✨VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
📝 Summary:
VisMem equips Vision-Language Models with dynamic latent vision memories, inspired by human cognition. This framework helps VLMs maintain perceptual fidelity and semantic consistency, significantly boosting performance on complex visual tasks.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11007
• PDF: https://arxiv.org/pdf/2511.11007
• Github: https://github.com/YU-deep/VisMem.git
==================================
#VisMem #VisionLanguageModels #AI #DeepLearning #ComputerVision
✨Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs
📝 Summary:
PARROT evaluates LLM robustness to sycophancy by comparing responses to neutral questions with responses to versions that assert a false answer authoritatively. Advanced models resist pressure well, but older ones show severe epistemic collapse, even reducing confidence in correct answers. This highlights the need for LLMs to resist pressure for safe dep...
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17220
• PDF: https://arxiv.org/pdf/2511.17220
==================================
#LLMs #AISafety #ModelRobustness #Sycophancy #AIResearch
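To make the neutral-versus-pressured comparison concrete, here is a minimal sketch of one way such an evaluation could be scored. It is not PARROT's actual metric: the function name `flip_rate`, the answer format, and the definition (share of initially correct answers that become wrong under pressure) are assumptions for illustration only.

```python
def flip_rate(neutral, pressured, gold):
    """Share of initially correct answers that become wrong under pressure.

    neutral:   model answers to the plain questions
    pressured: model answers when a false answer is asserted authoritatively
    gold:      the correct answers
    """
    if not (len(neutral) == len(pressured) == len(gold)):
        raise ValueError("all answer lists must have the same length")
    # Only items the model got right without pressure can "flip".
    correct_before = [i for i, (n, g) in enumerate(zip(neutral, gold)) if n == g]
    if not correct_before:
        return 0.0
    flipped = sum(1 for i in correct_before if pressured[i] != gold[i])
    return flipped / len(correct_before)

gold = ["A", "C", "B", "D"]
neutral = ["A", "C", "B", "A"]    # 3 of 4 correct without pressure
pressured = ["A", "B", "D", "A"]  # 2 of those 3 change to a wrong answer
rate = flip_rate(neutral, pressured, gold)
print(round(rate, 3))  # 0.667
```

A lower value would indicate a model that holds on to correct answers when challenged.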
✨Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations
📝 Summary:
This paper introduces the RFxG taxonomy to categorize saliency map explanations by reference-frame and granularity. It proposes novel faithfulness metrics to improve evaluation, aiming to align explanations with diverse user intent and human understanding.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13081
• PDF: https://arxiv.org/pdf/2511.13081
==================================
#ExplainableAI #SaliencyMaps #CognitiveScience #AIEvaluation #AIResearch
✨Taming Generative Synthetic Data for X-ray Prohibited Item Detection
📝 Summary:
Xsyn introduces a one-stage text-to-image synthesis pipeline for X-ray security images. It eliminates labor costs and improves image quality and efficiency for training detection models. This method significantly enhances prohibited item detection performance, outperforming prior approaches.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15299
• PDF: https://arxiv.org/pdf/2511.15299
• Github: https://github.com/pILLOW-1/Xsyn/
==================================
#XraySecurity #GenerativeAI #ComputerVision #SyntheticData #ObjectDetection
✨Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
📝 Summary:
This study explores intrinsic dimension (ID) in large language models, revealing that it is independent of entropy and stratified by genre. Scientific texts show low ID, while creative/opinion writing exhibits hi...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15210
• PDF: https://arxiv.org/pdf/2511.15210
==================================
#IntrinsicDimension #LargeLanguageModels #NLP #TextAnalytics #DataScience
✨Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models
📝 Summary:
Downscaling multimodal models harms visual capabilities, including perception, more than it harms LLM abilities. This paper introduces visual extraction tuning combined with step-by-step reasoning to improve smaller models' efficiency and performance.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17487
• PDF: https://arxiv.org/pdf/2511.17487
• Project Page: https://web.stanford.edu/~markendo/projects/downscaling_intelligence
• Github: https://github.com/markendo/downscaling_intelligence
==================================
#MultimodalAI #SmallModels #ComputerVision #EfficientAI #AIResearch
✨Diversity Has Always Been There in Your Visual Autoregressive Models
📝 Summary:
To combat diversity collapse in Visual Autoregressive models, DiverseVAR modifies feature maps without retraining. This restores generative diversity while maintaining high synthesis quality.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17074
• PDF: https://arxiv.org/pdf/2511.17074
==================================
#VisualAI #GenerativeModels #ModelDiversity #MachineLearning #ComputerVision
✨Loomis Painter: Reconstructing the Painting Process
📝 Summary:
This paper proposes a unified diffusion model framework for generating consistent, high-fidelity multi-media painting processes. It uses semantic control and cross-medium style augmentation to replicate human artistic workflows, supported by a new dataset and evaluation metrics.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17344
• PDF: https://arxiv.org/pdf/2511.17344
• Project Page: https://markus-pobitzer.github.io/lplp/
• Github: https://github.com/Markus-Pobitzer/wlp
🔹 Models citing this paper:
• https://huggingface.co/Markus-Pobitzer/wlp-lora
==================================
#DiffusionModels #GenerativeAI #AIArt #ComputerGraphics #MachineLearning
✨MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging
📝 Summary:
MergeDNA models genomic sequences with a hierarchical architecture and dynamic Token Merging to adaptively chunk bases. This addresses varying information density and lack of a fixed vocabulary, achieving superior performance on DNA benchmarks and multi-omics tasks.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14806
• PDF: https://arxiv.org/pdf/2511.14806
==================================
#Genomics #Bioinformatics #MachineLearning #DNA #MultiOmics
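The idea of adaptively chunking bases by merging adjacent tokens can be illustrated with a toy sketch. This is not MergeDNA's learned merging: the one-hot embeddings, the cosine threshold, and the function name `merge_adjacent` are stand-ins chosen for illustration, and with one-hot inputs the procedure simply groups runs of identical bases.

```python
import math

# Assumed toy embeddings: one-hot vectors for the four bases (other symbols
# such as "N" are not handled and would raise a KeyError).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_adjacent(seq, threshold=0.9):
    """Greedily merge each base into the previous chunk when their
    embeddings are similar enough; otherwise start a new chunk."""
    chunks = []  # list of (chunk_text, summed_embedding)
    for base in seq:
        emb = ONE_HOT[base]
        if chunks and cosine(chunks[-1][1], emb) >= threshold:
            text, total = chunks[-1]
            chunks[-1] = (text + base, tuple(a + b for a, b in zip(total, emb)))
        else:
            chunks.append((base, emb))
    return [text for text, _ in chunks]

tokens = merge_adjacent("AAAACGTTTT")
print(tokens)  # ['AAAA', 'C', 'G', 'TTTT']
```

Repetitive regions collapse into a few long tokens while varied regions keep single-base tokens, which is the intuition behind adapting chunk size to information density.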
✨Insights from the ICLR Peer Review and Rebuttal Process
📝 Summary:
The ICLR 2024-2025 peer review process was analyzed with LLMs to understand score changes. Initial scores and co-reviewer ratings strongly predict changes, and rebuttals aid borderline papers. These insights aim to improve the review process.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15462
• PDF: https://arxiv.org/pdf/2511.15462
• Project Page: https://github.com/papercopilot/iclr-insights
==================================
#PeerReview #LLM #ICLR #AcademicResearch #MachineLearning
✨InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization
📝 Summary:
InstructMix2Mix (I-Mix2Mix) improves multi-view image editing from sparse inputs, which often lack consistency. It distills a 2D diffusion model into a multi-view diffusion model, leveraging its 3D prior for cross-view coherence. This framework significantly enhances multi-view consistency and per-...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14899
• PDF: https://arxiv.org/pdf/2511.14899
• Project Page: https://danielgilo.github.io/instruct-mix2mix/
• Github: https://danielgilo.github.io/instruct-mix2mix/
==================================
#MultiViewEditing #DiffusionModels #ComputerVision #3DVision #ImageSynthesis
✨Computer-Use Agents as Judges for Generative User Interface
📝 Summary:
This paper introduces a framework where Computer-Use Agents (CUA) act as judges for coding language models (Coder) to automatically design GUIs. The goal is to optimize interfaces for CUA efficiency and task solvability, rather than human aesthetics, using a new benchmark called AUI-Gym.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15567
• PDF: https://arxiv.org/pdf/2511.15567
• Project Page: https://showlab.github.io/AUI/
• Github: https://github.com/showlab/AUI/
==================================
#AIAgents #GUIDesign #GenerativeAI #AIevaluation #LanguageModels
✨M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
📝 Summary:
M3-Bench is a new benchmark evaluating multimodal LLM agent tool use in complex, multi-hop workflows requiring visual grounding and tool dependencies. It introduces a similarity-driven alignment method and interpretable metrics. Evaluations show significant gaps in current MLLMs, especially in ar...
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17729
• PDF: https://arxiv.org/pdf/2511.17729
• Github: https://github.com/EtaYang10th/Open-M3-Bench
==================================
#MLLM #LLMAgents #AI #Benchmarking #ToolUse
