✨ Title: Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games
📝 Summary:
A new game structure, Bounded One-Sided Response Games (BORGs), involves actions that briefly transfer control to an opponent to satisfy a condition. A modified Monopoly Deal serves as the benchmark, and standard CFR effectively learns strategies.
🔹 Publication Date: Published on Oct 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25080
• PDF: https://arxiv.org/pdf/2510.25080
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
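Since the summary leans on standard CFR, a minimal sketch of its per-information-set regret-matching update may help. This is the textbook mechanism only, not the paper's BORG-specific implementation (the game tree and counterfactual weighting are omitted):

```python
def regret_matching(cumulative_regrets):
    """Textbook CFR building block: turn cumulative regrets into a strategy.

    Each action is played in proportion to its positive cumulative regret;
    if no action has positive regret, play uniformly at random.
    """
    positives = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    n = len(cumulative_regrets)
    return [1.0 / n] * n
```

Full CFR walks the game tree, accumulates counterfactual regrets at each information set, and averages the strategies this update produces across iterations.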
✨ Title: Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification
📝 Summary:
BOB is a T2I model fine-tuning strategy for synthetic data generation in low-shot fine-grained classification. It extracts class-agnostic attributes to condition fine-tuning, then marginalizes them out during generation. This mitigates overfitting and achieves state-of-the-art results.
🔹 Publication Date: Published on Oct 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24078
• PDF: https://arxiv.org/pdf/2510.24078
• Github: https://github.com/princetonvisualai/BeyondObjects
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
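One plausible reading of "condition fine-tuning on class-agnostic attributes, then marginalize them out during generation" is: pair each training image's class with an explicit attribute string, and at generation time sample attributes so the synthetic set averages over them. A hypothetical sketch; the attribute list, prompt template, and function names are illustrative, not BOB's actual API:

```python
import random

# Illustrative class-agnostic attributes (pose/context, not class identity).
ATTRIBUTES = ["perched on a branch", "in flight", "on the ground"]

def make_training_prompt(class_name, attribute):
    # Fine-tuning prompts pair the class with an explicit attribute,
    # so the attribute is conditioned on rather than absorbed by the class token.
    return f"a photo of a {class_name}, {attribute}"

def generate_marginalized(class_name, generate_fn, n_samples=4, rng=random):
    # At generation time, sample attributes per image so no single attribute
    # dominates, approximating a marginal over the attribute distribution.
    images = []
    for _ in range(n_samples):
        attr = rng.choice(ATTRIBUTES)
        images.append(generate_fn(f"a photo of a {class_name}, {attr}"))
    return images
```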
✨ Title: Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
📝 Summary:
Ling 2.0 introduces reasoning-oriented language models scaling to 1 trillion parameters via a sparse Mixture-of-Experts. Sparsely activated computation boosts reasoning efficiency and capability up to 7-fold over dense models. This demonstrates sparse activation enables scalable, ef...
🔹 Publication Date: Published on Oct 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22115
• PDF: https://arxiv.org/pdf/2510.22115
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
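The sparse Mixture-of-Experts mechanism behind the efficiency claim, in its generic top-k form. This is the standard routing scheme, not Ling 2.0's exact architecture:

```python
import math

def top_k_route(logits, k=2):
    """Generic sparse-MoE routing: keep the k highest-scoring experts,
    softmax over their logits, and zero out the rest. Only the selected
    experts run, so per-token compute scales with k, not the expert count."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]
```

With, say, 256 experts and k=2, under 1% of expert parameters are active per token, which is how a trillion-parameter model can cost far less than a dense one of the same size to run.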
✨ Title: Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
📝 Summary:
This paper presents a co-designed framework for universal video retrieval. It introduces the UVRB benchmark, synthesizes multimodal data, and devises a Modality Pyramid curriculum for the General Video Embedder (GVE). GVE achieves state-of-the-art zero-shot generalization, highlighting limitations ...
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27571
• PDF: https://arxiv.org/pdf/2510.27571
• Project Page: https://gzn00417.github.io/GVE/
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/GVE-3B
• https://huggingface.co/Alibaba-NLP/GVE-7B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
📝 Summary:
This paper optimizes multi-LLM collaboration graphs for test-time scaling (TTS), searching for compute-optimal designs. It proposes Agent-REINFORCE, an LLM-agent framework that uses textual feedback to find them efficiently. Agent-REINFORCE outperforms baselines, balancing accuracy and latency.
🔹 Publication Date: Published on Oct 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00086
• PDF: https://arxiv.org/pdf/2511.00086
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
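For context, classical REINFORCE estimates a policy gradient from sampled rewards; per the summary, Agent-REINFORCE mirrors this loop but drives the search with LLM textual feedback rather than a numeric gradient. A minimal numeric sketch of the classical update it echoes (one-parameter Bernoulli policy, purely illustrative):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reinforce_step(theta, reward_fn, n=64, lr=0.5, rng=None):
    """One classical REINFORCE step for a 1-parameter Bernoulli policy.

    Samples n actions from p = sigmoid(theta), then ascends the
    score-function gradient estimate E[reward * (a - p)], which equals
    E[reward * d/dtheta log p(a)] for this policy.
    """
    rng = rng or random.Random(0)
    p = sigmoid(theta)
    grad = 0.0
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        grad += reward_fn(a) * (a - p)
    return theta + lr * grad / n
```

Rewarding action 1 pushes sigmoid(theta) toward 1; a reward of zero everywhere leaves theta unchanged.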
✨ Title: Towards Robust Mathematical Reasoning
📝 Summary:
The paper presents IMO-Bench, a suite of advanced mathematical reasoning benchmarks at International Mathematical Olympiad level, covering short-answer and proof-writing evaluations. Gemini Deep Think achieved gold-level IMO performance, significantly outperforming other models on IMO-Bench.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01846
• PDF: https://arxiv.org/pdf/2511.01846
• Project Page: https://imobench.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: UniREditBench: A Unified Reasoning-based Image Editing Benchmark
📝 Summary:
UniREditBench is a new benchmark for reasoning-based image editing. It covers diverse scenarios including multi-object interactions and game-worlds, using multimodal evaluation to assess generative models. This helps improve their performance on complex editing tasks.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01295
• PDF: https://arxiv.org/pdf/2511.01295
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: LongCat-Flash-Omni Technical Report
📝 Summary:
LongCat-Flash-Omni is a 560B parameter open-source omni-modal model excelling at low-latency real-time audio-visual interaction. It employs a progressive training strategy and achieves state-of-the-art performance across diverse multimodal benchmarks.
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00279
• PDF: https://arxiv.org/pdf/2511.00279
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
📝 Summary:
TIR-Bench introduces a comprehensive benchmark for evaluating agentic thinking-with-images in AI. It features 13 tasks requiring novel tool use for image processing. The benchmark is universally challenging, demanding genuine thinking-with-images capabilities.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01833
• PDF: https://arxiv.org/pdf/2511.01833
• Github: https://github.com/agents-x-project/TIR-Bench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
📝 Summary:
This paper introduces Unified Diffusion VLA (UD-VLA), a vision-language-action model that jointly optimizes image generation and action prediction. It uses a Joint Discrete Denoising Diffusion Process (JD3P) for intrinsic synergy across modalities. UD-VLA achieves state-of-the-art results on multiple...
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01718
• PDF: https://arxiv.org/pdf/2511.01718
• Project Page: https://irpn-eai.github.io/UD-VLA.github.io/
• Github: https://github.com/OpenHelix-Team/UD-VLA
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: The Underappreciated Power of Vision Models for Graph Structural Understanding
📝 Summary:
Vision models show surprising power for graph understanding, matching GNNs on benchmarks and outperforming them on global structural perception. Our new GraphAbstract benchmark reveals vision models excel at holistic graph properties and scale-invariant reasoning, suggesting their use for graph f...
🔹 Publication Date: Published on Oct 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24788
• PDF: https://arxiv.org/pdf/2510.24788
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
📝 Summary:
ROVER is a new benchmark evaluating reciprocal cross-modal reasoning in unified multimodal models. It tests how models use one modality to guide or verify outputs in another, in both verbal and visual generation tasks. Experiments show cross-modal reasoning is vital for visual generation, but mod...
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01163
• PDF: https://arxiv.org/pdf/2511.01163
• Project Page: https://roverbench.github.io/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Trove: A Flexible Toolkit for Dense Retrieval
📝 Summary:
Trove is an open-source toolkit for dense retrieval that simplifies research. It offers efficient on-the-fly data management, reducing memory use and allowing flexible dataset experiments. Trove is highly customizable and provides a unified, scalable pipeline for evaluation.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01857
• PDF: https://arxiv.org/pdf/2511.01857
• Project Page: https://ir-trove.dev/
• Github: https://github.com/BatsResearch/trove
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Data-Efficient RLVR via Off-Policy Influence Guidance
📝 Summary:
This paper proposes CROPI, a new method for efficient data selection in Reinforcement Learning with Verifiable Rewards (RLVR). It uses off-policy influence estimation and sparse random projection to identify the most valuable data points. CROPI significantly accelerates training, achieving 2.66x spee...
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26491
• PDF: https://arxiv.org/pdf/2510.26491
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
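The two ingredients named in the summary, gradient-influence scoring and sparse random projection, follow a well-known recipe: compress per-example gradients with a sparse random matrix, then score each training point by the dot product of its projected gradient with a target gradient. A generic hedged sketch (dense-list gradients, Achlioptas-style sparse projection), not CROPI's actual implementation:

```python
import random

def sparse_projection(dim_in, dim_out, density=0.1, seed=0):
    """Sparse random projection, stored as one {col: value} dict per output row.
    Most entries are zero; survivors are +/- 1/sqrt(density * dim_out), which
    preserves squared norms in expectation."""
    rng = random.Random(seed)
    scale = (density * dim_out) ** -0.5
    rows = []
    for _ in range(dim_out):
        row = {j: (scale if rng.random() < 0.5 else -scale)
               for j in range(dim_in) if rng.random() < density}
        rows.append(row)
    return rows

def project(grad, rows):
    # Multiply the sparse matrix by a dense gradient vector.
    return [sum(v * grad[j] for j, v in row.items()) for row in rows]

def influence_score(train_grad, target_grad, rows):
    # First-order influence ~ gradient dot product: training points whose
    # (projected) gradient aligns with the target's score highest.
    a = project(train_grad, rows)
    b = project(target_grad, rows)
    return sum(x * y for x, y in zip(a, b))
```

The projection is what makes this tractable: scoring happens in the small projected space instead of the full parameter space.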
✨ Title: How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment
📝 Summary:
This study introduces SurgVeo and the Surgical Plausibility Pyramid to evaluate video generation models in surgery. Experts found Veo-3 visually convincing but lacking in actual surgical understanding. This highlights a critical gap between visual mimicry and causal knowledge in surgical AI.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01775
• PDF: https://arxiv.org/pdf/2511.01775
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
📝 Summary:
UME-R1 introduces generative multimodal embeddings, unifying embedding tasks within a generative paradigm. Its two-stage MLLM training creates reasoning-driven embeddings that significantly outperform conventional discriminative methods and offer a foundation for new forms of interpretability.
🔹 Publication Date: Published on Nov 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00405
• PDF: https://arxiv.org/pdf/2511.00405
• Github: https://github.com/DeepLearnXMU/UME-R1
🔹 Models citing this paper:
• https://huggingface.co/zhibinlan/UME-R1-2B
• https://huggingface.co/zhibinlan/UME-R1-7B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zhibinlan/UME-sft-train
• https://huggingface.co/datasets/zhibinlan/UME-rl-train
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: World Simulation with Video Foundation Models for Physical AI
📝 Summary:
Cosmos-Predict2.5 is a new world foundation model for physical AI, unifying Text, Image, and Video2World generation with enhanced quality and control for robotics. It works with Cosmos-Transfer2.5 for Sim2Real translation. Both are open-source to accelerate embodied intelligence research.
🔹 Publication Date: Published on Oct 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00062
• PDF: https://arxiv.org/pdf/2511.00062
• Github: https://github.com/nvidia-cosmos/cosmos-transfer2.5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
📝 Summary:
Current VLMs struggle with visual measurement reading, especially indicator localization. We introduce MeasureBench, a new benchmark with real-world and synthetic images, and a data synthesis pipeline. VLMs show poor fine-grained spatial grounding, leading to significant numeric errors despite pl...
🔹 Publication Date: Published on Oct 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26865
• PDF: https://arxiv.org/pdf/2510.26865
• Project Page: https://flageval-baai.github.io/MeasureBenchPage/
• Github: https://github.com/flageval-baai/MeasureBench
✨ Datasets citing this paper:
• https://huggingface.co/datasets/FlagEval/MeasureBench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
📝 Summary:
UniLumos is a fast, unified image and video relighting framework. It uses RGB-space geometry feedback to ensure physically plausible results, unlike prior diffusion models. It achieves state-of-the-art quality with a 20x speedup.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01678
• PDF: https://arxiv.org/pdf/2511.01678
• Github: https://github.com/alibaba-damo-academy/Lumos-Custom
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
📝 Summary:
Multimodal LLMs struggle with detailed 3D spatial reasoning and cross-view consistency. This paper introduces Viewpoint Learning with the Viewpoint-100K dataset and a two-stage fine-tuning strategy. Their method significantly activates MLLM spatial reasoning, improving performance on various tasks.
🔹 Publication Date: Published on Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01618
• PDF: https://arxiv.org/pdf/2511.01618
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
✨ Title: ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
📝 Summary:
ToolScope is an agentic framework for MLLMs that unifies global planning with local multimodal perception, using a specialized Perceive tool to manage visual context in long-horizon VQA tasks. It improves performance on VQA benchmarks by an average of 6.69%.
🔹 Publication Date: Published on Oct 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27363
• PDF: https://arxiv.org/pdf/2510.27363
• Github: https://github.com/dengmengjie/ToolScope
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT