ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.01K photos
229 videos
23 files
4.32K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Experience-Guided Adaptation of Inference-Time Reasoning Strategies

📝 Summary:
Experience-Guided Reasoner (EGuR) dynamically generates and optimizes complete computational strategies at inference time using accumulated experience. It adapts LLM calls, tools, and control logic, improving accuracy by up to 14 percent and reducing costs by up to 111x.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11519
• PDF: https://arxiv.org/pdf/2511.11519

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AI #Reasoning #Optimization #MachineLearning
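The experience-guided idea above can be sketched as a store of past (task type, strategy) outcomes that picks the next strategy. A minimal illustration, assuming hypothetical strategy names and a simple success-rate rule; this is not EGuR's actual mechanism:

```python
from collections import defaultdict

class ExperienceStore:
    """Accumulates outcomes per (task type, strategy) and picks the best.

    Illustrative only: EGuR generates and optimizes full computational
    strategies; here a strategy is just a label.
    """
    def __init__(self, strategies):
        self.strategies = strategies
        self.stats = defaultdict(lambda: [0, 0])  # (task, strat) -> [wins, trials]

    def record(self, task_type, strategy, success):
        s = self.stats[(task_type, strategy)]
        s[0] += int(success)
        s[1] += 1

    def best_strategy(self, task_type):
        def rate(strat):
            wins, trials = self.stats[(task_type, strat)]
            return wins / trials if trials else 0.5  # optimistic prior for untried
        return max(self.strategies, key=rate)
```

For example, after recording a success for "tool_use" and a failure for "direct" on math tasks, `best_strategy("math")` returns "tool_use".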
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

📝 Summary:
Tool-augmented LLMs exhibit Tool-Induced Myopia (TIM), treating tool outputs as substitutes for true reasoning. This improves final-answer accuracy but significantly degrades reasoning quality. A proposed framework realigns these models to use tools as assistive evidence, enhancing both accuracy an...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10899
• PDF: https://arxiv.org/pdf/2511.10899

==================================

#LLM #AIResearch #Reasoning #ToolAugmentation #AIHallucinations
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

📝 Summary:
An analysis of miniF2F showed that errors in problem statements held AI systems to 36% accuracy. Correcting these errors produced miniF2F-v2, raising accuracy to 70%. High-quality benchmarks like miniF2F-v2 are crucial for evaluating progress in formal reasoning.

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03108
• PDF: https://arxiv.org/pdf/2511.03108
• Github: https://github.com/roozbeh-yz/miniF2F_v2

Datasets citing this paper:
https://huggingface.co/datasets/roozbeh-yz/miniF2F_v2

==================================

#AI #FormalReasoning #Benchmarks #MachineLearning #Dataset
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

📝 Summary:
GGBench is a new benchmark for evaluating geometric generative reasoning in unified multimodal models. It addresses a critical gap by assessing integrated cognitive processes, requiring language comprehension and precise visual generation to actively construct solutions. This sets a rigorous stan...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11134
• PDF: https://arxiv.org/pdf/2511.11134

==================================

#GGBench #MultimodalAI #GeometricReasoning #GenerativeAI #AIResearch
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

📝 Summary:
A parallel multimodal diffusion framework, MMaDA-Parallel, enhances cross-modal alignment and semantic consistency in thinking-aware image synthesis by addressing error propagation issues in sequentia...

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09611
• PDF: https://arxiv.org/pdf/2511.09611
• Project Page: https://tyfeld.github.io/mmadaparellel.github.io/
• Github: https://github.com/tyfeld/MMaDA-Parallel

🔹 Models citing this paper:
https://huggingface.co/tyfeld/MMaDA-Parallel-A
https://huggingface.co/tyfeld/MMaDA-Parallel-M

==================================

#MultimodalAI #DiffusionModels #ImageSynthesis #LLM #AIResearch
UFO^3: Weaving the Digital Agent Galaxy

📝 Summary:
UFO^3 unifies diverse digital devices into a single orchestration fabric, enabling AI agents to collaborate seamlessly across platforms. It models tasks dynamically for asynchronous execution, achieving efficient, resilient, and accurate cross-device task orchestration with improved parallelism a...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11332
• PDF: https://arxiv.org/pdf/2511.11332
• Project Page: https://microsoft.github.io/UFO/
• Github: https://github.com/microsoft/UFO/

==================================

#AIAgents #TaskOrchestration #DistributedSystems #EdgeAI #MultiAgentSystems
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models

📝 Summary:
VLMs degrade under test-time domain shifts. Spectrum-Aware Test-Time Steering (STS) is a lightweight method that adapts VLM latent representations by steering them using textual embedding subspaces, without backpropagation. STS surpasses the state of the art while offering faster inference and lower memory use.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09809
• PDF: https://arxiv.org/pdf/2511.09809

==================================

#VisionLanguageModels #ZeroShotGeneralization #DomainAdaptation #DeepLearning #AI
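The steering idea above can be sketched as projecting a latent onto the span of text embeddings and nudging it toward that projection, with no gradients involved. A minimal sketch, assuming an orthonormal text basis and a hypothetical blending weight `alpha`; the paper's actual STS procedure may differ:

```python
# Gradient-free latent steering toward a textual embedding subspace.
# Vectors are plain Python lists for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto(v, basis):
    """Project v onto the span of (assumed orthonormal) basis vectors."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def steer(latent, text_basis, alpha=0.5):
    """Blend the latent with its projection on the text subspace.

    alpha=0 leaves the latent unchanged; alpha=1 replaces it with the
    projection. No backpropagation is needed at any point.
    """
    proj = project_onto(latent, text_basis)
    return [(1 - alpha) * z + alpha * p for z, p in zip(latent, proj)]
```

For example, steering `[1.0, 1.0]` toward the subspace spanned by `[1.0, 0.0]` with `alpha=0.5` yields `[1.0, 0.5]`: the off-subspace component is halved.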
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

📝 Summary:
Uni-MoE-2.0-Omni is an open-source omnimodal large model improving multimodal understanding, reasoning, and generation. It uses dynamic MoE and progressive training to achieve state-of-the-art results across 85 benchmarks, outperforming leading models like Qwen2.5-Omni.

🔹 Publication Date: Published on Nov 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12609
• PDF: https://arxiv.org/pdf/2511.12609
• Project Page: https://idealistxy.github.io/Uni-MoE-v2.github.io/
• Github: https://github.com/HITsz-TMG/Uni-MoE

🔹 Models citing this paper:
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Omni
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Base
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Image

==================================

#OmnimodalAI #LLMs #MixtureOfExperts #MultimodalLearning #AIResearch
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

📝 Summary:
GroupRank introduces a novel groupwise reranking paradigm addressing limitations of pointwise and listwise methods. It processes queries with document groups to assign comparative relevance scores, combining flexibility with global context. Trained via reinforcement learning and synthesized data,...

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11653
• PDF: https://arxiv.org/pdf/2511.11653

==================================

#Reranking #ReinforcementLearning #InformationRetrieval #MachineLearning #DataScience
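The groupwise idea above can be sketched as scoring candidates jointly within groups so scores are comparative, then merging groups. A toy illustration with a stand-in scoring function (GroupRank's actual scorer is an RL-trained LLM, not the term-overlap heuristic used here):

```python
import math

def chunk(items, size):
    """Split candidates into fixed-size groups."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def group_scores(query, group, raw_score):
    """Softmax-normalize raw scores so they are comparative within a group."""
    raw = [raw_score(query, d) for d in group]
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    z = sum(exps)
    return [e / z for e in exps]

def groupwise_rerank(query, docs, raw_score, group_size=4):
    """Score each group jointly, then merge all groups into one ranking."""
    scored = []
    for group in chunk(docs, group_size):
        scored += list(zip(group, group_scores(query, group, raw_score)))
    return [d for d, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

Note the design trade-off the paper targets: pointwise scoring loses comparative context, listwise scoring is inflexible for long lists; grouping keeps within-group comparison while remaining composable across groups.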
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

📝 Summary:
TiViBench is a new benchmark assessing image-to-video models' reasoning across four dimensions and 24 tasks. Commercial models show stronger reasoning potential. VideoTPO, a test-time strategy, significantly enhances performance, advancing reasoning in video generation.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13704
• PDF: https://arxiv.org/pdf/2511.13704
• Project Page: https://haroldchen19.github.io/TiViBench-Page/
• Github: https://haroldchen19.github.io/TiViBench-Page/

==================================

#VideoGeneration #AIBenchmark #ComputerVision #DeepLearning #AIResearch
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

📝 Summary:
PhysX-Anything generates simulation-ready physical 3D assets from single images, crucial for embodied AI. It uses a novel VLM-based model and an efficient 3D representation, enabling direct use in robotic policy learning.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13648
• PDF: https://arxiv.org/pdf/2511.13648
• Project Page: https://physx-anything.github.io/
• Github: https://github.com/ziangcao0312/PhysX-Anything

Datasets citing this paper:
https://huggingface.co/datasets/Caoza/PhysX-Mobility

==================================

#EmbodiedAI #3DReconstruction #Robotics #ComputerVision #AIResearch
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

📝 Summary:
Part-X-MLLM is a 3D multimodal large language model that unifies diverse 3D tasks by generating structured programs from RGB point clouds and language prompts. It outputs part-level data and edit commands, enabling state-of-the-art 3D generation and editing through one interface.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13647
• PDF: https://arxiv.org/pdf/2511.13647
• Project Page: https://chunshi.wang/Part-X-MLLM/

==================================

#3D #MLLM #GenerativeAI #ComputerVision #AIResearch
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

📝 Summary:
OlmoEarth is a novel multimodal spatio-temporal foundation model for Earth observation data. It employs new self-supervised learning methods to achieve state-of-the-art performance on many tasks. It is deployed as a platform for non-profits and NGOs.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13655
• PDF: https://arxiv.org/pdf/2511.13655
• Project Page: https://olmoearth.allenai.org/
• Github: https://github.com/allenai/olmoearth_pretrain

==================================

#EarthObservation #FoundationModels #AI #RemoteSensing #SelfSupervisedLearning
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

📝 Summary:
Live-SWE-agent is the first live software engineering agent that autonomously and continuously evolves itself on-the-fly during runtime. It starts with basic tools and refines its own implementation while solving problems. It achieves 75.4% on SWE-bench Verified and 45.8% on SWE-Bench Pro, outper...

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13646
• PDF: https://arxiv.org/pdf/2511.13646

==================================

#SoftwareEngineering #AI #AutonomousAgents #SelfEvolvingAI #LiveSWEagent
WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

📝 Summary:
WebCoach introduces a self-evolving framework for web agents with persistent cross-session memory. It uses a WebCondenser, External Memory Store, and a Coach to learn from past experiences without retraining. This significantly improves task success and enables smaller models to match larger LLM ...

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12997
• PDF: https://arxiv.org/pdf/2511.12997

==================================

#WebAgents #AI #MachineLearning #LLM #MemoryAI
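The cross-session memory loop above can be sketched as condense, store, retrieve, inject. Component names follow the summary, but the keyword-overlap retrieval and the condensing rule are illustrative stand-ins, not WebCoach's actual implementation:

```python
class ExternalMemoryStore:
    """Persists condensed experiences across sessions; no retraining."""
    def __init__(self):
        self.entries = []  # list of (keyword set, advice string)

    def add(self, keywords, advice):
        self.entries.append((set(keywords), advice))

    def retrieve(self, task, k=2):
        """Return the k entries whose keywords best overlap the task."""
        words = set(task.lower().split())
        ranked = sorted(self.entries, key=lambda e: len(e[0] & words), reverse=True)
        return [advice for _, advice in ranked[:k]]

def condense(trajectory):
    """WebCondenser stand-in: reduce a session log to keywords plus a tip."""
    keywords = {w.lower() for step in trajectory for w in step.split()}
    return keywords, f"Previously solved in {len(trajectory)} steps."

def coach(task, store):
    """Coach stand-in: inject retrieved experience into the agent's prompt."""
    hints = store.retrieve(task)
    return task if not hints else task + "\nHints: " + " ".join(hints)
```

Because guidance arrives through the prompt rather than through weight updates, the same loop can wrap a small model, which is how the summary's claim of smaller models matching larger ones becomes plausible.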
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

📝 Summary:
MiroThinker v1.0 is an open-source research agent introducing 'interactive scaling.' It trains models with reinforcement learning for deeper agent-environment interactions, performing up to 600 tool calls per task. This achieves state-of-the-art performance and establishes interaction depth as a ...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11793
• PDF: https://arxiv.org/pdf/2511.11793
• Project Page: https://dr.miromind.ai/
• Github: https://github.com/MiroMindAI/MiroThinker

🔹 Models citing this paper:
https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B
https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B
https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B

==================================

#MiroThinker #ResearchAgents #ReinforcementLearning #OpenSourceAI #LLM
P1: Mastering Physics Olympiads with Reinforcement Learning

📝 Summary:
P1 is a family of open-source physics reasoning models trained via reinforcement learning. P1-235B-A22B achieved Gold-medal performance at IPhO 2025 and won 12 other competitions. These models also show strong generalizability on other reasoning tasks.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13612
• PDF: https://arxiv.org/pdf/2511.13612
• Project Page: https://prime-rl.github.io/P1/
• Github: https://github.com/PRIME-RL/P1

==================================

#ReinforcementLearning #Physics #AI #MachineLearning #OpenSource
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model

📝 Summary:
MicroVQA++ is a new high-quality microscopy VQA dataset built via a three-stage process. This includes HiCQA-Graph, a novel filtering method using NLI, CLIP, and MLLM signals. The dataset enables strong microscopy reasoning performance for MLLMs.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11407
• PDF: https://arxiv.org/pdf/2511.11407
• Github: https://github.com/ieellee/MicroVQA-PlusPlus

==================================

#MLLM #Microscopy #VQA #AIResearch #Dataset
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

📝 Summary:
SoCE is a novel model souping technique that boosts LLM performance. It uses non-uniform weighted averaging of expert models identified for specific benchmark categories, unlike uniform methods. This leads to state-of-the-art results and improved robustness.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13254
• PDF: https://arxiv.org/pdf/2511.13254

==================================

#LLMs #ModelSouping #MachineLearning #AI #StateOfTheArt
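The "simple arithmetic" above is a weighted average of parameter vectors from expert checkpoints. A minimal sketch, assuming the per-expert weights are already given (SoCE derives them from per-benchmark-category performance, which is not modeled here):

```python
def soup(experts, weights):
    """Non-uniform model soup: weighted average of parameter lists.

    experts: list of parameter vectors (one per expert checkpoint),
    weights: one weight per expert, summing to 1. Uniform souping is
    the special case weights = [1/n] * n.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    merged = [0.0] * len(experts[0])
    for params, w in zip(experts, weights):
        merged = [m + w * p for m, p in zip(merged, params)]
    return merged
```

For example, `soup([[1.0, 2.0], [3.0, 4.0]], [0.75, 0.25])` returns `[1.5, 2.5]`, weighting the first expert three times as heavily as the second; in practice the same elementwise average is applied to every tensor in the checkpoints.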