ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance

🔹 Publication Date: Published on Aug 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06944
• PDF: https://arxiv.org/pdf/2508.06944
• Github: https://github.com/TSYJ-He/AMFT

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

🔹 Publication Date: Published on Aug 3

🔹 AI-generated summary: LiveMCPBench provides a comprehensive benchmark for evaluating LLM agents across a diverse set of real-world tasks in the MCP ecosystem, using a scalable evaluation pipeline and adaptive judging framework.

🔹 Abstract: With the rapid development of Model Context Protocol (MCP), the number of MCP servers has surpassed 10,000. However, existing MCP benchmarks are limited to single-server settings with only a few tools, hindering effective evaluation of agent capabilities in large-scale, real-world scenarios. To address this limitation, we present LiveMCPBench, the first comprehensive benchmark comprising 95 real-world tasks grounded in the MCP ecosystem, designed to evaluate LLM agents at scale across diverse servers. To support a scalable and reproducible evaluation pipeline in large-scale MCP environments, we curate LiveMCPTool, a diverse and readily deployable collection of 70 MCP servers and 527 tools. Furthermore, we introduce LiveMCPEval, an LLM-as-a-Judge framework that enables automated and adaptive evaluation in dynamic, time-varying task environments, achieving 81% agreement with human reviewers. Finally, we propose the MCP Copilot Agent, a multi-step agent that routes tools for dynamic planning and executes tools for API interaction across the entire LiveMCPTool suite. Our evaluation covers 10 leading models, with the best-performing model (Claude-Sonnet-4) reaching a 78.95% success rate. However, we observe large performance variance across models, and several widely-used models perform poorly in LiveMCPBench's complex, tool-rich environments. Overall, LiveMCPBench offers the first unified framework for benchmarking LLM agents in realistic, tool-rich, and dynamic MCP environments, laying a solid foundation for scalable and reproducible research on agent capabilities. Our code and data will be publicly available at https://icip-cas.github.io/LiveMCPBench.
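The two headline numbers in this abstract, a task success rate and a judge-versus-human agreement rate, are both simple proportions. The sketch below shows one way such figures could be computed; it is not the paper's evaluation code, the function names `rate` and `agreement` are made up for illustration, and the verdict lists are hypothetical rather than data from LiveMCPBench. The last line only checks that 75 successes out of 95 tasks would round to the reported 78.95%, which is an inference from the abstract and not a figure it states.

```python
from typing import Sequence


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True entries, e.g. a task success rate."""
    if not flags:
        raise ValueError("need at least one outcome")
    return sum(flags) / len(flags)


def agreement(judge: Sequence[bool], human: Sequence[bool]) -> float:
    """Share of tasks where the automated judge and the human give the same verdict."""
    if len(judge) != len(human):
        raise ValueError("verdict lists must have the same length")
    return rate([j == h for j, h in zip(judge, human)])


# Hypothetical verdicts for five tasks (assumption: not data from the paper).
judge_verdicts = [True, True, False, True, False]
human_verdicts = [True, False, False, True, False]

print(f"success rate per judge: {rate(judge_verdicts):.2%}")
print(f"judge-human agreement: {agreement(judge_verdicts, human_verdicts):.2%}")

# Arithmetic check: 75 of 95 tasks rounds to 78.95%.
print(f"{75 / 95:.2%}")
```

Plain agreement does not correct for verdicts that match by chance; a chance-corrected statistic would be a different measure from the percentage quoted above.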

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01780
• PDF: https://arxiv.org/pdf/2508.01780

🔹 Datasets citing this paper:
https://huggingface.co/datasets/ICIP/LiveMCPBench

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09667
• PDF: https://arxiv.org/pdf/2508.09667

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

🔹 Publication Date: Published on Aug 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06937
• PDF: https://arxiv.org/pdf/2508.06937
• Project Page: https://vaynexie.github.io/CannyEdit/
• Github: https://github.com/vaynexie/CannyEdit/tree/main

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning

🔹 Publication Date: Published on Aug 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01522
• PDF: https://arxiv.org/pdf/2508.01522
• Project Page: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
• Github: https://github.com/Aerial-Manipulation-Lab/MARL_cooperative_aerial_manipulation_ext

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ASM-UNet: Adaptive Scan Mamba Integrating Group Commonalities and Individual Variations for Fine-Grained Segmentation

🔹 Publication Date: Published on Aug 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07237
• PDF: https://arxiv.org/pdf/2508.07237
• Github: https://github.com/YqunYang/ASM-UNet

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09726
• PDF: https://arxiv.org/pdf/2508.09726

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: μ-Parametrization for Mixture of Experts

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09752
• PDF: https://arxiv.org/pdf/2508.09752

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering

🔹 Publication Date: Published on Aug 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07321
• PDF: https://arxiv.org/pdf/2508.07321

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09603
• PDF: https://arxiv.org/pdf/2508.09603

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09848
• PDF: https://arxiv.org/pdf/2508.09848
• Project Page: https://gorov.github.io/prelude
• Leaderboard: https://gorov.github.io/prelude/leaderboard.html

🔹 Datasets citing this paper:
https://huggingface.co/datasets/ttchungc/PRELUDE

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10860
• PDF: https://arxiv.org/pdf/2508.10860

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10576
• PDF: https://arxiv.org/pdf/2508.10576
• Project Page: https://digital-avatar.github.io/ai/HumanSense/
• Github: https://digital-avatar.github.io/ai/HumanSense/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: UI-Venus Technical Report: Building High-performance UI Agents with RFT

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10833
• PDF: https://arxiv.org/pdf/2508.10833
• Github: https://github.com/antgroup/UI-Venus

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10711
• PDF: https://arxiv.org/pdf/2508.10711
• Github: https://github.com/stepfun-ai/NextStep-1

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10751
• PDF: https://arxiv.org/pdf/2508.10751
• Project Page: https://github.com/RUCAIBox/Passk_Training
• Github: https://github.com/RUCAIBox/Passk_Training

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: AgroBench: Vision-Language Model Benchmark in Agriculture

🔹 Publication Date: Published on Jul 28

🔹 AI-generated summary: AgroBench evaluates vision-language models across agricultural tasks, revealing areas for improvement in fine-grained identification, particularly weed identification, with expert-annotated categories.

🔹 Abstract: Precise automated understanding of agricultural tasks such as disease identification is essential for sustainable crop production. Recent advances in vision-language models (VLMs) are expected to further expand the range of agricultural tasks by facilitating human-model interaction through easy, text-based communication. Here, we introduce AgroBench (Agronomist AI Benchmark), a benchmark for evaluating VLM models across seven agricultural topics, covering key areas in agricultural engineering and relevant to real-world farming. Unlike recent agricultural VLM benchmarks, AgroBench is annotated by expert agronomists. Our AgroBench covers a state-of-the-art range of categories, including 203 crop categories and 682 disease categories, to thoroughly evaluate VLM capabilities. In our evaluation on AgroBench, we reveal that VLMs have room for improvement in fine-grained identification tasks. Notably, in weed identification, most open-source VLMs perform close to random. With our wide range of topics and expert-annotated categories, we analyze the types of errors made by VLMs and suggest potential pathways for future VLM development. Our dataset and code are available at https://dahlian00.github.io/AgroBenchPage/.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.20519
• PDF: https://arxiv.org/pdf/2507.20519
• Project Page: https://dahlian00.github.io/AgroBenchPage/
• Github: https://dahlian00.github.io/AgroBenchPage/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10881
• PDF: https://arxiv.org/pdf/2508.10881
• Project Page: https://lg-li.github.io/project/tooncomposer
• Github: https://github.com/TencentARC/ToonComposer

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: A Survey on Diffusion Language Models

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10875
• PDF: https://arxiv.org/pdf/2508.10875

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10637
• PDF: https://arxiv.org/pdf/2508.10637
• Github: https://github.com/ryan-caesar-ramos/visual-encoder-traces

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10893
• PDF: https://arxiv.org/pdf/2508.10893
• Project Page: https://nirvanalan.github.io/projects/stream3r
• Github: https://github.com/NIRVANALAN/STream3R

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
