ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.03K photos
230 videos
23 files
4.34K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: Retrieval-augmented reasoning with lean language models

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11386
• PDF: https://arxiv.org/pdf/2508.11386

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

🔹 Publication Date: Published on Aug 3

🔹 Abstract: A sketch-based strand generation model using a learnable upsampling strategy and a multi-scale adaptive conditioning mechanism outperforms existing methods in realism and precision for hair strand generation.

AI-generated summary: Realistic hair strand generation is crucial for applications like computer graphics and virtual reality. While diffusion models can generate hairstyles from text or images, these inputs lack precision and user-friendliness. Instead, we propose the first sketch-based strand generation model, which offers finer control while remaining user-friendly. Our framework tackles key challenges, such as modeling complex strand interactions and diverse sketch patterns, through two main innovations: a learnable strand upsampling strategy that encodes 3D strands into multi-scale latent spaces, and a multi-scale adaptive conditioning mechanism using a transformer with diffusion heads to ensure consistency across granularity levels. Experiments on several benchmark datasets show our method outperforms existing approaches in realism and precision. Qualitative results further confirm its effectiveness. Code will be released at [GitHub](https://github.com/fighting-Zhang/StrandDesigner).
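
The multi-scale idea in the abstract can be sketched as a toy pyramid over strand control points. This is an illustrative assumption, not the released code: the paper learns its upsampling operator, whereas this sketch uses fixed point-halving and midpoint interpolation, and all names and shapes here are made up.

```python
import numpy as np

def encode_multiscale(strand, levels=3):
    """Encode a 3D strand polyline into a pyramid of coarser latents by
    repeatedly halving the number of control points (toy stand-in for the
    paper's multi-scale latent encoding)."""
    pyramid = [strand]
    cur = strand
    for _ in range(levels - 1):
        cur = cur[::2]          # keep every other point: one scale coarser
        pyramid.append(cur)
    return pyramid[::-1]        # coarsest first

def upsample(coarse):
    """Fixed midpoint upsampling between control points; the paper learns
    this operator instead of fixing it."""
    mids = (coarse[:-1] + coarse[1:]) / 2.0
    out = np.empty((len(coarse) + len(mids), 3))
    out[0::2] = coarse
    out[1::2] = mids
    return out

strand = np.stack([np.linspace(0.0, 1.0, 16)] * 3, axis=1)  # 16 points in 3D
pyramid = encode_multiscale(strand)
print([len(p) for p in pyramid])  # coarsest to finest: [4, 8, 16]
```

A conditioning mechanism would then inject sketch features at each of these granularity levels, which is where the paper's transformer with diffusion heads comes in.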

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01650
• PDF: https://arxiv.org/pdf/2508.01650
• Github: https://github.com/fighting-Zhang/StrandDesigner

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

🔹 Publication Date: Published on Aug 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11987
• PDF: https://arxiv.org/pdf/2508.11987
• Project Page: https://futurex-ai.github.io/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/futurex-ai/Futurex-Online
https://huggingface.co/datasets/futurex-ai/Futurex-Past

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13491
• PDF: https://arxiv.org/pdf/2508.13491

🔹 Datasets citing this paper:
https://huggingface.co/datasets/NextGenWhu/FinCDM-FinEval-KQA
https://huggingface.co/datasets/NextGenWhu/FinCDM-CPA-KQA

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14811
• PDF: https://arxiv.org/pdf/2508.14811
• Project Page: https://aim-uofa.github.io/Tinker/
• Github: https://github.com/aim-uofa/Tinker

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: RynnEC: Bringing MLLMs into Embodied World

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14160
• PDF: https://arxiv.org/pdf/2508.14160
• Github: https://github.com/alibaba-damo-academy/RynnEC

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Multimodal Referring Segmentation: A Survey

🔹 Publication Date: Published on Aug 1

🔹 Abstract: A survey of multimodal referring segmentation techniques, covering advancements in convolutional neural networks, transformers, and large language models for segmenting objects in images, videos, and 3D scenes based on text or audio instructions.

AI-generated summary: Multimodal referring segmentation aims to segment target objects in visual scenes, such as images, videos, and 3D scenes, based on referring expressions in text or audio format. This task plays a crucial role in practical applications requiring accurate object perception based on user instructions. Over the past decade, it has gained significant attention in the multimodal community, driven by advances in convolutional neural networks, transformers, and large language models, all of which have substantially improved multimodal perception capabilities. This paper provides a comprehensive survey of multimodal referring segmentation. We begin by introducing the field's background, including problem definitions and commonly used datasets. Next, we summarize a unified meta-architecture for referring segmentation and review representative methods across three primary visual scenes: images, videos, and 3D scenes. We further discuss Generalized Referring Expression (GREx) methods to address the challenges of real-world complexity, along with related tasks and practical applications. Extensive performance comparisons on standard benchmarks are also provided. We continually track related works at https://github.com/henghuiding/Awesome-Multimodal-Referring-Segmentation.
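
The "unified meta-architecture" the survey mentions typically means: one encoder per modality, cross-modal fusion, then a mask decoder. A minimal toy sketch of that interface, with every component replaced by a deterministic stand-in (these functions are illustrative assumptions, not any surveyed model):

```python
import numpy as np

rng = np.random.default_rng(0)

def visual_encoder(image):
    """Stand-in: linearly project each pixel to an 8-dim feature."""
    H, W, C = image.shape
    proj = rng.standard_normal((C, 8))
    return image.reshape(H * W, C) @ proj          # (H*W, 8)

def text_encoder(expression):
    """Stand-in: hash each word of the referring expression into a
    normalized 8-dim embedding."""
    v = np.zeros(8)
    for w in expression.split():
        v[sum(ord(c) for c in w) % 8] += 1.0
    return v / max(np.linalg.norm(v), 1e-8)

def mask_decoder(pix_feats, txt_emb, H, W):
    """Fusion by dot product, thresholded into a binary mask."""
    scores = pix_feats @ txt_emb
    return (scores > scores.mean()).reshape(H, W)

image = rng.random((4, 4, 3))
mask = mask_decoder(visual_encoder(image), text_encoder("the red ball"), 4, 4)
print(mask.shape)   # (4, 4) binary mask over the image
```

Real methods replace each stand-in with a CNN/transformer backbone, a language or audio model, and attention-based fusion, but they share this encode-fuse-decode shape.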

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00265
• PDF: https://arxiv.org/pdf/2508.00265

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14896
• PDF: https://arxiv.org/pdf/2508.14896

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14444
• PDF: https://arxiv.org/pdf/2508.14444

🔹 Datasets citing this paper:
https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14460
• PDF: https://arxiv.org/pdf/2508.14460

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14187
• PDF: https://arxiv.org/pdf/2508.14187
• Project Page: https://ashiq24.github.io/local-scale-equivariance/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10137
• PDF: https://arxiv.org/pdf/2508.10137

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction

🔹 Publication Date: Published on Aug 5

🔹 Abstract: CoTox, a framework integrating LLMs with chain-of-thought reasoning, enhances multi-toxicity prediction by incorporating chemical structure data, biological pathways, and gene ontology terms, improving interpretability and predictive performance in drug development.

AI-generated summary: Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability, as does their inability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and a transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates LLMs with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and gene ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning models. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with drugs and incorporate the resulting biological context into the CoTox framework. This approach allows CoTox to generate toxicity predictions aligned with physiological responses, as shown in a case study. This result highlights the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompts used in this work are available at https://github.com/dmis-lab/CoTox.
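
The core input assembly (IUPAC name plus pathway and GO context feeding a step-by-step prompt) can be sketched as below. The field names and wording are hypothetical illustrations of the idea, not the authors' actual prompts; see the linked repo for those.

```python
# Hypothetical sketch of assembling a chain-of-thought multi-toxicity
# prompt in the spirit of CoTox. All wording here is illustrative.

def build_toxicity_prompt(iupac_name, pathways, go_terms, endpoints):
    lines = [
        f"Chemical (IUPAC name): {iupac_name}",
        "Perturbed biological pathways: " + "; ".join(pathways),
        "Associated GO terms: " + "; ".join(go_terms),
        "Toxicity endpoints to assess: " + ", ".join(endpoints),
        "Reason step by step about how the structure and the biological",
        "context above could lead to each toxicity, then answer each",
        "endpoint with Yes or No.",
    ]
    return "\n".join(lines)

prompt = build_toxicity_prompt(
    iupac_name="2-acetyloxybenzoic acid",   # aspirin; per the paper, IUPAC
                                            # names are easier for LLMs than SMILES
    pathways=["arachidonic acid metabolism"],
    go_terms=["GO:0004666 prostaglandin-endoperoxide synthase activity"],
    endpoints=["hepatotoxicity", "cardiotoxicity"],
)
print(prompt.splitlines()[0])
```

The returned string would be sent to an LLM such as GPT-4o; the step-by-step instruction is what elicits the CoT rationale alongside the per-endpoint predictions.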

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03159
• PDF: https://arxiv.org/pdf/2508.03159

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14111
• PDF: https://arxiv.org/pdf/2508.14111

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13745
• PDF: https://arxiv.org/pdf/2508.13745

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13680
• PDF: https://arxiv.org/pdf/2508.13680
• Project Page: https://vi-exam.github.io

🔹 Datasets citing this paper:
https://huggingface.co/datasets/anvo25/viexam

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11408
• PDF: https://arxiv.org/pdf/2508.11408
• Github: https://github.com/modelscope/Trinity-RFT

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14704
• PDF: https://arxiv.org/pdf/2508.14704
• Project Page: https://mcp-universe.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14568
• PDF: https://arxiv.org/pdf/2508.14568
• Github: https://github.com/KULeuven-COSIC/leuvenshtein

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: VeriGUI: Verifiable Long-Chain GUI Dataset

🔹 Publication Date: Published on Aug 6

🔹 Abstract: VeriGUI is a novel dataset for evaluating GUI agents in long-horizon tasks, emphasizing long-chain complexity and subtask-level verifiability.

AI-generated summary: Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents.
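
The two dataset properties (a long chain of interdependent subtasks, each with its own verifiable goal so any subtask can serve as a starting point) can be sketched with a toy schema. This is not the released data format; every class, field, and the scripted agent below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    instruction: str
    verify: Callable[[dict], bool]   # checks the environment state, not the trajectory

@dataclass
class LongChainTask:
    subtasks: list

    def run_from(self, start, state, agent):
        """Start at any subtask index; each subtask goal is checked
        independently of how the agent got there (subtask-level
        verifiability)."""
        results = []
        for sub in self.subtasks[start:]:
            state = agent(sub.instruction, state)
            results.append(sub.verify(state))
        return results

task = LongChainTask([
    Subtask("open the report page", lambda s: s.get("page") == "report"),
    Subtask("export as CSV",        lambda s: "report.csv" in s.get("files", [])),
])

def scripted_agent(instruction, state):   # toy agent for illustration
    state = dict(state)
    if "open" in instruction:
        state["page"] = "report"
    else:
        state.setdefault("files", []).append("report.csv")
    return state

print(task.run_from(0, {}, scripted_agent))                  # full chain
print(task.run_from(1, {"page": "report"}, scripted_agent))  # start mid-chain
```

Outcome-only verification would collapse the `results` list to a single final check; keeping one verifier per subtask is what lets the benchmark localize where a long-horizon agent fails.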

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04026
• PDF: https://arxiv.org/pdf/2508.04026
• Github: https://github.com/VeriGUI-Team/VeriGUI

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

🔹 Publication Date: Published on Aug 6

🔹 Abstract: SEAgent, an agentic self-evolving framework, enables computer-use agents to autonomously master novel software through experiential learning and a curriculum of tasks, achieving superior performance compared to existing methods.

AI-generated summary: Repurposing large vision-language models (LVLMs) as computer-use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly in scenarios lacking human annotations. To address this challenge, we propose SEAgent, an agentic self-evolving framework enabling CUAs to autonomously evolve through interactions with unfamiliar software. Specifically, SEAgent empowers computer-use agents to autonomously master novel software environments via experiential learning, where agents explore new software, learn through iterative trial and error, and progressively tackle auto-generated tasks organized from simple to complex. To achieve this goal, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that produces increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, comprising adversarial imitation of failure actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates individual experiential insights from specialist agents, facilitating the development of a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately surpasses ensembles of individual specialist agents on their specialized software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2% in success rate, from 11.3% to 34.5%, over a competitive open-source CUA, i.e., UI-TARS.
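
The GRPO update mentioned in the abstract scores each rollout relative to its group rather than against a learned value function. A minimal sketch of that advantage term (only this one piece of the method; SEAgent additionally uses adversarial imitation of failure actions, which is not shown here):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group Relative Policy Optimization normalizes each rollout's reward
    against the mean and std of its own group of rollouts, so no critic
    network is needed (sketch of the advantage term only)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# A group of 4 rollouts of the same task: two successes, two failures.
adv = grpo_advantages([1.0, 1.0, 0.0, 0.0])
print(adv)   # successes get positive advantage, failures negative
```

In a setup like SEAgent's, the World State Model would supply these per-trajectory rewards, and the policy gradient would weight each successful trajectory's log-probabilities by its normalized advantage.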

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04700
• PDF: https://arxiv.org/pdf/2508.04700
• Github: https://github.com/SunzeY/SEAgent

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Zery/WSM-7B-AgentRewardBench

🔹 Spaces citing this paper:
No spaces found
==================================
