ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.03K photos
230 videos
23 files
4.34K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12945
• PDF: https://arxiv.org/pdf/2508.12945
• Project Page: https://lumen-relight.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: Phi-Ground Tech Report: Advancing Perception in GUI Grounding

🔹 Publication Date: Published on Jul 31

🔹 Abstract: The Phi-Ground model family achieves state-of-the-art performance in GUI grounding for multimodal reasoning models, improving accuracy across various benchmarks. AI-generated summary: With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly determines the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from ready for deployment, as a single misclick can result in unacceptable consequences. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results, with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/
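The benchmark scores above come from the standard GUI-grounding metric: a predicted click counts as correct only if it lands inside the target element's bounding box. A minimal sketch of that metric (function and variable names are illustrative, not from the Phi-Ground codebase):

```python
# Standard GUI-grounding accuracy: a predicted click (x, y) is correct
# when it falls inside the ground-truth bounding box of the target
# UI element. Names here are illustrative, not from Phi-Ground.

def click_in_bbox(click, bbox):
    """bbox = (left, top, right, bottom) in pixels."""
    x, y = click
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom

def grounding_accuracy(predictions, targets):
    """Fraction of predicted clicks that land inside their target boxes."""
    hits = sum(click_in_bbox(p, t) for p, t in zip(predictions, targets))
    return hits / len(targets)

preds = [(105, 42), (300, 500), (12, 12)]
boxes = [(100, 30, 180, 60), (0, 0, 50, 50), (10, 10, 20, 20)]
print(grounding_accuracy(preds, boxes))  # 2 of 3 clicks hit -> ~0.667
```

Benchmarks like ScreenSpot-pro report exactly this hit rate over many (screenshot, instruction, target box) triples.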

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23779

• PDF: https://arxiv.org/pdf/2507.23779

• Project Page: https://zhangmiaosen2000.github.io/Phi-Ground/

• Github: https://github.com/zhangmiaosen2000/Phi-Ground

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: TRACEALIGN – Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs

🔹 Publication Date: Published on Aug 4

🔹 Abstract: TraceAlign is a framework that identifies and mitigates alignment drift in LLMs by tracing unsafe completions to their training sources and applying interventions to reduce drift while maintaining utility. AI-generated summary: Large Language Models (LLMs) fine-tuned to align with human values often exhibit alignment drift, producing unsafe or policy-violating completions when exposed to adversarial prompts, decoding perturbations, or paraphrased jailbreaks. While prior work has behaviorally characterized alignment failure, little is known about the training-time belief sources underlying these failures. We introduce TraceAlign, a unified framework for tracing unsafe completions back to their root causes in the model's training corpus. Central to our approach is the Belief Conflict Index (BCI), which quantifies semantic inconsistency between generated spans and aligned policies, based on training documents retrieved via suffix-array matching. We propose three complementary interventions: (i) TraceShield, an inference-time safety filter that refuses completions with high-BCI spans; (ii) Contrastive Belief Deconfliction Loss, a contrastive fine-tuning objective that penalizes high-BCI continuations during DPO; and (iii) Prov-Decode, a provenance-aware decoding strategy that vetoes beam expansions predicted to yield high-BCI spans. Together, these defenses reduce alignment drift by up to 85% on our curated Alignment Drift Benchmark (ADB) while preserving utility on standard tasks, with delta less than 0.2 and improved refusal quality. We further derive a theoretical upper bound on drift likelihood via suffix-array span statistics, linking memorization frequency and length to adversarial reactivation risk. TraceAlign thus provides the first scalable, traceable, and grounded toolkit for understanding and mitigating alignment failures at source. To encourage further exploration and development, we open-source our implementation at: https://anonymous.4open.science/r/tracealign-2DA7
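The suffix-array matching behind the BCI boils down to measuring how long a span of a generated completion literally recurs in the training corpus. A brute-force sketch of that span-matching step (the paper uses suffix arrays for scale; the function below is a hypothetical simplification for clarity):

```python
# Sketch of the span-matching idea behind TraceAlign's Belief Conflict
# Index: the length of the longest contiguous token span of a generated
# completion that also occurs contiguously in a training corpus. The
# paper does this at scale with suffix arrays; this O(n^2 * m) version
# is only for illustration.

def longest_matched_span(completion_tokens, corpus_tokens):
    """Length of the longest contiguous sub-sequence of the completion
    that also occurs contiguously in the corpus."""
    best = 0
    n, m = len(completion_tokens), len(corpus_tokens)
    for length in range(1, min(n, m) + 1):
        # all corpus n-grams of this length
        corpus_ngrams = {tuple(corpus_tokens[k:k + length])
                         for k in range(m - length + 1)}
        if any(tuple(completion_tokens[i:i + length]) in corpus_ngrams
               for i in range(n - length + 1)):
            best = length       # a match exists; try a longer span
        else:
            break               # no span of this length matches
    return best

corpus = "the quick brown fox jumps over the lazy dog".split()
completion = "a quick brown fox ran away".split()
print(longest_matched_span(completion, corpus))  # "quick brown fox" -> 3
```

Longer and more frequent matched spans indicate stronger memorization, which the paper links to adversarial reactivation risk.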

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02063

• PDF: https://arxiv.org/pdf/2508.02063

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12782
• PDF: https://arxiv.org/pdf/2508.12782
• Github: https://github.com/stefanrer/HeroBench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11383
• PDF: https://arxiv.org/pdf/2508.11383
• Github: https://github.com/AIRI-Institute/when-punctuation-matters

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11379
• PDF: https://arxiv.org/pdf/2508.11379

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11252
• PDF: https://arxiv.org/pdf/2508.11252

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Exploitation Is All You Need... for Exploration

🔹 Publication Date: Published on Aug 2

🔹 Abstract: Meta-reinforcement learning agents can exhibit exploratory behavior when trained with a greedy objective, provided the environment has recurring structure, the agent has memory, and long-horizon credit assignment is possible. AI-generated summary: Ensuring sufficient exploration is a central challenge when training meta-reinforcement learning (meta-RL) agents to solve novel environments. Conventional solutions to the exploration-exploitation dilemma inject explicit incentives such as randomization, uncertainty bonuses, or intrinsic rewards to encourage exploration. In this work, we hypothesize that an agent trained solely to maximize a greedy (exploitation-only) objective can nonetheless exhibit emergent exploratory behavior, provided three conditions are met: (1) Recurring Environmental Structure, where the environment features repeatable regularities that allow past experience to inform future choices; (2) Agent Memory, enabling the agent to retain and utilize historical interaction data; and (3) Long-Horizon Credit Assignment, where learning propagates returns over a time frame sufficient for the delayed benefits of exploration to inform current decisions. Through experiments in stochastic multi-armed bandits and temporally extended gridworlds, we observe that, when both structure and memory are present, a policy trained on a strictly greedy objective exhibits information-seeking exploratory behavior. We further demonstrate, through controlled ablations, that emergent exploration vanishes if either environmental structure or agent memory is absent (Conditions 1 & 2). Surprisingly, removing long-horizon credit assignment (Condition 3) does not always prevent emergent exploration, a result we attribute to the pseudo-Thompson Sampling effect. These findings suggest that, under the right prerequisites, exploration and exploitation need not be treated as orthogonal objectives but can emerge from a unified reward-maximization process.
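The stochastic bandit setting from the experiments is easy to sketch. The snippet below is only the environment plus a memory-based greedy policy (empirical-mean argmax), not the paper's recurrent meta-RL training; it illustrates how greedy action selection over remembered interaction history concentrates pulls on the better arm:

```python
import random

# Toy stand-in for the paper's stochastic multi-armed bandit setting.
# The paper trains recurrent meta-RL agents with a purely greedy
# objective; here "memory" is just per-arm pull counts and reward sums,
# and the policy greedily picks the arm with the best empirical mean.

def run_greedy_bandit(arm_probs, steps, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)   # the agent's "memory"
    sums = [0.0] * len(arm_probs)
    total_reward = 0.0
    for t in range(steps):
        if t < len(arm_probs):
            arm = t                  # pull each arm once to start
        else:                        # then act greedily on empirical means
            arm = max(range(len(arm_probs)),
                      key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

reward, counts = run_greedy_bandit([0.2, 0.8], steps=200, seed=0)
print(counts)  # the greedy agent concentrates pulls on the better arm
```

The paper's point is that, with recurring structure and memory, such information-seeking behavior can emerge even without explicit exploration bonuses; this sketch only sets up the environment in which that is tested.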

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01287

• PDF: https://arxiv.org/pdf/2508.01287

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Reinforcement Learning with Rubric Anchors

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12790
• PDF: https://arxiv.org/pdf/2508.12790

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12730
• PDF: https://arxiv.org/pdf/2508.12730
• Project Page: https://gnueaj.github.io/Machine-Unlearning-Comparator/
• Github: https://github.com/gnueaj/Machine-Unlearning-Comparator

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/jaeunglee/UnlearningComparator
==================================

🔹 Title: RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13968
• PDF: https://arxiv.org/pdf/2508.13968
• Github: https://github.com/tianyiniu/RotBench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

🔹 Publication Date: Published on Aug 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09131
• PDF: https://arxiv.org/pdf/2508.09131

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

🔹 Publication Date: Published on Aug 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13167
• PDF: https://arxiv.org/pdf/2508.13167

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: OmniTry: Virtual Try-On Anything without Masks

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13632
• PDF: https://arxiv.org/pdf/2508.13632
• Project Page: https://omnitry.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13992
• PDF: https://arxiv.org/pdf/2508.13992

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Leveraging Large Language Models for Predictive Analysis of Human Misery

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12669
• PDF: https://arxiv.org/pdf/2508.12669

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

🔹 Publication Date: Published on Jul 31

🔹 Abstract: SWE-Debate, a competitive multi-agent framework, enhances issue resolution in software engineering by promoting diverse reasoning and achieving better issue localization and fix planning. AI-generated summary: Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches rely primarily on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying a distinct reasoning perspective along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, the consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results among open-source agent frameworks and outperforms baselines by a large margin.
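The first step, turning a code dependency graph into candidate fault-propagation traces, can be sketched as a bounded path enumeration. The graph, module names, and traversal below are hypothetical stand-ins, not SWE-Debate's actual construction:

```python
# Illustrative sketch of "fault propagation traces": starting from the
# module where an issue surfaces, enumerate dependency paths that a
# fault could have propagated along. Each path becomes a localization
# proposal for the debate agents to argue over.

def propagation_traces(graph, start, max_depth=4):
    """Enumerate dependency paths from `start`, up to `max_depth` hops,
    skipping cycles. graph[node] lists the modules `node` depends on."""
    traces = []

    def dfs(node, path):
        path = path + [node]
        deps = graph.get(node, [])
        if not deps or len(path) > max_depth:
            traces.append(path)
            return
        for dep in deps:
            if dep not in path:   # avoid cycles
                dfs(dep, path)

    dfs(start, [])
    return traces

# toy dependency graph: edge A -> B means "A depends on B"
graph = {
    "api.handler": ["core.parser", "core.cache"],
    "core.parser": ["utils.tokenize"],
    "core.cache": [],
    "utils.tokenize": [],
}
for trace in propagation_traces(graph, "api.handler"):
    print(" -> ".join(trace))
```

Each printed chain is one candidate explanation of where the fault originates; in the paper, specialized agents then debate these proposals over three rounds before a fix plan is consolidated.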

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23348

• PDF: https://arxiv.org/pdf/2507.23348

• Github: https://github.com/YerbaPage/SWE-Debate

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MultiRef: Controllable Image Generation with Multiple Visual References

🔹 Publication Date: Published on Aug 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06905
• PDF: https://arxiv.org/pdf/2508.06905
• Project Page: https://multiref.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Motion2Motion: Cross-topology Motion Transfer with Sparse Correspondence

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13139
• PDF: https://arxiv.org/pdf/2508.13139

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

🔹 Publication Date: Published on Jul 31

🔹 Abstract: RL-PLUS, a hybrid-policy optimization approach, enhances LLM reasoning capabilities by integrating Multiple Importance Sampling and an Exploration-Based Advantage Function, outperforming RLVR on various benchmarks and resolving capability boundary collapse. AI-generated summary: Reinforcement Learning with Verifiable Reward (RLVR) has significantly advanced the complex reasoning abilities of Large Language Models (LLMs). However, it struggles to break through the inherent capability boundaries of the base LLM, due to its essentially on-policy strategy combined with the LLM's immense action space and sparse reward. Critically, RLVR can lead to capability boundary collapse, narrowing the LLM's problem-solving scope. To address this problem, we propose RL-PLUS, a novel hybrid-policy optimization approach for LLMs that synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models. RL-PLUS integrates two core components: Multiple Importance Sampling, to address distributional mismatch from external data, and an Exploration-Based Advantage Function, to guide the model toward high-value, unexplored reasoning paths. We provide both theoretical analysis and extensive experiments to demonstrate the superiority and generalizability of our approach. Compared with existing RLVR methods, RL-PLUS achieves 1) state-of-the-art performance on six math reasoning benchmarks; 2) superior performance on six out-of-distribution reasoning tasks; and 3) consistent and significant gains across diverse model families, with average relative improvements of up to 69.2%. Moreover, analysis of Pass@k curves indicates that RL-PLUS effectively resolves the capability boundary collapse problem.
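Multiple Importance Sampling is a classical estimator; its balance-heuristic form weights each sample by its proposal's share of the combined density, which is what makes mixing on-policy samples with external off-policy data unbiased. A toy numerical sketch (the densities and integrand below are stand-ins, not the paper's objective):

```python
import math
import random

# Balance-heuristic Multiple Importance Sampling: each draw x, from any
# of the proposal densities p_j (with n_j draws each), contributes
# f(x) / sum_j n_j * p_j(x) to the estimate of the integral of f.
# This is the textbook estimator, not RL-PLUS's actual loss.

def mis_estimate(f, samples, densities, counts):
    """Balance-heuristic MIS estimate of the integral of f over the
    support, given a flat list of draws pooled from all proposals."""
    return sum(f(x) / sum(n * p(x) for n, p in zip(counts, densities))
               for x in samples)

rng = random.Random(42)
n = 2000
p_uniform = lambda x: 1.0        # density of U(0, 1)
p_linear = lambda x: 2.0 * x     # density 2x on (0, 1)
xs = [rng.random() for _ in range(n)]              # draws from p_uniform
xs += [math.sqrt(rng.random()) for _ in range(n)]  # draws from p_linear
est = mis_estimate(lambda x: x, xs, [p_uniform, p_linear], [n, n])
print(round(est, 3))  # close to the true integral of x on (0, 1), i.e. 0.5
```

The key property is that samples from a mismatched proposal (here the linear density; in RL-PLUS, external data) are automatically down-weighted where the combined density is high, instead of blowing up the variance as plain importance sampling can.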

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00222

• PDF: https://arxiv.org/pdf/2508.00222

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
