🔹 Title: XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10395
• PDF: https://arxiv.org/pdf/2508.10395
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework
🔹 Publication Date: Published on Aug 4
🔹 AI-generated summary: DreamVVT, a two-stage framework using Diffusion Transformers and LoRA adapters, enhances video virtual try-on by leveraging unpaired human-centric data and pretrained models to preserve garment details and temporal consistency.
🔹 Abstract: Video virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment. However, most existing end-to-end methods rely heavily on scarce paired garment-centric datasets and fail to effectively leverage priors of advanced visual models and test-time inputs, making it challenging to accurately preserve fine-grained garment details and maintain temporal consistency in unconstrained scenarios. To address these challenges, we propose DreamVVT, a carefully designed two-stage framework built upon Diffusion Transformers (DiTs), which is inherently capable of leveraging diverse unpaired human-centric data to enhance adaptability in real-world scenarios. To further leverage prior knowledge from pretrained models and test-time inputs, in the first stage we sample representative frames from the input video and utilize a multi-frame try-on model integrated with a vision-language model (VLM) to synthesize high-fidelity and semantically consistent keyframe try-on images. These images serve as complementary appearance guidance for subsequent video generation. In the second stage, skeleton maps together with fine-grained motion and appearance descriptions are extracted from the input content, and these, along with the keyframe try-on images, are then fed into a pretrained video generation model enhanced with LoRA adapters. This ensures long-term temporal coherence for unseen regions and enables highly plausible dynamic motions. Extensive quantitative and qualitative experiments demonstrate that DreamVVT surpasses existing methods in preserving detailed garment content and temporal stability in real-world scenarios. Our project page: https://virtu-lab.github.io/
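To make the two-stage design concrete, below is a minimal, runnable sketch of the pipeline the abstract describes, using toy stand-ins (plain Python callables) for the keyframe try-on model, the VLM, and the LoRA-enhanced video DiT. All names and signatures here are illustrative assumptions, not the authors' actual API.
```python
# Toy sketch of DreamVVT's two-stage structure; every component below is a
# placeholder stand-in, not the real model.

def dream_vvt(frames, garment, tryon_model, vlm, video_model, num_keyframes=2):
    # Stage 1: sample representative keyframes and synthesize try-on images,
    # guided by a VLM description of the garment and person.
    step = max(1, len(frames) // num_keyframes)
    keyframes = frames[::step][:num_keyframes]
    guidance = vlm(keyframes, garment)
    keyframe_tryons = [tryon_model(f, garment, guidance) for f in keyframes]

    # Stage 2: condition a pretrained video generator (with LoRA adapters) on
    # skeleton maps, motion/appearance descriptions, and the keyframe try-ons.
    skeletons = [f"skeleton({f})" for f in frames]
    return video_model(keyframe_tryons, skeletons, guidance)

# Dummy usage with trivial stand-in components.
frames = [f"frame_{i}" for i in range(8)]
result = dream_vvt(
    frames,
    garment="red_jacket.png",
    tryon_model=lambda f, g, d: f"tryon({f}, {g})",
    vlm=lambda ks, g: f"person wearing {g}, walking outdoors",
    video_model=lambda tryons, skels, desc: {"keyframes": tryons, "n_frames": len(skels)},
)
print(result["n_frames"], result["keyframes"][0])
```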
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02807
• PDF: https://arxiv.org/pdf/2508.02807
• Project Page: https://virtu-lab.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10975
• PDF: https://arxiv.org/pdf/2508.10975
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
🔹 Publication Date: Published on Aug 7
🔹 AI-generated summary: MOSEv2, a more challenging dataset, highlights the limitations of current VOS methods in real-world scenarios with increased complexity and diverse challenges.
🔹 Abstract: Video object segmentation (VOS) aims to segment specified target objects throughout a video. Although state-of-the-art methods have achieved impressive performance (e.g., 90+% J&F) on existing benchmarks such as DAVIS and YouTube-VOS, these datasets primarily contain salient, dominant, and isolated objects, limiting their generalization to real-world scenarios. To advance VOS toward more realistic environments, coMplex video Object SEgmentation (MOSEv1) was introduced to facilitate VOS research in complex scenes. Building on the strengths and limitations of MOSEv1, we present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions. MOSEv2 consists of 5,024 videos and over 701,976 high-quality masks for 10,074 objects across 200 categories. Compared to its predecessor, MOSEv2 introduces significantly greater scene complexity, including more frequent object disappearance and reappearance, severe occlusions and crowding, smaller objects, as well as a range of new challenges such as adverse weather (e.g., rain, snow, fog), low-light scenes (e.g., nighttime, underwater), multi-shot sequences, camouflaged objects, non-physical targets (e.g., shadows, reflections), scenarios requiring external knowledge, etc. We benchmark 20 representative VOS methods under 5 different settings and observe consistent performance drops. For example, SAM2 drops from 76.4% on MOSEv1 to only 50.9% on MOSEv2. We further evaluate 9 video object tracking methods and find similar declines, demonstrating that MOSEv2 presents challenges across tasks. These results highlight that despite high accuracy on existing datasets, current VOS methods still struggle under real-world complexities. MOSEv2 is publicly available at https://MOSE.video.
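As a concrete reference point for the J&F numbers quoted above, here is a small, self-contained sketch of the region-similarity term J (mask IoU); the reported J&F score averages J with a boundary-accuracy term F, omitted here for brevity. This is an illustrative re-implementation, not the official MOSEv2 evaluation code.
```python
import numpy as np

def region_similarity_J(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks; defined as 1.0 when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union

# Toy example: a predicted mask that covers only part of the ground-truth object.
gt = np.zeros((6, 6), dtype=bool); gt[1:5, 1:5] = True     # 4x4 object
pred = np.zeros((6, 6), dtype=bool); pred[2:5, 2:5] = True  # 3x3 prediction inside it
print(f"J = {region_similarity_J(pred, gt):.4f}")           # 9/16 = 0.5625
```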
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05630
• PDF: https://arxiv.org/pdf/2508.05630
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Ovis2.5 Technical Report
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11737
• PDF: https://arxiv.org/pdf/2508.11737
• Github: https://github.com/AIDC-AI/Ovis
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/AIDC-AI/Ovis2.5-9B
• https://huggingface.co/spaces/AIDC-AI/Ovis2.5-2B
• https://huggingface.co/spaces/Agung1453/Ovis2.5-9B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10419
• PDF: https://arxiv.org/pdf/2508.10419
• Github: https://github.com/EternityJune25/ComoRAG
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: 4DNeX: Feed-Forward 4D Generative Modeling Made Easy
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13154
• PDF: https://arxiv.org/pdf/2508.13154
• Project Page: https://4dnex.github.io/
• Github: https://github.com/3DTopia/4DNeX
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09834
• PDF: https://arxiv.org/pdf/2508.09834
• Github: https://github.com/weigao266/Awesome-Efficient-Arch
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13142
• PDF: https://arxiv.org/pdf/2508.13142
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13009
• PDF: https://arxiv.org/pdf/2508.13009
• Project Page: https://matrix-game-v2.github.io/
• Github: https://github.com/SkyworkAI/Matrix-Game/tree/main/Matrix-Game-2
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping
🔹 Publication Date: Published on Aug 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12466
• PDF: https://arxiv.org/pdf/2508.12466
• Project Page: https://inverse-llava.github.io
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12880
• PDF: https://arxiv.org/pdf/2508.12880
• Project Page: https://s2guidance.github.io/
• Github: https://github.com/AMAP-ML/S2-Guidance
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Precise Action-to-Video Generation Through Visual Action Prompts
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13104
• PDF: https://arxiv.org/pdf/2508.13104
• Project Page: https://zju3dv.github.io/VAP/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12945
• PDF: https://arxiv.org/pdf/2508.12945
• Project Page: https://lumen-relight.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Phi-Ground Tech Report: Advancing Perception in GUI Grounding
🔹 Publication Date: Published on Jul 31
🔹 AI-generated summary: The Phi-Ground model family achieves state-of-the-art performance in GUI grounding for multimodal reasoning models, improving accuracy across various benchmarks.
🔹 Abstract: With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly determines the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment, as a single misclick can result in unacceptable consequences. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/
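For readers unfamiliar with how grounding accuracy is scored, below is a minimal sketch of the protocol commonly used by benchmarks such as ScreenSpot: a predicted click counts as correct if it falls inside the target element's ground-truth bounding box. This is a simplified illustration; exact benchmark implementations may differ in details such as coordinate normalization.
```python
def click_in_box(click, box):
    """click = (x, y); box = (x_min, y_min, x_max, y_max)."""
    x, y = click
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max

def grounding_accuracy(predicted_clicks, gt_boxes):
    """Fraction of predicted clicks that land inside their target's bounding box."""
    hits = sum(click_in_box(c, b) for c, b in zip(predicted_clicks, gt_boxes))
    return hits / len(gt_boxes)

# Toy example: two correct clicks out of three.
preds = [(120, 48), (300, 310), (12, 500)]
boxes = [(100, 30, 160, 60), (280, 290, 340, 330), (600, 480, 660, 520)]
print(grounding_accuracy(preds, boxes))   # 0.666...
```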
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23779
• PDF: https://arxiv.org/pdf/2507.23779
• Project Page: https://zhangmiaosen2000.github.io/Phi-Ground/
• Github: https://github.com/zhangmiaosen2000/Phi-Ground
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: TRACEALIGN -- Tracing the Drift: Attributing Alignment Failures to Training-Time Belief Sources in LLMs
🔹 Publication Date: Published on Aug 4
🔹 AI-generated summary: TraceAlign is a framework that identifies and mitigates alignment drift in LLMs by tracing unsafe completions to their training sources and applying interventions to reduce drift while maintaining utility.
🔹 Abstract: Large Language Models (LLMs) fine-tuned to align with human values often exhibit alignment drift, producing unsafe or policy-violating completions when exposed to adversarial prompts, decoding perturbations, or paraphrased jailbreaks. While prior work has behaviorally characterized alignment failure, little is known about the training-time belief sources underlying these failures. We introduce TraceAlign, a unified framework for tracing unsafe completions back to their root causes in the model's training corpus. Central to our approach is the Belief Conflict Index (BCI), which quantifies semantic inconsistency between generated spans and aligned policies, based on training documents retrieved via suffix-array matching. We propose three complementary interventions: (i) TraceShield, an inference-time safety filter that refuses completions with high-BCI spans; (ii) Contrastive Belief Deconfliction Loss, a contrastive fine-tuning objective penalizing high-BCI continuations during DPO; and (iii) Prov-Decode, a provenance-aware decoding strategy that vetoes beam expansions predicted to yield high-BCI spans. Together, these defenses reduce alignment drift by up to 85% on our curated Alignment Drift Benchmark (ADB) while preserving utility on standard tasks, with a delta of less than 0.2 and improved refusal quality. We further derive a theoretical upper bound on drift likelihood via suffix-array span statistics, linking memorization frequency and length to adversarial reactivation risk. TraceAlign thus provides the first scalable, traceable, and grounded toolkit for understanding and mitigating alignment failures at source. To encourage further exploration and development, we open-source our implementation at: https://anonymous.4open.science/r/tracealign-2DA7
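To illustrate the TraceShield idea at the simplest possible level, the sketch below scores overlapping spans of a completion with a stand-in belief-conflict function and refuses when any span exceeds a threshold. The real BCI is computed via suffix-array matching against retrieved training documents; the substring-based toy_bci here is purely a hypothetical placeholder.
```python
# Toy illustration of a TraceShield-style inference-time filter. `toy_bci` is a
# hypothetical stand-in for the paper's Belief Conflict Index.

def toy_bci(span: str, conflicting_snippets: list[str]) -> float:
    """Return 1.0 if the span echoes a known policy-conflicting snippet, else 0.0."""
    span = span.lower()
    return 1.0 if any(s.lower() in span for s in conflicting_snippets) else 0.0

def trace_shield(completion: str, conflicting_snippets: list[str],
                 threshold: float = 0.5, span_len: int = 6) -> str:
    words = completion.split()
    if len(words) <= span_len:
        spans = [completion]
    else:
        # Overlapping word-level spans of the completion.
        spans = [" ".join(words[i:i + span_len]) for i in range(len(words) - span_len + 1)]
    if any(toy_bci(s, conflicting_snippets) > threshold for s in spans):
        return "[refused: high belief-conflict span detected]"
    return completion

snippets = ["bypass the safety interlock"]
print(trace_shield("Sure, here is how to bypass the safety interlock on the device", snippets))
print(trace_shield("Here is a short summary of the device manual instead", snippets))
```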
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02063
• PDF: https://arxiv.org/pdf/2508.02063
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds
🔹 Publication Date: Published on Aug 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12782
• PDF: https://arxiv.org/pdf/2508.12782
• Github: https://github.com/stefanrer/HeroBench
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11383
• PDF: https://arxiv.org/pdf/2508.11383
• Github: https://github.com/AIRI-Institute/when-punctuation-matters
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11379
• PDF: https://arxiv.org/pdf/2508.11379
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11252
• PDF: https://arxiv.org/pdf/2508.11252
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
🔹 Title: Exploitation Is All You Need... for Exploration
🔹 Publication Date: Published on Aug 2
🔹 AI-generated summary: Meta-reinforcement learning agents can exhibit exploratory behavior when trained with a greedy objective, provided the environment has recurring structure, the agent has memory, and long-horizon credit assignment is possible.
🔹 Abstract: Ensuring sufficient exploration is a central challenge when training meta-reinforcement learning (meta-RL) agents to solve novel environments. Conventional solutions to the exploration-exploitation dilemma inject explicit incentives such as randomization, uncertainty bonuses, or intrinsic rewards to encourage exploration. In this work, we hypothesize that an agent trained solely to maximize a greedy (exploitation-only) objective can nonetheless exhibit emergent exploratory behavior, provided three conditions are met: (1) Recurring Environmental Structure, where the environment features repeatable regularities that allow past experience to inform future choices; (2) Agent Memory, enabling the agent to retain and utilize historical interaction data; and (3) Long-Horizon Credit Assignment, where learning propagates returns over a time frame sufficient for the delayed benefits of exploration to inform current decisions. Through experiments in stochastic multi-armed bandits and temporally extended gridworlds, we observe that, when both structure and memory are present, a policy trained on a strictly greedy objective exhibits information-seeking exploratory behavior. We further demonstrate, through controlled ablations, that emergent exploration vanishes if either environmental structure or agent memory is absent (Conditions 1 & 2). Surprisingly, removing long-horizon credit assignment (Condition 3) does not always prevent emergent exploration, a result we attribute to the pseudo-Thompson Sampling effect. These findings suggest that, under the right prerequisites, exploration and exploitation need not be treated as orthogonal objectives but can emerge from a unified reward-maximization process.
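As a concrete (and deliberately simplified) illustration of the experimental setup, the sketch below implements a stochastic multi-armed bandit with recurring structure (arm means redrawn from a fixed prior each episode) and an agent that keeps episodic memory of its reward estimates while acting strictly greedily. It is a toy stand-in for the paper's meta-trained recurrent policy, not a reproduction of its training procedure.
```python
import random

def run_episode(n_arms=3, horizon=20, seed=None):
    rng = random.Random(seed)
    # Recurring structure: arm success probabilities are redrawn from the same
    # fixed prior (uniform on [0, 1]) at the start of every episode.
    true_means = [rng.random() for _ in range(n_arms)]
    # Agent memory: per-episode pull counts and incremental reward estimates.
    counts, est_means, total = [0] * n_arms, [0.5] * n_arms, 0.0
    for _ in range(horizon):
        arm = max(range(n_arms), key=lambda a: est_means[a])   # strictly greedy choice
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        est_means[arm] += (reward - est_means[arm]) / counts[arm]  # update memory
        total += reward
    return total, counts

reward, pulls = run_episode(seed=0)
print(f"episode reward = {reward}, pulls per arm = {pulls}")
```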
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01287
• PDF: https://arxiv.org/pdf/2508.01287
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT