ML Research Hub
32.7K subscribers
4.03K photos
230 videos
23 files
4.34K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: Visual Autoregressive Modeling for Instruction-Guided Image Editing

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15772
• PDF: https://arxiv.org/pdf/2508.15772
• Project Page: https://huggingface.co/HiDream-ai/VAREdit
• Github: https://github.com/HiDream-ai/VAREdit

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-1024
https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-512
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15752
• PDF: https://arxiv.org/pdf/2508.15752

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14892
• PDF: https://arxiv.org/pdf/2508.14892
• Project Page: https://hustvl.github.io/Snap-Snap/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15641
• PDF: https://arxiv.org/pdf/2508.15641

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15418
• PDF: https://arxiv.org/pdf/2508.15418
• Github: https://github.com/EIT-NLP/LLaSO

🔹 Datasets citing this paper:
https://huggingface.co/datasets/YirongSun/LLaSO-Instruct

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15202
• PDF: https://arxiv.org/pdf/2508.15202
• Project Page: https://github.com/aliyun/qwen-dianjin

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: INTIMA: A Benchmark for Human-AI Companionship Behavior

🔹 Publication Date: Published on Aug 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09998
• PDF: https://arxiv.org/pdf/2508.09998

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Investigating Hallucination in Conversations for Low Resource Languages

🔹 Publication Date: Published on Jul 30

🔹 AI-generated summary: LLMs generate fewer hallucinations in Mandarin than in Hindi and Farsi across multiple models.

🔹 Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem typically referred to as "hallucination". Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1 and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.22720
• PDF: https://arxiv.org/pdf/2507.22720

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

🔹 Publication Date: Published on Aug 4

🔹 AI-generated summary: A modular training framework accelerates the development of omni-modal LLMs through efficient 3D parallelism and flexible configuration.

🔹 Abstract: Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definition with parallel logic, incurring limited scalability and substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework to accelerate the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism on omni-modal LLMs. VeOmni also features a flexible configuration interface supporting seamless integration of new modalities with minimal code change. Using VeOmni, an omni-modal mixture-of-experts (MoE) model with 30B parameters can be trained with over 2,800 tokens/sec/GPU throughput and scale to 160K context lengths via 3D parallelism on 128 GPUs, showcasing its superior efficiency and scalability for training large omni-modal LLMs.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02317
• PDF: https://arxiv.org/pdf/2508.02317
• Github: https://github.com/ByteDance-Seed/VeOmni

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15881
• PDF: https://arxiv.org/pdf/2508.15881

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions

🔹 Publication Date: Published on Aug 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16402
• PDF: https://arxiv.org/pdf/2508.16402

🔹 Datasets citing this paper:
https://huggingface.co/datasets/m-a-p/AetherCode

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10390
• PDF: https://arxiv.org/pdf/2508.10390
• Github: https://github.com/AlienZhang1996/DH-CoT

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

🔹 Publication Date: Published on Aug 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16153
• PDF: https://arxiv.org/pdf/2508.16153
• Github: https://github.com/Agent-on-the-Fly/AgentFly

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Do What? Teaching Vision-Language-Action Models to Reject the Impossible

🔹 Publication Date: Published on Aug 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16292
• PDF: https://arxiv.org/pdf/2508.16292

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

🔹 Publication Date: Published on Aug 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15746
• PDF: https://arxiv.org/pdf/2508.15746
• Github: https://github.com/MAGIC-AI4Med/Deep-DxSearch

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

🔹 Publication Date: Published on Aug 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16279
• PDF: https://arxiv.org/pdf/2508.16279
• Github: https://github.com/agentscope-ai/agentscope

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

🔹 Publication Date: Published on Aug 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.16072
• PDF: https://arxiv.org/pdf/2508.16072
• Github: https://github.com/leroy9472/InMind

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CRISP: Persistent Concept Unlearning via Sparse Autoencoders

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13650
• PDF: https://arxiv.org/pdf/2508.13650

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14029
• PDF: https://arxiv.org/pdf/2508.14029

🔹 Datasets citing this paper:
https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

🔹 Publication Date: Published on Aug 2

🔹 AI-generated summary: CoT reasoning in LLMs is found to be limited by the distribution discrepancy between training and test data, suggesting it is not a robust form of reasoning.

🔹 Abstract: Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more superficial than it appears, motivating us to explore further. In this paper, we study CoT reasoning via a data distribution lens and investigate if CoT reasoning reflects a structured inductive bias learned from in-distribution data, allowing the model to conditionally generate reasoning paths that approximate those seen during training. Thus, its effectiveness is fundamentally bounded by the degree of distribution discrepancy between the training data and the test queries. With this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To investigate each dimension, we design DataAlchemy, an isolated and controlled environment to train LLMs from scratch and systematically probe them under various distribution conditions. Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01191
• PDF: https://arxiv.org/pdf/2508.01191
• Github: https://github.com/ChengshuaiZhao0/DataAlchemy

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Selective Contrastive Learning for Weakly Supervised Affordance Grounding

🔹 Publication Date: Published on Aug 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07877
• PDF: https://arxiv.org/pdf/2508.07877
• Github: https://github.com/hynnsk/SelectiveCL

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
