ML Research Hub
32.7K subscribers
4.03K photos
230 videos
23 files
4.34K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11116
• PDF: https://arxiv.org/pdf/2508.11116
• Github: https://github.com/Li-Z-Q/PaperRegister

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11203
• PDF: https://arxiv.org/pdf/2508.11203
• Project Page: https://kwanyun.github.io/stylemm_page/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation

🔹 Publication Date: Published on Aug 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06429
• PDF: https://arxiv.org/pdf/2508.06429
• Github: https://github.com/GuidoManni/SPARSE

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Controlling Multimodal LLMs via Reward-guided Decoding

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11616
• PDF: https://arxiv.org/pdf/2508.11616

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11255
• PDF: https://arxiv.org/pdf/2508.11255
• Project Page: https://fantasy-amap.github.io/fantasy-talking2/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: TexVerse: A Universe of 3D Objects with High-Resolution Textures

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10868
• PDF: https://arxiv.org/pdf/2508.10868
• Github: https://github.com/yiboz2001/TexVerse

🔹 Datasets citing this paper:
https://huggingface.co/datasets/YiboZhang2001/TexVerse

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: X-Node: Self-Explanation is All We Need

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10461
• PDF: https://arxiv.org/pdf/2508.10461
• Github: https://github.com/basiralab/X-Node

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10894
• PDF: https://arxiv.org/pdf/2508.10894
• Github: https://github.com/IGNF/MAESTRO

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SSRL: Self-Search Reinforcement Learning

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10874
• PDF: https://arxiv.org/pdf/2508.10874
• Project Page: https://huggingface.co/collections/TsinghuaC3I/ssrl-6899957a64d4a31f7f43bc88
• Github: https://github.com/TsinghuaC3I/SSRL

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DINOv3

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10104
• PDF: https://arxiv.org/pdf/2508.10104
• Github: https://github.com/facebookresearch/dinov3

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/atalaydenknalbant/DINOv3
https://huggingface.co/spaces/merve/dinov3-viz
==================================

🔹 Title: XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10395
• PDF: https://arxiv.org/pdf/2508.10395

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework

🔹 Publication Date: Published on Aug 4

🔹 Abstract: DreamVVT, a two-stage framework using Diffusion Transformers and LoRA adapters, enhances video virtual try-on by leveraging unpaired human-centric data and pretrained models to preserve garment details and temporal consistency.

AI-generated summary: Video virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment. However, most existing end-to-end methods rely heavily on scarce paired garment-centric datasets and fail to effectively leverage priors of advanced visual models and test-time inputs, making it challenging to accurately preserve fine-grained garment details and maintain temporal consistency in unconstrained scenarios. To address these challenges, we propose DreamVVT, a carefully designed two-stage framework built upon Diffusion Transformers (DiTs), which is inherently capable of leveraging diverse unpaired human-centric data to enhance adaptability in real-world scenarios. To further leverage prior knowledge from pretrained models and test-time inputs, in the first stage we sample representative frames from the input video and utilize a multi-frame try-on model integrated with a vision-language model (VLM) to synthesize high-fidelity and semantically consistent keyframe try-on images. These images serve as complementary appearance guidance for subsequent video generation. In the second stage, skeleton maps together with fine-grained motion and appearance descriptions are extracted from the input content, and these, along with the keyframe try-on images, are fed into a pretrained video generation model enhanced with LoRA adapters. This ensures long-term temporal coherence for unseen regions and enables highly plausible dynamic motions. Extensive quantitative and qualitative experiments demonstrate that DreamVVT surpasses existing methods in preserving detailed garment content and temporal stability in real-world scenarios.
Project page: https://virtu-lab.github.io/
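The first stage described above begins by sampling representative frames from the input video. The abstract does not specify the selection strategy, so the sketch below uses simple uniform sampling as a hypothetical stand-in; `sample_keyframes` and its signature are illustrative, not the authors' code.

```python
def sample_keyframes(num_frames: int, num_keyframes: int) -> list[int]:
    """Pick evenly spaced frame indices to serve as keyframes (stage 1).

    The selected frames would then go to the multi-frame try-on model
    (with VLM guidance) to produce keyframe try-on images.
    """
    if num_keyframes >= num_frames:
        return list(range(num_frames))
    if num_keyframes == 1:
        return [(num_frames - 1) // 2]  # single keyframe: take the middle
    # Spread indices evenly across [0, num_frames - 1], endpoints included.
    step = (num_frames - 1) / (num_keyframes - 1)
    return [round(i * step) for i in range(num_keyframes)]
```

For example, `sample_keyframes(100, 4)` selects indices 0, 33, 66, and 99, so the keyframes cover the whole clip rather than clustering at the start.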

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02807
• PDF: https://arxiv.org/pdf/2508.02807
• Project Page: https://virtu-lab.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10975
• PDF: https://arxiv.org/pdf/2508.10975

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes

🔹 Publication Date: Published on Aug 7

🔹 Abstract: MOSEv2, a more challenging dataset, highlights the limitations of current VOS methods in real-world scenarios with increased complexity and diverse challenges.

AI-generated summary: Video object segmentation (VOS) aims to segment specified target objects throughout a video. Although state-of-the-art methods have achieved impressive performance (e.g., 90+% J&F) on existing benchmarks such as DAVIS and YouTube-VOS, these datasets primarily contain salient, dominant, and isolated objects, limiting their generalization to real-world scenarios. To advance VOS toward more realistic environments, coMplex video Object SEgmentation (MOSEv1) was introduced to facilitate VOS research in complex scenes. Building on the strengths and limitations of MOSEv1, we present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions. MOSEv2 consists of 5,024 videos and over 701,976 high-quality masks for 10,074 objects across 200 categories. Compared to its predecessor, MOSEv2 introduces significantly greater scene complexity, including more frequent object disappearance and reappearance, severe occlusions and crowding, smaller objects, as well as a range of new challenges such as adverse weather (e.g., rain, snow, fog), low-light scenes (e.g., nighttime, underwater), multi-shot sequences, camouflaged objects, non-physical targets (e.g., shadows, reflections), scenarios requiring external knowledge, etc. We benchmark 20 representative VOS methods under 5 different settings and observe consistent performance drops. For example, SAM2 drops from 76.4% on MOSEv1 to only 50.9% on MOSEv2. We further evaluate 9 video object tracking methods and find similar declines, demonstrating that MOSEv2 presents challenges across tasks. These results highlight that despite high accuracy on existing datasets, current VOS methods still struggle under real-world complexities. MOSEv2 is publicly available at https://MOSE.video.
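The J&F scores quoted above average region similarity J (the Jaccard index, i.e. mask IoU) with boundary accuracy F. As a rough illustration only, and not the official MOSE evaluation code (which also computes the boundary term), J for a pair of flattened binary masks can be computed as:

```python
def region_similarity(pred: list[int], gt: list[int]) -> float:
    """Jaccard index J = |pred ∩ gt| / |pred ∪ gt| over binary masks.

    `pred` and `gt` are flattened 0/1 masks of equal length. By the usual
    convention, two empty masks count as a perfect match (J = 1.0).
    """
    assert len(pred) == len(gt), "masks must have the same shape"
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0
```

For example, `region_similarity([1, 1, 0, 0], [1, 0, 1, 0])` gives 1/3: one overlapping pixel out of three in the union. A score drop like SAM2's 76.4% to 50.9% means predicted masks overlap the ground truth far less on MOSEv2's harder scenes.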

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05630
• PDF: https://arxiv.org/pdf/2508.05630

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Ovis2.5 Technical Report

🔹 Publication Date: Published on Aug 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11737
• PDF: https://arxiv.org/pdf/2508.11737
• Github: https://github.com/AIDC-AI/Ovis

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/AIDC-AI/Ovis2.5-9B
https://huggingface.co/spaces/AIDC-AI/Ovis2.5-2B
https://huggingface.co/spaces/Agung1453/Ovis2.5-9B
==================================

🔹 Title: ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

🔹 Publication Date: Published on Aug 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10419
• PDF: https://arxiv.org/pdf/2508.10419
• Github: https://github.com/EternityJune25/ComoRAG

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: 4DNeX: Feed-Forward 4D Generative Modeling Made Easy

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13154
• PDF: https://arxiv.org/pdf/2508.13154
• Project Page: https://4dnex.github.io/
• Github: https://github.com/3DTopia/4DNeX

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09834
• PDF: https://arxiv.org/pdf/2508.09834
• Github: https://github.com/weigao266/Awesome-Efficient-Arch

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13142
• PDF: https://arxiv.org/pdf/2508.13142

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

🔹 Publication Date: Published on Aug 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13009
• PDF: https://arxiv.org/pdf/2508.13009
• Project Page: https://matrix-game-v2.github.io/
• Github: https://github.com/SkyworkAI/Matrix-Game/tree/main/Matrix-Game-2

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

🔹 Publication Date: Published on Aug 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.12466
• PDF: https://arxiv.org/pdf/2508.12466
• Project Page: https://inverse-llava.github.io

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
