ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

📝 Summary:
mmGRPO, a multi-module extension of Group Relative Policy Optimization (GRPO), improves accuracy in modular AI systems by optimizing LM calls and prompts across various tasks.
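
For context, the "group relative" part of GRPO is commonly described as scoring each sampled completion against the other completions drawn for the same prompt. The snippet below is a minimal sketch of that normalization only, assuming one scalar reward per completion. The function name and the example rewards are hypothetical; it does not reproduce mmGRPO's multi-module grouping or any DSPy API.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when all rewards in the group are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
flat_group = group_relative_advantages([2.0, 2.0, 2.0])
```

Completions that beat their group average get a positive advantage and the rest a negative one; a group with identical rewards yields all-zero advantages and so contributes no learning signal.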

🔹 Publication Date: Published on Aug 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04660
• PDF: https://arxiv.org/pdf/2508.04660
• Project Page: https://dspy.ai
• Github: https://github.com/stanfordnlp/dspy

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Directional Textual Inversion for Personalized Text-to-Image Generation

📝 Summary:
Directional Textual Inversion (DTI) improves text-to-image personalization by fixing the magnitude of the learned token embeddings and optimizing only their direction. This prevents the norm inflation seen in standard Textual Inversion, improving prompt conditioning and enabling smooth interpolation. DTI offers better te...
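
One simple way to picture "fixed magnitude, learnable direction" is a gradient step restricted to the sphere of a chosen radius. The sketch below assumes a plain list of floats as the embedding, drops the radial part of the gradient, and rescales to the fixed norm after the step. All names and values are hypothetical, and the optimizer used in the paper may differ.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def direction_only_step(embedding, grad, lr, fixed_norm):
    """Update the direction of `embedding` while keeping its norm at `fixed_norm`."""
    current = norm(embedding)  # assumed non-zero
    unit = [x / current for x in embedding]
    # Remove the gradient component that would change the magnitude.
    radial = sum(g * u for g, u in zip(grad, unit))
    tangent = [g - radial * u for g, u in zip(grad, unit)]
    stepped = [x - lr * t for x, t in zip(embedding, tangent)]
    # Rescale back onto the sphere of radius fixed_norm.
    scale = fixed_norm / norm(stepped)
    return [x * scale for x in stepped]


token = [3.0, 4.0]  # hypothetical 2-D embedding with norm 5
updated = direction_only_step(token, grad=[1.0, 0.0], lr=0.5, fixed_norm=5.0)
```

The direction of `updated` differs from `token`, while its norm stays at the fixed value, so repeated steps cannot inflate the embedding norm.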

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13672
• PDF: https://arxiv.org/pdf/2512.13672
• Project Page: https://kunheek.github.io/dti
• Github: https://github.com/kunheek/dti

==================================

#TextualInversion #TextToImage #GenerativeAI #DeepLearning #AI
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

📝 Summary:
One-to-All Animation is a unified framework for high-fidelity character animation and image pose transfer. It tackles misaligned and partially visible references using self-supervised outpainting, a robust reference extractor, and identity-robust pose control to outperform existing methods.

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22940
• PDF: https://arxiv.org/pdf/2511.22940
• Project Page: https://ssj9596.github.io/one-to-all-animation-project/
• Github: https://github.com/ssj9596/One-to-All-Animation

🔹 Models citing this paper:
https://huggingface.co/MochunniaN1/One-to-All-14b
https://huggingface.co/MochunniaN1/One-to-All-1.3b_2
https://huggingface.co/MochunniaN1/One-to-All-1.3b_1

Datasets citing this paper:
https://huggingface.co/datasets/MochunniaN1/One-to-All-sub

==================================

#CharacterAnimation #PoseTransfer #ComputerVision #AI #DeepLearning
What matters for Representation Alignment: Global Information or Spatial Structure?

📝 Summary:
Representation alignment improves generative training by transferring spatial structure from pretrained vision encoders to diffusion models; this spatial structure matters more than the encoder's global semantic performance. A...

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10794
• PDF: https://arxiv.org/pdf/2512.10794
• Project Page: https://end2end-diffusion.github.io/irepa
• Github: https://github.com/end2end-diffusion/irepa

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

📝 Summary:
DrivePI is a new spatial-aware 4D MLLM for autonomous driving, unifying understanding, 3D perception, prediction, and planning. It integrates point clouds, images, and language instructions, achieving state-of-the-art performance by outperforming existing VLA and specialized VA models.

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12799
• PDF: https://arxiv.org/pdf/2512.12799
• Github: https://github.com/happinesslz/DrivePI

==================================

#AutonomousDriving #MLLM #ComputerVision #DeepLearning #AI
Towards Scalable Pre-training of Visual Tokenizers for Generation

📝 Summary:
Traditional visual tokenizer training fails to improve generation quality with more compute. VTP is a new framework that jointly optimizes image-text contrastive, self-supervised, and reconstruction losses. This enables better scaling, faster convergence, and significantly improved generative per...

🔹 Publication Date: Published on Dec 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13687
• PDF: https://arxiv.org/pdf/2512.13687
• Github: https://github.com/hustvl

🔹 Models citing this paper:
https://huggingface.co/MiniMaxAI/VTP-Base-f16d64
https://huggingface.co/MiniMaxAI/VTP-Small-f16d64
https://huggingface.co/MiniMaxAI/VTP-Large-f16d64

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Learning Robot Manipulation from Audio World Models

📝 Summary:
A generative latent flow matching model is proposed to predict future audio for robotic manipulation tasks, improving performance over methods without future lookahead by accurately capturing intrinsi...

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08405
• PDF: https://arxiv.org/pdf/2512.08405

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment

📝 Summary:
WebOperator is a tree-search framework that enhances web agents with reliable backtracking and strategic exploration. It addresses challenges like irreversible actions and partial observability by using a safety-aware search and verifying paths. WebOperator achieves state-of-the-art results on We...

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12692
• PDF: https://arxiv.org/pdf/2512.12692
• Project Page: https://kagnlp.github.io/WebOperator
• Github: https://kagnlp.github.io/WebOperator

==================================

#WebAgents #TreeSearch #AI #AutonomousAgents #MachineLearning
Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries

📝 Summary:
A deep learning pipeline was optimized for automated fish re-identification in electronic monitoring systems. Using the Swin-T architecture and AutoFish dataset, it achieved 90.43% Rank-1 accuracy, with intra-species viewpoint differences being the main challenge.
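
For readers unfamiliar with the metric, Rank-1 accuracy in re-identification is the fraction of query images whose single closest gallery match has the same identity. The sketch below assumes Euclidean distance over precomputed feature vectors; the identities, vectors, and function name are hypothetical, and the paper's exact distance and evaluation protocol are not stated in this summary.

```python
import math


def rank1_accuracy(queries, gallery):
    """queries and gallery are lists of (identity, feature_vector) pairs."""
    hits = 0
    for query_id, query_vec in queries:
        # Find the gallery entry closest to the query in feature space.
        nearest_id, _ = min(gallery, key=lambda item: math.dist(query_vec, item[1]))
        hits += nearest_id == query_id
    return hits / len(queries)


# Hypothetical 2-D features for two fish identities.
gallery = [("fish_a", [0.0, 0.0]), ("fish_b", [1.0, 1.0])]
queries = [
    ("fish_a", [0.1, 0.0]),  # closest to fish_a: correct
    ("fish_b", [0.9, 1.0]),  # closest to fish_b: correct
    ("fish_a", [0.8, 0.9]),  # closest to fish_b: wrong
]
score = rank1_accuracy(queries, gallery)  # 2 of 3 queries match
```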

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08400
• PDF: https://arxiv.org/pdf/2512.08400
• Github: https://github.com/msamdk/Fish_Re_Identification.git

==================================

#FishReID #DeepLearning #ComputerVision #FisheriesTech #FineGrainedClassification
Efficient Memory Management for Large Language Model Serving with PagedAttention

📝 Summary:
The PagedAttention algorithm and the vLLM system increase the throughput of large language model serving by managing memory efficiently and reducing waste in the key-value cache.
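
The core idea can be pictured as paging for the key-value cache: memory is split into fixed-size blocks, and each sequence keeps a table mapping its logical blocks to whichever physical blocks are free. The toy allocator below is only a sketch of that bookkeeping under simplifying assumptions (no block sharing, no actual tensors). The class and method names are hypothetical and are not vLLM's API.

```python
class PagedKVCache:
    """Toy allocator: each sequence maps logical blocks to physical block ids."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one token, allocating a block only when needed."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:  # the last block is full, or none exists yet
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size  # (physical block, offset in block)

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("seq-0") for _ in range(3)]
free_while_active = len(cache.free_blocks)
cache.free("seq-0")
```

Because blocks are allocated on demand, a three-token sequence with a block size of two holds only two blocks, and at most one partially filled block per sequence is wasted.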

🔹 Publication Date: Published on Sep 12, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2309.06180
• PDF: https://arxiv.org/pdf/2309.06180
• Github: https://github.com/vllm-project/vllm

🔹 Models citing this paper:
https://huggingface.co/theonlyengine/Flash-attention1

Datasets citing this paper:
https://huggingface.co/datasets/TheBlueScrubs/TheBlueScrubs-v1

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Very Large-Scale Multi-Agent Simulation in AgentScope

📝 Summary:
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendl...

🔹 Publication Date: Published on Jul 25, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2407.17789
• PDF: https://arxiv.org/pdf/2407.17789
• Github: https://github.com/modelscope/agentscope

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

📝 Summary:
The Qwen2-VL series uses Naive Dynamic Resolution and Multimodal Rotary Position Embedding to enhance visual processing, achieving competitive performance on multimodal benchmarks.

🔹 Publication Date: Published on Sep 18, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.12191
• PDF: https://arxiv.org/pdf/2409.12191
• Github: https://github.com/QwenLM/Qwen2-VL

🔹 Models citing this paper:
https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
https://huggingface.co/Qwen/QVQ-72B-Preview

Spaces citing this paper:
https://huggingface.co/spaces/prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
https://huggingface.co/spaces/linoyts/Qwen-Image-Edit-Angles
https://huggingface.co/spaces/tori29umai/Qwen-Image-2509-MultipleAngles

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

📝 Summary:
Mobile-Agent-v2, a multi-agent system with planning, decision, and reflection components, improves task completion in mobile device operations by addressing navigation challenges and handling errors. ...

🔹 Publication Date: Published on Jun 3, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.01014
• PDF: https://arxiv.org/pdf/2406.01014
• Github: https://github.com/x-plug/mobileagent

Spaces citing this paper:
https://huggingface.co/spaces/junyangwang0410/Mobile-Agent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Agent S: An Open Agentic Framework that Uses Computers Like a Human

📝 Summary:
Agent S is an open agentic framework enabling autonomous GUI interaction to automate complex tasks. It employs experience-augmented hierarchical planning and an Agent-Computer Interface with MLLMs for enhanced reasoning. Agent S achieves state-of-the-art performance on OSWorld and demonstrates br...

🔹 Publication Date: Published on Oct 10, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.08164
• PDF: https://arxiv.org/pdf/2410.08164
• Hugging Face Collection: https://huggingface.co/collections/ranpox/awesome-computer-use-agents

==================================

#AgenticAI #MultimodalAI #HumanComputerInteraction #Automation #AIResearch
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

📝 Summary:
Mamba, a novel SSM-based model, outperforms Transformers in inference speed and scalability across various modalities by selectively propagating information and using efficient hardware-aware algorith...
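
To illustrate what "selectively propagating information" can mean, the sketch below runs a scalar state-space recurrence in which the step size, the write strength, and the readout all depend on the current input. It is a toy under stated assumptions: a single scalar state, hypothetical weights `w_delta`, `w_b`, `w_c`, and a simplified discretization. It is not the hardware-aware implementation from the Mamba repository.

```python
import math


def softplus(z):
    return math.log1p(math.exp(z))


def selective_scan(xs, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Single pass over xs, so the cost grows linearly with sequence length."""
    h, ys = 0.0, []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        a_bar = math.exp(delta * a)    # decay factor in (0, 1) when a < 0
        b_bar = delta * (w_b * x)      # input-dependent write strength
        h = a_bar * h + b_bar * x      # state update
        ys.append((w_c * x) * h)       # input-dependent readout
    return ys


outputs = selective_scan([1.0, 0.0, 2.0])
```

In this toy, a larger input produces a larger step size, which both decays the old state faster and writes the new input more strongly; an input of zero writes nothing and reads out nothing.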

🔹 Publication Date: Published on Dec 1, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.00752
• PDF: https://arxiv.org/pdf/2312.00752
• Github: https://github.com/state-spaces/mamba

🔹 Models citing this paper:
https://huggingface.co/tiiuae/falcon-mamba-7b
https://huggingface.co/state-spaces/mamba-2.8b-slimpj
https://huggingface.co/tiiuae/falcon-mamba-7b-instruct

Datasets citing this paper:
https://huggingface.co/datasets/Sherirto/BD4UI

Spaces citing this paper:
https://huggingface.co/spaces/openfree/LLM_Quantization
https://huggingface.co/spaces/FallnAI/Quantize-HF-Models
https://huggingface.co/spaces/seawolf2357/LLM_Quantization

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

📝 Summary:
AI-Trader evaluates the performance of large language models in real-world financial markets, highlighting their limitations in trading and risk management.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10971
• PDF: https://arxiv.org/pdf/2512.10971
• Project Page: https://ai4trade.ai/
• Github: https://github.com/HKUDS/AI-Trader

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Inferring Compositional 4D Scenes without Ever Seeing One

📝 Summary:
COM4D infers 4D/3D object structure and spatio-temporal configuration from 2D video. It avoids 4D compositional training data by disentangling spatial and temporal attention learning. This purely data-driven method achieves state-of-the-art results in 4D object and composed 3D reconstruction.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05272
• PDF: https://arxiv.org/pdf/2512.05272
• Project Page: https://berkegokmen1.github.io/com4d/
• Github: https://github.com/insait-institute/COM4D

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research