ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.05K photos
234 videos
23 files
4.36K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
WebDancer: Towards Autonomous Information Seeking Agency

📝 Summary:
WebDancer proposes a four-stage framework for building autonomous information seeking agents. This approach combines data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning, demonstrating strong performance on challenging benchmarks.

🔹 Publication Date: Published on May 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.22648
• PDF: https://arxiv.org/pdf/2505.22648
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebDancer-32B

Spaces citing this paper:
https://huggingface.co/spaces/frucht/Alibaba-NLP-WebDancer-32B

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #AutonomousAgents #ReinforcementLearning #MachineLearning #WebAgents
WebSailor: Navigating Super-human Reasoning for Web Agent

📝 Summary:
WebSailor is a post-training method that enhances open-source LLMs with sophisticated reasoning to tackle complex web information-seeking tasks. It teaches models to systematically reduce extreme uncertainty, achieving performance comparable to proprietary AI agents.

🔹 Publication Date: Published on Jul 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent

#LLMs #WebAgents #AI #MachineLearning #Reasoning
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

📝 Summary:
WebWatcher is a multimodal agent that strengthens vision-language reasoning for complex information retrieval. It is trained with synthetic trajectories, tool use, and RL, and outperforms existing agents on multimodal information-seeking tasks.

🔹 Publication Date: Published on Aug 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05748
• PDF: https://arxiv.org/pdf/2508.05748
• Project Page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebWatcher-32B
https://huggingface.co/Alibaba-NLP/WebWatcher-7B

#VisionLanguage #MultimodalAI #DeepLearning #AIagents #InformationRetrieval
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

📝 Summary:
WebShaper synthesizes information-seeking datasets to address data scarcity for LLM agents. It uses a formalization-driven framework based on set theory and Knowledge Projections, enabling precise control over reasoning structure. This leads to state-of-the-art performance on open-sourced benchma...

🔹 Publication Date: Published on Jul 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15061
• PDF: https://arxiv.org/pdf/2507.15061
• Project Page: https://huggingface.co/papers?q=Knowledge%20Projections%20(KP)
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebShaper-32B

Datasets citing this paper:
https://huggingface.co/datasets/Alibaba-NLP/WebShaper

#LLM #AIAgents #DataGeneration #FormalMethods #NLP
DeepAgent: A General Reasoning Agent with Scalable Toolsets

📝 Summary:
DeepAgent is an end-to-end deep reasoning agent that autonomously thinks, discovers tools, and executes actions. It uses memory folding for long interactions and ToolPO reinforcement learning for general tool use. DeepAgent consistently outperforms baselines on eight diverse benchmarks.

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21618
• PDF: https://arxiv.org/pdf/2510.21618
• Github: https://github.com/RUC-NLPIR/DeepAgent

#AI #ReasoningAgents #ReinforcementLearning #ToolLearning #DeepLearning
Zep: A Temporal Knowledge Graph Architecture for Agent Memory

📝 Summary:
Zep is a new AI agent memory service using a temporal knowledge graph for dynamic knowledge integration. It outperforms MemGPT in benchmarks and significantly improves temporal reasoning and cross-session synthesis for enterprise applications, reducing latency.
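To make the idea of a temporal knowledge graph concrete, here is a minimal sketch in which every fact carries a validity interval and a newer fact closes the older one instead of overwriting it. The class names, the integer timestamps, and the one-value-per-relation rule are all assumptions made for illustration; this is not Zep's or Graphiti's schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    """One edge of the graph: (subject, relation, object) plus a validity interval."""
    subject: str
    relation: str
    obj: str
    valid_from: int                  # hypothetical integer timestamp
    valid_to: Optional[int] = None   # None means "still valid"


class TemporalGraph:
    """Toy store that assumes each (subject, relation) has one value at a time."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, subject: str, relation: str, obj: str, at: int) -> None:
        # Close any still-open fact for the same subject and relation
        # instead of deleting it, so history stays queryable.
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.valid_to is None:
                fact.valid_to = at
        self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def query(self, subject: str, relation: str, at: int) -> Optional[str]:
        # Return the object that was valid at time `at`, if any.
        for fact in self.facts:
            if (fact.subject, fact.relation) != (subject, relation):
                continue
            ended = fact.valid_to is not None and at >= fact.valid_to
            if fact.valid_from <= at and not ended:
                return fact.obj
        return None


graph = TemporalGraph()
graph.add("alice", "works_at", "Acme", at=10)
graph.add("alice", "works_at", "Globex", at=20)
```

Querying `graph.query("alice", "works_at", at=15)` looks up the fact that held at time 15, while a later timestamp resolves to the newer employer; this is the kind of time-scoped lookup that temporal reasoning across sessions relies on.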

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• Github: https://github.com/getzep/graphiti

#AIAgents #KnowledgeGraphs #TemporalReasoning #AIArchitecture #ArtificialIntelligence
ReCode: Unify Plan and Action for Universal Granularity Control

📝 Summary:
ReCode unifies planning and action in LLM agents via recursive code generation. It treats plans as abstract functions recursively decomposed into primitive actions, enabling dynamic decision granularity. This significantly improves performance and data efficiency.
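The recursive decomposition described above can be sketched as follows. This is an illustrative toy, not the ReCode implementation: the primitive action set, the `TOY_EXPANSIONS` table (a stand-in for the LLM that would generate the sub-steps), and the depth limit are all hypothetical.

```python
from typing import Callable

# Hypothetical primitive actions an environment could execute directly.
PRIMITIVES = {"goto", "pick", "place"}

# Hypothetical stand-in for the LLM: maps an abstract step to finer-grained steps.
TOY_EXPANSIONS = {
    "tidy_room": ["move_item book shelf", "move_item cup sink"],
    "move_item": ["goto {0}", "pick {0}", "goto {1}", "place {0}"],
}


def toy_expand(step: str) -> list[str]:
    """Expand one abstract step by filling its arguments into the templates."""
    name, *args = step.split()
    return [template.format(*args) for template in TOY_EXPANSIONS[name]]


def decompose(
    step: str,
    expand: Callable[[str], list[str]],
    depth: int = 0,
    max_depth: int = 5,
) -> list[str]:
    """Recursively expand a step until only primitive actions remain."""
    name = step.split()[0]
    if name in PRIMITIVES:
        return [step]  # fine-grained enough: execute as-is
    if depth >= max_depth:
        raise RecursionError(f"could not reduce {step!r} to primitives")
    actions: list[str] = []
    for sub_step in expand(step):
        actions.extend(decompose(sub_step, expand, depth + 1, max_depth))
    return actions


plan = decompose("tidy_room", toy_expand)
```

Because the same `decompose` call handles both a high-level plan and a single action, the granularity of each decision is set by how far the expansion recurses rather than by a fixed planner/executor split.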

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.23564
• PDF: https://arxiv.org/pdf/2510.23564
• Github: https://github.com/FoundationAgents/ReCode

#LLMAgents #AI #CodeGeneration #Planning #GranularityControl
LongCat-Video Technical Report

📝 Summary:
LongCat-Video is a 13.6B Diffusion Transformer model excelling in efficient, high-quality long video generation. It uses a unified architecture for tasks like Text-to-Video and coarse-to-fine generation for efficiency. This model is a significant step toward developing world models.

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22200
• PDF: https://arxiv.org/pdf/2510.22200
• Github: https://github.com/meituan-longcat/LongCat-Video

🔹 Models citing this paper:
https://huggingface.co/meituan-longcat/LongCat-Video

Spaces citing this paper:
https://huggingface.co/spaces/multimodalart/LongCat-Video
https://huggingface.co/spaces/rahul7star/LongCat-Video
https://huggingface.co/spaces/armaishere/meituan-longcat-LongCat-Video

#VideoGeneration #DiffusionModels #Transformers #AI #TextToVideo
RAG-Anything: All-in-One RAG Framework

📝 Summary:
RAG-Anything is a unified framework extending RAG to all modalities, not just text. It integrates cross-modal relationships and semantic matching via dual-graph construction and hybrid retrieval. This significantly improves performance on complex multimodal benchmarks.

🔹 Publication Date: Published on Oct 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12323
• PDF: https://arxiv.org/pdf/2510.12323
• Github: https://github.com/HKUDS/RAG-Anything

#RAG #MultimodalAI #MachineLearning #InformationRetrieval #GraphAI
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

📝 Summary:
PokeeResearch-7B is a 7B-parameter deep research agent that achieves state-of-the-art results using Reinforcement Learning from AI Feedback (RLAIF). Its chain-of-thought reasoning scaffold enhances robustness and alignment, producing an efficient, resilient, research-grade agent.

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15862
• PDF: https://arxiv.org/pdf/2510.15862
• Github: https://github.com/Pokee-AI/PokeeResearchOSS

🔹 Models citing this paper:
https://huggingface.co/PokeeAI/pokee_research_7b
https://huggingface.co/Mungert/pokee_research_7b-GGUF

#AI #ReinforcementLearning #LLM #MachineLearning #AIResearch
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

📝 Summary:
FAPO improves LLM reasoning by penalizing flawed-positive rollouts, that is, rollouts that reach a correct answer through unreliable reasoning patterns. The penalty still secures early gains from such rollouts while shifting optimization toward reliable reasoning later, enhancing correctness and stability.
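One way to picture a penalty on flawed-positive rollouts is a shaped reward in which a correct answer reached through flawed reasoning still beats a wrong answer but scores below a cleanly reasoned one. The sketch below is only that picture: the function name, the base rewards of 1.0 and -1.0, and the `penalty` value are hypothetical and are not taken from the paper.

```python
def shaped_reward(is_correct: bool, is_flawed: bool, penalty: float = 0.5) -> float:
    """Toy reward: correct answers earn 1.0, wrong ones -1.0,
    and a correct answer with flawed reasoning loses `penalty`."""
    base = 1.0 if is_correct else -1.0
    if is_correct and is_flawed:
        # Flawed-positive rollout: keep it above a wrong answer,
        # but below a correct answer with sound reasoning.
        return base - penalty
    return base


# Hypothetical rollouts as (is_correct, is_flawed) pairs.
rollouts = [(True, False), (True, True), (False, True)]
rewards = [shaped_reward(correct, flawed) for correct, flawed in rollouts]
```

The ordering clean-correct > flawed-correct > incorrect is the design point: flawed-positive rollouts can still contribute signal, yet the policy is nudged toward reasoning that is reliable as well as correct.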

🔹 Publication Date: Published on Oct 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22543
• PDF: https://arxiv.org/pdf/2510.22543
• Project Page: https://fapo-rl.github.io/
• Github: https://fapo-rl.github.io

🔹 Models citing this paper:
https://huggingface.co/dyyyyyyyy/FAPO-32B
https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B

Datasets citing this paper:
https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic
https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset

#LLM #AI #ReinforcementLearning #DeepLearning #Reasoning
The Unreasonable Effectiveness of Scaling Agents for Computer Use

📝 Summary:
Behavior Best-of-N (bBoN) improves the reliability of computer-use agents (CUAs) by generating multiple rollouts and selecting among them using behavior narratives that describe each rollout. This method achieves state-of-the-art performance on OSWorld and generalizes across operating systems, demonstrating effective CUA scaling.
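The generate-then-select loop can be sketched as below. This is a simplified illustration, not the Agent S implementation: the `Rollout` type, the `toy_agent` and `toy_judge` stand-ins, and the per-rollout scoring (instead of a model comparing narratives against each other) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rollout:
    actions: list[str]   # what the agent did
    narrative: str       # short description of what changed on screen


def best_of_n(
    run_agent: Callable[[int], Rollout],
    judge: Callable[[str, str], float],
    task: str,
    n: int,
) -> Rollout:
    """Run the agent n times and keep the rollout whose narrative scores highest."""
    rollouts = [run_agent(seed) for seed in range(n)]
    return max(rollouts, key=lambda r: judge(task, r.narrative))


# Toy stand-ins so the sketch runs without an agent or a model.
def toy_agent(seed: int) -> Rollout:
    outcomes = [
        Rollout(["click File"], "Opened the File menu; nothing was saved."),
        Rollout(["click File", "click Save"], "Saved the document as report.txt."),
        Rollout(["click Edit"], "Opened the Edit menu."),
    ]
    return outcomes[seed % len(outcomes)]


def toy_judge(task: str, narrative: str) -> float:
    # Crude word-overlap score; a real judge would be a model reading the narratives.
    task_words = set(task.lower().split())
    narrative_words = {word.strip(".;") for word in narrative.lower().split()}
    return float(len(task_words & narrative_words))


best = best_of_n(toy_agent, toy_judge, task="save the document as report.txt", n=3)
```

Selecting on narratives rather than raw action traces means the judge compares what each attempt accomplished, which is what lets extra rollouts translate into higher reliability.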

🔹 Publication Date: Published on Oct 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.02250
• PDF: https://arxiv.org/pdf/2510.02250
• Project Page: https://www.simular.ai/articles/agent-s3
• Github: https://github.com/simular-ai/Agent-S

#AIAgents #AIScaling #OperatingSystems #BehavioralAI #AIResearch
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

📝 Summary:
Agent S2 is a compositional framework for computer use agents that delegates tasks across generalist and specialist models. Using Mixture-of-Grounding and Proactive Hierarchical Planning, it achieves state-of-the-art performance on diverse benchmarks and operating systems.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.00906
• PDF: https://arxiv.org/pdf/2504.00906
• Project Page: https://www.simular.ai/articles/agent-s2-technical-review
• Github: https://github.com/simular-ai/Agent-S

#AIAgents #MachineLearning #AI #GeneralistSpecialist #AutonomousSystems
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

📝 Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• Github: https://github.com/apple/pico-banana-400k

🔹 Models citing this paper:
https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit

#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
MIRIX: Multi-Agent Memory System for LLM-Based Agents

📝 Summary:
MIRIX is a modular multi-agent memory system for LLM-based agents that integrates diverse memory types and a dynamic framework. It significantly enhances memory capabilities for multimodal and long-form conversations. MIRIX achieves superior performance on challenging benchmarks, outperforming ex...

🔹 Publication Date: Published on Jul 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.07957
• PDF: https://arxiv.org/pdf/2507.07957
• Project Page: https://mirix.io/
• Github: https://github.com/Mirix-AI/MIRIX

#LLM #MultiAgentSystems #AISystems #MemorySystems #AI
Cache-to-Cache: Direct Semantic Communication Between Large Language Models

📝 Summary:
C2C enables direct semantic communication between LLMs by projecting and fusing their KV-caches, overcoming text-based communication limits. This method preserves rich semantics, improving accuracy by 3-5% and achieving a 2x speedup over traditional text communication.

🔹 Publication Date: Published on Oct 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.03215
• PDF: https://arxiv.org/pdf/2510.03215
• Project Page: https://fuvty.github.io/C2C_Project_Page/
• Github: https://github.com/thu-nics/C2C

🔹 Models citing this paper:
https://huggingface.co/nics-efc/C2C_Fuser

Spaces citing this paper:
https://huggingface.co/spaces/fuvty/C2C_demo
https://huggingface.co/spaces/nics-efc/C2C_demo

#LLM #SemanticCommunication #AI #DeepLearning #NLP
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

📝 Summary:
Skyfall-GS synthesizes large-scale, explorable 3D urban scenes by combining satellite imagery for geometry and diffusion models for realistic textures. This framework offers improved cross-view consistent geometry and photorealistic appearances without needing costly 3D annotations.

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15869
• PDF: https://arxiv.org/pdf/2510.15869
• Project Page: https://skyfall-gs.jayinnn.dev/
• Github: https://github.com/jayin92/skyfall-gs

🔹 Models citing this paper:
https://huggingface.co/jayinnn/Skyfall-GS-ply

Datasets citing this paper:
https://huggingface.co/datasets/jayinnn/Skyfall-GS-eval
https://huggingface.co/datasets/jayinnn/Skyfall-GS-datasets

#3DReconstruction #ComputerVision #SatelliteImagery #DiffusionModels #UrbanModeling
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

📝 Summary:
Easy Dataset is a framework that synthesizes LLM fine-tuning data from unstructured documents using a GUI and LLMs. It generates domain-specific question-answer pairs with human oversight. This improves LLM performance in specific domains while retaining general knowledge.

🔹 Publication Date: Published on Jul 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.04009
• PDF: https://arxiv.org/pdf/2507.04009
• Github: https://github.com/ConardLi/easy-dataset

#LLM #DataSynthesis #FineTuning #AI #NLP
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

📝 Summary:
InternVL3 introduces a native multimodal pre-training paradigm, jointly learning from multimodal and text data to overcome conventional alignment challenges. This unified approach, combined with advanced techniques, achieves state-of-the-art performance on multimodal tasks, rivaling proprietary m...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.10479
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/

🔹 Models citing this paper:
https://huggingface.co/OpenGVLab/InternVL3-78B
https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
https://huggingface.co/OpenGVLab/InternVL3-8B

Datasets citing this paper:
https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts

Spaces citing this paper:
https://huggingface.co/spaces/AntResearchNLP/ViLaBench
https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab

#MultimodalAI #DeepLearning #AIResearch #OpenSourceAI #GenerativeAI