✨Towards Scalable Pre-training of Visual Tokenizers for Generation
📝 Summary:
Traditional visual tokenizer training fails to improve generation quality with more compute. VTP is a new framework that jointly optimizes image-text contrastive, self-supervised, and reconstruction losses. This enables better scaling, faster convergence, and significantly improved generative per...
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13687
• PDF: https://arxiv.org/pdf/2512.13687
• Github: https://github.com/hustvl
🔹 Models citing this paper:
• https://huggingface.co/MiniMaxAI/VTP-Base-f16d64
• https://huggingface.co/MiniMaxAI/VTP-Small-f16d64
• https://huggingface.co/MiniMaxAI/VTP-Large-f16d64
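The summary says VTP optimizes three losses jointly. Below is a minimal plain-Python sketch of how such objectives can be combined into one weighted loss. Several choices here are assumptions and are not taken from the paper: the image-text term is a CLIP-style symmetric InfoNCE, reconstruction uses pixel MSE, and a hypothetical cosine feature-alignment term stands in for the self-supervised loss. The weights `w_clip`, `w_ssl` and `w_rec` are illustrative only.

```python
import math

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style InfoNCE over a batch; pair i is the positive for row i."""
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in txt_emb]
              for u in img_emb]

    def mean_ce(rows):
        # Cross-entropy with the diagonal entry as the correct class, averaged over rows.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(c) for c in zip(*logits)]  # text-to-image direction
    return 0.5 * (mean_ce(logits) + mean_ce(cols))

def mse(x, y):
    """Pixel-space reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def cosine_distance(u, v):
    """Hypothetical self-supervised term: 1 - cosine similarity between a
    student feature and a target (e.g. teacher) feature."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def joint_loss(img_emb, txt_emb, student_feat, teacher_feat, recon, target,
               w_clip=1.0, w_ssl=1.0, w_rec=1.0):
    """Weighted sum of the three objectives; the weights are illustrative."""
    return (w_clip * info_nce(img_emb, txt_emb)
            + w_ssl * cosine_distance(student_feat, teacher_feat)
            + w_rec * mse(recon, target))
```

A real tokenizer would compute these terms on batched tensors from the encoder and decoder. The point of the sketch is only that one scalar combines all three signals, so the gradients from each term reach the shared encoder.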
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Learning Robot Manipulation from Audio World Models
📝 Summary:
A generative latent flow matching model is proposed to predict future audio for robotic manipulation tasks, improving performance over methods without future lookahead by accurately capturing intrinsi...
🔹 Publication Date: Published on Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08405
• PDF: https://arxiv.org/pdf/2512.08405
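Flow matching trains a network to predict a velocity field that carries noise to data. Below is a minimal sketch of the generic recipe: a straight-line path x_t = (1 - t)·x0 + t·x1 with target velocity x1 - x0, followed by Euler sampling. This is the textbook formulation and may not match the paper's exact variant. `velocity_fn` stands in for the learned network, which in this setting would be conditioned on past observations to predict future audio latents. The `oracle` below is a hypothetical stand-in used only to check the mechanics.

```python
def fm_pair(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1 and its target velocity x1 - x0."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return xt, v_target

def fm_loss(velocity_fn, x0, x1, t, cond):
    """Squared error between predicted and target velocity for one (x0, x1, t) sample."""
    xt, v_target = fm_pair(x0, x1, t)
    v_pred = velocity_fn(xt, t, cond)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t = 0 (noise) to t = 1 (predicted latent)."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt, cond)
        x = [a + dt * b for a, b in zip(x, v)]
    return x

# Hypothetical oracle standing in for a trained network: it "knows" the target latent
# (passed as cond) and returns the exact straight-line velocity toward it.
def oracle(x, t, cond):
    return [(c - a) / (1.0 - t) for a, c in zip(x, cond)]
```

With a perfect velocity field, the Euler integration lands on the target latent. With a learned field, the quality of the predicted future audio depends on how well the network fits these velocity targets.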
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment
📝 Summary:
WebOperator is a tree-search framework that enhances web agents with reliable backtracking and strategic exploration. It addresses challenges like irreversible actions and partial observability by using a safety-aware search and verifying paths. WebOperator achieves state-of-the-art results on We...
🔹 Publication Date: Published on Dec 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12692
• PDF: https://arxiv.org/pdf/2512.12692
• Project Page: https://kagnlp.github.io/WebOperator
• Github: https://kagnlp.github.io/WebOperator
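The summary names two ideas: backtracking and caution around irreversible actions. Below is a generic best-first tree search that illustrates both. It is a sketch under stated assumptions, not WebOperator's algorithm. Backtracking is modeled by replaying the stored action path from the start state. Irreversible actions are deferred by ranking paths first by how many such actions they contain. `step_fn` is assumed to be side-effect free (for example, a simulator), so children can be scored before anything is committed. All function parameters are hypothetical.

```python
import heapq
import itertools

def tree_search(root, actions_fn, step_fn, score_fn, is_goal, is_irreversible, budget=100):
    """Best-first search over action sequences.

    Each frontier entry stores the full action path from the root. Revisiting a node
    (backtracking) replays that path from the root state, since a live web page usually
    cannot be snapshotted. Paths are ranked first by how many irreversible actions they
    contain (fewer is better), then by the heuristic score of their end state."""
    tie = itertools.count()  # tie-breaker so heapq never compares paths directly
    frontier = [(0, 0.0, next(tie), ())]  # (irreversible count, -score, tie, path)
    while frontier and budget > 0:
        n_irrev, _, _, path = heapq.heappop(frontier)
        budget -= 1
        state = root
        for a in path:  # replay the path to restore this node's state
            state = step_fn(state, a)
        if is_goal(state):
            return list(path)
        for a in actions_fn(state):
            child = step_fn(state, a)
            cost = n_irrev + (1 if is_irreversible(a) else 0)
            heapq.heappush(frontier, (cost, -score_fn(child), next(tie), path + (a,)))
    return None
```

In a toy task where a reversible "+1" and an irreversible "reset" can both reach the goal, the search prefers the all-reversible path. It falls back to the irreversible action only when nothing else is available.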
==================================
#WebAgents #TreeSearch #AI #AutonomousAgents #MachineLearning
✨Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
📝 Summary:
A deep learning pipeline was optimized for automated fish re-identification in electronic monitoring systems. Using the Swin-T architecture and AutoFish dataset, it achieved 90.43% Rank-1 accuracy, with intra-species viewpoint differences being the main challenge.
🔹 Publication Date: Published on Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08400
• PDF: https://arxiv.org/pdf/2512.08400
• Github: https://github.com/msamdk/Fish_Re_Identification.git
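Rank-1 accuracy, the metric quoted above, counts how often the single most similar gallery image shows the same individual as the query. Below is a minimal sketch. It assumes embeddings have already been extracted (for example by the Swin-T backbone) and are compared with cosine similarity. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank1_accuracy(query_embs, query_ids, gallery_embs, gallery_ids):
    """Fraction of queries whose nearest gallery embedding has the same identity."""
    hits = 0
    for q, qid in zip(query_embs, query_ids):
        # index of the most similar gallery item for this query
        best = max(range(len(gallery_embs)), key=lambda j: cosine(q, gallery_embs[j]))
        hits += gallery_ids[best] == qid
    return hits / len(query_embs)
```

Under the usual re-identification protocol, the query image itself is excluded from the gallery. Otherwise every query would trivially match itself.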
==================================
#FishReID #DeepLearning #ComputerVision #FisheriesTech #FineGrainedClassification
✨Efficient Memory Management for Large Language Model Serving with PagedAttention
📝 Summary:
The PagedAttention algorithm and the vLLM system increase the throughput of large language model serving by managing key-value (KV) cache memory efficiently and reducing waste.
🔹 Publication Date: Published on Sep 12, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2309.06180
• PDF: https://arxiv.org/pdf/2309.06180
• Github: https://github.com/vllm-project/vllm
🔹 Models citing this paper:
• https://huggingface.co/theonlyengine/Flash-attention1
✨ Datasets citing this paper:
• https://huggingface.co/datasets/TheBlueScrubs/TheBlueScrubs-v1
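PagedAttention stores each sequence's KV cache in fixed-size blocks that need not be contiguous. A per-sequence block table maps logical blocks to physical ones, much like virtual-memory paging. Below is a toy sketch of that bookkeeping, with no attention math and no copy-on-write sharing. `PagedKVCache` and its methods are hypothetical and are not vLLM's API.

```python
class PagedKVCache:
    """Toy block manager: each sequence's KV cache is a list of fixed-size blocks,
    and a block table maps its logical blocks to physical blocks allocated on demand."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))[::-1]  # stack of free physical block ids
        self.tables = {}   # seq_id -> list of physical block ids (the block table)
        self.lengths = {}  # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token; returns (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # last block is full (or none allocated yet)
            if not self.free:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free_seq(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free.extend(reversed(self.tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)
```

A block is allocated only when a sequence crosses a block boundary, so at most block_size - 1 slots per sequence sit unused. The alternative is reserving space for the maximum possible output length up front.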
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Very Large-Scale Multi-Agent Simulation in AgentScope
📝 Summary:
Enhancements to the AgentScope platform improve scalability, efficiency, and ease of use for large-scale multi-agent simulations through distributed mechanisms, flexible environments, and user-friendl...
🔹 Publication Date: Published on Jul 25, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2407.17789
• PDF: https://arxiv.org/pdf/2407.17789
• Github: https://github.com/modelscope/agentscope
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
📝 Summary:
The Qwen2-VL series uses Naive Dynamic Resolution and Multimodal Rotary Position Embedding (M-RoPE) to enhance visual processing, achieving competitive performance on multimodal benchmarks.
🔹 Publication Date: Published on Sep 18, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.12191
• PDF: https://arxiv.org/pdf/2409.12191
• Github: https://github.com/QwenLM/Qwen2-VL
🔹 Models citing this paper:
• https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
• https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
• https://huggingface.co/Qwen/QVQ-72B-Preview
✨ Spaces citing this paper:
• https://huggingface.co/spaces/prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
• https://huggingface.co/spaces/linoyts/Qwen-Image-Edit-Angles
• https://huggingface.co/spaces/tori29umai/Qwen-Image-2509-MultipleAngles
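Naive Dynamic Resolution keeps images near their native size, so the number of visual tokens varies from image to image. M-RoPE gives each visual token separate temporal, height and width position ids. Below is a minimal sketch that assumes the 14-pixel ViT patches and 2×2 token merging described in the Qwen2-VL report. The rounding rule is simplified and the function names are hypothetical. Text tokens, not shown, use the same id in all three components.

```python
def visual_token_grid(height, width, patch=14, merge=2):
    """Rows and columns of merged visual tokens for one image.

    Side lengths are rounded to multiples of patch * merge (28 px), the ViT cuts
    14x14 patches, and each 2x2 block of patch tokens is merged into one token.
    Simplification: real preprocessing also enforces min/max pixel budgets."""
    factor = patch * merge
    rows = max(1, round(height / factor))
    cols = max(1, round(width / factor))
    return rows, cols

def mrope_image_ids(rows, cols, start):
    """(temporal, height, width) ids for an image's tokens in row-major order.
    The temporal id is constant within a still image; height/width ids follow the grid."""
    return [(start, start + r, start + c) for r in range(rows) for c in range(cols)]

def next_position(ids):
    """The next modality (e.g. text after the image) starts at the largest id used + 1."""
    return max(max(triple) for triple in ids) + 1
```

For example, a 224×224 image yields an 8×8 grid, or 64 visual tokens, before any special start/end tokens are added.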
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
📝 Summary:
Mobile-Agent-v2, a multi-agent system with planning, decision, and reflection components, improves task completion in mobile device operations by addressing navigation challenges and handling errors. ...
🔹 Publication Date: Published on Jun 3, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.01014
• PDF: https://arxiv.org/pdf/2406.01014
• Github: https://github.com/x-plug/mobileagent
✨ Spaces citing this paper:
• https://huggingface.co/spaces/junyangwang0410/Mobile-Agent
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Agent S: An Open Agentic Framework that Uses Computers Like a Human
📝 Summary:
Agent S is an open agentic framework enabling autonomous GUI interaction to automate complex tasks. It employs experience-augmented hierarchical planning and an Agent-Computer Interface with MLLMs for enhanced reasoning. Agent S achieves state-of-the-art performance on OSWorld and demonstrates br...
🔹 Publication Date: Published on Oct 10, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.08164
• PDF: https://arxiv.org/pdf/2410.08164
• Hugging Face Collection: https://huggingface.co/collections/ranpox/awesome-computer-use-agents
==================================
#AgenticAI #MultimodalAI #HumanComputerInteraction #Automation #AIResearch
✨Mamba: Linear-Time Sequence Modeling with Selective State Spaces
📝 Summary:
Mamba, a novel SSM-based model, outperforms Transformers in inference speed and scalability across various modalities by selectively propagating information and using efficient hardware-aware algorith...
🔹 Publication Date: Published on Dec 1, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.00752
• PDF: https://arxiv.org/pdf/2312.00752
• Github: https://github.com/state-spaces/mamba
🔹 Models citing this paper:
• https://huggingface.co/tiiuae/falcon-mamba-7b
• https://huggingface.co/state-spaces/mamba-2.8b-slimpj
• https://huggingface.co/tiiuae/falcon-mamba-7b-instruct
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Sherirto/BD4UI
✨ Spaces citing this paper:
• https://huggingface.co/spaces/openfree/LLM_Quantization
• https://huggingface.co/spaces/FallnAI/Quantize-HF-Models
• https://huggingface.co/spaces/seawolf2357/LLM_Quantization
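The "selective" part of Mamba is that the SSM parameters Δ, B and C depend on the current input. This lets the model decide, token by token, what to keep and what to forget. Below is a minimal sequential sketch for one input channel with a diagonal A. It uses a zero-order-hold discretization for A and the simplified Δ·B for B. The scalar projections `w_delta`, `b_delta`, `w_B` and `w_C` are hypothetical stand-ins for learned linear layers. The loop runs in time linear in sequence length, but it omits the hardware-aware parallel scan and kernel fusion that make the real implementation fast.

```python
import math

def softplus(z):
    """Smooth positive function used to keep the step size delta > 0."""
    return math.log1p(math.exp(z)) if z < 20 else z

def selective_scan(xs, A, w_delta, b_delta, w_B, w_C):
    """Sequential selective SSM scan for a single input channel.

    xs: input sequence of floats; A: list of N negative reals (diagonal state matrix);
    w_B, w_C: length-N weights that make B_t and C_t functions of the input x_t."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        B = [wb * x for wb in w_B]               # input-dependent B_t
        C = [wc * x for wc in w_C]               # input-dependent C_t
        for n in range(len(A)):
            a_bar = math.exp(delta * A[n])       # zero-order-hold discretization of A
            h[n] = a_bar * h[n] + delta * B[n] * x  # simplified B-bar = delta * B
        ys.append(sum(c * hn for c, hn in zip(C, h)))  # y_t = C_t h_t
    return ys
```

A large Δ makes exp(Δ·A) small, so the state is largely overwritten by the current token. A Δ near zero leaves the state almost unchanged, so the token is effectively ignored.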
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets
📝 Summary:
AI-Trader evaluates the performance of large language models in real-world financial markets, highlighting their limitations in trading and risk management.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10971
• PDF: https://arxiv.org/pdf/2512.10971
• Project Page: https://ai4trade.ai/
• Github: https://github.com/HKUDS/AI-Trader
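Trading evaluations usually quantify risk management with metrics computed on an account's equity curve. Below is a minimal sketch of two common ones: cumulative return and maximum drawdown. These are standard definitions chosen for illustration. The exact metrics AI-Trader reports are described in the paper.

```python
def cumulative_return(equity):
    """Total return over the period, e.g. 0.1 for +10%."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst
```

Two agents can end with the same cumulative return while one suffered a much deeper drawdown along the way. This is why return alone says little about risk management.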
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Inferring Compositional 4D Scenes without Ever Seeing One
📝 Summary:
COM4D infers 4D/3D object structure and spatio-temporal configuration from 2D video. It avoids 4D compositional training data by disentangling spatial and temporal attention learning. This purely data-driven method achieves state-of-the-art results in 4D object and composed 3D reconstruction.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05272
• PDF: https://arxiv.org/pdf/2512.05272
• Project Page: https://berkegokmen1.github.io/com4d/
• Github: https://github.com/insait-institute/COM4D
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
📝 Summary:
FunAudioLLM enhances voice interactions by integrating SenseVoice for multilingual speech recognition, emotion detection, and audio event detection with CosyVoice for natural speech generation across ...
🔹 Publication Date: Published on Jul 4, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2407.04051
• PDF: https://arxiv.org/pdf/2407.04051
• Github: https://github.com/FunAudioLLM
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
📝 Summary:
SmolDocling is a compact 256M-parameter vision-language model that performs end-to-end document conversion with robust performance across various document types, using a new markup format.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.11576
• PDF: https://arxiv.org/pdf/2503.11576
• Project Page: https://huggingface.co/spaces/docling-project/SmolDocling-256M-Demo
• Github: https://github.com/docling-project/docling
🔹 Models citing this paper:
• https://huggingface.co/docling-project/SmolDocling-256M-preview
• https://huggingface.co/ibm-granite/granite-docling-258M
• https://huggingface.co/docling-project/CodeFormulaV2
✨ Datasets citing this paper:
• https://huggingface.co/datasets/docling-project/SynthCodeNet
• https://huggingface.co/datasets/HuggingFaceM4/DoclingMatix
• https://huggingface.co/datasets/docling-project/SynthChartNet
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ibm-granite/granite-docling-258m-demo
• https://huggingface.co/spaces/ibm-granite/granite-docling-258M-WebGPU
• https://huggingface.co/spaces/docling-project/SmolDocling-256M-Demo
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
📝 Summary:
OpenDevin is a platform for developing AI agents that interact with the world by writing code, using command lines, and browsing the web, with support for multiple agents and evaluation benchmarks.
🔹 Publication Date: Published on Jul 23, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2407.16741
• PDF: https://arxiv.org/pdf/2407.16741
• Github: https://github.com/OpenDevin/OpenDevin
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Single-stream Policy Optimization
📝 Summary:
Single-stream Policy Optimization (SPO) improves policy-gradient training for Large Language Models by eliminating group-based issues and providing a stable, low-variance learning signal, leading to b...
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13232
• PDF: https://arxiv.org/pdf/2509.13232
• Project Page: https://zhongwenxu.notion.site/Single-stream-Policy-Optimization-26a1c4e140e380d78d51fa4567727f50
• Github: https://github.com/volcengine/verl
🔹 Models citing this paper:
• https://huggingface.co/jingyaogong/MiniMind2-gguf
✨ Datasets citing this paper:
• https://huggingface.co/datasets/dingzihan737/SPO_Qwen3-8B_DAPO_16k_ReTool_Binary
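Group-based methods such as GRPO sample several responses per prompt and use the group mean as the baseline. The summary says SPO avoids this group structure. Below is a minimal sketch of a single-stream alternative. It assumes one response per prompt, a persistent per-prompt baseline, and advantage normalization across the whole batch. The fixed-rate moving average is a simplification (the paper describes an adaptive value tracker). The class and function names are hypothetical.

```python
import math
from collections import defaultdict

class ValueTracker:
    """Persistent per-prompt baseline, updated as a fixed-rate moving average."""

    def __init__(self, rate=0.1, init=0.5):
        self.rate = rate
        self.values = defaultdict(lambda: init)

    def baseline(self, prompt_id):
        return self.values[prompt_id]

    def update(self, prompt_id, reward):
        v = self.values[prompt_id]
        self.values[prompt_id] = v + self.rate * (reward - v)

def single_stream_advantages(batch, tracker, eps=1e-8):
    """batch: list of (prompt_id, reward), one sampled response per prompt.
    Advantage = reward - tracked baseline, then normalized across the whole batch."""
    raw = [r - tracker.baseline(p) for p, r in batch]  # baselines read before updating
    for p, r in batch:
        tracker.update(p, r)
    mean = sum(raw) / len(raw)
    std = math.sqrt(sum((a - mean) ** 2 for a in raw) / len(raw))
    return [(a - mean) / (std + eps) for a in raw]
```

Because the baseline persists across steps, a prompt contributes a learning signal even when only one response is sampled. With group baselines, by contrast, an all-correct or all-wrong group yields zero advantage.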
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Qwen2.5-VL Technical Report
📝 Summary:
Qwen2.5-VL, the latest vision-language model, advances visual recognition, document parsing, and video comprehension through dynamic resolution processing, Window Attention, and a native Vision Transf...
🔹 Publication Date: Published on Feb 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.13923
• PDF: https://arxiv.org/pdf/2502.13923
• Project Page: https://chat.qwenlm.ai
• Github: https://github.com/QwenLM/Qwen2.5-VL
🔹 Models citing this paper:
• https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
• https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct
• https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
✨ Datasets citing this paper:
• https://huggingface.co/datasets/xlangai/Jedi
• https://huggingface.co/datasets/IntelligenceLab/VideoHallu
• https://huggingface.co/datasets/turing-motors/MOMIJI
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AntResearchNLP/ViLaBench
• https://huggingface.co/spaces/SmartFlowAI/HuggingFaceMonthlyPaper202502
• https://huggingface.co/spaces/hadadxyz/ai
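Window Attention restricts most ViT layers to attention inside local windows. Attention cost therefore grows roughly linearly with image size instead of quadratically. Below is a minimal sketch that partitions a patch grid into windows and counts attention pairs. It assumes windows of at most 8×8 patches (112×112 pixels with 14-pixel patches, as the technical report describes). The edge handling and the names are simplifications, and the few full-attention layers are not shown.

```python
def window_partition(rows, cols, win=8):
    """Group row-major patch indices of a rows x cols grid into non-overlapping
    win x win windows; windows at the right/bottom edges are smaller, not padded."""
    windows = []
    for r0 in range(0, rows, win):
        for c0 in range(0, cols, win):
            windows.append([r * cols + c
                            for r in range(r0, min(r0 + win, rows))
                            for c in range(c0, min(c0 + win, cols))])
    return windows

def attention_pairs(windows):
    """Number of query-key pairs when each patch attends only within its window."""
    return sum(len(w) ** 2 for w in windows)
```

For a 10×10 patch grid, full attention scores 100 × 100 = 10,000 pairs. The windowed layers score 64² + 16² + 16² + 4² = 4,624 pairs, and the gap widens as resolution grows.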
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research