✨WebSailor: Navigating Super-human Reasoning for Web Agent
📝 Summary:
WebSailor is a post-training method that enhances open-source LLMs with sophisticated reasoning to tackle complex web information-seeking tasks. It teaches models to systematically reduce extreme uncertainty, achieving performance comparable to proprietary AI agents.
🔹 Publication Date: Published on Jul 3, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMs #WebAgents #AI #MachineLearning #Reasoning
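One way to picture the "reduce extreme uncertainty" idea behind WebSailor is training data whose questions start out deliberately vague, so the model must search and cross-reference to pin the answer down. The sketch below is a hypothetical illustration of that flavor of data synthesis: the toy facts, obfuscation rules, and make_question helper are all assumptions for illustration, not the paper's actual pipeline.
```python
# Toy knowledge records; a real pipeline would draw these from web data.
FACTS = [
    {"person": "Ada Lovelace", "born": 1815, "city": "London", "field": "mathematics"},
    {"person": "Alan Turing", "born": 1912, "city": "London", "field": "computer science"},
]

def obfuscate(fact: dict) -> dict:
    """Replace precise attributes with vague descriptions to raise uncertainty."""
    decade = (fact["born"] // 10) * 10
    return {
        "field": fact["field"],                      # name hidden, only the field kept
        "born": f"sometime in the {decade}s",        # exact year blurred
        "city": "a large European capital",          # location generalized
    }

def make_question(fact: dict) -> dict:
    """Build a QA pair whose question only exposes the obfuscated clues."""
    clues = obfuscate(fact)
    question = (
        f"Which {clues['field']} pioneer, born {clues['born']} "
        f"in {clues['city']}, is being described?"
    )
    return {"question": question, "answer": fact["person"]}

if __name__ == "__main__":
    for fact in FACTS:
        print(make_question(fact))
```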
✨WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
📝 Summary:
WebWatcher is a multimodal agent that strengthens vision-language reasoning for complex information retrieval. It is trained with synthetic multimodal trajectories, tool use, and reinforcement learning, and it outperforms existing agents on multimodal information-seeking tasks.
🔹 Publication Date: Published on Aug 7, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05748
• PDF: https://arxiv.org/pdf/2508.05748
• Project Page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
• Github: https://github.com/Alibaba-NLP/WebAgent
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebWatcher-32B
• https://huggingface.co/Alibaba-NLP/WebWatcher-7B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisionLanguage #MultimodalAI #DeepLearning #AIagents #InformationRetrieval
✨WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
📝 Summary:
WebShaper synthesizes information-seeking datasets to address data scarcity for LLM agents. It uses a formalization-driven framework based on set theory and Knowledge Projections, enabling precise control over reasoning structure. This leads to state-of-the-art performance on open-source benchmarks.
🔹 Publication Date: Published on Jul 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15061
• PDF: https://arxiv.org/pdf/2507.15061
• Project Page: https://huggingface.co/papers?q=Knowledge%20Projections%20(KP)
• Github: https://github.com/Alibaba-NLP/WebAgent
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebShaper-32B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Alibaba-NLP/WebShaper
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AIAgents #DataGeneration #FormalMethods #NLP
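A rough reading of the Knowledge Projection idea: a projection KP_r(x) is the set of entities related to x by relation r, and a question is formalized as an intersection of such projections. The toy relation store and helpers below are assumptions for illustration, not WebShaper's implementation.
```python
# Toy relation store: KP(relation, value) returns the set of matching entities.
RELATIONS = {
    ("directed_by", "Christopher Nolan"): {"Inception", "Interstellar", "Dunkirk"},
    ("released_in", "2010"): {"Inception", "Toy Story 3"},
}

def kp(relation: str, value: str) -> set:
    """Knowledge Projection: all entities standing in `relation` to `value`."""
    return RELATIONS.get((relation, value), set())

def answer(constraints: list[tuple[str, str]]) -> set:
    """A question is formalized as the intersection of its projections."""
    result = None
    for relation, value in constraints:
        proj = kp(relation, value)
        result = proj if result is None else result & proj
    return result or set()

if __name__ == "__main__":
    # "Which film directed by Christopher Nolan was released in 2010?"
    print(answer([("directed_by", "Christopher Nolan"), ("released_in", "2010")]))
    # -> {'Inception'}
```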
✨DeepAgent: A General Reasoning Agent with Scalable Toolsets
📝 Summary:
DeepAgent is an end-to-end deep reasoning agent that autonomously thinks, discovers tools, and executes actions. It uses memory folding for long interactions and ToolPO reinforcement learning for general tool use. DeepAgent consistently outperforms baselines on eight diverse benchmarks.
🔹 Publication Date: Published on Oct 24, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21618
• PDF: https://arxiv.org/pdf/2510.21618
• Github: https://github.com/RUC-NLPIR/DeepAgent
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #ReasoningAgents #ReinforcementLearning #ToolLearning #DeepLearning
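"Memory folding" suggests periodically compressing a long interaction history into a compact summary so the working context stays bounded. A minimal sketch of that control flow; the _summarize stand-in below is an assumption, where a real system would call an LLM.
```python
from dataclasses import dataclass, field

@dataclass
class FoldedMemory:
    episodic: list[str] = field(default_factory=list)   # compressed past episodes
    working: list[str] = field(default_factory=list)    # recent raw steps

    def add_step(self, step: str, fold_every: int = 8) -> None:
        self.working.append(step)
        if len(self.working) >= fold_every:
            self.episodic.append(self._summarize(self.working))
            self.working.clear()

    def _summarize(self, steps: list[str]) -> str:
        # Placeholder: a real agent would call an LLM here to write the summary.
        return f"[folded {len(steps)} steps: {steps[0]} ... {steps[-1]}]"

    def context(self) -> str:
        return "\n".join(self.episodic + self.working)

if __name__ == "__main__":
    mem = FoldedMemory()
    for i in range(20):
        mem.add_step(f"step {i}: called tool, got result {i}")
    print(mem.context())
```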
✨Zep: A Temporal Knowledge Graph Architecture for Agent Memory
📝 Summary:
Zep is a new AI agent memory service using a temporal knowledge graph for dynamic knowledge integration. It outperforms MemGPT in benchmarks and significantly improves temporal reasoning and cross-session synthesis for enterprise applications, reducing latency.
🔹 Publication Date: Published on Jan 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• Github: https://github.com/getzep/graphiti
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #KnowledgeGraphs #TemporalReasoning #AIArchitecture #ArtificialIntelligence
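The temporal knowledge graph idea can be illustrated with edges that carry validity intervals, so the agent can answer "as of" queries about facts that change over time. This is a generic sketch, not Zep's (Graphiti's) actual schema or API.
```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: date | None = None  # None means still valid

EDGES = [
    TemporalEdge("alice", "works_at", "AcmeCorp", date(2021, 3, 1), date(2024, 6, 30)),
    TemporalEdge("alice", "works_at", "Initech", date(2024, 7, 1)),
]

def facts_as_of(edges: list[TemporalEdge], subject: str, when: date) -> list[TemporalEdge]:
    """Return only the edges that were valid for `subject` at time `when`."""
    return [
        e for e in edges
        if e.subject == subject
        and e.valid_from <= when
        and (e.valid_to is None or when <= e.valid_to)
    ]

if __name__ == "__main__":
    print(facts_as_of(EDGES, "alice", date(2023, 1, 1)))  # AcmeCorp edge only
    print(facts_as_of(EDGES, "alice", date(2025, 1, 1)))  # Initech edge only
```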
✨ReCode: Unify Plan and Action for Universal Granularity Control
📝 Summary:
ReCode unifies planning and action in LLM agents via recursive code generation. It treats plans as abstract functions recursively decomposed into primitive actions, enabling dynamic decision granularity. This significantly improves performance and data efficiency.
🔹 Publication Date: Published on Oct 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.23564
• PDF: https://arxiv.org/pdf/2510.23564
• Github: https://github.com/FoundationAgents/ReCode
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLMAgents #AI #CodeGeneration #Planning #GranularityControl
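Treating a plan as an abstract function that recursively decomposes into primitive actions can be sketched as below. The decomposition table and primitive set are toy assumptions standing in for the code an LLM would generate at each step.
```python
# Toy decomposition: an abstract plan either maps to primitive actions or to
# finer-grained sub-plans, so decision granularity emerges from the recursion.
PRIMITIVES = {"open_browser", "type_query", "press_enter", "read_page"}

DECOMPOSE = {
    "answer_question": ["search_web", "read_page"],
    "search_web": ["open_browser", "type_query", "press_enter"],
}

def execute(plan: str, depth: int = 0) -> None:
    indent = "  " * depth
    if plan in PRIMITIVES:
        print(f"{indent}ACTION: {plan}")   # leaf: run a primitive action
        return
    print(f"{indent}PLAN:   {plan}")       # internal node: expand further
    for sub in DECOMPOSE.get(plan, []):
        execute(sub, depth + 1)            # recurse into generated sub-plans

if __name__ == "__main__":
    execute("answer_question")
```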
✨LongCat-Video Technical Report
📝 Summary:
LongCat-Video is a 13.6B Diffusion Transformer model excelling in efficient, high-quality long video generation. It uses a unified architecture for tasks like Text-to-Video and coarse-to-fine generation for efficiency. This model is a significant step toward developing world models.
🔹 Publication Date: Published on Oct 25, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22200
• PDF: https://arxiv.org/pdf/2510.22200
• Github: https://github.com/meituan-longcat/LongCat-Video
🔹 Models citing this paper:
• https://huggingface.co/meituan-longcat/LongCat-Video
✨ Spaces citing this paper:
• https://huggingface.co/spaces/multimodalart/LongCat-Video
• https://huggingface.co/spaces/rahul7star/LongCat-Video
• https://huggingface.co/spaces/armaishere/meituan-longcat-LongCat-Video
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #DiffusionModels #Transformers #AI #TextToVideo
✨RAG-Anything: All-in-One RAG Framework
📝 Summary:
RAG-Anything is a unified framework extending RAG to all modalities, not just text. It integrates cross-modal relationships and semantic matching via dual-graph construction and hybrid retrieval. This significantly improves performance on complex multimodal benchmarks.
🔹 Publication Date: Published on Oct 14, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12323
• PDF: https://arxiv.org/pdf/2510.12323
• Github: https://github.com/HKUDS/RAG-Anything
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#RAG #MultimodalAI #MachineLearning #InformationRetrieval #GraphAI
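One way to picture dual-graph construction with hybrid retrieval: keep cross-modal relationship edges alongside node content, then combine similarity ranking with graph-neighborhood expansion so linked images and tables ride along with matched text. The graph contents and lexical scoring below are illustrative assumptions, not the framework's implementation.
```python
# Nodes may be text chunks, images, or tables; edges capture cross-modal links.
NODES = {
    "txt1": "Figure 2 reports accuracy of the hybrid retriever on chart questions.",
    "img1": "[image] bar chart of retrieval accuracy per modality",
    "tab1": "[table] modality | accuracy | latency",
}
EDGES = {"txt1": ["img1", "tab1"], "img1": ["txt1"], "tab1": ["txt1"]}

def keyword_score(query: str, text: str) -> int:
    """Crude lexical overlap standing in for dense vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def hybrid_retrieve(query: str, top_k: int = 2) -> list[str]:
    # Stage 1: rank nodes by (toy) semantic matching.
    ranked = sorted(NODES, key=lambda n: keyword_score(query, NODES[n]), reverse=True)
    seeds = ranked[:top_k]
    # Stage 2: expand across the graph so related modalities are retrieved too.
    expanded = dict.fromkeys(seeds + [nb for s in seeds for nb in EDGES.get(s, [])])
    return list(expanded)

if __name__ == "__main__":
    print(hybrid_retrieve("retrieval accuracy on chart questions"))
```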
✨PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold
📝 Summary:
PokeeResearch-7B is a 7B-parameter deep research agent that achieves state-of-the-art results using Reinforcement Learning from AI Feedback (RLAIF). Its chain-of-thought reasoning scaffold improves robustness and alignment, yielding an efficient, resilient, research-grade agent.
🔹 Publication Date: Published on Oct 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15862
• PDF: https://arxiv.org/pdf/2510.15862
• Github: https://github.com/Pokee-AI/PokeeResearchOSS
🔹 Models citing this paper:
• https://huggingface.co/PokeeAI/pokee_research_7b
• https://huggingface.co/Mungert/pokee_research_7b-GGUF
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AI #ReinforcementLearning #LLM #MachineLearning #AIResearch
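RLAIF replaces human preference labels with scores from an AI judge. Below is a compressed sketch of the reward side of that loop; the judge heuristic and candidate answers are stand-ins, and a real setup would prompt a strong LLM to grade rollouts and feed the rewards into a policy-gradient trainer.
```python
def judge(question: str, answer: str) -> float:
    """Stand-in AI-feedback judge; a real one would be an LLM grading
    factuality, citation quality, and task completion."""
    score = 0.0
    if "because" in answer:          # rewards answers that give a rationale
        score += 0.5
    if len(answer.split()) > 8:      # rewards non-trivial coverage
        score += 0.5
    return score

def rlaif_rewards(question: str, candidates: list[str]) -> list[float]:
    """Score each sampled rollout; these rewards would drive the RL update."""
    return [judge(question, c) for c in candidates]

if __name__ == "__main__":
    q = "Why does the sky appear blue?"
    rollouts = [
        "It just is.",
        "The sky appears blue because shorter wavelengths scatter more in the atmosphere.",
    ]
    print(rlaif_rewards(q, rollouts))   # the second rollout earns the higher reward
```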
✨FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
📝 Summary:
FAPO improves LLM reasoning by penalizing flawed-positive rollouts, i.e., rollouts that reach correct final answers through unreliable reasoning. This preserves early training gains while shifting later optimization toward reliable reasoning, improving both correctness and stability.
🔹 Publication Date: Published on Oct 26, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22543
• PDF: https://arxiv.org/pdf/2510.22543
• Project Page: https://fapo-rl.github.io/
🔹 Models citing this paper:
• https://huggingface.co/dyyyyyyyy/FAPO-32B
• https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic
• https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #ReinforcementLearning #DeepLearning #Reasoning
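The reported mechanism is to down-weight flawed positives when computing rewards. A minimal sketch of that reward shaping; the keyword-based flaw detector and penalty value are illustrative assumptions, whereas the paper uses a generative reward model to flag flawed reasoning.
```python
from dataclasses import dataclass

@dataclass
class Rollout:
    reasoning: str
    answer: str

def has_flaw(reasoning: str) -> bool:
    """Stand-in for a generative reward model that flags unreliable reasoning."""
    return "guess" in reasoning or "lucky" in reasoning

def fapo_reward(rollout: Rollout, gold: str, flaw_penalty: float = 0.5) -> float:
    if rollout.answer != gold:
        return 0.0
    # A correct answer reached via flawed reasoning still earns credit, but less,
    # which keeps early gains while steering later optimization toward reliable reasoning.
    return 1.0 - flaw_penalty if has_flaw(rollout.reasoning) else 1.0

if __name__ == "__main__":
    gold = "42"
    print(fapo_reward(Rollout("careful derivation of 6*7", "42"), gold))          # 1.0
    print(fapo_reward(Rollout("wild guess that happened to work", "42"), gold))   # 0.5
    print(fapo_reward(Rollout("careful derivation", "41"), gold))                 # 0.0
```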
✨The Unreasonable Effectiveness of Scaling Agents for Computer Use
📝 Summary:
Behavior Best-of-N (bBoN) improves computer-use agent reliability by generating multiple rollouts and selecting among them via behavior narratives. The method achieves state-of-the-art performance on OSWorld and generalizes across operating systems, demonstrating effective scaling of computer-use agents (CUAs).
🔹 Publication Date: Published on Oct 2, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.02250
• PDF: https://arxiv.org/pdf/2510.02250
• Project Page: https://www.simular.ai/articles/agent-s3
• Github: https://github.com/simular-ai/Agent-S
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #AIScaling #OperatingSystems #BehavioralAI #AIResearch
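Behavior Best-of-N runs several independent rollouts, turns each into a concise behavior narrative, and asks a selector to pick the most promising one. The toy trajectories and scoring heuristic below are invented for illustration; the paper uses an LLM judge over real agent trajectories.
```python
def to_narrative(trajectory: list[str]) -> str:
    """Compress a raw rollout into a short behavior narrative."""
    return " -> ".join(trajectory)

def select_best(narratives: list[str]) -> int:
    """Stand-in judge: prefer the narrative that reports reaching the goal
    in the fewest steps. A real selector would be an LLM comparing narratives."""
    def score(n: str) -> tuple[int, int]:
        reached = 1 if "goal reached" in n else 0
        return (reached, -n.count("->"))
    return max(range(len(narratives)), key=lambda i: score(narratives[i]))

if __name__ == "__main__":
    rollouts = [
        ["open settings", "click wrong menu", "backtrack", "goal reached"],
        ["open settings", "click display", "goal reached"],
        ["open settings", "get stuck on dialog"],
    ]
    narratives = [to_narrative(r) for r in rollouts]
    best = select_best(narratives)
    print(f"selected rollout {best}: {narratives[best]}")
```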
✨Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
📝 Summary:
Agent S2 is a compositional framework for computer use agents that delegates tasks across generalist and specialist models. Using Mixture-of-Grounding and Proactive Hierarchical Planning, it achieves state-of-the-art performance on diverse benchmarks and operating systems.
🔹 Publication Date: Published on Apr 1, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.00906
• PDF: https://arxiv.org/pdf/2504.00906
• Project Page: https://www.simular.ai/articles/agent-s2-technical-review
• Github: https://github.com/simular-ai/Agent-S
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIAgents #MachineLearning #AI #GeneralistSpecialist #AutonomousSystems
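Mixture-of-Grounding, as summarized, routes each grounding request to the specialist best suited to it. A schematic router under assumed specialist names; in the real framework these would be model-backed visual, textual, and structural grounding experts.
```python
# Hypothetical specialists standing in for model-backed grounding experts.
def visual_grounder(request: str) -> str:
    return f"(x=412, y=87) located for: {request}"

def textual_grounder(request: str) -> str:
    return f"matched on-screen text for: {request}"

def structural_grounder(request: str) -> str:
    return f"selected table cell for: {request}"

ROUTES = {
    "icon": visual_grounder,
    "button": visual_grounder,
    "paragraph": textual_grounder,
    "cell": structural_grounder,
}

def ground(request: str) -> str:
    """Route a grounding request to a specialist based on what it mentions."""
    for keyword, expert in ROUTES.items():
        if keyword in request:
            return expert(request)
    return textual_grounder(request)   # generalist fallback

if __name__ == "__main__":
    print(ground("click the save icon in the toolbar"))
    print(ground("select the cell containing Q3 revenue"))
```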
✨Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
📝 Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development.
🔹 Publication Date: Published on Oct 22, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• Github: https://github.com/apple/pico-banana-400k
🔹 Models citing this paper:
• https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
✨MIRIX: Multi-Agent Memory System for LLM-Based Agents
📝 Summary:
MIRIX is a modular multi-agent memory system for LLM-based agents that integrates diverse memory types within a dynamic framework. It substantially enhances memory for multimodal and long-form conversations, and it achieves superior performance on challenging benchmarks, outperforming existing memory systems.
🔹 Publication Date: Published on Jul 10, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.07957
• PDF: https://arxiv.org/pdf/2507.07957
• Project Page: https://mirix.io/
• Github: https://github.com/Mirix-AI/MIRIX
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #MultiAgentSystems #AISystems #MemorySystems #AI
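A memory system with diverse memory types can be pictured as specialized stores behind a common router. The store names below follow common memory taxonomies (episodic, semantic, procedural) and are assumptions about MIRIX's exact components rather than its actual design.
```python
from collections import defaultdict

class MultiStoreMemory:
    """Toy multi-type memory: each write is routed to a specialized store,
    and reads can pull from one store or all of them."""

    def __init__(self) -> None:
        self.stores: dict[str, list[str]] = defaultdict(list)

    def write(self, kind: str, item: str) -> None:
        assert kind in {"episodic", "semantic", "procedural"}, "unknown memory type"
        self.stores[kind].append(item)

    def read(self, kind: str | None = None) -> list[str]:
        if kind is not None:
            return list(self.stores[kind])
        return [item for store in self.stores.values() for item in store]

if __name__ == "__main__":
    mem = MultiStoreMemory()
    mem.write("episodic", "2024-06-01: user asked for a vegetarian recipe")
    mem.write("semantic", "user is vegetarian")
    mem.write("procedural", "to export a report: File > Export > PDF")
    print(mem.read("semantic"))
    print(mem.read())
```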
✨Cache-to-Cache: Direct Semantic Communication Between Large Language Models
📝 Summary:
C2C enables direct semantic communication between LLMs by projecting and fusing their KV-caches, overcoming text-based communication limits. This method preserves rich semantics, improving accuracy by 3-5% and achieving a 2x speedup over traditional text communication.
🔹 Publication Date: Published on Oct 3, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.03215
• PDF: https://arxiv.org/pdf/2510.03215
• Project Page: https://fuvty.github.io/C2C_Project_Page/
• Github: https://github.com/thu-nics/C2C
🔹 Models citing this paper:
• https://huggingface.co/nics-efc/C2C_Fuser
✨ Spaces citing this paper:
• https://huggingface.co/spaces/fuvty/C2C_demo
• https://huggingface.co/spaces/nics-efc/C2C_demo
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #SemanticCommunication #AI #DeepLearning #NLP
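Cache-to-Cache communication means one model's KV-cache is projected into another's representation space and fused there, instead of exchanging text. A small PyTorch sketch of a learnable projector and gated fuser over dummy cache tensors; the shapes, gating, and single-layer treatment are simplified assumptions, not the released C2C_Fuser.
```python
import torch
import torch.nn as nn

class ToyCacheFuser(nn.Module):
    """Project the sharer model's KV-cache into the receiver's dimension
    and blend it with the receiver's own cache through a learned gate."""

    def __init__(self, sharer_dim: int, receiver_dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(sharer_dim, receiver_dim)
        self.gate = nn.Sequential(nn.Linear(2 * receiver_dim, receiver_dim), nn.Sigmoid())

    def forward(self, receiver_kv: torch.Tensor, sharer_kv: torch.Tensor) -> torch.Tensor:
        projected = self.proj(sharer_kv)                       # align dimensions
        gate = self.gate(torch.cat([receiver_kv, projected], dim=-1))
        return gate * projected + (1.0 - gate) * receiver_kv   # fused cache

if __name__ == "__main__":
    batch, seq_len = 2, 16
    receiver_kv = torch.randn(batch, seq_len, 128)   # receiver model's cache slice
    sharer_kv = torch.randn(batch, seq_len, 96)      # sharer model's cache slice
    fuser = ToyCacheFuser(sharer_dim=96, receiver_dim=128)
    print(fuser(receiver_kv, sharer_kv).shape)       # torch.Size([2, 16, 128])
```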
✨Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
📝 Summary:
Skyfall-GS synthesizes large-scale, explorable 3D urban scenes by combining satellite imagery for geometry and diffusion models for realistic textures. This framework offers improved cross-view consistent geometry and photorealistic appearances without needing costly 3D annotations.
🔹 Publication Date: Published on Oct 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15869
• PDF: https://arxiv.org/pdf/2510.15869
• Project Page: https://skyfall-gs.jayinnn.dev/
• Github: https://github.com/jayin92/skyfall-gs
🔹 Models citing this paper:
• https://huggingface.co/jayinnn/Skyfall-GS-ply
✨ Datasets citing this paper:
• https://huggingface.co/datasets/jayinnn/Skyfall-GS-eval
• https://huggingface.co/datasets/jayinnn/Skyfall-GS-datasets
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#3DReconstruction #ComputerVision #SatelliteImagery #DiffusionModels #UrbanModeling
✨Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
📝 Summary:
Easy Dataset is a framework that synthesizes LLM fine-tuning data from unstructured documents using a GUI and LLMs. It generates domain-specific question-answer pairs with human oversight. This improves LLM performance in specific domains while retaining general knowledge.
🔹 Publication Date: Published on Jul 5, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.04009
• PDF: https://arxiv.org/pdf/2507.04009
• Github: https://github.com/ConardLi/easy-dataset
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #DataSynthesis #FineTuning #AI #NLP
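The pipeline, as summarized, chunks unstructured documents and asks an LLM to propose question-answer pairs, with a human reviewing them before fine-tuning. A skeletal version of that flow; call_llm is a placeholder for whichever model backend is configured, not the project's actual API.
```python
import json
import textwrap

def chunk_document(text: str, max_chars: int = 400) -> list[str]:
    """Split an unstructured document into roughly fixed-size chunks."""
    return textwrap.wrap(text, max_chars)

def call_llm(prompt: str) -> str:
    """Placeholder LLM call: a real pipeline would hit a configured model here."""
    return json.dumps([{"question": "What does the passage describe?",
                        "answer": prompt[:80]}])

def synthesize_qa(document: str) -> list[dict]:
    pairs = []
    for chunk in chunk_document(document):
        prompt = f"Write domain-specific QA pairs for this passage:\n{chunk}"
        pairs.extend(json.loads(call_llm(prompt)))
    return pairs   # a human would review and edit these before fine-tuning

if __name__ == "__main__":
    doc = "Hydraulic pumps convert mechanical power into hydraulic energy. " * 20
    dataset = synthesize_qa(doc)
    print(f"{len(dataset)} QA pairs")
    print(dataset[0])
```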
✨InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
📝 Summary:
InternVL3 introduces a native multimodal pre-training paradigm, jointly learning from multimodal and text data to overcome the alignment challenges of conventional adaptation pipelines. This unified approach, combined with advanced training and test-time techniques, achieves state-of-the-art performance on multimodal tasks, rivaling proprietary models.
🔹 Publication Date: Published on Apr 14, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.10479
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/
🔹 Models citing this paper:
• https://huggingface.co/OpenGVLab/InternVL3-78B
• https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
• https://huggingface.co/OpenGVLab/InternVL3-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AntResearchNLP/ViLaBench
• https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
• https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #DeepLearning #AIResearch #OpenSourceAI #GenerativeAI