✨OpenVoice: Versatile Instant Voice Cloning
📝 Summary:
OpenVoice is a versatile voice-cloning method that needs only a short audio clip from the reference speaker. It offers flexible control over voice style and achieves zero-shot cross-lingual cloning into new languages without extensive training data. It is also computationally efficient.
🔹 Publication Date: Published on Dec 3, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.01479
• PDF: https://arxiv.org/pdf/2312.01479
• Github: https://github.com/myshell-ai/openvoice
🔹 Models citing this paper:
• https://huggingface.co/rsxdalv/OpenVoiceV2
• https://huggingface.co/ameerazam08/Udiff
• https://huggingface.co/flopml/OpenVoice-v2
✨ Datasets citing this paper:
• https://huggingface.co/datasets/tsinghua-ee/QualiSpeech
• https://huggingface.co/datasets/dlxjj/Openvoice
• https://huggingface.co/datasets/Pendrokar/open_tts_tracker
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Russell1123213123/testOpenVoice
• https://huggingface.co/spaces/gauthamk28/gauthamk28_voice
• https://huggingface.co/spaces/blayks07/OpenVoice-main
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VoiceCloning #AIResearch #SpeechSynthesis #ZeroShotLearning #CrossLingualAI
✨Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
📝 Summary:
LLMs struggle to authentically role-play villains because of safety alignment, showing a monotonic decline in fidelity as character morality decreases. The Moral RolePlay benchmark shows that models have particular difficulty with traits such as deceit and manipulation, highlighting a tension between model safety and creati...
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04962
• PDF: https://arxiv.org/pdf/2511.04962
• Github: https://github.com/Tencent/DigitalHuman/tree/main/RolePlay_Villain
==================================
#LLM #AI #AISafety #RolePlaying #NLP
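To make "monotonic decline" concrete: if every character sits at one of several ordered morality levels and each role-play response gets a fidelity score, the claim is that average fidelity never rises as characters become less moral. A minimal sketch of that check, assuming scores are grouped per level from most to least moral (the function names and the toy scores are hypothetical, not the benchmark's data or scale):

```python
from statistics import mean

def mean_by_level(scores_by_level):
    """Average fidelity score per morality level.

    Assumes levels are ordered from most moral to least moral (hypothetical layout).
    """
    return [mean(scores) for scores in scores_by_level]

def is_monotonic_decline(level_means):
    """True if no level scores higher than the more moral level before it (non-increasing)."""
    return all(later <= earlier for earlier, later in zip(level_means, level_means[1:]))

# Toy scores for four levels, made up purely to exercise the functions.
toy = [[4.0, 3.8], [3.5, 3.6], [2.9], [2.1, 2.3]]
print(is_monotonic_decline(mean_by_level(toy)))  # True
```

The check accepts ties (non-increasing); a stricter reading of "decline" would use `<` instead of `<=`.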
✨Visual Spatial Tuning
📝 Summary:
Visual Spatial Tuning (VST) is a framework that progressively trains Vision-Language Models (VLMs) using specialized datasets: VST-P for spatial perception and VST-R for spatial reasoning. VST achieves state-of-the-art results on spatial benchmarks without harming general VLM capabilities, leading to more phy...
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05491
• PDF: https://arxiv.org/pdf/2511.05491
• Project Page: https://yangr116.github.io/vst_project/
• Github: https://github.com/Yangr116/VST
==================================
#VisionLanguageModels #SpatialAI #ComputerVision #DeepLearning #AIResearch
✨Dense Motion Captioning
📝 Summary:
The paper introduces Dense Motion Captioning, a new task for 3D human motion understanding. It presents CompMo, a large dataset with complex, temporally annotated motions, and DEMO, a model combining a language model with a motion adapter to generate detailed, grounded captions.
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05369
• PDF: https://arxiv.org/pdf/2511.05369
• Project Page: https://xusy2333.com/demo/
• Github: https://github.com/41xu/DEMO
==================================
#MotionCaptioning #3DMotion #ComputerVision #LanguageModels #AIResearch
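Dense captioning pairs each caption with a time span, so a prediction only counts if its span overlaps the right part of the motion. A minimal sketch, assuming the temporal-IoU matching commonly used in dense video captioning (the `CaptionedSegment` type, `recall_at` and the 0.5 threshold are hypothetical, not the actual CompMo format or DEMO's evaluation protocol; caption-text quality is not scored here):

```python
from dataclasses import dataclass

@dataclass
class CaptionedSegment:
    """One dense-caption entry: a time span of the motion plus its description."""
    start: float  # seconds (or frames)
    end: float
    caption: str

def temporal_iou(a, b):
    """Intersection-over-union of two time spans."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = max(a.end, b.end) - min(a.start, b.start)  # equals the true union whenever inter > 0
    return inter / union if union > 0 else 0.0

def recall_at(preds, gts, threshold=0.5):
    """Fraction of ground-truth segments matched by some prediction with tIoU >= threshold."""
    if not gts:
        return 0.0
    hits = sum(any(temporal_iou(p, g) >= threshold for p in preds) for g in gts)
    return hits / len(gts)

gt = [CaptionedSegment(0.0, 4.0, "a person walks forward"),
      CaptionedSegment(5.0, 8.0, "the person jumps twice")]
pred = [CaptionedSegment(0.5, 4.0, "someone walks ahead")]
print(recall_at(pred, gt))  # 0.5
```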
✨DeepEyesV2: Toward Agentic Multimodal Model
📝 Summary:
DeepEyesV2 is an agentic multimodal model that uses a two-stage training pipeline for robust tool integration. This method, combining a cold-start stage and reinforcement learning, effectively enables task-adaptive tool invocation for real-world reasoning tasks.
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05271
• PDF: https://arxiv.org/pdf/2511.05271
• Project Page: https://visual-agent.github.io/
• Github: https://github.com/Visual-Agent/DeepEyes
==================================
#MultimodalAI #AgenticAI #ReinforcementLearning #DeepLearning #AIResearch
✨Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
📝 Summary:
Large Vision-Language Models (LVLMs) suffer from a language bias that leads to hallucinations. The proposed method refines textual embeddings by integrating average-pooled visual features into them. This simple approach improves visual grounding and reduces hallucinations.
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05017
• PDF: https://arxiv.org/pdf/2511.05017
==================================
#VisionLanguageModels #AIHallucinations #VisualGrounding #DeepLearning #NLP
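The summary describes the fix only at a high level: pool the visual features into one vector and fold it into the text-side embeddings so language tokens carry some visual context. A minimal NumPy sketch, assuming simple additive fusion with a hypothetical weight `alpha` and visual tokens already projected to the language model's width (the paper's actual integration may differ, for example by using a learned projection):

```python
import numpy as np

def refine_text_embeddings(text_emb, visual_feats, alpha=0.1):
    """Inject a global visual summary into every text token embedding.

    text_emb:     (T, D) prompt token embeddings
    visual_feats: (V, D) visual tokens, assumed already projected to the LLM width D
    alpha:        hypothetical mixing weight (not taken from the paper)
    """
    if text_emb.shape[-1] != visual_feats.shape[-1]:
        raise ValueError("text and visual features must share the embedding width D")
    pooled = visual_feats.mean(axis=0, keepdims=True)  # (1, D): average pooling over visual tokens
    return text_emb + alpha * pooled                   # broadcast-add to all T text tokens

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))     # 5 text tokens, width 8
vision = rng.normal(size=(16, 8))  # 16 visual tokens, width 8
print(refine_text_embeddings(text, vision).shape)  # (5, 8)
```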
✨Jailbreaking in the Haystack
📝 Summary:
NINJA is a new jailbreak method for long-context LMs. It appends benign content to harmful goals, exploiting goal positioning. This significantly increases attack success rates, revealing fundamental vulnerabilities in modern models.
🔹 Publication Date: Published on Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04707
• PDF: https://arxiv.org/pdf/2511.04707
• Project Page: https://ar-forum.github.io/ninjaattackweb/
• Github: https://github.com/AR-FORUM/NINJA_Attack
==================================
#LLM #Jailbreaking #AISafety #AI #Cybersecurity
LLM vs RAG vs Agent by hand ✍️ Workbook
Download PDF 👉 https://lnkd.in/gjf2F6M8
🤖🧠 DeepAgent: A New Era of General AI Reasoning and Scalable Tool-Use Intelligence
🗓️ 09 Nov 2025
📚 AI News & Trends
Artificial intelligence has rapidly progressed from simple assistants to advanced reasoning systems capable of complex problem-solving. As tasks demand more autonomy, adaptability and real-world interaction, the AI field has entered the era of intelligent agent systems. These agents are expected not just to answer questions, but to think, plan, search, act and interact across digital ...
#GeneralAI #ArtificialIntelligence #AIReasoning #IntelligentAgents #ScalableAI #ToolUseAI
✨Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
📝 Summary:
ROLL Flash accelerates RL post-training of LLMs through asynchrony. It uses fine-grained parallelism and rollout-train decoupling to improve resource utilization and scalability, achieving up to a 2.72x speedup while matching the performance of synchronous training.
🔹 Publication Date: Published on Oct 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.11345
• PDF: https://arxiv.org/pdf/2510.11345
• Github: https://github.com/alibaba/ROLL
==================================
#LLM #ReinforcementLearning #AsynchronousAI #DeepLearning #AIResearch
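Rollout-train decoupling means generation does not wait for each optimizer step: a producer keeps generating samples with whatever policy version it currently has, while the trainer consumes them, at the cost of training on slightly stale (off-policy) data. A minimal single-process sketch with Python threads and a bounded queue, assuming a simple rule that drops samples more than `max_staleness` policy versions old (the function, parameters and staleness rule are illustrative, not ROLL's API):

```python
import queue
import threading

def run_decoupled(num_steps=4, batch_size=2, max_staleness=1):
    """Toy rollout/train decoupling with a bound on policy staleness.

    A rollout thread keeps producing samples tagged with the policy version it used,
    while the trainer consumes them; samples more than `max_staleness` versions old
    are discarded. All names and the staleness rule are illustrative assumptions.
    """
    buffer = queue.Queue(maxsize=batch_size * (max_staleness + 1))
    version = 0  # current policy version; only the trainer loop below writes it
    stop = threading.Event()

    def rollout_worker():
        i = 0
        while not stop.is_set():
            try:
                # Tag each "rollout" with the policy version that generated it.
                buffer.put((version, f"sample-{i}"), timeout=0.05)
                i += 1
            except queue.Full:
                pass  # trainer is behind; retry until stopped

    worker = threading.Thread(target=rollout_worker, daemon=True)
    worker.start()
    batches = []
    for _ in range(num_steps):
        batch = []
        while len(batch) < batch_size:
            v, sample = buffer.get()
            if version - v <= max_staleness:  # keep only sufficiently fresh samples
                batch.append((v, sample))
        batches.append(batch)
        version += 1  # stand-in for an optimizer step producing a new policy
    stop.set()
    worker.join()
    return batches
```

In a real system the worker would run LLM generation on its own devices and the trainer would apply an RL update; the bounded queue is what lets generation continue while training runs.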
🤖🧠 PokeeResearch: Advancing Deep Research with AI and Web-Integrated Intelligence
🗓️ 09 Nov 2025
📚 AI News & Trends
In the modern information era, the ability to research fast, accurately and at scale has become a competitive advantage for businesses, researchers, analysts and developers. As online data expands exponentially, traditional search engines and manual research workflows are no longer sufficient to gather reliable insights efficiently. This need has fueled the rise of AI research ...
#AIResearch #DeepResearch #WebIntelligence #ArtificialIntelligence #ResearchAutomation #DataAnalysis
🤖🧠 Pico-Banana-400K: The Breakthrough Dataset Advancing Text-Guided Image Editing
🗓️ 09 Nov 2025
📚 AI News & Trends
Text-guided image editing has rapidly evolved with powerful multimodal models capable of transforming images using simple natural-language instructions. These models can change object colors, modify lighting, add accessories, adjust backgrounds or even convert real photographs into artistic styles. However, the progress of research has been limited by one crucial bottleneck: the lack of large-scale, high-quality, ...
#TextGuidedEditing #MultimodalAI #ImageEditing #AIResearch #ComputerVision #DeepLearning
✨Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
📝 Summary:
ROLL is an efficient, scalable, and user-friendly library for large-scale reinforcement learning optimization. It features a simplified architecture, parallel training, flexible sample management, and resource mapping for developers and researchers.
🔹 Publication Date: Published on Jun 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.06122
• PDF: https://arxiv.org/pdf/2506.06122
• Github: https://github.com/alibaba/roll
==================================
#ReinforcementLearning #MachineLearning #LargeScaleAI #Optimization #AIResearch
🤖🧠 Concerto: How Joint 2D-3D Self-Supervised Learning Is Redefining Spatial Intelligence
🗓️ 09 Nov 2025
📚 AI News & Trends
The world of artificial intelligence is rapidly evolving and self-supervised learning has become a driving force behind breakthroughs in computer vision and 3D scene understanding. Traditional supervised learning relies heavily on labeled datasets which are expensive and time-consuming to produce. Self-supervised learning, on the other hand, extracts meaningful patterns without manual labels allowing models to ...
#SelfSupervisedLearning #ComputerVision #3DSceneUnderstanding #SpatialIntelligence #AIResearch #DeepLearning
✨CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
📝 Summary:
CritiCal is a novel training method that uses natural-language critiques to significantly improve LLM confidence calibration. It outperforms other approaches, including GPT-4o, and improves reliability and generalization across tasks.
🔹 Publication Date: Published on Oct 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24505
• PDF: https://arxiv.org/pdf/2510.24505
==================================
#LLM #ConfidenceCalibration #MachineLearning #NLP #AIResearch
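Calibration means stated confidence should match actual accuracy: among answers given with 80% confidence, about 80% should be correct. Expected Calibration Error (ECE) is a widely used measure of that gap. A minimal sketch with equal-width bins (whether CritiCal reports exactly this variant or bin count is an assumption):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls into the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            accuracy = sum(ok for _, ok in members) / len(members)
            mean_conf = sum(conf for conf, _ in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece

# Overconfident toy model: 90% confidence but only half of those answers are right.
print(round(expected_calibration_error([0.9, 0.9, 0.6], [True, False, True]), 3))  # 0.4
```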
✨HAFixAgent: History-Aware Automated Program Repair Agent
📝 Summary:
HAFixAgent enhances automated program repair for complex multi-hunk bugs by incorporating repository history. It significantly improves bug-fixing effectiveness over existing agent-based systems while maintaining efficiency. This offers a practical approach for history-aware agentic APR.
🔹 Publication Date: Published on Nov 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01047
• PDF: https://arxiv.org/pdf/2511.01047
==================================
#AutomatedProgramRepair #SoftwareEngineering #AI #BugFixing #CodeRepair
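One simple way to bring repository history into a multi-hunk repair is to ask which commits last touched each buggy line. A minimal sketch that parses `git blame --porcelain` output and collects the commits behind all hunks; this blame-based retrieval is an assumption for illustration, since the summary does not say how HAFixAgent extracts history, and the function names are hypothetical:

```python
import re

# Header line of `git blame --porcelain`: "<sha> <orig-line> <final-line> [<group-size>]"
# (assumes 40-hex SHA-1 object IDs)
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")

def blame_commits(porcelain):
    """Map each final line number to the SHA of the commit that last changed it."""
    line_to_sha = {}
    for line in porcelain.splitlines():
        match = HEADER.match(line)
        if match:  # content lines start with a TAB, so they never match
            line_to_sha[int(match.group(3))] = match.group(1)
    return line_to_sha

def commits_for_hunks(porcelain, hunks):
    """Collect the commits behind every line of a multi-hunk bug.

    hunks: list of inclusive (first_line, last_line) ranges in the buggy file.
    """
    line_to_sha = blame_commits(porcelain)
    shas = set()
    for first, last in hunks:
        for line_no in range(first, last + 1):
            if line_no in line_to_sha:
                shas.add(line_to_sha[line_no])
    return shas
```

The porcelain text could come from `subprocess.run(["git", "blame", "--porcelain", path], capture_output=True, text=True).stdout`; the resulting commits (and their diffs) would then be candidate context for the agent.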
✨VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
📝 Summary:
VeriCoT is a neuro-symbolic method to validate LLM Chain-of-Thought reasoning. It formalizes CoT steps into first-order logic for automated verification of consistency. This improves LLM reliability by identifying flawed reasoning and enhancing overall accuracy.
🔹 Publication Date: Published on Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04662
• PDF: https://arxiv.org/pdf/2511.04662
==================================
#LLM #ChainOfThought #NeuroSymbolic #AI #Logic
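The core question VeriCoT asks of each formalized step is entailment: does it follow from the stated premises plus the steps already accepted? A minimal sketch of that loop, deliberately simplified to propositional logic with a brute-force truth table (formulas are plain Python predicates over a truth assignment, a hypothetical encoding); a first-order version would hand each check to an automated solver instead of enumerating assignments:

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Brute-force truth-table check: do the premises logically entail the conclusion?"""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False  # counterexample: premises true, conclusion false
    return True

def first_unsupported_step(premises, steps, atoms):
    """Index of the first CoT step not entailed by the premises plus earlier steps, else None."""
    known = list(premises)
    for i, step in enumerate(steps):
        if not entails(known, step, atoms):
            return i
        known.append(step)  # verified steps may be used by later steps
    return None

# "If it rains the street is wet; the street is wet; therefore it rained."
atoms = ["rain", "wet"]
premises = [lambda w: not w["rain"] or w["wet"],  # rain -> wet
            lambda w: w["wet"]]                  # the street is wet
steps = [lambda w: w["rain"]]                    # "therefore it rained"
print(first_unsupported_step(premises, steps, atoms))  # 0 (affirming the consequent)
```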
✨VGGT: Visual Geometry Grounded Transformer
📝 Summary:
VGGT is a novel feed-forward neural network that efficiently infers multiple key 3D scene attributes from single or multiple views. It outperforms existing specialized models without requiring post-processing, achieving state-of-the-art results across several 3D computer vision tasks. VGGT also s...
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.11651
• PDF: https://arxiv.org/pdf/2503.11651
• Project Page: https://vgg-t.github.io/
• Github: https://github.com/facebookresearch/vggt
🔹 Models citing this paper:
• https://huggingface.co/facebook/VGGT-1B
• https://huggingface.co/facebook/VGGT-1B-Commercial
✨ Spaces citing this paper:
• https://huggingface.co/spaces/facebook/vggt
• https://huggingface.co/spaces/Pointcept/Concerto
• https://huggingface.co/spaces/HanzhouLiu/Stylos_Demo
==================================
#3DComputerVision #Transformers #DeepLearning #ComputerVision #AI
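VGGT predicts camera parameters, depth maps and point maps together. For a single view, depth and points are tied by the standard pinhole relation X = d * K^-1 [u, v, 1]^T, which makes a handy consistency check between outputs. A minimal sketch in camera coordinates, assuming a pinhole model with intrinsics `K` and z-axis depth (the function name is hypothetical and this is not code from the VGGT repository); mapping into a shared world frame would additionally apply each camera's extrinsics:

```python
import numpy as np

def depth_to_point_map(depth, K):
    """Unproject an (H, W) depth map into an (H, W, 3) point map in the camera frame.

    Assumes a pinhole camera with intrinsics K, pixel (u, v) = (column, row),
    and depth measured along the camera's z axis.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))                     # each (H, W)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # homogeneous (H, W, 3)
    rays = pixels @ np.linalg.inv(K).T                                 # K^-1 [u, v, 1]^T per pixel
    return rays * depth[..., None]                                     # scale rays so z == depth

K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
points = depth_to_point_map(np.full((3, 3), 2.0), K)
print(np.allclose(points[1, 1], [0.0, 0.0, 2.0]))  # True: the principal point lies on the optical axis
```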