✨MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
📝 Summary:
MG-Nav is a dual-scale framework for zero-shot visual navigation, unifying global memory-guided planning via a Sparse Spatial Memory Graph with local geometry-enhanced control using a VGGT-adapter. It achieves state-of-the-art performance and robustness in unseen environments.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22609
• PDF: https://arxiv.org/pdf/2511.22609
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisualNavigation #Robotics #AI #ComputerVision #ZeroShotLearning
✨ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
📝 Summary:
ViSAudio is an end-to-end framework that generates high-quality binaural spatial audio directly from silent video. It uses conditional flow matching and a dual-branch architecture, outperforming previous methods in immersion and consistency. The paper also introduces the BiAudio dataset for this ...
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03036
• PDF: https://arxiv.org/pdf/2512.03036
• Project Page: https://kszpxxzmc.github.io/ViSAudio-project/
• Github: https://github.com/kszpxxzmc/ViSAudio
#SpatialAudio #AudioGeneration #DeepLearning #ComputerVision #AI
✨MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
📝 Summary:
MultiShotMaster is a framework for controllable multi-shot video generation. It extends a single-shot model with novel RoPE variants for flexible shot arrangement, narrative order, and spatiotemporal reference injection. The framework also uses an automated data annotation pipeline to address dat...
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03041
• PDF: https://arxiv.org/pdf/2512.03041
• Project Page: https://qinghew.github.io/MultiShotMaster/
#VideoGeneration #GenerativeAI #DeepLearning #AI #ComputerVision
✨C^2DLM: Causal Concept-Guided Diffusion Large Language Models
📝 Summary:
C^2DLM is a Causal Concept-Guided Diffusion Language Model (DLM) that improves reasoning. It guides DLM attention with concept-level causal graphs from a teacher model so the model learns causal relationships. This yields an average gain of over 1% on reasoning tasks and speeds up training by 3.2x.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22146
• PDF: https://arxiv.org/pdf/2511.22146
#LLM #CausalAI #DiffusionModels #AI #NLP
✨The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models
📝 Summary:
LLMs can encode high-level relational concepts for analogies but struggle with missing relational information and transfer to new entities. Success depends on strong structural alignment. Their analogical reasoning is emerging but limited compared to humans.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20344
• PDF: https://arxiv.org/pdf/2511.20344
#LLMs #AnalogicalReasoning #AIResearch #NaturalLanguageProcessing #CognitiveAI
✨Artemis: Structured Visual Reasoning for Perception Policy Learning
📝 Summary:
Artemis improves visual perception by using structured spatial reasoning with (label, bounding-box) pairs instead of linguistic intermediate reasoning. This avoids language ambiguity, enables direct supervision, and leads to strong performance and generalization across diverse visual tasks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01988
• PDF: https://arxiv.org/pdf/2512.01988
• Project Page: https://vi-ocean.github.io/projects/artemis/
• Github: https://github.com/WayneTomas/Artemis
#VisualPerception #ComputerVision #SpatialReasoning #AI #MachineLearning
✨SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
📝 Summary:
SimWorld is a new Unreal Engine 5 simulator for developing and evaluating LLM/VLM agents in realistic, open-ended physical and social environments. It provides diverse scenarios and a rich interface, revealing distinct reasoning patterns and limitations across frontier LLMs.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01078
• PDF: https://arxiv.org/pdf/2512.01078
#AI #LLM #Simulation #AutonomousAgents #UnrealEngine5
✨CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
📝 Summary:
CodeV improves faithful visual reasoning by training an agent with Tool-Aware Policy Optimization (TAPO). TAPO applies dense rewards directly to visual tool inputs and outputs, encouraging evidence-consistent tool use. This approach significantly boosts faithful tool use and achieves competitive accur...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19661
• PDF: https://arxiv.org/pdf/2511.19661
🔹 Models citing this paper:
• https://huggingface.co/RenlyH/CodeV-RL
• https://huggingface.co/RenlyH/CodeV-SFT
✨ Datasets citing this paper:
• https://huggingface.co/datasets/RenlyH/CodeV-RL-Data
#VisualReasoning #ReinforcementLearning #ComputerVision #AI #ToolLearning
🚀 Pass Your IT Exam in 2025: Free Practice Tests & Premium Materials
SPOTO offers free, instant access to high-quality, up-to-date resources that help you study smarter and pass faster.
✔️ Python, CCNA, CCNP, AWS, PMP, CISSP, Azure, & more
✔️ 100% free, no sign-up, instantly downloadable
📥 Grab your free materials here:
· IT exam skill test: https://bit.ly/443t4xB
· IT certs e-book: https://bit.ly/4izDv1D
· Python, Excel, Cyber Security courses: https://bit.ly/44LidZf
📱 Join our IT study group for insider tips & expert support:
https://chat.whatsapp.com/K3n7OYEXgT1CHGylN6fM5a
💬 Need help? Chat with an admin now:
wa.link/cbfsmf
⏳ Don't wait. Boost your career today!
✨Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
📝 Summary:
Skywork-R1V4 is a 30B multimodal agentic model that unifies image manipulation and deep search with interleaved reasoning. It achieves state-of-the-art performance in perception and multimodal search using only supervised fine-tuning, demonstrating advanced intelligence without reinforcement lear...
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02395
• PDF: https://arxiv.org/pdf/2512.02395
• Project Page: https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html
• Github: https://github.com/SkyworkAI/Skywork-R1V/tree/main/r1v4
🔹 Models citing this paper:
• https://huggingface.co/Skywork/R1V4
#AI #MultimodalAI #AgenticAI #DeepLearning #ComputerVision
✨CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
📝 Summary:
CUDA-L2 uses LLMs and reinforcement learning to optimize half-precision general matrix multiply (HGEMM) CUDA kernels. It significantly outperforms major baselines like cuBLAS and torch.matmul, achieving up to a 28.7% speedup in server mode. This demonstrates that AI can improve even highly optimized kernels.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02551
• PDF: https://arxiv.org/pdf/2512.02551
• Github: https://github.com/deepreinforce-ai/CUDA-L2
#CUDA #ReinforcementLearning #LLM #MatrixMultiplication #AI
✨Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models
📝 Summary:
Masked Diffusion Language Models (MDLMs) show locality bias and poor context comprehension because appended mask tokens act as distractors. The paper introduces a mask-agnostic loss function that improves MDLM robustness by mitigating the masks' distracting effect.
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21338
• PDF: https://arxiv.org/pdf/2511.21338
#LanguageModels #DiffusionModels #NLP #ContextComprehension #AIResearch
🚀 Master Data Science & Programming!
Unlock your potential with this curated list of Telegram channels. Whether you need books, datasets, interview prep, or project ideas, we have the perfect resource for you. Join the community today!
🔰 Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.
https://news.1rj.ru/str/CodeProgrammer
🔖 Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.
https://news.1rj.ru/str/DataScienceM
🧠 Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, Data Structures, Algorithms, and DSA – perfect for learning, coding, and mastering key programming skills.
https://news.1rj.ru/str/DataScience4
🎯 PyData Careers | Quiz
Python Data Science jobs, interview tips, and career insights for aspiring professionals.
https://news.1rj.ru/str/DataScienceQ
💾 Kaggle Data Hub
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.
https://news.1rj.ru/str/datasets1
🧑‍🎓 Udemy Coupons | Courses
The first Telegram channel to offer free Udemy coupons.
https://news.1rj.ru/str/DataScienceC
😀 ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
https://news.1rj.ru/str/DataScienceT
💬 Data Science Chat
An active community group for discussing data challenges and networking with peers.
https://news.1rj.ru/str/DataScience9
🐍 Python Arab | بايثون عربي
The largest Arabic-speaking group for Python developers to share knowledge and help.
https://news.1rj.ru/str/PythonArab
🖊 Data Science Jupyter Notebooks
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
https://news.1rj.ru/str/DataScienceN
📺 Free Online Courses | Videos
Free online courses covering data science, machine learning, analytics, programming, and essential skills for learners.
https://news.1rj.ru/str/DataScienceV
📈 Data Analytics
Dive into the world of Data Analytics – uncover insights, explore trends, and master data-driven decision making.
https://news.1rj.ru/str/DataAnalyticsX
🎧 Learn Python Hub
Master Python with step-by-step courses – from basics to advanced projects and practical applications.
https://news.1rj.ru/str/Python53
⭐️ Research Papers
Professional Academic Writing & Simulation Services
https://news.1rj.ru/str/DataScienceY
━━━━━━━━━━━━━━━━━━
Admin: @HusseinSheikho
Mistral has released Ministral 3, a new line of reasoning and instruct models!
Ministral 3 comes in 3B, 8B, and 14B sizes, with vision support and top performance in its class.
The 14B model can be run locally on a machine with 24 GB RAM.
Guide + notebook: https://docs.unsloth.ai/new/ministral-3
GGUF builds: https://huggingface.co/collections/unsloth/ministral-3
👉 https://news.1rj.ru/str/DataScienceT
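A rough sketch of the arithmetic behind the 24 GB figure, assuming only the model weights are counted (KV cache, activations, and runtime overhead are ignored, so real usage will be higher). The helper name `weight_memory_gb` and the chosen bit widths are illustrative assumptions, not values taken from the linked guide; GGUF builds are commonly distributed in quantized form, which is why lower bit widths are included.

```python
# Back-of-the-envelope weight-memory estimate for a 14B-parameter model.
# Assumption: weights only; KV cache, activations, and runtime overhead
# are not included, so actual memory use will be higher.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_14B = 14e9  # nominal parameter count for the 14B variant

# Hypothetical precision choices, for illustration only.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_14B, bits):.0f} GB")
```

Under these assumptions the weights alone need about 28 GB at 16-bit, 14 GB at 8-bit, and 7 GB at 4-bit, so a 24 GB machine would need a quantized build to hold the 14B model.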
✨Qwen3-VL Technical Report
📝 Summary:
Qwen3-VL is a highly capable vision-language model, achieving superior performance across multimodal benchmarks. It supports 256K interleaved contexts and offers strong text understanding, robust long-context comprehension, and advanced multimodal reasoning through key architectural upgrades.
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21631
• PDF: https://arxiv.org/pdf/2511.21631
#VisionLanguageModel #MultimodalAI #AI #DeepLearning #LLM
✨ViDiC: Video Difference Captioning
📝 Summary:
The ViDiC task and ViDiC-1K dataset evaluate MLLMs' ability to describe differences between video pairs, going beyond the limits of static image captioning. The benchmark assesses motion and event evolution and finds significant performance gaps in current models for comparative video understanding.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03405
• PDF: https://arxiv.org/pdf/2512.03405
• Project Page: https://vidic-1k.github.io/
#VideoCaptioning #MLLM #VideoUnderstanding #ComputerVision #AIResearch
✨PretrainZero: Reinforcement Active Pretraining
📝 Summary:
PretrainZero is a reinforcement active learning framework that pretrains large models on unlabeled general corpora using RL. It significantly improves general reasoning abilities and benchmark performance, breaking the verification data-wall for artificial general intelligence.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03442
• PDF: https://arxiv.org/pdf/2512.03442
#ReinforcementLearning #ActiveLearning #AGI #Pretraining #MachineLearning
✨OneThinker: All-in-one Reasoning Model for Image and Video
📝 Summary:
OneThinker is an all-in-one model unifying image and video understanding across diverse tasks like QA, captioning, and tracking. It employs a new training corpus and RL method for balanced optimization, achieving strong performance and knowledge transfer across 31 benchmarks. This advances toward...
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03043
• PDF: https://arxiv.org/pdf/2512.03043
• Project Page: https://github.com/tulerfeng/OneThinker
• Github: https://github.com/tulerfeng/OneThinker
🔹 Models citing this paper:
• https://huggingface.co/OneThink/OneThinker-8B
• https://huggingface.co/OneThink/OneThinker-SFT-Qwen3-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OneThink/OneThinker-train-data
• https://huggingface.co/datasets/OneThink/OneThinker-eval
#AI #ComputerVision #MultimodalAI #DeepLearning #VideoUnderstanding
✨Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
📝 Summary:
PRIS adaptively revises prompts during text-to-visual generation inference to improve alignment with user intent. It reviews visual failures and redesigns prompts using fine-grained feedback, showing that jointly scaling prompts and visuals improves accuracy and quality.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03534
• PDF: https://arxiv.org/pdf/2512.03534
• Project Page: https://subin-kim-cv.github.io/PRIS
#PromptEngineering #TextToImage #GenerativeAI #DeepLearning #AIResearch