✨MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
📝 Summary:
MultiMed-ST, a large-scale multilingual medical speech translation dataset, is introduced. With 290,000 samples in five languages, it is the largest medical MT and multilingual ST dataset. This work also provides an extensive comparative analysis.
🔹 Publication Date: Published on Apr 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.03546
• PDF: https://arxiv.org/pdf/2504.03546
• Project Page: https://github.com/leduckhai/MultiMed-ST
• Github: https://github.com/leduckhai/MultiMed-ST
🔹 Models citing this paper:
• https://huggingface.co/leduckhai/MultiMed-ST
✨ Datasets citing this paper:
• https://huggingface.co/datasets/leduckhai/MultiMed-ST
✨ Spaces citing this paper:
• https://huggingface.co/spaces/HaoVuong/MedicalASR
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#SpeechTranslation #MedicalAI #MultilingualNLP #MachineTranslation #Dataset
✨Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
📝 Summary:
TACO enhances VLA model stability and success rates by preventing distribution shifts at inference. It uses a lightweight pseudo-count estimator to verify and select optimal action chunks at test-time. This gradient-free method significantly improves performance in downstream tasks.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02834
• PDF: https://arxiv.org/pdf/2512.02834
• Project Page: https://vla-anti-exploration.github.io/
• Github: https://github.com/breez3young/TACO
#VLAModels #AntiExploration #AIResearch #MachineLearning #RoboticsAI
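A minimal sketch of the test-time selection idea in the summary above: score several candidate action chunks with a pseudo-count and keep the most familiar one. The kernel-based estimator, the `bandwidth` parameter, and the function names are assumptions made for illustration; the summary does not describe TACO's actual estimator.

```python
import math

def pseudo_count(candidate, reference_chunks, bandwidth=0.5):
    """Kernel-smoothed count of reference chunks near `candidate` (hypothetical estimator)."""
    total = 0.0
    for ref in reference_chunks:
        sq_dist = sum((c - r) ** 2 for c, r in zip(candidate, ref))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total

def select_chunk(candidates, reference_chunks, bandwidth=0.5):
    """Return the candidate with the highest pseudo-count, i.e. the most in-distribution one."""
    return max(candidates, key=lambda c: pseudo_count(c, reference_chunks, bandwidth))

# Toy data: reference chunks cluster near the origin.
reference = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]
candidates = [[0.0, 0.0], [2.0, 2.0], [0.5, -0.5]]
best = select_chunk(candidates, reference)
print(best)
```

No gradients are involved: the policy only proposes candidates, and the selection step is a plain argmax over scores.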
✨Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
📝 Summary:
A novel alignment strategy improves Normalizing Flows by aligning their generative reverse pass with vision foundation models. This boosts generative quality, classification accuracy, and training speed, achieving new state-of-the-art results for NFs.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22345
• PDF: https://arxiv.org/pdf/2511.22345
• Github: https://github.com/MCG-NJU/FlowBack
#NormalizingFlows #GenerativeAI #DeepLearning #ComputerVision #MachineLearning
✨Deep Research: A Systematic Survey
📝 Summary:
This survey systematically reviews Deep Research systems that integrate LLMs with external tools to enhance complex problem-solving. It provides a roadmap, key components, optimization techniques, and challenges for these advanced research agents.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02038
• PDF: https://arxiv.org/pdf/2512.02038
• Project Page: https://deep-research-survey.github.io/
• Github: https://github.com/mangopy/Deep-Research-Survey
#DeepResearch #LLMs #AI #ResearchAgents #SystematicSurvey
✨Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
📝 Summary:
The Adversarial Confusion Attack systematically disrupts multimodal LLMs, causing incoherent or confidently incorrect outputs. This basic adversarial technique transfers to diverse models, including proprietary ones, potentially hindering AI Agent reliability.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20494
• PDF: https://arxiv.org/pdf/2511.20494
#AdversarialAttack #MultimodalAI #LLMs #AISecurity #AIResearch
✨Jina-VLM: Small Multilingual Vision Language Model
📝 Summary:
Jina-VLM is a 2.4B vision-language model achieving top multilingual VQA among open 2B-scale models. It couples a SigLIP2 vision encoder with a Qwen3 language backbone via an attention-pooling connector for efficient arbitrary-resolution image processing.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04032
• PDF: https://arxiv.org/pdf/2512.04032
#VLM #MultilingualAI #ComputerVision #DeepLearning #VQA
🎁❗️TODAY FREE❗️🎁
Entry to our VIP channel is completely free today. Tomorrow it will cost $500! 🔥
JOIN 👇
https://news.1rj.ru/str/+MPpZ4FO2PHQ4OTZi
✨Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem
📝 Summary:
Analysis of Hugging Face data reveals a rebalancing of the open model economy. US industry dominance has declined as Chinese influence and community developers grow, alongside shifts in model properties and declining data transparency.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03073
• PDF: https://arxiv.org/pdf/2512.03073
✨ Spaces citing this paper:
• https://huggingface.co/spaces/economies-open-ai/open-model-evolution
#OpenModels #AIEconomy #HuggingFace #AIGeopolitics #DataTransparency
🚀 Master Data Science & Programming!
Unlock your potential with this curated list of Telegram channels. Whether you need books, datasets, interview prep, or project ideas, we have the perfect resource for you. Join the community today!
🔰 Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.
https://news.1rj.ru/str/CodeProgrammer
🔖 Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.
https://news.1rj.ru/str/DataScienceM
🧠 Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, Data Structures, Algorithms, and DSA – perfect for learning, coding, and mastering key programming skills.
https://news.1rj.ru/str/DataScience4
🎯 PyData Careers | Quiz
Python Data Science jobs, interview tips, and career insights for aspiring professionals.
https://news.1rj.ru/str/DataScienceQ
💾 Kaggle Data Hub
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.
https://news.1rj.ru/str/datasets1
🧑‍🎓 Udemy Coupons | Courses
The first channel on Telegram to offer free Udemy coupons.
https://news.1rj.ru/str/DataScienceC
😀 ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
https://news.1rj.ru/str/DataScienceT
💬 Data Science Chat
An active community group for discussing data challenges and networking with peers.
https://news.1rj.ru/str/DataScience9
🐍 Python Arab| بايثون عربي
The largest Arabic-speaking group for Python developers to share knowledge and help.
https://news.1rj.ru/str/PythonArab
🖊 Data Science Jupyter Notebooks
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
https://news.1rj.ru/str/DataScienceN
📺 Free Online Courses | Videos
Free online courses covering data science, machine learning, analytics, programming, and essential skills for learners.
https://news.1rj.ru/str/DataScienceV
📈 Data Analytics
Dive into the world of Data Analytics – uncover insights, explore trends, and master data-driven decision making.
https://news.1rj.ru/str/DataAnalyticsX
🎧 Learn Python Hub
Master Python with step-by-step courses – from basics to advanced projects and practical applications.
https://news.1rj.ru/str/Python53
⭐️ Research Papers
Professional Academic Writing & Simulation Services
https://news.1rj.ru/str/DataScienceY
━━━━━━━━━━━━━━━━━━
Admin: @HusseinSheikho
✨BlurDM: A Blur Diffusion Model for Image Deblurring
📝 Summary:
BlurDM integrates blur formation into diffusion models for deblurring. It uses a dual forward process of diffusing noise and blur, then simultaneously denoises and deblurs to recover sharp images. This significantly enhances existing deblurring methods.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03979
• PDF: https://arxiv.org/pdf/2512.03979
#ImageDeblurring #DiffusionModels #ComputerVision #DeepLearning #AI
✨AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
📝 Summary:
AdaptVision is an efficient VLM that adaptively acquires visual tokens through a coarse-to-fine approach, using a bounding box tool. Trained with reinforcement learning to balance accuracy and efficiency, it achieves superior VQA performance using fewer visual tokens.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03794
• PDF: https://arxiv.org/pdf/2512.03794
• Project Page: https://adaptvision.github.io/
• Github: https://github.com/AdaptVision/AdaptVision
🔹 Models citing this paper:
• https://huggingface.co/AdaptVision/AdaptVision-7B
#VisionLanguageModels #ReinforcementLearning #ComputerVision #AIResearch #EfficientAI
✨AutoNeural: Co-Designing Vision-Language Models for NPU Inference
📝 Summary:
AutoNeural is an NPU-native VLM co-designed for efficient edge inference. It uses a MobileNetV5-style vision backbone for stable integer quantization and a hybrid SSM-Transformer language backbone. This design reduces quantization errors and latency, improving real-time performance on edge devices.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02924
• PDF: https://arxiv.org/pdf/2512.02924
🔹 Models citing this paper:
• https://huggingface.co/NexaAI/AutoNeural
#AutoNeural #VisionLanguageModels #EdgeAI #AIHardware #EfficientAI
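To illustrate why the value range of a layer matters for integer quantization, here is a generic symmetric int8 sketch. It is not AutoNeural's scheme, which the summary does not describe; the function names and toy values are assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

def max_error(values):
    """Largest absolute round-trip error over `values`."""
    codes, scale = quantize_int8(values)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))

smooth = [0.10, -0.20, 0.15, 0.05]
with_outlier = smooth + [12.0]  # one large value stretches the scale

print(max_error(smooth), max_error(with_outlier))
```

A single outlier widens the quantization step for every value in the tensor, so the small values lose precision. Architectures with well-bounded activations avoid this.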
✨PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
📝 Summary:
PosterCopilot enhances professional graphic design by training LMMs with a three-stage strategy for geometrically accurate and aesthetically superior layouts. This framework enables controllable, iterative, layer-specific editing, improving on existing automated design methods.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04082
• PDF: https://arxiv.org/pdf/2512.04082
• Project Page: https://postercopilot.github.io/
• Github: https://github.com/JiazheWei/PosterCopilot
#GraphicDesign #AI #ComputationalDesign #LayoutDesign #DesignAutomation
✨Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
📝 Summary:
Large Multimodal Models struggle with long video understanding due to context limits. The DIG framework adapts frame selection to query types, using efficient uniform sampling for global queries and specialized selection for localized ones. This approach significantly improves LMM performance on ...
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04000
• PDF: https://arxiv.org/pdf/2512.04000
• Project Page: https://github.com/Jialuo-Li/DIG
• Github: https://github.com/Jialuo-Li/DIG
#VideoUnderstanding #LMMs #MultimodalAI #DeepLearning #ComputerVision
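The query-dependent routing described in the summary above can be sketched as follows: global queries get evenly spaced frames, localized queries get the highest-scoring frames. The `query_type` labels, the per-frame relevance `scores`, and all function names are assumptions; the summary does not say how DIG classifies queries or scores frames.

```python
def uniform_sample(num_frames, budget):
    """Pick `budget` frame indices spread evenly across the video."""
    if budget >= num_frames:
        return list(range(num_frames))
    step = num_frames / budget
    return [int(step * i + step / 2) for i in range(budget)]

def select_relevant(scores, budget):
    """Pick the `budget` highest-scoring frames, returned in temporal order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)

def select_frames(query_type, scores, budget):
    """Route to uniform sampling for global queries, score-based selection otherwise."""
    if query_type == "global":
        return uniform_sample(len(scores), budget)
    return select_relevant(scores, budget)

scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.0, 0.0, 0.0]  # hypothetical query-frame relevance
print(select_frames("global", scores, 4))
print(select_frames("localized", scores, 2))
```

Uniform sampling needs no relevance model at all, which is what makes the global branch cheap.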
✨PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
📝 Summary:
Pyramid Sparse Attention (PSA) introduces multi-level pooled key-value representations to overcome information loss in traditional sparse attention. It dynamically retains critical information, improving efficiency and performance for video understanding and generation tasks.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04025
• PDF: https://arxiv.org/pdf/2512.04025
• Project Page: https://ziplab.co/PSA/
• Github: https://github.com/ziplab/Pyramid-Sparse-Attention
#SparseAttention #VideoUnderstanding #VideoGeneration #DeepLearning #ComputerVision
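A rough sketch of what "multi-level pooled key-value representations" could look like: the same key sequence is stored at several resolutions by average-pooling blocks of growing size. The pooling operator, the block sizes, and the function names are assumptions. How PSA chooses a level for each block is not stated in the summary and is not modeled here.

```python
def avg_pool(vectors, block):
    """Average consecutive groups of `block` vectors (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(vectors), block):
        group = vectors[start:start + block]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled

def build_pyramid(vectors, levels=3):
    """Level 0 keeps every vector; level k pools blocks of 2**k vectors."""
    return [avg_pool(vectors, 2 ** k) for k in range(levels)]

keys = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
pyramid = build_pyramid(keys)
for level, entries in enumerate(pyramid):
    print(level, entries)
```

Compared with dropping a block outright, attending to a coarser level keeps a summary of that block at a fraction of the cost.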
✨4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
📝 Summary:
4DLangVGGT is a new Transformer framework for 4D scene understanding. It integrates geometry and language to enable scalable, open-vocabulary semantic fields, improving generalization and efficiency over prior methods.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05060
• PDF: https://arxiv.org/pdf/2512.05060
• Github: https://hustvl.github.io/4DLangVGGT/
#4DSceneUnderstanding #Transformer #ComputerVision #DeepLearning #AI
✨SIMA 2: A Generalist Embodied Agent for Virtual Worlds
📝 Summary:
SIMA 2 is a Gemini-based embodied agent for 3D virtual worlds. It reasons about goals, handles complex instructions, and autonomously learns new skills. This closes the gap with human performance and validates continuous learning agents.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04797
• PDF: https://arxiv.org/pdf/2512.04797
#EmbodiedAI #AI #VirtualWorlds #ReinforcementLearning #AIagents
✨Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
📝 Summary:
Reward Forcing improves streaming video generation by using EMA-Sink to update context tokens, preventing static initial frames. It also introduces Rewarded Distribution Matching Distillation to prioritize dynamic content, enhancing motion quality and achieving state-of-the-art performance.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04678
• PDF: https://arxiv.org/pdf/2512.04678
• Project Page: https://reward-forcing.github.io/
• Github: https://reward-forcing.github.io/
🔹 Models citing this paper:
• https://huggingface.co/JaydenLu666/Reward-Forcing-T2V-1.3B
#VideoGeneration #GenerativeAI #DeepLearning #ComputerVision #AIResearch
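The summary says EMA-Sink updates context tokens so the model does not stay anchored to static initial frames. A minimal sketch of an exponential moving average over a sink token is shown here. The `decay` value, the token layout, and the function name are assumptions, not details from the paper.

```python
def ema_update(sink, new_token, decay=0.9):
    """Blend the running sink token with a newly generated token."""
    return [decay * s + (1.0 - decay) * n for s, n in zip(sink, new_token)]

# The sink starts as the first-frame token and drifts toward recent content.
sink = [1.0, 0.0]
for token in ([0.0, 1.0], [0.0, 1.0]):
    sink = ema_update(sink, token, decay=0.5)
print(sink)
```

A fixed sink would keep the first frame's content forever. The moving average lets the context follow the stream while staying the same size.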
✨SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization
📝 Summary:
SeeNav-Agent improves Vision-Language Navigation with dual-view visual prompts, reducing perception errors and enhancing spatial understanding. It also uses SRGPO, a step-level reinforcement fine-tuning method, to boost planning and achieve higher success rates for VLN agents.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02631
• PDF: https://arxiv.org/pdf/2512.02631
• Github: https://github.com/WzcTHU/SeeNav-Agent
#VisionLanguageNavigation #AI #ReinforcementLearning #ComputerVision #DeepLearning
✨Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting
📝 Summary:
Splannequin improves frozen 3D scenes from monocular videos by fixing artifacts in dynamic Gaussian splatting. It uses temporal anchoring for hidden or defective Gaussians to resolve ghosting and blur from sparse supervision. This boosts visual quality for high-fidelity, user-selectable frozen-ti...
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05113
• PDF: https://arxiv.org/pdf/2512.05113
• Project Page: https://chien90190.github.io/splannequin/
#ComputerVision #3DReconstruction #GaussianSplatting #NeuralRendering #DeepLearning
✨Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
📝 Summary:
Training autonomous LLM agents requires scalable, high-quality interactive environments. The Nex ecosystem provides NexAU for complexity, NexA4A for diversity, and NexGAP for fidelity in environment construction. Nex-N1, trained using this infrastructure, outperforms SOTA models on agentic tasks.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04987
• PDF: https://arxiv.org/pdf/2512.04987
• Github: https://github.com/nex-agi/Nex-N1
#LLMAgents #LargeLanguageModels #AI #AISimulation #AIResearch