✨Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions with varying parameter counts and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generat...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning
✨Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset
📝 Summary:
Researchers introduce Instruction-Guided Lesion Segmentation (ILS) for chest X-rays (CXRs), enabling segmentation of diverse lesions from simple text instructions. They developed MIMIC-ILS, a large-scale automatically generated dataset, and ROSALIA, a vision-language model. ROSALIA accurately segments various lesions and provides textual explanations.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15186
• PDF: https://arxiv.org/pdf/2511.15186
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MedicalAI #LesionSegmentation #ChestXray #VisionLanguageModel #DeepLearning
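💡 A minimal sketch of what an instruction-guided segmentation interface could look like; the ILSOutput fields and the call signature are assumptions for illustration, not ROSALIA's actual API:
```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ILSOutput:
    mask: np.ndarray   # binary lesion mask, same spatial size as the CXR
    rationale: str     # textual explanation of the finding

def segment_with_instruction(model, cxr: np.ndarray, instruction: str) -> ILSOutput:
    """Instruction-guided inference: one model, many lesion types,
    selected purely by the text instruction (hypothetical interface)."""
    return model(cxr, instruction)

# Stub model so the sketch runs end to end.
dummy = lambda img, text: ILSOutput(np.zeros(img.shape[:2], bool), f"no {text} found")
out = segment_with_instruction(dummy, np.zeros((512, 512)), "pleural effusion")
print(out.rationale)
```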
✨VisPlay: Self-Evolving Vision-Language Models from Images
📝 Summary:
VisPlay is a self-evolving RL framework that improves Vision-Language Models using unlabeled images. It employs interacting Questioner and Reasoner roles, trained with GRPO (Group Relative Policy Optimization), to enhance reasoning and generalization and to reduce hallucination. This scalable method achieves consistent improvements.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15661
• PDF: https://arxiv.org/pdf/2511.15661
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VisionLanguageModels #ReinforcementLearning #ArtificialIntelligence #MachineLearning #SelfEvolvingAI
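💡 Since VisPlay trains its Questioner and Reasoner with GRPO, here is a minimal sketch of the group-relative advantage at GRPO's core; the toy rewards stand in for whatever scoring the rollouts actually receive:
```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO normalizes each rollout's reward against its sampling
    group, removing the need for a learned value critic."""
    # rewards: (num_prompts, group_size), one group of rollouts per prompt
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Toy example: 2 Questioner-generated prompts, 4 Reasoner rollouts each.
rewards = torch.tensor([[0.1, 0.7, 0.3, 0.9],
                        [0.0, 0.2, 0.8, 0.4]])
print(grpo_advantages(rewards))
```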
✨ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
📝 Summary:
ARC-Chapter is a large-scale video chaptering model trained on millions of long video chapters, using a new bilingual and hierarchical dataset. It introduces a novel evaluation metric, GRACE, to better reflect real-world chaptering. The model achieves state-of-the-art performance and demonstrates...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14349
• PDF: https://arxiv.org/pdf/2511.14349
• Project Page: https://arcchapter.github.io/index_en.html
• Github: https://github.com/TencentARC/ARC-Chapter
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoChaptering #AI #MachineLearning #VideoSummarization #ComputerVision
✨Aligning Generative Music AI with Human Preferences: Methods and Challenges
📝 Summary:
This paper proposes applying preference-alignment techniques to music AI so outputs better match human preferences. It discusses methods such as MusicRL and DiffRhythm+ that address challenges unique to music, like temporal coherence and harmonic consistency, aiming for improved interactive composition and personali...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15038
• PDF: https://arxiv.org/pdf/2511.15038
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#GenerativeAI #MusicAI #PreferenceAlignment #AIResearch #ComputationalMusic
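💡 As a taste of the preference-alignment family this paper surveys, here is a generic DPO-style loss on chosen/rejected pairs; this is illustrative, not the actual objective of MusicRL or DiffRhythm+:
```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization: push the policy to prefer the
    chosen sample over the rejected one, relative to a frozen reference."""
    logits = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

# Toy log-probabilities for one preference pair (e.g., two music clips).
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
print(loss)
```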
✨Medal S: Spatio-Textual Prompt Model for Medical Segmentation
📝 Summary:
Medal S is a medical segmentation foundation model using spatio-textual prompts for efficient, high-accuracy multi-class segmentation across diverse modalities. It uniquely aligns volumetric prompts with text embeddings and processes masks in parallel, significantly outperforming prior methods.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13001
• PDF: https://arxiv.org/pdf/2511.13001
• Github: https://github.com/yinghemedical/Medal-S
🔹 Models citing this paper:
• https://huggingface.co/spc819/Medal-S-V1.0
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MedicalSegmentation #FoundationModels #AI #DeepLearning #ComputerVision
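💡 A rough sketch of how a volumetric spatial prompt might be fused with a text embedding to score voxels; the shapes and the additive-prior fusion rule are assumptions, not the released architecture:
```python
import torch

def spatio_textual_logits(voxel_feats, text_emb, spatial_prompt):
    """Score each voxel against a class text embedding, biased by a
    volumetric spatial prompt (e.g., a coarse region-of-interest mask)."""
    # voxel_feats: (D, H, W, C), text_emb: (C,), spatial_prompt: (D, H, W)
    sim = torch.einsum('dhwc,c->dhw', voxel_feats, text_emb)
    return sim + spatial_prompt  # prompt acts as an additive spatial prior

D, H, W, C = 4, 8, 8, 16
logits = spatio_textual_logits(torch.randn(D, H, W, C),
                               torch.randn(C),
                               torch.zeros(D, H, W))
mask = logits.sigmoid() > 0.5  # one class; parallel classes would batch this
print(mask.shape)
```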
✨OmniParser for Pure Vision Based GUI Agent
📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.
🔹 Publication Date: Published on Aug 1, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser
🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser
✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k
✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning
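💡 The core idea is a two-stage parser: detect interactable regions, caption their function, then hand the agent a structured element list instead of raw pixels. A hypothetical sketch (the detector and captioner outputs are stubbed):
```python
from dataclasses import dataclass

@dataclass
class UIElement:
    box: tuple          # (x1, y1, x2, y2) in pixels
    interactable: bool  # from an icon-detection model
    caption: str        # from a functional-semantics captioner

def parse_screen(detections) -> str:
    """Serialize detected UI elements into structured text a
    multimodal agent such as GPT-4V can reason over."""
    lines = []
    for i, el in enumerate(detections):
        kind = "button" if el.interactable else "static"
        lines.append(f"[{i}] {kind} at {el.box}: {el.caption}")
    return "\n".join(lines)

demo = [UIElement((10, 10, 90, 40), True, "search field"),
        UIElement((10, 60, 200, 80), False, "results header")]
print(parse_screen(demo))  # this string goes into the agent prompt
```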
✨Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
📝 Summary:
MoS is a novel multimodal diffusion model that uses a learnable token-wise router for flexible state-based modality interactions. This achieves state-of-the-art text-to-image generation and editing with minimal parameters and computational overhead.
🔹 Publication Date: Published on Nov 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12207
• PDF: https://arxiv.org/pdf/2511.12207
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#GenerativeAI #MultimodalAI #DiffusionModels #TextToImage #DeepLearning
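💡 A minimal sketch of a learnable token-wise router mixing candidate states per token; the dimensions and softmax-mixing rule are illustrative assumptions rather than MoS's exact design:
```python
import torch
import torch.nn as nn

class TokenStateRouter(nn.Module):
    """Per-token router: mixes a small set of candidate hidden states
    (e.g., text vs. image pathway states) with learned weights."""
    def __init__(self, dim, num_states):
        super().__init__()
        self.gate = nn.Linear(dim, num_states)

    def forward(self, tokens, states):
        # tokens: (B, T, C); states: (B, T, S, C) candidate states per token
        weights = self.gate(tokens).softmax(dim=-1)           # (B, T, S)
        return torch.einsum('bts,btsc->btc', weights, states)

router = TokenStateRouter(dim=32, num_states=3)
out = router(torch.randn(2, 5, 32), torch.randn(2, 5, 3, 32))
print(out.shape)  # (2, 5, 32)
```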
✨What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
📝 Summary:
Ideation diversity significantly enhances AI research agent performance. Higher ideation diversity leads to stronger results on the MLE-bench benchmark across different models and scaffolds. This finding holds across various performance metrics.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15593
• PDF: https://arxiv.org/pdf/2511.15593
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AIResearch #IdeationDiversity #MachineLearning #AIagents #AIPerformance
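💡 One common way to quantify ideation diversity is the mean pairwise distance between idea embeddings; this metric is an assumption for illustration, not necessarily the one used in the paper:
```python
import numpy as np

def ideation_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance between idea embeddings;
    higher means the agent's proposed ideas are more spread out."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarity
    return float(1.0 - off_diag.mean())

ideas = np.random.randn(6, 128)  # e.g., sentence embeddings of 6 ideas
print(ideation_diversity(ideas))
```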
✨V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
📝 Summary:
V-ReasonBench is a new benchmark for evaluating generative video models' reasoning across structured problem-solving, spatial cognition, pattern inference, and physical dynamics. It uses diverse tasks to reveal dimension-wise differences between models, aiming to support the development of human-aligned reasoning.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16668
• PDF: https://arxiv.org/pdf/2511.16668
• Project Page: https://oahzxl.github.io/VReasonBench/
• Github: https://github.com/yangluo7/V-ReasonBench
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #AIReasoning #GenerativeAI #Benchmarking #MachineLearning
✨Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
📝 Summary:
VANS is a new model for Video-Next-Event Prediction (VNEP) that generates dynamic, visually and semantically accurate video responses. It uses reinforcement learning (Joint-GRPO) to align a Vision-Language Model with a Video Diffusion Model, achieving state-of-the-art performance.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16669
• PDF: https://arxiv.org/pdf/2511.16669
• Project Page: https://video-as-answer.github.io/
• Github: https://github.com/KlingTeam/VANS
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoAI #GenerativeAI #MachineLearning #ComputerVision #DeepLearning
✨Scaling Spatial Intelligence with Multimodal Foundation Models
📝 Summary:
SenseNova-SI is a newly scaled multimodal foundation model that achieves superior spatial intelligence. Trained on 8 million diverse data samples, it delivers unprecedented performance on a range of spatial benchmarks. The models are publicly released to foster further research.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13719
• PDF: https://arxiv.org/pdf/2511.13719
• Project Page: https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-8B
• Github: https://github.com/OpenSenseNova/SenseNova-SI
🔹 Models citing this paper:
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-8B
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-2B
• https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-2B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#MultimodalAI #FoundationModels #SpatialIntelligence #ComputerVision #AI
✨Step-Audio-R1 Technical Report
📝 Summary:
Step-Audio-R1 is the first audio reasoning model. It uses Modality-Grounded Reasoning Distillation to achieve strong audio reasoning, outperforming previous models. This demonstrates that reasoning capabilities are transferable across different modalities.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15848
• PDF: https://arxiv.org/pdf/2511.15848
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#AudioReasoning #MultimodalAI #AIResearch #MachineLearning #AudioAI
✨First Frame Is the Place to Go for Video Content Customization
📝 Summary:
The first frame in video generation models functions as a conceptual memory buffer, storing visual elements for later reuse. This enables robust video content customization with minimal training examples, without major model changes.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15700
• PDF: https://arxiv.org/pdf/2511.15700
• Project Page: https://firstframego.github.io
• Github: https://firstframego.github.io
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #DeepLearning #AICustomization
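💡 A toy illustration of the "first frame as memory buffer" idea: pack reference crops into frame 0 so later frames can reuse their visual elements. The packing scheme here is hypothetical:
```python
import numpy as np

def first_frame_memory(references, frame_hw=(256, 256)):
    """Tile reference crops left-to-right into the first frame so the
    video model can copy their visual elements in later frames."""
    H, W = frame_hw
    frame = np.zeros((H, W, 3), dtype=np.uint8)
    x = 0
    for ref in references:  # each ref: (h, w, 3) with h <= H
        h, w = ref.shape[:2]
        if x + w > W:
            break  # out of space in the buffer
        frame[:h, x:x + w] = ref
        x += w
    return frame

refs = [np.full((64, 64, 3), c, np.uint8) for c in (60, 120, 180)]
print(first_frame_memory(refs).shape)  # use as frame 0, then generate
```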
✨MiMo-Embodied: X-Embodied Foundation Model Technical Report
📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model. It achieves state-of-the-art performance in both autonomous driving and embodied AI, demonstrating positive transfer through multi-stage learning and fine-tuning.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• Github: https://github.com/XiaomiMiMo/MiMo-Embodied
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
✨SAM 3D: 3Dfy Anything in Images
📝 Summary:
SAM 3D reconstructs 3D objects from single images, predicting geometry, texture, and layout. It uses a multi-stage training framework with synthetic pretraining and real-world alignment, breaking the 3D data barrier and achieving high human preference.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16624
• PDF: https://arxiv.org/pdf/2511.16624
• Project Page: https://ai.meta.com/sam3d/
• Github: https://github.com/facebookresearch/sam-3d-objects
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#3DReconstruction #ComputerVision #AI #DeepLearning #SingleImage3D
✨Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
📝 Summary:
Thinking-while-Generating (TwiG) interleaves textual reasoning throughout the visual generation process. This on-the-fly multimodal interaction guides and reflects on visual content as it is created, yielding more context-aware and semantically rich outputs.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16671
• PDF: https://arxiv.org/pdf/2511.16671
• Project Page: https://think-while-gen.github.io/
• Github: https://github.com/ZiyuGuo99/Thinking-while-Generating
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#GenerativeAI #MultimodalAI #ComputerVision #NLP #AIResearch
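💡 A schematic of the interleaving loop, with stubbed reason/generate calls; the control flow is the point, not the exact interfaces:
```python
def thinking_while_generating(prompt, steps, reason, generate):
    """Alternate textual reasoning and partial visual generation:
    each thought conditions the next generation step, and each
    partial canvas is reflected on before continuing."""
    canvas, thoughts = None, []
    for _ in range(steps):
        thought = reason(prompt, canvas, thoughts)   # critique/plan so far
        canvas = generate(prompt, canvas, thought)   # extend with the plan
        thoughts.append(thought)
    return canvas, thoughts

# Stub calls so the sketch runs end to end.
canvas, thoughts = thinking_while_generating(
    "a red cube on a table", steps=3,
    reason=lambda p, c, t: f"plan step {len(t)}",
    generate=lambda p, c, th: (c or []) + [th])
print(thoughts)
```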
✨Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
📝 Summary:
Nemotron Elastic embeds multiple submodels within a single large language model, reducing training costs by 360x compared to training separate models. The framework allows zero-shot extraction of optimized submodels for various deployment budgets without additional training or fine-tuning.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16664
• PDF: https://arxiv.org/pdf/2511.16664
• Project Page: https://huggingface.co/nvidia/Nemotron-Elastic-12B
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#LLM #AI #MachineLearning #DeepLearning #EfficientAI
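💡 A toy, shape-level illustration of zero-shot submodel extraction by slicing shared weights down to a smaller width; elastic training actually learns which channels and layers to keep, so treat this as a sketch only:
```python
import torch
import torch.nn as nn

def extract_submodel(linear: nn.Linear, keep_out: int, keep_in: int) -> nn.Linear:
    """Slice a trained layer down to a nested, smaller layer without
    retraining -- the essence of many-in-one elastic weight sharing."""
    sub = nn.Linear(keep_in, keep_out)
    with torch.no_grad():
        sub.weight.copy_(linear.weight[:keep_out, :keep_in])
        sub.bias.copy_(linear.bias[:keep_out])
    return sub

full = nn.Linear(1024, 1024)              # stands in for a parent-model layer
small = extract_submodel(full, 512, 512)  # nested budget-friendly variant
print(small(torch.randn(2, 512)).shape)
```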
✨TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
📝 Summary:
TimeViper is a hybrid Mamba-Transformer vision-language model for efficient long video understanding. It introduces a TransV module to compress redundant vision tokens into instruction tokens, enabling it to process over 10,000 frames. This achieves state-of-the-art performance while offering new...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16595
• PDF: https://arxiv.org/pdf/2511.16595
• Project Page: https://xuboshen.github.io/TimeViper/
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#TimeViper #VisionLanguageModels #VideoUnderstanding #MambaTransformer #DeepLearning
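💡 A rough sketch of compressing a long vision-token sequence into a few instruction tokens via cross-attention; the layer sizes and single-block design are assumptions, not the actual TransV module:
```python
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    """Instruction tokens attend over many vision tokens, absorbing
    their information so the long vision sequence can be dropped."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, instr_tokens, vision_tokens):
        # instr_tokens: (B, Ti, C); vision_tokens: (B, Tv, C), Tv >> Ti
        fused, _ = self.attn(instr_tokens, vision_tokens, vision_tokens)
        return instr_tokens + fused  # keep only the compact sequence

comp = TokenCompressor(dim=64)
out = comp(torch.randn(1, 8, 64), torch.randn(1, 10_000, 64))
print(out.shape)  # (1, 8, 64): 10k frame tokens folded into 8
```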