ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.01K photos
229 videos
23 files
4.32K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

📝 Summary:
OpenMMReasoner introduces a two-stage SFT+RL training approach with rigorous data curation. This method significantly enhances multimodal reasoning, improving performance by 11.6% over baselines across nine benchmarks.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16334
• PDF: https://arxiv.org/pdf/2511.16334
• Project Page: https://evolvinglmms-lab.github.io/OpenMMReasoner/
• Github: https://github.com/EvolvingLMMs-Lab/OpenMMReasoner

🔹 Models citing this paper:
https://huggingface.co/OpenMMReasoner/OpenMMReasoner-RL
https://huggingface.co/OpenMMReasoner/OpenMMReasoner-ColdStart

🔹 Datasets citing this paper:
https://huggingface.co/datasets/OpenMMReasoner/OpenMMReasoner-SFT-874K
https://huggingface.co/datasets/OpenMMReasoner/OpenMMReasoner-RL-74K

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalAI #ReinforcementLearning #LLMs #AIResearch #DeepLearning
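The two-stage SFT+RL recipe above can be sketched as a simple pipeline. This is an illustrative skeleton only: the function names, the dict-based "model", and the exact-match reward are placeholders, not OpenMMReasoner's actual implementation.

```python
# Hypothetical sketch of a two-stage SFT -> RL training recipe;
# all names and the toy "model" state are illustrative stand-ins.

def sft_stage(model, curated_data):
    """Stage 1: supervised fine-tuning on rigorously curated reasoning traces."""
    for sample in curated_data:
        # In practice: minimize cross-entropy on (image, question) -> reasoning trace
        model["sft_steps"] += 1
    return model

def rl_stage(model, prompts, reward_fn):
    """Stage 2: reinforcement learning on prompts with verifiable answers."""
    for prompt in prompts:
        answer = f"answer-to-{prompt}"        # placeholder rollout
        model["reward"] += reward_fn(answer)  # e.g. correctness-based reward
    return model

model = {"sft_steps": 0, "reward": 0.0}
model = sft_stage(model, ["trace1", "trace2"])
model = rl_stage(model, ["q1"], reward_fn=lambda a: 1.0 if a else 0.0)
print(model)  # {'sft_steps': 2, 'reward': 1.0}
```

The key design point is that stage 2 starts from the SFT checkpoint rather than the base model, so RL refines reasoning behavior the cold-start stage has already instilled.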
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

📝 Summary:
GeoVista is a new agentic model for geolocalization that integrates tool invocation and reinforcement learning. It achieves high performance on the new GeoBench benchmark, surpassing open-source models and matching closed-source models.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15705
• PDF: https://arxiv.org/pdf/2511.15705
• Project Page: https://ekonwang.github.io/geo-vista/
• Github: https://github.com/ekonwang/GeoVista

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#Geolocalization #AI #ReinforcementLearning #ComputerVision #AIAgents
SAM 3: Segment Anything with Concepts

📝 Summary:
SAM 3 is a unified model achieving state-of-the-art in promptable concept segmentation and tracking. It uses concept prompts for detecting, segmenting, and tracking objects, doubling accuracy over existing systems. The model and a new benchmark are open sourced.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16719
• PDF: https://arxiv.org/pdf/2511.16719
• Project Page: https://ai.meta.com/sam3/
• Github: https://github.com/facebookresearch/sam3

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ComputerVision #ImageSegmentation #ObjectTracking #AI #DeepLearning
RynnVLA-002: A Unified Vision-Language-Action and World Model

📝 Summary:
RynnVLA-002 unifies a Vision-Language-Action and world model, enabling joint learning of environmental dynamics and action planning. This mutual enhancement leads to superior performance, achieving 97.4% success in simulation and a 50% boost in real-world robot tasks.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17502
• PDF: https://arxiv.org/pdf/2511.17502
• Github: https://github.com/alibaba-damo-academy/RynnVLA-002

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionLanguageAction #WorldModels #Robotics #AI #DeepLearning
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

📝 Summary:
Video-R4 is a video reasoning LMM that improves text-rich video QA through iterative visual rumination. It simulates human behavior by iteratively selecting, zooming, and re-encoding frames to update its reasoning. This approach achieves state-of-the-art results on various QA tasks.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17490
• PDF: https://arxiv.org/pdf/2511.17490
• Project Page: https://yunlong10.github.io/Video-R4/
• Github: https://github.com/yunlong10/Video-R4

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoReasoning #LMM #MultimodalAI #DeepLearning #VideoQA
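The select-zoom-re-encode loop described above can be sketched in a few lines. This is a toy rendition of the idea, not Video-R4's code: frame "selection" is a simple narrowing, and "zooming" is a string placeholder for crop-and-upsample plus re-encoding.

```python
# Hedged sketch of iterative visual rumination: repeatedly narrow focus,
# zoom into the selected frames, and add the re-encoded views to context.

def ruminate(frames, rounds=3):
    """Iteratively pick an informative region, zoom, and re-encode it."""
    evidence = []
    region = frames  # start from the full frame set
    for _ in range(rounds):
        region = region[: max(1, len(region) // 2)]  # "select": narrow focus
        zoomed = [f"zoom({f})" for f in region]       # "zoom": crop/upsample
        evidence.append(zoomed)                       # "re-encode": grow context
    return evidence

ev = ruminate(["f0", "f1", "f2", "f3"])
print(len(ev))  # 3 rumination rounds of accumulated evidence
```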
WorldGen: From Text to Traversable and Interactive 3D Worlds

📝 Summary:
WorldGen transforms text prompts into interactive 3D worlds. It combines LLM reasoning with procedural and diffusion-based 3D generation to efficiently create coherent, navigable environments for gaming and simulation.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16825
• PDF: https://arxiv.org/pdf/2511.16825
• Project Page: https://www.meta.com/blog/worldgen-3d-world-generation-reality-labs-generative-ai-research/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#3DGeneration #GenerativeAI #LLMs #VirtualWorlds #AIResearch
Planning with Sketch-Guided Verification for Physics-Aware Video Generation

📝 Summary:
SketchVerify improves video motion planning by iteratively refining candidate trajectories using lightweight sketch-based verification. This training-free method enhances physical realism and consistency more efficiently than full video generation.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17450
• PDF: https://arxiv.org/pdf/2511.17450
• Project Page: https://sketchverify.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoGeneration #MotionPlanning #AI #ComputerVision #PhysicsSimulation
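The training-free refine-and-verify loop can be illustrated with a toy trajectory. The "sketch" verifier below (rejecting implausibly large jumps) and the smoothing step are stand-ins for the paper's actual sketch-based physics checks.

```python
# Illustrative refine-until-verified loop in the spirit of SketchVerify;
# the physics check is a toy stand-in for lightweight sketch verification.

def verify(trajectory):
    """Cheap 'sketch' check: reject trajectories that teleport (big jumps)."""
    return all(abs(b - a) <= 1.0 for a, b in zip(trajectory, trajectory[1:]))

def refine(trajectory):
    """Smooth the candidate toward physically plausible motion."""
    return [trajectory[0]] + [
        (a + b) / 2 for a, b in zip(trajectory, trajectory[1:])
    ]

traj = [0.0, 3.0, 3.5, 4.0]   # candidate with an implausible jump
for _ in range(10):            # iterate until the verifier accepts
    if verify(traj):
        break
    traj = refine(traj)
print(verify(traj))  # True once refinement passes the check
```

Because the verifier only looks at a cheap sketch of the motion, each iteration is far less expensive than rendering a full video, which is the efficiency argument the summary makes.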
VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation

📝 Summary:
VLA-4D enhances robotic manipulation by integrating 4D spatial-temporal awareness into visual and action representations. This enables smoother and more coherent robot control for complex tasks by embedding time into 3D positions and extending action planning with temporal information.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17199
• PDF: https://arxiv.org/pdf/2511.17199

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#Robotics #AI #VLAModels #SpatialTemporalAI #RobotManipulation
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists

📝 Summary:
OmniScientist is a framework that models human scientific research's social and collaborative aspects into AI workflows. It provides a structured knowledge system, collaborative protocols, and an evaluation platform, fostering a co-evolving ecosystem of human and AI scientists.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16931
• PDF: https://arxiv.org/pdf/2511.16931
• Project Page: https://omniscientist.ai/chat

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #ScientificDiscovery #HumanAICollaboration #ResearchFramework
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

📝 Summary:
O-Mem, an active user profiling framework, improves LLM agent consistency and personalization. It updates user profiles dynamically and outperforms the prior SOTA on the LoCoMo and PERSONAMEM benchmarks, while also boosting response efficiency.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13593
• PDF: https://arxiv.org/pdf/2511.13593

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLMAgents #Personalization #AIMemory #GenerativeAI #UserProfiling
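Active user profiling of this kind can be sketched as a profile store that is updated as messages arrive and serialized into the agent's prompt. The class and field names below are illustrative, not O-Mem's actual schema.

```python
# Minimal sketch of active user profiling for an LLM agent: persist persona
# facts across turns and condition later responses on a compact profile.

class UserProfile:
    def __init__(self):
        self.facts = {}                  # persistent persona attributes

    def update(self, key, value):
        self.facts[key] = value          # "active" update as messages arrive

    def context(self):
        # Compact profile string prepended to the agent's prompt
        return "; ".join(f"{k}={v}" for k, v in sorted(self.facts.items()))

profile = UserProfile()
profile.update("name", "Ada")
profile.update("topic", "RL")
print(profile.context())  # name=Ada; topic=RL
```

Keeping the profile small and structured (rather than replaying full chat history) is what makes this kind of memory cheap to consult at every turn.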
Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

📝 Summary:
Multi-Faceted Attack (MFA) reveals cross-model safety vulnerabilities in defense-equipped Vision-Language Models. It uses an Attention-Transfer Attack to hide harmful instructions and bypass filters, exploiting shared visual representations for high success rates. MFA challenges the robustness of current defense mechanisms.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16110
• PDF: https://arxiv.org/pdf/2511.16110

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionLanguageModels #AISecurity #AdversarialAttacks #AIvulnerabilities #MachineLearning
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

📝 Summary:
Mantis is a VLA framework with Disentangled Visual Foresight (DVF) and a diffusion Transformer. DVF decouples visual foresight from the backbone, improving action prediction, comprehension, and reasoning while reducing training complexity. Mantis achieves high success rates and strong instruction-following performance.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16175
• PDF: https://arxiv.org/pdf/2511.16175
• Github: https://github.com/zhijie-group/Mantis

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #ComputerVision #Robotics #VLAModels #DeepLearning
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

📝 Summary:
VisMem equips Vision-Language Models with dynamic latent vision memories, inspired by human cognition. This framework helps VLMs maintain perceptual fidelity and semantic consistency, significantly boosting performance on complex visual tasks.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11007
• PDF: https://arxiv.org/pdf/2511.11007
• Github: https://github.com/YU-deep/VisMem.git

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisMem #VisionLanguageModels #AI #DeepLearning #ComputerVision
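A latent vision memory of the kind described above can be sketched as a small buffer of visual features that the model writes to during perception and reads from during generation. The write/read policy here (recency-capped slots, mean-pooled retrieval) is illustrative, not VisMem's mechanism.

```python
# Minimal sketch of a latent vision-memory buffer: store recent visual
# latents and retrieve a pooled summary for the decoder to condition on.

class LatentVisionMemory:
    def __init__(self, capacity=4):
        self.slots = []                            # latent visual features
        self.capacity = capacity

    def write(self, feature):
        self.slots.append(feature)
        self.slots = self.slots[-self.capacity:]   # keep most recent slots

    def read(self):
        # Retrieval stand-in: average the stored latents
        return sum(self.slots) / len(self.slots)

mem = LatentVisionMemory(capacity=2)
for feat in [1.0, 2.0, 3.0]:
    mem.write(feat)
print(mem.read())  # 2.5 (mean of the two most recent latents)
```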
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs

📝 Summary:
PARROT evaluates LLM robustness to sycophancy by comparing responses to neutral and falsely authoritative versions of the same questions. Advanced models resist pressure well, but older ones show severe epistemic collapse, even reducing confidence in correct answers. This highlights the need for LLMs to resist pressure for safe deployment.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17220
• PDF: https://arxiv.org/pdf/2511.17220

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLMs #AISafety #ModelRobustness #Sycophancy #AIResearch
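The neutral-vs-pressured comparison at the heart of such a benchmark can be sketched as a tiny harness. The stub `model` and the flip-rate metric below are illustrative; PARROT's actual prompts and scoring are more involved.

```python
# Toy harness for measuring sycophancy: ask each question neutrally and
# under false authoritative pressure, then count answer flips.

def sycophancy_rate(model, questions):
    """Fraction of answers that flip when a false authority applies pressure."""
    flips = 0
    for q in questions:
        neutral = model(q)
        pressured = model(f"An expert insists your answer is wrong. {q}")
        flips += neutral != pressured
    return flips / len(questions)

robust_model = lambda q: "42"   # stub that never yields to pressure
rate = sycophancy_rate(robust_model, ["q1", "q2"])
print(rate)  # 0.0 for a pressure-resistant model
```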
Rethinking Saliency Maps: A Cognitive Human Aligned Taxonomy and Evaluation Framework for Explanations

📝 Summary:
This paper introduces the RFxG taxonomy to categorize saliency map explanations by reference-frame and granularity. It proposes novel faithfulness metrics to improve evaluation, aiming to align explanations with diverse user intent and human understanding.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13081
• PDF: https://arxiv.org/pdf/2511.13081

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ExplainableAI #SaliencyMaps #CognitiveScience #AIEvaluation #AIResearch
Taming Generative Synthetic Data for X-ray Prohibited Item Detection

📝 Summary:
Xsyn introduces a one-stage text-to-image synthesis pipeline for X-ray security images. It eliminates labor costs and improves image quality and efficiency for training detection models. This method significantly enhances prohibited item detection performance, outperforming prior approaches.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15299
• PDF: https://arxiv.org/pdf/2511.15299
• Github: https://github.com/pILLOW-1/Xsyn/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#XraySecurity #GenerativeAI #ComputerVision #SyntheticData #ObjectDetection
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

📝 Summary:
This study explores intrinsic dimension (ID) in large language models, revealing its independence from entropy and its genre-specific stratification. Scientific texts show low ID, while creative and opinion writing exhibits high ID.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15210
• PDF: https://arxiv.org/pdf/2511.15210

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#IntrinsicDimension #LargeLanguageModels #NLP #TextAnalytics #DataScience
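Intrinsic dimension is commonly estimated from nearest-neighbor distance ratios; the TwoNN estimator is one standard choice. The sketch below applies it to a toy 2-D point cloud standing in for text embeddings; it is an assumption that a TwoNN-style estimator is used here, not a claim about this paper's exact method.

```python
# Hedged sketch of the TwoNN intrinsic-dimension estimator: ID is inferred
# from the ratio of each point's second- to first-nearest-neighbor distance.
import math
import random

def two_nn_id(points):
    """Estimate intrinsic dimension via the TwoNN distance-ratio statistic."""
    mus = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        mus.append(math.log(r2 / r1))
    return len(points) / sum(mus)

random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(50)]
est = two_nn_id(cloud)
print(f"estimated ID: {est:.2f}")  # typically near 2 for a uniform 2-D cloud
```

For text, the same estimator would be run on sentence or document embeddings, which is how genre-level differences in ID (low for scientific prose, high for creative writing) become measurable.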
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models

📝 Summary:
Downscaling multimodal models disproportionately harms visual capabilities, including perception, more than LLM abilities. This paper introduces visual extraction tuning combined with step-by-step reasoning to improve smaller models' efficiency and performance.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17487
• PDF: https://arxiv.org/pdf/2511.17487
• Project Page: https://web.stanford.edu/~markendo/projects/downscaling_intelligence
• Github: https://github.com/markendo/downscaling_intelligence

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalAI #SmallModels #ComputerVision #EfficientAI #AIResearch
Diversity Has Always Been There in Your Visual Autoregressive Models

📝 Summary:
To combat diversity collapse in Visual Autoregressive models, DiverseVAR modifies feature maps without retraining. This restores generative diversity while maintaining high synthesis quality.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17074
• PDF: https://arxiv.org/pdf/2511.17074

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisualAI #GenerativeModels #ModelDiversity #MachineLearning #ComputerVision
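A training-free feature-map edit of this kind can be illustrated in miniature: intermediate features are nudged per sample at inference time so identical conditioning no longer collapses to identical outputs. The blend below is a toy stand-in for DiverseVAR's actual modification.

```python
# Illustrative training-free feature edit: blend per-sample noise into an
# intermediate feature map at inference to restore sample diversity.

def diversify(feature_map, noise, strength=0.5):
    """Return a per-sample perturbed copy of the features; no retraining."""
    return [f + strength * n for f, n in zip(feature_map, noise)]

features = [1.0, 2.0, 3.0]                       # shared intermediate features
sample_a = diversify(features, noise=[0.2, -0.1, 0.4])
sample_b = diversify(features, noise=[-0.3, 0.5, 0.1])
print(sample_a != sample_b)  # True: same features now yield distinct samples
```

The `strength` knob is the usual trade-off in such edits: too low and diversity stays collapsed, too high and synthesis quality degrades.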
Loomis Painter: Reconstructing the Painting Process

📝 Summary:
This paper proposes a unified diffusion model framework for generating consistent, high-fidelity multi-media painting processes. It uses semantic control and cross-medium style augmentation to replicate human artistic workflows, supported by a new dataset and evaluation metrics.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17344
• PDF: https://arxiv.org/pdf/2511.17344
• Project Page: https://markus-pobitzer.github.io/lplp/
• Github: https://github.com/Markus-Pobitzer/wlp

🔹 Models citing this paper:
https://huggingface.co/Markus-Pobitzer/wlp-lora

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DiffusionModels #GenerativeAI #AIArt #ComputerGraphics #MachineLearning