✨Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation
📝 Summary:
This paper shows that audio-video joint denoising significantly improves video generation quality. By using audio as a privileged signal, the AVFullDiT model regularizes video dynamics, yielding quality gains that go beyond mere audio-video synchrony.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02457
• PDF: https://arxiv.org/pdf/2512.02457
• Project Page: https://jianzongwu.github.io/projects/does-hearing-help-seeing/
• Github: https://github.com/jianzongwu/Does-Hearing-Help-Seeing
✨ Datasets citing this paper:
• https://huggingface.co/datasets/jianzongwu/ALT-Merge
• https://huggingface.co/datasets/jianzongwu/VGGSound-T2AV
==================================
For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT
#VideoGeneration #MultimodalAI #DeepLearning #ComputerVision #AIResearch
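The core idea of joint denoising can be sketched in a few lines. This is an illustrative toy, not the actual AVFullDiT implementation: both modalities share one noise level, and a single predictor is trained on the sum of the per-modality denoising losses, so the audio stream regularizes the video stream.

```python
import random

def add_noise(x, noise, alpha):
    # Linear interpolation toward noise at level (1 - alpha).
    return [alpha * xi + (1 - alpha) * ni for xi, ni in zip(x, noise)]

def joint_denoising_loss(video, audio, alpha=0.7, seed=0):
    # Toy sketch of a joint audio-video denoising objective (an
    # assumption, not the paper's exact loss): both latents are noised
    # with a shared noise level and a shared "model" predicts the noise.
    rng = random.Random(seed)
    v_noise = [rng.gauss(0, 1) for _ in video]
    a_noise = [rng.gauss(0, 1) for _ in audio]
    noisy_v = add_noise(video, v_noise, alpha)
    noisy_a = add_noise(audio, a_noise, alpha)
    # Stand-in predictor: echoes its input; a real model would predict
    # the injected noise from both modalities jointly.
    pred_v, pred_a = noisy_v, noisy_a
    mse = lambda p, t: sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
    # Joint objective: video loss plus audio loss under one timestep.
    return mse(pred_v, v_noise) + mse(pred_a, a_noise)

loss = joint_denoising_loss([0.1, 0.2, 0.3], [0.0, 0.5])
```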
✨PAI-Bench: A Comprehensive Benchmark For Physical AI
📝 Summary:
PAI-Bench is a new benchmark evaluating multi-modal LLMs and video generative models for physical AI perception and prediction. It reveals that current models struggle with physical coherence, forecasting, and causal reasoning in real-world dynamics, highlighting significant gaps for future physical AI.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01989
• PDF: https://arxiv.org/pdf/2512.01989
• Github: https://github.com/SHI-Labs/physical-ai-bench
✨ Spaces citing this paper:
• https://huggingface.co/spaces/shi-labs/physical-ai-bench-leaderboard
==================================
#PhysicalAI #LLMs #Benchmarking #GenerativeAI #ComputerVision
✨Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
📝 Summary:
Concise Chain-of-Thought steps, specifically minimal visual grounding, are most effective for achieving generalizable visual reasoning in vision-language models. Longer or more visual CoT primarily accelerates training but does not improve final performance or generalization across tasks.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22586
• PDF: https://arxiv.org/pdf/2511.22586
==================================
#ChainOfThought #VisionLanguageModels #VisualReasoning #AIGeneralization #DeepLearning
✨GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
📝 Summary:
GUI Exploration Lab is a simulation environment to train GUI agents for screen navigation. It finds supervised fine-tuning establishes basics, single-turn reinforcement learning improves generalization, and multi-turn RL enhances exploration for superior navigation performance.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02423
• PDF: https://arxiv.org/pdf/2512.02423
==================================
#ReinforcementLearning #GUIAgents #AINavigation #MachineLearning #AIResearch
✨Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench
📝 Summary:
VideoScience-Bench introduces a new benchmark evaluating video models' scientific reasoning. It assesses their ability to generate phenomena consistent with undergraduate physics and chemistry, filling a critical gap: it is the first benchmark to evaluate video models as scientific reasoners.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02942
• PDF: https://arxiv.org/pdf/2512.02942
==================================
#VideoGeneration #AIResearch #ScientificReasoning #AIModels #Benchmarking
✨UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
📝 Summary:
This paper tackles image editing performance gaps caused by data scarcity by introducing UnicEdit-10M, a 10M-scale high-quality dataset built with a lightweight verification pipeline. It also proposes UnicBench, a new benchmark with novel metrics to diagnose reasoning limitations in editing models.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02790
• PDF: https://arxiv.org/pdf/2512.02790
==================================
#ImageEditing #AI #Dataset #Benchmark #ComputerVision
✨Guided Self-Evolving LLMs with Minimal Human Supervision
📝 Summary:
R-Few enables stable LLM self-evolution using a guided Self-Play Challenger-Solver framework with minimal human input. It leverages human examples for synthetic data generation and a curriculum for training, consistently improving performance on math and reasoning tasks.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02472
• PDF: https://arxiv.org/pdf/2512.02472
==================================
#LLM #SelfEvolvingAI #MachineLearning #DeepLearning #AIResearch
✨DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
📝 Summary:
DualCamCtrl is a novel diffusion model for camera-controlled video generation. It employs a dual-branch framework and Semantic Guided Mutual Alignment to generate consistent RGB and depth, better disentangling appearance and geometry for accurate camera trajectories.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23127
• PDF: https://arxiv.org/pdf/2511.23127
• Project Page: https://soyouthinkyoucantell.github.io/dualcamctrl-page/
• Github: https://github.com/EnVision-Research/DualCamCtrl
🔹 Models citing this paper:
• https://huggingface.co/FayeHongfeiZhang/DualCamCtrl
==================================
#DiffusionModels #VideoGeneration #ComputerVision #GenerativeAI #DeepLearning
✨DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models
📝 Summary:
DiG-Flow enhances VLA model robustness by using geometric regularization to align observation and action embeddings. It measures embedding discrepancy, applies residual updates, and consistently boosts performance on complex tasks and with limited data.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01715
• PDF: https://arxiv.org/pdf/2512.01715
• Project Page: https://beingbeyond.github.io/DiG-Flow/
• Github: https://beingbeyond.github.io/DiG-Flow
==================================
#VLAModels #RobustAI #FlowMatching #MachineLearning #DeepLearning
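The discrepancy-then-residual-update loop described above can be sketched minimally. This is a hedged illustration in the spirit of DiG-Flow, with an assumed cosine discrepancy and update rule, not the paper's exact regularizer:

```python
import math

def cosine_discrepancy(u, v):
    # 0 when embeddings are aligned, up to 2 when opposed.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def residual_update(action_emb, obs_emb, step=0.5):
    # Residual correction scaled by the measured discrepancy: the larger
    # the observation/action mismatch, the stronger the pull toward the
    # observation embedding. (Illustrative assumption, not the exact rule.)
    d = cosine_discrepancy(action_emb, obs_emb)
    updated = [a + step * d * (o - a) for a, o in zip(action_emb, obs_emb)]
    return updated, d

updated, d_before = residual_update([1.0, 0.0], [0.0, 1.0])
d_after = cosine_discrepancy(updated, [0.0, 1.0])
```

After one update the discrepancy shrinks, which is the intended regularizing effect.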
✨Glance: Accelerating Diffusion Models with 1 Sample
📝 Summary:
Glance accelerates diffusion models with a phase-aware strategy using lightweight LoRA adapters. This method applies varying speedups across denoising stages, achieving up to 5x acceleration and strong generalization with minimal retraining on just 1 sample.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02899
• PDF: https://arxiv.org/pdf/2512.02899
==================================
#DiffusionModels #ModelAcceleration #LoRA #DeepLearning #GenerativeAI
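A phase-aware speedup can be pictured as sampling the denoising trajectory at different strides per phase. The boundaries and strides below are illustrative assumptions, not Glance's actual settings (which use LoRA adapters per phase):

```python
def phase_aware_schedule(total_steps, phases):
    """phases: list of (fraction_of_trajectory, stride).
    Different denoising stages tolerate different amounts of skipping,
    so each phase gets its own stride instead of one global speedup."""
    steps, t = [], 0
    for frac, stride in phases:
        end = min(total_steps, t + round(frac * total_steps))
        steps.extend(range(t, end, stride))
        t = end
    return steps

# 50-step trajectory: skip aggressively in the first 60% of the steps,
# then sample every step in the detail-refining tail.
kept = phase_aware_schedule(50, [(0.6, 5), (0.4, 1)])
speedup = 50 / len(kept)
```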
✨Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
📝 Summary:
Video4Spatial uses video diffusion models with only visual data to perform complex spatial tasks like navigation and object grounding. It demonstrates strong spatial understanding, planning, and generalization, advancing visuospatial reasoning.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03040
• PDF: https://arxiv.org/pdf/2512.03040
==================================
#Video4Spatial #VisuospatialAI #DiffusionModels #SpatialReasoning #ComputerVision
✨YingVideo-MV: Music-Driven Multi-Stage Video Generation
📝 Summary:
YingVideo-MV is the first framework to generate high-quality, music-driven long performance videos with synchronized camera motion. It uses audio analysis, diffusion transformers, and a camera adapter, achieving precise music-motion-camera synchronization.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02492
• PDF: https://arxiv.org/pdf/2512.02492
==================================
#VideoGeneration #MusicAI #GenerativeAI #DiffusionModels #ComputerVision
✨SimScale: Learning to Drive via Real-World Simulation at Scale
📝 Summary:
SimScale is a simulation framework synthesizing diverse driving scenarios from logs. Co-training with this data significantly improves autonomous driving robustness and generalization, scaling with simulation data even without new real-world input.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23369
• PDF: https://arxiv.org/pdf/2511.23369
• Project Page: https://opendrivelab.com/SimScale
• Github: https://github.com/OpenDriveLab/SimScale
==================================
#AutonomousDriving #Simulation #AI #MachineLearning #Robotics
✨TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
📝 Summary:
TRivia is a self-supervised fine-tuning method for vision-language models to learn table recognition from unlabeled data. It uses a question-answering reward mechanism to autonomously optimize the model. This open-source solution outperforms state-of-the-art systems on popular benchmarks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01248
• PDF: https://arxiv.org/pdf/2512.01248
• Github: https://github.com/opendatalab/TRivia
🔹 Models citing this paper:
• https://huggingface.co/opendatalab/TRivia-3B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/TRivia-3B
==================================
#TableRecognition #VisionLanguageModels #SelfSupervisedLearning #AI #DeepLearning
✨SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
📝 Summary:
SwiftVLA enhances compact VLA models with efficient 4D understanding. It uses a 4D geometry transformer, Fusion Tokens, and a mask-and-reconstruct strategy. This rivals larger models while drastically improving speed and memory efficiency.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00903
• PDF: https://arxiv.org/pdf/2512.00903
• Project Page: https://swiftvla.github.io/
• Github: https://swiftvla.github.io/
==================================
#SwiftVLA #VLAModels #SpatiotemporalAI #EfficientAI #Transformers
✨Mixture of Horizons in Action Chunking
📝 Summary:
VLA models struggle with a fixed action chunk horizon. The Mixture of Horizons (MoH) strategy combines different horizons to obtain both global foresight and fine-grained precision. This improves robotic performance, generalizability, and throughput, achieving a new state-of-the-art.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19433
• PDF: https://arxiv.org/pdf/2511.19433
• Project Page: https://timsty1.github.io/moh/
• Github: https://github.com/Timsty1/MixtureOfHorizons/tree/main
==================================
#Robotics #AI #MachineLearning #DeepLearning #ReinforcementLearning
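Combining chunks of different horizons can be sketched as a per-timestep weighted average over the heads that still cover that timestep. This blending rule is an assumption for illustration, not MoH's actual mechanism:

```python
def mix_horizons(chunks, weights):
    """chunks: dict horizon -> list of actions (scalars here for brevity).
    The mixed plan is as long as the longest chunk; at each timestep we
    average the predictions of every head whose horizon still covers it,
    so short precise heads dominate early steps while long heads provide
    foresight for the tail."""
    longest = max(len(c) for c in chunks.values())
    mixed = []
    for t in range(longest):
        num = den = 0.0
        for h, actions in chunks.items():
            if t < len(actions):  # this head still has a prediction at t
                num += weights[h] * actions[t]
                den += weights[h]
        mixed.append(num / den)
    return mixed

plan = mix_horizons(
    {2: [1.0, 1.0], 6: [0.0] * 6},  # short precise head, long foresight head
    {2: 3.0, 6: 1.0},               # trust the short head more where it exists
)
```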
✨WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
📝 Summary:
WorldMM is a novel multimodal memory agent for long video reasoning. It uses episodic, semantic, and visual memories with adaptive retrieval across multiple temporal scales, significantly outperforming prior methods on long video question-answering benchmarks.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02425
• PDF: https://arxiv.org/pdf/2512.02425
• Project Page: https://worldmm.github.io
• Github: https://github.com/wgcyeo/WorldMM
==================================
#MultimodalAI #VideoReasoning #MemoryNetworks #DeepLearning #AI
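Retrieval across multiple temporal scales can be pictured as matching the temporal span of the question to the scale of stored memories. The memory schema and scoring below are illustrative assumptions, not WorldMM's actual design:

```python
MEMORY = [
    # (scale_seconds, start_time, text) -- hypothetical entries
    (1,   12, "close-up: hand picks up red mug"),
    (60,   0, "first minute: kitchen scene, two people talking"),
    (600,  0, "first ten minutes: cooking a meal together"),
]

def adaptive_retrieve(query_span_seconds, k=1):
    # Prefer memory entries whose temporal scale matches the span the
    # question is about: fine scales for moments, coarse for plot summary.
    ranked = sorted(MEMORY, key=lambda e: abs(e[0] - query_span_seconds))
    return [text for _, _, text in ranked[:k]]

moment = adaptive_retrieve(2)     # e.g. "what did the hand grab?"
summary = adaptive_retrieve(500)  # e.g. "what happens in this part?"
```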
✨BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
📝 Summary:
BlockVid introduces a block diffusion framework for high-quality, coherent minute-long video generation. It overcomes error accumulation via a semantic-aware sparse KV cache, Block Forcing training, and dedicated noise scheduling. BlockVid outperforms existing methods and proposes LV-Bench, a new benchmark for long video generation.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22973
• PDF: https://arxiv.org/pdf/2511.22973
• Project Page: https://ziplab.co/BlockVid/
• Github: https://github.com/alibaba-damo-academy/Inferix/
==================================
#VideoGeneration #DiffusionModels #GenerativeAI #DeepLearning #ComputerVision
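The idea of a sparse KV cache can be sketched as pruning the attention cache to a fixed budget of important entries. The importance scores and budget are assumptions for illustration, not BlockVid's actual semantic-aware criterion:

```python
def prune_kv_cache(cache, budget):
    """cache: list of (importance, key, value). Keep the `budget` most
    important entries so later blocks attend to a bounded context
    instead of every token of every previous block."""
    kept = sorted(cache, key=lambda e: e[0], reverse=True)[:budget]
    # Restore the survivors' original temporal order for attention.
    return sorted(kept, key=lambda e: cache.index(e))

cache = [(0.9, "k0", "v0"), (0.1, "k1", "v1"),
         (0.7, "k2", "v2"), (0.2, "k3", "v3")]
pruned = prune_kv_cache(cache, budget=2)
```

Keeping the cache size constant per block is what prevents cost (and error accumulation through stale context) from growing with video length.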
✨Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click
📝 Summary:
Click2Graph is an interactive framework for Panoptic Video Scene Graph Generation. It uses a single user click to segment, track, discover interactions, and predict triplets for temporally consistent scene graphs. This enables user-guided, controllable video scene understanding.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15948
• PDF: https://arxiv.org/pdf/2511.15948
==================================
#VideoUnderstanding #SceneGraphs #ComputerVision #InteractiveAI #AIResearch
✨MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
📝 Summary:
MG-Nav is a dual-scale framework for zero-shot visual navigation, unifying global memory-guided planning via a Sparse Spatial Memory Graph with local geometry-enhanced control using a VGGT-adapter. It achieves state-of-the-art performance and robustness in unseen environments.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22609
• PDF: https://arxiv.org/pdf/2511.22609
==================================
#VisualNavigation #Robotics #AI #ComputerVision #ZeroShotLearning
✨ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
📝 Summary:
ViSAudio is an end-to-end framework that generates high-quality binaural spatial audio directly from silent video. It uses conditional flow matching and a dual-branch architecture, outperforming previous methods in immersion and consistency. The paper also introduces the BiAudio dataset for this task.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03036
• PDF: https://arxiv.org/pdf/2512.03036
• Project Page: https://kszpxxzmc.github.io/ViSAudio-project/
• Github: https://github.com/kszpxxzmc/ViSAudio
==================================
#SpatialAudio #AudioGeneration #DeepLearning #ComputerVision #AI