ML Research Hub

✨TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

📝 Summary:
TUNA is a unified multimodal model that builds a single continuous visual representation. This enables end-to-end understanding and generation, avoiding mismatches found in decoupled models and achieving state-of-the-art performance across multimodal tasks.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02014
• PDF: https://arxiv.org/pdf/2512.02014
• Project Page: https://tuna-ai.org/

==================================

For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT

#MultimodalAI #ComputerVision #DeepLearning #GenerativeAI #AIResearch

147 views · 04:03

ML Research Hub

✨Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

📝 Summary:
Lotus-2 is a two-stage deterministic framework adapting powerful diffusion models for accurate geometric inference. It achieves top monocular depth and competitive surface normal prediction with very limited training data.

🔹 Publication Date: Published on Nov 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01030
• PDF: https://arxiv.org/pdf/2512.01030
• Project Page: https://lotus-2.github.io/
• Github: https://github.com/EnVision-Research/Lotus-2

==================================

#ComputerVision #DeepLearning #DiffusionModels #GeometricPrediction #MonocularDepth

176 views · 04:04

ML Research Hub

✨Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

📝 Summary:
Generalist LLMs like GPT-5 outperformed specialized clinical AI tools on a medical benchmark. This reveals that clinical decision support tools may lag behind frontier models and need urgent, independent evaluation before clinical deployment.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01191
• PDF: https://arxiv.org/pdf/2512.01191

==================================

#LLM #HealthcareAI #AIinMedicine #ClinicalAI #MedicalResearch

198 views · 04:04

ML Research Hub

✨Doppler-Enhanced Deep Learning: Improving Thyroid Nodule Segmentation with YOLOv5 Instance Segmentation

📝 Summary:
YOLOv5 algorithms accurately segment thyroid nodules in ultrasound images. Incorporating Doppler images significantly improves segmentation performance across all models, offering a real-time solution for clinical diagnostics.

🔹 Publication Date: Published on Nov 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00639
• PDF: https://arxiv.org/pdf/2512.00639

==================================

#DeepLearning #MedicalImaging #ThyroidHealth #YOLOv5 #AIinHealthcare

213 views · 04:04

✨Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

📝 Summary:
Infinity-RoPE is a new inference-time framework for autoregressive video diffusion, enabling continuous generation, fine-grained action control, and cinematic transitions without retraining. It addresses limitations like finite temporal horizons and slow prompt responsiveness, outperforming prior...

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20649
• PDF: https://arxiv.org/pdf/2511.20649
• Project Page: https://infinity-rope.github.io/

==================================

#VideoGeneration #AI #DeepLearning #ComputerVision #DiffusionModels

221 views · 04:32

ML Research Hub

✨Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

📝 Summary:
Envision is a new benchmark for chained text-to-multi-image generation that assesses models' grasp of dynamic causal processes and world knowledge. Unified multimodal models outperform specialized ones in causal coherence but still struggle with spatiotemporal consistency due to static training.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01816
• PDF: https://arxiv.org/pdf/2512.01816
• Project Page: https://opendatalab-raiser.github.io/Envision/
• Github: https://github.com/opendatalab-raiser/Envision

✨ Datasets citing this paper:
• https://huggingface.co/datasets/opendatalab-raiser/Envision

==================================

#MultimodalAI #CausalReasoning #AIBenchmarking #GenerativeAI #ComputerVision

261 views · 05:04

ML Research Hub

✨The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

📝 Summary:
ImageCritic corrects inconsistent fine-grained details in generated images using a reference-guided post-editing approach. It employs attention alignment loss and a detail encoder to precisely rectify inconsistencies and improve accuracy.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20614
• PDF: https://arxiv.org/pdf/2511.20614
• Project Page: https://ouyangziheng.github.io/ImageCritic-Page/
• Github: https://github.com/HVision-NKU/ImageCritic

🔹 Models citing this paper:
• https://huggingface.co/ziheng1234/ImageCritic

✨ Datasets citing this paper:
• https://huggingface.co/datasets/ziheng1234/Critic-10K

✨ Spaces citing this paper:
• https://huggingface.co/spaces/ziheng1234/ImageCritic

==================================

#ImageGeneration #ComputerVision #DeepLearning #AI #ImageEditing

162 views · 05:05

ML Research Hub

✨HiconAgent: History Context-aware Policy Optimization for GUI Agents

📝 Summary:
HiconAgent introduces History Context-aware Policy Optimization (HCPO) for GUI agents. HCPO efficiently leverages historical context using dynamic sampling and compression, achieving better performance than larger models with reduced computational cost and significant speedups.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01763
• PDF: https://arxiv.org/pdf/2512.01763
• Github: https://github.com/JiuTian-VL/HiconAgent

==================================

#HiconAgent #GUIAgents #AIResearch #ReinforcementLearning #ContextAwareAI

132 views · 05:05

ML Research Hub

✨InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

📝 Summary:
InternVideo-Next proposes a two-stage Encoder-Predictor-Decoder framework for general video representation learning without text supervision. It uses a conditional diffusion decoder to bridge pixel fidelity with semantics in Stage 1, then a latent world model in Stage 2 to learn world knowledge a...

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01342
• PDF: https://arxiv.org/pdf/2512.01342

==================================

#VideoFoundationModels #VideoAI #DeepLearning #UnsupervisedLearning #DiffusionModels

129 views · 05:05

ML Research Hub

✨Seeing the Wind from a Falling Leaf

📝 Summary:
This paper presents an end-to-end differentiable inverse graphics framework that recovers invisible force representations from video observations. This innovation enables estimating physical forces, like wind from a falling leaf, leading to physics-based video generation and editing.

🔹 Publication Date: Published on Nov 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00762
• PDF: https://arxiv.org/pdf/2512.00762

==================================

#InverseGraphics #PhysicsAI #ComputerVision #VideoGeneration #DeepLearning

153 views · 05:05

ML Research Hub

✨ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling

📝 Summary:
ChronosObserver generates high-fidelity, 3D-consistent, and time-synchronized multi-view videos. It is a training-free method leveraging World State Hyperspace and Hyperspace Guided Sampling to synchronize views. This approach overcomes challenges in 4D world generation without model training.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01481
• PDF: https://arxiv.org/pdf/2512.01481
• Project Page: https://icvteam.github.io/ChronosObserver.html

==================================

#4DGeneration #DiffusionModels #ComputerVision #MultiViewVideo #AIResearch

125 views · 05:06

ML Research Hub

✨WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

📝 Summary:
WiseEdit is a new benchmark for evaluating image editing models, focusing on cognition and creativity. It decomposes editing into Awareness, Interpretation, and Imagination tasks, assessing declarative, procedural, and metacognitive knowledge. This reveals limitations in current models.

🔹 Publication Date: Published on Nov 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00387
• PDF: https://arxiv.org/pdf/2512.00387
• Project Page: https://qnancy.github.io/wiseedit_project_page/

==================================

#ImageEditing #ComputerVision #AIResearch #CognitiveAI #CreativeAI

138 views · 05:06

ML Research Hub

✨IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

📝 Summary:
IndicParam is a new benchmark with over 13000 multiple-choice questions for 11 low-resource Indic languages. It reveals that even top LLMs achieve only ~45% accuracy, showing limitations in cross-lingual transfer and grammatical proficiency. The benchmark also assesses diverse question formats.

🔹 Publication Date: Published on Nov 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00333
• PDF: https://arxiv.org/pdf/2512.00333
• Project Page: https://huggingface.co/datasets/bharatgenai/IndicParam
• Github: https://github.com/ayushbits/IndicParam

==================================

#LLM #NLP #LowResourceLanguages #IndicLanguages #AIResearch

151 views · 05:06

ML Research Hub

✨From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

📝 Summary:
This paper provides a practical guide to code LLMs, covering their lifecycle from data to deployment. It examines techniques, analyzes various models, and discusses real-world challenges like correctness and security. Experiments on pre-training and fine-tuning are included.

🔹 Publication Date: Published on Nov 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18538
• PDF: https://arxiv.org/pdf/2511.18538

==================================

#CodeLLMs #AI #MachineLearning #SoftwareEngineering #FoundationModels

154 views · 06:06

ML Research Hub

✨Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

📝 Summary:
This paper presents a theoretical foundation for stabilizing RL with LLMs, optimizing sequence rewards via token-level objectives. It highlights that importance sampling, clipping, and Routing Replay minimize policy staleness, crucial for stable training. Stabilized training consistently yields c...

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01374
• PDF: https://arxiv.org/pdf/2512.01374

==================================

#ReinforcementLearning #LLMs #AI #MachineLearning #AIResearch

143 views · 06:07

ML Research Hub

✨PromptBridge: Cross-Model Prompt Transfer for Large Language Models

📝 Summary:
PromptBridge combats Model Drifting, where prompts lose effectiveness when moved across LLMs. It is training-free and enables cross-model prompt transfer by mapping source prompts to optimized target prompts, improving accuracy and reducing re-optimization.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01420
• PDF: https://arxiv.org/pdf/2512.01420

==================================

#LLM #PromptEngineering #AI #ModelDrifting #PromptBridge

186 views · 06:07

ML Research Hub

✨The Art of Scaling Test-Time Compute for Large Language Models

📝 Summary:
This study systematically compares Test-Time Scaling (TTS) strategies for LLMs. It finds no single dominant strategy, identifies distinct model trace-quality patterns, and shows optimal performance scales with compute. A practical guide for selecting TTS strategies is provided.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02008
• PDF: https://arxiv.org/pdf/2512.02008
• Github: https://github.com/Aradhye2002/art_of_tts

==================================

#LLM #TestTimeScaling #AI #DeepLearning #NLP

214 views · 06:07

ML Research Hub

✨StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

📝 Summary:
StreamGaze is a new benchmark evaluating how MLLMs use human gaze for temporal and proactive reasoning in streaming videos. It reveals significant performance gaps between current AI models and human abilities in gaze-based temporal reasoning and proactive prediction.

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01707
• PDF: https://arxiv.org/pdf/2512.01707
• Project Page: https://streamgaze.github.io/
• Github: https://github.com/daeunni/StreamGaze

==================================

#StreamGaze #MLLMs #TemporalReasoning #ComputerVision #AI

174 views · 07:07

ML Research Hub

✨Asking like Socrates: Socrates helps VLMs understand remote sensing images

📝 Summary:
Remote sensing models often show fake reasoning from coarse image understanding. This paper introduces RS-EoT, an iterative, language-driven system with a Socratic multi-agent approach and RL to seek visual evidence. It achieves state-of-the-art results, enabling genuine, evidence-grounded reason...

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22396
• PDF: https://arxiv.org/pdf/2511.22396
• Project Page: https://geox-lab.github.io/Asking_like_Socrates/
• Github: https://github.com/GeoX-Lab/Asking_like_Socrates

🔹 Models citing this paper:
• https://huggingface.co/ShaoRun/RS-EoT-7B

✨ Datasets citing this paper:
• https://huggingface.co/datasets/ShaoRun/RS-EoT-4K

==================================

#VLM #RemoteSensing #AI #ReinforcementLearning #MultiAgentSystems

164 views · 07:08

ML Research Hub

✨Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

📝 Summary:
Streaming VideoLLMs face high latency from ViT encoding and LLM pre-filling. STC, a hierarchical framework, optimizes this by caching features and pruning tokens. It reduces latency by up to 24.5 percent for ViT and 45.3 percent for LLM pre-filling, retaining 99 percent accuracy.

🔹 Publication Date: Published on Nov 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00891
• PDF: https://arxiv.org/pdf/2512.00891
• Github: https://github.com/lern-to-write/STC

==================================

#VideoLLM #LLM #DeepLearning #AI #PerformanceOptimization

205 views · 07:08
