ML Research Hub

🤖🧠 Distil-Whisper: Faster, Smaller, and Smarter Speech Recognition by Hugging Face

🗓️ 08 Dec 2025
📚 AI News & Trends

The evolution of Automatic Speech Recognition (ASR) has reshaped how humans interact with technology. From dictation tools and live transcription to smart assistants and media captioning, ASR technology continues to bridge the gap between speech and digital communication. However, real-time, high-accuracy transcription has often come at the cost of heavy computational requirements, until now. Enter ...

#DistilWhisper #FasterSpeechRecognition #SmallerModels #HuggingFace #ASRTechnology #RealTimeTranscription

289 views · 21:06

✨Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

📝 Summary:
Colon-X introduces ColonR1, a novel reasoning-centric model for intelligent colonoscopy. It achieves 56.61% accuracy, outperforming traditional methods by 25.22% under data scarcity, by leveraging new comprehensive multimodal datasets.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03667
• PDF: https://arxiv.org/pdf/2512.03667
• Github: https://github.com/ai4colonoscopy/Colon-X

==================================

For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

296 views · 22:07

✨DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems

📝 Summary:
DoVer is an intervention-driven debugging approach for LLM multi-agent systems. It validates failure hypotheses and measures progress via targeted interventions, improving reliability. DoVer converts 18-49% of failed tasks into successes, offering an outcome-oriented debugging method.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06749
• PDF: https://arxiv.org/pdf/2512.06749
• Project Page: https://aka.ms/DoVer

#LLM #MultiAgentSystems #Debugging #AI #Research

224 views · 04:00

✨Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

📝 Summary:
The paper proposes a method to enhance Rotary Position Embeddings by utilizing both the real and imaginary components of the complex-valued dot product, improving long-context modeling in Large Langua...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07525
• PDF: https://arxiv.org/pdf/2512.07525
• Github: https://github.com/OpenMOSS/rope_pp
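
A minimal sketch of the background for the summary above, assuming the standard RoPE formulation in which adjacent feature pairs are treated as complex numbers and rotated by a position-dependent angle. The usual attention score is the real part of the complex dot product; the imaginary part is the component that is normally discarded. How the paper actually uses that imaginary part is not shown here, since the summary is truncated. All function names and the adjacent-pair layout are illustrative assumptions.

```python
import cmath
import math


def rope_complex(vec, pos, base=10000.0):
    """View adjacent pairs of vec as complex numbers and rotate pair j by pos * theta_j."""
    d = len(vec)
    return [
        complex(vec[2 * j], vec[2 * j + 1]) * cmath.exp(1j * pos * base ** (-2.0 * j / d))
        for j in range(d // 2)
    ]


def complex_score(q, k, m, n):
    """Complex dot product between a query at position m and a key at position n."""
    return sum(a * b.conjugate() for a, b in zip(rope_complex(q, m), rope_complex(k, n)))


def real_rope_score(q, k, m, n, base=10000.0):
    """Standard RoPE score computed with explicit 2x2 rotations and a real dot product."""
    def rotate(vec, pos):
        d = len(vec)
        out = []
        for j in range(d // 2):
            angle = pos * base ** (-2.0 * j / d)
            c, s = math.cos(angle), math.sin(angle)
            x, y = vec[2 * j], vec[2 * j + 1]
            out.extend([x * c - y * s, x * s + y * c])
        return out

    return sum(a * b for a, b in zip(rotate(q, m), rotate(k, n)))


# Toy vectors, chosen arbitrarily for illustration.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.9]

score = complex_score(q, k, m=5, n=3)
print("real part (standard RoPE score):", score.real)
print("imaginary part (normally discarded):", score.imag)
```

Both parts depend only on the relative offset m - n, so shifting both positions by the same amount leaves the complex score unchanged.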

#AI #DataScience #MachineLearning #HuggingFace #Research

185 views · 04:01

✨LongCat-Image Technical Report

📝 Summary:
LongCat-Image is a bilingual open-source foundation model for image generation that addresses multilingual text rendering, photorealism, and deployment efficiency through rigorous data curation, compa...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07584
• PDF: https://arxiv.org/pdf/2512.07584
• Project Page: https://longcat.chat/
• Github: https://github.com/meituan-longcat/LongCat-Image

#AI #DataScience #MachineLearning #HuggingFace #Research

183 views · 04:01

✨EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

📝 Summary:
EgoEdit is a real-time, instruction-following egocentric video editor that addresses challenges in handling egomotion and hand-object interactions, outperforming existing methods on egocentric editing...

🔹 Publication Date: Published on Dec 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06065
• PDF: https://arxiv.org/pdf/2512.06065
• Project Page: https://snap-research.github.io/EgoEdit/
• Github: https://github.com/snap-research/EgoEdit

#AI #DataScience #MachineLearning #HuggingFace #Research

184 views · 04:01

✨Scaling Zero-Shot Reference-to-Video Generation

📝 Summary:
Saber is a scalable zero-shot framework for reference-to-video generation that uses video-text pairs to learn identity-consistent representations and outperforms models trained with explicit reference...

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06905
• PDF: https://arxiv.org/pdf/2512.06905
• Project Page: https://franciszzj.github.io/Saber/

#AI #DataScience #MachineLearning #HuggingFace #Research

178 views · 04:01

✨Rethinking Training Dynamics in Scale-wise Autoregressive Generation

📝 Summary:
Self-Autoregressive Refinement (SAR) improves the quality of autoregressive generative models by addressing exposure bias through Stagger-Scale Rollout and Contrastive Student-Forcing Loss, leading to...

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06421
• PDF: https://arxiv.org/pdf/2512.06421
• Project Page: https://gengzezhou.github.io/SAR/

#AI #DataScience #MachineLearning #HuggingFace #Research

186 views · 04:01

✨Embodied Referring Expression Comprehension in Human-Robot Interaction

📝 Summary:
A large-scale dataset and multimodal model improve embodied interaction comprehension in robots by addressing perspective bias and enhancing multimodal signal integration.

🔹 Publication Date: Published on Dec 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06558
• PDF: https://arxiv.org/pdf/2512.06558

#AI #DataScience #MachineLearning #HuggingFace #Research

207 views · 04:01

✨Unified Video Editing with Temporal Reasoner

📝 Summary:
VideoCoF, a Chain-of-Frames approach, improves video editing precision and instruction-to-region mapping by using reasoning tokens without requiring user-provided masks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07469
• PDF: https://arxiv.org/pdf/2512.07469
• Project Page: https://videocof.github.io/
• Github: https://github.com/knightyxp/VideoCoF

🔹 Models citing this paper:
• https://huggingface.co/XiangpengYang/VideoCoF

#AI #DataScience #MachineLearning #HuggingFace #Research

172 views · 05:02

✨Relational Visual Similarity

📝 Summary:
Vision-Language models fine-tuned on anonymized image captions can capture relational similarity between images, a capability lacking in current visual similarity metrics.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07833
• PDF: https://arxiv.org/pdf/2512.07833
• Project Page: https://thaoshibe.github.io/relsim/
• Github: https://github.com/thaoshibe/relsim

#AI #DataScience #MachineLearning #HuggingFace #Research

169 views · 05:02

✨Group Representational Position Encoding

📝 Summary:
GRAPE is a unified positional encoding framework that combines multiplicative rotations and additive logit biases, extending existing methods like RoPE and ALiBi.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07805
• PDF: https://model-architectures.github.io/GRAPE/GRAPE.pdf
• Github: https://model-architectures.github.io/GRAPE/
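
A rough sketch of the two ingredients named in the summary above, not of GRAPE itself: a RoPE-style multiplicative rotation applied to query and key, and an ALiBi-style additive bias that penalizes the logit linearly in the query-key distance. The function names, the adjacent-pair layout, and the slope value are illustrative assumptions; the paper's actual unified formulation is not reproduced here.

```python
import math


def rotate_pairs(vec, pos, base=10000.0):
    """RoPE-style rotation: rotate each adjacent feature pair by pos * theta_idx."""
    d = len(vec)
    out = []
    for idx in range(d // 2):
        angle = pos * base ** (-2.0 * idx / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * idx], vec[2 * idx + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_logit(q, k, i, j):
    """Attention logit after rotating the query at position i and the key at position j."""
    return sum(a * b for a, b in zip(rotate_pairs(q, i), rotate_pairs(k, j)))


def alibi_bias(i, j, slope):
    """ALiBi-style additive bias for a causal pair (j <= i): linear in the distance i - j."""
    return -slope * (i - j)


def combined_logit(q, k, i, j, slope):
    """Multiplicative rotation plus additive bias, applied to the same logit."""
    return rope_logit(q, k, i, j) + alibi_bias(i, j, slope)


# Toy vectors and slope, chosen arbitrarily for illustration.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.9]
print(combined_logit(q, k, i=4, j=1, slope=0.5))
```

With the slope set to zero the combined logit reduces to the purely rotational one, which makes the two contributions easy to inspect separately.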

#AI #DataScience #MachineLearning #HuggingFace #Research

163 views · 05:02

✨Voxify3D: Pixel Art Meets Volumetric Rendering

📝 Summary:
Voxify3D is a two-stage framework that combines 3D mesh optimization with 2D pixel art supervision to generate high-quality voxel art with semantic preservation, pixel-art aesthetics, and discrete col...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07834
• PDF: https://arxiv.org/pdf/2512.07834
• Project Page: https://yichuanh.github.io/Voxify-3D/

#AI #DataScience #MachineLearning #HuggingFace #Research

162 views · 05:02

✨On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

📝 Summary:
A controlled experimental framework isolates and evaluates the contributions of pre-training, mid-training, and reinforcement learning in improving language model reasoning, demonstrating the necessit...

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07783
• PDF: https://arxiv.org/pdf/2512.07783
• Github: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning

#AI #DataScience #MachineLearning #HuggingFace #Research

224 views · 05:02

✨Vector Quantization using Gaussian Variational Autoencoder

📝 Summary:
Gaussian Quant (GQ) converts a Gaussian VAE into a VQ-VAE without training, outperforming previous VQ-VAEs and Gaussian VAE discretization methods across different architectures.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06609
• PDF: https://arxiv.org/pdf/2512.06609
• Github: https://github.com/Stability-AI/generative-models

🔹 Models citing this paper:
• https://huggingface.co/xutongda/GQModel

#AI #DataScience #MachineLearning #HuggingFace #Research

249 views · 05:02

✨VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

📝 Summary:
VideoVLA uses a multi-modal Diffusion Transformer to predict actions and visual outcomes from language and image inputs, enabling strong generalization in robotic manipulation tasks.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06963
• PDF: https://arxiv.org/pdf/2512.06963
• Project Page: https://videovla-nips2025.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research

284 views · 05:03

✨Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

📝 Summary:
NPR is a teacher-free framework enabling LLMs to perform genuine parallel reasoning. It uses self-distilled training and a new optimization algorithm. This achieves significant performance gains and speedups on reasoning benchmarks.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07461
• PDF: https://arxiv.org/pdf/2512.07461
• Github: https://bigai-nlco.github.io/Native-Parallel-Reasoner

#AI #DataScience #MachineLearning #HuggingFace #Research

263 views · 07:03

✨Distribution Matching Variational AutoEncoder

📝 Summary:
DMVAE explicitly aligns VAE latent distributions with arbitrary reference distributions, generalizing beyond fixed priors. This improves modeling efficiency and image synthesis fidelity, with SSL-derived distributions showing excellent balance.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07778
• PDF: https://arxiv.org/pdf/2512.07778
• Github: https://github.com/sen-ye/dmvae

#VAE #DeepLearning #GenerativeAI #ImageSynthesis #ArtificialIntelligence

182 views · 08:03

✨Multi-view Pyramid Transformer: Look Coarser to See Broader

📝 Summary:
MVP is a scalable multi-view transformer that reconstructs large 3D scenes from many images. It uses a dual hierarchy of local-to-global inter-view and fine-to-coarse intra-view processing. This achieves efficient, state-of-the-art 3D scene reconstruction quality.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07806
• PDF: https://arxiv.org/pdf/2512.07806
• Project Page: https://gynjn.github.io/MVP/
• Github: https://github.com/Gynjn/MVP

#3DReconstruction #ComputerVision #Transformers #DeepLearning #AI

199 views · 08:03

✨UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo

#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch

267 views · 08:04

✨ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

📝 Summary:
ReCamDriving generates camera-controlled novel-trajectory videos using dense 3DGS renderings and a two-stage training approach, achieving state-of-the-art results in controllability and consistency.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03621
• PDF: https://arxiv.org/pdf/2512.03621
• Project Page: https://recamdriving.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research

268 views · 08:04