ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

📝 Summary:
Memory-T1 is a reinforcement learning (RL) framework that improves temporal reasoning in long, multi-session dialogues by selecting the relevant sessions. Its rewards for accuracy, evidence, and temporal consistency yield state-of-the-art performance on Time-Dialog and robustness to very long histories.
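
A minimal sketch of how a multi-part reward like this could be folded into one scalar for RL training. The three term definitions (exact-match accuracy, F1 overlap with gold evidence sessions, fraction of selected sessions inside the query's time window) and the weights are hypothetical assumptions, not the reward defined in the Memory-T1 paper.

```python
# Hypothetical composite reward: term definitions and weights are illustrative
# assumptions, NOT the reward used in the Memory-T1 paper.


def accuracy_reward(pred_answer: str, gold_answer: str) -> float:
    """1.0 for a case-insensitive exact match, else 0.0."""
    return float(pred_answer.strip().lower() == gold_answer.strip().lower())


def evidence_reward(selected: set[int], gold: set[int]) -> float:
    """F1 overlap between selected session IDs and gold evidence session IDs."""
    hits = len(selected & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(selected)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def temporal_reward(session_times: dict[int, int], selected: set[int],
                    window: tuple[int, int]) -> float:
    """Fraction of selected sessions whose timestamp lies in the query's time window."""
    if not selected:
        return 0.0
    lo, hi = window
    inside = sum(1 for s in selected if lo <= session_times[s] <= hi)
    return inside / len(selected)


def composite_reward(pred_answer: str, gold_answer: str, selected: set[int],
                     gold: set[int], session_times: dict[int, int],
                     window: tuple[int, int],
                     weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the accuracy, evidence, and temporal-consistency terms."""
    w_acc, w_ev, w_tmp = weights
    return (w_acc * accuracy_reward(pred_answer, gold_answer)
            + w_ev * evidence_reward(selected, gold)
            + w_tmp * temporal_reward(session_times, selected, window))


times = {1: 10, 2: 20, 3: 30}  # hypothetical session timestamps
print(composite_reward("Paris", "paris", {2, 3}, {2}, times, (15, 25)))  # ~0.8
```

A weighted sum keeps each signal interpretable and lets partial credit flow even when the final answer is wrong; the weights would need tuning in practice.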

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI
Learning to Refocus with Video Diffusion Models

📝 Summary:
A novel method enables realistic post-capture refocusing from a single defocused image. It uses video diffusion models to generate a focal stack that supports interactive focus adjustment. The approach outperforms existing methods for focus editing in photography.
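
The summary does not say how a focus position is picked from the generated stack. One simple, hypothetical way to drive interactive refocusing is to return the frame that is sharpest around the clicked pixel, scored here by the local variance of the Laplacian. The function names and the sharpness measure are illustrative assumptions, and images are plain nested lists of grayscale values.

```python
# Hypothetical click-to-focus selection over a focal stack (not the paper's method).
Image = list[list[float]]


def local_sharpness(img: Image, y: int, x: int, r: int = 2) -> float:
    """Variance of the 4-neighbour Laplacian in a (2r+1) x (2r+1) window around (y, x)."""
    h, w = len(img), len(img[0])
    lap = []
    # Skip the 1-pixel border so every Laplacian has all four neighbours.
    for i in range(max(1, y - r), min(h - 1, y + r + 1)):
        for j in range(max(1, x - r), min(w - 1, x + r + 1)):
            lap.append(img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]
                       - 4 * img[i][j])
    if not lap:
        return 0.0
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


def refocus(focal_stack: list[Image], y: int, x: int) -> int:
    """Index of the focal-stack frame that is sharpest around the clicked pixel (y, x)."""
    return max(range(len(focal_stack)),
               key=lambda k: local_sharpness(focal_stack[k], y, x))


# Toy stack: a high-frequency checkerboard (in focus) between two flat (blurred) frames.
sharp = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
flat = [[0.5] * 8 for _ in range(8)]
print(refocus([flat, sharp, flat], 4, 4))  # 1
```

A local window is used instead of a global score so that clicking different regions of the scene can select different focal planes.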

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19823
• PDF: https://arxiv.org/pdf/2512.19823
• Project Page: https://learn2refocus.github.io/
• Github: https://github.com/tedlasai/learn2refocus

🔹 Models citing this paper:
https://huggingface.co/tedlasai/learn2refocus

==================================

#VideoDiffusionModels #ComputationalPhotography #ImageRefocusing #DeepLearning #ComputerVision
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

📝 Summary:
T2AV-Compass introduces a unified benchmark for evaluating text-to-audio-video (T2AV) generation. It features 500 diverse prompts and a dual-level evaluation framework. Evaluations reveal that current T2AV models struggle significantly with realism and cross-modal consistency.

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21094
• PDF: https://arxiv.org/pdf/2512.21094
• Project Page: https://nju-link.github.io/T2AV-Compass/
• Github: https://github.com/NJU-LINK/T2AV-Compass/

==================================

#TextToAudioVideo #MultimodalAI #AIEvaluation #GenerativeAI #AIResearch
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

📝 Summary:
DSR Suite strengthens the weak dynamic spatial reasoning of vision language models (VLMs). It creates 4D training data from videos with an automated pipeline and integrates geometric priors via a Geometry Selection Module. This significantly enhances VLM dynamic spatial reasoning while maintaining gen...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20557
• PDF: https://arxiv.org/pdf/2512.20557

==================================

#VisionLanguageModels #SpatialReasoning #4D #ComputerVision #AIResearch
NVIDIA Nemotron 3: Efficient and Open Intelligence

📝 Summary:
NVIDIA introduces Nemotron 3, a family of models with strong agentic, reasoning, and conversational capabilities. They feature a hybrid Mamba-Transformer MoE architecture for high throughput and long context, plus advanced post-training for tool use. The models will be openly released.
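
To illustrate why an MoE architecture helps throughput, here is a minimal sketch of generic top-k expert routing: a router scores every expert, only the k best are evaluated, and their outputs are mixed with softmax weights over the selected scores. This is a textbook sparse-MoE pattern, not Nemotron 3's actual router; the scalar "token", the toy experts, and k=2 are assumptions for brevity.

```python
# Generic sparse top-k MoE routing sketch (NOT Nemotron 3's implementation).
import math
from typing import Callable


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x: float, experts: list[Callable[[float], float]],
                router_logits: list[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3, lambda x: 0.0]
logits = [0.1, 2.0, 2.0, -1.0]  # hypothetical router scores for one token
print(moe_forward(4.0, experts, logits, k=2))  # 0.5 * 8 + 0.5 * 1 = 4.5
```

Because unselected experts are never called, per-token compute scales with k rather than with the total number of experts, which is what lets a model's active parameter count sit far below its total.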

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20856
• PDF: https://arxiv.org/pdf/2512.20856

==================================

#AI #LLM #DeepLearning #NVIDIA #OpenSource
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

📝 Summary:
We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique t...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20848
• PDF: https://arxiv.org/pdf/2512.20848

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Streaming Video Instruction Tuning

📝 Summary:
We present Streamo, a real-time streaming video LLM that serves as a general-purpose interactive assistant. Unlike existing online video models that focus narrowly on question answering or captioning,...

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21334
• PDF: https://arxiv.org/pdf/2512.21334

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

📝 Summary:
The rapid proliferation of Large Language Models (LLMs) and diverse specialized benchmarks necessitates a shift from fragmented, task-specific metrics to a holistic, competitive ranking system that ef...
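
For readers unfamiliar with the Swiss system: in each round, competitors with similar running scores are paired, rematches are avoided, and winners earn a point. The sketch below applies that generic scheme to models, with a hypothetical `beats(a, b)` callback standing in for a head-to-head comparison on some benchmark. It illustrates Swiss pairing only, not the aggregation procedure proposed in the paper.

```python
# Generic Swiss-system tournament sketch (NOT the paper's aggregation method).
from typing import Callable


def swiss_pairings(scores: dict[str, float],
                   played: set[frozenset[str]]) -> list[tuple[str, str]]:
    """Greedily pair competitors with similar scores, avoiding rematches when possible."""
    # Rank by score (descending), breaking ties by name for determinism.
    unpaired = sorted(scores, key=lambda p: (-scores[p], p))
    pairs = []
    while len(unpaired) >= 2:
        a = unpaired.pop(0)
        # Nearest-ranked opponent not yet faced; fall back to the next in line.
        idx = next((i for i, b in enumerate(unpaired)
                    if frozenset((a, b)) not in played), 0)
        pairs.append((a, unpaired.pop(idx)))
    return pairs


def play_round(scores: dict[str, float], played: set[frozenset[str]],
               beats: Callable[[str, str], bool]) -> list[tuple[str, str]]:
    """Play one Swiss round in place: 1 point per win, 1 point for a bye."""
    pairs = swiss_pairings(scores, played)
    paired = {p for pair in pairs for p in pair}
    for a, b in pairs:
        scores[a if beats(a, b) else b] += 1.0
        played.add(frozenset((a, b)))
    for p in scores:
        if p not in paired:
            scores[p] += 1.0  # odd count: the leftover competitor gets a bye
    return pairs


strength = {"A": 4, "B": 3, "C": 2, "D": 1}  # hypothetical hidden model quality
scores = {m: 0.0 for m in strength}
played: set[frozenset[str]] = set()
for _ in range(3):
    play_round(scores, played, lambda a, b: strength[a] > strength[b])
print(scores)  # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'D': 0.0}
```

Pairing by running score means later rounds concentrate comparisons among closely matched competitors, so a full ranking emerges from far fewer matches than a round-robin.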

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21010
• PDF: https://arxiv.org/pdf/2512.21010

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

📝 Summary:
TurboDiffusion accelerates video generation by 100-200x while maintaining quality. It achieves this speedup through attention acceleration, step distillation, and W8A8 (8-bit weight, 8-bit activation) quantization. Experiments confirm the speedup on a single GPU.
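
W8A8 means both weights and activations are represented as 8-bit integers. A minimal sketch of the general idea, assuming symmetric quantization with a per-tensor activation scale and per-row (per-output-channel) weight scales; TurboDiffusion's actual quantization scheme and kernels may differ.

```python
# Generic symmetric W8A8 sketch (assumed scheme, not TurboDiffusion's kernels).


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric INT8 quantization: q = round(v / scale), with scale = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def w8a8_matvec(weight_rows: list[list[float]], activation: list[float]) -> list[float]:
    """Matrix-vector product using INT8 weights and activations, then dequantization."""
    q_act, s_act = quantize_int8(activation)      # one scale for the activation tensor
    out = []
    for row in weight_rows:
        q_w, s_w = quantize_int8(row)             # one scale per output channel (row)
        # Integer dot product; hardware would accumulate this in INT32.
        acc = sum(a * b for a, b in zip(q_w, q_act))
        out.append(acc * s_w * s_act)             # rescale back to floating point
    return out


W = [[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]]
x = [1.0, 2.0, -0.5]
print(w8a8_matvec(W, x))  # close to the exact result [-1.625, 2.5]
```

The expensive inner loop runs entirely on small integers, and the floating-point scales are applied once per output, which is where the memory and compute savings come from.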

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16093
• PDF: https://jt-zhang.github.io/files/TurboDiffusion_Technical_Report.pdf
• Project Page: https://github.com/thu-ml/TurboDiffusion
• Github: https://github.com/thu-ml/TurboDiffusion

🔹 Models citing this paper:
https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P
https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P
https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-14B-720P

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

📝 Summary:
High-resolution video generation, while crucial for digital media and film, is computationally bottlenecked by the quadratic complexity of diffusion models, making practical inference infeasible. To a...

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21338
• PDF: https://arxiv.org/pdf/2512.21338
• Project Page: http://haonanqiu.com/projects/HiStream.html
• Github: https://github.com/arthur-qiu/HiStream

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

📝 Summary:
VLMs exhibit a significant popularity bias: they perform better on famous items through memorization rather than general understanding. We introduce YearGuessr, a large multi-modal dataset and benchmark, and confirm that VLMs struggle with unrecognized subjects.

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21337
• PDF: https://arxiv.org/pdf/2512.21337
• Project Page: https://sytwu.github.io/BeyondMemo/
• Github: https://sytwu.github.io/BeyondMemo/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations

📝 Summary:
Recent advances in pretraining general foundation models have significantly improved performance across diverse downstream tasks. While autoregressive (AR) generative models like GPT have revolutioniz...

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21004
• PDF: https://arxiv.org/pdf/2512.21004
• Github: https://github.com/Singularity0104/NExT-Vid

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

📝 Summary:
Tokenizers provide the fundamental basis through which text is represented and processed by language models (LMs). Despite the importance of tokenization, its role in LM performance and behavior is po...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20757
• PDF: https://arxiv.org/pdf/2512.20757
• Github: https://github.com/r-three/Tokenizers

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

📝 Summary:
DreaMontage is a framework for generating seamless, expressive, long-duration one-shot videos from diverse inputs. It integrates an intermediate-conditioning DiT, a tailored DPO for smoothness, and a segment-wise auto-regressive inference strategy for long sequences.
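
A minimal sketch of segment-wise auto-regressive inference in general: a long sequence is produced one segment at a time, each segment conditioned on the last few frames already generated. The `generate_segment` callback, the context size, and the integer "frames" are hypothetical stand-ins for a video model; this is not DreaMontage's actual pipeline.

```python
# Generic segment-wise autoregressive generation sketch (NOT DreaMontage's pipeline).
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def generate_long_video(generate_segment: Callable[[Sequence[Frame]], list[Frame]],
                        initial_frames: list[Frame],
                        num_segments: int,
                        context: int) -> list[Frame]:
    """Generate num_segments segments, each conditioned on the last `context` frames."""
    if context < 1:
        raise ValueError("context must be at least 1")
    frames = list(initial_frames)
    for _ in range(num_segments):
        cond = frames[-context:]                # sliding conditioning window
        frames.extend(generate_segment(cond))   # the model returns only new frames
    return frames[len(initial_frames):]         # drop the user-provided guide frames


# Toy "model": continues counting from the last conditioning frame.
toy = lambda cond: [cond[-1] + i for i in range(1, 5)]
print(generate_long_video(toy, [0], num_segments=3, context=2))  # [1, 2, ..., 12]
```

Conditioning on a short window keeps per-segment cost constant regardless of total length, while the shared frames at each boundary are what keep consecutive segments continuous.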

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21252
• PDF: https://arxiv.org/pdf/2512.21252
• Project Page: https://dreamontage.github.io/DreaMontage/
• Github: https://dreamontage.github.io/DreaMontage/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Multi-hop Reasoning via Early Knowledge Alignment

📝 Summary:
Early Knowledge Alignment (EKA) improves iterative retrieval-augmented generation (RAG) by aligning LLMs with relevant knowledge before planning. This enhances retrieval, reduces errors, and boosts both performance and efficiency.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20144
• PDF: https://arxiv.org/pdf/2512.20144

==================================

#MultiHopReasoning #LLM #RAG #KnowledgeAlignment #AI
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

📝 Summary:
SWE-EVO is a new benchmark that evaluates AI coding agents on long-horizon, multi-step software evolution tasks spanning many files. It reveals a significant gap in current models' abilities: even top models resolve only 21 percent of tasks. This highlights their struggle with sust...

🔹 Publication Date: Published on Dec 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO

==================================

#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch