ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues

📝 Summary:
PingPong is a new human-authored benchmark for natural, multi-party code-switching dialogues, including trilingual conversations. It offers greater structural diversity than machine-generated data. Evaluations show current language models struggle with code-switched inputs, emphasizing the need f...

🔹 Publication Date: Published on Jan 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.17277
• PDF: https://arxiv.org/pdf/2601.17277

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#CodeSwitching #NLP #DialogueSystems #MultilingualAI #LLMs
Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

📝 Summary:
Visual token compression degrades LVLM robustness because token importance rankings are unstable. The instability causes critical information loss and creates vulnerabilities that appear only under compression. An attack that exploits this reveals an efficiency-security trade-off.
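
To make the ranking-instability idea concrete, here is a minimal sketch that assumes compression means keeping the k highest-scoring tokens. The scores, the perturbation, and the helper name keep_top_k are all hypothetical illustrations, not the paper's method or attack.

```python
def keep_top_k(scores, k):
    """Return the sorted indices of the k highest-scoring tokens."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical importance scores for six visual tokens (not from the paper).
clean_scores = [0.30, 0.29, 0.10, 0.05, 0.28, 0.02]
# The same scores after a small perturbation: token 1 drops by 0.02.
perturbed_scores = [0.30, 0.27, 0.10, 0.05, 0.28, 0.02]

# With compression (k=2) the near-tied tokens 1 and 4 swap places,
# so token 1 is discarded even though its score barely changed.
kept_clean = keep_top_k(clean_scores, k=2)
kept_perturbed = keep_top_k(perturbed_scores, k=2)

# Without compression (k equals the token count) nothing is discarded.
kept_uncompressed = keep_top_k(perturbed_scores, k=len(perturbed_scores))

print(kept_clean, kept_perturbed, kept_uncompressed)
```

In this toy setting the perturbation changes which tokens survive only when k is smaller than the token count, which is one way to read the claim that the vulnerability exists only under compression.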

🔹 Publication Date: Published on Jan 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.12042
• PDF: https://arxiv.org/pdf/2601.12042

==================================

#LVLM #AIsecurity #VisionAI #ModelRobustness #DeepLearning
Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control

📝 Summary:
FluidGym presents a standalone, fully differentiable reinforcement learning benchmark for active flow control that operates without external CFD solvers and supports standardized evaluation protocols....

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15015v1
• PDF: https://arxiv.org/pdf/2601.15015
• Github: https://github.com/safe-autonomous-systems/fluidgym

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

📝 Summary:
STAR improves table representation for table retrieval tasks. It uses header-aware clustering to create diverse partial tables and generate cluster-specific queries. STAR then employs weighted fusion for fine-grained alignment, outperforming previous methods on benchmarks.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15860
• PDF: https://arxiv.org/pdf/2601.15860
• Github: https://github.com/adsl135789/STAR

==================================

#TableRepresentation #InformationRetrieval #Clustering #DataScience #MachineLearning
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

📝 Summary:
DeepPlanning is a new benchmark for long-horizon agent planning, addressing the lack of global optimization and fine-grained local constraints in current LLM assessments. It features complex real-world tasks where even frontier LLMs struggle, highlighting the need for explicit reasoning and paral...

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18137
• PDF: https://arxiv.org/pdf/2601.18137

Datasets citing this paper:
https://huggingface.co/datasets/Qwen/DeepPlanning

==================================

#AIPlanning #LLMs #AgentAI #Benchmarking #DeepLearning
A Mechanistic View on Video Generation as World Models: State and Dynamics

📝 Summary:
Video generation models are categorized based on state construction and dynamics modeling approaches, with emphasis on transitioning evaluation metrics from visual quality to functional capabilities l...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.17067
• PDF: https://arxiv.org/pdf/2601.17067

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors

📝 Summary:
TensorLens presents a novel mathematical framework that represents the complete transformer architecture as a single input-dependent linear operator using high-order tensors, enabling comprehensive an...

🔹 Publication Date: Published on Jan 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.17958
• PDF: https://arxiv.org/pdf/2601.17958

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

📝 Summary:
HalluGuard presents a theoretical framework that decomposes LLM hallucination risk into data-driven and reasoning-driven components. It introduces an NTK-based score to jointly detect both types of hallucinations, achieving state-of-the-art performance across various benchmarks and LLMs.

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18753
• PDF: https://arxiv.org/pdf/2601.18753

==================================

#LLMs #AI #MachineLearning #Hallucination #NLP
MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts

📝 Summary:
Specialized AI reasoning models prioritize task completion over safety. Our MortalMATH benchmark shows these models ignore emergencies to complete math, unlike generalist models. This relentless focus on correctness may remove crucial safety instincts and cause dangerous delays.

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18790
• PDF: https://arxiv.org/pdf/2601.18790

Datasets citing this paper:
https://huggingface.co/datasets/sileod/MortalMATH

==================================

#AISafety #AIethics #MachineLearning #AIReasoning #MortalMATH
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing

📝 Summary:
Interp3D is a training-free framework for textured 3D morphing. It solves existing issues of structural misalignment and texture blurring by ensuring geometric consistency and texture alignment using generative priors and progressive alignment. The method outperforms prior approaches.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14103
• PDF: https://arxiv.org/pdf/2601.14103
• Project Page: https://interp3d.github.io/
• Github: https://github.com/xiaolul2/Interp3D

==================================

#3DMorphing #GenerativeAI #ComputerGraphics #DeepLearning #AIResearch
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

📝 Summary:
TSRBench introduces a multi-modal benchmark to evaluate generalist models on time series reasoning. It reveals that scaling laws break down for prediction, that strong reasoning doesn't guarantee accurate forecasting, and that multimodal models fail to fuse diverse inputs effectively.

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18744
• PDF: https://arxiv.org/pdf/2601.18744

Datasets citing this paper:
https://huggingface.co/datasets/umd-zhou-lab/TSRBench

==================================

#TimeSeries #MultimodalAI #GeneralistModels #MachineLearning #AIResearch