ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
4.01K photos
229 videos
23 files
4.32K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

📝 Summary:
SafeGRPO introduces a self-rewarded, rule-governed framework for multimodal safety alignment in MLLMs. It integrates verifiable reward construction and step-guided safety thinking to improve robustness against compositional risks and enhance reasoning stability.
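The "verifiable reward construction" idea can be sketched as a tiny rule-based scorer: each rule is a check that can be verified mechanically, and the checks compose into a scalar reward. Everything below (rule phrases, tag names, weights) is an illustrative assumption, not the paper's actual rule set:

```python
def rule_reward(response: str, is_unsafe_query: bool) -> float:
    """Score a response with simple, mechanically verifiable safety rules."""
    score = 0.0
    refused = any(p in response.lower() for p in ("i can't help", "i cannot assist"))
    if is_unsafe_query:
        score += 1.0 if refused else -1.0      # must refuse unsafe requests
    else:
        score += 1.0 if not refused else -0.5  # must not over-refuse benign ones
    if "<think>" in response and "</think>" in response:
        score += 0.5                           # reward explicit safety thinking
    return score
```

A GRPO-style trainer would then use such scores to rank sampled responses within a group, with no learned reward model in the loop.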

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12982
• PDF: https://arxiv.org/pdf/2511.12982

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MLLMs #AISafety #MultimodalAI #ReinforcementLearning #AIResearch
Error-Driven Scene Editing for 3D Grounding in Large Language Models

📝 Summary:
DEER-3D improves 3D LLM grounding by iteratively editing and retraining models. It diagnoses predicate-level errors, then generates targeted 3D scene edits as counterfactuals to enhance spatial understanding and accuracy.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14086
• PDF: https://arxiv.org/pdf/2511.14086
• Github: https://github.com/zhangyuejoslin/Deer-3D

==================================

#LLMs #3DGrounding #ComputerVision #DeepLearning #AIResearch
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

📝 Summary:
ATLAS is a new, high-difficulty, multidisciplinary benchmark for LLMs, featuring 800 original problems across seven scientific fields. It addresses current benchmark limitations with complex, open-ended answers and aims to differentiate advanced scientific reasoning, serving as a ruler for AGI progress.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14366
• PDF: https://arxiv.org/pdf/2511.14366

==================================

#LLM #AGI #AIResearch #ScientificReasoning #Benchmark
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
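Tool orchestration of this kind can be pictured as a planner dispatching steps to a registry of specialized vision tools. The registry, tool names, and workflow format below are hypothetical stand-ins, not Orion's actual API:

```python
# Hypothetical tool registry; real systems would wrap detectors, OCR, etc.
TOOLS = {
    "detect": lambda img: f"boxes({img})",
    "ocr":    lambda img: f"text({img})",
}

def run_workflow(image: str, plan: list[str]) -> list[str]:
    """Execute a planned sequence of vision tools on one image."""
    results = []
    for step in plan:
        results.append(TOOLS[step](image))  # dispatch each step to its tool
    return results
```

The agent's job is then to produce a good `plan` for each query and to reason over the accumulated `results`.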

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210

==================================

#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
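A discrete style codebook can be sketched as a fixed, seeded lookup table, so the same numerical code always reproduces the same style vector. The codebook size, embedding dimension, and seeding below are illustrative assumptions, not CoTyle's actual design:

```python
import numpy as np

def style_embedding(style_code: int, codebook_size: int = 1024, dim: int = 8):
    """Map a numerical style code to a reproducible style vector."""
    rng = np.random.default_rng(0)                    # shared, fixed codebook
    codebook = rng.standard_normal((codebook_size, dim))
    return codebook[style_code % codebook_size]       # deterministic lookup
```

The retrieved vector would condition a text-to-image diffusion model, which is what makes a style reproducible from a single integer.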

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle

Spaces citing this paper:
https://huggingface.co/spaces/Kwai-Kolors/CoTyle

==================================

#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

📝 Summary:
MVI-Bench introduces a new benchmark to evaluate the robustness of Large Vision-Language Models against misleading visual inputs. It uses a hierarchical taxonomy and a novel metric to uncover significant vulnerabilities in state-of-the-art LVLMs.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14159
• PDF: https://arxiv.org/pdf/2511.14159
• Github: https://github.com/chenyil6/MVI-Bench

==================================

#LVLMs #ComputerVision #AIrobustness #MachineLearning #AI
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

📝 Summary:
Text-only self-reflection is insufficient for long-form video understanding. REVISOR is a new framework enabling MLLMs to perform multimodal introspective reflection across text and visual modalities. This significantly enhances reasoning for long videos without extra fine-tuning, achieving strong results.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13026
• PDF: https://arxiv.org/pdf/2511.13026

==================================

#MultimodalAI #VideoUnderstanding #MLLMs #AIResearch #ComputerVision
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

📝 Summary:
This paper clarifies RL for LLM Agents by extending the MDP framework. It introduces Agent-R1, a modular and flexible training framework, demonstrating its effectiveness on Multihop QA tasks.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14460
• PDF: https://arxiv.org/pdf/2511.14460
• Github: https://github.com/0russwest0/Agent-R1

==================================

#LLMAgents #ReinforcementLearning #AI #DeepLearning #NLP
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

📝 Summary:
Current video model benchmarks fail to assess Chain-of-Frames (CoF) reasoning, which is crucial for world simulators. Gen-ViRe is a new benchmark that decomposes CoF reasoning into cognitive subtasks, offering the first quantitative assessment. It reveals poor reasoning depth despite impressive visual quality.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13853
• PDF: https://arxiv.org/pdf/2511.13853

==================================

#AI #WorldSimulators #VisualReasoning #GenerativeAI #Benchmarks
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

📝 Summary:
This study analyzed 2303 agent context files, finding them complex and evolving like config code. Developers prioritize functional details but rarely specify non-functional requirements like security or performance. This suggests a gap in guardrails for agent-written code quality.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12884
• PDF: https://arxiv.org/pdf/2511.12884

==================================

#AIAgents #SoftwareEngineering #CodeQuality #LLMs #AIResearch
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

📝 Summary:
UniMoE-Audio unifies speech and music generation using a novel Dynamic-Capacity Mixture-of-Experts framework. It addresses data imbalance and task conflicts through a hybrid expert design and a three-stage training strategy, achieving state-of-the-art performance and synergistic cross-domain learning.

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13344
• PDF: https://arxiv.org/pdf/2510.13344
• Project Page: https://mukioxun.github.io/Uni-MoE-site/home.html
• Github: https://github.com/HITsz-TMG/Uni-MoE/blob/master/UniMoE-Audio

🔹 Models citing this paper:
https://huggingface.co/HIT-TMG/UniMoE-Audio-Preview

==================================

#SpeechGeneration #MusicGeneration #MixtureOfExperts #GenerativeAI #DeepLearning
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

📝 Summary:
OmniZip is a training-free framework that addresses the computational bottleneck in omnimodal LLMs by dynamically compressing audio-visual tokens. It uses audio retention scores to guide video token pruning, achieving 3.42X inference speedup and 1.4X memory reduction without performance loss.
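Audio-guided pruning of this kind can be sketched as keeping only the video tokens whose aligned audio retention score is highest. The scoring and keep ratio below are illustrative assumptions, not OmniZip's actual method:

```python
import numpy as np

def prune_video_tokens(video_tokens, audio_scores, keep_ratio=0.3):
    """Keep the video tokens with the top audio retention scores."""
    k = max(1, int(len(video_tokens) * keep_ratio))
    keep = np.argsort(audio_scores)[-k:]   # indices of the k highest scores
    keep = np.sort(keep)                   # restore temporal order
    return [video_tokens[i] for i in keep]
```

Because this is pure selection over precomputed scores, it needs no training, which matches the training-free framing of the paper.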

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14582
• PDF: https://arxiv.org/pdf/2511.14582
• Github: https://github.com/KD-TAO/OmniZip

==================================

#OmnimodalLLM #TokenCompression #LLMs #AI #ModelEfficiency
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

📝 Summary:
Think-at-Hard (TaH) improves LLM reasoning by dynamically refining only hard tokens. It uses a neural decider to identify them and LoRA for focused refinement, boosting performance with minimal overhead.
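A minimal stand-in for the neural decider is an entropy threshold on the token's predictive distribution: high-entropy tokens are flagged as hard and sent for another latent iteration. Both the entropy criterion and the threshold value are assumptions for illustration, not the paper's learned decider:

```python
import math

def is_hard_token(probs, threshold=1.0):
    """Flag a token as 'hard' when its predictive entropy (nats) exceeds a threshold."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy > threshold
```

Confident predictions (mass on one token) fall well below the threshold, while near-uniform distributions exceed it and would trigger refinement.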

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08577
• PDF: https://arxiv.org/pdf/2511.08577
• Github: https://github.com/thu-nics/TaH

==================================

#LLM #AI #MachineLearning #NaturalLanguageProcessing #Reasoning
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

📝 Summary:
Uni-MoE introduces a sparse Multimodal Mixture of Experts LLM efficiently handling diverse data types. It uses modality-specific encoders and a progressive training strategy, reducing performance bias and improving collaboration across modalities.
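Sparse MoE routing of the kind Uni-MoE builds on typically activates only a few experts per token via top-k gating. A generic sketch (not the paper's exact router):

```python
import numpy as np

def topk_gate(logits, k=2):
    """Sparse MoE routing: softmax over the top-k expert logits, zero elsewhere."""
    topk = np.argsort(logits)[-k:]                   # k largest expert logits
    gates = np.zeros_like(logits, dtype=float)
    exp = np.exp(logits[topk] - logits[topk].max())  # numerically stable softmax
    gates[topk] = exp / exp.sum()
    return gates
```

Only the experts with nonzero gates run, which is what keeps compute sparse as the expert count grows.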

🔹 Publication Date: Published on May 18, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2405.11273
• PDF: https://arxiv.org/pdf/2405.11273
• Github: https://github.com/hitsz-tmg/umoe-scaling-unified-multimodal-llms

==================================

#MultimodalAI #LLMs #MixtureOfExperts #DeepLearning #AIResearch
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

📝 Summary:
AraLingBench is a human-annotated benchmark evaluating Arabic LLM linguistic competence using expert-designed questions. It reveals models achieve surface proficiency but lack deep understanding, often relying on memorization rather than true comprehension.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14295
• PDF: https://arxiv.org/pdf/2511.14295

Datasets citing this paper:
https://huggingface.co/datasets/hammh0a/AraLingBench

==================================

#ArabicNLP #LLMEvaluation #AIResearch #LanguageModels #NLPBenchmarking
Mitigating Label Length Bias in Large Language Models

📝 Summary:
Large Language Models exhibit a label length bias with multi-token class labels. This paper introduces Normalized Contextual Calibration (NCC) to mitigate this issue by normalizing and calibrating predictions at the full-label level. NCC significantly improves performance and reliability across diverse tasks.
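Full-label normalization and calibration can be sketched as follows, assuming you have each candidate label's log-probability under the real prompt and under a content-free prompt; the exact NCC formulation in the paper may differ:

```python
def ncc_score(label_logprob, label_len, contextfree_logprob):
    """Length-normalize a multi-token label's log-prob (geometric mean over
    tokens), then subtract the same label's normalized log-prob under a
    content-free prompt to remove the model's label prior."""
    normalized = label_logprob / label_len
    prior = contextfree_logprob / label_len
    return normalized - prior

def classify(scores):
    """Pick the label with the highest calibrated score."""
    return max(scores, key=scores.get)
```

Without the length normalization, longer labels like "very negative" are systematically penalized simply for containing more tokens.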

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14385
• PDF: https://arxiv.org/pdf/2511.14385

==================================

#LLM #AI #NLP #BiasInAI #MachineLearning
Φeat: Physically-Grounded Feature Representation

📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity like reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270

==================================

#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

📝 Summary:
This paper improves Extreme Multi-label Classification (XMC) by using larger decoder-only models and introduces ViXML, a vision-enhanced framework. ViXML efficiently integrates visual information, significantly outperforming text-only models and achieving new state-of-the-art results.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13189
• PDF: https://arxiv.org/pdf/2511.13189

==================================

#LLM #XMC #MultiModalAI #MachineLearning #AIResearch
A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition

📝 Summary:
RBTransformer, a Transformer-based model, improves EEG-based emotion recognition by modeling inter-cortical neural dynamics. It uses Band Differential Entropy tokens and multi-head attention. This approach significantly outperforms existing state-of-the-art methods across multiple datasets and dimensions.
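Band Differential Entropy has a standard closed form under a Gaussian assumption, DE = ½·ln(2πe·σ²), computed per band-filtered EEG segment. A minimal sketch of that formula (band filtering and tokenization details omitted):

```python
import math

def band_differential_entropy(signal):
    """Differential entropy of a band-filtered EEG segment, assuming the
    samples are approximately Gaussian: DE = 0.5 * ln(2*pi*e*var)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)
```

One such scalar per electrode and frequency band is what gets assembled into the model's input tokens.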

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13954
• PDF: https://arxiv.org/pdf/2511.13954
• Github: https://github.com/nnilayy/RBTransformer

==================================

#EEG #EmotionRecognition #Transformers #Neuroscience #MachineLearning
Proactive Hearing Assistants that Isolate Egocentric Conversations

📝 Summary:
A proactive hearing assistant system automatically identifies and isolates the wearer's conversation partners from binaural audio. It uses a dual-model AI architecture that adapts to conversational dynamics in real time, improving speech clarity without user prompts.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11473
• PDF: https://arxiv.org/pdf/2511.11473
• Project Page: https://proactivehearing.cs.washington.edu/
• Github: https://github.com/guilinhu/proactive_hearing_assistant

🔹 Models citing this paper:
https://huggingface.co/guilinhu/proactive_hearing

Datasets citing this paper:
https://huggingface.co/datasets/guilinhu/libri_conversation

==================================

#HearingTech #AI #SpeechEnhancement #AssistiveTechnology #AudioProcessing
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

📝 Summary:
NORA-1.5, an enhanced vision-language-action model with a flow-matching-based action expert and reward-driven post-training, improves performance and reliability in both simulated and real-world settings.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14659
• PDF: https://arxiv.org/pdf/2511.14659
• Project Page: https://declare-lab.github.io/nora-1.5
• Github: https://github.com/declare-lab/nora-1.5

🔹 Models citing this paper:
https://huggingface.co/declare-lab/nora-1.5

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research