ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

📝 Summary:
SpatialThinker is a new 3D-aware MLLM that uses RL and dense spatial rewards to significantly improve spatial understanding. It integrates structured spatial grounding and multi-step reasoning, outperforming existing models and GPT-4o on spatial VQA and real-world benchmarks.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07403
• PDF: https://arxiv.org/pdf/2511.07403
• Github: https://github.com/hunarbatra/SpatialThinker

🔹 Models citing this paper:
https://huggingface.co/OX-PIXL/SpatialThinker-3B
https://huggingface.co/OX-PIXL/SpatialThinker-7B

Datasets citing this paper:
https://huggingface.co/datasets/OX-PIXL/STVQA-7K

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalLLM #3DReasoning #ReinforcementLearning #AIResearch #ComputerVision
DoPE: Denoising Rotary Position Embedding

📝 Summary:
DoPE improves Transformer length generalization by detecting and mitigating noisy frequency bands in positional embeddings. This training-free method enhances retrieval accuracy and reasoning stability across extended contexts up to 64K tokens.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09146
• PDF: https://arxiv.org/pdf/2511.09146
• Project Page: https://The-physical-picture-of-LLMs.github.io

#Transformers #PositionalEmbedding #LLMs #DeepLearning #AIResearch
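
The DoPE summary above refers to "frequency bands" in rotary position embeddings. As a rough illustration only, the sketch below applies standard RoPE to one vector and leaves a chosen set of bands unrotated. The summary does not say how DoPE detects or mitigates noisy bands, so the `noisy_bands` argument is a hypothetical input and skipping the rotation is an assumed stand-in, not the paper's method.

```python
import math


def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: one per pair of dimensions."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, noisy_bands=frozenset(), base=10000.0):
    """Rotate each (even, odd) pair of `vec` by position * frequency.

    `noisy_bands` is a hypothetical set of band indices to leave unrotated;
    how such bands would be chosen is NOT taken from the paper.
    """
    out = list(vec)
    for i, freq in enumerate(rope_frequencies(len(vec), base)):
        if i in noisy_bands:
            continue  # assumed mitigation: drop positional rotation for this band
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


query = [1.0, 2.0, 3.0, 4.0]
rotated = apply_rope(query, position=7)
partly_masked = apply_rope(query, position=7, noisy_bands={0})
```

Because nothing is retrained here, the sketch is at least consistent with the "training-free" description: only the position-dependent rotation changes at inference time.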
LiteAttention: A Temporal Sparse Attention for Diffusion Transformers

📝 Summary:
LiteAttention accelerates video generation by exploiting temporal coherence in diffusion attention. It propagates skip decisions for non-essential attention tiles across denoising steps, eliminating redundant computations. This achieves substantial speedups without quality loss.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11062
• PDF: https://arxiv.org/pdf/2511.11062
• Github: https://github.com/moonmath-ai/LiteAttention

#DiffusionModels #VideoGeneration #Transformers #SparseAttention #ComputationalEfficiency
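
The idea of propagating skip decisions across denoising steps, as described in the LiteAttention summary above, can be pictured with a toy loop. This is a minimal sketch under stated assumptions: the per-tile scores, the `threshold`, and the rule that a skipped tile stays skipped are all illustrative and are not taken from the paper or its repository.

```python
def run_denoising(tile_scores_per_step, threshold):
    """Toy model of tile skipping that persists across denoising steps.

    tile_scores_per_step[s][t] is a stand-in importance score for attention
    tile t at step s. A tile scoring below `threshold` is marked as
    non-essential and is not computed again in later steps (an assumption).
    """
    skipped = set()           # tiles marked non-essential so far
    computed_per_step = []    # which tiles were actually computed at each step
    for scores in tile_scores_per_step:
        computed = []
        for tile, score in enumerate(scores):
            if tile in skipped:
                continue      # decision inherited from an earlier step
            computed.append(tile)
            if score < threshold:
                skipped.add(tile)
        computed_per_step.append(computed)
    return computed_per_step, skipped


scores = [
    [0.90, 0.01, 0.50, 0.02],
    [0.80, 0.70, 0.03, 0.90],
    [0.70, 0.90, 0.90, 0.90],
]
computed, skipped = run_denoising(scores, threshold=0.05)
total_computed = sum(len(step) for step in computed)
```

In this toy run only 7 of the 12 tile computations are performed, which is the kind of redundancy the summary says the method eliminates.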
Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey

📝 Summary:
This survey examines methods for using large language models to generate scientific ideas, categorizing them into five families and aligning them with creativity frameworks to improve scientific sound...

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07448
• PDF: https://arxiv.org/pdf/2511.07448

#AI #DataScience #MachineLearning #HuggingFace #Research
Qwen3 Technical Report

📝 Summary:
Qwen3 is a new series of large language models integrating thinking and non-thinking modes for unified performance and efficiency. It achieves state-of-the-art results across diverse tasks and expands multilingual support to 119 languages.

🔹 Publication Date: Published on May 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.09388
• PDF: https://arxiv.org/pdf/2505.09388
• Project Page: https://qwenlm.github.io/blog/qwen3/
• Github: https://github.com/QwenLM/Qwen3

🔹 Models citing this paper:
https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
https://huggingface.co/Qwen/Qwen3-235B-A22B
https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct

Spaces citing this paper:
https://huggingface.co/spaces/modelscope/DocResearch
https://huggingface.co/spaces/enzostvs/deepsite
https://huggingface.co/spaces/multimodalart/Eigen-Banana

#LLM #AI #MultilingualAI #NLP #Qwen3
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

📝 Summary:
WEAVE introduces a suite with a large dataset and benchmark to assess multi-turn context-dependent image generation and editing in multimodal models. It enables new capabilities like visual memory in models while exposing current limitations in these complex tasks.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11434
• PDF: https://arxiv.org/pdf/2511.11434
• Project Page: https://weichow23.github.io/weave/
• Github: https://github.com/weichow23/weave

#MultimodalAI #ImageGeneration #GenerativeAI #ComputerVision #AIResearch
HI-TransPA: Hearing Impairments Translation Personal Assistant

📝 Summary:
HI-TransPA, an instruction-driven audio-visual personal assistant, uses the Omni-Model paradigm to translate and hold dialogue by fusing speech with lip dynamics, achieving state-of-the-art performance in assi...

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09915
• PDF: https://arxiv.org/pdf/2511.09915

#AI #DataScience #MachineLearning #HuggingFace #Research
Simulating the Visual World with Artificial Intelligence: A Roadmap

📝 Summary:
Video generation is evolving towards foundation models that integrate world simulation and rendering to produce physically plausible and interactive videos.

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08585
• PDF: https://arxiv.org/pdf/2511.08585
• Github: https://github.com/ziqihuangg/Awesome-From-Video-Generation-to-World-Model

#AI #DataScience #MachineLearning #HuggingFace #Research
Workload Schedulers -- Genesis, Algorithms and Differences

📝 Summary:
This paper categorizes modern workload schedulers into three classes: OS, Cluster, and Big Data. It details their evolution, algorithms, and differences. The conclusion highlights similarities in scheduling strategy design across both local and distributed systems.

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10258
• PDF: https://arxiv.org/pdf/2511.10258

#WorkloadScheduling #OperatingSystems #DistributedComputing #SchedulingAlgorithms #ComputerScience
MediaPipe: A Framework for Building Perception Pipelines

📝 Summary:
MediaPipe is a framework for building perception applications, offering tools to combine components, prototype, and measure performance across platforms. It helps developers iteratively improve AI models with reproducible results.

🔹 Publication Date: Published on Jun 14, 2019

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/1906.08172
• PDF: https://arxiv.org/pdf/1906.08172
• Github: https://github.com/google-ai-edge/mediapipe

#AI #DataScience #MachineLearning #HuggingFace #Research
Building the Web for Agents: A Declarative Framework for Agent-Web Interaction

📝 Summary:
VOIX is a web framework that uses declarative HTML tags such as <tool> and <context> so that websites can explicitly define what AI agents are able to do. This enables reliable, privacy-preserving, and secure agent interaction with human-oriented interfaces, fostering the Agentic Web.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11287
• PDF: https://arxiv.org/pdf/2511.11287

#AgenticWeb #AIAgents #WebFramework #DeclarativeAI #FutureofWeb
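
To make the declarative idea in the VOIX summary above concrete, here is a small sketch of how an agent-side script might discover tool and context declarations in a page. Only the tag names come from the summary. The `name` and `description` attributes, the sample page, and the `discover` helper are hypothetical and are not VOIX's actual schema or API.

```python
from html.parser import HTMLParser

# Hypothetical page markup: attribute names are illustrative assumptions.
PAGE = """
<html><body>
  <context name="cart">2 items, total 31.50 EUR</context>
  <tool name="add_to_cart" description="Add a product to the cart"></tool>
  <tool name="checkout" description="Start the checkout flow"></tool>
  <button>Buy now</button>
</body></html>
"""


class DeclarationCollector(HTMLParser):
    """Collects <tool> declarations and the text inside <context> elements."""

    def __init__(self):
        super().__init__()
        self.tools = []
        self.contexts = {}
        self._open_context = None  # name of the <context> being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tool":
            self.tools.append(
                {"name": attrs.get("name"), "description": attrs.get("description")}
            )
        elif tag == "context":
            self._open_context = attrs.get("name")
            self.contexts[self._open_context] = ""

    def handle_data(self, data):
        if self._open_context is not None:
            self.contexts[self._open_context] += data

    def handle_endtag(self, tag):
        if tag == "context":
            self._open_context = None


def discover(html):
    """Return (tools, contexts) declared in the given HTML string."""
    collector = DeclarationCollector()
    collector.feed(html)
    collector.close()
    return collector.tools, collector.contexts


tools, contexts = discover(PAGE)
```

The point of the sketch is that the agent reads only what the site chose to declare. The ordinary `<button>` meant for human visitors is ignored.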
Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding

📝 Summary:
A framework integrates human priors into end-to-end generative recommenders, enhancing accuracy and beyond-accuracy objectives by leveraging lightweight adapter heads and hierarchical composition stra...

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10492
• PDF: https://arxiv.org/pdf/2511.10492
• Github: https://github.com/zhykoties/Multi-Head-Recommendation-with-Human-Priors

#AI #DataScience #MachineLearning #HuggingFace #Research
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

📝 Summary:
RF-DETR is a lightweight detection transformer that uses weight-sharing NAS to optimize accuracy-latency tradeoffs across diverse datasets. It significantly outperforms the prior state of the art and is the first real-time detector to surpass 60 AP on COCO.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09554
• PDF: https://arxiv.org/pdf/2511.09554

#ObjectDetection #ComputerVision #MachineLearning #NeuralArchitectureSearch #Transformers
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

📝 Summary:
MeshCoder reconstructs complex 3D objects from point clouds into editable Blender Python scripts using a multimodal LLM. This enables superior shape-to-code reconstruction and intuitive editing via code, and it enhances 3D shape understanding.

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14879
• PDF: https://arxiv.org/pdf/2508.14879
• Project Page: https://daibingquan.github.io/MeshCoder
• Github: https://daibingquan.github.io/MeshCoder

🔹 Models citing this paper:
https://huggingface.co/InternRobotics/MeshCoder

Datasets citing this paper:
https://huggingface.co/datasets/InternRobotics/MeshCoderDataset

#MeshCoder #LLM #3DReconstruction #PointClouds #ComputerGraphics
Experience-Guided Adaptation of Inference-Time Reasoning Strategies

📝 Summary:
Experience-Guided Reasoner (EGuR) dynamically generates and optimizes complete computational strategies at inference time using accumulated experience. It adapts LLM calls, tools, and control logic, improving accuracy by up to 14 percent and reducing costs by up to 111x.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11519
• PDF: https://arxiv.org/pdf/2511.11519

#LLM #AI #Reasoning #Optimization #MachineLearning
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

📝 Summary:
Tool-augmented LLMs exhibit Tool-Induced Myopia (TIM): they treat tool outputs as substitutes for true reasoning. This improves final answer accuracy but significantly degrades reasoning quality. A proposed framework realigns these models to use tools as assistive evidence, enhancing both accuracy an...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10899
• PDF: https://arxiv.org/pdf/2511.10899

#LLM #AIResearch #Reasoning #ToolAugmentation #AIHallucinations
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

📝 Summary:
An analysis of miniF2F showed AI systems had 36% accuracy due to problem errors. Correcting these errors created miniF2F-v2, improving accuracy to 70%. High-quality benchmarks like miniF2F-v2 are crucial for evaluating formal reasoning progress.

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03108
• PDF: https://arxiv.org/pdf/2511.03108
• Github: https://github.com/roozbeh-yz/miniF2F_v2

Datasets citing this paper:
https://huggingface.co/datasets/roozbeh-yz/miniF2F_v2

#AI #FormalReasoning #Benchmarks #MachineLearning #Dataset
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

📝 Summary:
GGBench is a new benchmark for evaluating geometric generative reasoning in unified multimodal models. It addresses a critical gap by assessing integrated cognitive processes, requiring language comprehension and precise visual generation to actively construct solutions. This sets a rigorous stan...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11134
• PDF: https://arxiv.org/pdf/2511.11134

#GGBench #MultimodalAI #GeometricReasoning #GenerativeAI #AIResearch
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

📝 Summary:
A parallel multimodal diffusion framework, MMaDA-Parallel, enhances cross-modal alignment and semantic consistency in thinking-aware image synthesis by addressing error propagation issues in sequentia...

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09611
• PDF: https://arxiv.org/pdf/2511.09611
• Project Page: https://tyfeld.github.io/mmadaparellel.github.io/
• Github: https://github.com/tyfeld/MMaDA-Parallel

🔹 Models citing this paper:
https://huggingface.co/tyfeld/MMaDA-Parallel-A
https://huggingface.co/tyfeld/MMaDA-Parallel-M

#MultimodalAI #DiffusionModels #ImageSynthesis #LLM #AIResearch