ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
HI-TransPA: Hearing Impairments Translation Personal Assistant

📝 Summary:
HI-TransPA, an instruction-driven audio-visual personal assistant, uses the Omni-Model paradigm to support translation and dialogue by fusing speech with lip dynamics, achieving state-of-the-art performance in assi...

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09915
• PDF: https://arxiv.org/pdf/2511.09915

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Simulating the Visual World with Artificial Intelligence: A Roadmap

📝 Summary:
Video generation is evolving towards foundation models that integrate world simulation and rendering to produce physically plausible and interactive videos. The landscape of video...

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08585
• PDF: https://arxiv.org/pdf/2511.08585
• Github: https://github.com/ziqihuangg/Awesome-From-Video-Generation-to-World-Model

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Workload Schedulers -- Genesis, Algorithms and Differences

📝 Summary:
This paper categorizes modern workload schedulers into three classes: OS, Cluster, and Big Data. It details their evolution, algorithms, and differences. The conclusion highlights similarities in scheduling strategy design across both local and distributed systems.

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10258
• PDF: https://arxiv.org/pdf/2511.10258

==================================

#WorkloadScheduling #OperatingSystems #DistributedComputing #SchedulingAlgorithms #ComputerScience
MediaPipe: A Framework for Building Perception Pipelines

📝 Summary:
MediaPipe is a framework for building perception applications, offering tools to combine components, prototype, and measure performance across platforms. It helps developers iteratively improve AI models with reproducible results.

🔹 Publication Date: Published on Jun 14, 2019

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/1906.08172
• PDF: https://arxiv.org/pdf/1906.08172
• Github: https://github.com/google-ai-edge/mediapipe

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Building the Web for Agents: A Declarative Framework for Agent-Web Interaction

📝 Summary:
VOIX is a web framework that uses declarative HTML elements such as <tool> and <context> to let websites explicitly define what AI agents can do. This enables reliable, privacy-preserving, and secure agent interaction with human-oriented interfaces, fostering the Agentic Web.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11287
• PDF: https://arxiv.org/pdf/2511.11287

==================================

#AgenticWeb #AIAgents #WebFramework #DeclarativeAI #FutureofWeb
Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding

📝 Summary:
A framework integrates human priors into end-to-end generative recommenders, enhancing accuracy and beyond-accuracy objectives by leveraging lightweight adapter heads and hierarchical composition stra...

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10492
• PDF: https://arxiv.org/pdf/2511.10492
• Github: https://github.com/zhykoties/Multi-Head-Recommendation-with-Human-Priors

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

📝 Summary:
RF-DETR is a lightweight detection transformer that uses weight-sharing NAS to optimize accuracy-latency tradeoffs across diverse datasets. It significantly outperforms prior state-of-the-art methods and is the first real-time detector to surpass 60 AP on COCO.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09554
• PDF: https://arxiv.org/pdf/2511.09554

==================================

#ObjectDetection #ComputerVision #MachineLearning #NeuralArchitectureSearch #Transformers
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

📝 Summary:
MeshCoder reconstructs complex 3D objects from point clouds into editable Blender Python scripts using a multimodal LLM. This enables superior shape-to-code reconstruction and intuitive editing via code, and it enhances 3D shape understanding.

🔹 Publication Date: Published on Aug 20

🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/meshcoder-llm-powered-structured-mesh-code-generation-from-point-clouds
• PDF: https://arxiv.org/pdf/2508.14879
• Project Page: https://daibingquan.github.io/MeshCoder
• Github: https://daibingquan.github.io/MeshCoder

🔹 Models citing this paper:
https://huggingface.co/InternRobotics/MeshCoder

Datasets citing this paper:
https://huggingface.co/datasets/InternRobotics/MeshCoderDataset

==================================

#MeshCoder #LLM #3DReconstruction #PointClouds #ComputerGraphics
Experience-Guided Adaptation of Inference-Time Reasoning Strategies

📝 Summary:
Experience-Guided Reasoner (EGuR) dynamically generates and optimizes complete computational strategies at inference time using accumulated experience. It adapts LLM calls, tools, and control logic, improving accuracy by up to 14 percent and reducing costs by up to 111x.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11519
• PDF: https://arxiv.org/pdf/2511.11519

==================================

#LLM #AI #Reasoning #Optimization #MachineLearning
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

📝 Summary:
Tool-augmented LLMs exhibit Tool-Induced Myopia (TIM), treating tool outputs as substitutes for true reasoning. This improves final-answer accuracy but significantly degrades reasoning quality. A proposed framework realigns these models to use tools as assistive evidence, enhancing both accuracy an...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10899
• PDF: https://arxiv.org/pdf/2511.10899

==================================

#LLM #AIResearch #Reasoning #ToolAugmentation #AIHallucinations
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

📝 Summary:
An analysis of miniF2F showed AI systems had 36% accuracy due to problem errors. Correcting these errors created miniF2F-v2, improving accuracy to 70%. High-quality benchmarks like miniF2F-v2 are crucial for evaluating formal reasoning progress.

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03108
• PDF: https://arxiv.org/pdf/2511.03108
• Github: https://github.com/roozbeh-yz/miniF2F_v2

Datasets citing this paper:
https://huggingface.co/datasets/roozbeh-yz/miniF2F_v2

==================================

#AI #FormalReasoning #Benchmarks #MachineLearning #Dataset
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

📝 Summary:
GGBench is a new benchmark for evaluating geometric generative reasoning in unified multimodal models. It addresses a critical gap by assessing integrated cognitive processes, requiring language comprehension and precise visual generation to actively construct solutions. This sets a rigorous stan...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11134
• PDF: https://arxiv.org/pdf/2511.11134

==================================

#GGBench #MultimodalAI #GeometricReasoning #GenerativeAI #AIResearch
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

📝 Summary:
A parallel multimodal diffusion framework, MMaDA-Parallel, enhances cross-modal alignment and semantic consistency in thinking-aware image synthesis by addressing error propagation issues in sequentia...

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09611
• PDF: https://arxiv.org/pdf/2511.09611
• Project Page: https://tyfeld.github.io/mmadaparellel.github.io/
• Github: https://github.com/tyfeld/MMaDA-Parallel

🔹 Models citing this paper:
https://huggingface.co/tyfeld/MMaDA-Parallel-A
https://huggingface.co/tyfeld/MMaDA-Parallel-M

==================================

#MultimodalAI #DiffusionModels #ImageSynthesis #LLM #AIResearch
UFO^3: Weaving the Digital Agent Galaxy

📝 Summary:
UFO^3 unifies diverse digital devices into a single orchestration fabric, enabling AI agents to collaborate seamlessly across platforms. It models tasks dynamically for asynchronous execution, achieving efficient, resilient, and accurate cross-device task orchestration with improved parallelism a...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11332
• PDF: https://arxiv.org/pdf/2511.11332
• Project Page: https://microsoft.github.io/UFO/
• Github: https://github.com/microsoft/UFO/

==================================

#AIAgents #TaskOrchestration #DistributedSystems #EdgeAI #MultiAgentSystems
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models

📝 Summary:
VLMs degrade under test-time domain shifts. Spectrum-Aware Test-Time Steering (STS) is a lightweight method that adapts VLM latent representations by steering them using textual embedding subspaces, without backpropagation. STS surpasses state-of-the-art methods while offering faster inference and lower memory use.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09809
• PDF: https://arxiv.org/pdf/2511.09809

==================================

#VisionLanguageModels #ZeroShotGeneralization #DomainAdaptation #DeepLearning #AI
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

📝 Summary:
Uni-MoE-2.0-Omni is an open-source omnimodal large model that improves multimodal understanding, reasoning, and generation. It uses dynamic MoE and progressive training to achieve state-of-the-art results across 85 benchmarks, outperforming leading models like Qwen2.5-Omni.

🔹 Publication Date: Published on Nov 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12609
• PDF: https://arxiv.org/pdf/2511.12609
• Project Page: https://idealistxy.github.io/Uni-MoE-v2.github.io/
• Github: https://github.com/HITsz-TMG/Uni-MoE

🔹 Models citing this paper:
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Omni
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Base
https://huggingface.co/HIT-TMG/Uni-MoE-2.0-Image

==================================

#OmnimodalAI #LLMs #MixtureOfExperts #MultimodalLearning #AIResearch
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

📝 Summary:
GroupRank introduces a novel groupwise reranking paradigm addressing limitations of pointwise and listwise methods. It processes queries with document groups to assign comparative relevance scores, combining flexibility with global context. Trained via reinforcement learning and synthesized data,...

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11653
• PDF: https://arxiv.org/pdf/2511.11653
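
The groupwise idea in the summary above can be illustrated with a minimal sketch. This is not the paper's implementation: `groupwise_rerank`, `toy_group_scorer`, and the `group_size` parameter are hypothetical names chosen for illustration, and the word-overlap scorer is only a stand-in for a learned reranker. A real groupwise model would read the query and the whole group together, whereas this toy scorer happens to score each document independently.

```python
from typing import Callable, Sequence


def toy_group_scorer(query: str, docs: Sequence[str]) -> list[float]:
    """Hypothetical stand-in for a reranker: one score per document in the group.

    Scores are the fraction of query words found in each document.
    """
    query_words = set(query.lower().split())
    if not query_words:
        return [0.0 for _ in docs]
    return [
        len(query_words & set(doc.lower().split())) / len(query_words)
        for doc in docs
    ]


def groupwise_rerank(
    query: str,
    docs: Sequence[str],
    group_size: int = 3,
    score_group: Callable[[str, Sequence[str]], list[float]] = toy_group_scorer,
) -> list[tuple[str, float]]:
    """Split candidates into groups, score each group in one call, merge the results."""
    scored: list[tuple[str, float]] = []
    for start in range(0, len(docs), group_size):
        group = docs[start:start + group_size]
        scores = score_group(query, group)  # one scorer call per group
        scored.extend(zip(group, scores))
    # Merge all groups into one ranking, highest score first (ties keep input order).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "cats sleep a lot",
    "dogs bark loudly",
    "cats and dogs play",
    "weather today",
    "stock prices rise",
]
for doc, score in groupwise_rerank("cats dogs", candidates, group_size=2):
    print(f"{score:.2f}  {doc}")
```

Merging scores from different groups directly, as done here, assumes the scorer's outputs are comparable across groups; the summary does not say how the paper handles this.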

==================================

#Reranking #ReinforcementLearning #InformationRetrieval #MachineLearning #DataScience
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

📝 Summary:
TiViBench is a new benchmark that assesses the reasoning of image-to-video models across four dimensions and 24 tasks. Commercial models show stronger reasoning potential. VideoTPO, a test-time strategy, significantly enhances performance, advancing reasoning in video generation.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13704
• PDF: https://arxiv.org/pdf/2511.13704
• Project Page: https://haroldchen19.github.io/TiViBench-Page/
• Github: https://haroldchen19.github.io/TiViBench-Page/

==================================

#VideoGeneration #AIBenchmark #ComputerVision #DeepLearning #AIResearch
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

📝 Summary:
PhysX-Anything generates simulation-ready physical 3D assets from single images, crucial for embodied AI. It uses a novel VLM-based model and an efficient 3D representation, enabling direct use in robotic policy learning.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13648
• PDF: https://arxiv.org/pdf/2511.13648
• Project Page: https://physx-anything.github.io/
• Github: https://github.com/ziangcao0312/PhysX-Anything

Datasets citing this paper:
https://huggingface.co/datasets/Caoza/PhysX-Mobility

==================================

#EmbodiedAI #3DReconstruction #Robotics #ComputerVision #AIResearch
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

📝 Summary:
Part-X-MLLM is a 3D multimodal large language model that unifies diverse 3D tasks by generating structured programs from RGB point clouds and language prompts. It outputs part-level data and edit commands, enabling state-of-the-art 3D generation and editing through one interface.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13647
• PDF: https://arxiv.org/pdf/2511.13647
• Project Page: https://chunshi.wang/Part-X-MLLM/

==================================

#3D #MLLM #GenerativeAI #ComputerVision #AIResearch