ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

📝 Summary:
Gradio is an open-source Python package that creates visual interfaces for ML models, making them accessible to non-specialized users via a URL. This improves collaboration by allowing easy interaction, feedback, and trust-building in interdisciplinary settings.
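
As a minimal sketch of the workflow the paper proposes (using Gradio's present-day API rather than the 2019 release), wrapping any Python prediction function in gr.Interface and launching with share=True produces the public URL that non-specialist collaborators can open:

```python
import gradio as gr

def classify(text: str) -> str:
    # Stand-in model: swap in your own predict function.
    return "positive" if "good" in text.lower() else "negative"

demo = gr.Interface(fn=classify, inputs="text", outputs="text")
# share=True exposes a temporary public URL collaborators can open in a browser.
demo.launch(share=True)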

🔹 Publication Date: Published on Jun 6, 2019

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/1906.02569
• PDF: https://arxiv.org/pdf/1906.02569
• Github: https://github.com/gradio-app/gradio

🔹 Models citing this paper:
https://huggingface.co/CxECHO/CE

Datasets citing this paper:
https://huggingface.co/datasets/society-ethics/papers

Spaces citing this paper:
https://huggingface.co/spaces/orYx-models/Nudge_Generator
https://huggingface.co/spaces/society-ethics/about
https://huggingface.co/spaces/mindmime/gradio

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#Gradio #MachineLearning #MLOps #Python #DataScience
NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

📝 Summary:
NAF upsamples Vision Foundation Model features zero-shot by learning adaptive spatial-and-content weights. It outperforms VFM-specific upsamplers without retraining, achieving state-of-the-art performance across various tasks efficiently.
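
NAF's weights are predicted by a network; purely as intuition, here is a hand-rolled joint-bilateral-style analogue (my simplification, not the authors' code) of what "spatial-and-content" neighborhood weighting means: a high-resolution image guides how coarsely upsampled features are re-averaged over each k×k neighborhood.

```python
import math
import torch
import torch.nn.functional as F

def neighborhood_upsample(feat, guide, k=3, sigma_s=1.0, sigma_c=0.1):
    # feat:  (C, h, w) low-resolution VFM features
    # guide: (3, H, W) high-resolution image guiding the weights
    C, h, w = feat.shape
    _, H, W = guide.shape
    # Coarse bilinear upsample first, then re-weight within a k x k neighborhood.
    up = F.interpolate(feat[None], size=(H, W), mode="bilinear", align_corners=False)[0]
    pad = k // 2
    guide_p = F.pad(guide[None], (pad, pad, pad, pad), mode="reflect")[0]
    up_p = F.pad(up[None], (pad, pad, pad, pad), mode="reflect")[0]
    out = torch.zeros_like(up)
    norm = torch.zeros(1, H, W)
    for dy in range(k):
        for dx in range(k):
            g = guide_p[:, dy:dy + H, dx:dx + W]
            f = up_p[:, dy:dy + H, dx:dx + W]
            # Content weight from guidance similarity; spatial weight from offset.
            content = torch.exp(-((g - guide) ** 2).sum(0) / (2 * sigma_c ** 2))
            spatial = math.exp(-((dy - pad) ** 2 + (dx - pad) ** 2) / (2 * sigma_s ** 2))
            wgt = content * spatial
            out += f * wgt
            norm += wgt
    return out / norm.clamp_min(1e-8)

feat, guide = torch.randn(64, 32, 32), torch.rand(3, 256, 256)
print(neighborhood_upsample(feat, guide).shape)  # torch.Size([64, 256, 256])
```

In NAF itself these Gaussian weights are replaced by learned, content-adaptive ones, which is what makes the method work zero-shot across different VFMs.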

🔹 Publication Date: Published on Nov 23, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18452
• PDF: https://arxiv.org/pdf/2511.18452
• Github: https://github.com/valeoai/NAF?tab=readme-ov-file

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ZeroShotLearning #ComputerVision #FeatureUpsampling #DeepLearning #AIResearch
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

📝 Summary:
G^2VLM integrates 3D geometry learning into vision-language models to overcome their spatial intelligence deficits. It unifies 3D reconstruction and spatial reasoning, leveraging learned 3D features to achieve strong performance in both tasks.

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21688
• PDF: https://arxiv.org/pdf/2511.21688
• Project Page: https://gordonhu608.github.io/g2vlm.github.io/
• Github: https://github.com/InternRobotics/G2VLM

🔹 Models citing this paper:
https://huggingface.co/InternRobotics/G2VLM-2B-MoT

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionLanguageModels #3DReconstruction #SpatialReasoning #ComputerVision #ArtificialIntelligence
MIRA: Multimodal Iterative Reasoning Agent for Image Editing

📝 Summary:
MIRA is a multimodal iterative reasoning agent that enhances diffusion-based image editing. It tackles complex instructions by breaking them into atomic edits via a perception-reasoning-action loop with visual feedback. This improves semantic consistency and perceptual quality, outperforming other methods.
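
A minimal skeleton of such a perception-reasoning-action loop (all three helpers are toy stand-ins; the paper uses a multimodal LLM and a diffusion editor in their place):

```python
def vlm_describe(image: str) -> str:
    return f"description of {image}"          # stand-in for a VLM perception call

def plan_atomic_edits(instruction: str, observation: str) -> list[str]:
    # Stand-in reasoning step: split a compound instruction into atomic edits.
    return instruction.split(" and ") if "and" in instruction else []

def diffusion_edit(image: str, edit: str) -> str:
    return f"{image}+[{edit}]"                # stand-in for one diffusion editing pass

def iterative_edit(image: str, instruction: str, max_rounds: int = 4) -> str:
    for _ in range(max_rounds):
        observation = vlm_describe(image)                    # perception
        edits = plan_atomic_edits(instruction, observation)  # reasoning
        if not edits:                                        # feedback: nothing left to fix
            break
        for edit in edits:                                   # action: one atomic edit at a time
            image = diffusion_edit(image, edit)
        instruction = ""                                     # toy convergence condition
    return image

print(iterative_edit("photo.png", "add a hat and remove the car"))
```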

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21087
• PDF: https://arxiv.org/pdf/2511.21087

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #ImageEditing #MultimodalAI #DiffusionModels #ComputerVision
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

📝 Summary:
Multi-Crit evaluates multimodal models as judges on following diverse criteria using novel metrics. Findings reveal current models struggle with consistent adherence and flexibility to pluralistic criteria. This highlights gaps in capabilities and lays a foundation for building reliable AI evaluators.

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21662
• PDF: https://arxiv.org/pdf/2511.21662
• Project Page: https://multi-crit.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalAI #AIEvaluation #BenchmarkingAI #AIJudges #MachineLearning
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

📝 Summary:
MLLMs often repeat errors due to insufficient multimodal memory. ViLoMem is a dual-stream memory framework that builds schema-based knowledge by separately encoding visual distractions and logical errors. This method significantly improves accuracy and reduces repeated errors across multiple benchmarks.
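
A minimal sketch of a dual-stream memory, assuming the two streams can be approximated as separate keyed stores (class and field names are mine, not the paper's):

```python
from collections import defaultdict

class DualStreamMemory:
    def __init__(self):
        self.visual = defaultdict(list)   # schema -> noted visual distractions
        self.logical = defaultdict(list)  # schema -> noted logical error patterns

    def record(self, schema: str, mistake: str, kind: str) -> None:
        (self.visual if kind == "visual" else self.logical)[schema].append(mistake)

    def retrieve(self, schema: str) -> dict:
        # Before answering a new question, recall both streams for this schema
        # so the model can avoid repeating past errors.
        return {"visual": self.visual[schema], "logical": self.logical[schema]}

mem = DualStreamMemory()
mem.record("chart-reading", "confused legend colors", kind="visual")
mem.record("chart-reading", "summed instead of averaged", kind="logical")
print(mem.retrieve("chart-reading"))
```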

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21678
• PDF: https://arxiv.org/pdf/2511.21678

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MLLMs #MultimodalAI #AIMemory #DeepLearning #AIResearch
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

📝 Summary:
Canvas-to-Image unifies diverse controls such as text, poses, and layouts into a single canvas image for high-fidelity compositional image generation. Its multi-task training helps it understand and integrate these controls, outperforming existing methods in control adherence and identity preservation.
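
The core input representation is easy to picture (the encoding below is illustrative, not the paper's exact scheme): heterogeneous controls are drawn onto one RGB canvas, and that single image is what the generator conditions on.

```python
from PIL import Image, ImageDraw

canvas = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(canvas)
draw.rectangle([50, 60, 200, 300], outline="red", width=3)   # layout box for subject A
draw.text((55, 40), "a golden retriever", fill="black")      # text control tied to the box
draw.line([(300, 100), (300, 250), (260, 350)], fill="blue", width=3)  # crude pose stroke
canvas.save("control_canvas.png")  # this single image is the model's conditioning input
```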

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21691
• PDF: https://arxiv.org/pdf/2511.21691
• Project Page: https://snap-research.github.io/canvas-to-image/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ImageGeneration #GenerativeAI #MultimodalAI #ComputerVision #DeepLearning
Video Generation Models Are Good Latent Reward Models

📝 Summary:
Traditional video reward models are inefficient, operating in pixel space. PRFL uses pre-trained video generation models as latent reward models, optimizing preferences entirely in latent space. This significantly improves human alignment and reduces memory and training time.
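
A toy sketch of the latent-reward idea (the backbone and shapes below are stand-ins, not the PRFL implementation): the reward is computed directly on video latents, so the preference loss never requires decoding to pixels.

```python
import torch

class LatentReward(torch.nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # In PRFL this backbone is a pre-trained video generation model; a tiny MLP stands in here.
        self.head = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
        )

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, frames, latent_dim); average frame scores into one video reward.
        return self.head(latents).mean(dim=1).squeeze(-1)

reward = LatentReward()
preferred, rejected = torch.randn(2, 8, 16), torch.randn(2, 8, 16)
# Bradley-Terry style preference loss, computed without ever decoding to pixel space.
loss = -torch.nn.functional.logsigmoid(reward(preferred) - reward(rejected)).mean()
loss.backward()
```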

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21541
• PDF: https://arxiv.org/pdf/2511.21541
• Project Page: https://kululumi.github.io/PRFL/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoGeneration #ReinforcementLearning #LatentSpace #AIResearch #MachineLearning
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

📝 Summary:
Applying a head-specific sigmoid gate after Scaled Dot-Product Attention in large language models significantly improves performance, stability, and scaling. This simple modification mitigates attention sink and enhances long-context extrapolation by introducing non-linearity and sparse gating.
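
The mechanism is simple enough to sketch directly; the gate projection shape below is my own choice, but the structure (a per-head sigmoid gate applied to the SDPA output) follows the paper's description:

```python
import torch
import torch.nn.functional as F

def gated_attention(q, k, v, w_gate, x):
    # q, k, v: (batch, heads, seq, head_dim)
    # x:       (batch, seq, d_model) input used to compute the gate
    # w_gate:  (heads, d_model, head_dim) one gate projection per head
    out = F.scaled_dot_product_attention(q, k, v)          # standard SDPA
    # Head-specific gate: sigmoid of a per-head linear projection of the input,
    # introducing non-linearity and (near-)sparse gating after attention.
    gate = torch.sigmoid(torch.einsum("bsd,hde->bhse", x, w_gate))
    return out * gate

b, h, s, d, dm = 2, 4, 10, 16, 64
q = k = v = torch.randn(b, h, s, d)
x = torch.randn(b, s, dm)
w = torch.randn(h, dm, d)
print(gated_attention(q, k, v, w, x).shape)  # torch.Size([2, 4, 10, 16])
```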

🔹 Publication Date: Published on May 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.06708
• PDF: https://arxiv.org/pdf/2505.06708
• Github: https://github.com/qiuzh20/gated_attention

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AttentionMechanism #DeepLearning #NLP #AIResearch
Paper2Video: Automatic Video Generation from Scientific Papers

📝 Summary:
PaperTalker is a multi-agent framework for automatic academic video production, integrating slides, subtitles, speech, and talking heads. It produces more faithful, informative videos than existing methods, simplifying labor-intensive research communication.

🔹 Publication Date: Published on Oct 6, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.05096
• PDF: https://arxiv.org/pdf/2510.05096
• Project Page: https://showlab.github.io/Paper2Video/

Datasets citing this paper:
https://huggingface.co/datasets/ZaynZhu/Paper2Video

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoGeneration #AI #AcademicCommunication #MachineLearning #MultimodalAI
What does it mean to understand language?

📝 Summary:
Deep language understanding involves more than just surface meaning. It requires transferring information from the core language system to other brain regions for mental models, world knowledge, and memories. This offers a new strategy to study language comprehension.

🔹 Publication Date: Published on Nov 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19757
• PDF: https://arxiv.org/pdf/2511.19757

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LanguageUnderstanding #CognitiveScience #Neuroscience #MentalModels #NLP
FastVLM: Efficient Vision Encoding for Vision Language Models

📝 Summary:
FastVLM optimizes Vision Language Models for high-resolution images using the FastViTHD encoder. By scaling the input image, it reduces encoding latency and the number of visual tokens, improving time-to-first-token by up to 85x while maintaining performance.
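
The token savings follow from generic ViT arithmetic (the patch size and counts below are illustrative, not FastViTHD's actual configuration): token count grows quadratically with resolution, so scaling the input down shrinks both encoding work and the LLM's visual prefix.

```python
# Back-of-envelope illustration of why input scaling cuts visual tokens quadratically.
def vit_tokens(resolution: int, patch: int = 14) -> int:
    return (resolution // patch) ** 2

for res in (1024, 512, 256):
    print(f"{res}px -> {vit_tokens(res)} visual tokens")
# 1024px -> 5329, 512px -> 1296, 256px -> 324: fewer tokens means a faster
# time-to-first-token, the latency FastVLM targets.
```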

🔹 Publication Date: Published on Dec 17, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2412.13303
• PDF: https://arxiv.org/pdf/2412.13303
• Github: https://github.com/apple/ml-fastvlm

🔹 Models citing this paper:
https://huggingface.co/apple/FastVLM-0.5B
https://huggingface.co/apple/FastVLM-7B
https://huggingface.co/onnx-community/FastVLM-0.5B-ONNX

Spaces citing this paper:
https://huggingface.co/spaces/jairwaal/image
https://huggingface.co/spaces/akhaliq/FastVLM-0.5B-gradio
https://huggingface.co/spaces/akhaliq/FastVLM-7B

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#FastVLM #VLM #AI #MachineLearning #ComputerVision
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

📝 Summary:
ENACT is a benchmark evaluating embodied cognition in vision-language models through egocentric world modeling tasks. It reveals a performance gap between VLMs and humans that widens with interaction, and models exhibit anthropocentric biases.

🔹 Publication Date: Published on Nov 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20937
• PDF: https://arxiv.org/pdf/2511.20937

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#EmbodiedCognition #VisionLanguageModels #AIResearch #WorldModeling #CognitiveScience
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

📝 Summary:
GigaBrain-0 is a VLA model that uses world model-generated data to overcome limitations of real robot data, improving cross-task generalization and policy robustness. This boosts real-world performance on complex manipulation tasks.

🔹 Publication Date: Published on Oct 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19430
• PDF: https://arxiv.org/pdf/2510.19430
• Project Page: https://gigabrain0.github.io/
• Github: https://github.com/open-gigaai/giga-brain-0

🔹 Models citing this paper:
https://huggingface.co/open-gigaai/GigaBrain-0-3.5B-Base

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VLAModels #WorldModels #Robotics #AI #MachineLearning
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

📝 Summary:
DocETL is an agent-based system that optimizes complex document processing pipelines to significantly improve LLM accuracy. It uses logical rewriting and agent-guided evaluation to achieve 1.34 to 4.6 times higher quality outputs than current baselines.
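
A toy sketch of the rewrite-then-evaluate loop (operator and helper names here are hypothetical; see the repo for the real operator set): candidate logical rewrites of a pipeline are generated, then an agent-guided score picks the best.

```python
def candidate_rewrites(pipeline: list[str]) -> list[list[str]]:
    # Example logical rewrite: decompose one monolithic extract step
    # into chunked map + reduce operators.
    if "extract_all" in pipeline:
        i = pipeline.index("extract_all")
        return [pipeline,
                pipeline[:i] + ["split", "map_extract", "reduce_merge"] + pipeline[i + 1:]]
    return [pipeline]

def agent_score(pipeline: list[str]) -> float:
    # Stand-in for agent-guided evaluation; DocETL has an LLM agent judge output quality.
    return float(len(pipeline))  # toy heuristic: prefer the decomposed plan

best = max(candidate_rewrites(["load", "extract_all", "summarize"]), key=agent_score)
print(best)  # ['load', 'split', 'map_extract', 'reduce_merge', 'summarize']
```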

🔹 Publication Date: Published on Oct 16, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.12189
• PDF: https://arxiv.org/pdf/2410.12189
• Github: https://github.com/ucbepic/docetl

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AI #DocumentProcessing #AgentSystems #NaturalLanguageProcessing
Vidi: Large Multimodal Models for Video Understanding and Editing

📝 Summary:
Vidi is a family of Large Multimodal Models for video understanding and editing, excelling at temporal retrieval in long, multimodal videos. It significantly outperforms proprietary models like GPT-4o on the new VUE-TR benchmark, which supports hour-long videos and audio queries.

🔹 Publication Date: Published on Apr 22, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.15681
• PDF: https://arxiv.org/pdf/2504.15681
• Project Page: https://bytedance.github.io/vidi-website/
• Github: https://github.com/bytedance/vidi

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LMMs #VideoAI #MultimodalAI #AIResearch #DeepLearning
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

📝 Summary:
PPTAgent improves presentation generation with a two-stage approach that analyzes reference presentations to ensure structural and content consistency. It outperforms traditional methods across content, design, and coherence.

🔹 Publication Date: Published on Jan 7, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.03936
• PDF: https://arxiv.org/pdf/2501.03936
• Github: https://github.com/icip-cas/PPTAgent

Datasets citing this paper:
https://huggingface.co/datasets/Forceless/Zenodo10K

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AIPresentations #GenerativeAI #MachineLearning #NLP #TechResearch