ML Research Hub
32.7K subscribers
4.01K photos
229 videos
23 files
4.32K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

📝 Summary:
ENACT is a benchmark that evaluates embodied cognition in vision-language models through egocentric world-modeling tasks. It reveals a performance gap between VLMs and humans that widens with interaction, and shows that models exhibit anthropocentric biases.

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20937
• PDF: https://arxiv.org/pdf/2511.20937

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#EmbodiedCognition #VisionLanguageModels #AIResearch #WorldModeling #CognitiveScience
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

📝 Summary:
GigaBrain-0 is a vision-language-action (VLA) model that uses world-model-generated data to overcome the limitations of real robot data, improving cross-task generalization and policy robustness. This boosts real-world performance on complex manipulation tasks.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19430
• PDF: https://arxiv.org/pdf/2510.19430
• Project Page: https://gigabrain0.github.io/
• Github: https://github.com/open-gigaai/giga-brain-0

🔹 Models citing this paper:
https://huggingface.co/open-gigaai/GigaBrain-0-3.5B-Base

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VLAModels #WorldModels #Robotics #AI #MachineLearning
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

📝 Summary:
DocETL is an agent-based system that optimizes complex document processing pipelines to significantly improve LLM accuracy. It uses logical rewriting and agent-guided evaluation to achieve 1.34 to 4.6 times higher quality outputs than current baselines.
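
🔹 Code sketch: A minimal illustration of the rewrite-and-evaluate idea described above; the `Pipeline` class, `call_llm` stub, and rewrite rules are hypothetical placeholders, not the DocETL API.
```python
# Minimal sketch of agent-guided pipeline rewriting (hypothetical, not the DocETL API).
from dataclasses import dataclass
from typing import List

@dataclass
class Pipeline:
    ops: List[str]          # e.g. ["map: extract clauses", "reduce: summarize"]

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned score here."""
    return "0.7"

def propose_rewrites(p: Pipeline) -> List[Pipeline]:
    """Logical rewrites: e.g. split a coarse map op into chunked sub-ops."""
    split = Pipeline(ops=[op + " (chunked)" for op in p.ops])
    return [p, split]

def judge(p: Pipeline, sample_docs: List[str]) -> float:
    """Agent-guided evaluation: ask an LLM judge to score outputs on a sample."""
    prompt = f"Score 0-1 the output quality of pipeline {p.ops} on {len(sample_docs)} docs."
    return float(call_llm(prompt))

def optimize(p: Pipeline, sample_docs: List[str]) -> Pipeline:
    return max(propose_rewrites(p), key=lambda c: judge(c, sample_docs))

best = optimize(Pipeline(ops=["map: extract obligations", "reduce: merge findings"]),
                ["contract A ...", "contract B ..."])
print(best.ops)
```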

🔹 Publication Date: Published on Oct 16, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.12189
• PDF: https://arxiv.org/pdf/2410.12189
• Github: https://github.com/ucbepic/docetl

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AI #DocumentProcessing #AgentSystems #NaturalLanguageProcessing
Vidi: Large Multimodal Models for Video Understanding and Editing

📝 Summary:
Vidi is a family of Large Multimodal Models for video understanding and editing, excelling at temporal retrieval in long, multimodal videos. It significantly outperforms proprietary models like GPT-4o on the new VUE-TR benchmark, which supports hour-long videos and audio queries.

🔹 Publication Date: Published on Apr 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.15681
• PDF: https://arxiv.org/pdf/2504.15681
• Project Page: https://bytedance.github.io/vidi-website/
• Github: https://github.com/bytedance/vidi

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LMMs #VideoAI #MultimodalAI #AIResearch #DeepLearning
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

📝 Summary:
PPTAgent improves presentation generation with a two-stage approach that analyzes reference presentations to ensure structural and content consistency. It outperforms traditional methods across content, design, and coherence.

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.03936
• PDF: https://arxiv.org/pdf/2501.03936
• Github: https://github.com/icip-cas/PPTAgent

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Forceless/Zenodo10K

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AIPresentations #GenerativeAI #MachineLearning #NLP #TechResearch
WorldVLA: Towards Autoregressive Action World Model

📝 Summary:
WorldVLA unifies VLA and world models, showing mutual enhancement in image understanding and action generation. It addresses autoregressive action prediction errors with an attention mask strategy that significantly improves performance.
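
🔹 Code sketch: One plausible reading of the attention-mask strategy, shown as a toy mask builder; the token-type labels and masking rule here are illustrative assumptions, not the released code.
```python
# Sketch: block attention from action tokens to earlier *predicted action* tokens,
# while keeping standard causal access to vision/text context (illustrative only).
import torch

def build_mask(token_types):
    """token_types[i] in {"vision", "text", "action"}; True = attention allowed."""
    n = len(token_types)
    allow = torch.tril(torch.ones(n, n)).bool()        # standard causal mask
    for q in range(n):
        for k in range(q):
            # assumption: action tokens must not attend to prior action tokens
            if token_types[q] == "action" and token_types[k] == "action":
                allow[q, k] = False
    return allow

mask = build_mask(["vision", "vision", "text", "action", "action", "action"])
print(mask.int())
```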

🔹 Publication Date: Published on Jun 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.21539
• PDF: https://arxiv.org/pdf/2506.21539
• Github: https://github.com/alibaba-damo-academy/WorldVLA

🔹 Models citing this paper:
https://huggingface.co/Alibaba-DAMO-Academy/WorldVLA
https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-Goal-256
https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-10-256

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AI #MachineLearning #Robotics #ComputerVision #WorldModels
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

📝 Summary:
Z-Image is an efficient 6B-parameter diffusion transformer achieving state-of-the-art image generation with significantly reduced computational cost. It enables sub-second inference and consumer hardware compatibility, challenging the scale-at-all-costs paradigm.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22699
• PDF: https://arxiv.org/pdf/2511.22699
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ImageGeneration #DiffusionModels #EfficientAI #FoundationModels #MachineLearning
DiP: Taming Diffusion Models in Pixel Space

📝 Summary:
DiP is an efficient pixel space diffusion framework addressing the quality-efficiency trade-off without VAEs. It combines a Diffusion Transformer for global structure and a Patch Detailer Head for local details, achieving high-quality images up to 10x faster.
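
🔹 Code sketch: A toy composition of a global backbone plus a per-patch refinement head, echoing the DiT-for-structure / Patch-Detailer-for-detail split above; module names and sizes are made up for illustration.
```python
# Global structure via a transformer over patch tokens, local detail via a per-patch MLP head.
import torch
import torch.nn as nn

class GlobalBackbone(nn.Module):            # stand-in for the Diffusion Transformer
    def __init__(self, dim=64):
        super().__init__()
        self.mix = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    def forward(self, patch_tokens):        # (B, N, dim)
        return self.mix(patch_tokens)

class PatchDetailerHead(nn.Module):         # refines each patch independently
    def __init__(self, dim=64, patch_pixels=16 * 16 * 3):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.GELU(), nn.Linear(256, patch_pixels))
    def forward(self, tokens):              # (B, N, dim) -> (B, N, patch_pixels)
        return self.mlp(tokens)

tokens = torch.randn(2, 196, 64)            # 14x14 patches of a 224x224 image
pixels = PatchDetailerHead()(GlobalBackbone()(tokens))
print(pixels.shape)                         # torch.Size([2, 196, 768])
```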

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18822
• PDF: https://arxiv.org/pdf/2511.18822

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DiffusionModels #GenerativeAI #ImageGeneration #DeepLearning #ComputerVision
Architecture Decoupling Is Not All You Need For Unified Multimodal Model

📝 Summary:
Unified multimodal models struggle with task conflicts. This paper introduces an Attention Interaction Alignment (AIA) loss, which learns task-specific cross-modal attention patterns. The AIA loss improves generation and understanding performance without decoupling the architecture.
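
🔹 Code sketch: A minimal attention-alignment-style loss that pulls a layer's cross-modal attention map toward a task-specific target pattern; the exact formulation in the paper may differ, and the tensors here are random placeholders.
```python
import torch

def attention_alignment_loss(attn, target):
    """attn, target: (B, heads, Q, K) attention weights whose rows sum to 1.
    KL(target || attn), averaged over batch, heads, and query positions."""
    eps = 1e-8
    return (target * (torch.log(target + eps) - torch.log(attn + eps))).sum(-1).mean()

attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)     # model's attention
target = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)   # task-specific pattern
print(attention_alignment_loss(attn, target).item())
```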

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22663
• PDF: https://arxiv.org/pdf/2511.22663
• Project Page: https://zhengdian1.github.io/AIA-project/
• Github: https://github.com/zhengdian1/AIA

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalAI #DeepLearning #AttentionMechanisms #AIResearch #ArtificialIntelligence
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

📝 Summary:
DualVLA tackles action degeneration in VLAs by boosting action performance while retaining reasoning. It uses dual-layer data pruning and dual-teacher adaptive distillation. This balances precise action and multimodal understanding, leading to high success rates.
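
🔹 Code sketch: One way a dual-teacher distillation objective could look, with one teacher supervising actions and another preserving reasoning, traded off by an adaptive weight; this is an illustration of the idea, not the paper's loss.
```python
import torch
import torch.nn.functional as F

def dual_teacher_loss(student_action, teacher_action, student_logits, teacher_logits, alpha):
    """alpha in [0, 1] trades off action fidelity vs. reasoning retention (assumed scheme)."""
    action_term = F.mse_loss(student_action, teacher_action)          # continuous action targets
    reason_term = F.kl_div(F.log_softmax(student_logits, -1),
                           F.softmax(teacher_logits, -1), reduction="batchmean")
    return alpha * action_term + (1 - alpha) * reason_term

loss = dual_teacher_loss(torch.randn(4, 7), torch.randn(4, 7),
                         torch.randn(4, 32000), torch.randn(4, 32000), alpha=0.7)
print(loss.item())
```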

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22134
• PDF: https://arxiv.org/pdf/2511.22134
• Project Page: https://costaliya.github.io/DualVLA/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#EmbodiedAI #VLAs #AIagents #DeepLearning #AIResearch
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

📝 Summary:
AnyTalker generates scalable multi-person talking videos using an identity-aware Diffusion Transformer. It trains mostly on single-person videos, refining interactivity with minimal multi-person data, achieving high lip sync and naturalness.

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23475
• PDF: https://arxiv.org/pdf/2511.23475
• Project Page: https://hkust-c4g.github.io/AnyTalker-homepage/
• Github: https://github.com/HKUST-C4G/AnyTalker

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoGeneration #GenerativeAI #DiffusionModels #ComputerVision #DeepLearning
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

📝 Summary:
This paper introduces Hierarchical Sparse Attention (HSA) to enable Transformers to handle ultra-long contexts efficiently. The HSA-UltraLong model achieves over 90 percent accuracy on 16M-token retrieval tasks while matching full attention on shorter contexts, laying a foundation for future long-context modeling.
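
🔹 Code sketch: A toy two-level sparse attention step, selecting top-k chunks by their mean-pooled summary keys and attending only inside them; a simplification of the hierarchical idea, with all sizes chosen for illustration.
```python
import torch
import torch.nn.functional as F

def hierarchical_sparse_attention(q, k, v, chunk=64, topk=4):
    """q: (1, d); k, v: (T, d). Attend only within the top-k scoring chunks."""
    T, d = k.shape
    k_chunks = k[: T - T % chunk].view(-1, chunk, d)          # (C, chunk, d)
    v_chunks = v[: T - T % chunk].view(-1, chunk, d)
    summaries = k_chunks.mean(dim=1)                          # (C, d) chunk-level keys
    chunk_scores = summaries @ q.squeeze(0)                   # (C,)
    keep = chunk_scores.topk(min(topk, len(chunk_scores))).indices
    k_sel = k_chunks[keep].reshape(-1, d)                     # tokens in selected chunks only
    v_sel = v_chunks[keep].reshape(-1, d)
    w = F.softmax((q @ k_sel.T) / d**0.5, dim=-1)             # (1, topk*chunk)
    return w @ v_sel                                          # (1, d)

out = hierarchical_sparse_attention(torch.randn(1, 32), torch.randn(4096, 32), torch.randn(4096, 32))
print(out.shape)
```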

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23319
• PDF: https://arxiv.org/pdf/2511.23319

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #LongContext #SparseAttention #Transformers #AIResearch
Captain Safari: A World Engine

📝 Summary:
Captain Safari is a pose-conditioned world engine that generates high-quality, 3D-consistent long videos with precise camera paths. It uses a dynamic memory and retriever of pose-aligned world tokens to outperform existing methods in quality, consistency, and trajectory following.
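
🔹 Code sketch: A minimal pose-keyed memory that stores world tokens under camera poses and retrieves the nearest-pose entries for a new frame; the `PoseMemory` class and its distance rule are illustrative assumptions, not the paper's implementation.
```python
import torch

class PoseMemory:
    def __init__(self):
        self.poses, self.tokens = [], []            # pose: (7,) = xyz position + quaternion

    def write(self, pose, world_tokens):
        self.poses.append(pose)
        self.tokens.append(world_tokens)

    def retrieve(self, query_pose, k=2):
        dists = torch.stack([(p[:3] - query_pose[:3]).norm() for p in self.poses])
        idx = dists.topk(k, largest=False).indices  # nearest camera positions
        return torch.cat([self.tokens[i] for i in idx], dim=0)

mem = PoseMemory()
for t in range(5):
    mem.write(torch.tensor([float(t), 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]), torch.randn(16, 64))
print(mem.retrieve(torch.tensor([2.1, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])).shape)   # (32, 64)
```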

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22815
• PDF: https://arxiv.org/pdf/2511.22815
• Project Page: https://johnson111788.github.io/open-safari/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#GenerativeAI #3DVideo #ComputerVision #WorldEngine #AIResearch
Test-time scaling of diffusions with flow maps

📝 Summary:
The Flow Map Trajectory Tilting (FMTT) algorithm enhances diffusion models at test time by using flow maps to align sampling with user-specified rewards. The approach addresses the ill-posed reward-gradient problem and achieves superior reward ascent, improving sampling and enabling novel image editing.
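
🔹 Code sketch: A schematic of reward tilting through a flow map: evaluate the reward at the flow map's predicted clean sample and nudge the current state along that gradient; `flow_map` and `reward` are toy stand-ins, not the paper's models.
```python
import torch

def flow_map(x_t, t):
    """Toy flow map: would normally jump from x_t at time t to a predicted clean sample."""
    return x_t * (1.0 + t)

def reward(x):
    """Toy reward: prefer samples with small norm."""
    return -x.pow(2).sum()

def tilt_step(x_t, t, step_size=0.1):
    x_t = x_t.clone().requires_grad_(True)
    r = reward(flow_map(x_t, t))                 # reward measured at the predicted endpoint
    (grad,) = torch.autograd.grad(r, x_t)
    return (x_t + step_size * grad).detach()     # ascend the reward through the flow map

x = torch.randn(8)
print(reward(flow_map(x, 0.5)).item(), reward(flow_map(tilt_step(x, 0.5), 0.5)).item())
```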

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22688
• PDF: https://arxiv.org/pdf/2511.22688
• Project Page: https://flow-map-trajectory-tilting.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DiffusionModels #GenerativeAI #ImageEditing #MachineLearning #FlowMaps
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models

📝 Summary:
REASONEDIT integrates MLLM reasoning (thinking and reflection) into image editing models. This enables a thinking-editing-reflection loop that improves instruction understanding and editing accuracy by interpreting abstract instructions and correcting results, yielding significant performance gains.
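
🔹 Code sketch: The thinking-editing-reflection loop in schematic form; `mllm_think`, `edit_image`, and `mllm_reflect` are hypothetical stand-ins for the components described above.
```python
def mllm_think(image, instruction):
    return f"Concrete plan for: {instruction}"            # interpret the abstract instruction

def edit_image(image, plan):
    return image + " [edited per: " + plan + "]"          # placeholder "edit"

def mllm_reflect(image, edited, instruction):
    return True                                           # does the result satisfy the request?

def reason_edit(image, instruction, max_rounds=3):
    for _ in range(max_rounds):
        plan = mllm_think(image, instruction)             # thinking
        edited = edit_image(image, plan)                  # editing
        if mllm_reflect(image, edited, instruction):      # reflection
            return edited
        instruction = instruction + " (fix previous attempt)"
    return edited

print(reason_edit("photo.png", "make it feel like a rainy autumn evening"))
```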

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22625
• PDF: https://arxiv.org/pdf/2511.22625

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ImageEditing #AIReasoning #MLLM #ComputerVision #AI
The Collapse of Patches

📝 Summary:
Patch collapse is a novel image-modeling perspective in which observing certain patches reduces uncertainty in others. An autoencoder learns these patch dependencies to determine an optimal realization order, improving masked image modeling and promoting vision efficiency while achieving high accuracy.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22281
• PDF: https://arxiv.org/pdf/2511.22281
• Github: https://github.com/wguo-ai/CoP

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ImageModeling #ComputerVision #Autoencoders #DeepLearning #MaskedImageModeling
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information

📝 Summary:
Focused Chain-of-Thought (F-CoT) is an input-centric method that improves LLM reasoning efficiency. It structures the query information into a concise context that guides the model to focus its reasoning, reducing token usage by 2-3x while maintaining accuracy on arithmetic problems.
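
🔹 Code sketch: A minimal example of input-centric structuring, condensing a raw question into a compact, labeled context before prompting; the `structure_query` helper and its template are invented for illustration, not the paper's exact format.
```python
import re

def structure_query(question):
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    entities = re.findall(r"[A-Z][a-z]+", question)
    return (
        "Facts:\n"
        f"- entities: {', '.join(entities) or 'n/a'}\n"
        f"- quantities: {', '.join(numbers) or 'n/a'}\n"
        f"Question: {question}\n"
        "Reason only over the facts above, briefly."
    )

print(structure_query("Maya bought 3 boxes of 12 pencils and gave Liam 7. How many remain?"))
```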

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22176
• PDF: https://arxiv.org/pdf/2511.22176

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #ChainOfThought #AI #NLP #Efficiency
SO-Bench: A Structural Output Evaluation of Multimodal LLMs

📝 Summary:
SO-Bench is a new benchmark evaluating MLLMs' ability to generate schema-compliant structured outputs from visual inputs. It reveals significant gaps in current models' performance, highlighting the need for better multimodal structured reasoning.
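
🔹 Code sketch: The kind of schema-compliance check such a benchmark automates, written as a plain-Python validator; the schema fields are invented for illustration and are not from SO-Bench.
```python
import json

SCHEMA = {"title": str, "price": float, "in_stock": bool}   # required keys and types (example)

def is_schema_compliant(model_output):
    try:
        obj = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return all(k in obj and isinstance(obj[k], t) for k, t in SCHEMA.items())

print(is_schema_compliant('{"title": "USB hub", "price": 19.9, "in_stock": true}'))  # True
print(is_schema_compliant('{"title": "USB hub", "price": "19.9"}'))                  # False
```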

🔹 Publication Date: Published on Nov 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21750
• PDF: https://arxiv.org/pdf/2511.21750

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MultimodalLLMs #StructuredOutput #LLMEvaluation #AIResearch #ComputerVision
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

📝 Summary:
This study challenges the prevailing understanding of Distribution Matching Distillation (DMD) for text-to-image generation. It reveals that CFG augmentation is the primary driver of few-step distillation, while distribution matching acts as a regularizer, an insight that enables improved distillation methods.
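
🔹 Code sketch: The classifier-free-guidance combination that serves as the augmented teacher signal in this reading, matched by a few-step student; the tensors and guidance scale are toy placeholders, not the paper's training code.
```python
import torch

def cfg_teacher_target(eps_cond, eps_uncond, w=5.0):
    """Guided prediction: extrapolate conditional vs. unconditional teacher outputs."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_cond, eps_uncond = torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8)
student_pred = torch.randn(2, 4, 8, 8)
distill_loss = (student_pred - cfg_teacher_target(eps_cond, eps_uncond)).pow(2).mean()
print(distill_loss.item())
```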

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22677
• PDF: https://arxiv.org/pdf/2511.22677
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image/tree/main

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#TextToImage #GenerativeAI #DiffusionModels #ModelDistillation #AIResearch
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

📝 Summary:
FedRE is a federated learning framework for model-heterogeneous environments. Clients create and upload entangled representations and entangled-label encodings to train a global classifier. This method enhances performance, protects privacy, and reduces communication overhead.
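
🔹 Code sketch: One plausible reading of "entangled representations": each client mixes several local feature vectors and their one-hot labels with random weights before upload, so no single sample is exposed; the mixing scheme is an illustrative assumption, not the paper's exact construction.
```python
import torch

def entangle(features, labels, num_classes, group=4):
    """features: (N, d); labels: (N,). Returns mixed features and soft label encodings."""
    N, d = features.shape
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    idx = torch.randperm(N)[: group * (N // group)].view(-1, group)                 # random groups
    w = torch.distributions.Dirichlet(torch.ones(group)).sample((idx.shape[0],))    # (G, group)
    mixed_feats = torch.einsum("gk,gkd->gd", w, features[idx])
    mixed_labels = torch.einsum("gk,gkc->gc", w, one_hot[idx])
    return mixed_feats, mixed_labels

f, y = torch.randn(32, 16), torch.randint(0, 10, (32,))
mf, ml = entangle(f, y, num_classes=10)
print(mf.shape, ml.shape)    # torch.Size([8, 16]) torch.Size([8, 10])
```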

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22265
• PDF: https://arxiv.org/pdf/2511.22265

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#FederatedLearning #MachineLearning #AI #PrivacyPreservingAI #RepresentationLearning
Vision Bridge Transformer at Scale

📝 Summary:
Vision Bridge Transformer (ViBT) is a large-scale model for conditional generation. It efficiently translates data by directly modeling input-to-output trajectories, unlike diffusion models. ViBT scales to billions of parameters, achieving robust performance in image and video editing tasks.
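
🔹 Code sketch: A toy objective illustrating "modeling input-to-output trajectories": the network predicts the displacement along a straight interpolation between a paired input and output image, rather than denoising from pure noise; the network and sizes are placeholders, not the released code.
```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3 * 32 * 32 + 1, 256), nn.GELU(), nn.Linear(256, 3 * 32 * 32))

def bridge_loss(x_in, x_out):
    """x_in, x_out: (B, 3*32*32) flattened paired images (e.g. source and edited frame)."""
    b = x_in.shape[0]
    t = torch.rand(b, 1)
    x_t = (1 - t) * x_in + t * x_out                  # point on the input-to-output trajectory
    target_velocity = x_out - x_in                    # constant velocity of the straight bridge
    pred = net(torch.cat([x_t, t], dim=1))
    return (pred - target_velocity).pow(2).mean()

print(bridge_loss(torch.randn(4, 3072), torch.randn(4, 3072)).item())
```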

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23199
• PDF: https://arxiv.org/pdf/2511.23199
• Project Page: https://yuanshi9815.github.io/ViBT_homepage/
• Github: https://github.com/Yuanshi9815/ViBT

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Yuanshi/ViBT

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionTransformer #GenerativeAI #ComputerVision #DeepLearning #AI