ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

📝 Summary:
This study challenges the prevailing understanding of Distribution Matching Distillation (DMD) for text-to-image generation. It reveals that CFG Augmentation is the primary driver of few-step distillation, while distribution matching acts as a regularizer. This new insight enables improved distillation method...
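
🔹 Illustrative sketch: a minimal PyTorch reading of the decoupled view above, in which the student is regressed toward a CFG-augmented teacher prediction (the primary driver) while a distribution-matching term acts only as a regularizer. Function names, the guidance scale, and the weighting are assumptions, not the paper's exact formulation.

import torch.nn.functional as F

def cfg_teacher_target(teacher, x_t, t, text_emb, null_emb, guidance_scale=4.5):
    # Classifier-free guidance: extrapolate conditional vs. unconditional outputs.
    eps_cond = teacher(x_t, t, text_emb)
    eps_uncond = teacher(x_t, t, null_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def decoupled_dmd_loss(student_pred, cfg_target, student_score, teacher_score, lam=0.25):
    # "Spear": regression toward the CFG-augmented teacher prediction.
    cfg_aug = F.mse_loss(student_pred, cfg_target.detach())
    # "Shield": a distribution-matching surrogate acting as a regularizer.
    dist_match = F.mse_loss(student_score, teacher_score.detach())
    return cfg_aug + lam * dist_match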

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22677
• PDF: https://arxiv.org/pdf/2511.22677
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image/tree/main

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#TextToImage #GenerativeAI #DiffusionModels #ModelDistillation #AIResearch
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

📝 Summary:
FedRE is a federated learning framework for model-heterogeneous environments. Clients create and upload entangled representations and entangled-label encodings to train a global classifier. This method enhances performance, protects privacy, and reduces communication overhead.
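
🔹 Illustrative sketch: one way to read the pipeline above, assuming a mixup-style entanglement of representations and one-hot labels on the client before upload, and a lightweight global classifier on the server. Shapes, the mixing rule, and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def entangle(features, labels, num_classes):
    # Client side: mix samples within a batch so raw per-sample representations
    # and labels are never uploaded directly.
    perm = torch.randperm(features.size(0))
    lam = torch.rand(features.size(0), 1)
    one_hot = F.one_hot(labels, num_classes).float()
    ent_feat = lam * features + (1 - lam) * features[perm]
    ent_label = lam * one_hot + (1 - lam) * one_hot[perm]
    return ent_feat, ent_label

# Server side: a shared global classifier trained on the uploaded encodings,
# independent of each client's local (heterogeneous) model architecture.
global_clf = nn.Linear(256, 10)
optimizer = torch.optim.SGD(global_clf.parameters(), lr=0.1)

def server_update(ent_feat, ent_label):
    loss = F.cross_entropy(global_clf(ent_feat), ent_label)  # soft targets
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()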

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22265
• PDF: https://arxiv.org/pdf/2511.22265

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#FederatedLearning #MachineLearning #AI #PrivacyPreservingAI #RepresentationLearning
Vision Bridge Transformer at Scale

📝 Summary:
Vision Bridge Transformer (ViBT) is a large-scale model for conditional generation. It efficiently translates data by directly modeling input-to-output trajectories, unlike diffusion models. ViBT scales to billions of parameters, achieving robust performance in image and video editing tasks.
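
🔹 Illustrative sketch: a minimal bridge-style training step under the assumption that the model learns a direct trajectory from a conditioning input to its target (here a straight-line path with a velocity-prediction loss) rather than denoising from pure noise. The interpolation and loss are illustrative, not ViBT's exact objective.

import torch
import torch.nn.functional as F

def bridge_training_step(model, x_src, x_tgt):
    # Sample a point on the straight path from the source to the target.
    t = torch.rand(x_src.size(0), 1, 1, 1)
    x_t = (1 - t) * x_src + t * x_tgt
    # The path's constant velocity is the regression target.
    v_target = x_tgt - x_src
    v_pred = model(x_t, t.flatten())
    return F.mse_loss(v_pred, v_target)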

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23199
• PDF: https://arxiv.org/pdf/2511.23199
• Project Page: https://yuanshi9815.github.io/ViBT_homepage/
• Github: https://github.com/Yuanshi9815/ViBT

Spaces citing this paper:
https://huggingface.co/spaces/Yuanshi/ViBT

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionTransformer #GenerativeAI #ComputerVision #DeepLearning #AI
OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

📝 Summary:
OralGPT-Omni is the first dental MLLM for comprehensive image analysis, using TRACE-CoT reasoning. It introduces the MMOral-Uni benchmark and dramatically outperforms GPT-5, advancing intelligent dentistry.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22055
• PDF: https://arxiv.org/pdf/2511.22055
• Github: https://github.com/isbrycee/OralGPT

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DentalAI #MLLM #GenerativeAI #HealthcareTech #MedicalImaging
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models

📝 Summary:
LVLMs struggle to preserve cultural identities in mixed visual scenes. Researchers created CultureMix, a VQA benchmark, finding consistent failures and background reliance. Supervised fine-tuning with diverse culture-mixing data significantly improves model consistency and reduces background sensitivity.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22787
• PDF: https://arxiv.org/pdf/2511.22787

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VisionLanguageModels #CulturalAI #ComputerVision #AIML #AIResearch
RefineBench: Evaluating Refinement Capability of Language Models via Checklists

📝 Summary:
RefineBench evaluates language models' self-refinement and guided refinement capabilities using 1,000 problems and a checklist. It finds that LMs perform poorly at self-refinement, often failing to improve without guidance, but excel at guided refinement with targeted feedback.
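
🔹 Illustrative sketch: a checklist-driven refinement loop in the spirit of the benchmark, where failed checklist items are fed back as targeted feedback and the final answer is scored by checklist coverage. ask_model and check_item are hypothetical stand-ins, not RefineBench's API.

def refine_with_checklist(problem, checklist, ask_model, check_item, max_rounds=3):
    draft = ask_model(problem)
    for _ in range(max_rounds):
        failed = [item for item in checklist if not check_item(draft, item)]
        if not failed:  # guided refinement stops once every item is satisfied
            break
        feedback = "Revise your answer to satisfy: " + "; ".join(failed)
        draft = ask_model(f"{problem}\n\nPrevious answer:\n{draft}\n\n{feedback}")
    score = sum(check_item(draft, item) for item in checklist) / len(checklist)
    return draft, score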

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22173
• PDF: https://arxiv.org/pdf/2511.22173

Datasets citing this paper:
https://huggingface.co/datasets/RefineBench/RefineBench

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LLM #AI #NLP #ModelEvaluation #Refinement
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

📝 Summary:
MLLMs struggle to model human cognitive perception of images, such as memorability or aesthetics. CogIP-Bench evaluates this gap, showing that post-training significantly improves alignment. This enhances human-like perception and improves creative AI tasks.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22805
• PDF: https://arxiv.org/pdf/2511.22805
• Project Page: https://follen-cry.github.io/MLLM-Cognition-project-page/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#MLLM #CognitiveAI #ImagePerception #AIAlignment #AIResearch
Adversarial Flow Models

📝 Summary:
Adversarial flow models unify adversarial and flow-based generative models for stable training and efficient one-step generation. They learn a deterministic noise-to-data mapping, achieving a record FID of 1.94 on ImageNet 256×256 with a single pass, outperforming consistency models.
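
🔹 Illustrative sketch: the one-step ingredient only, i.e. a generator that maps noise to data in a single forward pass and is trained with a non-saturating adversarial loss. How the flow-based component is folded in is specific to the paper and not reproduced here; everything below is an assumption.

import torch.nn.functional as F

def one_step_adversarial_losses(G, D, noise, real):
    fake = G(noise)  # deterministic noise-to-data mapping, no iterative sampler
    d_loss = F.softplus(-D(real)).mean() + F.softplus(D(fake.detach())).mean()
    g_loss = F.softplus(-D(fake)).mean()  # non-saturating generator loss
    return d_loss, g_loss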

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22475
• PDF: https://arxiv.org/pdf/2511.22475

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#GenerativeAI #DeepLearning #AdversarialModels #FlowModels #ImageSynthesis
Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets

📝 Summary:
This paper introduces a cluster-based frame selection strategy for video datasets. It groups similar frames to prevent information leakage and create more balanced and reliable dataset partitions for training, validation, and testing.
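
🔹 Illustrative sketch: the cluster-then-split idea, assuming frames are first embedded, near-duplicates are grouped with k-means, and whole clusters are assigned to a split so similar frames never straddle train/val/test. The embedding choice, cluster count, and ratios are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def leakage_safe_split(frame_embeddings, n_clusters=50, ratios=(0.7, 0.15, 0.15), seed=0):
    # Group visually similar (near-duplicate) frames into clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(frame_embeddings)
    # Assign entire clusters, not individual frames, to each partition.
    order = np.random.default_rng(seed).permutation(n_clusters)
    n_train, n_val = int(ratios[0] * n_clusters), int(ratios[1] * n_clusters)
    split_of = {int(c): ("train" if i < n_train else "val" if i < n_train + n_val else "test")
                for i, c in enumerate(order)}
    return [split_of[int(c)] for c in labels]  # one split label per frame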

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13944
• PDF: https://arxiv.org/pdf/2511.13944

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoDatasets #DataLeakage #MachineLearning #Clustering #DatasetSplitting
YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection

📝 Summary:
A new Mixture-of-Experts framework uses adaptive routing among multiple YOLOv9-T experts. This improves object detection performance, achieving higher mAP and AR.
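
🔹 Illustrative sketch: one way adaptive expert routing can be wired up, with a small gating network producing per-image weights over several detector experts whose outputs are fused by those weights. Generic torch modules stand in for the YOLOv9-T experts; the gate and fusion rule are assumptions.

import torch
import torch.nn as nn

class MoEDetector(nn.Module):
    def __init__(self, experts, in_channels=3):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, len(experts)), nn.Softmax(dim=-1))

    def forward(self, images):
        weights = self.gate(images)                                  # (B, E)
        outputs = torch.stack([e(images) for e in self.experts], 1)  # (B, E, ...)
        while weights.dim() < outputs.dim():
            weights = weights.unsqueeze(-1)
        return (weights * outputs).sum(dim=1)  # routing-weighted fusion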

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13344
• PDF: https://arxiv.org/pdf/2511.13344

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ObjectDetection #YOLO #MixtureOfExperts #DeepLearning #ComputerVision
Recognition of Abnormal Events in Surveillance Videos using Weakly Supervised Dual-Encoder Models

📝 Summary:
This paper introduces a dual-backbone framework combining convolutional and transformer representations with top-k pooling to detect abnormal events in surveillance videos. The weakly supervised model achieved 90.7% AUC on the UCF-Crime dataset.
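
🔹 Illustrative sketch: the top-k pooling step that lets a video-level label supervise per-segment anomaly scores, a standard move in weakly supervised anomaly detection. The value of k and the loss are assumptions, not the paper's exact settings.

import torch.nn.functional as F

def topk_video_loss(segment_scores, video_labels, k=3):
    # segment_scores: (B, T) per-segment anomaly scores in [0, 1]
    # video_labels:   (B,)  1 if the video contains an abnormal event, else 0
    pooled = segment_scores.topk(k, dim=1).values.mean(dim=1)
    return F.binary_cross_entropy(pooled, video_labels.float())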

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13276
• PDF: https://arxiv.org/pdf/2511.13276

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#ComputerVision #DeepLearning #Surveillance #AnomalyDetection #WeaklySupervisedLearning
CaptionQA: Is Your Caption as Useful as the Image Itself?

📝 Summary:
CaptionQA assesses whether AI-generated captions can substitute for images in downstream tasks. The benchmark spans over 33,000 visual questions across 4 domains. It reveals large utility gaps: MLLMs perform up to 32% worse with captions than with images.
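
🔹 Illustrative sketch: the evaluation protocol implied above, answering the same questions once from the image and once from the caption alone, then reporting the accuracy gap. answer_from_image and answer_from_caption are hypothetical model wrappers.

def caption_utility_gap(samples, answer_from_image, answer_from_caption):
    # samples: list of dicts with "image", "caption", "question", "answer"
    img_correct = cap_correct = 0
    for s in samples:
        img_correct += answer_from_image(s["image"], s["question"]) == s["answer"]
        cap_correct += answer_from_caption(s["caption"], s["question"]) == s["answer"]
    n = len(samples)
    return {"image_acc": img_correct / n,
            "caption_acc": cap_correct / n,
            "gap": (img_correct - cap_correct) / n}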

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21025
• PDF: https://arxiv.org/pdf/2511.21025
• Github: https://github.com/bronyayang/CaptionQA

Datasets citing this paper:
https://huggingface.co/datasets/Borise/CaptionQA

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#AICaptions #MultimodalAI #ComputerVision #AIevaluation #NLP
Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

📝 Summary:
Fast3Dcache accelerates 3D diffusion model inference using a training-free geometry-aware caching framework. It uses dynamic cache quotas and spatiotemporal stability criteria to reuse computations, achieving significant speed-up and FLOPs reduction with minimal geometric quality degradation.
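
🔹 Illustrative sketch: a generic training-free caching wrapper in the spirit described above, where a block's cached output is reused when its input has stayed stable across steps and a quota caps how many blocks may be skipped per step. The stability test, threshold, and quota rule are assumptions.

class BlockCache:
    def __init__(self, tau=0.05, quota=0.5):
        self.prev, self.tau, self.quota = {}, tau, quota

    def run(self, blocks, x):
        # x: a torch.Tensor activation flowing through the model's blocks
        budget, skipped = int(self.quota * len(blocks)), 0
        for i, block in enumerate(blocks):
            cached = self.prev.get(i)
            if cached is not None and skipped < budget:
                rel_change = (x - cached["inp"]).norm() / (cached["inp"].norm() + 1e-8)
                if rel_change < self.tau:  # stable input: reuse the cached output
                    x, skipped = cached["out"], skipped + 1
                    continue
            out = block(x)                 # otherwise recompute and refresh the cache
            self.prev[i] = {"inp": x, "out": out}
            x = out
        return x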

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22533
• PDF: https://arxiv.org/pdf/2511.22533
• Project Page: https://fast3dcache-agi.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#3DGeometry #DiffusionModels #ComputerVision #DeepLearning #ComputationalEfficiency
OmniRefiner: Reinforcement-Guided Local Diffusion Refinement

📝 Summary:
OmniRefiner enhances reference-guided image generation by overcoming fine detail loss. It uses a two-stage framework: a fine-tuned diffusion editor for global coherence, then reinforcement learning for localized detail accuracy. This significantly improves detail preservation and consistency.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19990
• PDF: https://arxiv.org/pdf/2511.19990
• Github: https://github.com/yaoliliu/OmniRefiner

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DiffusionModels #ImageGeneration #ReinforcementLearning #GenerativeAI #ComputerVision
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

📝 Summary:
DeepSeekMath-V2 trains a self-verifying LLM for theorem proving. It uses a verifier as a reward model to incentivize rigorous, step-by-step derivations and issue resolution in proofs. This approach achieves gold-level scores in major math competitions.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22570
• PDF: https://arxiv.org/pdf/2511.22570

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DeepSeekMath #LLM #AI #MathematicalReasoning #TheoremProving
Layer-Aware Video Composition via Split-then-Merge

📝 Summary:
Split-then-Merge is a novel framework that improves generative video composition. It learns dynamic foreground-background interactions by splitting unlabeled videos into layers without supervision and then self-composing them. This approach achieves state-of-the-art performance and addresses data scarcity.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20809
• PDF: https://arxiv.org/pdf/2511.20809
• Project Page: https://split-then-merge.github.io/

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#VideoComposition #GenerativeAI #ComputerVision #DeepLearning #AIResearch
MRI Super-Resolution with Deep Learning: A Comprehensive Survey

📝 Summary:
This survey comprehensively reviews deep learning methods for MRI super-resolution, enabling high-resolution imaging from low-resolution scans. It categorizes techniques, discusses challenges, and provides valuable resources for the community.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16854
• PDF: https://arxiv.org/pdf/2511.16854
• Github: https://github.com/mkhateri/Awesome-MRI-Super-Resolution

🔹 Models citing this paper:
https://huggingface.co/mkhateri/Awesome-MRI-Super-Resolution

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#DeepLearning #MRI #SuperResolution #MedicalImaging #AI
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

📝 Summary:
This study optimizes small language models for real-device latency by identifying key architectural factors and efficient operators. It introduces Nemotron-Flash, a new family of hybrid SLMs that significantly improves accuracy, latency, and throughput compared to current models.

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18890
• PDF: https://arxiv.org/pdf/2511.18890

🔹 Models citing this paper:
https://huggingface.co/nvidia/Nemotron-Flash-3B-Instruct
https://huggingface.co/nvidia/Nemotron-Flash-1B
https://huggingface.co/nvidia/Nemotron-Flash-3B

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#SmallLanguageModels #LatencyOptimization #AI #DeepLearning #NLP
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

📝 Summary:
Xmodel-2.5 is a 1.3B language model designed for efficient edge deployments. It uses maximal-update parameterization and a novel training curriculum that switches from AdamW to Muon, improving reasoning skills by 4.58% while maintaining efficiency.
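
🔹 Illustrative sketch: the optimizer-switch curriculum in generic PyTorch form. Muon is not a stock PyTorch optimizer, so plain SGD with momentum stands in for the second phase; the switch point and learning rates are assumptions.

import torch

def train_with_optimizer_switch(model, loader, loss_fn, total_steps, switch_frac=0.5):
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    for step, (inputs, targets) in enumerate(loader):
        if step == int(switch_frac * total_steps):
            # Second phase of the curriculum: swap the optimizer mid-training.
            opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.95)
        loss = loss_fn(model(inputs), targets)
        opt.zero_grad(); loss.backward(); opt.step()
        if step + 1 >= total_steps:
            break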

🔹 Publication Date: Published on Nov 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19496
• PDF: https://arxiv.org/pdf/2511.19496
• Github: https://github.com/XiaoduoAILab/Xmodel-2.5

🔹 Models citing this paper:
https://huggingface.co/XiaoduoAILab/Xmodel-2.5

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#SLM #EdgeAI #LanguageModels #DeepLearning #ReasoningAI
Geometrically-Constrained Agent for Spatial Reasoning

📝 Summary:
The Geometrically Constrained Agent (GCA) bridges the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving state-of-the-art performance.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#SpatialReasoning #VLMs #AI #Robotics #DeepLearning