ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

📝 Summary:
Thinking-while-Generating (TwiG) interleaves textual reasoning throughout the visual generation process. This on-the-fly multimodal interaction guides and reflects on visual content as it is created, resulting in more context-aware and semantically rich outputs.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16671
• PDF: https://arxiv.org/pdf/2511.16671
• Project Page: https://think-while-gen.github.io/
• Github: https://github.com/ZiyuGuo99/Thinking-while-Generating

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#GenerativeAI #MultimodalAI #ComputerVision #NLP #AIResearch
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

📝 Summary:
Nemotron Elastic embeds multiple submodels within a single large language model, significantly reducing training costs by 360x compared to training separate models. This framework allows zero-shot extraction of optimized submodels for various deployment budgets without additional training or fine...

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16664
• PDF: https://arxiv.org/pdf/2511.16664
• Project Page: https://huggingface.co/nvidia/Nemotron-Elastic-12B

#LLM #AI #MachineLearning #DeepLearning #EfficientAI
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding

📝 Summary:
TimeViper is a hybrid Mamba-Transformer vision-language model for efficient long video understanding. It introduces a TransV module to compress redundant vision tokens into instruction tokens, enabling it to process over 10,000 frames. This achieves state-of-the-art performance while offering new...

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16595
• PDF: https://arxiv.org/pdf/2511.16595
• Project Page: https://xuboshen.github.io/TimeViper/

#TimeViper #VisionLanguageModels #VideoUnderstanding #MambaTransformer #DeepLearning
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

📝 Summary:
SAM2S is a foundation model that enhances interactive video object segmentation in surgery. It leverages a new large benchmark, robust memory, and temporal learning to achieve superior accuracy (80.42 J&F) and real-time performance in surgical video analysis.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16618
• PDF: https://arxiv.org/pdf/2511.16618
• Project Page: https://jinlab-imvr.github.io/SAM2S
• Github: https://github.com/jinlab-imvr/SAM2S

#SurgicalAI #MedicalImaging #ComputerVision #FoundationModels #DeepLearning
NaTex: Seamless Texture Generation as Latent Color Diffusion

📝 Summary:
NaTex directly generates 3D textures using latent color diffusion and geometry-aware models. It predicts texture color in 3D space, outperforming prior methods in coherence and alignment by avoiding 2D multi-view limitations.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16317
• PDF: https://arxiv.org/pdf/2511.16317
• Project Page: https://natex-ldm.github.io/
• Github: https://natex-ldm.github.io/

#TextureGeneration #DiffusionModels #3DGraphics #ComputerVision #DeepLearning
PartUV: Part-Based UV Unwrapping of 3D Meshes

📝 Summary:
PartUV is a novel UV unwrapping pipeline for noisy AI-generated 3D meshes. It uses part decomposition and geometric heuristics to generate significantly fewer, part-aligned charts with low distortion. PartUV outperforms existing methods in chart count and seam length on diverse datasets.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16659
• PDF: https://arxiv.org/pdf/2511.16659
• Project Page: https://www.zhaoningwang.com/PartUV/

#UVUnwrapping #3DMeshes #ComputerGraphics #GeometricProcessing #AI
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

📝 Summary:
TurkColBERT, the first benchmark for Turkish IR, shows late-interaction models significantly outperform dense encoders. They offer superior parameter efficiency, faster indexing, and better performance for Turkish retrieval tasks.
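
To make the dense versus late-interaction distinction concrete, here is a minimal sketch in plain Python. It is not taken from the TurkColBERT paper: the toy 2-D token vectors, the function names, and the choice of mean pooling for the dense side are all assumptions made for illustration. The late-interaction score follows the ColBERT-style MaxSim idea, where each query token is matched to its most similar document token and the maxima are summed.

```python
import math


def _unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_score(query_tokens, doc_tokens):
    """Single-vector scoring: mean-pool each side (an assumption), then cosine."""
    def mean_pool(tokens):
        dim = len(tokens[0])
        return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]

    return _dot(_unit(mean_pool(query_tokens)), _unit(mean_pool(doc_tokens)))


def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim: for each query token take its best-matching doc token, then sum."""
    q = [_unit(t) for t in query_tokens]
    d = [_unit(t) for t in doc_tokens]
    return sum(max(_dot(qt, dt) for dt in d) for qt in q)


# Hypothetical toy embeddings: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens exactly
doc_b = [[1.0, 1.0]]              # one token that only partly matches each

print(dense_score(query, doc_a), dense_score(query, doc_b))
print(late_interaction_score(query, doc_a), late_interaction_score(query, doc_b))
```

In this toy case the pooled dense score cannot separate the two documents, while the token-level score ranks doc_a above doc_b. Real systems use learned embeddings, so this only illustrates the scoring rule.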

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16528
• PDF: https://arxiv.org/pdf/2511.16528

#InformationRetrieval #TurkishNLP #MachineLearning #DeepLearning #Benchmarking
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

📝 Summary:
SRPO is a VLA-RL framework that eliminates the need for expert demonstrations. It assigns progress-wise rewards to failed trajectories using latent world representations and the model's own successes. This achieved 99.2% success on LIBERO, a significant improvement.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15605
• PDF: https://arxiv.org/pdf/2511.15605

#ReinforcementLearning #VLAModels #PolicyOptimization #AIResearch #MachineLearning
Draft and Refine with Visual Experts

📝 Summary:
The Draft and Refine (DnR) framework improves visual grounding in LVLMs. It uses a novel question-conditioned utilization metric to measure visual evidence reliance. DnR refines responses with external visual experts, reducing hallucinations and boosting accuracy.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11005
• PDF: https://arxiv.org/pdf/2511.11005
• Github: https://github.com/EavnJeong/Draft-and-Refine-with-Visual-Experts

#LVLMs #VisualGrounding #AIHallucinations #ComputerVision #DeepLearning
BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks

📝 Summary:
ImageNet accuracy poorly predicts performance on scientific imagery. BioBench is a new ecology vision benchmark unifying diverse tasks, kingdoms, and modalities with 3.1M images, offering a better evaluation for scientific ML.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16315
• PDF: https://arxiv.org/pdf/2511.16315
• Project Page: https://samuelstevens.me/biobench
• Github: https://github.com/samuelstevens/biobench

#BioBench #MachineLearning #ComputerVision #ScientificML #Ecology
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

📝 Summary:
EntroPIC stabilizes entropy during long-term LLM training by adaptively tuning loss coefficients with Proportional-Integral Control. This novel method ensures efficient exploration and prevents sub-optimal behaviors, leading to stable and optimal reinforcement learning for LLMs.
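
As a rough illustration of the control idea, the sketch below runs a generic proportional-integral (PI) controller against a toy entropy signal. Everything here is an assumption made for illustration: the class and function names, the gains, the target value, and especially the linear "entropy dynamics", which stand in for a real training loop. It is not the EntroPIC implementation; see the linked repository for the actual method.

```python
class PIController:
    """Generic PI controller: output = kp * error + ki * accumulated error."""

    def __init__(self, target, kp, ki):
        self.target = target
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def update(self, measured):
        error = self.target - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(steps=200, start_entropy=1.0, target=0.6):
    """Toy loop: entropy drifts down each step unless the coefficient pushes back."""
    controller = PIController(target=target, kp=0.5, ki=0.1)
    entropy = start_entropy
    history = []
    for _ in range(steps):
        # Hypothetical loss coefficient chosen by the controller.
        coef = controller.update(entropy)
        # Assumed dynamics: constant downward drift plus a response to coef.
        entropy = entropy - 0.05 + 0.5 * coef
        history.append(entropy)
    return history


if __name__ == "__main__":
    trace = simulate()
    print(f"final entropy: {trace[-1]:.4f}")
```

The integral term is what cancels the constant drift: with a proportional term alone, the toy signal would settle below the target instead of at it.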

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15248
• PDF: https://arxiv.org/pdf/2511.15248
• Project Page: https://huggingface.co/spaces/yangkaiSIGS/entropic
• Github: https://github.com/yk7333/EntroPIC

🔹 Models citing this paper:
https://huggingface.co/hunterbown/shannon-control-unit

🔹 Spaces citing this paper:
https://huggingface.co/spaces/yangkaiSIGS/entropic

#LLM #MachineLearning #ReinforcementLearning #ControlTheory #DeepLearning
FinTRec: Transformer Based Unified Contextual Ads Targeting and Personalization for Financial Applications

📝 Summary:
FinTRec is a transformer-based framework for financial recommendation systems. It handles complex user interactions and multiple products, outperforming traditional tree models. This unified approach improves performance and reduces costs.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14865
• PDF: https://arxiv.org/pdf/2511.14865

#FinTech #RecommendationSystems #Transformers #AI #MachineLearning
Generalist Foundation Models Are Not Clinical Enough for Hospital Operations

📝 Summary:
Lang1, a specialized clinical language model, significantly outperforms generalist models in predicting hospital operational metrics after supervised finetuning. This suggests that effective healthcare AI requires in-domain pretraining and finetuning for specialized tasks.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13703
• PDF: https://arxiv.org/pdf/2511.13703

#HealthcareAI #ClinicalNLP #LLM #HospitalOperations #AIResearch
Boosting Medical Visual Understanding From Multi-Granular Language Learning

📝 Summary:
MGLL enhances visual understanding by improving multi-label and cross-granularity alignment in image-text pretraining, outperforming existing methods in complex domains like medical imaging.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15943
• PDF: https://arxiv.org/pdf/2511.15943
• Project Page: https://github.com/HUANGLIZI/MGLL
• Github: https://github.com/HUANGLIZI/MGLL

#MedicalAI #ComputerVision #DeepLearning #NLP #ImageTextPretraining
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

📝 Summary:
Agent0 is a self-evolving framework that trains LLM agents without human data. It uses two competing agents and tool integration in a multi-step co-evolution process. This significantly boosts reasoning capabilities, improving math by 18% and general reasoning by 24% on benchmarks.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16043
• PDF: https://arxiv.org/pdf/2511.16043
• Github: https://github.com/aiming-lab/Agent0

#LLMAgents #SelfEvolvingAI #ToolIntegration #AIResearch #Reasoning
MobiAgent: A Systematic Framework for Customizable Mobile Agents

📝 Summary:
MobiAgent is a comprehensive mobile agent system designed to improve real-world task execution accuracy and efficiency. It uses MobiMind models, the AgentRR framework, and MobiFlow benchmarking, plus an AI-assisted data collection pipeline. MobiAgent achieves state-of-the-art performance in mobil...

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00531
• PDF: https://arxiv.org/pdf/2509.00531
• Github: https://github.com/IPADS-SAI/MobiAgent/releases/download/v1.0/Mobiagent.apk

🔹 Models citing this paper:
https://huggingface.co/IPADS-SAI/MobiMind-Grounder-3B
https://huggingface.co/IPADS-SAI/MobiMind-Decider-7B
https://huggingface.co/IPADS-SAI/MobiMind-Mixed-7B

#MobileAgents #AI #DeepLearning #Robotics #Automation
Code2Video: A Code-centric Paradigm for Educational Video Generation

📝 Summary:
Code2Video is a code-centric agent framework generating educational videos via executable Python code. It uses three collaborative agents to improve coherence and interpretability, outperforming direct code generation by 40% and matching human-crafted tutorials.

🔹 Publication Date: Published on Oct 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.01174
• PDF: https://arxiv.org/pdf/2510.01174
• Project Page: https://showlab.github.io/Code2Video/
• Github: https://github.com/showlab/code2video

#AI #VideoGeneration #EducationalTech #CodeGeneration #DeepLearning
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

📝 Summary:
Enterprise Deep Research (EDR) is a multi-agent system for automated report generation and real-time data analysis in enterprises. It integrates specialized agents, tools, and a reflection mechanism for adaptive research. EDR outperforms state-of-the-art systems on open benchmarks without human ste...

🔹 Publication Date: Published on Oct 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.17797
• PDF: https://arxiv.org/pdf/2510.17797
• Github: https://github.com/SalesforceAIResearch/enterprise-deep-research

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Salesforce/EDR-200

#MultiAgentSystems #EnterpriseAI #DataAnalytics #AIResearch #AutomatedReporting
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

📝 Summary:
Hulu-Med is a transparent medical vision-language model unifying diverse data modalities like text, 2D/3D images, and video. It achieves state-of-the-art performance across 30 clinical benchmarks with efficient training, promoting accessible AI.

🔹 Publication Date: Published on Oct 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.08668
• PDF: https://arxiv.org/pdf/2510.08668
• Github: https://github.com/ZJUI-AI4H/Hulu-Med

🔹 Models citing this paper:
https://huggingface.co/ZJU-AI4H/Hulu-Med-32B
https://huggingface.co/ZJU-AI4H/Hulu-Med-7B
https://huggingface.co/ZJU-AI4H/Hulu-Med-14B

#MedicalAI #VisionLanguageModel #MultimodalAI #HealthcareAI #AIResearch