ML Research Hub – Telegram
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Bolmo: Byteifying the Next Generation of Language Models

📝 Summary:
Bolmo introduces competitive byte-level language models by efficiently converting existing subword models. This byteification overcomes subword limitations, matching performance with minimal training. Bolmo makes byte-level LMs practical.
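
As a rough illustration of what "byte-level" means in this summary (a sketch only, not Bolmo's code; the helper names `to_byte_ids` and `from_byte_ids` are hypothetical), a byte-level model reads UTF-8 byte values rather than subword IDs, so the base vocabulary has at most 256 entries and any string can be represented:

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as a sequence of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids by decoding the byte values back into text."""
    return bytes(ids).decode("utf-8")


ids = to_byte_ids("naïve")
print(ids)                 # one ID per byte; "ï" occupies two bytes in UTF-8
print(from_byte_ids(ids))  # round-trips to the original string
```

The trade-off is sequence length: byte sequences are longer than subword sequences for the same text.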

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15586
• PDF: https://arxiv.org/pdf/2512.15586

🔹 Models citing this paper:
https://huggingface.co/allenai/Bolmo-7B
https://huggingface.co/allenai/Bolmo-1B

Datasets citing this paper:
https://huggingface.co/datasets/allenai/bolmo_mix

==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT

#LanguageModels #ByteLevelLMs #NLP #DeepLearning #AIResearch
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

📝 Summary:
DataFlow is an LLM-driven framework for unified, high-quality data preparation. It automates pipeline generation from natural language, significantly boosting LLM performance across diverse tasks like math, code, and text. DataFlow ensures reproducible data and provides a scalable foundation for AI.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16676
• PDF: https://arxiv.org/pdf/2512.16676
• Project Page: https://github.com/OpenDCAI/DataFlow
• Github: https://github.com/OpenDCAI/DataFlow

Datasets citing this paper:
https://huggingface.co/datasets/OpenDCAI/dataflow-demo-code
https://huggingface.co/datasets/OpenDCAI/dataflow-demo-Text2SQL
https://huggingface.co/datasets/OpenDCAI/dataflow-instruct-10k

#LLM #DataPreparation #DataCentricAI #WorkflowAutomation #AIResearch
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

📝 Summary:
LLMs poorly estimate human cognitive difficulty for educational tasks. Scaling models does not improve alignment with humans; they converge to a machine consensus and fail to simulate student struggles or show introspection.

🔹 Publication Date: Published on Dec 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18880
• PDF: https://arxiv.org/pdf/2512.18880
• Github: https://github.com/MingLiiii/Difficulty_Alignment

#LLM #EducationalAI #ItemDifficulty #HumanAIAlignment #AIResearch
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

📝 Summary:
The Prism Hypothesis posits that semantic encoders capture low-frequency meaning, while pixel encoders retain high-frequency details. Unified Autoencoding (UAE) leverages this with a frequency-band modulator to harmonize both into a single latent space. This achieves state-of-the-art performance on imag...
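
To make the low-frequency versus high-frequency split concrete, here is a minimal sketch on a 1D signal. It assumes a simple moving average as the low-pass filter; it is not UAE's frequency-band modulator, and the function names `low_pass` and `split_bands` are hypothetical:

```python
def low_pass(signal: list[float], radius: int = 1) -> list[float]:
    """Moving average; the window shrinks at the edges of the signal."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal: list[float], radius: int = 1):
    """Return (low, high): the smoothed signal and the residual detail."""
    low = low_pass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


signal = [0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0]
low, high = split_bands(signal)
# Adding the two bands back together recovers the input (up to float error).
recon = [l + h for l, h in zip(low, high)]
print(low)
print(high)
```

The low band carries the coarse trend (the jump from around 0.5 to around 4.5), while the high band carries the fine alternation.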

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19693
• PDF: https://arxiv.org/pdf/2512.19693
• Github: https://github.com/WeichenFan/UAE

#DeepLearning #ComputerVision #Autoencoders #RepresentationLearning #AIResearch
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

📝 Summary:
GenEnv, a framework using a co-evolutionary game with a generative environment simulator, enhances LLM agent performance by 40.3% over 7B baselines and uses less data than offline augmentation.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19682
• PDF: https://arxiv.org/pdf/2512.19682
• Github: https://github.com/Gen-Verse/GenEnv

#AI #DataScience #MachineLearning #HuggingFace #Research
StoryMem: Multi-shot Long Video Storytelling with Memory

📝 Summary:
StoryMem enhances multi-shot video generation with cinematic quality and long-range consistency using a memory bank and pre-trained single-shot video diffusion models.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19539
• PDF: https://arxiv.org/pdf/2512.19539

#AI #DataScience #MachineLearning #HuggingFace #Research
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

📝 Summary:
MobileWorld, a more challenging benchmark than AndroidWorld, includes diverse real-world mobile tasks and interactions, revealing significant gaps in current model capabilities.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19432
• PDF: https://arxiv.org/pdf/2512.19432

#AI #DataScience #MachineLearning #HuggingFace #Research
Name That Part: 3D Part Segmentation and Naming

📝 Summary:
ALIGN-Parts addresses semantic 3D part segmentation by aligning implicit 3D part representations with part descriptions using geometric, appearance, and semantic cues, supporting open-vocabulary part ...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18003
• PDF: https://arxiv.org/pdf/2512.18003
• Project Page: https://name-that-part.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

📝 Summary:
QuCo-RAG uses objective corpus statistics to mitigate hallucinations in large language models during generation, improving accuracy across various benchmarks.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19134
• PDF: https://arxiv.org/pdf/2512.19134

#AI #DataScience #MachineLearning #HuggingFace #Research
Region-Constraint In-Context Generation for Instructional Video Editing

📝 Summary:
ReCo is a novel instructional video editing paradigm that enhances accuracy and reduces token interference by incorporating constraint modeling and regularization techniques during in-context generati...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17650
• PDF: https://arxiv.org/pdf/2512.17650
• Project Page: https://zhw-zhang.github.io/ReCo-page/
• Github: https://github.com/HiDream-ai/ReCo

Datasets citing this paper:
https://huggingface.co/datasets/HiDream-ai/ReCo-Data

#AI #DataScience #MachineLearning #HuggingFace #Research
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

📝 Summary:
WorldWarp addresses the challenge of generating consistent long-range videos by integrating a 3D geometric cache with a spatio-temporal diffusion model, ensuring structural consistency and textural re...

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19678
• PDF: https://arxiv.org/pdf/2512.19678
• Project Page: https://hyokong.github.io/worldwarp-page/
• Github: https://hyokong.github.io/worldwarp-page/

🔹 Models citing this paper:
https://huggingface.co/imsuperkong/worldwarp

#AI #DataScience #MachineLearning #HuggingFace #Research
Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface

📝 Summary:
Real2Edit2Real is a framework that generates new manipulation demonstrations using 3D reconstruction, editing, and video synthesis, improving data efficiency in robot learning.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19402
• PDF: https://arxiv.org/pdf/2512.19402
• Project Page: https://real2edit2real.github.io/

#AI #DataScience #MachineLearning #HuggingFace #Research
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

📝 Summary:
Reasoning Palette enhances large language models by using a latent-modulation framework to guide internal planning and improve both inference and reinforcement learning performance.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17206
• PDF: https://arxiv.org/pdf/2512.17206

#AI #DataScience #MachineLearning #HuggingFace #Research
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

📝 Summary:
LoGoPlanner is an end-to-end navigation framework integrating localization, scene geometry, and policy conditioning. It provides implicit state estimation and dense environmental awareness, improving obstacle avoidance and outperforming oracle-localization baselines by over 27 percent.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19629
• PDF: https://arxiv.org/pdf/2512.19629

#AI #DataScience #MachineLearning #HuggingFace #Research
Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

📝 Summary:
InfCam generates high-fidelity videos with accurate camera poses by using infinite homography warping and augmenting synthetic datasets with diverse trajectories.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17040
• PDF: https://arxiv.org/pdf/2512.17040
• Project Page: https://emjay73.github.io/InfCam/
• Github: https://github.com/emjay73/InfCam

#AI #DataScience #MachineLearning #HuggingFace #Research
UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

📝 Summary:
IPC is an unsupervised framework that uses internal probing of large language models to generate code without labeled datasets, achieving competitive performance with reduced resource dependency.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17385
• PDF: https://arxiv.org/pdf/2512.17385

#AI #DataScience #MachineLearning #HuggingFace #Research
Brain-Grounded Axes for Reading and Steering LLM States

📝 Summary:
Neurophysiological brain activity is used to create interpretable axes for large language models, enhancing their controllability and interpretability.

🔹 Publication Date: Published on Dec 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.19399
• PDF: https://arxiv.org/pdf/2512.19399
• Github: https://github.com/sandroandric/Brain-Grounded-Axes-for-Reading-and-Steering-LLM-States

Spaces citing this paper:
https://huggingface.co/spaces/AI-nthusiast/cognitive-proxy

#AI #DataScience #MachineLearning #HuggingFace #Research
Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives

📝 Summary:
This study explores syllogistic reasoning in LLMs, examining both symbolic inference and natural language understanding. Some models achieve perfect symbolic performance, leading to questions about whether LLMs are becoming more formal reasoning mechanisms.

🔹 Publication Date: Published on Dec 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12620
• PDF: https://arxiv.org/pdf/2512.12620
• Github: https://github.com/XAheli/Logic-in-LLMs

#LLMs #SyllogisticReasoning #NaturalLanguageProcessing #AIResearch #FormalLogic
🔥 NEW YEAR 2026 – PREMIUM

Nature papers: $400
Q1 and Q2 papers: $300
Q3 and Q4 papers: $200
Doctoral thesis (complete): $500
M.S. thesis: $300
Paper simulation: $150

Contact me: @Omidyzd62