ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
3.99K photos
226 videos
23 files
4.29K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
ChatGPT Cheat Sheet for Business (2025).pdf
8 MB
ChatGPT Cheat Sheet for Business - DataCamp

Unlock the full potential of AI with our comprehensive ChatGPT Cheat Sheet for Business! Tailored specifically for professionals and entrepreneurs, this guide offers actionable insights on leveraging ChatGPT to streamline workflows, enhance customer interactions, and drive business growth. Whether you're a marketing specialist, project manager, or CEO, this cheat sheet is your go-to resource for mastering conversational AI.

From crafting compelling content to automating routine tasks, learn how to harness the power of ChatGPT in real-world business scenarios. With clear examples and step-by-step instructions, you’ll be able to integrate ChatGPT seamlessly into your operations, improving efficiency and innovation.

Don’t miss out on staying ahead of the competition by embracing the future of AI-driven solutions!

#ChatGPT #AIforBusiness #DataCamp #CheatSheet #ConversationalAI #BusinessGrowth #Automation #CustomerEngagement #ContentCreation #EfficiencyBoost #Innovation #FutureOfWork #TechTrends #AIInnovation #DigitalTransformation #BusinessSuccess

https://news.1rj.ru/str/CodeProgrammer ⭐️
Please open Telegram to view this post
VIEW IN TELEGRAM
👍2
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be straightforwardly trained within the large language model framework, eliminating the need for complex architectural modifications. To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training. Extensive experiments show that JanusFlow achieves comparable or superior performance to specialized models in their respective domains, while significantly outperforming existing unified approaches across standard benchmarks. This work represents a step toward more efficient and versatile vision-language models.

Paper: https://arxiv.org/pdf/2411.07975v1.pdf

Code: https://github.com/deepseek-ai/janus

Datasets: GQA MMBench MM-Vet SEED-Bench

https://news.1rj.ru/str/DataScienceT 💚
Please open Telegram to view this post
VIEW IN TELEGRAM
👍3
🐫Tülu 3 (what a name) 405B - ​​another release!

An open source model (and no, it's not a Chinese model) that outperforms the DeepSeek-V3! on multiple benchmarks

Scalable to 405B - ​​with performance on par with GPT-4o and outperforming previous models in the same class.

Blog: https://allenai.org/blog/tulu-3-405B
You can test it here: https://playground.allenai.org/?model=tulu3-405b
Technical report: https://allenai.org/blog/tulu-3-technical
Hugging Face : https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

#llm #ml #ai #opensource

https://news.1rj.ru/str/DataScienceT ❤️
Please open Telegram to view this post
VIEW IN TELEGRAM
👍4
⭐️ Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph

🖥 Github: https://github.com/dosonleung/fasttog

📕 Paper: https://arxiv.org/abs/2501.14300v1

https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
🔥🔥🔥 SmolVLM developers have released open source code for training SmolVLM from scratch on 256 H100!

Inspired by DeepSeek R1, they have open-sourced the complete code for training the model and weights!

You can now train any of the SmolVLMs or create your own VLMs!

Starting training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch . sh

Code: https://github.com/huggingface/smollm/tree/main/vision
SmolVLM: https://github.com/huggingface/smollm/tree/main

#SmolVLM #llm #opensource #ml #ai
👍3
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Paper: https://arxiv.org/pdf/2501.17811v1.pdf

Code: https://github.com/deepseek-ai/janus

DataSets: #ImageNet - GQA - MM-Vet

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
1
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper: https://arxiv.org/pdf/2401.02954v1.pdf

Code: https://github.com/deepseek-ai/deepseek-llm

Dataset: AlignBench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
👍1
DeepSeek-VL: Towards Real-World Vision-Language Understanding

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.

Paper: https://arxiv.org/pdf/2403.05525v2.pdf

Code: https://github.com/deepseek-ai/deepseek-vl

Datasets: MMLU - GSM8K- HellaSwag

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
👍1
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

🖥 Github: https://github.com/penfever/wildchat-50m

📕 Paper: https://arxiv.org/abs/2501.18511v1

🧠 Dataset: https://huggingface.co/collections/nyu-dice-lab/wildchat-50m-679a5df2c5967db8ab341ab7

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
👍21
PaSa: An LLM Agent for Comprehensive Academic Paper Search

We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4 for paraphrased queries, chatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.

Paper: https://arxiv.org/pdf/2501.10120v1.pdf

Code: https://github.com/bytedance/pasa

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
👍31
LitGPT

20+ productive LLMs written from scratch, with detailed denoscriptions, instructions, fine-tuning and deployment.

Peculiarities :
🟢 Models are written from scratch
🟢 No abstractions
🟢 Suitable for teaching beginners
🟢 Flash attention
🟢 FSDP
🟢 LoRA, QLoRA, Adapter
🟢 Reduce GPU memory (fp4/8/16/32)
🟢 1-1000+ GPUs/TPUs
🟢 20+ LLMs

Installation:
pip install 'litgpt[all]'

Example :
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the family goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.


Github
Docs
Video

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT

Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
👍63
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant challenge, as traditional methods fail to capture the complexity and variability of real-world interactions. We introduce IntellAgent, a scalable, open-source multi-agent framework designed to evaluate conversational AI systems comprehensively. IntellAgent automates the creation of diverse, synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. This innovative approach provides fine-grained diagnostics, addressing the limitations of static and manually curated benchmarks with coarse-grained metrics. IntellAgent represents a paradigm shift in evaluating conversational AI. By simulating realistic, multi-policy scenarios across varying levels of complexity, IntellAgent captures the nuanced interplay of agent capabilities and policy constraints. Unlike traditional methods, it employs a graph-based policy model to represent relationships, likelihoods, and complexities of policy interactions, enabling highly detailed diagnostics. IntellAgent also identifies critical performance gaps, offering actionable insights for targeted optimization. Its modular, open-source design supports seamless integration of new domains, policies, and APIs, fostering reproducibility and community collaboration. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.

Paper: https://arxiv.org/pdf/2501.11067v1.pdf

Code: https://github.com/plurai-ai/intellagent

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
👍32
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks, yet their application to specialized domains remains challenging due to the need for deep expertise. Retrieval-augmented generation (RAG) has emerged as a promising solution to customize LLMs for professional fields by seamlessly integrating external knowledge bases, enabling real-time access to domain-specific expertise during inference. Despite its potential, traditional RAG systems, based on flat text retrieval, face three critical challenges: (i) complex query understanding in professional contexts, (ii) difficulties in knowledge integration across distributed sources, and (iii) system efficiency bottlenecks at scale. This survey presents a systematic analysis of Graph-based Retrieval-Augmented Generation (GraphRAG), a new paradigm that revolutionizes domain-specific LLM applications. GraphRAG addresses traditional RAG limitations through three key innovations: (i) graph-structured knowledge representation that explicitly captures entity relationships and domain hierarchies, (ii) efficient graph-based retrieval techniques that enable context-preserving knowledge retrieval with multihop reasoning ability, and (iii) structure-aware knowledge integration algorithms that leverage retrieved knowledge for accurate and logical coherent generation of LLMs. In this survey, we systematically analyze the technical foundations of GraphRAG and examine current implementations across various professional domains, identifying key technical challenges and promising research directions.

Paper: https://arxiv.org/pdf/2501.13958v1.pdf

Code: https://github.com/deep-polyu/awesome-graphrag

Datasets: DBpedia - MetaQA - MINTAKA

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
👍4
LLMs can see and hear without any training

30 Jan 2025 · Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar ·

We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach, to imbue multimodal capabilities into your favorite LLM. Leveraging their innate ability to perform multi-step reasoning, MILS prompts the LLM to generate candidate outputs, each of which are scored and fed back iteratively, eventually generating a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state-of-the-art on emergent zero-shot image, video and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites to improve text-to-image generation, and even edit prompts for style transfer! Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.

Paper: https://arxiv.org/pdf/2501.18096v1.pdf

Code: https://github.com/facebookresearch/mils

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
4👍2
🌟 Mixture-of-Mamba: A Method to Increase MMLM Efficiency.

Mixture-of-Mamba is an experimental architecture that makes multimodal models (those that work on different types of data, such as text, images, and speech) more efficient and faster. It uses the idea of ​​sparsity to reduce the amount of computation while maintaining high model performance.

Sparsity is an approach where the model focuses only on the most important data, ignoring the less important ones. It is similar to how a person reads a text: we do not delve into every letter, but rather grasp key words and phrases. In ML, sparsity allows: to reduce computational costs, speed up training and inference, and improve quality.


Mixture-of-Mamba adds modality-aware sparsity to Mamba blocks and dynamically selects modality-specific weights in each input processing component of Mamba blocks.

Unlike MoE-Mamba, where sparsity is applied only to MLP layers, Mixture-of-Mamba modifies the Mamba block structure directly. Modality-specific parameterization is applied to the input projection, intermediate and output projections. Convolutional layers and state transitions remain shared.

Mixture-of-Mamba is trained in 3 modal modes: Transfusion (alternating text and continuous image tokens with diffusion loss), Chameleon (alternating text and discrete image tokens), and an extended tri-modal environment with speech inclusion.

In Transfusion, Mixture-of-Mamba achieves equivalent image loss while using only 34.76% of the total compute resources (FLOPs) at 1.4B model scale. In the Chameleon scenario, it achieves equivalent image loss while using 42.50% of the FLOPs and text loss while using 65.40% of the FLOPs. In the trimodal environment, Mixture-of-Mamba achieves speech loss while using 24.80% of the FLOPs at 1.4B model scale.

▶️ A practical implementation of the architecture is available in the project's Github repository .

📌 Licensing: MIT License.

🟡 Arxiv
🖥 GitHub

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM
4👍2
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

28 Jan 2025 · Li Yin, Zhangyang Wang ·

Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows. Yet, prompt engineering -- the task of crafting textual inputs to effectively direct LLMs -- remains difficult and labor-intensive, particularly for complex pipelines that combine multiple LLM calls with functional operations like retrieval and data formatting. We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE) that extends textual gradient-based methods (such as Text-Grad) to multi-component, potentially cyclic LLM architectures. Implemented within the AdalFlow library, LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine LLM to generate feedback-akin to textual gradients -- that guide iterative prompt updates. Unlike prior single-node approaches, LLM-AutoDiff inherently accommodates functional nodes, preserves time-sequential behavior in repeated calls (e.g., multi-hop loops), and combats the "lost-in-the-middle" problem by isolating distinct sub-prompts (instructions, formats, or few-shot examples). It further boosts training efficiency by focusing on error-prone samples through selective gradient computation. Across diverse tasks, including single-step classification, multi-hop retrieval-based QA, and agent-driven pipelines, LLM-AutoDiff consistently outperforms existing textual gradient baselines in both accuracy and training cost. By unifying prompt optimization through a graph-centric lens, LLM-AutoDiff offers a powerful new paradigm for scaling and automating LLM workflows - mirroring the transformative role that automatic differentiation libraries have long played in neural network research.

Paper: https://arxiv.org/pdf/2501.16673v2.pdf

Code: https://github.com/sylphai-inc/adalflow

Dataset: HotpotQA

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
2🔥2👍1