ML Research Hub
32.7K subscribers
3.99K photos
226 videos
23 files
4.29K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
⭐️ Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph

🖥 Github: https://github.com/dosonleung/fasttog

📕 Paper: https://arxiv.org/abs/2501.14300v1

https://news.1rj.ru/str/DataScienceT
🔥🔥🔥 The SmolVLM developers have released the open-source code for training SmolVLM from scratch on 256 H100 GPUs!

Inspired by DeepSeek R1, they have open-sourced the complete code for training the model and weights!

You can now train any of the SmolVLMs or create your own VLMs!

Starting training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh

Code: https://github.com/huggingface/smollm/tree/main/vision
SmolVLM: https://github.com/huggingface/smollm/tree/main
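
As a quick sanity check of a downloaded checkpoint, a minimal inference sketch along these lines should work. The checkpoint name and the AutoProcessor / AutoModelForVision2Seq usage below are assumptions based on the standard transformers VLM interface, not part of the linked training code:

# Hedged sketch: run a SmolVLM checkpoint on one image (assumed checkpoint name).
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumption: released checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])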

#SmolVLM #llm #opensource #ml #ai
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Paper: https://arxiv.org/pdf/2501.17811v1.pdf

Code: https://github.com/deepseek-ai/janus

Datasets: ImageNet - GQA - MM-Vet

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper: https://arxiv.org/pdf/2401.02954v1.pdf

Code: https://github.com/deepseek-ai/deepseek-llm

Dataset: AlignBench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
DeepSeek-VL: Towards Real-World Vision-Language Understanding

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.
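
To make the "hybrid vision encoder" idea concrete, here is a purely illustrative PyTorch sketch (not DeepSeek-VL's actual code): a coarse branch reads a downscaled view for semantics, a second branch tiles the full 1024 x 1024 input with large patches for details, and the two token streams are fused before reaching the language model.

# Conceptual sketch only: a two-branch "hybrid" vision encoder.
import torch
import torch.nn as nn

class HybridVisionEncoder(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        # Coarse branch: 16x16 patches over a downscaled 384x384 view (semantics).
        self.coarse = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # Fine branch: 64x64 patches over the full 1024x1024 view (details),
        # so the token count stays modest despite the high input resolution.
        self.fine = nn.Conv2d(3, dim, kernel_size=64, stride=64)
        self.proj = nn.Linear(dim, dim)

    def forward(self, image_1024: torch.Tensor) -> torch.Tensor:
        low = nn.functional.interpolate(image_1024, size=384)
        coarse_tokens = self.coarse(low).flatten(2).transpose(1, 2)      # (B, 576, dim)
        fine_tokens = self.fine(image_1024).flatten(2).transpose(1, 2)   # (B, 256, dim)
        return self.proj(torch.cat([coarse_tokens, fine_tokens], dim=1))

tokens = HybridVisionEncoder()(torch.randn(1, 3, 1024, 1024))
print(tokens.shape)  # torch.Size([1, 832, 1024])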

Paper: https://arxiv.org/pdf/2403.05525v2.pdf

Code: https://github.com/deepseek-ai/deepseek-vl

Datasets: MMLU - GSM8K - HellaSwag

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

🖥 Github: https://github.com/penfever/wildchat-50m

📕 Paper: https://arxiv.org/abs/2501.18511v1

🧠 Dataset: https://huggingface.co/collections/nyu-dice-lab/wildchat-50m-679a5df2c5967db8ab341ab7

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
PaSa: An LLM Agent for Comprehensive Academic Paper Search

We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4 for paraphrased queries, chatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
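
Loosely, the loop such an agent runs looks like the sketch below. All the helper functions are hypothetical placeholders for the search tool, the LLM relevance judgment, and the LLM query-expansion call; this is not PaSa's actual code:

# Hypothetical sketch of a PaSa-style search loop: the agent alternates between
# issuing queries, reading candidate papers, and expanding via their references
# until it judges the result set complete.
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    references: list = field(default_factory=list)

def search_tool(query: str) -> list[Paper]:          # placeholder search API
    return [Paper(f"Result for '{query}'")]

def is_relevant(paper: Paper, query: str) -> bool:   # placeholder LLM judgment
    return True

def propose_followup_queries(found: list[Paper]) -> list[str]:  # placeholder LLM call
    return []

def pasa_style_search(user_query: str, max_rounds: int = 3) -> list[Paper]:
    frontier, collected = [user_query], []
    for _ in range(max_rounds):
        next_frontier = []
        for q in frontier:
            for paper in search_tool(q):
                if is_relevant(paper, user_query):
                    collected.append(paper)
                    next_frontier.extend(paper.references)  # follow citations
        next_frontier += propose_followup_queries(collected)
        if not next_frontier:        # agent decides the search is complete
            break
        frontier = next_frontier
    return collected

print([p.title for p in pasa_style_search("LLM agents for academic search")])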

Paper: https://arxiv.org/pdf/2501.10120v1.pdf

Code: https://github.com/bytedance/pasa

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
LitGPT

20+ high-performance LLMs implemented from scratch, with detailed descriptions and instructions for fine-tuning and deployment.

Features:
🟢 Models are written from scratch
🟢 No abstractions
🟢 Suitable for teaching beginners
🟢 Flash attention
🟢 FSDP
🟢 LoRA, QLoRA, Adapter
🟢 Reduced GPU memory footprint (fp4/8/16/32)
🟢 1-1000+ GPUs/TPUs
🟢 20+ LLMs

Installation:
pip install 'litgpt[all]'

Example:
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # downloads and loads the phi-2 checkpoint
# Note: the prompt deliberately misspells "familly" so there is something to fix.
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.


Github
Docs
Video

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant challenge, as traditional methods fail to capture the complexity and variability of real-world interactions. We introduce IntellAgent, a scalable, open-source multi-agent framework designed to evaluate conversational AI systems comprehensively. IntellAgent automates the creation of diverse, synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. This innovative approach provides fine-grained diagnostics, addressing the limitations of static and manually curated benchmarks with coarse-grained metrics. IntellAgent represents a paradigm shift in evaluating conversational AI. By simulating realistic, multi-policy scenarios across varying levels of complexity, IntellAgent captures the nuanced interplay of agent capabilities and policy constraints. Unlike traditional methods, it employs a graph-based policy model to represent relationships, likelihoods, and complexities of policy interactions, enabling highly detailed diagnostics. IntellAgent also identifies critical performance gaps, offering actionable insights for targeted optimization. Its modular, open-source design supports seamless integration of new domains, policies, and APIs, fostering reproducibility and community collaboration. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.
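
As a rough illustration of the idea (not IntellAgent's API), the evaluation pipeline can be pictured as: sample a multi-policy scenario by walking a policy graph, simulate a user-agent dialogue for it, and record which policies the tested agent violated:

# Illustrative sketch: policy-graph scenario sampling + simulated evaluation.
import random

policy_graph = {                      # edges = policies that plausibly co-occur
    "refund_limit": ["verify_identity", "escalate_to_human"],
    "verify_identity": ["refund_limit"],
    "escalate_to_human": ["refund_limit"],
}

def sample_scenario(start: str, length: int = 2) -> list[str]:
    """Random walk over the policy graph -> a multi-policy test scenario."""
    scenario, node = [start], start
    for _ in range(length):
        node = random.choice(policy_graph[node])
        scenario.append(node)
    return scenario

def simulate_dialogue(agent, scenario: list[str]) -> dict:
    """Placeholder user simulator: one probing request per policy in the scenario."""
    violations = [p for p in scenario if not agent(f"User request touching {p}")]
    return {"scenario": scenario, "violations": violations}

def toy_agent(user_turn: str) -> bool:   # stand-in for the system under test
    return "escalate" not in user_turn   # pretend it mishandles escalations

for r in [simulate_dialogue(toy_agent, sample_scenario("refund_limit")) for _ in range(3)]:
    print(r)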

Paper: https://arxiv.org/pdf/2501.11067v1.pdf

Code: https://github.com/plurai-ai/intellagent

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks, yet their application to specialized domains remains challenging due to the need for deep expertise. Retrieval-augmented generation (RAG) has emerged as a promising solution to customize LLMs for professional fields by seamlessly integrating external knowledge bases, enabling real-time access to domain-specific expertise during inference. Despite its potential, traditional RAG systems, based on flat text retrieval, face three critical challenges: (i) complex query understanding in professional contexts, (ii) difficulties in knowledge integration across distributed sources, and (iii) system efficiency bottlenecks at scale. This survey presents a systematic analysis of Graph-based Retrieval-Augmented Generation (GraphRAG), a new paradigm that revolutionizes domain-specific LLM applications. GraphRAG addresses traditional RAG limitations through three key innovations: (i) graph-structured knowledge representation that explicitly captures entity relationships and domain hierarchies, (ii) efficient graph-based retrieval techniques that enable context-preserving knowledge retrieval with multihop reasoning ability, and (iii) structure-aware knowledge integration algorithms that leverage retrieved knowledge for accurate and logical coherent generation of LLMs. In this survey, we systematically analyze the technical foundations of GraphRAG and examine current implementations across various professional domains, identifying key technical challenges and promising research directions.
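
A minimal, toy sketch of the GraphRAG retrieval step described above: store facts as a graph, pull the multi-hop neighbourhood of entities mentioned in the query, and pack the resulting triples into the prompt (entity linking and the final LLM call are left as stubs):

# Toy GraphRAG retrieval: multi-hop neighbourhood of query entities -> prompt.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("aspirin", "COX-1", relation="inhibits")
kg.add_edge("COX-1", "prostaglandins", relation="produces")
kg.add_edge("prostaglandins", "inflammation", relation="drives")

def retrieve_subgraph(query: str, hops: int = 2) -> list[str]:
    """Return triples within `hops` of any entity literally mentioned in the query."""
    seeds = [n for n in kg.nodes if n.lower() in query.lower()]
    keep = set(seeds)
    for seed in seeds:
        keep |= set(nx.single_source_shortest_path_length(kg, seed, cutoff=hops))
    sub = kg.subgraph(keep)
    return [f"{u} --{d['relation']}--> {v}" for u, v, d in sub.edges(data=True)]

question = "How does aspirin affect inflammation?"
facts = retrieve_subgraph(question)
prompt = "Answer using these facts:\n" + "\n".join(facts) + "\n\nQuestion: " + question
print(prompt)   # feed `prompt` to the LLM of your choice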

Paper: https://arxiv.org/pdf/2501.13958v1.pdf

Code: https://github.com/deep-polyu/awesome-graphrag

Datasets: DBpedia - MetaQA - MINTAKA

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
LLMs can see and hear without any training

30 Jan 2025 · Kumar Ashutosh, Yossi Gandelsman, Xinlei Chen, Ishan Misra, Rohit Girdhar ·

We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach, to imbue multimodal capabilities into your favorite LLM. Leveraging their innate ability to perform multi-step reasoning, MILS prompts the LLM to generate candidate outputs, each of which are scored and fed back iteratively, eventually generating a solution to the task. This enables various applications that typically require training specialized models on task-specific data. In particular, we establish a new state-of-the-art on emergent zero-shot image, video and audio captioning. MILS seamlessly applies to media generation as well, discovering prompt rewrites to improve text-to-image generation, and even edit prompts for style transfer! Finally, being a gradient-free optimization approach, MILS can invert multimodal embeddings into text, enabling applications like cross-modal arithmetic.
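
The loop the abstract describes is simple enough to sketch. Note that llm_propose and clip_score below are placeholders standing in for an instruction-tuned LLM and an off-the-shelf scorer such as CLIP; this is not the authors' implementation:

# MILS-style generate-score-feedback loop (placeholders, not the official code).
def llm_propose(task: str, feedback: list[tuple[str, float]], n: int = 4) -> list[str]:
    # Placeholder: a real call would prompt an LLM with the task plus the
    # scored candidates from the previous iteration.
    return [f"candidate {i} informed by {len(feedback)} scored examples" for i in range(n)]

def clip_score(candidate: str, image_path: str) -> float:
    # Placeholder: a real scorer would be e.g. CLIP image-text similarity.
    return float(len(candidate) % 7)

def mils_style_caption(image_path: str, rounds: int = 3) -> str:
    feedback: list[tuple[str, float]] = []
    best = ("", float("-inf"))
    for _ in range(rounds):
        candidates = llm_propose("caption this image", feedback)
        scored = [(c, clip_score(c, image_path)) for c in candidates]
        feedback = sorted(scored, key=lambda x: x[1], reverse=True)[:3]  # fed back next round
        if feedback[0][1] > best[1]:
            best = feedback[0]
    return best[0]

print(mils_style_caption("example.jpg"))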

Paper: https://arxiv.org/pdf/2501.18096v1.pdf

Code: https://github.com/facebookresearch/mils

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
🌟 Mixture-of-Mamba: A Method for More Efficient Multimodal Language Models.

Mixture-of-Mamba is an experimental architecture that makes multimodal models (those that work on different types of data, such as text, images, and speech) more efficient and faster. It uses the idea of sparsity to reduce the amount of computation while maintaining high model performance.

Sparsity is an approach in which the model focuses only on the most important data and ignores the rest, much like a person reading a text: we do not dwell on every letter, but grasp the key words and phrases. In ML, sparsity reduces computational cost, speeds up training and inference, and can improve quality.


Mixture-of-Mamba adds modality-aware sparsity to Mamba blocks, dynamically selecting modality-specific weights in each input-processing component of the block.

Unlike MoE-Mamba, where sparsity is applied only to the MLP layers, Mixture-of-Mamba modifies the Mamba block structure directly: modality-specific parameterization is applied to the input, intermediate, and output projections, while the convolutional layers and state transitions remain shared.
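
An illustrative PyTorch sketch of the modality-specific projection idea (not the official implementation): each projection keeps one weight matrix per modality, and every token is routed to the weights of its own modality, while shared components would stay untouched:

# Illustrative modality-aware projection layer.
import torch
import torch.nn as nn

class ModalityAwareLinear(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, n_modalities: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim_in, dim_out) for _ in range(n_modalities))

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim_in); modality_ids: (batch, seq) with values in [0, n_modalities)
        out = torch.zeros(*x.shape[:-1], self.experts[0].out_features, device=x.device)
        for m, proj in enumerate(self.experts):
            mask = modality_ids == m
            if mask.any():
                out[mask] = proj(x[mask])   # only tokens of modality m use these weights
        return out

layer = ModalityAwareLinear(16, 32)
x = torch.randn(2, 6, 16)
ids = torch.randint(0, 2, (2, 6))            # 0 = text token, 1 = image token
print(layer(x, ids).shape)                   # torch.Size([2, 6, 32])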

Mixture-of-Mamba is trained in 3 modal modes: Transfusion (alternating text and continuous image tokens with diffusion loss), Chameleon (alternating text and discrete image tokens), and an extended tri-modal environment with speech inclusion.

In the Transfusion setting, Mixture-of-Mamba matches the image loss while using only 34.76% of the total compute (FLOPs) at the 1.4B model scale. In the Chameleon setting, it matches the image loss with 42.50% of the FLOPs and the text loss with 65.40% of the FLOPs. In the tri-modal setting, it matches the speech loss with 24.80% of the FLOPs at the 1.4B model scale.

▶️ A practical implementation of the architecture is available in the project's GitHub repository.

📌 Licensing: MIT License.

🟡 Arxiv
🖥 GitHub

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

28 Jan 2025 · Li Yin, Zhangyang Wang ·

Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows. Yet, prompt engineering -- the task of crafting textual inputs to effectively direct LLMs -- remains difficult and labor-intensive, particularly for complex pipelines that combine multiple LLM calls with functional operations like retrieval and data formatting. We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE) that extends textual gradient-based methods (such as Text-Grad) to multi-component, potentially cyclic LLM architectures. Implemented within the AdalFlow library, LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine LLM to generate feedback-akin to textual gradients -- that guide iterative prompt updates. Unlike prior single-node approaches, LLM-AutoDiff inherently accommodates functional nodes, preserves time-sequential behavior in repeated calls (e.g., multi-hop loops), and combats the "lost-in-the-middle" problem by isolating distinct sub-prompts (instructions, formats, or few-shot examples). It further boosts training efficiency by focusing on error-prone samples through selective gradient computation. Across diverse tasks, including single-step classification, multi-hop retrieval-based QA, and agent-driven pipelines, LLM-AutoDiff consistently outperforms existing textual gradient baselines in both accuracy and training cost. By unifying prompt optimization through a graph-centric lens, LLM-AutoDiff offers a powerful new paradigm for scaling and automating LLM workflows - mirroring the transformative role that automatic differentiation libraries have long played in neural network research.
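
A rough sketch of the textual-gradient idea: a frozen "backward engine" LLM critiques the current prompt on the failed examples, and an update step rewrites the prompt using that critique. Both LLM calls below are placeholders, not AdalFlow's actual API:

# Textual-gradient prompt optimization, heavily simplified.
def forward_llm(prompt: str, x: str) -> str:
    return "wrong answer"                      # placeholder task model

def backward_llm(prompt: str, failures: list[tuple[str, str, str]]) -> str:
    return "Be explicit about the expected output format."  # textual "gradient"

def apply_update(prompt: str, critique: str) -> str:
    return prompt + " " + critique             # placeholder prompt editor

def optimize_prompt(prompt: str, data: list[tuple[str, str]], steps: int = 3) -> str:
    for _ in range(steps):
        failures = [(x, forward_llm(prompt, x), y)
                    for x, y in data if forward_llm(prompt, x) != y]
        if not failures:                       # selective gradient computation:
            break                              # only error cases drive the update
        prompt = apply_update(prompt, backward_llm(prompt, failures))
    return prompt

print(optimize_prompt("Answer the question.", [("2+2?", "4")]))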

Paper: https://arxiv.org/pdf/2501.16673v2.pdf

Code: https://github.com/sylphai-inc/adalflow

Dataset: HotpotQA

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
⭐️ The first open-source analogue of OpenAI's Deep Research.

An implementation of an AI researcher that keeps searching for information on a user's request until the system is confident it has collected all the necessary data.

To do this, it uses several services:

- SERPAPI: performs the Google searches.
- Jina: retrieves and extracts the contents of web pages.
- OpenRouter (default model: anthropic/claude-3.5-haiku): the LLM that generates search queries, evaluates page relevance, and tracks context.

🟢 Features
- Iterative research cycle: the system iteratively refines its search queries (a minimal sketch of this loop is shown below).
- Asynchronous processing: searching, web parsing, and context evaluation run in parallel for speed.
- Duplicate filtering: links are aggregated and deduplicated on each cycle so the same information is not processed twice.
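
A minimal asyncio sketch of that research loop, with the SERPAPI, Jina, and OpenRouter calls stubbed out as placeholders rather than real API requests:

# Iterative deep-research loop, heavily simplified.
import asyncio

async def google_search(query: str) -> list[str]:          # SERPAPI stand-in
    return [f"https://example.com/{query.replace(' ', '-')}"]

async def fetch_page(url: str) -> str:                      # Jina stand-in
    return f"contents of {url}"

async def llm(prompt: str) -> str:                          # OpenRouter stand-in
    return "DONE" if "enough information" in prompt else "next query"

async def deep_research(question: str, max_cycles: int = 3) -> list[str]:
    seen_urls: set[str] = set()
    notes: list[str] = []
    query = question
    for _ in range(max_cycles):
        urls = [u for u in await google_search(query) if u not in seen_urls]
        seen_urls.update(urls)                               # duplicate filtering
        pages = await asyncio.gather(*(fetch_page(u) for u in urls))  # parallel fetch
        notes.extend(pages)
        decision = await llm(f"Given {len(notes)} notes, do we have enough information?")
        if decision == "DONE":
            break
        query = decision                                     # refined follow-up query
    return notes

print(asyncio.run(deep_research("open source deep research agents")))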

▪️ Github
▪️ Google Colab

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation

31 Jan 2025 · Javier Solís-García, Belén Vega-Márquez, Juan A. Nepomuceno, Isabel A. Nepomuceno-Chamorro ·

Multivariate Time Series Imputation (MTSI) is crucial for many applications, such as healthcare monitoring and traffic management, where incomplete data can compromise decision-making. Existing state-of-the-art methods, like Denoising Diffusion Probabilistic Models (DDPMs), achieve high imputation accuracy; however, they suffer from significant computational costs and are notably time-consuming due to their iterative nature. In this work, we propose CoSTI, an innovative adaptation of Consistency Models (CMs) for the MTSI domain. CoSTI employs Consistency Training to achieve comparable imputation quality to DDPMs while drastically reducing inference times, making it more suitable for real-time applications. We evaluate CoSTI across multiple datasets and missing data scenarios, demonstrating up to a 98% reduction in imputation time with performance on par with diffusion-based models. This work bridges the gap between efficiency and accuracy in generative imputation tasks, providing a scalable solution for handling missing data in critical spatio-temporal systems.
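
To illustrate where the speed-up comes from at inference time (a toy sketch, not CoSTI's trained networks): a diffusion imputer runs many denoising steps per missing value, while a consistency-style imputer maps noise to the imputed values in one or a few calls:

# Toy comparison: iterative diffusion-style vs. single-call consistency-style imputation.
import numpy as np

rng = np.random.default_rng(0)
series = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
mask = np.isnan(series)

def denoise(x, t):            # placeholder one-step denoiser
    return x * 0.9

def consistency_fn(x):        # placeholder consistency model: noise -> clean values
    return np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), series[~mask])

# Diffusion-style imputation: iterate over many timesteps.
x = rng.normal(size=mask.sum())
for t in range(50, 0, -1):
    x = denoise(x, t)
ddpm_filled = series.copy()
ddpm_filled[mask] = x

# Consistency-style imputation: a single call.
cm_filled = series.copy()
cm_filled[mask] = consistency_fn(rng.normal(size=mask.sum()))

print(ddpm_filled)
print(cm_filled)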

Paper: https://arxiv.org/pdf/2501.19364v1.pdf

Code: https://github.com/javiersgjavi/costi

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
RIGNO: A Graph-based framework for robust and accurate operator learning for PDEs on arbitrary domains

31 Jan 2025 · Sepehr Mousavi, Shizheng Wen, Levi Lingsch, Maximilian Herde, Bogdan Raonić, Siddhartha Mishra ·

Learning the solution operators of PDEs on arbitrary domains is challenging due to the diversity of possible domain shapes, in addition to the often intricate underlying physics. We propose an end-to-end graph neural network (GNN) based neural operator to learn PDE solution operators from data on point clouds in arbitrary domains. Our multi-scale model maps data between input/output point clouds by passing it through a downsampled regional mesh. Many novel elements are also incorporated to ensure resolution invariance and temporal continuity. Our model, termed RIGNO, is tested on a challenging suite of benchmarks, composed of various time-dependent and steady PDEs defined on a diverse set of domains. We demonstrate that RIGNO is significantly more accurate than neural operator baselines and robustly generalizes to unseen spatial resolutions and time instances.
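
A hedged sketch of the encode-process-decode pattern behind GNN-based neural operators like RIGNO: sample a coarser "regional" node set from the point cloud, pass messages there, and map back to the original points. Graph construction and the message function are simplified stand-ins, not the paper's architecture:

# Encode-process-decode over a point cloud via a downsampled regional node set.
import torch

def knn_edges(src: torch.Tensor, dst: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Edges connecting every dst node to its k nearest src nodes."""
    d = torch.cdist(dst, src)                      # (n_dst, n_src)
    idx = d.topk(k, largest=False).indices         # (n_dst, k)
    dst_idx = torch.arange(dst.size(0)).repeat_interleave(k)
    return torch.stack([idx.reshape(-1), dst_idx]) # (2, n_dst*k): src -> dst

def aggregate(values: torch.Tensor, edges: torch.Tensor, n_dst: int) -> torch.Tensor:
    """Mean of incoming source values per destination node."""
    out = torch.zeros(n_dst, values.size(1))
    count = torch.zeros(n_dst, 1)
    out.index_add_(0, edges[1], values[edges[0]])
    count.index_add_(0, edges[1], torch.ones(edges.size(1), 1))
    return out / count.clamp(min=1)

points = torch.rand(200, 2)                        # arbitrary-domain point cloud
u = torch.sin(points.sum(dim=1, keepdim=True))     # toy PDE field sampled on it
regional = points[torch.randperm(200)[:32]]        # downsampled regional mesh

encode = knn_edges(points, regional)               # points -> regional
decode = knn_edges(regional, points)               # regional -> points

h = aggregate(u, encode, regional.size(0))                               # encode
h = h + aggregate(h, knn_edges(regional, regional), regional.size(0))    # process
u_out = aggregate(h, decode, points.size(0))                             # decode
print(u_out.shape)                                 # torch.Size([200, 1])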

Paper: https://arxiv.org/pdf/2501.19205v1.pdf

Code: https://github.com/camlab-ethz/rigno

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT