ML Research Hub
32.7K subscribers
3.99K photos
226 videos
23 files
4.29K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap

The cold-start problem is a long-standing challenge in recommender systems: accurately modeling new or interaction-limited users and items in order to provide better recommendations. With the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large language models (LLMs) have achieved tremendous success and possess strong capabilities for modeling user and item information, opening new potential for cold-start recommendation. However, the CSR research community still lacks a comprehensive review of the field. In this paper, we therefore take the perspective of the era of large language models and provide a comprehensive review and discussion of the roadmap, related literature, and future directions of CSR. Specifically, we trace the development path of how existing CSR methods utilize information, from content features, graph relations, and domain information to the world knowledge possessed by large language models, aiming to provide new insights for both the research and industrial communities. Related resources on cold-start recommendation are collected and continuously updated for the community at https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation.

Paper: https://arxiv.org/pdf/2501.01945v2.pdf

Code: https://github.com/yuanchenbei/awesome-cold-start-recommendation
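
As a toy illustration of the direction the survey ends on (using LLM world knowledge in place of missing interactions), the sketch below embeds cold items' text descriptions with a pretrained sentence encoder and ranks them against a user profile built from interaction history. This is not from the paper; the sentence-transformers dependency and model name are assumptions.

```python
# Minimal sketch: recommend cold-start items from text alone by embedding
# item descriptions with a pretrained language model (assumed library:
# sentence-transformers) and ranking by similarity to a user profile.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Cold items have no interactions, only textual content features.
cold_items = {
    "i1": "Wireless noise-cancelling headphones",
    "i2": "Organic green tea sampler pack",
    "i3": "Beginner's guide to deep learning",
}
user_history = ["bluetooth speaker", "USB audio interface"]  # warm signals

item_vecs = model.encode(list(cold_items.values()), normalize_embeddings=True)
user_vec = model.encode(user_history, normalize_embeddings=True).mean(axis=0)

scores = item_vecs @ user_vec  # cosine similarity (embeddings are normalized)
ranking = sorted(zip(cold_items, scores), key=lambda x: -x[1])
print(ranking)  # the audio item should rank highest for this user
```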

https://news.1rj.ru/str/DataScienceT 🩷
Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. Despite differing in objectives and methodologies, LLMs and evolutionary algorithms (EAs) share a common pursuit of applicability to complex problems. EAs can provide an optimization framework for further enhancing LLMs under black-box settings, empowering LLMs with flexible global search capacities; conversely, the abundant domain knowledge inherent in LLMs can enable EAs to conduct more intelligent searches. Furthermore, the text-processing and generative capabilities of LLMs can aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced #LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on EA research in the era of #LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners seeking to unlock the full potential of this collaboration in propelling advancements in optimization and artificial intelligence.

Paper: https://arxiv.org/pdf/2401.10034v3.pdf

Code: https://github.com/wuxingyu-ai/llm4ec
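
A minimal sketch of the LLM-enhanced EA avenue: an evolutionary loop in which the LLM plays the variation operator. `llm_mutate` is a hypothetical placeholder for a chat-completion call, stubbed with a random edit so the example runs offline.

```python
# Toy (1+lambda) evolution strategy where the mutation operator would, in
# practice, be an LLM call such as "Improve this solution: {candidate}".
import random

def llm_mutate(candidate: str) -> str:
    # Hypothetical LLM variation operator, stubbed with a random single-char
    # edit so the sketch runs without any API access.
    chars = list(candidate)
    i = random.randrange(len(chars))
    chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz ")
    return "".join(chars)

def fitness(candidate: str, target: str = "hello world") -> int:
    return sum(a == b for a, b in zip(candidate, target))

best = "xxxxxxxxxxx"  # initial individual, same length as the target
for generation in range(500):
    # Keep the parent unless one of the mutated children scores higher.
    children = [llm_mutate(best) for _ in range(8)]
    best = max([best] + children, key=fitness)

print(best, fitness(best))
```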

https://news.1rj.ru/str/DataScienceT ⭐️
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos. Unlike existing multi-modal large language models, which are often limited to specific modalities and tasks, Sa2VA supports a wide range of image and video tasks, including referring segmentation and conversation, with minimal one-shot instruction tuning. Sa2VA combines SAM-2, a foundation video segmentation model, with LLaVA, an advanced vision-language model, and unifies text, image, and video into a shared LLM token space. Using the LLM, Sa2VA generates instruction tokens that guide SAM-2 in producing precise masks, enabling grounded, multi-modal understanding of both static and dynamic visual content. Additionally, we introduce Ref-SAV, an auto-labeled dataset containing over 72k object expressions in complex video scenes, designed to boost model performance. We also manually validate 2k video objects in the Ref-SAV dataset to benchmark referring video object segmentation in complex environments. Experiments show that Sa2VA achieves state-of-the-art results across multiple tasks, particularly in referring video object segmentation, highlighting its potential for complex real-world applications.

Paper: https://arxiv.org/pdf/2501.04001v1.pdf

Code: https://github.com/magic-research/Sa2VA

Dataset: Visual Question Answering (VQA)
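
A schematic sketch (not the released Sa2VA code) of the coupling the abstract describes: the LLM emits a special segmentation token whose hidden state is projected into a prompt embedding that conditions the mask decoder, which is SAM-2's role. Module names and dimensions here are illustrative placeholders.

```python
# Toy bridge from LLM hidden states to mask-decoder prompt embeddings.
import torch
import torch.nn as nn

class SegTokenBridge(nn.Module):
    def __init__(self, llm_dim=4096, prompt_dim=256):
        super().__init__()
        self.proj = nn.Linear(llm_dim, prompt_dim)  # LLM space -> prompt space

    def forward(self, llm_hidden, seg_positions):
        # llm_hidden: (batch, seq, llm_dim); seg_positions: index of the
        # "[SEG]"-style instruction token in each sequence.
        batch_idx = torch.arange(llm_hidden.size(0))
        seg_states = llm_hidden[batch_idx, seg_positions]
        return self.proj(seg_states)  # (batch, prompt_dim), fed to the decoder

bridge = SegTokenBridge()
hidden = torch.randn(2, 32, 4096)               # fake LLM hidden states
prompts = bridge(hidden, torch.tensor([5, 9]))
print(prompts.shape)                            # torch.Size([2, 256])
```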

https://news.1rj.ru/str/DataScienceT ❤️
3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh

3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions, but these scenes often require specialised renderers for effective visualisation. In contrast, point clouds are a widely used 3D representation, compatible with most popular 3D processing software, yet converting 3DGS scenes into point clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible and highly customisable framework capable of transforming 3DGS scenes into dense, high-accuracy point clouds. We sample points probabilistically from each Gaussian, treated as a 3D density function, and additionally threshold new points using the Mahalanobis distance to the Gaussian centre, preventing extreme outliers. The result is a point cloud that closely represents the shape encoded in the 3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours depending on view, and each point may contribute only subtle colour hints to the resulting rendered scene. To avoid spurious or incorrect colours that do not fit the final point cloud, we recalculate Gaussian colours via a customised image rendering approach, assigning each Gaussian the colour of the pixel to which it contributes most across all views. 3DGS-to-PC also supports mesh generation through Poisson Surface Reconstruction, applied to points sampled from predicted surface Gaussians, allowing coloured meshes to be generated from 3DGS scenes without retraining. The package is highly customisable and capable of simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful tool for converting 3DGS data into point-cloud and surface-based formats.

Paper: https://arxiv.org/pdf/2501.07478v1.pdf

Code: https://github.com/lewis-stuart-11/3dgs-to-pc

Dataset: NeRF
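
The core sampling step described above reduces to drawing points from each Gaussian's 3D density and rejecting samples whose Mahalanobis distance to the centre is too large. A minimal NumPy sketch, with illustrative thresholds and shapes:

```python
# Sample points from one Gaussian "splat" and reject extreme outliers by
# Mahalanobis distance to the centre, as the abstract describes.
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian_points(mean, cov, n_points, max_mahalanobis=3.0):
    pts = rng.multivariate_normal(mean, cov, size=n_points)
    cov_inv = np.linalg.inv(cov)
    d = pts - mean
    m2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)  # squared Mahalanobis dist.
    return pts[m2 <= max_mahalanobis**2]

mean = np.array([0.0, 0.0, 0.0])
cov = np.diag([0.05, 0.05, 0.01])      # an anisotropic, disc-like splat
cloud = sample_gaussian_points(mean, cov, 1000)
print(cloud.shape)  # ~97% of samples survive the 3-sigma threshold
```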

https://news.1rj.ru/str/DataScienceT 💚
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain that prevent MLLMs from being practical in real-world applications. The most notable challenge is the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs must be deployed on high-performing cloud servers, which greatly limits their applicability in mobile, offline, energy-sensitive, and privacy-sensitive scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining, and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) strong performance, outperforming GPT-4V-1106, Gemini Pro, and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks; (2) strong #OCR capability and 1.8M-pixel high-resolution #image perception at any aspect ratio; (3) trustworthy behavior with low hallucination rates; (4) multilingual support for 30+ languages; and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: the model sizes needed for usable (e.g., GPT-4V-level) performance are rapidly decreasing, alongside the fast growth of end-side computation capacity. Together these show that GPT-4V-level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.

Paper: https://arxiv.org/pdf/2408.01800v1.pdf

Codes:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v

Datasets: Video-MME
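
For readers who want to try the model, the repository documents a chat interface roughly like the sketch below; the `chat` method comes from the model's remote code (`trust_remote_code=True`), not from the transformers library itself, so check the repo for the current signature.

```python
# Usage sketch based on the repository's documented interface; verify the
# exact arguments against the repo README before relying on it.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image."}]

# Single-turn multimodal chat via the model's custom `chat` method.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
```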

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics

https://news.1rj.ru/str/DataScienceT ❤️
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models.

Paper: https://arxiv.org/pdf/2501.08331v2.pdf

Code:
https://github.com/gowiththeflowpaper/gowiththeflowpaper.github.io
https://github.com/vgenai-netflix-eyeline-research/go-with-the-flow
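
A simplified sketch of the warped-noise idea: carry Gaussian noise from frame to frame by warping it along an optical-flow field, so the noise is temporally correlated while staying roughly Gaussian per frame. The paper's algorithm preserves exact spatial Gaussianity; this toy version uses plain bilinear warping plus renormalization purely for illustration.

```python
# Warp frame-0 noise along a (here, uniform) flow field to get temporally
# correlated noise for frame 1. Bilinear blending shrinks variance, so we
# renormalize; the paper's algorithm avoids this approximation.
import torch
import torch.nn.functional as F

H = W = 64
noise = torch.randn(1, 1, H, W)        # frame-0 Gaussian noise
flow = torch.zeros(1, H, W, 2)
flow[..., 0] = 3.0                     # 3-pixel rightward motion everywhere

# Build a backward-warping grid in normalized [-1, 1] coordinates.
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0) - flow
grid[..., 0] = grid[..., 0] / (W - 1) * 2 - 1
grid[..., 1] = grid[..., 1] / (H - 1) * 2 - 1

warped = F.grid_sample(noise, grid, align_corners=True)
warped = warped / warped.std()         # restore unit variance
print(warped.mean().item(), warped.std().item())
```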

https://news.1rj.ru/str/DataScienceT 🌟
Transformer²: Self-adaptive LLMs

Paper: https://arxiv.org/pdf/2501.06252v2.pdf

Code:
https://github.com/SakanaAI/self-adaptive-llms
https://github.com/codelion/adaptive-classifier

Datasets: GSM8K - HumanEval - MATH - MBPP - TextVQA - OK-VQA - ARC (AI2 Reasoning Challenge)

https://news.1rj.ru/str/DataScienceT ❤️
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

Paper: https://arxiv.org/pdf/2412.00733v3.pdf

Code: https://github.com/fudan-generative-vision/hallo3

https://news.1rj.ru/str/DataScienceT 😮
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

Paper: https://arxiv.org/pdf/2412.15838v2.pdf

Code: https://github.com/pku-alignment/align-anything

Dataset: LLaVA-Bench

https://news.1rj.ru/str/DataScienceT 😱
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving a coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems.

Paper: https://arxiv.org/pdf/2501.05366v1.pdf

Code: https://github.com/sunnynexus/search-o1

Datasets: Natural Questions - TriviaQA - MATH - HotpotQA - GPQA - Bamboogle
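
The control loop the abstract describes can be sketched as below. `llm_generate`, `web_search`, and `reason_in_documents` are hypothetical placeholders, as are the search-delimiter tokens; the real prompts and stopping criteria live in the linked repository.

```python
# Schematic agentic-RAG loop: the reasoning model may emit a search query
# mid-reasoning; retrieved documents are condensed by a Reason-in-Documents
# step before being injected back into the reasoning chain.
SEARCH_OPEN, SEARCH_CLOSE = "<|begin_search|>", "<|end_search|>"

def solve(question, llm_generate, web_search, reason_in_documents, max_turns=5):
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        output = llm_generate(context)      # LRM reasons step by step
        if SEARCH_OPEN not in output:
            return output                   # no query emitted: reasoning done
        # The model asked for external knowledge mid-reasoning.
        query = output.split(SEARCH_OPEN, 1)[1].split(SEARCH_CLOSE, 1)[0]
        docs = web_search(query)
        distilled = reason_in_documents(query, docs)  # condense before injecting
        context += output + f"\nRetrieved knowledge: {distilled}\n"
    return llm_generate(context + "\nFinal answer:")
```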

#Search_o1 #LargeReasoningModels #AgenticRAG #ReasonInDocuments #DynamicKnowledgeRetrieval #ComplexReasoning #ScienceMathCoding #OpenDomainQA #TrustworthyAI #IntelligentSystems #python

https://news.1rj.ru/str/DataScienceT 😱
Click-Calib: A Robust Extrinsic Calibration Method for Surround-View Systems

The Surround-View System (SVS) is an essential component of Advanced Driver Assistance Systems (ADAS) and requires precise calibration. However, conventional offline extrinsic calibration methods are cumbersome and time-consuming because they rely heavily on physical patterns. Additionally, these methods primarily focus on short-range areas surrounding the vehicle, resulting in lower calibration quality in more distant zones. To address these limitations, we propose Click-Calib, a pattern-free approach for offline SVS extrinsic calibration. Without requiring any special setup, the user only needs to click a few keypoints on the ground in natural scenes. Unlike other offline calibration approaches, Click-Calib optimizes camera poses over a wide range by minimizing reprojection distance errors of keypoints, thereby achieving accurate calibration at both short and long distances. Furthermore, Click-Calib supports both single-frame and multi-frame modes, with the latter offering even better results. Evaluations on our in-house dataset and the public WoodScape dataset demonstrate its superior accuracy and robustness compared to baseline methods.

Paper: https://arxiv.org/pdf/2501.01557v2.pdf

Code: https://github.com/lwangvaleo/click_calib

Dataset: WoodScape
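
At its core the method refines camera poses by minimizing reprojection errors of clicked ground keypoints. The toy sketch below does this for a single pinhole camera with simulated clicks; Click-Calib itself jointly optimizes all fisheye SVS cameras, so treat this only as an illustration of the objective.

```python
# Refine a camera pose (rotation vector + translation) by least-squares
# minimization of reprojection error of ground keypoints (z = 0).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # intrinsics
ground_pts = np.array([[2.0, 0.5, 0], [3.0, -0.5, 0],
                       [4.0, 1.0, 0], [5.0, 0.0, 0]])         # clicked points

def project(pose, pts):
    rvec, tvec = pose[:3], pose[3:]
    cam = pts @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

true_pose = np.array([0.1, -0.2, 0.05, 0.0, 1.2, 1.5])
clicks = project(true_pose, ground_pts)      # simulated user clicks

def residuals(pose):
    return (project(pose, ground_pts) - clicks).ravel()

init = true_pose + 0.05                      # perturbed initial guess
result = least_squares(residuals, init)
print(np.abs(result.x - true_pose).max())    # pose error after refinement
```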

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI

https://news.1rj.ru/str/DataScienceT 👩‍💻
DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable: throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.

Paper: https://arxiv.org/pdf/2412.19437v1.pdf

Code: https://github.com/deepseek-ai/deepseek-v3

Datasets: MMLU - GSM8K
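
A toy sketch of the auxiliary-loss-free load-balancing idea the report describes: each expert carries a bias that is added to the routing scores for expert selection only, and the bias is nudged against the observed load. This illustrates the mechanism, not the DeepSeek-V3 implementation; in the real model the gating weights still use the unbiased affinities.

```python
# Bias-based MoE load balancing without an auxiliary loss: push down the
# bias of overloaded experts and push up underloaded ones.
import torch

n_tokens, n_experts, top_k, step = 1024, 8, 2, 0.01
scores = torch.randn(n_tokens, n_experts).softmax(dim=-1)  # router affinities
bias = torch.zeros(n_experts)                              # balancing bias

for _ in range(100):
    topk = torch.topk(scores + bias, top_k, dim=-1).indices  # biased selection
    load = torch.bincount(topk.flatten(), minlength=n_experts).float()
    target = n_tokens * top_k / n_experts
    bias -= step * torch.sign(load - target)   # steer toward uniform load

print(load)  # per-expert token counts, close to uniform after adaptation
```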

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT 😱