ML Research Hub – Telegram
ML Research Hub
32.7K subscribers
3.99K photos
226 videos
23 files
4.29K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Machine learning and deep learning
@Machine_learn

Large Language Model Git repository:

🔺https://news.1rj.ru/str/deep_learning_proj
🚀 Boost Your IT Exam Prep with SPOTO's FREE Study Materials! 🎉

💡 Ready to Pass Your IT Exam?
SPOTO is here to help you succeed! Get SPOTO FREE IT study materials to jumpstart your certification journey. Whether you're preparing for #Cisco, #AWS, #PMP, #Python, #Excel, #Google, #Microsoft, or other certifications, we've got you covered.

🔗🎒Download Free IT Certs Exam E-book: https://bit.ly/4fJSoLP

🔗👩‍💻Test Your IT Skills for Free: https://bit.ly/3PoKH39

🔗📝Download Free Cloud Certs Study Materials: https://bit.ly/4gI4KWk

🔗📲Contact for 1v1 IT Certs Exam Help: https://wa.link/k0vy3x
🌐📚 JOIN IT Study GROUP👇: https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s
DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
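
The auxiliary-loss-free balancing idea is easy to sketch: each expert carries a bias that steers top-k expert selection toward underloaded experts, while the gate weights themselves stay untouched. A minimal PyTorch sketch of the idea; the function names, the sign-based update, and gamma are illustrative choices, not the report's exact recipe:

import torch

def route(scores, bias, k):
    # The bias shifts only WHICH experts are picked, never the mixing
    # weights, so balancing does not distort the token's output.
    idx = torch.topk(scores + bias, k, dim=-1).indices
    gates = torch.gather(scores, -1, idx).softmax(dim=-1)
    return idx, gates

def update_bias(bias, idx, n_experts, gamma=1e-3):
    # After each step, nudge overloaded experts down and underloaded ones up.
    load = torch.bincount(idx.flatten(), minlength=n_experts).float()
    return bias - gamma * torch.sign(load - load.mean())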

Paper: https://arxiv.org/pdf/2412.19437v1.pdf

Code: https://github.com/deepseek-ai/deepseek-v3

Datasets: MMLU, GSM8K

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT 😱
LOOKING FOR A NEW SOURCE OF INCOME?
Average earnings from 100$ a day

Lisa is looking for people who want to earn money. If you are responsible, motivated, and want to change your life, welcome to her channel.

WHAT YOU NEED TO START:
1. A phone or computer
2. 15-20 free minutes a day
3. The desire to earn

❗️ Requires 20 people ❗️
Access is available at the link below
👇

https://news.1rj.ru/str/+EWM2hR1d_As0ZDA5
ChatGPT Cheat Sheet for Business (2025).pdf
8 MB
ChatGPT Cheat Sheet for Business - DataCamp

Unlock the full potential of AI with our comprehensive ChatGPT Cheat Sheet for Business! Tailored specifically for professionals and entrepreneurs, this guide offers actionable insights on leveraging ChatGPT to streamline workflows, enhance customer interactions, and drive business growth. Whether you're a marketing specialist, project manager, or CEO, this cheat sheet is your go-to resource for mastering conversational AI.

From crafting compelling content to automating routine tasks, learn how to harness the power of ChatGPT in real-world business scenarios. With clear examples and step-by-step instructions, you’ll be able to integrate ChatGPT seamlessly into your operations, improving efficiency and innovation.

Don’t miss out on staying ahead of the competition by embracing the future of AI-driven solutions!

#ChatGPT #AIforBusiness #DataCamp #CheatSheet #ConversationalAI #BusinessGrowth #Automation #CustomerEngagement #ContentCreation #EfficiencyBoost #Innovation #FutureOfWork #TechTrends #AIInnovation #DigitalTransformation #BusinessSuccess

https://news.1rj.ru/str/CodeProgrammer ⭐️
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be straightforwardly trained within the large language model framework, eliminating the need for complex architectural modifications. To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training. Extensive experiments show that JanusFlow achieves comparable or superior performance to specialized models in their respective domains, while significantly outperforming existing unified approaches across standard benchmarks. This work represents a step toward more efficient and versatile vision-language models.
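
For intuition on the generation side: rectified flow regresses a velocity field along the straight path between noise and data, which is what lets it drop into an LLM framework without architectural surgery. A minimal standalone sketch of the objective (the model signature here is an assumption for illustration, not JanusFlow's actual module):

import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x1):
    # x1: clean samples; x0: Gaussian noise. Points on the straight path
    # x_t = (1 - t) x0 + t x1 have constant ground-truth velocity x1 - x0.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1
    return F.mse_loss(model(xt, t.flatten()), x1 - x0)  # assumed signature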

Paper: https://arxiv.org/pdf/2411.07975v1.pdf

Code: https://github.com/deepseek-ai/janus

Datasets: GQA, MMBench, MM-Vet, SEED-Bench

https://news.1rj.ru/str/DataScienceT 💚
🐫 Tülu 3 (what a name) 405B - another release!

An open-source model (and no, it's not a Chinese model) that outperforms DeepSeek-V3 on multiple benchmarks!

Scaled to 405B parameters, with performance on par with GPT-4o and ahead of previous models in the same class.

Blog: https://allenai.org/blog/tulu-3-405B
You can test it here: https://playground.allenai.org/?model=tulu3-405b
Technical report: https://allenai.org/blog/tulu-3-technical
Hugging Face: https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
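
If you want to script against the weights instead of the playground, the standard transformers chat pattern should apply. The model id below is assumed from the collection's naming (an 8B sibling is shown, since the 405B needs multi-GPU serving); check the Hugging Face collection for exact ids:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-8B"  # assumed id; verify in the collection
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Tulu 3 recipe in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=80)
print(tok.decode(out[0], skip_special_tokens=True))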

#llm #ml #ai #opensource

https://news.1rj.ru/str/DataScienceT ❤️
⭐️ Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph

🖥 Github: https://github.com/dosonleung/fasttog

📕 Paper: https://arxiv.org/abs/2501.14300v1

https://news.1rj.ru/str/DataScienceT
🔥🔥🔥 The SmolVLM developers have released the open-source code for training SmolVLM from scratch on 256 H100 GPUs!

Inspired by DeepSeek R1, they have open-sourced the complete training code along with the model weights!

You can now train any of the SmolVLMs or create your own VLMs!

Starting training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh

Code: https://github.com/huggingface/smollm/tree/main/vision
SmolVLM: https://github.com/huggingface/smollm/tree/main
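
For quick inference against a released checkpoint (training aside), the usual transformers vision-to-sequence pattern applies; the model id below is taken from the SmolVLM release and the image path is a placeholder:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # released checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("example.jpg")  # placeholder path
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])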

#SmolVLM #llm #opensource #ml #ai
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Paper: https://arxiv.org/pdf/2501.17811v1.pdf

Code: https://github.com/deepseek-ai/janus

Datasets: ImageNet, GQA, MM-Vet

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper: https://arxiv.org/pdf/2401.02954v1.pdf

Code: https://github.com/deepseek-ai/deepseek-llm

Dataset: AlignBench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek


https://news.1rj.ru/str/DataScienceT
DeepSeek-VL: Towards Real-World Vision-Language Understanding

We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.
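
The hybrid-encoder idea reduces to fusing a low-resolution semantic stream with a high-resolution detail stream into one token sequence for the LLM. A schematic PyTorch sketch with stand-in modules (not DeepSeek-VL's exact encoders):

import torch
import torch.nn as nn

class HybridVisionEncoder(nn.Module):
    def __init__(self, sem_enc, det_enc, sem_dim, det_dim, llm_dim):
        super().__init__()
        self.sem_enc = sem_enc  # low-res encoder: coarse semantics
        self.det_enc = det_enc  # high-res encoder: fine detail (e.g. 1024x1024 input)
        self.proj = nn.Linear(sem_dim + det_dim, llm_dim)

    def forward(self, img_lo, img_hi):
        sem = self.sem_enc(img_lo)  # (B, N, sem_dim)
        det = self.det_enc(img_hi)  # (B, N, det_dim), same token count assumed
        return self.proj(torch.cat([sem, det], dim=-1))  # (B, N, llm_dim)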

Paper: https://arxiv.org/pdf/2403.05525v2.pdf

Code: https://github.com/deepseek-ai/deepseek-vl

Datasets: MMLU, GSM8K, HellaSwag

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

🖥 Github: https://github.com/penfever/wildchat-50m

📕 Paper: https://arxiv.org/abs/2501.18511v1

🧠 Dataset: https://huggingface.co/collections/nyu-dice-lab/wildchat-50m-679a5df2c5967db8ab341ab7

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
PaSa: An LLM Agent for Comprehensive Academic Paper Search

We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to ultimately obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark collecting real-world academic queries to assess PaSa performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4 for paraphrased queries, chatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. Model, datasets, and code are available at https://github.com/bytedance/pasa.
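
The headline comparisons are in recall@k; as a reference point, the metric itself is just:

def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant set that appears in the top-k retrieved results.
    rel = set(relevant)
    return len(rel & set(retrieved[:k])) / len(rel) if rel else 0.0

# 3 of 4 relevant papers within the top 20 -> 0.75
print(recall_at_k(["p1", "p2", "p9", "p3"] + ["x"] * 16, ["p1", "p2", "p3", "p7"], 20))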

Paper: https://arxiv.org/pdf/2501.10120v1.pdf

Code: https://github.com/bytedance/pasa

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
LitGPT

20+ high-performance LLMs implemented from scratch, with detailed descriptions, instructions, and fine-tuning and deployment recipes.

Features:
🟢 Models written from scratch
🟢 No abstractions
🟢 Suitable for teaching beginners
🟢 Flash attention
🟢 FSDP
🟢 LoRA, QLoRA, Adapter
🟢 Reduced GPU memory (fp4/8/16/32)
🟢 Scales from 1 to 1000+ GPUs/TPUs
🟢 20+ LLMs

Installation:
pip install 'litgpt[all]'

Example:
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the family goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.


Github
Docs
Video

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://news.1rj.ru/str/DataScienceT
