Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback
Paper: https://arxiv.org/pdf/2412.15838v2.pdf
Code: https://github.com/pku-alignment/align-anything
Dataset: LLaVA-Bench
https://news.1rj.ru/str/DataScienceT
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems.
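To make the workflow concrete, here is a minimal Python sketch of the agentic search loop described above. It is not the authors' code: generate_until, web_search, and reason_in_documents are hypothetical stand-ins for the LLM backend, the search API, and the refinement module, and the special search tokens are placeholders.

```python
SEARCH_OPEN, SEARCH_CLOSE = "<|begin_search|>", "<|end_search|>"  # placeholder tokens

def search_o1(question, generate_until, web_search, reason_in_documents,
              max_searches=5):
    """Reason step by step, pausing to retrieve whenever the model asks."""
    context = question
    for _ in range(max_searches):
        # Generate until the model either finishes or emits a search request.
        chunk = generate_until(context, stop=[SEARCH_CLOSE])
        context += chunk
        if SEARCH_OPEN not in chunk:
            break  # the reasoning chain completed without needing more knowledge
        query = chunk.rsplit(SEARCH_OPEN, 1)[-1].strip()
        docs = web_search(query)
        # Reason-in-Documents: distill the verbose documents into a short,
        # reasoning-ready note before injecting it back into the chain.
        note = reason_in_documents(question, context, docs)
        context += f"\n[retrieved knowledge] {note}\n"
    return context
```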
Paper: https://arxiv.org/pdf/2501.05366v1.pdf
Code: https://github.com/sunnynexus/search-o1
Datasets: Natural Questions - TriviaQA - MATH - HotpotQA - GPQA - Bamboogle
#Search_o1 #LargeReasoningModels #AgenticRAG #ReasonInDocuments #DynamicKnowledgeRetrieval #ComplexReasoning #ScienceMathCoding #OpenDomainQA #TrustworthyAI #IntelligentSystems #python
https://news.1rj.ru/str/DataScienceT
Click-Calib: A Robust Extrinsic Calibration Method for Surround-View Systems
Surround-View System (SVS) is an essential component in Advanced Driver Assistance System (ADAS) and requires precise calibrations. However, conventional offline extrinsic calibration methods are cumbersome and time-consuming as they rely heavily on physical patterns. Additionally, these methods primarily focus on short-range areas surrounding the vehicle, resulting in lower calibration quality in more distant zones. To address these limitations, we propose Click-Calib, a pattern-free approach for offline SVS extrinsic calibration. Without requiring any special setup, the user only needs to click a few keypoints on the ground in natural scenes. Unlike other offline calibration approaches, Click-Calib optimizes camera poses over a wide range by minimizing reprojection distance errors of keypoints, thereby achieving accurate calibrations at both short and long distances. Furthermore, Click-Calib supports both single-frame and multiple-frame modes, with the latter offering even better results. Evaluations on our in-house dataset and the public WoodScape dataset demonstrate its superior accuracy and robustness compared to baseline methods.
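The core optimization is simple enough to sketch. The Python snippet below is a hedged illustration of the idea, not the authors' implementation: it assumes a pinhole stand-in (the paper works with fisheye models) and hypothetical data structures, and exploits the fact that the same ground keypoint clicked in two overlapping cameras should unproject to the same spot on the z = 0 ground plane.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def unproject_to_ground(pixel, K_inv, rvec, tvec):
    """Cast a pixel ray into the world and intersect it with the z = 0 ground.
    Pinhole stand-in; tvec is the camera center in world coordinates."""
    R = Rotation.from_rotvec(rvec).as_matrix()      # camera-to-world rotation
    ray = R @ (K_inv @ np.array([pixel[0], pixel[1], 1.0]))
    s = -tvec[2] / ray[2]                           # scale putting the point at z = 0
    return (tvec + s * ray)[:2]

def residuals(params, matches, K_inv):
    """Distances between ground unprojections of matched clicks."""
    poses = params.reshape(-1, 6)                   # per camera: rvec (3) + tvec (3)
    errs = []
    for cam_i, px_i, cam_j, px_j in matches:        # same point clicked in cams i and j
        p_i = unproject_to_ground(px_i, K_inv[cam_i], poses[cam_i, :3], poses[cam_i, 3:])
        p_j = unproject_to_ground(px_j, K_inv[cam_j], poses[cam_j, :3], poses[cam_j, 3:])
        errs.extend(p_i - p_j)
    return np.asarray(errs)

# Jointly refine all camera extrinsics from a handful of clicked keypoints:
# result = least_squares(residuals, initial_poses.ravel(), args=(matches, K_inv))
```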
Paper: https://arxiv.org/pdf/2501.01557v2.pdf
Code: https://github.com/lwangvaleo/click_calib
Dataset: WoodScape
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI
https://news.1rj.ru/str/DataScienceT
Forwarded from Machine Learning with Python
Machine learning and deep learning
✅ @Machine_learn
Large Language Model Git
🔺 https://news.1rj.ru/str/deep_learning_proj
🚀 Boost Your IT Exam Prep with SPOTO's FREE Study Materials! 🎉
💡 Ready to Pass Your IT Exam?
SPOTO is here to help you succeed! Get SPOTO FREE IT study materials to jumpstart your certification journey. Whether you're preparing for #Cisco, #AWS, #PMP, #Python, #Excel, #Google, #Microsoft, or other certifications, we've got you covered.
🔗🎒Download Free IT Certs Exam E-book: https://bit.ly/4fJSoLP
🔗👩💻Test Your IT Skills for Free: https://bit.ly/3PoKH39
🔗📝Download Free Cloud Certs Study Materials: https://bit.ly/4gI4KWk
🔗📲Contact for 1v1 IT Certs Exam Help: https://wa.link/k0vy3x
🌐📚 JOIN IT Study GROUP👇: https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper: https://arxiv.org/pdf/2501.12948v1.pdf
Code:
https://github.com/zhaoolee/garss
https://github.com/deepseek-ai/deepseek-r1
Datasets: MMLU - IFEval - GPQA - MMLU-Pro
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI
https://news.1rj.ru/str/DataScienceT
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
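As an illustration of the auxiliary-loss-free load balancing mentioned above, here is a hedged PyTorch sketch based on a reading of the report, not DeepSeek's code: each expert carries a bias added to its routing score only for top-k selection, and the bias is nudged after each step against the realized load, so no auxiliary loss term enters the objective. The gating here is a simplification (the report normalizes sigmoid affinities over the selected experts).

```python
import torch

def route_with_bias(scores, bias, k, gamma=0.001):
    """scores: [tokens, experts] affinities; bias: [experts] running correction."""
    # Select experts using biased scores, so under-used experts win ties...
    topk = torch.topk(scores + bias, k, dim=-1).indices          # [tokens, k]
    # ...but compute gate weights from the unbiased scores.
    gates = torch.gather(scores.softmax(dim=-1), -1, topk)       # [tokens, k]
    with torch.no_grad():  # the bias update lives outside autograd
        load = torch.zeros_like(bias).scatter_add_(
            0, topk.reshape(-1), torch.ones(topk.numel()))
        bias -= gamma * torch.sign(load - load.mean())           # nudge toward balance
    return topk, gates
```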
Paper: https://arxiv.org/pdf/2412.19437v1.pdf
Code: https://github.com/deepseek-ai/deepseek-v3
Datasets: MMLU - GSM8K
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
Forwarded from Machine Learning with Python
LOOKING FOR A NEW SOURCE OF INCOME?
Average earnings from 100$ a day
Lisa is looking for people who want to earn money. If you are responsible, motivated, and want to change your life, welcome to her channel.
WHAT YOU NEED TO WORK:
1. A phone or computer
2. 15-20 free minutes a day
3. A desire to earn
❗️ Requires 20 people ❗️
Access is available at the link below
👇
https://news.1rj.ru/str/+EWM2hR1d_As0ZDA5
Forwarded from Machine Learning with Python
ChatGPT Cheat Sheet for Business (2025).pdf
8 MB
ChatGPT Cheat Sheet for Business - DataCamp
Unlock the full potential of AI with our comprehensive ChatGPT Cheat Sheet for Business! Tailored specifically for professionals and entrepreneurs, this guide offers actionable insights on leveraging ChatGPT to streamline workflows, enhance customer interactions, and drive business growth. Whether you're a marketing specialist, project manager, or CEO, this cheat sheet is your go-to resource for mastering conversational AI.
From crafting compelling content to automating routine tasks, learn how to harness the power of ChatGPT in real-world business scenarios. With clear examples and step-by-step instructions, you’ll be able to integrate ChatGPT seamlessly into your operations, improving efficiency and innovation.
Don’t miss out on staying ahead of the competition by embracing the future of AI-driven solutions!
#ChatGPT #AIforBusiness #DataCamp #CheatSheet #ConversationalAI #BusinessGrowth #Automation #CustomerEngagement #ContentCreation #EfficiencyBoost #Innovation #FutureOfWork #TechTrends #AIInnovation #DigitalTransformation #BusinessSuccess
https://news.1rj.ru/str/CodeProgrammer
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be straightforwardly trained within the large language model framework, eliminating the need for complex architectural modifications. To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training. Extensive experiments show that JanusFlow achieves comparable or superior performance to specialized models in their respective domains, while significantly outperforming existing unified approaches across standard benchmarks. This work represents a step toward more efficient and versatile vision-language models.
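For readers unfamiliar with rectified flow, here is a minimal PyTorch sketch of the generic training objective (generic rectified flow, not the paper's exact implementation; the model signature is an assumption): sample a point on the straight line between noise and data and regress the predicted velocity onto the constant true velocity.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x_data):
    """One training step of the rectified-flow objective (velocity regression)."""
    noise = torch.randn_like(x_data)
    # Broadcastable per-sample time in [0, 1].
    t = torch.rand(x_data.shape[0], *([1] * (x_data.dim() - 1)), device=x_data.device)
    x_t = (1 - t) * noise + t * x_data     # point on the straight noise-to-data path
    target_v = x_data - noise              # constant ground-truth velocity
    return F.mse_loss(model(x_t, t.flatten()), target_v)
```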
Paper: https://arxiv.org/pdf/2411.07975v1.pdf
Code: https://github.com/deepseek-ai/janus
Datasets: GQA - MMBench - MM-Vet - SEED-Bench
https://news.1rj.ru/str/DataScienceT
🐫 Tülu 3 (what a name) 405B - another release!
An open-source model (and no, it's not a Chinese model) that outperforms DeepSeek-V3 on multiple benchmarks!
Scalable to 405B - with performance on par with GPT-4o, outperforming previous models in the same class.
▪ Blog: https://allenai.org/blog/tulu-3-405B
▪ You can test it here: https://playground.allenai.org/?model=tulu3-405b
▪ Technical report: https://allenai.org/blog/tulu-3-technical
▪ Hugging Face: https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
#llm #ml #ai #opensource
https://news.1rj.ru/str/DataScienceT
🔥🔥🔥 SmolVLM developers have released open-source code for training SmolVLM from scratch on 256 H100s!
Inspired by DeepSeek-R1, they have open-sourced the complete code for training the model, along with the weights!
You can now train any of the SmolVLMs or create your own VLMs!
Starting training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh
▪ Code: https://github.com/huggingface/smollm/tree/main/vision
▪ SmolVLM: https://github.com/huggingface/smollm/tree/main
#SmolVLM #llm #opensource #ml #ai
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Paper: https://arxiv.org/pdf/2501.17811v1.pdf
Code: https://github.com/deepseek-ai/janus
Datasets: ImageNet - GQA - MM-Vet
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper: https://arxiv.org/pdf/2401.02954v1.pdf
Code: https://github.com/deepseek-ai/deepseek-llm
Dataset: AlignBench
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT
DeepSeek-VL: Towards Real-World Vision-Language Understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. This design choice ensures the model's ability to capture critical semantic and detailed information across various visual tasks. We posit that a proficient Vision-Language Model should, foremost, possess strong language abilities. To ensure the preservation of LLM capabilities during pretraining, we investigate an effective VL pretraining strategy by integrating LLM training from the beginning and carefully managing the competitive dynamics observed between vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks. We have made both 1.3B and 7B models publicly accessible to foster innovations based on this foundation model.
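The hybrid encoder design is easy to picture with a short sketch. The PyTorch snippet below is an illustration of the idea, not DeepSeek's code: it assumes both encoders emit the same number of tokens, with a low-resolution branch carrying semantics and a high-resolution branch at 1024 x 1024 carrying fine detail, and the fused tokens projected into the LLM's embedding space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridVisionEncoder(nn.Module):
    """Fuse a low-res semantic branch with a high-res detail branch.
    Assumes both encoders return token sequences of the same length N."""
    def __init__(self, semantic_encoder: nn.Module, detail_encoder: nn.Module,
                 sem_dim: int, det_dim: int, llm_dim: int):
        super().__init__()
        self.semantic_encoder = semantic_encoder   # e.g. a ViT run at 384x384
        self.detail_encoder = detail_encoder       # e.g. a conv backbone at 1024x1024
        self.proj = nn.Linear(sem_dim + det_dim, llm_dim)

    def forward(self, image_1024: torch.Tensor) -> torch.Tensor:
        low = F.interpolate(image_1024, size=384, mode="bilinear", align_corners=False)
        sem = self.semantic_encoder(low)           # [B, N, sem_dim] semantic tokens
        det = self.detail_encoder(image_1024)      # [B, N, det_dim] detail tokens
        return self.proj(torch.cat([sem, det], dim=-1))  # [B, N, llm_dim] for the LLM
```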
Paper: https://arxiv.org/pdf/2403.05525v2.pdf
Code: https://github.com/deepseek-ai/deepseek-vl
Datasets: MMLU - GSM8K - HellaSwag
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek
https://news.1rj.ru/str/DataScienceT