🚀 Day 0: Warming up for #OpenSourceWeek!
We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency.
These humble building blocks in our online service have been documented, deployed and battle-tested in production.
As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey.
Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡️ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
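For readers new to the idea, here is a minimal pure-Python sketch of the paged-KV-cache bookkeeping the post mentions (block size 64). It shows only the concept; FlashMLA's actual CUDA kernels and data layout differ.

```python
BLOCK_SIZE = 64

class PagedKVCache:
    """Toy bookkeeping for a paged KV cache: tokens live in fixed-size
    physical blocks, and a per-sequence block table maps logical token
    positions to (block, offset)."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token, allocating a block on overflow."""
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                # current block full (or none yet)
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def locate(self, seq_id, pos):
        """Map a logical position to (physical_block, offset_in_block)."""
        return self.block_tables[seq_id][pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for _ in range(100):                       # a variable-length sequence: 100 tokens
    cache.append_token("seq0")
block, offset = cache.locate("seq0", 70)   # token 70 -> 2nd block, offset 6
```

Because blocks are fixed-size and indirected through the table, sequences of any length share one physical pool without fragmentation, which is what makes variable-length batches cheap to serve.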
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
🔗 GitHub: github.com/deepseek-ai/DeepEP
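A toy, single-process sketch of what the dispatch/combine all-to-all does in expert parallelism (plain Python lists stand in for GPU ranks; all names and the layout are illustrative, not DeepEP's API):

```python
NUM_RANKS = 4
EXPERTS_PER_RANK = 2   # expert e lives on rank e // EXPERTS_PER_RANK

def expert_fn(expert_id, x):
    """Stand-in expert: scales its input (a real expert is an MLP)."""
    return (expert_id + 1) * x

def dispatch(tokens, topk):
    """All-to-all dispatch: send each token to the rank owning each of
    its routed experts, tagged with its source rank and index."""
    recv = [[] for _ in range(NUM_RANKS)]
    for src, (vals, routes) in enumerate(zip(tokens, topk)):
        for i, (x, experts) in enumerate(zip(vals, routes)):
            for e in experts:
                recv[e // EXPERTS_PER_RANK].append((src, i, e, x))
    return recv

def compute_and_combine(recv, shapes):
    """Run experts locally, then reverse all-to-all: sum partial outputs
    back at each token's source rank."""
    out = [[0.0] * n for n in shapes]
    for inbox in recv:
        for src, i, e, x in inbox:
            out[src][i] += expert_fn(e, x)
    return out

tokens = [[1.0, 2.0], [3.0], [], []]       # per-rank token values
topk   = [[(0, 5), (2,)], [(7,)], [], []]  # routed expert ids per token
recv = dispatch(tokens, topk)
out = compute_and_combine(recv, [len(t) for t in tokens])
```

The communication pattern is exactly this grouping-by-destination and summing-by-source; DeepEP's contribution is doing it at line rate over NVLink/RDMA with FP8 payloads and overlap-friendly scheduling.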
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡️ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependencies, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
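To illustrate the fine-grained scaling idea behind FP8 GEMMs, here is a toy sketch: quantize each small block of a vector with its own scale (a crude integer quantizer stands in for FP8 e4m3), then accumulate the dot product with both scales applied per block. This is the concept only, not DeepGEMM's kernels.

```python
LEVELS = 127  # pretend the low-precision format holds integers in [-LEVELS, LEVELS]

def quantize_blocks(vec, block=2):
    """Quantize each block with its own scale; return (ints, per-block scales)."""
    q, scales = [], []
    for start in range(0, len(vec), block):
        chunk = vec[start:start + block]
        s = max(abs(v) for v in chunk) / LEVELS or 1.0
        scales.append(s)
        q.extend(round(v / s) for v in chunk)
    return q, scales

def dot_dequant(qa, sa, qb, sb, block=2):
    """Dot product: integer multiply per block, dequantize with both scales."""
    total = 0.0
    for start in range(0, len(qa), block):
        partial = sum(x * y for x, y in
                      zip(qa[start:start + block], qb[start:start + block]))
        total += partial * sa[start // block] * sb[start // block]
    return total

a = [0.001, 0.002, 100.0, 200.0]   # wildly different magnitudes per block
b = [1.0, 1.0, 1.0, 1.0]
qa, sa = quantize_blocks(a)
qb, sb = quantize_blocks(b)
approx = dot_dequant(qa, sa, qb, sb)
exact = sum(x * y for x, y in zip(a, b))
```

With one global scale, the tiny first block would quantize to all zeros; per-block scales keep every block's relative precision, which is why fine-grained scaling makes FP8 viable for training-grade GEMMs.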
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 https://github.com/deepseek-ai/DualPipe
✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 https://github.com/deepseek-ai/eplb
📊 Profiling data for analyzing computation-communication overlap in V3/R1.
🔗 https://github.com/deepseek-ai/profile-data
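As a hypothetical sketch of the expert-parallel load-balancing problem EPLB addresses: give extra replicas to the hottest experts, then greedily pack replicas onto GPUs so the heaviest GPU carries as little load as possible. EPLB's actual algorithm (e.g. its hierarchical grouping) may differ.

```python
import heapq

def balance(expert_load, num_gpus, num_slots):
    """expert_load: per-expert token counts. num_slots: total replicas to
    place (>= number of experts). Returns per-GPU (load, gpu, members)."""
    # 1) Replicate: repeatedly split the expert with the largest per-replica load.
    replicas = [1] * len(expert_load)
    heap = [(-load, e) for e, load in enumerate(expert_load)]
    heapq.heapify(heap)
    for _ in range(num_slots - len(expert_load)):
        _, e = heapq.heappop(heap)
        replicas[e] += 1
        heapq.heappush(heap, (-expert_load[e] / replicas[e], e))
    # 2) Pack: heaviest replica first, always onto the currently lightest GPU.
    shards = sorted(((expert_load[e] / replicas[e], e)
                     for e in range(len(expert_load))
                     for _ in range(replicas[e])), reverse=True)
    gpus = [(0.0, g, []) for g in range(num_gpus)]
    heapq.heapify(gpus)
    for share, e in shards:
        load, g, members = heapq.heappop(gpus)
        members.append((e, share))
        heapq.heappush(gpus, (load + share, g, members))
    return sorted(gpus)

# One expert receives 9x the traffic of the others; replication tames it.
placement = balance(expert_load=[900, 100, 80, 60], num_gpus=2, num_slots=6)
```

Here the hot expert gets three replicas, so the worst GPU carries 600 units of load instead of the 900 it would carry with one replica per expert.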
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on the GraySort benchmark in a 25-node cluster
⚡ 40+ GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
📥 3FS → github.com/deepseek-ai/3FS
⛲ Smallpond - data processing framework on 3FS → github.com/deepseek-ai/smallpond
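Back-of-envelope per-node shares implied by the aggregate figures above, assuming an even spread across nodes:

```python
# 6.6 TiB/s aggregate read across 180 nodes -> per-node read bandwidth
read_tib_s, read_nodes = 6.6, 180
per_node_read_gib_s = read_tib_s * 1024 / read_nodes        # ~37.5 GiB/s per node

# 3.66 TiB/min on GraySort across 25 nodes -> per-node sort throughput
graysort_tib_min, sort_nodes = 3.66, 25
per_node_sort_gib_s = graysort_tib_min * 1024 / 60 / sort_nodes  # ~2.5 GiB/s per node
```

Roughly 37.5 GiB/s of sustained reads per node is in the territory of many NVMe SSDs plus a 400 Gbps-class RDMA NIC running near line rate, which is the "full bandwidth" claim in concrete terms.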
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k input/output tokens per second per H800 node
🚀 Cost profit margin 545%
💡 We hope this week's insights offer value to the community and contribute to our shared AGI goals.
📖 Deep Dive: bit.ly/4ihZUiO
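What those statistics mean arithmetically (margin and rates as stated above; the daily-volume figure assumes a node sustains the quoted rates around the clock):

```python
# Cost profit margin is (revenue - cost) / cost, so 545% means
# revenue is about 6.45x cost.
margin = 5.45
revenue_over_cost = 1 + margin

# Daily token volume implied by the per-node rates above.
in_tok_s, out_tok_s = 73_700, 14_800
daily_tokens_per_node = (in_tok_s + out_tok_s) * 86_400   # tokens/day per H800 node
```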
🚀 DeepSeek-V3-0324 is out now!
🔹 Major boost in reasoning performance
🔹 Stronger front-end development skills
🔹 Smarter tool-use capabilities
✅ For non-complex reasoning tasks, we recommend using V3 — just turn off “DeepThink”
🔌 API usage remains unchanged
📜 Models are now released under the MIT License, just like DeepSeek-R1!
🔗 Open-source weights: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
🚀 DeepSeek-R1-0528 is here!
🔹 Improved benchmark performance
🔹 Enhanced front-end capabilities
🔹 Reduced hallucinations
🔹 Supports JSON output & function calling
✅ Try it now: https://chat.deepseek.com
🔌 No change to API usage — docs here: https://api-docs.deepseek.com/guides/reasoning_model
🔗 Open-source weights: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
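As a hedged sketch of what the newly supported function calling looks like on the wire: the request body below follows the common OpenAI-compatible tool schema; the `get_weather` tool is purely illustrative, and the linked API docs are authoritative for the exact format.

```python
import json

request = {
    "model": "deepseek-reasoner",
    "messages": [
        {"role": "user", "content": "What's the weather in Hangzhou?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",                 # hypothetical tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
body = json.dumps(request)   # what an HTTP client would POST
```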
Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀
🧠 Hybrid inference: Think & Non-Think — one model, two modes
⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time than DeepSeek-R1-0528
🛠️ Stronger agent skills: Post-training boosts tool use and multi-step agent tasks
Try it now — toggle Think/Non-Think via the "DeepThink" button: chat.deepseek.com
API Update ⚙️
🔹 deepseek-chat → non-thinking mode
🔹 deepseek-reasoner → thinking mode
🧵 128K context for both
🔌 Anthropic API format supported: api-docs.deepseek.com/guides/anthropic_api
✅ Strict Function Calling supported in Beta API: api-docs.deepseek.com/guides/anthropic_api
🚀 More API resources, smoother API experience
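A hedged sketch of an Anthropic-format request body aimed at the endpoint described above. The field layout follows Anthropic's Messages API; the model name comes from this post, and the linked guide is authoritative for the base URL and auth headers.

```python
import json

payload = {
    # "deepseek-chat" -> non-thinking mode; "deepseek-reasoner" -> thinking mode
    "model": "deepseek-chat",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize the V3.1 release in one sentence."}
    ],
}
body = json.dumps(payload)   # what an Anthropic-format client would POST
```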
Tools & Agents Upgrades 🧰
📈 Better results on SWE / Terminal-Bench
🔍 Stronger multi-step reasoning for complex search tasks
⚡️ Big gains in thinking efficiency
Model Update 🤖
🔹 V3.1 Base: continued pretraining on 840B tokens for long-context extension on top of V3
🔹 Tokenizer & chat template updated — new tokenizer config: https://huggingface.co/deepseek-ai/DeepSeek-V3.1/blob/main/tokenizer_config.json
🔗 V3.1 Base Open-source weights: huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
🔗 V3.1 Open-source weights: huggingface.co/deepseek-ai/DeepSeek-V3.1
Pricing Changes 💳
🔹 New pricing takes effect & off-peak discounts end on Sep 5, 2025, 16:00 (UTC)
🔹 Until then, APIs follow current pricing
📝 Pricing page: https://api-docs.deepseek.com/quick_start/pricing/
🚀 DeepSeek-V3.1 → DeepSeek-V3.1-Terminus
The latest update builds on V3.1’s strengths while addressing key user feedback.
✨ What’s improved?
🌐 Language consistency: fewer CN/EN mix-ups & no more random chars.
🤖 Agent upgrades: stronger Code Agent & Search Agent performance.
📊 DeepSeek-V3.1-Terminus delivers more stable & reliable outputs across benchmarks compared to the previous version.
👉 Available now on: App / Web / API
🔗 Open-source weights here: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
Thanks to everyone for your feedback. It drives us to keep improving and refining the experience! 🚀
🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!
✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!
⚡️ Efficiency Gains
🤖 DSA achieves fine-grained sparse attention with minimal impact on output quality — boosting long-context performance & reducing compute cost.
📊 Benchmarks show V3.2-Exp performs on par with V3.1-Terminus.
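A conceptual sketch of fine-grained sparse attention: each query attends only to its top-k highest-scoring keys instead of all of them, so compute scales with k rather than the full context length. This illustrates the general idea only, not DSA's actual selection mechanism.

```python
import math

def sparse_attention(q, keys, values, k=2):
    """Single-query attention restricted to the top-k keys by dot product."""
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    topk = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only; unselected keys contribute nothing.
    exps = {i: math.exp(scores[i]) for i in topk}
    z = sum(exps.values())
    return [sum(exps[i] / z * values[i][d] for i in topk)
            for d in range(len(values[0]))]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out = sparse_attention(q, keys, values, k=2)   # only keys 0 and 1 contribute
```

The quality question such schemes must answer, and the benchmark claim above speaks to, is whether the dropped low-score keys would have mattered; with a sharp score distribution the top-k output closely tracks full attention.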