DeepSeek
🌟 Impressive Results of DeepSeek-R1-Lite-Preview Across Benchmarks!
🌟 Inference Scaling Laws of DeepSeek-R1-Lite-Preview
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.
Re DeepSeek has not issued any cryptocurrency. Currently, there is only one official DeepSeek account on the Twitter platform. We will not contact anyone through other accounts. Please stay vigilant and guard against potential scams.
via Twitter @DeepSeek
🎉 Introducing DeepSeek App!
💡 Powered by world-class DeepSeek-V3
🆓 FREE to use with seamless interaction
📱 Now officially available on the App Store, Google Play & major Android markets
🔗 Download now: https://download.deepseek.com/app/
🌟 1/3
via Twitter @DeepSeek
Re ✨ Key Features of DeepSeek App:
🔐 Easy login: E-mail/Google Account/Apple ID
☁️ Cross-platform chat history sync
🔍 Web search & Deep-Think mode
📄 File upload & text extraction
🌟 2/3
via Twitter @DeepSeek
Re ⚠️ Important Notice:
✅ 100% FREE - No ads, no in-app purchases
🛡️ Download only from official channels to avoid being misled
📲 Search "DeepSeek" in your app store or visit our website for direct links
🌟 3/3
via Twitter @DeepSeek
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink at http://chat.deepseek.com today!
🐋 1/n
via Twitter @DeepSeek
Re 🛠️ DeepSeek-R1: Technical Highlights
📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
🐋 4/n
via Twitter @DeepSeek
Re 🌐 API Access & Pricing
⚙️ Use DeepSeek-R1 by setting model=deepseek-reasoner
💰 $0.14 / million input tokens (cache hit)
💰 $0.55 / million input tokens (cache miss)
💰 $2.19 / million output tokens
📖 API guide: https://api-docs.deepseek.com/guides/reasoning_model
🐋 5/n
via Twitter @DeepSeek
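For readers trying the API, a minimal sketch of a DeepSeek-R1 call is shown below. It assumes the standard `openai` Python SDK, a `DEEPSEEK_API_KEY` environment variable, and the separate `reasoning_content` field described in the linked API guide; treat the details as illustrative rather than authoritative.

```python
import os
from openai import OpenAI

# Minimal sketch: DeepSeek exposes an OpenAI-compatible endpoint,
# so the standard openai SDK can be pointed at it directly.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # selects DeepSeek-R1
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

message = response.choices[0].message
print(message.reasoning_content)  # chain of thought (per the API guide)
print(message.content)            # final answer
```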
To prevent any potential harm, we reiterate that @deepseek_ai is our sole official account on Twitter/X.
Any accounts:
- representing us
- using identical avatars
- using similar names
are impersonations.
Please stay vigilant to avoid being misled!
📢 Terminology Correction: DeepSeek-R1’s code and models are released under the MIT License.
🎉 Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience:
• No system prompt
• Temperature: 0.6
• Official prompts for search & file upload: bit.ly/4hyH8np
• Guidelines to prevent the model from bypassing thinking: bit.ly/4gJrhkF
The official DeepSeek deployment runs the same model as the open-source version—enjoy the full DeepSeek-R1 experience! 🚀
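As a concrete illustration of these settings for self-hosted deployments, here is a minimal sketch assuming a recent vLLM install; the distill checkpoint and prompt are stand-ins chosen purely for illustration.

```python
from vllm import LLM, SamplingParams

# A smaller R1 distill checkpoint stands in for the full model here.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# Recommended temperature of 0.6; generous token budget for long reasoning.
params = SamplingParams(temperature=0.6, max_tokens=8192)

# No system prompt, per the recommendation: start directly with the user turn.
messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```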
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
📖 For more details, check out our paper here: https://arxiv.org/abs/2502.11089
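To make the compression/selection split concrete, here is a toy sketch of the idea in plain PyTorch. It is not the paper's algorithm or kernels; all names, shapes, and the mean-pooling choice are illustrative assumptions.

```python
import torch

def select_blocks(q: torch.Tensor, k: torch.Tensor,
                  block_size: int = 64, top_k: int = 16) -> torch.Tensor:
    """Toy NSA-style sketch: coarse compression, then fine selection.

    q: (d,) query vector; k: (t, d) keys. Returns indices of the key
    blocks this query should attend to.
    """
    t, d = k.shape
    usable = t - t % block_size
    # Coarse-grained compression: mean-pool each block of keys.
    block_reprs = k[:usable].view(-1, block_size, d).mean(dim=1)
    # Fine-grained selection: score blocks and keep the most relevant.
    scores = block_reprs @ q
    keep = scores.topk(min(top_k, block_reprs.shape[0])).indices
    return keep  # full attention then runs only inside these blocks
```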
🚀 Day 0: Warming up for #OpenSourceWeek!
We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency.
These humble building blocks in our online service have been documented, deployed and battle-tested in production.
As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey.
Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
🚀 Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡️ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
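As background on the "paged KV cache (block size 64)" point, the sketch below shows how block-table indexing into a shared cache pool works in general. The names are hypothetical and this is not FlashMLA's actual API.

```python
import torch

BLOCK_SIZE = 64  # FlashMLA pages the KV cache in 64-token blocks

# Hypothetical pool of physical cache blocks shared across sequences.
num_blocks, head_dim = 1024, 576
kv_pool = torch.empty(num_blocks, BLOCK_SIZE, head_dim, dtype=torch.bfloat16)

def lookup(block_table: torch.Tensor, pos: int) -> torch.Tensor:
    """Map a logical token position to its physical KV-cache entry.

    block_table holds one physical block id per logical block of a
    sequence; variable-length sequences simply use tables of different
    lengths over the same shared pool.
    """
    physical_block = block_table[pos // BLOCK_SIZE]
    return kv_pool[physical_block, pos % BLOCK_SIZE]
```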
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
🔗 GitHub: github.com/deepseek-ai/DeepEP
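For orientation, the dispatch step that DeepEP's kernels accelerate looks roughly like this sketch built on `torch.distributed`. It assumes an initialized process group and, for simplicity, equal send/receive sizes per rank, which real MoE routing does not guarantee.

```python
import torch
import torch.distributed as dist

def dispatch(tokens_for_rank: list[torch.Tensor]) -> list[torch.Tensor]:
    """Toy MoE dispatch: send each rank the tokens routed to its experts.

    tokens_for_rank[r] holds this rank's tokens destined for rank r.
    DeepEP implements the variable-size, NVLink/RDMA-optimized version
    of this exchange; this is only the conceptual shape of the step.
    """
    received = [torch.empty_like(t) for t in tokens_for_rank]
    dist.all_to_all(received, tokens_for_rank)
    return received  # tokens now sit on the rank that owns their experts
```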
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡️ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
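To illustrate the numerics an FP8 GEMM library has to get right (scaling into E4M3 range, low-precision multiply, higher-precision accumulation), here is a plain-PyTorch reference sketch. It shows the pattern only and is not DeepGEMM's API or anywhere near its tuned Hopper kernels.

```python
import torch

def fp8_gemm_reference(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Reference sketch of a scaled FP8 matmul (illustrative only)."""
    E4M3_MAX = 448.0  # largest normal value representable in FP8 E4M3
    a_scale = a.abs().amax().clamp(min=1e-12) / E4M3_MAX
    b_scale = b.abs().amax().clamp(min=1e-12) / E4M3_MAX
    # Quantize both operands into E4M3's dynamic range.
    a_q = (a / a_scale).to(torch.float8_e4m3fn)
    b_q = (b / b_scale).to(torch.float8_e4m3fn)
    # PyTorch cannot matmul fp8 tensors directly, so this reference
    # dequantizes to FP32 for accumulation and rescales at the end.
    return (a_q.to(torch.float32) @ b_q.to(torch.float32)) * (a_scale * b_scale)
```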