
DeepSeek Explained Part 5: DeepSeek-V3-Base
Author(s): Nehdiii
Originally published on Towards AI.
This article is the fifth installment of our DeepSeek series and the first to specifically highlight the training methodology of DeepSeek-V3 [1, 2].
As illustrated in the figure below, DeepSeek-V3 undergoes a multi-stage training process, including
1. An initial pre-training stage that results in DeepSeek-V3-Base.
2. Starting from DeepSeek-V3-Base, DeepSeek-R1-Zero and DeepSeek-R1 are trained by employing large-scale Reinforcement Learning, exploring scenarios with and without Supervised Finetuning as a cold start.
3. DeepSeek-R1 is subsequently utilized to generate reasoning data during the Supervised Finetuning stage of DeepSeek-V3, which is followed by an additional RL stage not shown in the figure.
Specifically, this article will focus on the pre-training stage that produces DeepSeek-V3-Base, detailing the key techniques employed to ensure the pre-training is both effective and efficient.
Subsequently, we will cover additional topics such as Grouped Relative Policy Optimization (GRPO) [7], the training processes of DeepSeek-R1-Zero and DeepSeek-R1, and finally revisit the post-training phase of DeepSeek-V3, encompassing both the supervised finetuning stage and the RL stage.
Table of contents for this article:
Background: introduce the key techniques used in the pre-training phase of DeepSeek-V3, including document packing, Fill-in-Middle, and long context extension (a short illustrative sketch follows below).
Pre-training: describe the construction of the pre-training data, emphasize…
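Before going further, here is a toy sketch of two of the data-level techniques named in the table of contents: document packing and Fill-in-Middle (FIM). The Prefix-Suffix-Middle (PSM) layout and the 0.1 FIM rate come from the DeepSeek-V3 technical report; the special-token names, whitespace "tokenizer", and helper functions (to_fim, pack) are illustrative assumptions rather than the actual pipeline.

```python
import random
from typing import Iterable, List

# Toy sketch: Fill-in-Middle (FIM) sample construction in Prefix-Suffix-Middle
# order, followed by document packing into fixed-length training sequences.
# Token names and helpers are illustrative assumptions, not DeepSeek's code.

FIM_BEGIN, FIM_HOLE, FIM_END, EOS = "<|fim_begin|>", "<|fim_hole|>", "<|fim_end|>", "<|eos|>"

def to_fim(tokens: List[str], rng: random.Random, fim_rate: float = 0.1) -> List[str]:
    """With probability fim_rate, rewrite a document into prefix | suffix | middle
    order so the model learns to predict the middle from both sides."""
    if rng.random() >= fim_rate or len(tokens) < 3:
        return tokens + [EOS]  # most documents stay in plain left-to-right form
    i, j = sorted(rng.sample(range(1, len(tokens)), 2))  # cut points: prefix | middle | suffix
    prefix, middle, suffix = tokens[:i], tokens[i:j], tokens[j:]
    return [FIM_BEGIN, *prefix, FIM_HOLE, *suffix, FIM_END, *middle, EOS]

def pack(documents: Iterable[List[str]], seq_len: int) -> List[List[str]]:
    """Concatenate documents back-to-back and cut fixed-length sequences,
    so short documents do not waste context on padding."""
    buffer: List[str] = []
    sequences: List[List[str]] = []
    for doc in documents:
        buffer.extend(doc)
        while len(buffer) >= seq_len:
            sequences.append(buffer[:seq_len])
            buffer = buffer[seq_len:]
    return sequences  # a real pipeline would also handle the leftover buffer

if __name__ == "__main__":
    rng = random.Random(0)
    corpus = ["the quick brown fox jumps over the lazy dog".split(),
              "pack short documents together to fill the context window".split()]
    fim_docs = [to_fim(doc, rng, fim_rate=1.0) for doc in corpus]  # force FIM for the demo
    for seq in pack(fim_docs, seq_len=8):
        print(seq)
```

In the report, the FIM transformation is applied at the document level before packing, so a single packed sequence can mix ordinary left-to-right documents with FIM-formatted ones; the sketch above mirrors that ordering in a simplified form.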