Huawei Ascend 910C Post-Trains DeepSeek V4-Pro

Editor J

A Huawei-led team post-trained DeepSeek's 1.6T-parameter V4-Pro on 1,000+ Ascend 910C chips, a sign that Chinese silicon can now handle training workloads, though no benchmarks were shared.

A Huawei-led Chinese research team has post-trained a massive AI model using only domestic hardware. The group used a cluster of more than 1,000 Huawei Ascend 910C chips to complete the post-training of DeepSeek's 1.6-trillion-parameter V4-Pro model.

The project was first disclosed in a June 5 social media post by the Shenzhen municipal government. According to the South China Morning Post (SCMP), the initiative was a joint effort between Huawei, the Shenzhen Loop Area Institute, the Harbin Institute of Technology (Shenzhen), and the Shenzhen Research Institute of Big Data. The achievement marks a notable step for China's AI self-reliance, as training has long been the weakest link for Chinese silicon.

Breaking Away From Nvidia GPUs

In China, domestic AI accelerators have primarily excelled at inference, the relatively lightweight task of processing prompts and generating outputs from already-trained models. Training, by contrast, repeatedly passes data forward and backward through the network to update its weights, and that far heavier workload has remained heavily dependent on Nvidia GPUs.
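A widely used rule of thumb from the scaling-law literature (not a figure from the Shenzhen announcement) puts rough numbers on that gap. For a model with N parameters active per token, and ignoring attention terms that grow with context length:

```latex
C_{\text{inference}} \approx 2N \ \text{FLOPs per token}, \qquad
C_{\text{training}} \approx \underbrace{2N}_{\text{forward}} + \underbrace{4N}_{\text{backward}} = 6N
\quad\Rightarrow\quad
\frac{C_{\text{training}}}{C_{\text{inference}}} \approx 3
```

By that estimate, training costs about three times as much compute per token as inference, before counting the gradient synchronization between chips that training also requires.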

For this project, the team conducted full-parameter post-training, meaning every weight within the model was updated rather than relying on a thin, external adapter layer. A Shenzhen government description cited by the SCMP compared the process to road infrastructure. If inference is a 'one-way road where a query goes in and an answer comes out,' this run added 'complex flyovers and loops that multiplied the compute and communication load several times over.'
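To make the distinction concrete, here is a minimal Python sketch counting trainable weights for a single square weight matrix under full-parameter training versus a LoRA-style low-rank adapter, one common kind of "thin" adapter. The article does not say which adapter method the team chose not to use, and the dimension and rank below are hypothetical illustration values, not V4-Pro's.

```python
# Hypothetical comparison of trainable parameters for ONE weight matrix.
# Full-parameter training updates every entry; a LoRA-style adapter trains
# two low-rank factors and leaves the original matrix frozen.


def full_trainable(d_out: int, d_in: int) -> int:
    """Every weight in the d_out x d_in matrix receives a gradient update."""
    return d_out * d_in


def lora_trainable(d_out: int, d_in: int, rank: int) -> int:
    """Only factors B (d_out x rank) and A (rank x d_in) are trained."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size, NOT V4-Pro's actual dimension
    r = 16    # hypothetical adapter rank
    full = full_trainable(d, d)
    lora = lora_trainable(d, d, r)
    print(f"full-parameter: {full:,} trainable weights")
    print(f"LoRA adapter:   {lora:,} trainable weights ({lora / full:.2%} of full)")
```

With these placeholder sizes the adapter trains under 1% of the matrix's weights. In full-parameter training, every weight also carries its own gradient and, with Adam-style optimizers, two moment estimates, which helps explain why memory and inter-chip communication grow so sharply.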

The cluster of more than 1,000 Ascend chips completed over 1,500 training iterations without interruption, according to the Shenzhen government. Post-training is the alignment stage that follows initial pre-training, teaching a model to follow instructions, adhere to safety guardrails, and execute specific tasks.

DeepSeek V4-Pro and the Huawei Ascend 910C


DeepSeek's V4-Pro model, unveiled in April, is the lab's largest model to date. It features a Mixture-of-Experts (MoE) architecture with 1.6 trillion parameters, activating approximately 49 billion parameters per token and supporting a context window of 1 million tokens. According to Tom's Hardware, its pre-training corpus exceeds 32 trillion tokens.
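The Mixture-of-Experts idea behind those numbers can be illustrated with a toy top-k router in Python. This is a generic sketch, not DeepSeek's actual routing scheme: the eight scalar "experts" and the router scores are hypothetical, and only the 49-billion-of-1.6-trillion ratio comes from the reported figures.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and renormalise
    their gate weights; every other expert is skipped for this token."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Toy MoE layer on a scalar input: only the routed experts are evaluated."""
    return sum(g * experts[i](x) for i, g in route_top_k(router_logits, k))


if __name__ == "__main__":
    # Eight hypothetical "experts", each scaling its input by 1..8.
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # hypothetical scores
    print(route_top_k(logits, k=2))  # experts 4 and 1 are selected
    print(moe_forward(1.0, experts, logits, k=2))

    # Reported V4-Pro figures: ~49B active out of 1.6T total parameters.
    active_fraction = 49e9 / 1.6e12
    print(f"active per token: {active_fraction:.1%}")  # about 3.1%
```

Because the router evaluates only k experts per token, compute per token tracks the active parameter count rather than the total. In typical MoE transformers the active share also includes components every token uses, such as attention layers, so it is not simply k divided by the number of experts.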

The Huawei Ascend 910C is the company's flagship AI accelerator. In early DeepSeek evaluations, this dual-die processor delivered approximately 60% of the inference performance of Nvidia's H100. Reports indicate the Ascend 910C cluster achieved a model FLOPs utilization (MFU) of over 30% and a 14% improvement in key operator efficiency.
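Model FLOPs utilization is commonly defined as the training FLOPs a model actually performs per second divided by the cluster's theoretical peak. The announcement did not publish how its figure was computed, so the Python sketch below assumes the widely used 6N FLOPs-per-token estimate for training, with N the active parameter count; the throughput and per-chip peak values are hypothetical placeholders, not Ascend 910C specifications.

```python
def model_flops_utilization(tokens_per_second: float,
                            active_params: float,
                            num_chips: int,
                            peak_flops_per_chip: float) -> float:
    """Approximate MFU for a training run.

    Assumes ~6 * N FLOPs per token (forward + backward), where N is the
    number of parameters active per token; attention FLOPs that scale
    with sequence length are ignored for simplicity.
    """
    achieved = 6.0 * active_params * tokens_per_second  # model FLOP/s
    peak = num_chips * peak_flops_per_chip              # theoretical FLOP/s
    return achieved / peak


if __name__ == "__main__":
    # HYPOTHETICAL inputs except active_params; none were disclosed.
    mfu = model_flops_utilization(
        tokens_per_second=5.0e5,     # hypothetical cluster-wide throughput
        active_params=49e9,          # ~49B active parameters per token (reported)
        num_chips=1000,              # "more than 1,000" chips, rounded down
        peak_flops_per_chip=490e12,  # hypothetical peak FLOP/s per chip
    )
    print(f"MFU: {mfu:.1%}")  # 30.0% with these placeholder inputs
```

An MFU of 30% therefore means the cluster sustained about 30% of its theoretical peak on model arithmetic, with the gap attributable to communication, memory traffic, and other overheads.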

V4-Pro is the first DeepSeek model optimized for the Ascend architecture from its inception. The Huawei Ascend 910C, once confined to inference workloads, has now been used to post-train the model itself. This comes as DeepSeek continues to gain global traction, recently topping a U.S. app-spending tracker.

Export Controls and the Push for Domestic AI Self-Reliance

The initiative is driven by tightening U.S. semiconductor export controls. With access to high-end Nvidia GPUs restricted, Chinese technology companies are racing to migrate training workloads onto domestic AI chips. That push reflects a wider drive for self-reliance, a goal the Shenzhen government invoked when it said the project would 'help enhance the self-reliance of China's AI industry chain.'

However, industry analysts advise caution about reading this as full self-reliance, as the announcement lacked concrete benchmark data. Tom's Hardware noted that critical details—such as the total duration of the training run, direct performance comparisons with Nvidia hardware, and the hardware efficiency of the 1,000-chip cluster—were not disclosed.

Furthermore, the track record of domestic hardware in training remains mixed. Last August, reports surfaced that DeepSeek failed to complete a single training run for its next-generation R2 model on Ascend hardware, despite on-site support from Huawei engineers. The effort was reportedly hindered by hardware instability, slow interconnect speeds, and optimization gaps in Huawei's CANN software stack compared to Nvidia's CUDA. DeepSeek subsequently reverted to Nvidia hardware for training while limiting Ascend to inference tasks.

The extent to which this V4-Pro post-training run resolved these technical bottlenecks remains unverified by public data. DeepSeek has not commented on the Shenzhen government's announcement. While Chinese silicon has demonstrably reached the threshold of model training, whether domestic AI chips can deliver genuine self-reliance for China remains to be seen.
