China News .online

What Motivates DeepSeek to Develop Its Own Storage Solutions?

3 April 2025

Source: https://www.cnr.cn/tech/techgd/20250302/t20250302_527086582.shtml

Recently, news that DeepSeek has open-sourced its 3FS parallel file system shook up the AI infrastructure sector. The new storage system can move data at 6.6 TB per second (roughly the equivalent of transferring 700 high-definition movies every second), automatically tunes resource allocation for better performance, and is claimed to narrow the gap between domestic chips and international brands by as much as 15%.

This is not just a routine technology update. When training Llama 3 requires ingesting up to 15 PB of data (equivalent to more than two million hours of continuous 4K video), global AI labs are suddenly realizing that large-model training efficiency is not determined by GPU computing power alone: storage systems are becoming a major bottleneck.

As AI computational capabilities soar, storage is emerging as an invisible battlefield.
In early 2024, a leading AI company suffered from insufficient bandwidth in its data center, leaving more than two thousand A100 GPUs below 40% utilization for extended periods and causing daily losses exceeding $1 million. The incident exposed the 'bucket effect' of the AI era: when GPU computation operates on microsecond timescales, a sudden storage glitch can crash an entire training task, erasing weeks of computational results in seconds.

This might be one reason why DeepSeek decided to develop its own storage system.
Data shows that optimizing the storage system can shorten the training cycle of a 175B-parameter model by up to 30%, saving millions of dollars. In inference scenarios, if even 10% of requests encounter abnormal storage latency, P99 response times can exceed service-level-agreement (SLA) limits; this is often why certain autonomous-driving companies experience sudden performance degradation.
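The tail-latency point above can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real system: the latency values (5 ms for a normal request, 500 ms when a request hits a storage stall) and the 100 ms SLA threshold are illustrative assumptions, not figures from the article. It shows why a modest fraction of storage-delayed requests is enough to push the P99 past an SLA.

```python
import random

# Toy latency model: each request hits a storage stall with some
# probability; otherwise it completes quickly. All numbers here are
# assumed for illustration only.
random.seed(42)

NORMAL_MS, STALLED_MS, SLA_MS = 5.0, 500.0, 100.0

def sample_latency(p_stall: float) -> float:
    """One request: stalls on storage with probability p_stall."""
    return STALLED_MS if random.random() < p_stall else NORMAL_MS

def p99(samples: list[float]) -> float:
    """99th-percentile latency of a list of samples."""
    ordered = sorted(samples)
    return ordered[int(0.99 * len(ordered))]

healthy = [sample_latency(0.001) for _ in range(100_000)]  # 0.1% stalls
degraded = [sample_latency(0.10) for _ in range(100_000)]  # 10% stalls

print(f"P99 with 0.1% stalls: {p99(healthy):.0f} ms (SLA {SLA_MS:.0f} ms)")
print(f"P99 with 10%  stalls: {p99(degraded):.0f} ms, SLA breached")
```

With 10% of requests stalled, the slow requests occupy the entire top decile of the latency distribution, so the 99th percentile lands squarely on the stalled value; the mean barely moves, which is why averages hide this failure mode.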

The essence of DeepSeek's decision to develop its own 3FS lies in recognizing the critical role storage architecture plays in AI applications. According to a report on advanced AI computing power, at similar GPU scales, differences in storage performance can cause training cycles to differ by several multiples. Behind the race for computational capacity is an escalating, largely invisible competition over storage efficiency; self-developed solutions are thus becoming increasingly important.

Five leading vendors compete fiercely in the field of AI storage—can domestic companies take the lead?
Compared with traditional workloads, large models demand far larger volumes of data and longer training cycles.
Speeding up training on these massive datasets requires rapid data loading. Typically, hundreds or even thousands of GPUs form a computing cluster performing efficient parallel computation, which demands highly concurrent input/output (I/O). Training datasets often consist of billions of small files and require bandwidths of several terabytes per second, placing heavy demands on the storage system's data management.
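The concurrency requirement above can be sketched in a few lines. This is a generic illustration of the pattern, not DeepSeek's loader: reading billions of small files strictly one at a time leaves the storage system idle between requests, so training loaders keep many reads in flight at once. The file names and 1 KB sample size below are synthetic stand-ins.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_dataset(root: str, n_files: int = 64) -> list[str]:
    """Create a synthetic dataset of small files (stand-in for samples)."""
    paths = []
    for i in range(n_files):
        path = os.path.join(root, f"sample_{i:04d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(1024))  # 1 KB "training sample"
        paths.append(path)
    return paths

def read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as root:
    paths = make_dataset(root)
    # 16 reads in flight at once instead of one sequential loop: this is
    # the high-concurrency I/O pattern the text describes, scaled down.
    with ThreadPoolExecutor(max_workers=16) as pool:
        samples = list(pool.map(read_file, paths))
    print(f"loaded {len(samples)} files, {sum(map(len, samples))} bytes")
```

Real training stacks push this much further (asynchronous I/O, prefetch pipelines, RDMA), but the principle is the same: per-file latency is hidden by keeping many requests outstanding against the storage system.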

Among the few companies capable of meeting these requirements, IBM has successfully transitioned its HPC products to AI applications, while DDN stands out globally for key metrics such as read/write throughput. However, owing to proprietary technology and dedicated hardware, building on DDN can be prohibitively expensive.
DeepSeek's 3FS is a newly open-sourced system that compares well with established storage brands, with aggregate read bandwidth averaging 6.6 TB/s per cluster, or about 36.7 GB/s per node.
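The two figures quoted above are mutually consistent, which a quick back-of-envelope check confirms. The article does not state the cluster's node count; the value below is derived from its own two numbers and is an inference, not a reported fact.

```python
# Derive the cluster size implied by the article's bandwidth figures:
# 6.6 TB/s aggregate read bandwidth at ~36.7 GB/s per node.
cluster_tb_s = 6.6      # aggregate read bandwidth, TB/s
per_node_gb_s = 36.7    # average per-node read bandwidth, GB/s

implied_nodes = cluster_tb_s * 1000 / per_node_gb_s  # 1 TB/s = 1000 GB/s
print(f"implied cluster size: ~{implied_nodes:.0f} nodes")  # ~180 nodes
```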

Another domestic vendor, JD Cloud's Yunhai, also excels in this field, with single-node read/write speeds of up to 95/60 GB/s respectively, further narrowing the gap between Chinese and international storage solutions.
Compared with DeepSeek's solution, JD Cloud's product offers broader applicability while maintaining high performance, and has been adapted to more than twenty mainstream large models, including DeepSeek's own.

In summary, as AI applications delve deeper into complex scenarios requiring massive datasets, demands on storage systems are increasing. Domestic products like 3FS and Yunhai demonstrate superior capabilities in handling such data volumes. Whether these domestic solutions can lead the pack remains to be seen.
(Translator's Note: This article represents commercial information published by China Radio International Network; its content does not reflect our network’s viewpoint but is provided for reference only.)
