SC24 Shows AI Drive Prompts Higher Capacity SSDs
I recently attended SC24 in Atlanta, GA. This conference focuses on high performance computing and, of course, AI of various types was a big driver of attendance as well as of exhibitors at the conference. In addition to increasing demand for GPUs and other processing devices, AI also drives demand for network performance within the data center as well as for storage and memory capacity and bandwidth. I had a chance to talk with storage device and system companies before and during SC24, but more on that below.
In recent weeks SK hynix announced an HBM3E high bandwidth memory with up to a 16-high DRAM stack and memory capacities up to 48GB. HBM is a favored DRAM package that is often placed in chiplet packages adjacent to GPUs and other high performance processors. The prior limit in HBM stacking was 12-high stacks. These devices are said to provide an 18% performance improvement in AI training and a 32% improvement in inference versus 12-high DRAM stack devices.
Samsung as well as Micron are also making HBM devices. SK hynix said during the introduction that 16-high DRAM stacks will be more common in next generation HBM4 devices. These are expected to be in mass production by 2026.
In solid state storage devices, several companies have announced high-capacity SSDs for data center applications. SK hynix’s Solidigm unit launched a 122TB PCIe SSD using quad-level cell NAND flash, QLC flash. The company said that this doubles the storage capacity of its earlier 61.44TB SSD. The D5-P5336 drive is intended for AI training and inference workloads that need high storage density, minimal size and lower power consumption.
The company says this SSD consumes up to 84% less power in network attached storage, NAS, applications compared to HDD + TLC flash, improves storage power density with 3.4X more terabytes per watt compared to 30TB TLC NAND and enables up to 4PB of storage per one rack unit. The drive is sampling now and should be more widely available in early 2025.
Phison, a long-time manufacturer of SSD controllers and of SSDs sold under other companies' brands, has recently started to sell SSDs under its own brand. It also recently announced a PCIe Gen 5 SSD with up to 128TB of capacity using 2Tb 3D QLC NAND chips. Note that the press release title says 128TB capacity while the text in the press release says 122.88TB capacity for this SSD.
The Pascari D205V SSD is available in single and dual port PCIe Gen 5 versions with an NVMe 2.0 interface and includes Opal supported encryption. It also has 128 namespaces and supports 0.3 drive writes per day, DWPD, with a 2.5M hour mean time between failures, MTBF. Sequential read and write performance are 14.6 GB/s and 3.3 GB/s, and random read and write performance are 3M IOPS at 4K bytes per block and 35K IOPS at 16K bytes per block. The product is also intended for AI-driven data processing. This drive is expected to ship in early Q2 2025 in U.2 and E3.L form factors.
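To put the 0.3 DWPD rating in perspective, the short Python sketch below converts it into a daily write budget and compares that with the quoted sequential write speed. It is only a back-of-the-envelope illustration: it assumes decimal units (1TB = 1000GB), uses the 122.88TB figure from the press release text, and ignores the warranty period and workload conditions that an endurance rating is normally tied to, which are not stated here. The variable names are my own.

```python
# Figures quoted above for the Pascari D205V.
CAPACITY_TB = 122.88       # capacity given in the press release text
DWPD = 0.3                 # rated drive writes per day
SEQ_WRITE_GB_PER_S = 3.3   # quoted sequential write performance

# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

# Data that can be written each day while staying within the rating.
daily_write_budget_tb = CAPACITY_TB * DWPD

# Time needed to write that much at the full sequential write rate.
seconds_to_use_budget = daily_write_budget_tb * GB_PER_TB / SEQ_WRITE_GB_PER_S
hours_to_use_budget = seconds_to_use_budget / 3600

print(f"Daily write budget: {daily_write_budget_tb:.1f} TB")
print(f"Hours of full-rate sequential writing per day: {hours_to_use_budget:.1f}")
```

Under these assumptions the budget is about 36.9TB per day, which the drive could write in roughly 3.1 hours at full sequential speed. That is consistent with a drive aimed at read-heavy workloads.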
Phison also offers a 61.44TB D200V SSD and a higher endurance 15TB SA50V 2.5” SATA III SSD. Phison announced a partnership with VDURA, formerly Panasas. Phison also announced its PS7161, which it describes as the first PCIe Gen 6 redriver; it is expected to ship in January 2025.
Micron also recently introduced a 60TB SSD, the Micron ION 6550, in E3.S, U.2 and E1.L form factors for data center applications. This is also a PCIe Gen 5 SSD and is in qualification now. Micron said that this product should be used in high-capacity NVMe applications including network data lakes, data ingest, data preparation and checkpointing. It uses 20W of power and offers OCP 2.5 support that includes advanced features such as active state power management, ASPM, which allows an L1 idle state at 4W power.
Micron is using its 8th generation NAND. Micron said that this product enables storage capacities of over 1.2PB per rack unit, that it can support 1 random drive write per day for 16KB random writes, and that it includes advanced security features.
At SC24 I had a chance to talk with Vaughn Stewart, VP of Systems Engineering at VAST. The company was exhibiting and had announced a partnership with the National Center for Supercomputing Applications, NCSA, at the University of Illinois Urbana-Champaign, which will use VAST products in its Harbor home and software file system. The company’s Federal division also announced a partnership with MITRE on its NVIDIA DGX SuperPOD to power the MITRE Federal AI sandbox.
Vaughn told me that 70% of their business is AI related, using QLC and SLC SSD Just a Bunch of Flash, JBOF, tiers, much of it with NVIDIA GPU-based systems. VAST is supporting S3 object storage with CPU-direct using remote direct memory access, RDMA. They, like companies such as Pure Storage, are selling all-flash storage systems.
I spoke with Vaughn about using SSDs versus HDDs for bulk data storage and he said that with data reduction techniques such as compression and deduplication the $/TB cost of all-flash secondary storage for AI applications can approach that of HDDs. At SC24 I also spoke with folks at the WDC and Seagate exhibits and they confirmed that, in fact, HDDs can use the same data reduction techniques as SSDs.
In addition, Seagate announcements on their 30TB Mozaic HAMR HDDs quote a power per TB of 0.32W/TB. The Solidigm 122TB drive specifications quote a 25W operating power consumption (with some lower power modes available), resulting in 0.20W/TB. So, this SSD would have about a 38% advantage in operating power compared to the Mozaic HDD. However, the NAND flash chips used in SSDs also involve much higher energy consumption to produce than HDDs, making the embedded carbon content of SSDs higher than that of HDDs.
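The power comparison above can be reproduced with a few lines of Python. This is a minimal sketch using only the figures quoted in this article; the function and variable names are my own, and real drives will draw different power depending on workload and power mode.

```python
def watts_per_tb(power_w, capacity_tb):
    """Operating power divided by raw capacity."""
    return power_w / capacity_tb


HDD_W_PER_TB = 0.32       # quoted for the 30TB Mozaic HAMR HDD
HDD_CAPACITY_TB = 30.0
SSD_POWER_W = 25.0        # quoted operating power for the 122TB SSD
SSD_CAPACITY_TB = 122.0

# HDD operating power implied by the quoted W/TB figure.
hdd_power_w = HDD_W_PER_TB * HDD_CAPACITY_TB

ssd_w_per_tb = watts_per_tb(SSD_POWER_W, SSD_CAPACITY_TB)

# Fractional power-per-TB advantage of the SSD over the HDD.
ssd_advantage = 1 - ssd_w_per_tb / HDD_W_PER_TB
# Same calculation with the SSD figure rounded to 0.20 W/TB.
ssd_advantage_rounded = 1 - 0.20 / HDD_W_PER_TB

print(f"Implied HDD power: {hdd_power_w:.1f} W")
print(f"SSD power per TB: {ssd_w_per_tb:.3f} W/TB")
print(f"SSD advantage: {ssd_advantage:.1%} (unrounded), "
      f"{ssd_advantage_rounded:.1%} (using 0.20 W/TB)")
```

Using the rounded 0.20W/TB gives 37.5%, which is where the figure of about 38% comes from; carrying the unrounded 0.205W/TB through gives closer to 36%. Either way the SSD's advantage in operating power per terabyte is a bit more than a third.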
4U HDD enclosures used in data centers can hold 24 HDDs with some specialized enclosures able to hold up to 60 HDDs. Sixty 30TB HDDs would provide 1.8 PB in 4U, or 0.45PB/U. This can be compared to the Solidigm claim of 4PB/U, or roughly 9 times more capacity per unit of rack space. This higher density per rack can be important for some applications.
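The rack density arithmetic can be sketched the same way. The code below is only an illustration of the calculation in the paragraph above: it takes the vendor's 4PB per rack unit claim at face value, assumes 1PB = 1000TB, and also works out the more common 24-drive enclosure for comparison. The helper name is my own.

```python
def pb_per_rack_unit(drive_tb, drives_per_enclosure, enclosure_height_u):
    """Raw capacity per rack unit, in PB (1 PB = 1000 TB)."""
    return drive_tb * drives_per_enclosure / 1000 / enclosure_height_u


SSD_CLAIM_PB_PER_U = 4.0   # Solidigm's claim, taken as given

# 30TB HDDs in a specialized 60-drive 4U enclosure and a typical 24-drive one.
hdd_dense = pb_per_rack_unit(30, 60, 4)
hdd_typical = pb_per_rack_unit(30, 24, 4)

ratio_vs_dense = SSD_CLAIM_PB_PER_U / hdd_dense
ratio_vs_typical = SSD_CLAIM_PB_PER_U / hdd_typical

print(f"60-drive 4U: {hdd_dense:.2f} PB/U, SSD claim is {ratio_vs_dense:.1f}x")
print(f"24-drive 4U: {hdd_typical:.2f} PB/U, SSD claim is {ratio_vs_typical:.1f}x")
```

Against the densest HDD enclosures the SSD claim works out to about 8.9 times the capacity per rack unit, and against a 24-drive enclosure the gap is roughly 22 times.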
However, SSD storage capital costs are significantly higher than those of HDDs and will likely remain so with the introduction of HAMR HDDs. Below is an updated chart from Coughlin Associates, Inc. showing raw storage capacity $/GB for enterprise SSDs and HDDs. The chart uses Coughlin Associates data for HDDs; the SSD data comes from a 2020 Blocks and Files article by Chris Mellor, "Enterprise SSDs cost ten times more than nearline disk drives," supplemented by data from Jim Handy of Objective Analysis on enterprise SSD costs per TB.
This chart shows that SSD raw storage capacity costs have been about 10X higher than those of HDDs and, with expected HAMR drive areal density advances, they should remain about 5X higher at least through the decade. I expect that 122TB SSDs should have a cost of about $10,000 each, or about $82/TB, compared to a 30TB HDD, which should cost between $13/TB and $16/TB.
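For readers who want to check the cost per terabyte figures, here is a small Python sketch. The $10,000 SSD price and the $13/TB to $16/TB HDD range are my estimates from the paragraph above, not published prices, and the variable names are illustrative.

```python
SSD_PRICE_USD = 10_000      # estimated price of a 122TB SSD
SSD_CAPACITY_TB = 122
HDD_CAPACITY_TB = 30
HDD_USD_PER_TB_LOW = 13     # estimated range for a 30TB HDD
HDD_USD_PER_TB_HIGH = 16

ssd_usd_per_tb = SSD_PRICE_USD / SSD_CAPACITY_TB

# Implied price range for a single 30TB HDD.
hdd_price_low = HDD_USD_PER_TB_LOW * HDD_CAPACITY_TB
hdd_price_high = HDD_USD_PER_TB_HIGH * HDD_CAPACITY_TB

# How many times more the SSD costs per raw terabyte.
premium_min = ssd_usd_per_tb / HDD_USD_PER_TB_HIGH
premium_max = ssd_usd_per_tb / HDD_USD_PER_TB_LOW

print(f"SSD: ${ssd_usd_per_tb:.0f}/TB")
print(f"30TB HDD: ${hdd_price_low} to ${hdd_price_high} per drive")
print(f"SSD premium per TB: {premium_min:.1f}x to {premium_max:.1f}x")
```

With these estimates the SSD costs roughly 5 to 6 times more per raw terabyte, in line with the chart's projection of about a 5X gap.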
It should be noted that increases in storage capacity density have become harder for both SSDs and HDDs. NAND flash doesn’t get the cost reduction advantage from adding layers that it did in the past, and the last two years have seen little increase in the number of layers in 3D flash. Recent NAND flash capacity increases focus on smaller diameter memory cells, closer memory cells and particularly on using more bits per cell.
However, compared to HDDs, QLC NAND flash has poorer write endurance, so the number of times the drive can be rewritten is limited, although Solidigm says that for their expected workloads these SSDs should last for 5 years. In general, QLC NAND flash is best used for write-seldom, read-many applications, while HDDs may be best for applications requiring frequent writing.
I believe that HDDs will continue to provide cost effective secondary storage for many AI and other data intensive applications, with SSDs used for higher performance primary storage to support the data requirements of the HBMs used with GPUs. However, these high-capacity SSDs may provide some secondary storage where the highest storage density in racks and somewhat lower operating power are the predominant requirements, rather than the price of storage.
QLC SSDs with capacities from 60TB to 122TB and beyond have been announced by several companies, targeted at secondary storage in AI applications. I expect that they may have some impact on HDD secondary storage where storage density and power consumption in a rack have priority over price.