The Oak Ridge Leadership Computing Facility (OLCF) has announced the first details of Orion, the storage subsystem of its upcoming Frontier exascale supercomputer, which is expected to go live in late 2021. As the industry’s first 1.5 ExaFLOPS supercomputer, Frontier will need a very fast storage subsystem, and it looks set to get one: up to 700 petabytes of storage, 75 TB/s of throughput, and 15 billion IOPS (yes, billion) of peak performance.
“To our knowledge, Orion will be the world’s largest and fastest single-namespace POSIX file system,” said Sarp Oral, I/O working group lead for Frontier at OLCF.
The Frontier supercomputer will actually have two storage subsystems: an in-system storage layer providing massive sequential read performance of over 75 TB/s and around 15 billion read IOPS, and a center-wide file system called Orion that offers a whopping 700 PB of capacity.
The Orion global file system layer: 700 PB of capacity at up to 10 TB/s
Since Frontier is based on HPE’s Cray Shasta architecture, its file storage system will largely build on the ClusterStor multi-tier architecture, which uses both PCIe 4.0 NVMe SSDs and traditional hard drives.
Cray ClusterStor machines use AMD EPYC processors and can automatically align data flows in the file system to the workload, moving I/O operations between storage tiers as needed. This tiering makes applications behave as if they were accessing high-performance all-flash arrays, thereby maximizing performance.
On the software side, Orion will use the open-source Lustre parallel file system (used by many supercomputers around the world, including OLCF’s Titan and Jaguar) with ZFS as its back-end volume manager.
Specifically, the center-wide Orion system will have three tiers:
- A metadata tier comprising 480 NVMe SSDs with a total capacity of 10 PB.
- An NVMe flash tier of 5,400 SSDs offering 11.5 PB of capacity, peak read-write speeds of 10 TB/s, and over 2 million random-read I/O operations per second (IOPS).
- An HDD tier based on 47,700 PMR hard drives offering 679 PB of capacity, a peak read speed of 5.5 TB/s, a peak write speed of 4.6 TB/s, and over 2 million random-read IOPS.
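As a quick sanity check, the three tier capacities quoted above do add up to the headline figure of roughly 700 PB. A minimal sketch (the variable names are ours; the numbers are from the article):

```python
# Back-of-the-envelope check on Orion's quoted tier capacities, in petabytes.
metadata_pb = 10.0   # metadata tier: 480 NVMe SSDs
flash_pb = 11.5      # performance tier: 5,400 NVMe SSDs
hdd_pb = 679.0       # capacity tier: 47,700 PMR hard drives

total_pb = metadata_pb + flash_pb + hdd_pb
print(f"Total: {total_pb} PB")  # 700.5 PB, matching the ~700 PB headline figure
```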
OLCF says Orion will have 40 Lustre metadata server nodes and 450 Lustre object storage service (OSS) nodes, for a total of 1,350 object storage targets (OSTs) system-wide. Each OSS node will provide one OST device for performance and two OST devices for capacity. Additionally, Orion will use 160 routing nodes that will deliver peak read-write speeds of 3.2 TB/s to other OLCF resources and platforms.
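The system-wide OST count follows directly from the per-node layout described above: each of the 450 OSS nodes hosts one performance OST plus two capacity OSTs. A minimal sketch of that arithmetic:

```python
# OST count implied by the per-OSS-node layout described in the article.
oss_nodes = 450
osts_per_node = 1 + 2  # one performance (NVMe) OST + two capacity (HDD) OSTs

total_osts = oss_nodes * osts_per_node
print(total_osts)  # 1350, matching the quoted system-wide total
```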
“Orion pushes the boundaries of what is technically possible due to its extreme scale and hybrid hard drive/NVMe nature,” said Dustin Leverman, leader of the high-performance computing storage and archiving group at OLCF. “It’s a complex system, but our experience and best practices will help us create a resource that enables our users to push the boundaries of science using Frontier.”
The in-system storage layer: up to 75 TB/s and 15 billion read IOPS
Frontier’s in-system storage layer comprises SSDs installed directly in the compute nodes and connected to AMD’s EPYC processors over a PCIe Gen 4 interface. These NVMe drives will collectively deliver over 75 TB/s of read throughput, over 35 TB/s of write throughput, and over 15 billion random-read IOPS.
The OLCF has not disclosed the capacity of the in-system storage layer, but as it is node-local storage, don’t expect hundreds of petabytes here.
Overall, the in-system storage layer gives Frontier its whopping 75 TB/s of performance, while the center-wide Orion offers around 700 PB of capacity. This combination of a fast local layer and a tiered central file system provides exactly what a 1.5 EFLOPS machine with roughly 20 MW of power consumption needs: unbeatable storage performance to feed data to its CPUs and GPUs, and the capacity to store the large datasets the supercomputer is designed to process.