This article is by Matt Lakin, science editor at Oak Ridge National Laboratory.
The world’s fastest supercomputer comes with some assembly required.
Frontier, the country’s first exascale computer system, will not come together as a whole until all of its parts have been delivered to the US Department of Energy’s (DOE) Oak Ridge National Laboratory for installation, under the eyes of the world, on the data center floor inside the Oak Ridge Leadership Computing Facility (OLCF).
Once those components work in harmony as advertised, David Bernholdt and his team can take a quick bow and then get back to work.
Bernholdt and his team are leading efforts to ensure that the myriad compilers, performance tools, debuggers, and other pieces of the Frontier puzzle all fit together to deliver the peak results expected. The feeling can be like that of a pit crew tuning a race car, except this car is the first of its kind and is already heading into the last lap while they work on it.
The HPE Cray system promises computing speeds of more than 1 quintillion calculations per second (more than 10^18, or a billion billion, operations) and could help solve problems across the scientific spectrum when Frontier opens its doors to full user operations in 2022.
“When the DOE started to think seriously about exascale, around 2009, we wondered if it would even be possible to do it,” said Bernholdt. “I always thought we would find a way. Now the technology has evolved to the point that it’s not as scary to build, program, and debug as we thought it would be, and we’re coming to the finish line. It has been quite a journey.”
Getting Frontier to the finish line means, for his team, working with users (scientists and engineers keen to load their codes and run high-speed simulations of everything from weather models and stages of cancer to nuclear reactions and collapsing stars) and with corporate vendors working tirelessly to meet specifications and deliver a one-of-a-kind product. Because the OLCF encourages standards-based programming environments where possible, the team also works closely with various standards organizations to ensure that those standards reflect user needs and make full use of the vendors’ hardware capabilities.
“It’s sometimes intimidating to remember: it’s serial number 1,” said Bernholdt. “They created this machine just for us, and even the best hardware and software won’t be perfect, at least not right out of the box. A big part of what we do is try to understand what users will need from the system in order to use it effectively and help them express that in their codes.
“At the same time, we are working with our vendors and compilers to make sure their solutions implement the standards we need and deliver the necessary performance on the system. We need to make sure that vendors provide sufficiently detailed and granular information about the system in time for software developers to take advantage of it, and we need to make sure that the languages have evolved enough to perform the tasks. There are a lot of moving parts, and they just speed up as we go along.”
Bernholdt’s scientific training prepared him for the mission. He spent his early research career in computational chemistry and helped develop NWChem, an evolving software package for computational chemistry still in use around the world. Plans call for a revamped version of the package to run on Frontier and other exascale supercomputers, such as Aurora.
Bernholdt then turned to computer science and software development to design and refine tools that tackle the same problems he encountered as a scientific user. He and his team helped program and debug Frontier’s supercomputing predecessors: Titan (27 petaflops, or 27 quadrillion calculations per second) and Summit (200 petaflops, or 200 quadrillion calculations per second).
“This is our third accelerator-based machine, so we have a pretty good idea of how to program them,” said Bernholdt. “The biggest challenge has been the timing. We had maybe half the time to prepare for Frontier that we had for Summit, and it’s a new software stack that had everyone scrambling. But that means more opportunities and incentives for optimization.”
This around-the-clock job will not end when Frontier is switched on. Bernholdt and his team will continue to monitor the supercomputer’s performance and look for ways to raise standards and improve performance.
“It never stops,” Bernholdt said. “It’s always very satisfying to see people able to use the system wisely, but that’s not the end of it. Frontier will continue to evolve and improve, and we will be a part of it. I feel pretty confident in saying that there is no other place on Earth right now that could support a similar project of this scale and importance.”