Photos by Evan Krape and courtesy of the U.S. Department of Energy
October 18, 2022
UD Prof. Sunita Chandrasekaran, students play a key role in exascale computing
From fast food to rapid COVID testing, the world has an unrelenting “need for speed.”
The fastest drive-thru in the United States this year, with the shortest average service time from placing your order to receiving your food, was Taco Bell at 221.99 seconds.
The fastest car, the Bugatti Chiron Super Sport 300+, broke into the record books at 304.7 miles per hour in 2019 and, to this day, still holds the title.
And then there’s Frontier, the supercomputer at the U.S. Department of Energy’s Oak Ridge National Laboratory in Oak Ridge, Tennessee. In May 2022, it was named the fastest computer in the world, at 1.1 exaflops, or more than a quintillion calculations per second. That’s a bunch of math problems to solve – over 1,000,000,000,000,000,000 of them – in the blink of an eye, a feat that has earned Frontier the coveted status of the first computer to reach exascale performance.
Scientists are eager to harness Frontier for a wide range of studies, from mapping the brain to creating more realistic climate models, exploring fusion energy, improving our understanding of new materials at the nanoscale, strengthening national security and obtaining a clearer, more in-depth view of the universe, from particle physics to star formation. And that only scratches the surface.
At the University of Delaware, Sunita Chandrasekaran, associate professor and David L. and Beverly J.C. Mills Career Development Chair in the Department of Computer and Information Sciences, and her students worked to ensure that key software will be ready to run on Frontier when the exascale computer is “open for business” to the scientific community in 2023.
Because existing computer codes do not automatically carry over to exascale systems, she worked with a team of researchers in the United States and at HZDR in Germany to test a high-performance computing application called PIConGPU (Particle-in-Cell on Graphics Processing Units).
A key tool in plasma physics, the particle-in-cell algorithm describes the dynamics of a plasma – matter rich in charged particles (ions and electrons) – by computing the electromagnetic fields from Maxwell’s equations and advancing the charged particles through those fields. (James Maxwell was a 19th-century physicist best known for the four equations that describe electromagnetic theory. Albert Einstein said Maxwell’s impact on physics was the most profound since Sir Isaac Newton’s.) Such tools are essential to the evolution of radiotherapy for cancer, as well as the expansion of the use of X-rays to probe the structure of materials.
“I tell my students, imagine your laptop connected to millions of other laptops and able to harness all that power,” Chandrasekaran said. “But then comes the exascale – that’s a 1 followed by 18 zeros. Think of the size and power of such a massive system. Such a system could potentially light up an entire city.”
Executing instructions on an exascale system requires a “different programming framework” than other systems, Chandrasekaran explained, given its unique architectural design, which pairs numerous parallel central processing units with high-performance graphics processing units.
Overall, Frontier packs 9,408 central processing units (CPUs), 37,632 graphics processing units (GPUs), and 8,730,112 cores, all connected by over 90 miles of network cables. All that computing power has helped Frontier break through the exascale barrier, and Chandrasekaran is working to ensure the software will make the leap as well.
To take advantage of the system’s specialized architecture, she and her fellow researchers strive to ensure that the computer code of high-priority software is Frontier-ready – fast and bug-free – one of the key goals of the Exascale Computing Project’s SOLLVE effort, which Chandrasekaran now leads. SOLLVE is a collaboration among Brookhaven National Laboratory, Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Georgia Tech and UD.
“Our team has been working together since 2017 to test the software to improve the system,” Chandrasekaran said, noting that the work involves collaborations with several compiler developers who provide implementations for Frontier.
“The machine is so new that the tools we need to operate it are also immature,” Chandrasekaran said. “Our goal is to have programs ready for use by scientists. We help by reporting bugs, offering fixes, testing beta versions, and helping vendors prepare robust tools for use by scientists.”
UD students debug vital programming tools
Thomas Huber, who earned his bachelor’s degree at UD, worked on the project with Chandrasekaran for more than two years before graduating with his master of science degree in computer and information science from the University last May. Originally from Linwood, New Jersey, he is now employed as a software engineer at Cornelis Networks, a computer hardware company.
“When we started working on it a few years ago, we knew Frontier would run at exascale speed, and that required bringing together a ton of people to work on the 20 or so core apps that had been deemed essential to the mission,” Huber said. “All this software must work perfectly.”
Through this unique opportunity made possible by Chandrasekaran, Huber gained valuable research and real-world experience. He also trained four undergraduate students on the project, as they worked together to validate that OpenMP, a popular programming tool, could run on Frontier.
As the group’s work progressed in evaluating compilers that provide implementations for new programming features, they found a few bugs, and then a few more bugs. That’s when they decided to create a repository on GitHub, a platform where software developers share and collaborate on code, to publish their findings and open-source tests as part of ECP SOLLVE.
“We launched a GitHub repository to review OpenMP specification releases. They come out every few years and bring new features – 600 pages of what you can and can’t do,” Huber said. “Most importantly, the section at the end lists all the differences between versions. We take the list of all new features and create test cases for each one. We write code that no one else has written before, and we make all our code public.”
Huber estimates the UD team, working with Oak Ridge National Lab, has written about 500 tests and 50,000 lines of code so far.
“The whole thing with high-performance computing is parallel programming,” Huber said. “Imagine you’re in a ton of traffic headed for a toll booth with a single E-ZPass lane. Parallel programming allows you to split into multiple E-ZPass lanes. OpenMP allows you to do this parallel work and to work extremely fast. What we have done with OpenMP ensures that scientists and others will be able to use the program on Frontier. We are the guinea pigs for that.”
Huber was drawn to research through the College of Engineering’s Vertically Integrated Program (VIP). Chandrasekaran was the group leader of the project. He stayed for a semester, got to work on a research paper (“It was amazing,” he said), and met colleagues who became best friends. They even won a poster contest.
He credits Chandrasekaran with drawing him into the field.
“She made a difference by being so enthusiastic and emphasizing how important this work is to researchers and the real world,” Huber said. “She is a top teacher in high-performance computing.”