Summit, Sunway, ThetaGPU and more

In this regular feature, HPCwire highlights recently published research in the high performance computing community and related fields. From parallel programming to exascale to quantum computing, the details are here.


Proteome-scale deployment of protein structure prediction workflows on the Summit supercomputer

Illustration of the Summit AlphaFold workflow for asynchronously processing a batch of inferences from input features. Credit: Gao et al.

The authors of this preprint paper propose that “leading-class computational resources can be used to perform genome-scale protein structure prediction using state-of-the-art deep learning models, providing a multitude of new data for systems biology applications”. The authors go on to describe their efforts “to efficiently deploy the AlphaFold2 program, for full proteome structure prediction, at scale on the resources of the Oak Ridge Leadership Computing Facility, including the Summit supercomputer.” The inference workload used nearly 4,000 Summit node hours in total. Deployed in 2018, Summit comprises 4,608 GPU-accelerated nodes and currently ranks number two on the Top500 list with 148.6 Linpack petaflops.
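The figure above shows the workflow dispatching batches of inference tasks asynchronously across Summit’s GPUs. As a rough illustration of that pattern – not the authors’ actual code – here is a minimal Python sketch in which a hypothetical `run_alphafold_inference` worker is fanned out over the six GPUs of a Summit node, with results collected as each prediction completes:

```python
# Minimal sketch of an asynchronous batch-inference pattern; the worker
# function and file layout are hypothetical stand-ins, not the paper's code.
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

GPUS_PER_NODE = 6  # each Summit node carries six NVIDIA V100 GPUs

def run_alphafold_inference(features_path: Path, gpu_id: int) -> Path:
    """Hypothetical worker: one AlphaFold2 structure prediction on one GPU."""
    # A real worker would pin itself to `gpu_id` (e.g. via
    # CUDA_VISIBLE_DEVICES) and invoke the AlphaFold2 model here.
    out = features_path.with_suffix(".pdb")
    out.write_text(f"predicted structure for {features_path.stem}\n")
    return out

def process_batch(feature_files: list[Path]) -> list[Path]:
    """Fan a batch of precomputed input features out across the node's GPUs,
    collecting each result as soon as its inference finishes."""
    results = []
    with ProcessPoolExecutor(max_workers=GPUS_PER_NODE) as pool:
        futures = [pool.submit(run_alphafold_inference, f, i % GPUS_PER_NODE)
                   for i, f in enumerate(feature_files)]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

if __name__ == "__main__":
    batch = [Path(f"protein_{i:04d}.features") for i in range(12)]
    print(process_batch(batch))
```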

Authors: Mu Gao, Mark Coletti, Russell B. Davidson, Ryan Prout, Subil Abraham, Benjamin Hernandez and Ada Sedova

Bridging the gap between deep learning and the frustrated quantum spin system for large-scale simulations on the next-generation Sunway supercomputer

A team of Chinese researchers observes that the “computational complexity to obtain the wave functions that accurately describe the quantum states increases exponentially with the number of particles”. Addressing this challenge, they present a “novel convolutional neural network to simulate the highly frustrated two-dimensional spin-1/2 J1-J2 Heisenberg model, [such that] the simulation is performed on a large-scale system with low cost and high scalability.” With this research, the authors demonstrated the effectiveness of the CNN-based representation of the quantum state. Their calculation leveraged 31 million cores of the new Sunway supercomputer, believed to be an exascale-class system with around 42 million SW26010Pro cores, and the authors state that they believe the application should be able to scale system-wide.
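For readers unfamiliar with neural-network quantum states, the general idea can be sketched in a few lines: a convolutional network with periodic padding maps a 2D spin configuration to a scalar that plays the role of a log wave-function amplitude, and the network weights are then optimized variationally. The toy PyTorch architecture below is purely illustrative – the paper’s network and training scheme are far larger and more elaborate:

```python
# Illustrative CNN ansatz for a 2D spin-1/2 lattice: maps a spin
# configuration to a scalar log-amplitude. Architecture and sizes are
# hypothetical; the real network is trained variationally to minimize
# the energy of the J1-J2 Heisenberg model.
import torch
import torch.nn as nn

class SpinCNN(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        # Circular padding respects the lattice's periodic boundaries.
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1, padding_mode="circular"),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1, padding_mode="circular"),
            nn.GELU(),
        )
        self.readout = nn.Linear(channels, 1)

    def forward(self, spins: torch.Tensor) -> torch.Tensor:
        # spins: (batch, L, L) with entries +1/-1
        h = self.net(spins.unsqueeze(1).float())
        h = h.mean(dim=(2, 3))               # translation-invariant pooling
        return self.readout(h).squeeze(-1)   # one log-amplitude per sample

spins = torch.randint(0, 2, (4, 10, 10)) * 2 - 1  # four random 10x10 configs
print(SpinCNN()(spins).shape)  # torch.Size([4])
```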

Authors: Mingfan Li, Junshi Chen, Qian Xiao, Fei Wang, Qingcai Jiang, Xuncheng Zhao, Rongfen Lin, Hong An, Xiao Liang and Lixin He

Analytical energy model parameterized by workload, clock rate, and number of active cores for high-performance shared-memory computing applications

In this study, authors from the Department of Electronics and Microelectronics at the University of Mons in Belgium provide “an analytical modeling of application architecture and behavior that can be used to estimate energy-optimal software configurations and provide sound advice to improve DVFS and DPM techniques” for single-node high-performance computing applications. Their results show that up to 70 percent of energy could be saved in the best case compared to the default Linux configuration, with an average saving of 14 percent.
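The shape of such a model can be illustrated with invented constants: execution time falls with clock rate and core count (subject to imperfect parallel scaling), power rises with both, and the energy-optimal configuration sits where their product is minimized. The Python sketch below uses hypothetical parameters, not the paper’s fitted values:

```python
# Hypothetical illustration of an analytical energy model E(f, p):
# all constants below are invented for demonstration purposes.
import itertools

W = 1e12         # total workload, abstract operations (hypothetical)
ALPHA = 0.95     # parallel fraction, Amdahl-style scaling (hypothetical)
P_STATIC = 10.0  # static platform power in watts (hypothetical)
P_CORE = 0.5     # per-active-core baseline power in watts (hypothetical)
K_DYN = 0.3      # dynamic power coefficient, W/core/GHz^3 (hypothetical)

def exec_time(f_ghz: float, p: int) -> float:
    """Runtime shrinks with clock rate and (imperfectly) with core count."""
    speedup = 1.0 / ((1.0 - ALPHA) + ALPHA / p)
    return W / (f_ghz * 1e9 * speedup)

def power(f_ghz: float, p: int) -> float:
    """Static power, plus per-core power, plus a cubic dynamic (DVFS) term."""
    return P_STATIC + p * P_CORE + K_DYN * p * f_ghz ** 3

def energy(f_ghz: float, p: int) -> float:
    return power(f_ghz, p) * exec_time(f_ghz, p)

freqs = [1.2, 1.6, 2.0, 2.4, 2.8]  # candidate clock rates, GHz
cores = [1, 2, 4, 8, 16, 32]       # candidate active-core counts
f_best, p_best = min(itertools.product(freqs, cores),
                     key=lambda fp: energy(*fp))
print(f"energy-optimal configuration: {f_best:.1f} GHz on {p_best} cores")
```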

Authors: Vitor Ramos Gomes da Silva, Carlos Valderrama, Pierre Manneback and Samuel Xavier-de-Souza

Verified tensor-program optimization via high-level scheduling rewrites

In this article, “Verified tensor-program optimization via high-level scheduling rewrites”, the authors present a new approach to high-performance programming that addresses both speed and correctness. They developed “a lightweight Coq framework for optimizing tensor kernels written in a pure functional array language”. In their paper, they “demonstrate that not only is this system capable of deriving optimizations from existing state-of-the-art languages like Halide and generating comparably performing code, but it is also capable of scheduling a family of useful program transformations beyond what is accessible to Halide.”
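The paper’s framework lives in Coq, but the flavor of a scheduling rewrite can be conveyed with a toy example in any language: the loop-splitting (tiling) step familiar from Halide schedules, applied here to an invented AST. A verified framework would additionally prove that each such rewrite preserves the program’s meaning:

```python
# Toy illustration of a "scheduling rewrite": splitting a loop into an
# outer/inner pair (the tiling step familiar from Halide schedules).
# The AST and names are invented; the real framework expresses such
# rewrites in Coq and machine-checks that each one is meaning-preserving.
from dataclasses import dataclass

@dataclass
class Loop:
    var: str
    extent: int
    body: object  # a nested Loop or a statement string

def split(loop: Loop, factor: int) -> Loop:
    """Rewrite `for i in range(n): body` into
    `for i_outer in range(n // factor): for i_inner in range(factor): body`.
    Only valid when factor divides the extent -- exactly the kind of side
    condition a verified framework must discharge."""
    assert loop.extent % factor == 0, "factor must divide extent"
    inner = Loop(loop.var + "_inner", factor, loop.body)
    return Loop(loop.var + "_outer", loop.extent // factor, inner)

original = Loop("i", 1024, "C[i] += A[i] * B[i]")
print(split(original, factor=64))
```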

Authors: Amanda Liu, Gilbert Louis Bernstein, Adam Chlipala and Jonathan Ragan-Kelley

Supercomputer simulation technology of turbulent flows in the era of exascale computing

Written by two researchers from the Russian Academy of Sciences, this article presents a “technology for scale-resolved simulations of turbulent flows in aerodynamics and aeroacoustics problems”, targeting a range of HPC platforms – from small clusters to exascale systems. The paper summarizes the advantages of a hybrid modeling approach that combines Reynolds-averaged Navier-Stokes (RANS) and large-eddy simulation (LES) methods, which the authors describe as “widely recognized as the most cost-effective in many applications of computational aerodynamics and aeroacoustics.” Other key technologies include “a numerical scheme for discretization in space, a parallel algorithm, and a portable software implementation for modern hybrid systems with massive parallelism.” With the parallel efficiency achievable on exascale supercomputers, the authors suggest, it will become possible to tackle problems on previously unattainable mesh sizes – for example, modeling an entire aircraft.
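As a point of reference for how RANS-LES hybrids switch between modes, the classic detached-eddy simulation (DES97) formulation replaces the RANS length scale with the smaller of the wall distance and a grid-based LES scale. The snippet below illustrates only this textbook switch; the paper’s specific hybrid formulation may differ:

```python
# Textbook DES97-style mode switch for hybrid RANS-LES, shown for
# illustration only: near walls the (RANS) wall-distance scale wins,
# far from walls the grid-based LES scale takes over.
C_DES = 0.65  # typical DES calibration constant

def des_length_scale(wall_distance: float, cell_size: float) -> float:
    """RANS scale (wall distance) near walls, LES scale (C_DES * grid
    spacing) away from them -- whichever is smaller."""
    return min(wall_distance, C_DES * cell_size)

for d in (0.001, 0.01, 0.1, 1.0):  # metres from the wall
    print(f"d = {d}: length scale = {des_length_scale(d, cell_size=0.05)}")
```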

Authors: Andrey V. Gorobets and Alexey P. Duben

Unprecedented cloud resolution in a GPU-accelerated full-physics atmospheric climate simulation on OLCF’s Summit supercomputer

A team of researchers from multiple Department of Energy labs studied the performance of the Energy Exascale Earth System Model-MMF (E3SM-MMF) code on the Oak Ridge Leadership Computing Facility’s Summit supercomputer. “Hundreds of kernels in the approximately 10,000 lines of code in the E3SM-MMF CRM have been ported to GPUs with OpenACC directives,” the authors note. “A high-resolution benchmark using 4,600 nodes on Summit demonstrates the computational capability of the GPU-enabled E3SM-MMF code in a full-physics climate simulation,” they write. The research marks an important step toward resolving the main effects of clouds in climate models.

Authors: Matthew R. Norman, David C. Bader, Christopher Eldred, Walter M. Hannah, Benjamin R. Hillman, Christopher R. Jones, Jungmin M. Lee, L. Ruby Leung, Isaac Lyngaas, Kyle G. Pressel, Sarat Sreepathi, Mark A. Taylor and Xingqiu Yuan

Inference-optimized AI and high-performance computing for gravitational wave detection at scale

The authors of this paper show how to perform accelerated gravitational wave detection at scale using artificial intelligence. Using the ThetaGPU supercomputer at the Argonne Leadership Computing Facility, they note that their “AI ensemble, powered by Nvidia TensorRT, processed an entire month of advanced LIGO data (including Hanford and Livingston data streams) in 50 seconds”. According to the authors, this threefold speedup was achieved while maintaining the same sensitivity as traditional AI models.
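The kind of inference optimization the team relies on is typified by compiling a trained network into a TensorRT engine. The sketch below shows the standard ONNX-to-TensorRT build path with a placeholder model file; it is not the authors’ pipeline:

```python
# Minimal sketch of the standard ONNX-to-TensorRT build path; the model
# and output file names are hypothetical placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("gw_detector.onnx", "rb") as f:  # hypothetical exported model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # reduced precision for throughput

# Compile the network into a serialized engine and save it to disk.
engine_bytes = builder.build_serialized_network(network, config)
with open("gw_detector.plan", "wb") as f:
    f.write(engine_bytes)
```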

Authors: Pranshu Chaturvedi, Asad Khan, Minyang Tian, E.A. Huerta and Huihuo Zheng


Do you know of any research that should be included in next month’s list? If so, email us at [email protected]. We look forward to hearing from you.
