LLNL, IBM and Red Hat to Explore Standardized Resource Management Interface for Cloud HPC

Lawrence Livermore National Laboratory (LLNL), IBM and Red Hat are teaming up to develop best practices for interfacing high performance computing (HPC) schedulers and cloud orchestrators, an effort aimed at supercomputers that take advantage of cloud technologies.

Under a recently signed Memorandum of Understanding (MOU), the organizations said researchers aim to enable next-generation workloads by integrating LLNL’s Flux scheduling framework with Red Hat OpenShift – an enterprise Kubernetes platform – to enable more traditional HPC jobs to use cloud and container technologies. A new standardized interface would help meet growing demand for compute-intensive jobs combining HPC and cloud computing across a wide range of industry sectors, the researchers said.

“Cloud systems are increasingly setting the direction of the broader IT ecosystem, and economics is a primary driver,” said Bronis R. de Supinski, CTO of Livermore Computing at LLNL. “With the increasing prevalence of cloud-based systems, we need to align our HPC strategy with cloud technologies, especially with respect to their software environments, to ensure the long-term sustainability and accessibility of our critical HPC systems.”

LLNL’s open source Flux scheduling framework builds on the lab’s extensive experience in HPC and enables the deployment of new types of resources, schedulers, and services as data centers continue to evolve, including the emergence of exascale computing. Its intelligent scheduling decisions and rich expression of resources make it well suited to facilitating orchestration with tools such as Red Hat OpenShift on large-scale HPC clusters, which LLNL researchers predict will become more common in the years to come.

“One of the trends we’ve seen at Livermore is the loose coupling of HPC applications and applications like machine learning and data analytics on the orchestration side, but in the near future we expect to see a tighter coupling of these two technologies,” said Dan Milroy, a postdoctoral researcher at LLNL. “We believe that unifying Flux with cloud orchestration frameworks like Red Hat OpenShift and Kubernetes will allow both HPC and cloud technologies to come together in the future, helping workflows everywhere to evolve. I think the co-development of Flux with OpenShift will be really beneficial.”

Red Hat OpenShift is an open source container platform based on the Kubernetes container orchestrator for developing and deploying enterprise applications. Kubernetes is an open source system for automating the deployment, scaling, and management of containerized applications. The researchers aim to further improve Red Hat OpenShift and make it a common platform for a wide range of IT infrastructures, including large-scale HPC systems, enterprise systems, and public cloud offerings, starting with commercial HPC workloads.
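To illustrate the declarative model described above (a minimal sketch; the workload name, container image, and resource values are hypothetical, not from the collaboration), a containerized batch-style job is described to Kubernetes in a manifest, and the orchestrator handles placement, execution, and retries:

```yaml
# Minimal Kubernetes Job manifest (hypothetical names and values).
# Kubernetes schedules the pod onto a suitable node, runs the
# container to completion, and retries on failure up to backoffLimit.
apiVersion: batch/v1
kind: Job
metadata:
  name: hpc-sim-example              # hypothetical workload name
spec:
  backoffLimit: 2                    # retry a failed pod at most twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sim
        image: registry.example.com/hpc-sim:latest   # hypothetical image
        command: ["./run_simulation", "--steps", "1000"]
        resources:
          requests:
            cpu: "8"                 # ask the scheduler for 8 cores
            memory: 16Gi
```

Submitted with `kubectl apply -f job.yaml`, the manifest leaves node selection and lifecycle management to the orchestrator — the layer where this collaboration wants HPC schedulers such as Flux to participate.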

“We would love to see a platform like Red Hat OpenShift be able to run a wide range of workloads on a wide range of platforms, from supercomputers to clusters,” said Claudia Misale, a research staff member at IBM Research. “We see difficulties in the HPC world with having many different types of HPC software stacks, and container platforms like OpenShift can address these difficulties. We think OpenShift can be the common denominator, as Red Hat Enterprise Linux has been a common denominator on HPC systems.”

The impetus for enabling Flux as a Kubernetes scheduler plug-in began with a successful prototype from the Collaboration of Oak Ridge, Argonne and Livermore (CORAL) and a Centers of Excellence project between LLNL and IBM to understand cancer formation. The plug-in enabled more sophisticated scheduling of Kubernetes workloads, which convinced the researchers that they could integrate Flux with Red Hat OpenShift, the researchers said.
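Kubernetes already supports running alternative schedulers alongside its default one: a pod opts in through the standard `spec.schedulerName` field. As a hedged sketch of how a Flux-backed plug-in could be selected (the name `flux-scheduler` and the image are hypothetical; the field itself is standard Kubernetes):

```yaml
# Pod that opts out of the default Kubernetes scheduler.
# "flux-scheduler" is a hypothetical deployment name for a
# Flux-backed scheduler; schedulerName is a standard pod field.
apiVersion: v1
kind: Pod
metadata:
  name: tightly-coupled-task
spec:
  schedulerName: flux-scheduler     # route placement to the custom scheduler
  containers:
  - name: task
    image: registry.example.com/mpi-task:latest   # hypothetical image
```

A pod naming a scheduler that is not running simply stays Pending until that scheduler binds it to a node, which is what lets sites swap in their own scheduling logic without modifying Kubernetes itself.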

Since many HPC centers use their own schedulers, a primary goal is to “democratize” the Kubernetes interface for HPC users by pursuing an open interface that any site or HPC center could use to incorporate its existing schedulers.

“We are seeing a constant trend toward data-centric computing, which includes the convergence of artificial intelligence/machine learning and HPC workloads,” said Chris Wright, senior vice president and chief technology officer at Red Hat. “The HPC community has long been at the forefront of data analysis. Bringing their expertise in large-scale, complex scheduling to a common cloud-native platform is a perfect expression of the power of open source collaboration. This brings new scheduling capabilities to Red Hat OpenShift and Kubernetes, and brings modern cloud-native AI/ML applications to large labs.”

Researchers plan to initially integrate Flux to run in the Red Hat OpenShift environment, using Flux as a driver for other commonly used schedulers to interface with OpenShift and Kubernetes, ultimately making the platform easier to use with any HPC workload and on any HPC machine.

“This effort will allow HPC workflows to easily leverage leading HPC schedulers such as Flux to fully harness the potential of the emerging HPC and cloud convergence,” said Dong H. Ahn, leader of LLNL’s Advanced Technology Development and Mitigation Next-Generation Computing Enabling project.

The team has started work on the scheduling topology and plans to define an interface within the next six months. Future goals include exploring different integration models, such as colocation, and extending advanced management and configuration beyond the node.

Founded in 1952, the Lawrence Livermore National Laboratory (www.llnl.gov) provides solutions to our nation’s most important national security challenges through innovative science, engineering and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the National Nuclear Security Administration of the US Department of Energy.

About Mariel Baker
