AMD Releases Latest Coherent Device Memory Mapping Linux Code – Designed for Frontier

Over the past year, AMD engineers have posted various patches with the stated goal of preparing Linux for the Frontier supercomputer. Most of this work involved Linux memory management and the handling of special-purpose memory shared between CPUs and GPUs. Posted on Monday was their latest work on coherent device memory mapping for the Linux kernel.

The new MEMORY_DEVICE_COHERENT memory type was developed by AMD engineers for the Frontier supercomputer effort, but it may be relevant to other future supercomputers and of interest to other hardware vendors. AMD's engineers summarize this latest effort as follows:

This patch series introduces MEMORY_DEVICE_COHERENT, a device-owned memory type that can be mapped into CPU page tables like MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE.

System stability and performance are not affected according to our ongoing tests, including xfstests.

How it works: The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map.

The amdgpu driver registers memory with devmap as MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user of this hardware page migration feature is the Frontier supercomputer project. This feature is not specific to AMD. We expect other GPU vendors to find this feature useful, and possibly other types of hardware in the future.

Our lab test nodes are similar to the Frontier configuration, with 0.5TB of system memory and 256GB of device memory spread across 4 GPUs, all in a single coherent address space. Page migration is expected to improve application efficiency significantly. We will report empirical results as they become available.
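To make the comparison in the summary above concrete, here is a minimal Python sketch of how the three memory types relate. It is a toy model, not kernel code: the `DeviceMemoryType` enum, the `Traits` class and the `describe` helper are hypothetical names invented for illustration, and the two boolean traits are a simplified reading of the summary (the entry for MEMORY_DEVICE_GENERIC in particular is inferred from the contrast the summary draws).

```python
from dataclasses import dataclass
from enum import Enum


class DeviceMemoryType(Enum):
    # Values reuse the kernel identifiers quoted in the article;
    # the Python names themselves are illustrative only.
    PRIVATE = "MEMORY_DEVICE_PRIVATE"
    GENERIC = "MEMORY_DEVICE_GENERIC"
    COHERENT = "MEMORY_DEVICE_COHERENT"


@dataclass(frozen=True)
class Traits:
    cpu_mappable: bool  # pages can be mapped into CPU page tables
    migratable: bool    # pages can be migrated between system and device memory


# Simplified reading of the patch summary (an assumption, not kernel logic):
# COHERENT combines the CPU mapping of GENERIC with the migration of PRIVATE.
TRAITS = {
    DeviceMemoryType.PRIVATE: Traits(cpu_mappable=False, migratable=True),
    DeviceMemoryType.GENERIC: Traits(cpu_mappable=True, migratable=False),
    DeviceMemoryType.COHERENT: Traits(cpu_mappable=True, migratable=True),
}


def describe(mem_type: DeviceMemoryType) -> str:
    """Return a one-line summary of the modeled traits for a memory type."""
    t = TRAITS[mem_type]
    return (
        f"{mem_type.value}: cpu_mappable={t.cpu_mappable}, "
        f"migratable={t.migratable}"
    )


if __name__ == "__main__":
    for mem_type in DeviceMemoryType:
        print(describe(mem_type))
```

In this model, MEMORY_DEVICE_COHERENT is the only type with both traits set, which is the property the patch series relies on for migrating pages that the CPU can also access directly.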

See the latest set of MEMORY_DEVICE_COHERENT patches for more technical details if you’re interested.

ORNL photo showing Frontier under construction.

Frontier is the exascale supercomputer currently being built for Oak Ridge National Laboratory and is expected to reach full capacity this calendar year using a combination of 3rd Gen AMD EPYC processors and AMD Instinct MI250X GPUs. Most of the Linux enablement patches that mention Frontier concern the coherent xGMI interconnect between CPUs and GPUs and aim to get software support for it in order. When fully operational, Frontier is expected to deliver compute performance above 1.5 exaflops.

About Mariel Baker
