Lenovo became a powerhouse in the HPC and supercomputer spaces in 2014, when it bought IBM’s System x server division for $2.1 billion in a deal that also allowed it to license Big Blue’s storage and system management software. The acquisition propelled the company into a highly competitive field that includes Hewlett Packard Enterprise, Dell EMC, IBM (with its Power systems), Fujitsu and Atos.
The HPC space only became more competitive in the years that followed. HPE ramped up its capabilities through acquisitions, first of SGI two years later for $275 million, then of supercomputer pioneer Cray for $1.3 billion in 2019, while Chinese suppliers like Inspur and Sugon were on the rise.
Nevertheless, Lenovo was able to build on the foundations of the IBM agreement to develop its HPC business. In May, the company said that in its fiscal fourth quarter, its data center group saw revenue increase 32% year-on-year to $1.6 billion, with record revenue in a number of areas, including HPC and artificial intelligence. Moreover, in the latest Top500 list of the fastest supercomputers in the world, released in November 2020, 182 of the systems were built by Lenovo, representing 36.4% of the supercomputers on the list. Inspur came in second with 66 systems.
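The share figure follows directly from the system counts. As a minimal sketch, assuming only that the Top500 list ranks 500 systems (the function and variable names here are illustrative, not from any Top500 tooling):

```python
# Minimal sketch: vendor share of the Top500 list from system counts.
# TOP500_SIZE is the fixed length of the list; the counts come from the article.
TOP500_SIZE = 500


def share_percent(systems: int, total: int = TOP500_SIZE) -> float:
    """Return the percentage of the list made up by `systems` entries."""
    return round(100.0 * systems / total, 1)


lenovo_share = share_percent(182)  # Lenovo systems on the November 2020 list
inspur_share = share_percent(66)   # Inspur systems on the same list

print(f"Lenovo: {lenovo_share}%  Inspur: {inspur_share}%")
```

By the same calculation, Inspur’s 66 systems amount to 13.2% of the list.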
Number 15 on the list is SuperMUC-NG, a water-cooled supercomputer housed at the Leibniz Supercomputing Centre (LRZ) (pictured above) of the Bavarian Academy of Sciences and Humanities in Germany. Work on the system began in 2017 and was completed a year later. It is powered by 6,500 dual-socket ThinkSystem SD650 “thin nodes” and 305,856 3.1GHz Intel Xeon Platinum 8174 cores, all adding up to nearly 26.9 petaflops of performance.
The workloads run on the supercomputer range from simulation and modeling to newer compute- and memory-intensive tasks, from automated image and pattern recognition in Earth observation data and the processing of climate data to the analysis of medical imaging, health records and demographic data.
Now SuperMUC-NG is set to undergo a series of upgrades that will enable the supercomputer to better leverage AI for the advanced simulation, modeling, machine learning and data analysis tasks that are becoming more and more common, and to do so in a more energy-efficient manner. Lenovo this week announced the launch of phase two of the system. The work will not only create a more powerful system to handle these advanced workloads, but will also help accelerate the push to make AI more accessible to organizations outside of the traditional HPC realm, according to Scott Tease, vice president and general manager of HPC and AI at Lenovo.
“AI is increasingly seen as a tool in HPC workloads large and small,” Tease tells The Next Platform. “Researchers are using AI to further analyze the data and spot anomalies or variations in these mega-large datasets. This applies to large-scale bioinformatics, climate or space research as well as CAE and CFD workloads used in engineering and manufacturing around the world. We are entering an era where computing power in itself may no longer be the trigger for innovation and research. The exascale innovations we see will usher in an era where more people than ever will have access to great HPC performance capabilities, both petascale and exascale.”
At the same time, the industry “is entering an era where adjectives like ‘sustainable,’ ‘green’ and ‘carbon neutral’ are associated with everything. HPC is no different. As long as there are problems to be solved that require computing power, customers like LRZ will drive innovation alongside energy efficiency, and the use of efficient liquid cooling will increase in demand,” he said.
Phase two will rely heavily on the latest technology from Intel. It will include 240 compute nodes, each housing two Intel “Sapphire Rapids” Xeon Scalable processors (to be released later this year) and four of Intel’s upcoming “Ponte Vecchio” Xe-HPC GPUs designed for supercomputers, running in what Tease calls ThinkSystem SD6450 “fat nodes.” The system will deliver over 13 petaflops of performance, he says. Overall, the second-phase compute nodes will deliver four times the performance per watt of the first phase.
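To put the node counts in perspective, the rough arithmetic below compares performance per node across the two phases. It is a back-of-the-envelope sketch only: it treats the article’s rounded figures (nearly 26.9 petaflops across 6,500 nodes for phase one, over 13 petaflops across 240 nodes for phase two) as exact, and the function name is illustrative.

```python
# Back-of-the-envelope sketch: average performance per node in each phase.
# Inputs are the rounded figures quoted in the article, treated as exact
# here purely for illustration.
PFLOPS_TO_TFLOPS = 1000.0


def tflops_per_node(system_pflops: float, nodes: int) -> float:
    """Average teraflops contributed by each node of a system."""
    return system_pflops * PFLOPS_TO_TFLOPS / nodes


phase_one = tflops_per_node(26.9, 6500)  # CPU-only nodes
phase_two = tflops_per_node(13.0, 240)   # nodes with two CPUs and four GPUs
density_ratio = phase_two / phase_one

print(f"Phase one: {phase_one:.1f} TF/node")
print(f"Phase two: {phase_two:.1f} TF/node")
print(f"Ratio:     {density_ratio:.1f}x")
```

On these assumptions, each phase two node delivers roughly 54 teraflops against roughly 4 teraflops for a phase one node, about 13 times as much. This is a per-node density comparison and is separate from the performance-per-watt figure quoted above.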
A key feature of the upgrade will be Lenovo’s Distributed Asynchronous Object Storage (DAOS) system, which runs on high-performance solid-state technologies like SSDs and NVMe rather than spinning drives. This allows DAOS to bypass the operating system to deliver ultra-low latency, which is “essential for mega-datasets used in modeling and simulation HPC workloads,” Tease explains.
DAOS will use Intel “Ice Lake” Xeon SP processors integrated into Lenovo’s ThinkSystem 1U SR630 V2 platform. It will provide a petabyte of data storage and fast throughput for large volumes of data.
To improve energy efficiency, Lenovo will also bring in its Neptune direct hot-water cooling (below) for the compute system, which will be connected to DAOS via a high-bandwidth network. Liquid cooling has been used in a limited way in data centers for years, but the rise of AI, the expansion of HPC workloads into traditional enterprise computing environments and the increased density in data centers have rekindled interest in the technology.
Liquid cooling is more efficient and less expensive than air cooling, and the Neptune system can remove approximately 90% of the heat from a computer system, reducing overall power consumption and allowing processors to operate at optimal performance. The benefits are especially pronounced for hot-water cooling, says Tease.
“Hot water cooling inherently saves energy and costs because it does not require chillers to cool the water before it is pumped through the system,” he says. “Water can be reused for things like building heat or sent to an absorption chiller, where the stored energy can be recycled to create cold water for other purposes. Either way, the hotter the water the better: reusing this energy source takes what was once waste (heat production) and turns it into a valuable commodity. In addition to the cost [and ecological] benefits, the operational benefits of hot water cooling allow our systems to support higher power/performance processors and GPUs beyond what air cooling allows.”
HPC organizations notice when a leading site like LRZ uses liquid cooling for x86 servers, according to Tease. Lenovo now has Neptune systems operating in North America, Europe, Asia and Australia. Many sites are able to reduce the number of racks needed to run the same workloads. Now, a single rack of Lenovo SD650 systems with GPUs can deliver the performance of a supercomputer that would rank among the top 300 on the Top500 list, a trend that will expand access for researchers needing such supercomputing capabilities, he said.
One barrier to liquid cooling for enterprise data centers is the cost and difficulty of installing the necessary plumbing. Solutions like Lenovo’s ThinkSystem SR670 V2, however, use liquid cooling technology that is contained inside the server itself, eliminating the need for plumbing. Such designs show HPC and enterprise organizations that liquid cooling can be used inside air-cooled data centers.
The DAOS system for phase two will arrive in the last quarter of this year, with the compute system being delivered in the second quarter of 2022.