Cerebras CS-2 helps fight SARS-CoV-2

Decoding the replication mechanisms of the SARS-CoV-2 virus has been a key research quest as the COVID-19 pandemic continues. For the scientific computing community, building precise models of how the virus replicates so effectively has been a major undertaking at every level of the field.

One company contributing to the fight with artificial intelligence and machine learning is Cerebras Systems. The CS-2 system, built around the second iteration of the company’s Wafer-Scale Engine, plays an important role in Argonne National Laboratory’s AI testbed and has contributed to a multi-institutional COVID-19 replication study that was named a finalist for the Gordon Bell Special Prize.

The research team behind the Gordon Bell Special Prize finalist paper, made up of scientists from 12 national laboratories, universities and companies, began with three-dimensional cryo-electron microscopy images that show the virus at “near atomic resolution”. However, these images alone are not detailed enough to study the roughly two million atoms that make up its complex replication machinery.

Arvind Ramanathan, lead author of the study and a computational biologist at Argonne, explains that the virus is like a “Swiss watch, with precisely organized enzymes and nanomachines that come together like tiny gears” in order to replicate.

Diagram from the COVID-19 replication study showing how the different elements of the research are coordinated. The computational steering work is highlighted in the lower right quadrant.

To identify the tiny gears of this molecular “machine”, the team applies analysis tools to the 3D images within a hierarchical artificial intelligence framework, filling in the missing data needed for modeling.

These simulation experiments require thousands of node-hours on a supercomputer, and the study’s authors sought to increase their computational efficiency in order to analyze more 3D images. Freeing up compute nodes saves time and power, and machine learning makes this possible through computational steering: stopping unproductive simulations and prioritizing more promising ones. This is accomplished by training a machine learning model called a convolutional variational autoencoder, or CVAE.
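
The steering decision itself can be illustrated with a short sketch. The snippet below is hypothetical: it only assumes that each running simulation’s latest snapshot has been embedded into a low-dimensional latent space, and it ranks simulations by how novel their latest state looks, stopping the least novel ones to free compute nodes. The study’s actual criteria and workflow software may differ.

```python
# Hypothetical steering sketch: rank running simulations by latent-space
# novelty and stop the least novel ones to free their compute nodes.
import numpy as np


def novelty_scores(latent, history, k=5):
    """Mean distance from each simulation's latest latent point to its k
    nearest previously observed points; larger means more novel."""
    d = np.linalg.norm(latent[:, None, :] - history[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


rng = np.random.default_rng(0)
history = rng.normal(size=(500, 8))       # latent points already explored
latent = rng.normal(size=(16, 8))         # latest snapshots of 16 running sims
scores = novelty_scores(latent, history)
keep = np.argsort(scores)[-8:]            # continue the 8 most novel simulations
stop = np.setdiff1d(np.arange(16), keep)  # stop the rest, freeing their nodes
print("continue:", sorted(keep.tolist()), "stop:", stop.tolist())
```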

“We train the model by letting it observe snapshots of the simulations. We then perform the reverse transformation – or decode it,” said Vishal Subbiah, Technical Manager of ML Frameworks at Cerebras and co-author of the study, in a corporate blog post. “If the decoded version matches the original, we know the CVAE works. This trained model can then be used during ‘real’ experiments by another algorithm, which performs the actual steering.”
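
For readers unfamiliar with the model, the following is a minimal sketch of a convolutional variational autoencoder in PyTorch, illustrating the encode/decode round trip Subbiah describes. The architecture, input size and hyperparameters are placeholders for illustration, not the study’s actual network.

```python
# Minimal CVAE sketch, assuming toy 64x64 single-channel snapshots.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CVAE(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        # Encoder: compress a snapshot into the parameters of a latent Gaussian.
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(32 * 16 * 16, latent_dim)
        # Decoder: reconstruct the snapshot from a latent sample.
        self.fc_dec = nn.Linear(latent_dim, 32 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        out = self.dec(self.fc_dec(z).view(-1, 32, 16, 16))
        return out, mu, logvar


def loss_fn(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld


if __name__ == "__main__":
    model = CVAE()
    snapshots = torch.rand(4, 1, 64, 64)  # stand-in for simulation snapshots
    recon, mu, logvar = model(snapshots)
    print("loss:", loss_fn(recon, snapshots, mu, logvar).item())
```

If the decoded snapshots closely match the originals, the latent space is capturing the structure of the simulation data, which is what allows a separate steering algorithm to make decisions from it.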

The researchers then ran a comparison, training the CVAE model on a Cerebras CS-2 system and on 256 nodes of ORNL’s Summit supercomputer, which together hold 1,536 GPUs. As noted in their paper, they found that “the CS-2 delivers out-of-the-box performance of 24,000 samples/s, or roughly the equivalent of 110-120 GPUs.”
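
As a back-of-the-envelope check using only the figures quoted above, that equivalence implies a per-GPU training throughput of roughly 200-218 samples per second:

```python
# Implied per-GPU throughput from the quoted CS-2 figure.
cs2_samples_per_s = 24_000
for gpus in (110, 120):
    print(f"{gpus} GPUs -> {cs2_samples_per_s / gpus:.0f} samples/s per GPU")
```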

Cerebras was happy to tout this GPU-equivalence result, but noted that the out-of-the-box aspect is just as important. Subbiah notes that the CS-2 “is intentionally designed as a single, ultra-powerful node with cluster-wide performance” and that the company’s software “makes it easy to run a neural network by modifying just a few lines of code.”

Released earlier this year, the CS-2 is based on Cerebras’ second-generation Wafer-Scale Engine (WSE-2) chip, manufactured by TSMC on its 7nm node, with 2.6 trillion transistors and 850,000 cores. The WSE-2 features 40 GB of on-chip SRAM, along with 20 petabytes per second of memory bandwidth and 220 petabits per second of fabric bandwidth. Argonne National Laboratory was one of the first users of the Cerebras CS-1 machine and was among the first customers to take delivery of a CS-2.
