AWS customer Descartes Labs uses HPC to understand the world and to handle the flood of data that comes from sensors on the ground, in the water, and in space. The company has been cloud-based from the start and focuses on geospatial applications that often involve petabytes of data.
CTO and co-founder Mike Warren told me that their intent is to never be limited by compute power. Early in his career, Mike worked on simulations of the universe and built several clusters and supercomputers, including Loki, Avalon, and Space Simulator. Mike was one of the first to build clusters from commodity hardware, and he learned a lot along the way.
After retiring from Los Alamos National Lab, Mike co-founded Descartes Labs. In 2019, Descartes Labs used AWS to power a TOP500 run that delivered 1.93 PFLOPS, landing at position 136 on the TOP500 list for June 2019. This run used 41,472 cores on a cluster of C5 instances. Notably, Mike told me that they launched this run without any help from or coordination with the EC2 team (because Descartes Labs regularly runs production jobs of this magnitude for their clients, their account already had sufficiently high service quotas). To learn more about this run, read Thunder from the Cloud: 40,000 Cores Running in Concert on AWS. Here is my favorite part of that story:
We were granted access to a group of nodes in the AWS US-East 1 region for approximately $5,000, charged to the company's credit card. The potential for democratizing HPC was palpable, since the cost of running custom hardware at this speed is probably closer to $20 to $30 million. Not to mention a wait time of 6 to 12 months.
After the success of this run, Mike and his team decided to work on an even more substantial one for 2021, with a target of 7.5 PFLOPS. Working with the EC2 team, they obtained an EC2 On-Demand Capacity Reservation for a 48-hour period in early June. After a few "small" runs that used only 1,024 instances at a time, they were ready to go. They launched 4,096 EC2 instances (C5, C5d, R5, R5d, M5, and M5d) with a total of 172,692 cores. Here are the results:
- Rmax – 9.95 PFLOPS. This is the actual performance that was achieved: nearly 10 quadrillion floating point operations per second.
- Rpeak – 15.11 PFLOPS. This is the theoretical peak performance.
- HPL efficiency – 65.87%. This is the ratio of Rmax to Rpeak, a measure of how well the hardware is utilized.
- N – 7,864,320. This is the size of the matrix that is inverted to perform the TOP500 benchmark. N² is approximately 61.84 trillion.
- P x Q – 64 x 128. This is a runtime parameter that describes the process grid.
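To make the relationships between these figures concrete, here is a minimal Python sketch that recomputes them from the numbers quoted in the list. The constant and function names are illustrative and not taken from the Descartes Labs run. The memory estimate assumes the benchmark stores the matrix in 8-byte double precision, which is standard for HPL but is not stated in this post. Because the inputs are the rounded values shown above, the recomputed efficiency comes out slightly below the quoted 65.87%.

```python
# Figures quoted in the results list above.
RMAX_PFLOPS = 9.95      # achieved performance
RPEAK_PFLOPS = 15.11    # theoretical peak performance
N = 7_864_320           # matrix dimension
P, Q = 64, 128          # process grid
INSTANCES = 4_096       # EC2 instances launched

# Assumption (not stated in the post): matrix elements are 8-byte doubles.
BYTES_PER_ELEMENT = 8


def hpl_summary():
    """Recompute the derived quantities from the quoted figures."""
    processes = P * Q
    matrix_elements = N * N
    return {
        "efficiency_pct": 100 * RMAX_PFLOPS / RPEAK_PFLOPS,
        "matrix_elements": matrix_elements,
        "matrix_tib": matrix_elements * BYTES_PER_ELEMENT / 2**40,
        "processes": processes,
        "processes_per_instance": processes / INSTANCES,
    }


summary = hpl_summary()
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions, the 64 x 128 grid works out to 8,192 processes, or two per instance, and the full matrix occupies about 450 TiB spread across the cluster's memory.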
Read the entire blog to find out how Descartes Labs built one of the world’s most powerful supercomputers on AWS and took it apart in just 24 minutes.
Remember: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Shorts YouTube channel and by following the AWS HPC Blog channel.