November 13, 2018

HPC in the cloud 2018: way to go, Oracle!

Benchmarking the latest generation of cloud hardware for materials modeling

High-performance computing (HPC) in the cloud

In 2011 cloud computing was yet to become mainstream, and it was hard to imagine that computational tasks requiring expensive and highly sophisticated supercomputers could one day be performed in the cloud. The cloud was simply too inefficient, with many performance concerns. However, when I saw first-hand the painful and cost-intensive process of starting a new supercomputing center, I realized that once the performance concerns were solved, HPC in the cloud would become the new norm.

Fast-forward to 2017, and it was time for us at Exabyte.io to evaluate whether the cloud had caught up with traditional supercomputing centers. Our platform performs compute-intensive materials modeling and simulation tasks, and our customers prompted us to conduct a thorough study. We compared multiple vendors with respect to their performance for distributed-memory calculations (full text available here) and discovered that Microsoft Azure could indeed already perform very well. We were convinced that high-performance computing in the cloud was ready for widespread adoption.

Oracle HPC

Somewhat surprisingly, in 2018 we learned that Oracle was working on HPC. Historically, the domain was mainly “populated” by science and tech nerds and received little or no mainstream attention. At Oracle OpenWorld 2018, however, Larry Ellison spoke about large-scale engineering simulations in his keynote presentation and announced the availability of one of the fastest and most cost-effective HPC offerings. Excited we were!

Larry Ellison's keynote at Oracle OpenWorld 2018, mentioning scientific workflows for computational fluid dynamics in the cloud [1]

The Oracle HPC team kindly invited Exabyte, as an independent party, to study the suitability of the latest generation of their high-performance computing hardware for materials modeling and simulations. We ran several benchmarks, including general dense matrix algebra (Linpack), Density Functional Theory (VASP), and Molecular Dynamics (GROMACS). A full explanation is available online [2].

Results

Below we show some of the results. As can be seen, Oracle shows the best performance, owing to the combination of the latest generation of computing hardware and a low-latency, high-bandwidth interconnect network that facilitates efficient scaling.

Speedup Ratio

Speedup ratio: the performance for a given number of nodes divided by the performance for a single node.
AZ — Microsoft Azure, OL — Oracle Cloud, AWS — Amazon Web Services.
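
For readers who want to reproduce this metric on their own measurements, here is a minimal sketch of the speedup ratio as defined above. The function name and the sample numbers are hypothetical and for illustration only; they are not results from our benchmarks.

```python
def speedup_ratios(performance_by_nodes):
    """Divide each measurement by the single-node measurement.

    performance_by_nodes maps a node count to a performance figure
    (higher is better, in any consistent unit).
    """
    if 1 not in performance_by_nodes:
        raise ValueError("a single-node measurement is required as the baseline")
    baseline = performance_by_nodes[1]
    if baseline <= 0:
        raise ValueError("the single-node performance must be positive")
    return {
        nodes: perf / baseline
        for nodes, perf in sorted(performance_by_nodes.items())
    }


# Hypothetical performance figures, NOT measured data.
example = {1: 100.0, 2: 190.0, 4: 360.0}
ratios = speedup_ratios(example)
for nodes, ratio in ratios.items():
    print(f"{nodes} node(s): speedup ratio {ratio:.2f}")
```

Perfect scaling would give a ratio equal to the number of nodes, so the closer a curve stays to that line, the better the interconnect is keeping up.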

VASP

Speedup vs. number of nodes for the Vienna Ab initio Simulation Package (VASP), parallelization over k-points.
OL-NHT-16 — Oracle Cloud, Non-Hyperthreaded, 16 cores.
AWS-NHT-16 — Amazon Web Services, Non-Hyperthreaded, 16 cores.

GROMACS

Speedup — inverse total runtime for the task (in seconds) — for GROMACS, Polystyrene.
OL-NHT — Oracle Cloud, Non-Hyperthreaded. Last two digits show the number of cores per node.
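
The GROMACS metric above is an inverse runtime, so a higher value means a faster run. A minimal sketch of the conversion follows; the configuration labels and runtimes are hypothetical placeholders, not our measurements.

```python
def inverse_runtime(runtime_seconds):
    """Return 1 / runtime, so that faster runs get larger values."""
    if runtime_seconds <= 0:
        raise ValueError("runtime must be a positive number of seconds")
    return 1.0 / runtime_seconds


# Hypothetical total runtimes in seconds, NOT measured data.
runtimes = {"config-a": 200.0, "config-b": 100.0}
scores = {label: inverse_runtime(t) for label, t in runtimes.items()}
for label, score in scores.items():
    print(f"{label}: {score:.4f} 1/s")
```

Halving the runtime doubles the value, which makes curves for different core counts per node directly comparable on one plot.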

Conclusion

The future of high-performance computing is in the cloud. AWS made the first move in 2015 with the introduction of c4-type instances. Microsoft Azure set the trend by deploying a low-latency interconnect in 2016-2017, and Oracle is making a strong move in 2018. Running modeling and simulations in the cloud with performance similar to on-premises is no longer a dream. If you had doubts about this before, now might be the right time to give it another try.

Links

[1] Larry Ellison keynote presentation at Oracle OpenWorld 2018
[2] Exabyte.io documentation: benchmarking cloud vendors in 2018