January 22, 2019

Collaborative Evaluation of Cloud Infrastructure Performance

Cloud computing has recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is a growing need to continuously evaluate infrastructure performance on the most commonly used simulation workflows.

For that reason, we introduce an online ecosystem and open-source its tools to provide a collaborative and repeatable way to assess the performance of cloud and on-premises hardware on multiple real-world, application-specific benchmark cases.

Here we briefly describe the components of the ecosystem and our vision for it, and refer readers to the original manuscript [1] and the related resources [2, 3] listed below for details.

Ecosystem

The ecosystem is an online platform allowing multiple people to collaboratively evaluate the performance of computing hardware for resource-intensive applications. The ecosystem consists of the following components.

  • ExaBench [2], an open-source modular software tool to facilitate the performance assessment of computing systems. The tool supports multiple benchmark cases to evaluate the performance of scientific applications.
  • Results database, a centrally accessible repository to store the results in order to facilitate their comparison, traceability, and curation.
  • Results page, an online resource presenting the results in a visual manner [3].
  • Sites, the physical locations, each with a unique identifier, where the benchmarks are executed.
  • Contributors, who use the ExaBench tool to submit the benchmark results to the central database and/or contribute to the ExaBench source code by adding support for new benchmark cases and metrics.
  • Community, the broader set of users and interested parties.

The architecture of the ecosystem is presented in Figure-1. Site administrators install the ExaBench tool from the source code available online, which is developed and maintained continuously by the codebase contributors. Benchmarks are executed on the underlying hardware, and their results are automatically stored in the database in a predefined format so that the community can analyze the efficiency of the computing systems.

Figure-1: Schematic representation of the online ecosystem. The three main components (the open-source codebase, the database of results, and their online visual representation) are outlined in the middle. “EB” denotes the ExaBench tool. “Cluster” refers to on-premises computing clusters. “Cloud” denotes public/private cloud systems. Two types of contributors are shown: contributors to the codebase (orange) and contributors to the results (green). Codebase contributors help extend the test cases. Results contributors run the ExaBench tool and publish the results to the centrally accessible database. The results are available to the wider community.
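
To make the flow in Figure-1 more concrete, here is a minimal Python sketch of what a results contributor's run could look like: a workload is timed on a site, wrapped in a record, and appended to a results store. Everything in it is an assumption made for illustration, including the `BenchmarkResult` fields, the `run_benchmark` and `publish` helpers, the `site-001` identifier, and the in-memory list standing in for the central database. It is not the ExaBench API or its actual result schema; see [2] for the real tool.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field


@dataclass
class BenchmarkResult:
    """Hypothetical result record; field names are illustrative only."""
    site_id: str            # unique identifier of the site that ran the case
    benchmark: str          # name of the benchmark case
    nodes: int              # number of compute nodes used
    runtime_seconds: float  # measured wall-clock time
    result_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def run_benchmark(workload, site_id, benchmark, nodes):
    """Time a callable workload and wrap the measurement in a record."""
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    return BenchmarkResult(site_id, benchmark, nodes, elapsed)


def publish(result, database):
    """Stand-in for submission to the central database: store a JSON document."""
    database.append(json.dumps(asdict(result), sort_keys=True))


database = []  # in-memory stand-in for the centrally accessible results database
result = run_benchmark(
    lambda: sum(i * i for i in range(100_000)),  # toy workload, not a real case
    site_id="site-001",
    benchmark="toy-sum",
    nodes=1,
)
publish(result, database)
print(database[0])
```

Storing each result as a self-describing record that carries its site identifier is what would make results from different clusters and clouds comparable and traceable once they sit in one shared store.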

Perspectives and Outlook

High-performance and parallel computing is more important today than ever because of the end of Moore’s law in conventional semiconductor technology scaling. HPC is no longer the domain of highly specialized applications only: these still exist and are needed, but they are gradually becoming a minority. For that reason, and to provide timely and objective insights, continuous collaborative performance assessment is important today and will become more so in the future. Beyond the limited set of applications incorporated into the ecosystem so far, many more use cases in computational fluid dynamics, electronic design automation, drug discovery, computational chemistry, and other fields can be introduced by extending the source code and contributing results.

We envision that the ecosystem will help the community to choose the optimal setup for running resource-intensive workloads, and let cloud vendors improve their services in a competitive and transparent environment. We see how such an environment can lead to further democratization of HPC and its proliferation in industrial research and development, which in turn will accelerate progress in the corresponding industries.

References

[1] Exabyte Benchmarks Ecosystem Manuscript, arXiv.org

[2] Exabyte Benchmarks Suite, GitHub Repository

[3] Exabyte Benchmarks Suite Results, Google Spreadsheet