Several years ago, Appsbroker, Google Cloud and Intel collaborated to build an HPC capability for customers to showcase and build the largest compute solutions on Google’s public cloud. We work with customers who have the most demanding workloads across Media and Entertainment, Health and Life Sciences, Manufacturing, and Financial Services.
Our customers consistently ask whether cloud-hosted servers can perform as well as bare-metal servers. In late 2018, Google demonstrated the performance of its ultramem instances on the STAC-M3 benchmark, which focuses on time-series databases and market tick data:
Can cloud instances perform better than bare metal?
At Appsbroker, we wanted to highlight how the latest 2nd Gen Intel Xeon Scalable processors perform on Google Cloud, using a Financial Services benchmark that has become more relevant in recent years: the STAC-A2 benchmark for Derivative Risk.
We collaborated with Intel and their STAC-A2 experts to deploy their STAC-A2 pack based on the Intel oneAPI compilers and libraries that can leverage the performance of the latest generation of Intel CPUs. In particular, this means:
- Accessing specific instructions for HPC, including SIMD/AVX-512.
- Vector programming based on the OpenMP specification.
- Parallelising workloads using the oneAPI Threading Building Blocks and MPI libraries in the oneAPI HPC Toolkit.
Intel has recently standardised its hardware offerings, compilers and software categories under the oneAPI suite for cross-architecture performance.
Read the full press release now for the key facts surrounding the STAC-A2 Benchmark.
The Cluster Build
Stack under test:
- STAC-A2 Pack for Intel oneAPI (Rev N)
- CentOS Linux 8.3
- Virtualised 2nd Gen Intel Xeon (Cascade Lake) CPU @ 3.1GHz (burst to 3.8GHz)
- 10 x Google Cloud c2-standard-60 (compute optimised) VMs, each with 60 vCPUs and 240GB RAM
For a deep dive into the benchmark, please visit STAC to access the report.
We chose the Google Cloud c2 compute-optimised instances to ensure the best possible performance. These Compute Engine VMs can sustain an all-core turbo frequency of up to 3.8GHz and provide full NUMA transparency into the underlying server architecture.
STAC-A2 is a series of benchmarks that scales across a multi-node cluster, so performance is ultimately limited by the latency and bandwidth of the network interconnect. Following Google Cloud’s best practices for tightly coupled HPC workloads, we deployed the 10-node cluster with a Compact Placement Policy, which places the nodes physically close to one another and minimises the latency between them.
Appsbroker automated both the software installation on each node and the deployment of the 10-node cluster on Google Cloud using Infrastructure as Code techniques. This means that a powerful cluster, capable of breaking records, can be deployed from a repeatable template and be ready to process derivative risk workloads in under five minutes. Risk workloads are spiky by nature because they depend on market trading conditions, and we strongly believe they will benefit from the dynamic deployment patterns that Google Cloud makes available to financial services organisations.
Cost is a key consideration when using cloud services. With the configuration described above, burst workloads achieved 12,398 options per dollar. With a one-year commitment for the same resources, this figure increased to 19,693 options per dollar.
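The two options-per-dollar figures can be compared with a short calculation. The sketch below assumes that throughput is identical in both cases, so the difference comes entirely from price. The function names are illustrative and are not part of any STAC tooling.

```python
# Figures quoted above.
BURST_OPTIONS_PER_DOLLAR = 12_398
COMMITTED_OPTIONS_PER_DOLLAR = 19_693


def implied_discount(burst: float, committed: float) -> float:
    """Fractional price reduction implied by two options-per-dollar figures.

    Assumes the same throughput in both cases, so the price ratio is the
    inverse of the options-per-dollar ratio.
    """
    return 1.0 - burst / committed


def cost_per_million_options(options_per_dollar: float) -> float:
    """Dollars needed to process one million options."""
    return 1_000_000 / options_per_dollar


if __name__ == "__main__":
    discount = implied_discount(
        BURST_OPTIONS_PER_DOLLAR, COMMITTED_OPTIONS_PER_DOLLAR
    )
    print(f"Implied price reduction: {discount:.1%}")
    print(f"Burst:     ${cost_per_million_options(BURST_OPTIONS_PER_DOLLAR):.2f} per million options")
    print(f"Committed: ${cost_per_million_options(COMMITTED_OPTIONS_PER_DOLLAR):.2f} per million options")
```

Under that assumption, the committed figure implies a price reduction of roughly 37%, or about $50.78 per million options compared with about $80.66 for burst usage.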
Achieving a STAC-A2 first in the public cloud
We set two records in the STAC-A2 benchmark and were the first to run this benchmark in the public cloud. The two records were:
- The highest throughput (STAC-A2.β2.HPORTFOLIO.SPEED)
- The fastest cold time for the large problem size (STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD)
Thank you to Intel and Google Cloud for collaborating on and supporting this benchmark. We’re looking forward to rerunning it later in the year, when the latest Ice Lake CPUs (3rd Gen Intel Xeon Scalable processors) are available on Google Cloud. To reduce inter-node latency, we will also take advantage of the new 100Gbps Tier-1 networking now available on Google Cloud for compute-optimised VMs.
- Appsbroker Breaks 2X Records for HPC Speed in the STAC-A2™ Benchmark: https://blog.appsbroker.com/articles/press-release-appsbroker-intel-google-cloud-hpc-stac-a2-benchmark
- STAC-A2 benchmark for Appsbroker/Google Cloud/Intel: https://stacresearch.com/news/INTC210331
- STAC-M3 Google Cloud press release: https://cloud.google.com/blog/products/compute/can-cloud-instances-perform-better-than-bare-metal-latest-stac-m3-benchmarks-say-yes
- Intel oneAPI: https://software.intel.com/content/www/us/en/develop/tools/oneapi.html
- Google Cloud best practices for running tightly coupled HPC applications on Compute Engine: https://cloud.google.com/architecture/best-practices-for-using-mpi-on-compute-engine
- Compute Engine – 100Gbps networking for C2 instances: https://cloud.google.com/blog/products/networking/increasing-bandwidth-to-c2-and-n2-vms