Scalable System Improvement (SSI): An Application Performance Benchmarking Metric for HPC
Scalable System Improvement (SSI) measures relative application performance between two high-performance computing (HPC) platforms. The goal in defining SSI was a single metric that captures performance improvement across a wide variety of application and platform characteristics, such as capability, throughput, strong scaling, weak scaling, and system size. SSI also provides parameters that let architecture teams and benchmark analysts define the workload characteristics and weight benchmarks independently, which is useful in procurements that represent more than one organization or varied workloads.
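Given two platforms, one of which serves as the reference, SSI is defined as the weighted geometric mean

$$ \mathrm{SSI} = \left( \prod_{i=1}^{M} \left( c_i U_i S_i \right)^{w_i} \right)^{1 / \sum_{i=1}^{M} w_i} $$

where: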
- M - total number of applications,
- c - capability scaling factor,
- U - utilization factor = (n_ref / n) × (N / N_ref),
  - n is the total number of nodes used for the application,
  - N is the total number of nodes in the respective platform,
  - ref refers to the reference system,
- S - application speedup = (t_ref / t) or (FOM / FOM_ref),
  - S must be >= 1.0,
- w - weighting factor.
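The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `AppResult` class, the `ssi` function, and all numbers in the usage lines are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class AppResult:
    """Per-application inputs to SSI (field names are illustrative)."""
    c: float        # capability scaling factor
    n_ref: int      # nodes used on the reference platform
    n: int          # nodes used on the target platform
    S: float        # speedup: t_ref / t, or FOM / FOM_ref
    w: float = 1.0  # weighting factor


def ssi(apps, N_ref, N):
    """Weighted geometric mean of c * U * S over all applications."""
    product = 1.0
    weight_sum = 0.0
    for a in apps:
        if a.S < 1.0:
            raise ValueError("SSI requires speedup S >= 1.0")
        U = (a.n_ref / a.n) * (N / N_ref)  # utilization factor
        product *= (a.c * U * a.S) ** a.w
        weight_sum += a.w
    return product ** (1.0 / weight_sum)


# Hypothetical inputs: a target platform with twice the nodes of the reference.
apps = [
    AppResult(c=1.0, n_ref=100, n=100, S=2.0),  # same problem, 2x faster
    AppResult(c=4.0, n_ref=100, n=200, S=1.0),  # 4x problem, same time
]
result = ssi(apps, N_ref=1000, N=2000)
print(result)
```

With these made-up inputs each application contributes c × U × S = 4, so the weighted geometric mean is also 4.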
The capability factor allows the design team to define weak-scaled problems. For example, if the problem size (or some other metric of complexity) for a given application is four times larger than the problem run on the reference system, c_i would be 4 for that application.
The utilization factor is the ratio of the platform utilizations used in obtaining the reported time or figure of merit (FOM). The utilization factor rewards using fewer nodes (n) to achieve a given speedup, and it also rewards providing more nodes in aggregate (N).
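As a quick illustration of both effects, the sketch below evaluates the utilization factor for a few cases. The function name `utilization_factor` and the node counts are assumptions made up for this example.

```python
def utilization_factor(n_ref, n, N_ref, N):
    """U = (n_ref / n) * (N / N_ref)."""
    return (n_ref / n) * (N / N_ref)


# Hypothetical node counts, for illustration only.
baseline = utilization_factor(n_ref=256, n=256, N_ref=4000, N=4000)
fewer_job_nodes = utilization_factor(n_ref=256, n=128, N_ref=4000, N=4000)
larger_platform = utilization_factor(n_ref=256, n=256, N_ref=4000, N=8000)

print(baseline, fewer_job_nodes, larger_platform)
```

Halving the nodes used by the job (n) or doubling the platform size (N) each doubles U relative to the baseline.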
Speedup is calculated using an application-specific figure of merit. Nominally, speedup is defined as the ratio of the execution times. Some applications define a different FOM, such as a dimensionless number, time per iteration for a key code segment, grind time, or floating-point operations per second. Speedup rewards a faster time or a higher FOM.
A necessary condition of the SSI calculation is that speedup must be >= 1.0. The reason for this condition is that a user expects turnaround time to be no worse than on a previous-generation machine. In addition, without this condition one could run a given benchmark on an unreasonably small number of nodes on the target system to minimize node-hours (and avoid scaling effects, for example), and hence increase SSI.
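The following sketch shows why the condition matters. It computes the per-application term c × U × S without enforcing S >= 1.0 and compares an ordinary run with a run squeezed onto very few nodes. The helper `cus_term` and every number here are hypothetical.

```python
def cus_term(c, n_ref, n, N_ref, N, t_ref, t):
    """Return (c * U * S, S) with S = t_ref / t and no S >= 1.0 check."""
    U = (n_ref / n) * (N / N_ref)
    S = t_ref / t
    return c * U * S, S


# Reference run (hypothetical): 100 nodes, 100 seconds; equal platform sizes.
# Ordinary target run: same node count, twice as fast.
honest_term, honest_S = cus_term(1.0, 100, 100, 1000, 1000, 100.0, 50.0)

# "Gamed" target run: one tenth of the nodes, four times slower.
gamed_term, gamed_S = cus_term(1.0, 100, 10, 1000, 1000, 100.0, 400.0)

print(honest_term, honest_S)
print(gamed_term, gamed_S)
```

In this made-up case the slower run scores higher (2.5 versus 2.0) even though its speedup is only 0.25, which is exactly the outcome that the S >= 1.0 requirement rules out.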
The weighting factor allows an architecture team or benchmark analyst to weight some applications more heavily than others. If all applications have equal weight, the weighted geometric mean is equivalent to the geometric mean.
Analyzing the SSI calculation, it can be observed that SSI is maximized by minimizing (n x t) or (n / FOM).
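This observation follows from expanding the product of the utilization factor and the speedup for a single application i, using only the definitions given earlier (the subscript i on the per-application node counts and times is added here for clarity):

$$ U_i S_i = \frac{n_{ref,i}}{n_i} \cdot \frac{N}{N_{ref}} \cdot \frac{t_{ref,i}}{t_i} = \frac{N}{N_{ref}} \cdot \frac{n_{ref,i}\, t_{ref,i}}{n_i\, t_i} $$

The reference quantities are fixed, so for a given platform size N the term grows only as the node-time product n_i t_i shrinks. When speedup is defined through a figure of merit instead, the same expansion gives

$$ U_i S_i = \frac{N}{N_{ref}} \cdot \frac{n_{ref,i}}{FOM_{ref,i}} \cdot \frac{FOM_i}{n_i} $$

so the quantity to minimize is n_i / FOM_i.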
SSI is best illustrated with an example. This example uses data obtained from a workshop publication comparing NERSC’s Hopper (Cray XE6) and Edison (Cray XC30) platforms.[1] Application names, node counts, and timings are summarized in the following table.
| Application | Hopper (6,384 nodes): # Nodes | Hopper: Time (sec) | Edison (5,576 nodes): # Nodes | Edison: Time (sec) |
|---|---|---|---|---|
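The weighted geometric mean can be easily calculated in a spreadsheet using the following equivalent log-sum form:

$$ \mathrm{SSI} = \exp\left( \frac{\sum_{i=1}^{M} w_i \ln x_i}{\sum_{i=1}^{M} w_i} \right) $$

where x_i = c_i U_i S_i.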
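A common spreadsheet-friendly way to evaluate a weighted geometric mean is to exponentiate the weighted average of the logarithms. The sketch below checks that this log-sum form agrees with the direct product form; the function names and the sample values of x and w are assumptions for illustration.

```python
import math


def weighted_geomean_product(xs, ws):
    """Direct form: (prod x_i ** w_i) ** (1 / sum w_i)."""
    product = 1.0
    for x, w in zip(xs, ws):
        product *= x ** w
    return product ** (1.0 / sum(ws))


def weighted_geomean_logsum(xs, ws):
    """Log-sum form: exp(sum(w_i * ln x_i) / sum(w_i))."""
    total = sum(w * math.log(x) for x, w in zip(xs, ws))
    return math.exp(total / sum(ws))


# Hypothetical x = c * U * S values and weights.
xs = [2.0, 8.0, 4.0]
ws = [1.0, 1.0, 2.0]

direct = weighted_geomean_product(xs, ws)
logsum = weighted_geomean_logsum(xs, ws)
print(direct, logsum)
```

Both forms give the same value up to floating-point rounding. The log-sum form also avoids overflow or underflow when many terms are multiplied together.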
While the original study was a strong scaling analysis, for illustrative purposes we’re going to assume that the UMT and MiniFE benchmarks were run at four times the problem size on Edison and hence c=4. The weights are assigned arbitrarily, again for illustrative purposes.
Appendix: Which Mean to Use
There are a few excellent references on which Pythagorean mean to use when benchmarking systems.[2,3] Fleming states that the arithmetic mean should NOT be used to average normalized numbers and to use the geometric mean instead. Smith summarizes that “If performance is to be normalized with respect to a specific machine, an aggregate performance measure such as total time or harmonic mean rate should be calculated before any normalizing is done. That is, benchmarks should not be individually normalized first.” However, the SSI metric normalizes each benchmark first and then calculates the geometric mean for the following reasons.
- The geometric mean is best when comparing different figures of merit. One might think that the use of speedup is a single FOM, but for SSI each application’s FOM is independent. Hence we cannot add results together to calculate total time, nor total work, nor total rate as is recommended by Smith and as would be needed for correctness in the arithmetic and harmonic means.
- The geometric mean normalizes the ranges being averaged so that no single application result dominates the resultant mean. The central tendency of the geometric mean emphasizes this more in that it is always less than or equal to the arithmetic mean.
- The geometric mean is the only mean with the property that the geometric mean of (Xi/Yi) equals the geometric mean of (Xi) divided by the geometric mean of (Yi). As a result, the resultant ranking is independent of which platform is used for normalization when calculating speedup.
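The last property can be checked numerically. The sketch below uses two made-up sets of execution times, X and Y, and compares the geometric and arithmetic means of the normalized ratios in both directions; all values and helper names are hypothetical.

```python
import math


def geomean(values):
    """Unweighted geometric mean."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Hypothetical execution times for two benchmarks on platforms X and Y.
X = [10.0, 100.0]
Y = [20.0, 50.0]

x_over_y = [x / y for x, y in zip(X, Y)]  # normalized to Y
y_over_x = [y / x for x, y in zip(X, Y)]  # normalized to X

# Geometric mean of ratios vs. ratio of geometric means.
print(geomean(x_over_y), geomean(X) / geomean(Y))

# Arithmetic mean of normalized times, in each direction.
print(arithmean(x_over_y), arithmean(y_over_x))
```

For these values the geometric mean of the ratios matches the ratio of the geometric means, and it rates the two platforms as equal regardless of which one is the baseline. The arithmetic mean of the normalized times is 1.25 in both directions, so each platform appears 25% slower than the other depending on the choice of reference.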
References
- [1] Cordery, M. J.; B. Austin, H. J. Wasserman, C. S. Daley, N. J. Wright, S. D. Hammond, D. Doerfler, "Analysis of Cray XC30 Performance using Trinity-NERSC-8 benchmarks and comparison with Cray XE6 and IBM BG/Q". PMBS2013: Sixth International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, November 11, 2013.
- [2] Fleming, Philip J.; John J. Wallace, "How not to lie with statistics: the correct way to summarize benchmark results". Communications of the ACM 29 (3): 218–221, 1986.
- [3] Smith, James E., "Characterizing computer performance with a single number". Communications of the ACM 31 (10): 1202–1206, 1988.