Performance & Monitoring Tools
For parallel computers to attain their best performance and utilization, workloads and system resources must be monitored carefully.
Many stakeholders have an interest in the performance of application codes. To provide NERSC users and center staff with a consistent, portable, and scalable view into performance, NERSC uses IPM (Integrated Performance Monitoring), a lightweight application profiling layer that captures summaries of computation, communication, and I/O. The resulting performance profiles are available to users and consultants, and they inform center decisions about matching HPC architectures to NERSC applications. An example IPM profile is available here. Since 2004, NERSC has captured over 300,000 application profiles across many architectures, providing a basis for HPC R&D.
HPC Filesystems Monitoring
Power, Cooling, and Facilities
It is now generally recognized in the high-performance computing community that peak performance does not adequately predict the usefulness of a system for a given set of applications. One of the first benchmarks designed to measure system performance in a real-world operational environment was NERSC's Effective System Performance (ESP) test. NERSC introduced ESP in 1999 in the hope that the test would be of use to system managers and would spur the community (both researchers and vendors) to improve system efficiency.
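As a rough illustration of what an operational efficiency measurement of this kind can look like, the sketch below computes a utilization-style ratio: the processor-seconds of work in a fixed job mix divided by the processor-seconds the system offered while running that mix. The formula as written here, the function name `esp_efficiency`, and the job numbers are assumptions made for illustration. They are not taken from the ESP specification, which also fixes details such as the job mix and submission rules.

```python
def esp_efficiency(jobs, total_processors, elapsed_seconds):
    """Return a utilization-style efficiency between 0 and 1.

    jobs: list of (processors_used, runtime_seconds) pairs, one per job
          (hypothetical input format, chosen for this sketch).
    total_processors: number of processors in the whole system.
    elapsed_seconds: wall-clock time from the first job start to the
                     last job end, as observed on the system.
    """
    if total_processors <= 0 or elapsed_seconds <= 0:
        raise ValueError("total_processors and elapsed_seconds must be positive")
    # Work actually required by the job mix, in processor-seconds.
    work = sum(procs * runtime for procs, runtime in jobs)
    # Capacity the system offered during the test, in processor-seconds.
    capacity = total_processors * elapsed_seconds
    return work / capacity


# Hypothetical job mix: two jobs on a 128-processor system.
example_jobs = [(64, 100.0), (32, 200.0)]
example_efficiency = esp_efficiency(example_jobs, total_processors=128,
                                    elapsed_seconds=200.0)
print(f"efficiency = {example_efficiency:.2f}")
```

A value near 1 would mean the scheduler kept nearly all processors busy for the whole test. Idle gaps between jobs, or time lost to system overhead, lengthen the elapsed time and lower the ratio.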
The overriding question for an HEC procurement team is: "What performance will this system deliver to our workload?" A procurement team needs a normalizing metric that tells them how efficiently the architecture can execute their typical workload, given the proposed system's peak performance, so that different systems can be compared on an equal footing. The Sustained System Performance (SSP) metric, together with measures of Effective System Performance and of variability, provides an excellent approximation of how well high-performance computers will serve the scientific community. Ongoing use of benchmarks ensures that all effects on performance are reflected in the measured SSP value.
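One way to make the idea of a normalizing metric concrete is sketched below. It assumes that an SSP-style value is formed by averaging the per-core rates measured on a set of application benchmarks, scaling that average by the number of cores, and then comparing the result with peak performance. The use of a geometric mean, the function names, and all numbers are illustrative assumptions rather than NERSC's published definition.

```python
import math


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def sustained_system_performance(per_core_rates, cores):
    """SSP-style value: mean per-core application rate times core count.

    per_core_rates: measured rate of each benchmark application,
                    per core (for example, GFLOP/s per core).
    cores: number of cores in the proposed system.
    """
    return cores * geometric_mean(per_core_rates)


def fraction_of_peak(ssp, peak_per_core, cores):
    """Sustained performance as a fraction of theoretical peak."""
    return ssp / (peak_per_core * cores)


# Hypothetical proposal: 1000 cores, 10 GFLOP/s peak per core, and
# two benchmark applications measured at 1 and 4 GFLOP/s per core.
example_ssp = sustained_system_performance([1.0, 4.0], cores=1000)
example_fraction = fraction_of_peak(example_ssp, peak_per_core=10.0, cores=1000)
print(f"SSP = {example_ssp:.1f} GFLOP/s, {example_fraction:.1%} of peak")
```

Because the value is expressed in the same units for every proposal, two systems with very different peak ratings can be ranked by what they would deliver to the benchmark workload rather than by their headline numbers.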