NERSC: Powering Scientific Discovery Since 1974

Performance and Optimization

Note: Some of the performance tips and recommendations on this page are based on the Edison Phase I test results. While we do not expect major changes in our recommendations, we will still update this page when we have new data for Edison Phase II.

Performance comparison between Edison and Hopper

Edison used the same benchmark suite as Hopper, the NERSC-6 benchmark suite, to measure system performance throughout its procurement process. Instead of peak flops, NERSC uses the sustained system performance (SSP) to measure a system's computational capability. On the seven applications used for SSP, Edison is 2-3 times faster than Hopper.
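To make the SSP idea concrete, the sketch below assumes one common formulation: the geometric mean of the per-core performance rates of the benchmark applications, multiplied by the number of cores in the system. The summary above does not state the formula, so treat this as an illustration under that assumption. The function names are hypothetical, and any rates passed in would be placeholders rather than measured Edison or Hopper results.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return prod(values) ** (1.0 / len(values))


def sustained_system_performance(per_core_rates, total_cores):
    """SSP under the assumed definition: cores * geometric mean of per-core rates.

    per_core_rates: one rate per benchmark application (e.g. Gflop/s per core).
    total_cores:    number of compute cores in the system.
    """
    return total_cores * geometric_mean(per_core_rates)


def speedup(ssp_new, ssp_old):
    """Ratio of two SSP values, i.e. how much faster the new system is."""
    return ssp_new / ssp_old
```

A geometric mean is used in this formulation so that no single application with an unusually high rate dominates the result.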

Compiler Comparisons

Different optimization options for the compilers on Edison are compared using a set of benchmarks described below. The compilers are also compared against one another on the…

Math Library Performance

Fully optimizing a given application’s performance often requires a deep understanding of the source, an accurate profile for a representative run, and the ability to have changes to the source accepted upstream. However, in many cases, significant performance gains can be achieved by simply optimizing the code over the matrix of possible compilers, compiler options and libraries available on a given machine. Here, we explore the performance variability of common materials science…
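The "matrix of possible compilers, compiler options and libraries" can be enumerated mechanically. The following is a minimal sketch, not the procedure used in the full article: it lists every combination and picks the fastest from a table of timings. The function names are hypothetical, and the timings are assumed to come from builds and runs you perform yourself.

```python
from itertools import product


def build_matrix(compilers, flag_sets, libraries):
    """Return every (compiler, flags, library) combination as a tuple."""
    return list(product(compilers, flag_sets, libraries))


def best_configuration(timings):
    """Return the configuration with the smallest wall-clock time.

    timings: dict mapping (compiler, flags, library) to seconds.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Illustrative labels only; substitute the options available on your system.
    matrix = build_matrix(["intel", "gnu", "cray"], ["-O2", "-O3"], ["libA", "libB"])
    for configuration in matrix:
        print(configuration)
```

Because the number of combinations grows multiplicatively, it usually helps to keep each axis short and benchmark with a representative but brief run.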

Core Specialization

Core Specialization (CS) is a feature of the Cray operating system that allows the user to reserve one or more cores per node for handling system services, and thus reduce the effects of timing jitter due to interruptions from the operating system at the expense of (possibly) requiring more nodes to run an application. The specialized cores may also be used in conjunction with Cray's MPI asynchronous progress engine [1] to improve the overlap of communication and computation for applications…
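The cost side of this trade-off is simple arithmetic: every reserved core is one fewer core per node for application ranks, so a fixed-size job may spill onto an extra node. The sketch below illustrates this under the assumption of one rank per usable core; the function name and the node size in the usage note are hypothetical choices for illustration, not values taken from this page.

```python
import math


def nodes_required(total_ranks, cores_per_node, reserved_cores=0):
    """Number of nodes needed when some cores per node are reserved.

    total_ranks:    application ranks to place, one per usable core (assumed).
    cores_per_node: physical cores on each node.
    reserved_cores: cores per node set aside for system services.
    """
    usable = cores_per_node - reserved_cores
    if usable <= 0:
        raise ValueError("reserved_cores must be smaller than cores_per_node")
    return math.ceil(total_ranks / usable)
```

For example, with an assumed 24 cores per node, `nodes_required(96, 24)` gives 4 nodes, while reserving one core per node, `nodes_required(96, 24, 1)`, gives 5. A job of 92 ranks would still fit on 4 nodes, which is why the extra node requirement is only a possibility.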


Hyper-Threading

Edison includes Intel processors with Hyper-Threading Technology. When Hyper-Threading (HT) is enabled, the operating system recognizes each physical core as two logical cores. Each of the two logical cores has resources to store a program state, but they share most of their execution resources. Thus, two independent streams (i.e., processes or threads) can run simultaneously on the same physical core, but at roughly half the speed of a single stream. If a stream running on one of the logical…

DLFM library tools for large scale dynamic applications

DLFM is no longer actively supported on NERSC hardware. Users looking to scale up Python and other dynamic applications and avoid the startup overhead should instead use Shifter. Large-scale Python and other dynamic applications can spend a long time at startup. The DLFM library, developed by Mike Davis at Cray, Inc., is a set of functions that can be incorporated into a dynamically-linked application to provide improved performance during the loading of dynamic libraries when running the…