NERSC's newest supercomputer, named Cori, currently has Phase I (the Haswell partition) installed. Cori Phase I has a theoretical peak performance of 1.92 petaflops, 52,160 compute cores for running scientific applications, 203 terabytes of memory, and 28 petabytes of online disk storage with a peak I/O bandwidth of more than 700 gigabytes (GB) per second.
The cabinets of the Cori Phase II system (the Knights Landing partition) arrived in July 2016, and integration with the Phase I system began in September 2016. A brief introduction to the Cori Phase II architecture can be found here. The detailed Cori upgrade and installation schedule can be found here.
Cori Phase I and II system configuration details are given below.
- Cray XC40 supercomputer
- Theoretical peak performance: Phase I Haswell: 1.92 PFlop/s; Phase II KNL: 27.9 PFlop/s.
- Sustained application performance on NERSC SSP codes: Phase I Haswell: 83 TFlop/s (vs. 129 TFlop/s for Edison and 52.1 TFlop/s for Hopper); Phase II KNL: TBA.
- Total compute nodes: Phase I Haswell: 2,004 compute nodes, 52,160 cores in total (32 cores per node); Phase II KNL: 9,304 compute nodes, 632,672 cores in total (68 cores per node).
- Cray Aries high-speed interconnect with Dragonfly topology, as on Edison (0.25 μs to 3.7 μs MPI latency, ~8 GB/s MPI bandwidth)
- Aggregate memory: Phase I Haswell partition: 203 TB; Phase II KNL partition: 1 PB.
- Scratch storage capacity: 30 PB
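To give a feel for what the quoted MPI latency and bandwidth figures imply, the following is a minimal sketch of a simple latency-plus-bandwidth cost model for a single point-to-point message. The model itself, the function name `mpi_transfer_time_us`, and the message sizes are illustrative assumptions and do not come from NERSC; real transfer times also depend on message protocol, placement, and network contention.

```python
def mpi_transfer_time_us(message_bytes, latency_us, bandwidth_gb_per_s=8.0):
    """Estimate one point-to-point transfer time as latency + size / bandwidth.

    This is a simplified model (an assumption), not a measurement.
    """
    bytes_per_us = bandwidth_gb_per_s * 1e9 / 1e6  # GB/s -> bytes per microsecond
    return latency_us + message_bytes / bytes_per_us


# Latency bounds quoted in the list above, in microseconds.
BEST_LATENCY_US = 0.25
WORST_LATENCY_US = 3.7

# Hypothetical message sizes chosen only for illustration.
for size in (8, 64 * 1024, 1024 * 1024):
    best = mpi_transfer_time_us(size, BEST_LATENCY_US)
    worst = mpi_transfer_time_us(size, WORST_LATENCY_US)
    print(f"{size:>8} bytes: {best:8.2f} to {worst:8.2f} us")
```

Under this model, small messages are dominated by latency, while messages of a megabyte or more are dominated by the bandwidth term.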
|Haswell Cabinets||12||Each cabinet has 3 chassis; each chassis has 16 compute blades; each compute blade has 4 dual-socket nodes|
|Haswell Compute nodes||2,004||Each node has two sockets, each socket is populated with a 16-core Intel® Xeon™ Processor E5-2698 v3 ("Haswell") at 2.3 GHz|
|32 cores per node|
|Each core supports 2 hyper-threads and has two 256-bit-wide vector units|
|36.8 GFlops/core; 1.2 TFlops/node; 1.92 PFlops total (theoretical peak)|
|Each node has 128 GB DDR4 2133 MHz memory (four 16 GB DIMMs per socket); 203 TB total aggregate memory.|
|Each core has its own L1 and L2 caches, with 64 KB (32 KB instruction cache, 32 KB data) and 256 KB, respectively; there is also a 40-MB shared L3 cache per socket|
|KNL Cabinets||52||Each cabinet has 3 chassis; each chassis has 16 compute blades; each compute blade has 4 nodes|
|KNL Compute nodes||9,304||Each node is a single-socket Intel® Xeon Phi™ Processor 7250 ("Knights Landing") with 68 cores per node @ 1.4 GHz|
|Each core has two 512-bit-wide vector processing units. Each core has 4 hardware threads (272 threads total). Two cores form a tile.|
|44 GFlops/core; 3 TFlops/node; 27.9 PFlops total (theoretical peak)|
|Each node has 96 GB DDR4 2400 MHz memory, six 16 GB DIMMs (102 GB/s peak bandwidth). Total aggregate memory (combined with MCDRAM) is 1 PB.|
|Each node has 16 GB MCDRAM (multi-channel DRAM), > 460 GB/s peak bandwidth|
|Each core has its own 64 KB L1 cache (32 KB instruction cache, 32 KB data). Each tile (2 cores) shares a 1 MB L2 cache.|
|Interconnect||Cray Aries with Dragonfly topology with 5.625 TB/s global bandwidth (Phase I). 45.0 TB/s global peak bisection bandwidth (Phase II).|
|Login nodes||12||Dual socket (16 cores per socket, 32 total cores), 2.3 GHz Intel® Xeon™ Processor E5-2698 v3 ("Haswell") with 512 GB memory.|
|MOM nodes||--||Unlike Edison, there are no dedicated MOM nodes on Cori|
|Shared Root Server Nodes||16|
|Lustre Router nodes||130|
|DVS Server Nodes||32|
|Scratch storage system||Cray Sonexion 2000 Lustre appliance. Scratch storage maximum aggregate bandwidth: > 700 GB/sec|
|Operating System||CNL on compute nodes||Compute nodes run a lightweight kernel and run-time environment based on the SUSE Linux Enterprise Server (SLES) distribution.|
|Full SUSE Linux on Login nodes||External login nodes run a standard SLES distribution similar to the internal service nodes.|
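The per-core and per-node peak figures in the table can be reproduced from the clock rate and vector width. The sketch below assumes double-precision arithmetic and that each vector unit completes one fused multiply-add (counted as 2 flops) per lane per cycle; neither assumption is stated in the table, and the function name is hypothetical. It also adds up the KNL DDR4 and MCDRAM capacities to check the quoted aggregate memory.

```python
def peak_gflops_per_core(clock_ghz, vector_bits, vector_units, flops_per_lane=2):
    """Theoretical peak GFlop/s for one core.

    Assumes 64-bit (double-precision) lanes and one fused multiply-add
    (2 flops) per lane per cycle; both are assumptions, not table values.
    """
    lanes = vector_bits // 64  # double-precision lanes per vector unit
    return clock_ghz * lanes * vector_units * flops_per_lane


# Values taken from the table: clock, vector width, vector units, cores per node.
haswell_core = peak_gflops_per_core(2.3, 256, 2)
knl_core = peak_gflops_per_core(1.4, 512, 2)

haswell_node_tflops = haswell_core * 32 / 1000
knl_node_tflops = knl_core * 68 / 1000

# KNL aggregate memory: 9,304 nodes x (96 GB DDR4 + 16 GB MCDRAM), in decimal PB.
knl_memory_pb = 9304 * (96 + 16) / 1e6

print(f"Haswell: {haswell_core:.1f} GFlop/s per core, {haswell_node_tflops:.2f} TFlop/s per node")
print(f"KNL:     {knl_core:.1f} GFlop/s per core, {knl_node_tflops:.2f} TFlop/s per node")
print(f"KNL aggregate memory: {knl_memory_pb:.2f} PB")
```

Under these assumptions the Haswell result matches the 36.8 GFlops/core in the table, while the KNL result comes out slightly above the listed 44 GFlops/core and 3 TFlops/node, which appear to be rounded values.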
The Haswell processors in Cori's data partition have a "Turbo Boost" feature to dynamically adjust CPU frequency and achieve the maximum possible performance. When Turbo Boost is enabled, the processor operates at the maximum frequency allowed by the available power and thermal limits. Further, on Cori (unlike Edison), each core can operate at a different frequency. The instantaneous turbo frequency could be above or below the nominal 2.3 GHz frequency depending on the number of active cores…
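One generic way to observe per-core frequency differences on a Linux node is to read the `cpu MHz` lines from `/proc/cpuinfo`. The sketch below is not a NERSC tool: the function names are hypothetical, and the sample text contains made-up values used only to show the parsing, not measurements from Cori.

```python
import re


def parse_core_frequencies_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for each logical CPU, in file order."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(value) for value in matches]


def read_core_frequencies_mhz(path="/proc/cpuinfo"):
    """Read and parse the frequencies from a Linux /proc/cpuinfo file."""
    with open(path) as handle:
        return parse_core_frequencies_mhz(handle.read())


# Illustrative input only: these frequencies are invented for the example.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 2301.000\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 2800.500\n"
)

frequencies = parse_core_frequencies_mhz(SAMPLE_CPUINFO)
NOMINAL_MHZ = 2300.0  # nominal Haswell clock quoted above
for cpu, mhz in enumerate(frequencies):
    relation = "above" if mhz > NOMINAL_MHZ else "at or below"
    print(f"cpu {cpu}: {mhz:.1f} MHz ({relation} nominal)")
```

On a real node, calling `read_core_frequencies_mhz()` returns a snapshot; the values change from moment to moment as the processor adjusts frequency.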
The Xeon Phi "Knights Landing" 7250 processors in Cori have 68 CPU cores, which are organized into 34 "tiles" (each tile comprising two CPU cores and a shared 1 MB L2 cache). The tiles are placed in a 2D mesh and connected via an on-chip interconnect. The KNL processor has 6 DDR channels, with controllers to the right and left of the mesh, and 8 MCDRAM channels, with controllers spread across the 4 "corners" of the mesh.

NUMA on KNL

NUMA stands for…
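The core, tile, and hardware-thread counts for a KNL node can be related to logical CPU numbers with a small helper. The sketch below assumes a numbering in which logical CPUs 0 to 67 are the first hardware thread of each core, 68 to 135 the second, and so on, and in which consecutive core pairs share a tile. Both conventions are assumptions made for illustration, and the names `CpuLocation` and `locate_logical_cpu` are hypothetical; verify the actual layout on a node before relying on it.

```python
from typing import NamedTuple

CORES_PER_NODE = 68      # from the table above
THREADS_PER_CORE = 4     # from the table above
CORES_PER_TILE = 2       # from the table above


class CpuLocation(NamedTuple):
    core: int
    hardware_thread: int
    tile: int


def locate_logical_cpu(cpu_id):
    """Map a logical CPU id to (core, hardware thread, tile).

    Assumes CPUs are numbered thread-major (all first threads, then all
    second threads, ...) and that cores 2k and 2k+1 share tile k.
    """
    total = CORES_PER_NODE * THREADS_PER_CORE  # 272 logical CPUs
    if not 0 <= cpu_id < total:
        raise ValueError(f"cpu_id must be in [0, {total})")
    core = cpu_id % CORES_PER_NODE
    hardware_thread = cpu_id // CORES_PER_NODE
    tile = core // CORES_PER_TILE
    return CpuLocation(core, hardware_thread, tile)


for cpu_id in (0, 69, 271):
    print(cpu_id, locate_logical_cpu(cpu_id))
```

Under this assumed numbering, two logical CPUs with the same `tile` value share an L2 cache, which can matter when deciding where to pin cooperating threads.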