Cori Configuration

NERSC's newest supercomputer, named Cori, includes the Haswell partition (Phase I) and the KNL partition (Phase II). The Haswell partition has a theoretical peak performance of 1.92 petaflops, 2,004 compute nodes (64,128 compute cores) for running scientific applications, and 203 terabytes of memory. The KNL partition has a theoretical peak performance of 27.9 petaflops, 9,688 compute nodes (658,784 cores in total), and 1 PB of memory. The Cori system also has 28 petabytes (PB) of online disk storage with a peak I/O bandwidth of > 700 gigabytes (GB) per second, and 1.8 PB of SSDs in the Burst Buffer, with > 1.7 TB/s aggregate I/O bandwidth.

Cori Phase I and II system configuration details are given below. 

System Overview 

  • Cray XC40 supercomputer
  • Theoretical peak performance: Haswell: 2.29 PFlop/s; KNL: 29.1 PFlop/s.
  • Sustained application performance on NERSC SSP codes: Haswell: 98.9 TFlop/s (vs. 129 TFlop/s for Edison and 52.1 TFlop/s for Hopper); KNL: 562.31 TFlop/s.
  • Total compute nodes: Haswell: 2,388 compute nodes, 64,128 cores in total (32 cores per node); KNL: 9,688 compute nodes, 658,784 cores in total (68 cores per node).
  • Cray Aries high-speed interconnect with Dragonfly topology as on Edison (0.25 μs to 3.7 μs MPI latency, ~8GB/sec MPI bandwidth) 
  • Aggregate memory: Haswell partition: 242 TB; KNL partition: 1 PB. 
  • Scratch storage capacity: 30 PB
  • Burst Buffer capacity: 1.8 PB

System Details 

Haswell Cabinets: 12
    Each cabinet has 3 chassis; each chassis has 16 compute blades; each compute blade has 4 dual-socket nodes.
Haswell Compute nodes: 2,388
    Each node has two sockets; each socket is populated with a 16-core Intel® Xeon™ Processor E5-2698 v3 ("Haswell") at 2.3 GHz.
    32 cores per node.
    Each core supports 2 hyper-threads, and has 2 256-bit-wide vector units.
    36.8 GFlops/core; 1.2 TFlops/node; 2.29 PFlops total (theoretical peak).
    Each node has 128 GB DDR4 2133 MHz memory (four 16 GB DIMMs per socket); 203 TB total aggregate memory.
    Each core has its own L1 and L2 caches, with 64 KB (32 KB instruction cache, 32 KB data) and 256 KB, respectively; there is also a 40 MB shared L3 cache per socket.
KNL Cabinets: 52
    Each cabinet has 3 chassis; each chassis has 16 compute blades; each compute blade has 4 nodes.
KNL Compute nodes: 9,688
    Each node is a single-socket Intel® Xeon Phi™ Processor 7250 ("Knights Landing") with 68 cores per node at 1.4 GHz.
    Each core has two 512-bit-wide vector processing units. Each core has 4 hardware threads (272 threads total). Two cores form a tile.
    44 GFlops/core; 3 TFlops/node; 29.1 PFlops total (theoretical peak).
    Each node has 96 GB DDR4 2400 MHz memory, six 16 GB DIMMs (102 GB/s peak bandwidth). Total aggregate memory (combined with MCDRAM) is 1 PB.
    Each node has 16 GB MCDRAM (multi-channel DRAM), > 460 GB/s peak bandwidth.
    Each core has its own L1 caches, with 64 KB (32 KB instruction cache, 32 KB data). Each tile (2 cores) shares a 1 MB L2 cache.
Interconnect
    Cray Aries with Dragonfly topology with 5.625 TB/s global bandwidth (Phase I). 45.0 TB/s global peak bisection bandwidth (Phase II).
Login nodes: 12
    Dual socket (16 cores per socket, 32 total cores), 2.3 GHz Intel® Xeon™ Processor E5-2698 v3 ("Haswell") with 512 GB memory.
MOM nodes: --
    Unlike Edison, there are no dedicated MOM nodes on Cori.
Shared Root Server Nodes: 16
Lustre Router nodes: 130
DVS Server Nodes: 32
RSIP nodes: 10
Scratch storage system
    Cray Sonexion 2000 Lustre appliance. Scratch storage maximum aggregate bandwidth: > 700 GB/sec.
Burst Buffer
    Cray DataWarp system consisting of 288 Burst Buffer nodes, with maximum aggregate I/O > 1.7 TB/s and > 28M IOP/s.
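
The per-core and per-node peak figures quoted for the compute nodes can be roughly reproduced from the clock rate and vector width. The sketch below is illustrative only: the function name `peak_gflops_per_core` is hypothetical, and it assumes the quoted peaks count 64-bit (double-precision) operations with a fused multiply-add counted as two operations. Under those assumptions the KNL result comes out at 44.8 GFlops/core, which the listing above appears to quote as 44.

```python
# Illustrative peak-rate arithmetic. Assumptions (not stated in the listing):
# double-precision operands and a fused multiply-add counted as 2 operations.

def peak_gflops_per_core(clock_ghz: float, vector_units: int,
                         vector_bits: int, fma: bool = True) -> float:
    """Peak GFlops per core = clock x vector units x lanes x (2 if FMA)."""
    lanes = vector_bits // 64  # number of 64-bit doubles per vector
    flops_per_cycle = vector_units * lanes * (2 if fma else 1)
    return clock_ghz * flops_per_cycle

# Values taken from the listing: clock rate, vector units, cores per node.
haswell_core = peak_gflops_per_core(2.3, vector_units=2, vector_bits=256)
knl_core = peak_gflops_per_core(1.4, vector_units=2, vector_bits=512)

haswell_node_tflops = haswell_core * 32 / 1000  # 32 cores per Haswell node
knl_node_tflops = knl_core * 68 / 1000          # 68 cores per KNL node

print(f"Haswell: {haswell_core:.1f} GFlops/core, {haswell_node_tflops:.2f} TFlops/node")
print(f"KNL:     {knl_core:.1f} GFlops/core, {knl_node_tflops:.2f} TFlops/node")
```

These are theoretical upper bounds at the nominal clock rate; sustained application performance is much lower, as the SSP figures in the overview indicate.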


Category: Software Name
    Description
Operating System: CNL on compute nodes
    Compute nodes run a lightweight kernel and run-time environment based on the SUSE Linux Enterprise Server (SLES) Linux distribution.
Operating System: Full SUSE Linux on login nodes
    External login nodes run a standard SLES distribution similar to the internal service nodes.
Batch System: SLURM

Processor Frequency on the Cori Data Partition

The Haswell processors in Cori's data partition have a "Turbo Boost" feature to dynamically adjust CPU frequency and achieve the maximum possible performance. When Turbo Boost is enabled, the processor operates at the maximum frequency allowed by the available power and thermal limits. Further, on Cori (unlike Edison), each core can operate at a different frequency. The instantaneous turbo frequency could be above or below the nominal 2.3 GHz frequency depending on the number of active cores…
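
One way to see per-core frequency variation on a Linux node is to compare the `cpu MHz` values reported in `/proc/cpuinfo` against the nominal clock. The sketch below is a minimal illustration, not a NERSC tool: the helper names `parse_core_mhz` and `compare_to_nominal` are hypothetical, and the sample text uses made-up values in the `/proc/cpuinfo` layout so that the example runs anywhere.

```python
import re
from typing import Dict, List

# Made-up sample in the /proc/cpuinfo layout (values are illustrative only).
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 2800.000\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 2300.000\n"
    "processor\t: 2\n"
    "cpu MHz\t\t: 1200.000\n"
    "processor\t: 3\n"
    "cpu MHz\t\t: 3100.000\n"
)

def parse_core_mhz(cpuinfo_text: str) -> List[float]:
    """Return the 'cpu MHz' value of each logical CPU, in file order."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]

def compare_to_nominal(mhz_values: List[float],
                       nominal_mhz: float = 2300.0) -> Dict[str, int]:
    """Count logical CPUs running above, at, or below the nominal clock."""
    return {
        "above": sum(1 for v in mhz_values if v > nominal_mhz),
        "at": sum(1 for v in mhz_values if v == nominal_mhz),
        "below": sum(1 for v in mhz_values if v < nominal_mhz),
    }

sample_mhz = parse_core_mhz(SAMPLE_CPUINFO)
sample_summary = compare_to_nominal(sample_mhz)
print(sample_mhz)
print(sample_summary)
```

On a real node the same functions could be given the contents of `/proc/cpuinfo` instead of the sample string. How closely the reported value tracks the instantaneous turbo frequency depends on the kernel and frequency driver, so treat the output as indicative.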

KNL Processor Modes

The Xeon Phi "Knights Landing" 7250 processors in Cori have 68 CPU cores, which are organized into 34 "tiles" (each tile comprising two CPU cores and a shared 1 MB L2 cache). The tiles are placed in a 2D mesh and connected via an on-chip interconnect, as shown in the following figure. As shown in the figure, the KNL processor has 6 DDR channels, with controllers to the right and left of the mesh, and 8 MCDRAM channels, with controllers spread across 4 "corners" of the mesh. NUMA on KNL: NUMA stands for…
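
The tile organization described above implies a few simple per-node counts. The sketch below works them out and shows how one might test whether two cores share an L2 cache. The numbering in `tile_of` (cores 2k and 2k+1 on tile k) is a hypothetical convention chosen for illustration; the actual core-to-tile mapping on a given node should be taken from the system's own topology tools.

```python
# Per-node KNL counts derived from the figures quoted in this document.
CORES = 68
CORES_PER_TILE = 2
THREADS_PER_CORE = 4   # hardware threads per core, from the system details
L2_PER_TILE_MB = 1

tiles = CORES // CORES_PER_TILE         # number of tiles in the mesh
hw_threads = CORES * THREADS_PER_CORE   # hardware threads per node
total_l2_mb = tiles * L2_PER_TILE_MB    # aggregate L2 cache per node

def tile_of(core_id: int) -> int:
    """Hypothetical numbering: cores 2k and 2k+1 sit on tile k."""
    if not 0 <= core_id < CORES:
        raise ValueError(f"core_id must be in [0, {CORES}), got {core_id}")
    return core_id // CORES_PER_TILE

def shares_l2(core_a: int, core_b: int) -> bool:
    """Two cores share an L2 cache exactly when they are on the same tile."""
    return tile_of(core_a) == tile_of(core_b)

print(f"{tiles} tiles, {hw_threads} hardware threads, {total_l2_mb} MB L2 per node")
print("cores 0 and 1 share L2:", shares_l2(0, 1))
print("cores 1 and 2 share L2:", shares_l2(1, 2))
```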