NERSC: Powering Scientific Discovery Since 1974

Cori Intel Xeon Phi (KNL) Nodes

The second phase of the Cori system was installed in the second half of 2016. It consists of 9,668 compute nodes based on the second generation of the Intel® Xeon Phi™ product family, part of the Intel® Many Integrated Core (MIC) Architecture; the code name for this architecture is "Knights Landing" ("KNL"). Measured on a set of characteristic benchmarks, the system's sustained performance is at least ten times that of the NERSC-6 "Hopper" system. Some important characteristics of the system include:

    • The version of Knights Landing on Cori is a self-hosted processor, not a co-processor or an accelerator. "Self-hosted" means that it is a standalone, bootable processor that runs the host OS.
    • Next-generation Intel® Xeon Phi product with improved single thread performance targeted for highly parallel computing. KNL has over 8 billion transistors per die based on Intel’s 14-nanometer manufacturing technology.
    • Intel® "Silvermont" architecture enhanced for high performance computing; features 2X the out-of-order buffer depth of current Silvermont, gather/scatter in hardware, Advanced Branch Prediction, 32KB Icache and Dcache, 2 x 64B load ports in Dcache, and 46/48 physical/virtual address bits to match Xeon™.
    • 9,668 single-socket compute nodes in the system.
    • Each node contains an Intel® Xeon Phi™ Processor 7250 @ 1.40GHz.
    • 68 cores per node with support for 4 hardware threads each (272 threads total).
    • Better performance per watt than previous-generation Xeon Phi systems, and 3X their single-thread performance.
    • AVX-512 vector pipelines with a hardware vector length of 512 bits (eight double-precision elements).
    • 96 GB of DDR4 2400 MHz memory per node, using six 16 GB DIMMs (115.2 GB/s peak bandwidth). The total aggregate memory across the system (combined with MCDRAM) is about 1 PB.
    • 16 GB of on-package, high-bandwidth memory with bandwidth projected to be 5X that of DDR4 DRAM memory (>460 GB/sec), over 5X the energy efficiency of GDDR5, and over 3X the density of GDDR5. This multi-channel DRAM (MCDRAM) memory will have flexible memory modes, including "cache mode" (effectively an L3 cache), "flat mode" (a unique NUMA domain, separate from DDR4), and a hybrid of the two. More information about MCDRAM on Xeon Phi is available here.
    • Cray Aries high speed "dragonfly" topology interconnect, cabinets, and cooling (same as in Edison).
    • Processor cores are connected in a 2D mesh network, with 2 cores per tile. The 2 cores in a tile share a 1 MB cache-coherent L2 cache, and each core has two vector processing units.
    • Multiple NUMA domains available per socket.
    • Supports MPI + OpenMP programming model.
    • Intel, Cray, and GNU programming environments.
    • The I/O subsystem (Lustre and DataWarp) will be the same as in Phase I.
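
Several of the figures in the list above can be cross-checked with simple arithmetic. The sketch below is illustrative only. It assumes that each of the six DIMMs sits on its own 64-bit memory channel and that each vector lane can complete one fused multiply-add (two floating-point operations) per cycle; neither assumption is stated above, and the variable names are invented for this example.

```python
# Back-of-the-envelope figures derived from the node characteristics listed above.
# Lines marked "assumption" use values that are NOT stated in the text.

NODES = 9_668
CORES_PER_NODE = 68
HW_THREADS_PER_CORE = 4
CORES_PER_TILE = 2
L2_MB_PER_TILE = 1
CLOCK_GHZ = 1.40
VECTOR_BITS = 512
VPUS_PER_CORE = 2
DDR4_GB_PER_NODE = 96
MCDRAM_GB_PER_NODE = 16
DDR4_MEGATRANSFERS_PER_S = 2400

DDR4_CHANNELS = 6        # assumption: one DIMM per memory channel
BYTES_PER_TRANSFER = 8   # assumption: 64-bit wide DDR4 channel
FLOPS_PER_LANE = 2       # assumption: one fused multiply-add per lane per cycle

# Hardware threads per node: 68 cores x 4 threads.
threads_per_node = CORES_PER_NODE * HW_THREADS_PER_CORE            # 272

# Tiles and total L2 cache per node: 2 cores share one 1 MB L2.
tiles_per_node = CORES_PER_NODE // CORES_PER_TILE                  # 34
l2_mb_per_node = tiles_per_node * L2_MB_PER_TILE                   # 34

# Double-precision elements per 512-bit vector register.
dp_lanes = VECTOR_BITS // 64                                       # 8

# Peak DDR4 bandwidth in GB/s: channels x transfers/s x bytes per transfer.
ddr4_peak_gb_s = DDR4_CHANNELS * DDR4_MEGATRANSFERS_PER_S * BYTES_PER_TRANSFER / 1000

# Theoretical peak double-precision rate per node in GFLOP/s (under the FMA assumption).
peak_dp_gflops = (CORES_PER_NODE * CLOCK_GHZ * VPUS_PER_CORE
                  * dp_lanes * FLOPS_PER_LANE)

# Aggregate DDR4 + MCDRAM across all nodes, in decimal petabytes.
total_memory_pb = NODES * (DDR4_GB_PER_NODE + MCDRAM_GB_PER_NODE) / 1e6

if __name__ == "__main__":
    print(f"hardware threads per node : {threads_per_node}")
    print(f"tiles / L2 per node       : {tiles_per_node} tiles, {l2_mb_per_node} MB")
    print(f"DP elements per vector    : {dp_lanes}")
    print(f"DDR4 peak bandwidth       : {ddr4_peak_gb_s:.1f} GB/s")
    print(f"peak DP rate per node     : {peak_dp_gflops:.1f} GFLOP/s")
    print(f"aggregate system memory   : {total_memory_pb:.2f} PB")
```

Under these assumptions the DDR4 figure reproduces the 115.2 GB/s quoted in the list, and the aggregate memory comes out slightly above 1 PB. The peak floating-point rate is a theoretical upper bound, not a measured value.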

Please see Cori System Configuration for more information.  

Code modifications for the Intel Xeon Phi

A key characteristic of Cori is its energy-efficient "manycore" Intel Xeon Phi (Knights Landing, or "KNL") processor architecture. While the Phi should be able to run many applications unmodified, most applications are expected to require code changes to achieve good performance. To help the community prepare for Cori, NERSC launched the NERSC Exascale Science Applications Program (NESAP) in the fall of 2014 to partner closely with selected application, library, and tools teams. Lessons learned through NESAP are being disseminated to all NERSC users at meetings, through training classes, and on the NERSC web site.

At a high level there are three key aspects to achieving code performance on KNL:

    1. Using fine-grained parallelism to exploit the 68 cores per node
    2. Taking advantage of the 512-bit vector units on KNL
    3. Structuring your code so that as many memory accesses as possible are served from KNL's 16 GB of on-package MCDRAM
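
As a rough planning aid for points 1 and 3, the sketch below enumerates ways to split one node's 68 cores between MPI ranks and OpenMP threads, and estimates how much MCDRAM each rank would get. It is a hypothetical helper, not a NERSC tool: the names `Layout`, `node_layouts`, and `fits_in_mcdram` are invented here, and it assumes that ranks divide the cores evenly and that MCDRAM is shared equally among the ranks on a node.

```python
from dataclasses import dataclass

CORES_PER_NODE = 68        # from the system description
HW_THREADS_PER_CORE = 4    # from the system description
MCDRAM_GB = 16             # from the system description


@dataclass(frozen=True)
class Layout:
    """One hypothetical way to divide a node between MPI ranks and OpenMP threads."""
    mpi_ranks: int
    cores_per_rank: int
    omp_threads_per_rank: int
    mcdram_gb_per_rank: float


def node_layouts(threads_per_core=HW_THREADS_PER_CORE,
                 cores=CORES_PER_NODE, mcdram_gb=MCDRAM_GB):
    """List every rank count that divides the cores evenly.

    Assumption: each rank gets an equal share of cores and of MCDRAM.
    """
    if not 1 <= threads_per_core <= HW_THREADS_PER_CORE:
        raise ValueError("threads_per_core must be between 1 and 4")
    layouts = []
    for ranks in range(1, cores + 1):
        if cores % ranks == 0:
            cores_per_rank = cores // ranks
            layouts.append(Layout(
                mpi_ranks=ranks,
                cores_per_rank=cores_per_rank,
                omp_threads_per_rank=cores_per_rank * threads_per_core,
                mcdram_gb_per_rank=mcdram_gb / ranks,
            ))
    return layouts


def fits_in_mcdram(working_set_gb, layout):
    """True if a rank's working set fits in its equal share of MCDRAM."""
    return working_set_gb <= layout.mcdram_gb_per_rank


if __name__ == "__main__":
    # Example: which layouts keep a 3.5 GB per-rank working set inside MCDRAM?
    for layout in node_layouts(threads_per_core=2):
        verdict = "fits" if fits_in_mcdram(3.5, layout) else "spills to DDR4"
        print(layout, verdict)
```

Fewer ranks with more threads each leave more MCDRAM per rank but put more weight on fine-grained (thread-level) parallelism within the code, which is the trade-off the first and third points describe. The best layout for a given application has to be determined by measurement.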

Please see Application Porting and Performance for details.