Updates and Status
Below are recent Cori status updates. Please also refer to the Cori Timeline for more history.
Apr 18-21, 2017: Two new Haswell cabinets (384 compute nodes) were added to the system. The total number of Haswell nodes is now 2,388, and the total number of nodes on Cori (2,388 Haswell plus 9,688 KNL) is now 12,076.
Mar 22-24, 2017: The Cori OS was upgraded from CLE6.0UP01 to CLE6.0UP03. As part of this upgrade, the craype-hugepages module file was updated. Applications built against the old modules are likely to see decreased performance. We recommend recompiling all KNL applications, especially those using hugepages.
Mar 16, 2017: To work around an Intel compiler bug in versions 220.127.116.11 and 18.104.22.168, Fortran buffered I/O is no longer enabled by default, i.e., the environment variable setting FORT_BUFFERED=1 has been removed. More details are at: https://www.nersc.gov/users/computational-systems/edison/updates-and-status/open-issues/fortran-buffered-io-with-intel-compilers-is-no-longer-enabled-by-default-on-edison/
Mar 3, 2017: The Cori system returned to service with 2 new KNL cabinets (384 compute nodes) added. The total number of KNL nodes is now 9,688. Currently, 3,400 nodes are eligible for cluster mode reboot, and the rest are quad cache nodes.
Mar 1, 2017: Non-NESAP users can now apply for full access to the KNL nodes; see Gaining Full Access to the KNL System.
Jan 23, 2017: All NERSC users are now also eligible to use the KNL nodes of Cori via the regular partition, in which users can run jobs of up to 512 nodes for up to 2 hours.
Jan 15, 2017: We have temporarily disabled rebooting KNL nodes into a different cluster or memory mode because the reboot could cause a system crash. Currently, there are 1,200 quad flat nodes available, and the rest are quad cache nodes. We are investigating this problem and expect that node rebooting will be re-enabled no earlier than the Feb 6 system maintenance.
Dec 21, 2016: All NERSC users are now eligible to use the KNL nodes of Cori via the debug partition, in which users can run jobs of up to 512 nodes for up to 30 minutes.
Oct 31, 2016: The Cori Haswell nodes were returned to production for all NERSC users. The system had been unavailable to users since Sep. 19 while 9,300 Xeon Phi nodes were added to the system.
Please note that the option passed to the job launcher (srun -c) for requesting optimal process and thread affinity on Haswell changed on Oct. 31, 2016. You must also now use the -C option to request the node type. Please see details of these changes here.
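As a rough illustration of the two options mentioned above, the sketch below computes a value for srun -c from the node type and the number of tasks per node. The core and hardware-thread counts (32 cores with 2 threads each on Haswell, 68 cores with 4 threads each on KNL), the helper name cpus_per_task, the --cpu_bind=cores setting, and the placeholder executable ./my_app are assumptions made for illustration, not taken from this page; check the current NERSC job script documentation before relying on them.

```shell
# cpus_per_task NODE_TYPE TASKS_PER_NODE
# Prints a candidate value for "srun -c".
# ASSUMPTION: Haswell = 32 cores x 2 hardware threads,
#             KNL     = 68 cores x 4 hardware threads.
cpus_per_task() {
    node_type=$1
    tasks_per_node=$2
    case "$node_type" in
        haswell) cores=32; threads=2 ;;
        knl)     cores=68; threads=4 ;;
        *) echo "unknown node type: $node_type" >&2; return 1 ;;
    esac
    if [ "$tasks_per_node" -lt 1 ] || [ "$tasks_per_node" -gt "$cores" ]; then
        echo "tasks per node must be between 1 and $cores" >&2
        return 1
    fi
    # Whole physical cores per task, times hardware threads per core.
    echo $(( (cores / tasks_per_node) * threads ))
}

# Example: 8 MPI tasks per node on 2 Haswell nodes.
# The node type itself is requested separately with the -C option
# (for example in the batch script header).
c=$(cpus_per_task haswell 8)
echo "srun -N 2 -n 16 -c $c --cpu_bind=cores ./my_app"
```

The command is only printed, not executed, so the snippet can be tried on any machine before adapting it to a real batch script.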
Gaining Full Access to the KNL System
Because the KNL architecture is so different from what most NERSC users have seen, it is important that users understand their codes' performance before gaining full access to the KNL nodes. NERSC has developed a gating and application procedure for allowing new users onto the KNL nodes.
Running on the KNL Nodes
To use the KNL nodes, please compile (or recompile) your executables targeting the KNL architecture. The simplest way to do this is to run "module swap craype-haswell craype-mic-knl" before compiling in the usual way.
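A minimal sketch of that sequence follows, assuming a C source file and the Cray compiler wrapper cc (CC and ftn are the C++ and Fortran counterparts). The file names hello.c and hello.knl and the -O2 flag are hypothetical placeholders; the guard simply keeps the snippet harmless on a machine without the module command.

```shell
# Module names taken from the instructions above.
HASWELL_TARGET=craype-haswell
KNL_TARGET=craype-mic-knl

# Hypothetical source and executable names, for illustration only.
SRC=hello.c
EXE=hello.knl

if command -v module >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Retarget the compiler wrappers from Haswell to KNL.
    module swap "$HASWELL_TARGET" "$KNL_TARGET"
    # Compile in the usual way; the wrapper picks up the new target.
    cc -O2 -o "$EXE" "$SRC"
else
    echo "Skipping build: needs the module command and $SRC (run on a Cori login node)."
fi
```

Keeping a separate executable name for the KNL build makes it harder to launch a Haswell binary on KNL nodes by mistake.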
The KNL architecture is complex but affords a great deal of flexibility for experienced users. There are multiple memory modes, and changing from one to another requires a time-consuming reboot of the node. Therefore we recommend that users begin their experience on the KNL nodes in quad cache mode (NERSC's default memory mode). This mode will work well for the majority of use cases.
Software Packages for KNL
Many NERSC-provided applications are not yet available for KNL. At this time we are supporting only a limited number of applications to run on KNL. Most Cray-provided libraries support the KNL architecture, and some NERSC-provided libraries have been built to support KNL as well. We do not plan to compile more libraries for KNL until next year.