Compiling Codes on Cori
Cray provides a convenient set of wrapper commands that should be used in almost all cases for compiling and linking parallel programs. Invoking the wrappers automatically links codes with the MPI libraries and other Cray system software. All MPI and Cray system include directories are also transparently imported. In addition, the wrappers append the compiler's target processor arguments for the Cori compute node processors.
NOTE: Programs are intended to be compiled on the login nodes and executed on the compute nodes. Because the compute nodes and login nodes have different operating systems, binaries created for compute nodes may not run on the login nodes. The wrappers described above ensure that the codes they compile are prepared to run on the compute nodes.
There are also two types of compute nodes: Haswell and KNL. While binaries built for Haswell do run on KNL (but not vice versa), it is necessary to build for KNL explicitly in order to exploit the new AVX512 architecture. Please see below for more information on how to compile for KNL compute nodes.
For Fortran source code use ftn
% ftn -o example.x example.f90
For C source code use cc
% cc -o example.x example.c
For C++ source code use CC
% CC -o example.x example.C
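The mapping from source language to wrapper can be scripted. The following is a minimal sketch, not a NERSC-provided tool: the function name wrapper_for is hypothetical, and the list of file suffixes is an assumption based on common naming conventions. It prints the compile commands rather than running them.

```shell
# Hypothetical helper: print the Cray wrapper that matches a source file's suffix.
# The suffix list is an assumption based on common naming conventions.
wrapper_for() {
    case "$1" in
        *.f|*.F|*.f90|*.F90|*.f03|*.f08) echo ftn ;;
        *.c)                             echo cc  ;;
        *.C|*.cc|*.cpp|*.cxx)            echo CC  ;;
        *) echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

# Print the compile command for each example file instead of executing it.
for src in example.f90 example.c example.C; do
    echo "$(wrapper_for "$src") -o example.x $src"
done
```

Note that the match is case-sensitive, so example.c selects cc while example.C selects CC.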
All compilers on Cori (Intel, Cray, and GNU) are provided via three programming environments that are accessed via the module utility. Each programming environment contains the full set of compatible compilers and libraries. To change from one compiler to another, change the programming environment with the 'module swap' command.
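As an illustration of switching programming environments, the sketch below works out which module command is needed from a list of loaded modules. The function name prgenv_swap_cmd is hypothetical, and the sketch assumes the module utility exports the colon-separated LOADEDMODULES variable, as Environment Modules does. The module list in the example call is made up, and the command is printed rather than executed.

```shell
# Hypothetical helper (not a NERSC tool): print the module command that switches
# to the requested programming environment.
#   $1: target compiler family: intel, cray, or gnu
#   $2: colon-separated list of loaded modules (defaults to $LOADEDMODULES,
#       assumed to be exported by the module utility)
prgenv_swap_cmd() {
    target="PrgEnv-$1"
    loaded="${2-${LOADEDMODULES-}}"
    # Find the first loaded PrgEnv-* module and strip any "/version" suffix.
    current=$(printf '%s\n' "$loaded" | tr ':' '\n' \
        | sed -n 's|^\(PrgEnv-[^/]*\).*|\1|p' | head -n 1)
    if [ -z "$current" ]; then
        echo "module load $target"
    elif [ "$current" = "$target" ]; then
        echo "# $target is already loaded"
    else
        echo "module swap $current $target"
    fi
}

# Example with a made-up module list; the result is printed, not executed.
prgenv_swap_cmd gnu "modules/3.2.10:PrgEnv-intel/6.0.4:craype-haswell"
```

For the made-up list above, the function prints module swap PrgEnv-intel PrgEnv-gnu.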
Intel Compilers (C/C++/Fortran)
NERSC has installed the high performance compilers produced by Intel on Cori. The Intel compiler is the default compiler on Cori.
Based on our experience with several benchmarks described below, NERSC recommends the default level of optimization for most codes, i.e. no optimization arguments to the compiler.
Use the flag -qopenmp to compile for OpenMP
% ftn -qopenmp MyCode.F90
The Intel compilers support the OpenMP 4.0 standard.
For the full list of compiler options type man ifort, man icc, and/or man icpc. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
For further information about the Intel compilers go to Intel Compiler Documentation.
The current default version of the Intel compilers is loaded automatically. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail intel. To use a different version type module swap intel intel/<new_version>.
Cray Compilers (C/C++/UPC/Fortran)
Cray produces compilers for its own systems.
To use the Cray Compiling Environment (CCE), swap to the Cray programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling.
% module swap PrgEnv-intel PrgEnv-cray
Recommended Compiler Flags
Based on the performance of several benchmarks, NERSC recommends the default level of optimization for most codes, i.e. no optimization arguments.
Cray benchmarkers have this advice for tuning with the Cray compilers:
- Use default optimization levels
  - It’s the equivalent of most other compilers' -O3 or -fast
- Using -O3,fp3 (or -O3 -hfp3, or some variation):
  - -O3 only gives you slightly more than -O2
  - -hfp3 gives you a lot more floating point optimization, esp. 32-bit
- If an application is intolerant of floating point reassociation, try a lower -hfp number. Try -hfp1 first, only -hfp0 if absolutely necessary.
  - Might be needed for tests that require strict IEEE conformance
  - Or applications that have ‘validated’ results from a different compiler
- In general, avoid using -Oipa5, -Oaggress, and so on; higher numbers are not always correlated with better performance
- Compiler feedback: -rm (Fortran), -hlist=m (C)
- If you know you don’t want OpenMP: -h noomp, -x omp or -O thread0
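The floating-point advice in the list above can be read as an order in which to try compile lines. The sketch below prints that order for one source file. The function name cce_fp_candidates is hypothetical, and combining -O3 with each -hfp level is one possible reading of the advice, not a NERSC or Cray prescription.

```shell
# Hypothetical helper: list Cray compile lines to try, from the most aggressive
# floating-point optimization (-hfp3) to the most conservative (-hfp0).
# Pairing each level with -O3 is an assumption made for illustration.
cce_fp_candidates() {
    for level in 3 1 0; do
        echo "ftn -O3 -hfp$level $1"
    done
}

# Print the candidate commands for one source file; nothing is compiled.
cce_fp_candidates MyCode.F90
```

A reasonable workflow is to validate results with the first line and move down the list only if the application proves sensitive to floating point reassociation.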
OpenMP flags are on by default with the Cray compilers. No extra flags are needed.
% ftn MyCode.F90
To disable OpenMP use -h noomp (only needed if your code contains OpenMP directives).
The Cray compilers support the OpenMP 4.0 standard.
For further information refer to the Cray Fortran Reference Manual and Cray C and C++ Reference Manual from CrayDoc. Go to CrayDoc and search for CCE.
The current default version of the Cray compilers is loaded automatically when you swap to the PrgEnv-cray module file. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail cce. To use a different version type module swap cce cce/<new_version>.
UPC and CoArray Fortran
The Cray compilers support UPC and CoArray Fortran.
GNU Compilers (C/C++/Fortran)
NERSC has installed MPI versions of the GNU compilers on Cori.
To use the GNU compilers, swap to the GNU programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling. Note: do not use /usr/bin/gcc, since that is usually a much older version of gcc than the one available with the PrgEnv-gnu module.
% module swap PrgEnv-intel PrgEnv-gnu
Based on our experience with several benchmarks described below, NERSC recommends the following compiler flag for most codes: -Ofast.
In addition Cray benchmarkers make the following suggestions for tuning the GNU compiler:
- -O3 -ffast-math -funroll-loops
- Compiler feedback: -ftree-vectorizer-verbose=2
Use the flag -fopenmp to compile for OpenMP
% ftn -fopenmp MyCode.F90
Version 4.9 and later of the GNU compilers support the OpenMP 4.0 standard.
For the full list of compiler options type man gfortran, man gcc, man g++ or man gCC. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
The current default version of the GNU compilers is loaded automatically when you swap to the PrgEnv-gnu module file. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail gcc. To use a different version type module swap gcc gcc/<new_version>.
For further documentation about these compilers see GNU Compiler Manuals.
The default environment on Cori currently loads the "craype-haswell" module. As a result, executables built with the Cray compiler wrappers as above (ftn, cc, CC) are targeted to run on the Haswell compute nodes.
To build for KNL compute nodes, do the following module swap before compiling:
% module swap craype-haswell craype-mic-knl
Alternatively, under the craype-haswell environment, one can explicitly add a compiler flag to build executables targeted to run on KNL:
- "-xMIC-AVX512" for Intel compilers. There is also a compiler flag "-axMIC-AVX512,AVX" for Intel compilers that will build a fat binary for both Haswell and KNL, and at run time will use the binary for the correct architecture.
- "-hcpu=mic-knl" for Cray compilers
- "-march=knl" for GNU compilers
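The three flags above can be collected in a small lookup for use in build scripts. This is a minimal sketch: the function name knl_flag and the family names intel, cray, and gnu used as arguments are hypothetical, while the flags themselves are the ones listed above.

```shell
# Hypothetical helper: print the explicit KNL target flag for a compiler family.
knl_flag() {
    case "$1" in
        intel) echo "-xMIC-AVX512" ;;
        cray)  echo "-hcpu=mic-knl" ;;
        gnu)   echo "-march=knl" ;;
        *) echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Print a Fortran compile line for each family; nothing is compiled.
for family in intel cray gnu; do
    echo "ftn $(knl_flag "$family") -o example.x example.f90"
done
```

The flag must match the programming environment that is actually loaded, since the wrappers pass it through to the underlying compiler.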
Building on the KNL compute nodes does not work on Cori with the NERSC custom environment. Instead, users can build applications on the Haswell login or compute nodes using cross-compilation. For example, with autoconf, the easiest method is:
% module load craype-haswell
% ./configure CC=cc FTN=ftn CXX=CC ...
% module swap craype-haswell craype-mic-knl
If you do not perform the "module swap craype-haswell craype-mic-knl" step above, you can instead add an explicit "-xMIC-AVX512" flag (for the Intel compilers) to the compiler wrapper to build for the KNL target.
Documentation on cross compilation (including cmake) can be found at http://docs.cray.com/books/S-2801-1608//S-2801-1608.pdf
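When cross-compiling, it is easy to lose track of which target module is loaded. The sketch below is one way to check before building. The function name target_loaded is hypothetical, and the sketch assumes the module utility exports the colon-separated LOADEDMODULES variable, as Environment Modules does.

```shell
# Hypothetical guard: succeed only if the named module appears in a
# colon-separated module list.
#   $1: module name, e.g. craype-mic-knl
#   $2: module list (defaults to $LOADEDMODULES, assumed to be exported
#       by the module utility)
target_loaded() {
    case ":${2-${LOADEDMODULES-}}:" in
        *":$1:"*|*":$1/"*) return 0 ;;   # bare name, or name followed by /version
        *) return 1 ;;
    esac
}

# Report the current state; the suggested command is printed, not executed.
if target_loaded craype-mic-knl; then
    echo "KNL target module is loaded"
else
    echo "KNL target module is not loaded; try: module swap craype-haswell craype-mic-knl"
fi
```

A check of this kind could be placed at the top of a build script so that a Haswell-targeted build is not mistaken for a KNL one.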
Warnings when linking with MKL
When linking with "-mkl" on Cori, you may see a warning like:
mkl_memory.c:(.text+0x5a9): warning: Using 'dlopen' in statically linked
applications requires at runtime the shared libraries from the glibc
version used for linking
This is harmless as long as the executable is only used on Cori - either Haswell or KNL nodes, since all nodes on Cori use the same OS version.
When linking a dynamic executable (which is not the default behavior of the Cray compiler wrappers) and using MKL, you will need to add the following to your link line:
Without this, there is a high probability that your application will crash, because some KNL-optimized routines in MKL use libmemkind to make use of the MCDRAM. Although the memkind and jemalloc libraries are built into the MKL static libraries, they are not present in the shared-object versions. This explicit link is a temporary fix; a future version of MKL will no longer need it.
Fortran Standards Compliance
The level of compliance of each compiler with the Fortran 2003 standard can be found at Fortran 2003 compliance. Among other things, this standard implements object-oriented programming in Fortran.
The level of compliance of each compiler with the Fortran 2008 standard can be found at Fortran 2008 compliance. Among other things, this standard implements Fortran coarrays.
Compiler Flag Comparison
Below is a comparison of the most important and useful compiler flags.
| Cray | Intel | GNU | Description |
|---|---|---|---|
| (default) | (default) | -O3 -ffast-math | Produce high level of optimization |
| -Oomp (default) | -qopenmp | -fopenmp | Activate OpenMP directives and pragmas in the code |
| -h byteswapio | -convert big_endian | -fconvert=swap | Read and write Fortran unformatted data files as big-endian |
| -f fixed | -fixed | -ffixed-form | Process Fortran source using fixed form specifications. |
| -f free | -free | -ffree-form | Process Fortran source using free form specifications. |
| -V | --version | --version | Show version number of the compiler. |
| -h zero | not implemented | -finit-local-zero | Zero fill all uninitialized variables. |
| -e m | | | Creates .mod files to hold Fortran90 module information for future compiles |
| -J dir_name | | | Specifies the directory to which file.mod files are written when the -e m option is specified |
Compiling with Intel MPI
Intel MPI is supported on Cori. Codes built with Intel MPI can be launched with mpirun in the SLURM batch script to run on the compute nodes. While building codes with Intel MPI is possible, the use of Cray MPICH and the compiler wrappers is the recommended way to use MPI on Cray systems. To compile with Intel MPI:
% module load impi
% mpiicc -qopenmp -o mycode.exe mycode.c
In this case, users should compile an application without the Cray parallel wrappers and instead use the Intel MPI compiler wrapper directly, i.e., mpiifort, mpiicc, mpiicpc, etc.
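The choice among the Intel MPI wrappers named above can be sketched as a lookup by source language. The function name impi_wrapper and the language labels fortran, c, and c++ are hypothetical. The compile line is printed rather than executed, and it assumes the impi module has already been loaded.

```shell
# Hypothetical helper: print the Intel MPI wrapper for a source language.
impi_wrapper() {
    case "$1" in
        fortran) echo mpiifort ;;
        c)       echo mpiicc   ;;
        c++)     echo mpiicpc  ;;
        *) echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# Print a C compile line; run it only after 'module load impi'.
echo "$(impi_wrapper c) -o mycode.exe mycode.c"
```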
A sample script for Running Executables built with Intel MPI can be found here.
Compiling for a login node
All parallel applications must be run on compute nodes. Sometimes, however, a user wants to run a very small executable on the login nodes. This is permitted as long as the application does not impede the performance of the login node for other users. In this case, users should compile an application without the Cray parallel wrappers and instead use a compiler directly, such as gcc, g++, icc, icpc, etc.
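For a small login-node tool, a script can pick whichever direct compiler is present. The sketch below is illustrative only: the function name first_available and the file names small_tool.c and small_tool.x are hypothetical, and the preference order (icc before gcc) is an arbitrary assumption. The compile line is printed rather than executed.

```shell
# Hypothetical helper: print the first command from the argument list that
# exists on this node, or fail if none is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Choose a direct (non-wrapper) C compiler and print the compile line.
if compiler=$(first_available icc gcc); then
    echo "$compiler -o small_tool.x small_tool.c"
else
    echo "no direct C compiler found on this node" >&2
fi
```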