There are three compiler suites available on Carver: Portland Group (PGI), Intel, and GCC. The PGI compilers are the default, to provide compatibility with other NERSC platforms. Because Carver uses Intel processors, some benchmarks have shown better performance when compiled with the Intel compilers. The GCC compilers are available primarily to facilitate building open-source tools, although they can also be used for scientific applications.
The only supported MPI implementation on Carver is Open MPI, which is descended from LAM. In particular, note that Open MPI is not part of the MPICH family of MPI implementations.
For each supported compiler suite, NERSC provides a version of Open MPI that is compatible with that compiler. The default is PGI. In order to use other compilers, it is necessary to swap both the compiler module and the MPI module. For example, to use the Intel compilers with Open MPI:
carver% module swap pgi intel
carver% module swap openmpi openmpi-intel
To use the GCC compilers with Open MPI:
carver% module swap pgi gcc
carver% module swap openmpi openmpi-gcc
The same swap commands may also be needed in your batch scripts if the jobs you submit run executables that were compiled with the Intel or GCC compilers.
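One way to keep interactive sessions and batch scripts consistent is to generate the swap commands from a single helper. The sketch below is only an illustration: the function name `swap_commands` is hypothetical, and the module names are the ones listed above.

```shell
# Hypothetical helper: print the module commands needed to move from the
# default PGI environment to the requested compiler suite.
swap_commands() {
  case "$1" in
    pgi)   : ;;  # default environment, nothing to swap
    intel) printf '%s\n' "module swap pgi intel" \
                         "module swap openmpi openmpi-intel" ;;
    gcc)   printf '%s\n' "module swap pgi gcc" \
                         "module swap openmpi openmpi-gcc" ;;
    *)     echo "unknown compiler suite: $1" >&2; return 1 ;;
  esac
}

# Print the commands for the Intel suite.
swap_commands intel
```

In a batch script, the printed commands could be applied with `eval "$(swap_commands intel)"`, assuming the `module` command is available in the job's shell.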
Compiler "wrappers" provided by Open MPI supply the correct compiler and linker flags for MPI applications. When compiling non-MPI programs (that is, either serial or shared-memory parallel applications), the "native" compilers may be used directly.
|Language||Open MPI||Native PGI||Native Intel||Native GCC|
|Fortran||mpif77, mpif90||pgf77, pgf90, pghpf||ifort||gfortran|
|C++||mpiCC, mpic++, mpicxx||pgCC||icpc||g++, c++|
carver% mpif90 -fast -o example.x example.f90
carver% pgf90 -fast -mp -o example.x example.f90
Open MPI defines a single option for the compiler wrappers:
carver% mpif90 -showme ...
The "showme" option shows the command line that would be executed, without actually invoking the underlying compiler.
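After swapping modules, "-showme" offers a quick check that a wrapper really points at the intended native compiler. The following is a minimal sketch, assuming the first word of the "-showme" output is the underlying compiler command; the function name `wrapper_backend` is hypothetical.

```shell
# Hypothetical helper: report the native compiler behind an Open MPI
# wrapper by reading the first word of its -showme output.
wrapper_backend() {
  "$1" -showme | awk 'NR == 1 { print $1 }'
}

# Example use on a system where the wrapper is installed:
#   wrapper_backend mpif90
```

If the reported compiler does not belong to the suite you loaded, the compiler module and the MPI module are probably out of step.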
All remaining compiler options depend on the underlying native compiler; complete descriptions are available via the "man" command. Some common options are summarized below.
|PGI||Intel||GCC||Description|
|-fast||-O3||-O3||Produce high level of optimization|
|-mp||-openmp||-fopenmp||Activate OpenMP directives and pragmas in the code|
|-byteswapio||-convert big_endian||-fconvert=swap||Read and write Fortran unformatted data files as big-endian|
|-Mfixed||-fixed||-ffixed-form||Process Fortran source using fixed form specifications.|
|-Mfree||-free||-ffree-form||Process Fortran source using free form specifications.|
|-V||-V||--version||Show version number of the compiler.|
|not implemented||-zero||-finit-local-zero||Zero fill all uninitialized variables.|
|-mcmodel=medium||-mcmodel=medium||-mcmodel=medium||Allow data sections greater than 2 GB|
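Because equivalent options are spelled differently by each compiler, build scripts that must work with more than one suite can look the flag up instead of hard-coding it. The sketch below does this for the OpenMP row only, reading the three option columns of the table above as PGI, Intel, and GCC; the function name `openmp_flag` is hypothetical.

```shell
# Hypothetical helper: return the OpenMP flag for a compiler suite,
# taken from the option table above.
openmp_flag() {
  case "$1" in
    pgi)   echo "-mp" ;;
    intel) echo "-openmp" ;;
    gcc)   echo "-fopenmp" ;;
    *)     echo "unknown compiler suite: $1" >&2; return 1 ;;
  esac
}

# Assemble (but do not run) a PGI OpenMP compile line.
echo "pgf90 -fast $(openmp_flag pgi) -o example.x example.f90"
# prints: pgf90 -fast -mp -o example.x example.f90
```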
Based on vendor recommendations and our own experiences with these compilers, we recommend these options to generate fast executables:
Intel: the compiler's default options (that is, no explicit optimization options) give a very high level of optimization
GCC: -O3 -ffast-math
Actual benchmark results will be shown in the next section.
There are three compilers available to users on Carver: PGI (the default), Intel, and the GNU family of compilers. That PGI is the default is not a recommendation of that compiler. As we show below, it actually produces slower code on average than the other two compilers.
When compiling MPI codes, use the compiler wrappers mpif90, mpicc, and mpiCC instead of the native compiler names, so that the MPI header files and libraries are included in the compile. If you use the Intel or GNU compilers, always swap out the pgi module for the matching compiler module, as shown above, so that you get the proper version of the compiler.
We ran several benchmarks to determine the best optimization arguments for each compiler and the best compiler for each benchmark. These benchmarks are described at
Intel Compiler Option Comparisons
The following Intel optimization options will be compared:
default (no optimization flags) - By default the Intel compiler has a high level of optimization. It is comparable to the -O2 optimization level.
-O2 - This "enables optimizations for speed" and is the option recommended for codes in the online man page.
-O3 - This performs all of the -O2 options as well as additional more aggressive loop transformations.
-O3 -unroll-aggressive -opt-prefetch - This was recommended to us by benchmarkers as being a good supplement to the -O3 optimizations.
-fast - This "maximizes speed across the entire program". It is a very high level of optimization, much more aggressive than that provided by the PGI "-fast" option, and includes interprocedural optimizations across files. It increases compilation time significantly, and compiles that succeed with the other options occasionally fail with this one, probably because of its greater processor and memory requirements. Note: The -fast option does not work for MPI codes on Carver, since some of the optimizations it performs are incompatible with the Carver Open MPI configuration.
GNU Compiler Option Comparisons
The following GNU optimization options will be compared:
-O3 - This compiles with a high level of optimization.
-O3 -ffast-math - This performs optimizations at the expense of an exact implementation of IEEE or ISO rules/specifications for math functions.
-O3 -funroll-loops - This unrolls loops whose number of iterations can be determined at compile time or upon entry to the loop. It also turns on complete loop peeling (i.e. complete removal of loops with a small constant number of iterations). This option makes code larger, and may or may not make it run faster.
-O3 -ffast-math -funroll-loops
PGI Compiler Option Comparisons
The following PGI optimization options will be compared:
-fast - A level of optimization which chooses generally optimal flags for the target platform.
-fast -Mipa=fast - Enables interprocedural analysis and chooses generally optimal interprocedural options for the target platform.
-fast -Mfprelaxed - Generates relaxed precision code for those floating point operations that generate a significant performance improvement, depending on the target processor.
-fast -Mipa=fast -Mfprelaxed
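A comparison like this needs one executable per option set. The sketch below shows one possible way to generate the compile commands as a dry run; it is not the procedure used for the benchmarks here, and the function name `print_builds`, the source file `bench.f90`, and the `bench_N.x` output names are all hypothetical.

```shell
# Dry run: print one compile command per option set instead of running it.
# Usage: print_builds COMPILER SOURCE OPTION_SET...
print_builds() {
  compiler=$1
  src=$2
  shift 2
  n=0
  for opts in "$@"; do
    n=$((n + 1))
    # bench_N.x is a hypothetical output name, one per option set.
    echo "$compiler $opts -o bench_$n.x $src"
  done
}

# The four PGI option sets listed above.
print_builds pgf90 bench.f90 \
  "-fast" \
  "-fast -Mipa=fast" \
  "-fast -Mfprelaxed" \
  "-fast -Mipa=fast -Mfprelaxed"
```

The same function can be called with ifort or gfortran and the Intel or GNU option sets to cover the other two suites.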
This section compares, for each benchmark, the best result obtained with each compiler using the NERSC-recommended optimization arguments.
The results are normalized against the PGI compiler.
PGI produced the fastest times for 2 of the 11 benchmarks, Intel for 8 of the 11, and the GNU compilers for 2 of the 11.
On average, the Intel-compiled codes run over 10% faster than those compiled with the other compilers.
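As a rough illustration of what normalizing against PGI means, the sketch below divides a run time by the PGI run time for the same benchmark, so that PGI scores 1.00 and smaller values are faster. The text does not say whether times or rates were normalized, so the time-ratio convention is an assumption, and the numbers passed in are placeholders rather than measured results. The function name `normalize_to_pgi` is hypothetical.

```shell
# Hypothetical helper: express a run time relative to the PGI run time
# for the same benchmark (assumed convention: time / PGI time).
normalize_to_pgi() {
  awk -v t="$1" -v pgi="$2" 'BEGIN { printf "%.2f\n", t / pgi }'
}

# Placeholder timings in seconds, not measured values.
normalize_to_pgi 90 100   # prints 0.90
```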