Compiling Codes on Hopper
Cray provides a convenient set of wrapper commands that should be used in almost all cases for compiling and linking parallel programs. Invoking the wrappers automatically links codes with the MPI libraries and other Cray system software, and all MPI and Cray system include directories are imported transparently. In addition, the wrappers append the compiler's target processor arguments for the Hopper compute node processors.
NOTE: The intention is that programs are compiled on the login nodes and executed on the compute nodes. Because the compute nodes and login nodes have different operating systems, binaries created for compute nodes may not run on the login node. The wrappers mentioned above guarantee that codes compiled using the wrappers are prepared for running on the compute nodes.
For Fortran source code use ftn
% ftn -fast -o example.x example.f90
For C source code use cc
% cc -fast -o example.x example.c
For C++ source code use CC
% CC -fast -o example.x example.C
All compilers on Hopper (PGI, Pathscale, Cray, GNU, and Intel) are provided via five programming environments that are accessed via the module utility. Each programming environment contains the full set of compatible compilers and libraries. To change from one compiler to another, change the programming environment with the 'module swap' command.
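As a sketch of how this can be scripted, the helper below reports which programming environment is currently loaded, so that the first argument to 'module swap' does not have to be assumed. It relies on the LOADEDMODULES environment variable, which the module utility maintains as a colon-separated list. The function name current_prgenv is hypothetical; it is not a NERSC or Cray command.

```shell
# Hypothetical helper (not part of the Cray or NERSC tools).
# Prints the name of the loaded programming environment, e.g. PrgEnv-pgi,
# by scanning LOADEDMODULES, the colon-separated list of loaded modules.
current_prgenv() {
    printf '%s\n' "${LOADEDMODULES:-}" | tr ':' '\n' |
        sed -n 's|^\(PrgEnv-[^/]*\).*|\1|p' | head -n 1
}

# Prints nothing when no PrgEnv-* module is loaded.
current_prgenv
```

On Hopper this could be used as module swap "$(current_prgenv)" PrgEnv-gnu, so that the swap does not depend on PGI still being the loaded environment.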
PGI Compilers (C/C++/Fortran)
The Portland Group compilers are the default compilers on Hopper and should be accessed via the ftn, cc and CC compiler wrappers. They are the defaults for historical reasons, not because NERSC recommends them above the other available compilers. Significant PGI compiler bugs that affect NERSC users are listed at PGI compiler bugs.
Based on our experience with benchmarks described below, NERSC recommends the following compiler flags for basic usage: -fast or -fastsse.
Cray benchmarkers also recommend trying these flags for additional optimization with the PGI compiler.
- -fast -Mipa=fast(,safe)
- If you can be flexible with precision, also try -Mfprelaxed
- Compiler feedback: -Minfo=all -Mneginfo
Use the flag -mp=nonuma to compile OpenMP directives.
% ftn -mp=nonuma MyCode.F90
PGI fully supports the 3.0 OpenMP standard. They intend to add 3.1 features incrementally with upcoming releases with full support for 3.1 planned for early 2013.
Accuracy and Consistency
The PGI compiler does not provide any optimization level that will guarantee that no floating point operations will be reordered and by default it is not strictly IEEE compliant, occasionally using "slightly less accurate methods" to improve performance.
For accurate, precise code, PGI recommends turning off all optimization and compelling strict IEEE compliance with these arguments:
% ftn -O0 -Kieee MyCode.F90
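To see why reordering matters, the following minimal sketch (not taken from the PGI documentation) adds the same three double-precision values in two different groupings using awk, which does its arithmetic in double precision. The values and the function name sum_two_ways are chosen for illustration only. A compiler that reassociates a sum can turn one grouping into the other, which is why results may differ between optimization levels.

```shell
# Illustration only: floating-point addition is not associative, so a
# compiler that regroups a sum can change the result.
sum_two_ways() {
    awk 'BEGIN {
        a = 1e16; b = -1e16; c = 1
        left  = (a + b) + c    # a and b cancel first, then c is added
        right = a + (b + c)    # c is lost when added to the much larger b
        printf "%g %g\n", left, right
    }'
}

sum_two_ways
```

With IEEE double-precision arithmetic the two groupings give 1 and 0 respectively, even though they are mathematically identical.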
For the full list of compiler options type man pgf90, man pgf95, man pgcc or man pgCC. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
For further information refer to the User's Guide, Compiler Reference Manual, and Fortran Reference from the Portland Group. Go to the Portland Group web site.
The current default version of the PGI compiler is loaded automatically for you. Older and (sometimes) newer versions of the compiler may be available. To see those versions type module avail pgi. To use a different version type module swap pgi pgi/<new_version>.
Pathscale Compilers (C/C++/Fortran)
Cray has ended its library support for the Pathscale compilers so Pathscale users are strongly urged to use another compiler. Contact the NERSC consultants (firstname.lastname@example.org) for assistance in compiler conversion.
To use the Pathscale compilers, swap to the Pathscale programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling.
% module swap PrgEnv-pgi PrgEnv-pathscale
Recommended Compiler Flags
Based on our experience with several benchmarks, NERSC recommends this optimization compiler flag for most codes: -O3.
Cray benchmarkers also make these recommendations for further tuning with the Pathscale compiler:
- -Ofast Note: this is a little looser with precision than other compilers
- Compiler feedback: -LNO:simd_verbose=ON
- For more information, see the eko ("every known option") man page, which lists all Pathscale compiler options:
% module swap PrgEnv-pgi PrgEnv-pathscale
% man eko
Use the flag -mp to compile OpenMP directives.
% ftn -mp MyCode.F90
The Pathscale compilers support version 2.5 of the OpenMP standard.
For the full list of compiler options type man pathf90, man pathf95, man pathcc or man pathCC. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
The current default version of the Pathscale compiler is loaded automatically for you. Older and (sometimes) newer versions of the compiler may be available. To see those versions type module avail pathscale. To use a different version type module swap pathscale pathscale/<new_version>.
Cray Compilers (C/C++/UPC/Fortran)
Cray produces compilers for its own systems. See Cray compiler bugs for a list of compiler bugs affecting NERSC users.
To use the Cray Compiling Environment (CCE), swap to the Cray programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling.
% module swap PrgEnv-pgi PrgEnv-cray
Recommended Compiler Flags
Based on the performance of several benchmarks described below, NERSC recommends the default level of optimization for most codes, i.e. no optimization arguments.
Cray benchmarkers have this advice for tuning with the Cray compilers:
- Use default optimization levels
  - It is the equivalent of -O3 or -fast in most other compilers
- Use -O3,fp3 (or -O3 -hfp3, or some variation)
  - -O3 only gives you slightly more than -O2
  - -hfp3 gives you a lot more floating point optimization, especially for 32-bit
- If an application is intolerant of floating point reassociation, try a lower -hfp number; try -hfp1 first, and use -hfp0 only if absolutely necessary
  - Might be needed for tests that require strict IEEE conformance
  - Or applications that have ‘validated’ results from a different compiler
- In general, avoid using -Oipa5, -Oaggress, and so on; higher numbers are not always correlated with better performance
- Compiler feedback: -rm (Fortran), -hlist=m (C)
- If you know you don’t want OpenMP: -h noomp, -x omp or -O thread0
OpenMP flags are on by default with the Cray compilers. No extra flags are needed.
% ftn MyCode.F90
To disable OpenMP use -h noomp (only needed if your code contains OpenMP directives).
The Cray compiler fully supports the OpenMP 3.0 version along with some 3.1 features.
For further information refer to the Cray Fortran Reference Manual and Cray C and C++ Reference Manual from CrayDoc. Go to CrayDoc and search for CCE.
The current default version of the Cray compilers is loaded automatically when you swap to the PrgEnv-cray modulefile. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail cce. To use a different version type module swap cce cce/<new_version>.
UPC and CoArray Fortran
The Cray compilers support UPC and CoArray Fortran. PGAS describes these languages and shows how to build and run codes using them.
Intel Compilers (C/C++/Fortran)
NERSC has installed the high performance compilers produced by Intel on Hopper. See Intel bugs for a list of Intel compiler bugs affecting NERSC users.
To use the Intel compilers on Hopper, swap to the Intel programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling. This is all you need to do to initialize the Intel programming environment.
% module swap PrgEnv-pgi PrgEnv-intel
Based on our experience with several benchmarks described below, NERSC recommends the default level of optimization for most codes, i.e. no optimization arguments to the compiler.
Use the flag -openmp to compile OpenMP directives.
% ftn -openmp MyCode.F90
The Intel compilers support version 3.1 of the OpenMP standard.
For the full list of compiler options type man ifort, man icc, and man icpc. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
For further information about the Intel compilers go to Intel Compiler Documentation.
The current default version of the Intel compilers is loaded automatically when you swap to the PrgEnv-intel modulefile. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail intel. To use a different version type module swap intel intel/<new_version>.
GNU Compilers (C/C++/Fortran)
NERSC has installed MPI versions of the GNU compilers on Hopper. See GNU bugs for a list of GNU compiler bugs affecting NERSC users.
To use the GNU compilers, swap to the GNU programming environment and then use the Cray wrappers (ftn, cc, and CC) when compiling. Note: do not use /usr/bin/gcc, since that is usually a much older version of gcc than the one available with the PrgEnv-gnu module.
% module swap PrgEnv-pgi PrgEnv-gnu
Based on our experience with several benchmarks described below, NERSC recommends the following compiler flag for most codes: -Ofast.
In addition Cray benchmarkers make the following suggestions for tuning the GNU compiler:
- -O3 -ffast-math -funroll-loops
- Compiler feedback: -ftree-vectorizer-verbose=2
Use the flag -fopenmp to compile OpenMP directives.
% ftn -fopenmp MyCode.F90
Version 4.7 and later of the GNU compilers support OpenMP version 3.1.
For the full list of compiler options type man gfortran, man gcc, man g++ or man gCC. However, remember always to use the Cray wrappers (ftn, cc, and CC) when compiling.
The current default version of the GNU compilers is loaded automatically when you swap to the PrgEnv-gnu modulefile. Older and (sometimes) newer versions of the compilers may be available. To see those versions type module avail gcc. To use a different version type module swap gcc gcc/<new_version>.
For further documentation about these compilers see GNU Compiler Manuals.
Fortran Standards Compliance
The level of compliance of each compiler with the Fortran 2003 standard can be found at Fortran 2003 compliance. Among other things, this standard introduces object-oriented programming in Fortran.
The level of compliance of each compiler with the Fortran 2008 standard can be found at Fortran 2008 compliance. Among other things, this standard introduces Fortran coarrays.
Choosing a Compiler
We currently have five different compilers available on Hopper. Although the Portland Group compilers are the default on these systems, this is for historical reasons, and does not constitute a recommendation of this compiler over the others.
Each of the five compilers has its own strong and weak points, and we cannot recommend one over the others for all codes. Although some compilers will on average produce faster executables than others, for each compiler we have examples of codes that are faster when compiled with it than when compiled with any of the other compilers.
When you are faced with a decision about which compiler to use, the answer might be quite different if you are porting that code from another platform than it would be if you are developing a new code from scratch.
When porting a code from another platform to Hopper, it is usually best to compile it with the compiler used on the original system, provided that compiler is available on Hopper. You will have to make fewer changes to an existing Makefile, and you will probably get better optimized code and fewer compiler problems than with a different compiler.
Many optimizations are not architecture dependent, but involve generally useful techniques that are valid for almost any platform. The architecture specific arguments for Hopper are provided by the compiler wrappers on those systems for all the compilers. A code that has been well optimized on another system should also perform well on Hopper with the same compiler and optimization options.
Similarly, most compiler bugs are architecture independent, so a code that has not experienced any compiler bugs on its previous system is unlikely to run across any on systems to which it is ported when compiled with the same compiler version and optimization options.
If you are developing a new code on Hopper or porting a code that uses an unsupported compiler, you should consider whether the code will run only on Cray systems or whether it might be ported to other non-Cray platforms.
For Cray-only codes you should consider the Cray-developed compilers loaded with the PrgEnv-cray module. These compilers are very well integrated with the Cray math libraries as well as with the compute node processor architecture, so they can usually find more optimization opportunities than the other compilers can. They will also be updated to reflect any hardware or system library changes on the Cray systems.
For portable codes, you should consider the PrgEnv-gnu compilers. These are available for free on any Unix/Linux system, and we have found that they can produce a good level of optimization on virtually any code. The PrgEnv-pgi compilers, although they are a commercial product, are also widely used on many different architectures.
The table below gives some tips on when to use which compiler. However, the recommendations are very general, and users are encouraged to try different compilers to see which performs best on their codes.
|Portland Group||The PGI compilers have been available for quite a while and are used on a wide variety of architectures, so many existing codes will have PGI-targeted makefiles. Even codes that have not been compiled with PGI compilers before are very likely to compile without problems, because the wide exposure and longevity of the compilers mean that they have been tested against a very diverse set of codes, which has exposed existing problems with the compilers and allowed them to be fixed.|
|Pathscale||Pathscale compilers are relatively new. They are optimized for the AMD64 and EM64T architectures and produce very fast code on those systems. They produce object code that is binary compatible with that produced by the GNU compilers so that an executable code can be produced linking libraries and objects compiled with both compilers.|
|Cray Compilers||Cray has recently ported their vector oriented compiler suite to the XT and XE systems. This is one of the most sophisticated compiling systems ever developed, and has been steadily improving its level of optimization due to its close integration with the Cray system libraries and compute node architecture. It supports many very old Fortran constructions that the newer compilers do not.|
|Intel Compilers||Intel compilers are optimized for Intel architectures, but some applications see good performance with this compiler on AMD.|
|GNU Compilers||This is a very generic compiler suite that runs on all Linux systems, so there will almost always be a GNU-targeted makefile for any established code. In general, the optimization is not as good with this compiler as with the four commercial compilers, but there are some codes that will run faster when compiled with the GNU compilers than with any other compilers.|
From Cray's perspective
|Portland Group||Very good Fortran, okay C and C++. Good vectorization. Good functional correctness with optimization enabled. Good manual and automatic prefetch capabilities. Very interested in the HPC market, although this is not their only focus. Excellent working relationship with Cray, good bug responsiveness.|
|Pathscale||Good Fortran, C, probably good C++. Outstanding scalar optimization for loops that do not vectorize. Fortran front end uses an older version of the CCE Fortran front end. OpenMP uses a non-pthreads approach. Scalar benefits will not get as much mileage with longer vectors.|
|Cray Compilers||Outstanding Fortran, very good C, and okay C++. Very good vectorization. Very good Fortran language support; only real choice for Coarrays. C support is quite good, with UPC support. Very good scalar optimization and automatic parallelization. Clean optimization of OpenMP 3.0, with tasks. Sole delivery focus is on Linux-based Cray hardware systems. Best bug turnaround time. Cleanest integration with other Cray tools (performance tools, debuggers, upcoming productivity tools). No inline assembly support.|
|Intel Compilers||Good Fortran, excellent C and C++ (if you ignore vectorization). Automatic vectorization capabilities are modest, compared to PGI and CCE. Use of inline assembly is encouraged. Focus is more on best speed for scalar, non-scaling apps. Tuned for Intel architectures, but actually works well for some applications on AMD.|
|GNU Compilers||So-so Fortran, outstanding C and C++ (if you ignore vectorization). Obviously, the best for gcc compatibility. Scalar optimizer was recently rewritten and is very good. Vectorization capabilities focus mostly on inline assembly. Note the last few releases have been incompatible with each other (4.3, 4.4, and 4.5) and required recompilation of Fortran modules.|
Compiler Flag Comparison
Below is a comparison of the most important and useful compiler flags.
|PGI||Pathscale||Cray||Intel||GNU||Description|
|-fast||-O3||default||default||-O3 -ffast-math||Produce high level of optimization|
|-mp=nonuma||-mp||-Oomp (default)||-openmp||-fopenmp||Activate OpenMP directives and pragmas in the code|
|-byteswapio||-byteswapio||-h byteswapio||-convert big_endian||-fconvert=swap||Read and write Fortran unformatted data files as big-endian|
|-Mfixed||-fixedform||-f fixed||-fixed||-ffixed-form||Process Fortran source using fixed form specifications.|
|-Mfree||-freeform||-f free||-free||-ffree-form||Process Fortran source using free form specifications.|
|-V||-dumpversion||-V||--version||--version||Show version number of the compiler.|
|not implemented||-zerouv||-h zero||not implemented||-finit-local-zero||Zero fill all uninitialized variables.|
|-e m||Creates .mod files to hold Fortran90 module information for future compiles|
|-J dir_name||Specifies the directory to which file.mod files are written when the -e m option is specified|
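Because the OpenMP flag differs between programming environments, a build script that must work under several of them needs to pick the right one. The sketch below encodes the OpenMP row of the comparison table above in a hypothetical helper, openmp_flag. The helper name and the lower-case compiler keys are assumptions made for illustration.

```shell
# Hypothetical helper: map a compiler name to the OpenMP flag listed in
# the comparison table. The Cray compilers enable OpenMP by default, so
# an empty string is printed for them.
openmp_flag() {
    case "$1" in
        pgi)       echo "-mp=nonuma" ;;
        pathscale) echo "-mp" ;;
        cray)      echo "" ;;
        intel)     echo "-openmp" ;;
        gnu)       echo "-fopenmp" ;;
        *)         echo "unknown compiler: $1" >&2; return 1 ;;
    esac
}

# Show the compile line that would be used under PrgEnv-gnu.
echo "ftn $(openmp_flag gnu) MyCode.F90"
```

The same pattern could be extended to the other rows of the table, for example the fixed-form or byte-swapping flags.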
Compiling for a login node
All parallel applications must be run on compute nodes. Sometimes, however, a user wants to run a very small executable on the login nodes. This is permitted as long as the application does not impede the login node for other users. In this case, users should compile an application without the Cray parallel wrappers and instead use a compiler directly, such as gcc, g++, pgcc, pgCC, etc.
Finding Where Symbols Come From
The following option works within all of the programming environments on Hopper. It can be useful for two things: finding which library or user file a given symbol is resolved from (in cases where there might be multiple definitions, such as user code and a system library); and when you have an undefined symbol in your link but don't know where the reference is coming from.
ftn -Wl,-y<symbol_name> my_code.f
Here, <symbol_name> is the symbol you wish to inquire about (don't type the < or >). The trick is that you need to know how the compiler will reference the symbol: it will probably be in lower case and may have an underscore appended. You can use the Unix 'nm' command to find out.
Example: Suppose a code calls the LAPACK routine ZGELS, but it is not clear whether the symbol in the resulting a.out comes from the code itself or from Cray's libsci math library. The following example shows that the symbol is in fact resolved from libsci.
% ftn mycode.f
% nm a.out | grep zgels
0000000000409380 T zgels_
% ftn -Wl,-yzgels_ mycode.f
/opt/xt-libsci/11.0.05/pgi/109/mc12/lib/libsci_pgi_mp.a(zgels.o): definition of zgels_
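The two steps in this example, guessing the linker-level name and checking the nm output for it, can be sketched as small shell helpers. The function names fortran_symbol and symbol_status are hypothetical, and the lower-case-plus-underscore rule is only the convention seen in the example above; other compilers or options may decorate names differently.

```shell
# Hypothetical helpers, shown for illustration only.

# Guess the linker-level name of a Fortran routine: lower case with a
# trailing underscore, as in the zgels_ example. This is an assumption;
# confirm the real name with nm.
fortran_symbol() {
    printf '%s_\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Read 'nm' output on stdin and report whether the given symbol is
# defined, undefined (type U), or absent.
symbol_status() {
    awk -v sym="$1" '
        $NF == sym { if ($(NF-1) == "U") is_undef = 1; else is_def = 1 }
        END {
            if (is_def) print "defined"
            else if (is_undef) print "undefined"
            else print "absent"
        }'
}

# Classify the nm line from the example above.
printf '%s\n' '0000000000409380 T zgels_' | symbol_status "$(fortran_symbol ZGELS)"
```

Against a real executable the second helper would be fed from nm directly, for example nm a.out | symbol_status zgels_.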