For all the compilers, Cray provides a convenient set of wrapper commands that should be used in almost all cases for compiling and linking parallel programs, instead of the vendor-specific compiler names. Invoking the wrappers automatically links codes with the MPI libraries and other Cray system software, and all MPI and Cray system include directories are transparently imported. In addition, the wrappers append the compiler's target-processor arguments for the Edison compute node processors.
NOTE: The intention is that programs be compiled on the login nodes and executed on the compute nodes. Because the compute nodes and login nodes have different operating systems, binaries created for compute nodes may not run on the login nodes. Using the wrappers ensures that your codes are built to run on the compute nodes.
For Fortran source code use ftn
% ftn -fast -o example.x example.f90
For C source code use cc
% cc -fast -o example.x example.c
For C++ source code use CC
% CC -fast -o example.x example.C
All compilers on Edison -- Intel, Cray and GNU -- are provided via three programming environments that are accessed via the module utility. Each programming environment contains the full set of compatible compilers and libraries. To change from one compiler to another, swap the programming environment via the 'module swap' command, e.g., module swap PrgEnv-intel PrgEnv-cray.
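For example, a typical session switching from the default Intel environment to the Cray compilers might look like the following (an illustrative sketch; module names follow the PrgEnv-* convention described above):

```shell
# Show currently loaded modules (PrgEnv-intel is the default)
module list

# Swap the entire programming environment to the Cray compilers;
# compatible libraries are swapped along with the compiler
module swap PrgEnv-intel PrgEnv-cray

# The ftn/cc/CC wrappers now invoke the Cray compilers
ftn -V
```

Note that the wrapper names (ftn, cc, CC) stay the same in every programming environment; only the underlying compiler changes.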
The Intel Math Kernel Library (MKL) is available on Edison for the Intel compiler, providing routines and functions that perform a wide variety of operations on vectors and matrices including sparse matrices. The library also includes fast Fourier transform (FFT) functions, vector mathematical and vector statistical functions with Fortran and C interfaces, and ScaLAPACK.
By default, the Intel compiler uses the Cray libsci math library. To use MKL on Edison you must unload the cray-libsci module:
module unload cray-libsci
and then link your code in either of the following two ways:
ftn -o mycode.exe *.o -mkl
ftn -o mycode.exe *.o -mkl=cluster
Note that the -mkl option must appear at the end of the command line, after the object files; ftn -mkl *.o will not work. More generally, for the Intel compilers library arguments must follow the object files that reference them:
ftn -o mycode.x *.o -L/path/to/libs -llib works, whereas
ftn -o mycode.x -L/path/to/libs -llib *.o does not.
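Putting the steps above together, a complete MKL build might look like this (a sketch; the source file name mycode.f90 is a placeholder):

```shell
# Unload Cray's libsci so it does not conflict with MKL
module unload cray-libsci

# Compile, then link against MKL;
# note that -mkl comes after the object files
ftn -c mycode.f90
ftn -o mycode.exe mycode.o -mkl

# For ScaLAPACK and the other cluster components,
# link with -mkl=cluster instead:
# ftn -o mycode.exe mycode.o -mkl=cluster
```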
For more details see the Intel compiler man pages for ifort, icc, and icpc.
Intel Advanced Vector Extensions (AVX)
The Edison compute processors are equipped with the Intel AVX (Advanced Vector Extensions) instructions, which enable the processor to perform identical operations on multiple operands simultaneously. Vectorization applies primarily to loops and can significantly improve the performance of your code. The hardware vector length on Edison is four 64-bit operands or eight 32-bit operands, but for optimal performance, vectorizable loops should have as large a trip count as possible. The NERSC recommended optimization arguments for all three compilers enable vectorization.
You can (and should) have the Intel compiler emit information about vectorization when you compile. Use the "-vec-report3" compiler option to send the output to the terminal or add "-opt-report-file=filename" to send the output to a file.
Using the Cray programming environment is also a very good way to explore vectorization on Edison. First, swap to the Cray programming environment with "module swap PrgEnv-intel PrgEnv-cray". Then, add the "-rm" option (for Fortran) when you compile. This will generate a listing file ("*.lst") for each source file in which all source is decorated with Cray's "Loopmark" annotations, describing the compiler's experiences in optimizing/vectorizing the code. A legend appears near the top of the listing with explanations of all the markings. Use the "-h report=v" option for C.
Enabling your application to take advantage of vectorization is an important component of achieving high performance on today's supercomputers. Vectorization allows you to execute a single instruction on multiple data objects in parallel within a single CPU core, thus improving performance.
Adding OpenMP threading to an MPI code is an efficient way to run on multicore processors. Since OpenMP uses a global shared address space within each node, using OpenMP may reduce memory usage while adding parallelism, and it can also reduce time spent in MPI communications. An advantage of OpenMP is that you can add it incrementally to an existing code. A collection of OpenMP resources, tutorials, etc. can be found at OpenMP Resources.