
Using OpenMP

Overview

OpenMP provides a standardized, threaded, shared-memory programming model, accessed via compiler directives that are embedded in the application source code. More details on OpenMP (such as the standard specification and tutorials) can be found at the OpenMP Web Site.
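
For illustration, a single directive is enough to create a team of threads, as in the minimal C sketch below (a toy program for this page, not a NERSC-provided code). Without an OpenMP compiler flag, the directive is simply ignored and the block runs on one thread.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* The directive below starts a team of threads; each thread
       executes the enclosed block and reports its thread id. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}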

There are two approaches to using OpenMP on Carver:

  • "Pure" OpenMP applications run on a single node, and are thus limited to 8 threads on Nehalem nodes.
  • "Hybrid" MPI/OpenMP applications use OpenMP on individual nodes, and MPI to communicate between nodes.

It is important to note that OpenMP and Open MPI are different things, despite the similar names. OpenMP is a programming model, implemented independently in all three compiler suites available on Carver. Open MPI is a particular library implementation of the MPI programming model, and is the only MPI implementation available on Carver.
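
To make the hybrid approach concrete, the sketch below mixes the two in a single C source file (the contents and names are illustrative only): the OpenMP directive spreads the local loop across the threads within each MPI process, while the MPI calls, provided on Carver by the Open MPI library, combine the per-process results across nodes.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 1000000

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* OpenMP: threads within this MPI process share the local work */
    #pragma omp parallel for reduction(+:local)
    for (i = 0; i < N; i++)
        local += 1.0;

    /* MPI: combine the per-process results across processes/nodes */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f from %d processes x %d threads\n",
               total, nprocs, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

Such a code is compiled with the commands shown below and launched as described under "How to Run Hybrid MPI/OpenMP Applications".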

How to Compile

OpenMP directives are treated as comments by the compiler, unless the following compiler options are used:

Compiler Suite    Option
PGI               -mp
Intel             -openmp
GCC               -fopenmp

Examples

The following examples use the Open MPI compiler wrappers and the appropriate compiler flags to enable OpenMP. These commands are therefore suitable for hybrid applications, but may also be used to compile pure OpenMP applications.

Using PGI compiler suite:

carver% mpif90 -mp -fast example.f90

Using Intel compiler suite:

carver% mpif90 -openmp -O3 example.f90

Using GCC compiler suite:

carver% mpif90 -fopenmp -O3 example.f90

How to Run Pure OpenMP Applications

Because pure OpenMP applications can only execute on a single node, there is no need to use mpirun to launch the application.  The following example batch script shows a simple execution of a pure OpenMP application using 8 threads.

Three important points:

  • ppn is set to 1.
  • the number of OpenMP threads should be limited to a maximum of 8 (to avoid oversubscribing threads to cores).
  • pvmem must be set to the amount of usable memory (see Memory Considerations).

#PBS -l nodes=1:ppn=1
#PBS -l pvmem=20GB
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

setenv OMP_NUM_THREADS 8
./my_omp_executable

How to Run Hybrid MPI/OpenMP Applications

In the following example, mpirun is used to launch one MPI process on each of 4 nodes; each MPI process is itself an OpenMP application using 8 threads.

Three important points:

  • ppn must be set to the number of MPI processes per node.
  • the number of OpenMP threads should be limited to a maximum of 8 divided by the number of MPI processes per node (to avoid oversubscribing threads to cores).
  • pvmem must be set to the amount of usable memory divided by the number of MPI processes per node (see Memory Considerations). If pvmem multiplied by ppn is greater than the amount of usable memory, the job will be queued but will never run.

#PBS -l nodes=4:ppn=1
#PBS -l pvmem=20GB
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

setenv OMP_NUM_THREADS 8
mpirun -np 4 -bynode ./my_mpi_omp_executable

In the following example, 2 MPI processes are started on each node, with each process using 4 threads. Each process is bound to a socket (a quad-core processor), and the pvmem=10GB setting limits each process to half of the usable memory.

#PBS -l nodes=4:ppn=2
#PBS -l pvmem=10GB
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR

setenv OMP_NUM_THREADS 4
mpirun -np 8 -bysocket -bind-to-socket ./my_mpi_omp_executable
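
A quick way to check that the intended layout was actually obtained (4 processes of 8 threads each in the first script, 8 processes of 4 threads each in the second) is to run a small diagnostic code such as the sketch below, which reports each MPI rank's host name and thread count. It is an illustration, not a NERSC-provided tool.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank, len, nthreads = 0;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    /* Count the threads this MPI process actually runs with */
    #pragma omp parallel
    {
        #pragma omp master
        nthreads = omp_get_num_threads();
    }

    printf("rank %d on %s running %d OpenMP threads\n", rank, host, nthreads);

    MPI_Finalize();
    return 0;
}

Launched with the second script above, it should print two lines per node, each reporting 4 OpenMP threads.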