National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

Using OpenMP on IBM SP

OpenMP supports multi-platform shared-memory parallel programming in C/C++ and Fortran on many architectures. This document describes how to compile and run OpenMP programs on IBM SP systems.


Basic usage

OpenMP provides an easy method for SMP-style parallelization of discrete, small sections of code, such as a do loop. This can be very helpful for code development and testing.

However, OpenMP has a number of limitations which make it less desirable than MPI for large scale computations.

  • OpenMP can only be used among the processors of a single node. For use with production scale, multi-node codes, OpenMP threads must be combined with MPI processes.
  • Debugging OpenMP threads with TotalView is complex.
  • OpenMP provides many ways to write codes that compile and run but produce unexpected results, particularly for codes with large granularity (e.g., calls to subroutines). Local variables in called subroutines may end up shared among threads, for example when they are statically allocated; users must take care that the desired memory association is in effect.
  • All of the Fortran examples below use the Fortran90 xlf90_r compiler. If you use the Fortran77 xlf_r compiler, be aware that its default is -qsave, which may result in unexpected sharing of variables in subroutines.

OpenMP is available for

  • Fortran using the IBM xlf_r compiler,
  • C using the IBM xlc_r compiler, or
  • C++ using the IBM xlC_r compiler.

To compile and run a Fortran code using OpenMP, use:

% xlf90_r -qsmp=omp -o exename filename.f
% ./exename

To compile and run a C code using OpenMP, use:

% xlc_r -qsmp=omp -o exename filename.c
% ./exename

To compile and run a C++ code using OpenMP, use:

% xlC_r -qsmp=omp -o exename filename.C
% ./exename

The -qsmp=omp option is required for both the compile step and the link step.

A program built in this way will automatically use a number of threads equal to the number of processors on the node.

Here's a small example code that prints out the number of threads created.

! Filename: threads.f
! Compile: xlf90_r -o threads -qsmp=omp threads.f
 
       PROGRAM HELLO
       IMPLICIT NONE
 
       INTEGER nthreads, tid, OMP_GET_NUM_THREADS
       INTEGER OMP_GET_THREAD_NUM
 
!     Fork a team of threads
!$OMP PARALLEL PRIVATE(nthreads, tid)
 
!     Obtain and print thread id
       tid = OMP_GET_THREAD_NUM()
       print *, 'Hello World from thread ', tid
 
!     Only master thread does this
       IF (tid .EQ. 0) THEN
          nthreads = OMP_GET_NUM_THREADS()
          print *, 'Number of threads ', nthreads
       END IF
 
!     All threads join master thread and disband
!$OMP END PARALLEL
 
      END

The same small example code in C is shown below:

/* Filename: threads.c
   Compile: xlc_r -o threads -qsmp=omp threads.c  */
 
#include <stdio.h>
#include "omp.h"
 
int main ()
{
   int nthreads, tid;
 
/*     Fork a team of threads      */
   #pragma omp parallel private(nthreads, tid)
   {
 
/*     Obtain and print thread id  */
      tid = omp_get_thread_num();
      printf("Hello World from thread %d\n", tid);
 
/*     Only master thread does this  */
      if (tid==0) {
         nthreads = omp_get_num_threads();
         printf("Number of threads %d\n", nthreads);
      }
   }
   return 0;
}

The same small example code in C++ is shown below. Note that printf is used for output rather than the stream cout. This is because printf produces more coherent output for multiple threads; different parts of the cout streams would be mixed in the output from the different parallel threads.

// Filename:  threads.C
// Compile:   xlC_r -o threads -qsmp=omp threads.C
 
#include <cstdio>
#include <omp.h>
 
int main ()
{
   int nthreads, tid;
 
//     Fork a team of threads
   #pragma omp parallel private(nthreads, tid)
   {
 
//     Obtain and print thread id 
      tid = omp_get_thread_num();
      printf("Hello World from thread %d\n", tid);
 
//     Only master thread does this
      if (tid==0) {
         nthreads = omp_get_num_threads();
         printf("Number of threads %d\n", nthreads);
      }
   }
   return 0;
}

Compiling and running on the IBM SP looks like this:

% xlf90_r -o threads -qsmp=omp threads.f
** hello   === End of Compilation 1 ===
1501-510  Compilation successful for file threads.f.
% ./threads
 Hello World from thread  7
 Hello World from thread  0
 Number of threads  8
 Hello World from thread  3
...

% xlc_r -o threads -qsmp=omp threads.c
% ./threads
Hello World from thread 0
Number of threads 8
Hello World from thread 5
...

% xlC_r -o threads -qsmp=omp threads.C
% ./threads
Hello World from thread 0
Number of threads 8
Hello World from thread 7
...

Note that you do not have to use poe for pure OpenMP codes that are intended to run on a single node.

Changing the number of threads and tasks

You can change the number of threads by setting the OMP_NUM_THREADS environment variable. The default is to use the same number of threads as CPUs available on a node. For example, to create 8 threads on a single node:

% setenv OMP_NUM_THREADS 8
% ./threads

 Hello World from thread  0
 Number of threads  8
 Hello World from thread  1
 Hello World from thread  2
 Hello World from thread  3
 Hello World from thread  4
 Hello World from thread  5
 Hello World from thread  6
 Hello World from thread  7

The same thing may be accomplished by using poe to request one task on a single node. That one task will run OMP_NUM_THREADS threads.

% poe ./threads -nodes 1 -tasks_per_node 1

The environment variable XLSMPOPTS can also be used to control the behavior of OpenMP threads, including the number of threads (through its parthds suboption).

Running on more than one node

You can use poe to run on more than a single node. However, the nodes cannot communicate using OpenMP alone; see "Mixing OpenMP and MPI" below. Set -nodes to the number of nodes, -tasks_per_node to 1, and OMP_NUM_THREADS to whatever you wish, or use the default. For example, this will run on 2 nodes with the default number of OMP threads per node:

% unsetenv OMP_NUM_THREADS
% poe ./threads -nodes 2 -tasks_per_node 1 

Here is an analogous LoadLeveler script that compiles and runs the Fortran and C examples above, first with the serial compilers (run under poe) and then with the parallel compilers:

#@ class = debug
#@ shell = /usr/bin/csh
#@ node = 2
#@ tasks_per_node =  1
#@ network.MPI = csss,not_shared,us 
#@ wall_clock_limit = 00:02:00
#@ notification = complete
#@ job_type = parallel
#@ output = $(jobid).$(stepid).out
#@ error = $(jobid).$(stepid).out
#@ environment = COPY_ALL
#@ queue
 
set echo
 
xlf90_r -o threads -qsmp=omp threads.f 
poe ./threads
 
xlc_r -o threads -qsmp=omp threads.c
poe ./threads
 
mpxlf90_r -o threads -qsmp=omp threads.f 
./threads
 
mpcc_r -o threads -qsmp=omp threads.c
./threads
 
exit

Note that poe is needed in this script when the code was compiled with a "serial" version of the compiler; without poe, the code will not run on more than a single node. However, if a "parallel" version of the compiler, such as mpxlf90_r or mpcc_r, is used to create the executable, then poe does not need to be used on the command line. The use of poe in batch scripts can be confusing because LoadLeveler keywords override poe command-line options.

Mixing OpenMP and MPI

OpenMP and MPI can be freely mixed in Fortran, C, and C++ source code. You must use a "multiprocessor" and "thread-safe" compiler invocation with the -qsmp=omp option, e.g.,

  • mpxlf90_r -qsmp=omp for Fortran,
  • mpcc_r -qsmp=omp for C, and
  • mpCC_r -qsmp=omp for C++.

Some users have reported cases where this mixed-mode programming strategy improves a code's runtime performance.

Here's the same code as above, but with some MPI calls mixed in:

! Filename: hello.f
! Compile:  mpxlf90_r -o hello -qsmp=omp hello.f
! Run:      poe ./hello -nodes 2 -tasks_per_node 1 
 
      PROGRAM HELLO
      IMPLICIT NONE

      INCLUDE 'mpif.h'
 
      INTEGER nthreads, tid, OMP_GET_NUM_THREADS
      INTEGER OMP_GET_THREAD_NUM, myid, ierr, nprocs
      CHARACTER*32 buf
 
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
      print *, "MPI Process number ", myid, " of ", nprocs, " is alive"
 
!     Fork a team of threads on each MPI task
!$OMP PARALLEL PRIVATE(nthreads, tid)
 
!     Obtain and print thread id
      tid = OMP_GET_THREAD_NUM()
!     print *, 'Hello World from OMP thread ', tid, 'on process ',myid
 
!     Only master thread does this
      IF (tid==0) THEN
       nthreads = OMP_GET_NUM_THREADS()
       print *, 'Number of OMP threads ', nthreads, 'on process ',myid
      END IF
 
!     All threads join master thread and disband
!$OMP END PARALLEL
 
      if (myid==0) buf='an MPI message from process 0'
 
      call MPI_BCAST(buf,32,MPI_CHARACTER,0,MPI_COMM_WORLD,ierr)
      if(myid/=0) print *, 'Process ', myid, "got ", buf
 
      call MPI_FINALIZE(ierr)
      END

Here is the same OMP/MPI example in C:


/* Filename: hello.c
   Compile: mpcc_r -o hello -qsmp=omp hello.c
   Run:     poe ./hello -nodes 2 -tasks_per_node 1     */
 
#include <stdio.h>
#include <string.h>
#include "mpi.h"
#include "omp.h"
 
int main(int argc, char* argv[]) 
{
   int nthreads, tid;
   int myid, nprocs;
   char buf[32];
 
   MPI_Init(&argc, &argv);                    /* start MPI     */
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* get my proc id    */
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    /* get no. of procs  */
   printf("MPI Process number %d of %d is alive\n", myid, nprocs);
 
/*     Fork a team of threads      */
   #pragma omp parallel private(nthreads, tid)
   {
 
/*     Obtain thread id  */
      tid = omp_get_thread_num();
 
/*     Only master thread does this  */
      if (tid==0) {
         nthreads = omp_get_num_threads();
         printf("Number of threads %d on process %d\n", 
		nthreads, myid);
      }
   }
 
   if (myid==0) { strcpy(buf,"an MPI message from process 0"); }
   MPI_Bcast(buf,32,MPI_CHAR,0,MPI_COMM_WORLD);
   if (myid!=0) {printf("Process %d got %s\n", myid, buf); }
 
   MPI_Finalize();                            /* finish MPI       */
   return 0;
}

Finally, here is the same OMP/MPI example in C++:


// Filename: hello.C
// Compile:  mpCC_r -o hello -qsmp=omp hello.C
// Run:      poe ./hello -nodes 2 -tasks_per_node 1
 
#include <cstdio>
#include <cstring>
#include <mpi.h>
#include <omp.h>
 
int main(int argc, char* argv[]) 
{
   int nthreads, tid;
   int myid, nprocs;
   char buf[32];
 
   MPI_Init(&argc, &argv);                    // start MPI
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);      // get my processor id
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    // get number of procs
   printf("MPI Process number %d of %d is alive\n", myid, nprocs);
 
//     Fork a team of threads
   #pragma omp parallel private(nthreads, tid)
   {
 
//     Obtain thread id
      tid = omp_get_thread_num();
 
//     Only master thread does this
      if (tid==0) {
         nthreads = omp_get_num_threads();
         printf("Number of threads %d on process %d\n", 
			nthreads, myid);
      }
   }
 
   if (myid==0) { strcpy(buf,"an MPI message from process 0"); }
   MPI_Bcast(buf,32,MPI_CHAR,0,MPI_COMM_WORLD);
   if (myid!=0) {printf("Process %d got %s\n", myid, buf); }
 
   MPI_Finalize();                            // finish MPI
   return 0;
}

To compile:

% mpxlf90_r -o hello -qsmp=omp hello.f
 ** hello   === End of Compilation 1 ===
 1501-510  Compilation successful for file hello.f.

or

% mpcc_r -o hello -qsmp=omp hello.c

or

% mpCC_r -o hello -qsmp=omp hello.C

and to run on two nodes with 1 MPI process per node and the default of 16 OpenMP threads per node:

% poe ./hello -nodes 2 -tasks_per_node 1 

Here's a LoadLeveler script to run the code on two nodes with 2 total MPI tasks and 16 OMP threads per node.

#@ class = debug
#@ shell = /usr/bin/csh
#@ node = 2
#@ tasks_per_node =  1
#@ network.MPI = csss,not_shared,us 
#@ wall_clock_limit = 00:02:00
#@ notification = complete
#@ job_type = parallel
#@ output = $(jobid).$(stepid).out
#@ error = $(jobid).$(stepid).err
#@ environment = COPY_ALL
#@ queue

./hello

exit

After the job completes, the standard output file contains:

 MPI Process number  1  of  2  is alive
 MPI Process number  0  of  2  is alive
 Number of OMP threads  16 on process  0
 Number of OMP threads  16 on process  1
 Process  1 got an MPI message from process 0

Page last modified: Tue, 15 Jul 2008 19:56:59 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/ibm/openmp.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
