NERSC: Powering Scientific Discovery Since 1974

Using OpenMP with MPI

Overview

Each Edison node contains two processors with 12 cores each, and each processor constitutes a NUMA node. Codes typically use one OpenMP thread per physical compute core; hyperthreading (HT) is also available and allows two threads per physical core. Without HT, the maximum number of OpenMP threads per node on Edison is 24. OpenMP performance can depend strongly on the "mapping" of OpenMP threads to the architecture. On Edison, a good starting point is usually 1 to 4 MPI processes per NUMA node with 12 down to 3 OpenMP threads each, so that processes times threads equals the 12 cores of the NUMA node. You should experiment with your code to find the best combination of HT, MPI processes, and OpenMP threads. Substituting OpenMP threading for MPI parallelism is an excellent strategy on Edison.
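The process and thread counts quoted above come from dividing the 12 cores of one NUMA node evenly among the MPI processes placed on it. As a minimal sketch (POSIX shell, hyperthreading assumed off; the function and variable names are illustrative only), the candidate layouts can be enumerated as follows:

```shell
#!/bin/sh
# Enumerate even splits of one Edison NUMA node (12 cores, no HT)
# between MPI processes and OpenMP threads.
CORES_PER_NUMA=12   # taken from the node description above

list_layouts() {
    for tasks in 1 2 3 4; do
        # Threads per MPI process so that every core runs exactly one thread.
        threads=$((CORES_PER_NUMA / tasks))
        echo "$tasks MPI x $threads OpenMP"
    done
}

list_layouts
```

Each printed line is one layout worth benchmarking; which one performs best depends on the application.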

Running using OpenMP

To use OpenMP with MPI, the code must be compiled with OpenMP enabled. Please refer to Using OpenMP for more details on programming. You must then request the correct number of nodes and MPI tasks and set the OMP_NUM_THREADS environment variable. Key settings are as follows (assuming no HyperThreading):

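  • Total number of MPI tasks (-n)
  • Number of MPI tasks per Edison node (--ntasks-per-node; with srun, -N sets the number of nodes); maximum of 24
  • Number of CPUs per MPI task (-c), which should match the number of OpenMP threads; maximum of 24
  • OMP_NUM_THREADS, the number of OpenMP threads per MPI task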

Please refer to sample batch scripts for running hybrid MPI/OpenMP jobs on the Edison Example Batch Scripts webpage. 
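To make the relationship between these settings concrete, here is a minimal sketch of how a job script might derive them. It is not one of the official example scripts. It assumes sh/bash syntax, 24 physical cores per node without HyperThreading, and hypothetical names (hybrid_launch_line, ./my_app.x), and it only prints the commands instead of running them:

```shell
#!/bin/sh
# Derive hybrid MPI/OpenMP launch settings for Edison (24 cores per node, no HT).
CORES_PER_NODE=24

# hybrid_launch_line NODES TASKS_PER_NODE EXE   (hypothetical helper)
hybrid_launch_line() {
    nodes=$1; tasks_per_node=$2; exe=$3
    if [ "$tasks_per_node" -lt 1 ] || [ "$tasks_per_node" -gt "$CORES_PER_NODE" ]; then
        echo "tasks per node must be between 1 and $CORES_PER_NODE" >&2
        return 1
    fi
    # Total MPI tasks across all nodes: the value passed to -n.
    total_tasks=$((nodes * tasks_per_node))
    # Cores available to each task: used for -c and OMP_NUM_THREADS.
    # Integer division; any leftover cores stay idle.
    threads=$((CORES_PER_NODE / tasks_per_node))
    echo "export OMP_NUM_THREADS=$threads"
    echo "srun -n $total_tasks -c $threads $exe"
}

# Example: 2 nodes with 4 MPI tasks per node.
hybrid_launch_line 2 4 ./my_app.x
```

Keeping -c and OMP_NUM_THREADS derived from the same number avoids the common mistake of reserving fewer cores per task than the threads that will run on them.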

Supported Thread Levels

MPI defines four “levels” of thread safety. The default thread support level for all three programming environments on Edison (intel, cray and gnu) is MPI_THREAD_SINGLE, where only one thread of execution exists. The maximum thread support level is returned by the MPI_Init_thread() call in the "provided" argument. Replacing "call MPI_Init()" with "call MPI_Init_thread()" is recommended in all hybrid MPI/OpenMP programs.

API: MPI_INIT_THREAD (required, provided, ierr)
– IN: required, the desired level of thread support (integer): single = 0, funneled = 1, serialized = 2, multiple = 3
– OUT: provided, the level of thread support actually provided (integer)
– The returned value of provided may be less than required.

MPI_QUERY_THREAD() can be used to query the provided thread support level.

You can set the environment variable MPICH_MAX_THREAD_SAFETY to raise the maximum thread support level. The values below are for intel/15.0.1.133.

MPICH_MAX_THREAD_SAFETY value   Supported Thread Level
not set                         MPI_THREAD_SERIALIZED
single                          MPI_THREAD_SINGLE
funneled                        MPI_THREAD_FUNNELED
serialized                      MPI_THREAD_SERIALIZED
multiple                        MPI_THREAD_MULTIPLE
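A job script can check this setting before launching, so that a code needing a given thread level does not start with less than it requires. The sketch below is one possible pre-flight check in sh/bash syntax. The helper names are hypothetical, the value-to-level mapping is copied from the table above, and the numeric ranks follow the integers listed in the API description:

```shell
#!/bin/sh
# Map an MPICH_MAX_THREAD_SAFETY value to the thread level in the table above.
# An empty argument stands for "not set".
max_thread_level() {
    case "$1" in
        "")         echo MPI_THREAD_SERIALIZED ;;
        single)     echo MPI_THREAD_SINGLE ;;
        funneled)   echo MPI_THREAD_FUNNELED ;;
        serialized) echo MPI_THREAD_SERIALIZED ;;
        multiple)   echo MPI_THREAD_MULTIPLE ;;
        *)          echo "unrecognized value: $1" >&2; return 1 ;;
    esac
}

# Numeric rank of a level: single 0, funneled 1, serialized 2, multiple 3.
level_rank() {
    case "$1" in
        MPI_THREAD_SINGLE)     echo 0 ;;
        MPI_THREAD_FUNNELED)   echo 1 ;;
        MPI_THREAD_SERIALIZED) echo 2 ;;
        MPI_THREAD_MULTIPLE)   echo 3 ;;
        *)                     return 1 ;;
    esac
}

# supports_level REQUIRED: succeed if the current environment setting
# allows at least the REQUIRED level.
supports_level() {
    available=$(max_thread_level "${MPICH_MAX_THREAD_SAFETY:-}") || return 1
    need=$(level_rank "$1") || return 1
    have=$(level_rank "$available") || return 1
    [ "$have" -ge "$need" ]
}

if supports_level MPI_THREAD_MULTIPLE; then
    echo "MPI_THREAD_MULTIPLE is available"
else
    echo "set MPICH_MAX_THREAD_SAFETY=multiple before launching"
fi
```

This only inspects the environment. The authoritative answer is still the "provided" value returned by MPI_Init_thread() at run time.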

Using MPI with a fully threaded code might involve something similar to the following (assuming the Intel programming environment and csh syntax):

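# Compile with OpenMP enabled using the compiler wrapper
CC -o test.x TestMPI.cpp -openmp
# Run six OpenMP threads per MPI task
setenv OMP_NUM_THREADS 6
# Allow thread support levels up to MPI_THREAD_MULTIPLE
setenv MPICH_MAX_THREAD_SAFETY multiple
# Launch four MPI tasks with six CPUs each
srun -n 4 -N 4 -c 6 test.x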

See man intro_mpi for more information, especially regarding performance implications of MPI_THREAD_MULTIPLE. 

Nested OpenMP

Nested OpenMP is supported on Edison with the Intel and Cray compilers. More information, including example code and thread affinity control settings, is available here.