IBM Books

MPI Programming Guide


The threads library: considerations

Programming in a threads environment requires specific skills and care. This subsection describes the programming considerations that apply when you use POE and the MPI threads library. It assumes you are familiar with POSIX threads in general, including multiple executables, thread condition waiting, thread-specific storage, and thread creation and termination.

For information about specifying an I/O node file, see IBM Parallel Environment for AIX: Operation and Use, Volume 1.

POE gets control first and handles task initialization

POE sets up its environment using the poe_remote_main entry point. The poe_remote_main entry point sets up signal handlers, initializes a thread for handling asynchronous communication, and sets up an atexit routine before your main program is invoked.

Note:
One area where the threads library and the signal-handling library differ greatly is in the initialization of message passing. In the threads library, message passing is initialized when MPI_INIT is called, not by poe_remote_main.

Language bindings

The FORTRAN, C, and C++ bindings for MPI are contained in the same library (libmpi_r.a) and can be freely intermixed.

Use of AIX signals

The POE run-time environment creates a thread to handle the following asynchronous signals by performing a sigwait on them:

These handlers perform cleanup and exit with a code of (128+signal). You can install your own signal handler for any or all of these signals. If you want the application to exit after you catch the signal, call the function pm_child_sig_handler(signal,NULL,NULL). The prototype for this function is in the file /usr/lpp/ppe.poe/include/pm_util.h.

The following signals, which are used by MPI in the non-threads library, are handled as described below.

SIGALRM

The threads library does not use SIGALRM and long system calls are not interrupted by the message-passing library. For example, sleep runs its entire duration unless interrupted by a user-generated event.

SIGIO

PE blocks SIGIO before calling your program. SIGIO is used in the IP version of the library to notify you of an I/O event or the arrival of a message packet. This notification is enabled via the environment variable MP_CSS_INTERRUPT. If this environment variable is set to YES, the message packet arrival dispatches the interrupt service thread to process the packet.

The user space version of the library receives notification of an arriving packet via an AIX kernel event and does not use SIGIO. You may unblock it or use sigwait to process SIGIO signals.

If you've registered a signal handler (via sigaction) for SIGIO before MPI_INIT is called, the function is added to the interrupt service thread and is executed each time the service thread is dispatched. Although registered as a signal handler, the function is not required to be signal safe because it is executed on a thread. You can use pthread calls to communicate with other threads. You cannot call MPI functions in this handler.

After MPI_FINALIZE is called, your signal handler is restored but you need to unblock SIGIO in order to receive subsequent SIGIO signals.

If you register or change the SIGIO signal handler after calling MPI_INIT, your changes are ignored by the MPI library but your changes are not undone by MPI_FINALIZE.

SIGPIPE

Neither the threads nor the non-threads IP libraries use SIGPIPE. The threads User Space library polls a variable set by the PSSP device driver to determine if the switch has faulted and needs to be restarted. As a result, it does not use SIGPIPE.

Limitations in setting the thread stack size

The main thread stack size is the same as the stack size used for non-threads applications. Library-created service threads use a default stack size of 8K for 32-bit applications and 16K for 64-bit applications. The default value is specified by the variable PTHREAD_STACK_MIN, which is defined in header file /usr/include/limits.h. If you write your own MPI reduction functions to use with nonblocking collective communications, these functions may run on a service thread. Functions you write as SIGIO handlers may also run on a service thread. If your reduction functions or signal handlers require significant amounts of stack space, you can use the MP_THREAD_STACKSIZE environment variable to cause larger stacks to be created for service threads. This does not affect the default stack size for any threads you create.

Forks are limited

If a task forks, only the thread that forked exists in the child task. Therefore, the message passing library will not operate properly in the child. Also, if the forked child does not exec another program, be aware that it inherits an atexit routine that was registered for the parent. In most cases, the atexit routine requests that POE terminate the task (parent). A forked child should terminate with an _exit(0) system call to prevent the atexit routine from being called. Also, if the parent terminates before the child, the child task will not be cleaned up by POE.

Note:
A forked child must not call the message passing library.
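The fork rules above can be sketched with standard POSIX calls as follows. The helper name run_forked_child is hypothetical, and the child's work is left as a comment. The parent waits for the child so that the child does not outlive it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that does local (non-MPI) work,
   then reap it. Returns the child's exit status, or -1 on error. */
int run_forked_child(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: local work only, never message passing. */
        /* _exit() skips the atexit routines inherited from the parent. */
        _exit(0);
    }
    /* Parent: wait so the child is gone before the parent terminates. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```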

Standard I/O requires special attention

When your program runs on the remote nodes, it has no controlling terminal. STDIN, STDOUT, and STDERR are always piped.

If your threads MPI program processes STDIN from a large file on the home node, you must do one of the following:

This also includes programs which may not explicitly use MPI.

If STDIN is piped (or redirected) to the POE binary (via ordinary pipes) and your application is linked with the threads library, then handle STDIN in the following way:

Thread-safe libraries

AIX provides thread-safe versions of most libraries, such as libc_r.a. However, not all libraries have a thread-safe version. It is your responsibility to determine whether the AIX libraries you use can be safely called by more than one thread.

Program and thread termination

MPI_FINALIZE terminates the MPI service threads but does not affect user-created threads. Use pthread_exit to terminate any user-created threads, and exit(m) to terminate the main program (initial thread). The value of m is used to set POE's exit status, as explained in Exit status.
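The following sketch shows a user-created thread ending itself with pthread_exit while the creating thread collects its result. The helper names user_thread and run_user_thread are hypothetical, and the MPI calls are omitted so that the fragment depends only on POSIX threads.

```c
#include <pthread.h>
#include <stdint.h>

/* User-created thread: doubles its input and ends only itself. */
static void *user_thread(void *arg)
{
    intptr_t value = (intptr_t)arg;

    /* pthread_exit ends this thread; the process keeps running. */
    pthread_exit((void *)(value * 2));
}

/* Hypothetical helper: start one user thread, wait for it, and return
   its result, or -1 on error. */
int run_user_thread(int input)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, user_thread, (void *)(intptr_t)input) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

In a complete program, the initial thread would join its user threads in this way, call MPI_FINALIZE, and then call exit(m).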

Order requirement for system includes

For threads programs, AIX requires that the system include <pthread.h> come first, with <stdio.h> and other system includes following it. <pthread.h> defines some conditional compile variables that modify the code generation of subsequent includes, particularly <stdio.h>. Note that <pthread.h> is not required unless your program uses thread-related calls or data.

Using MPI_INIT or MPI_INIT_THREAD

Call MPI_INIT once per task, not once per thread. MPI_INIT does not have to be called on the main thread, but MPI_INIT and MPI_FINALIZE must be called on the same thread.

MPI calls on other threads must adhere to the MPI standard in regard to the following:

In the threads library, a call to MPI_INIT with environment variable MP_SINGLE_THREAD set to yes is equivalent to what might be expected from a call to MPI_INIT_THREAD specifying MPI_THREAD_FUNNELED. A call with MP_SINGLE_THREAD set to no is equivalent to using MPI_THREAD_MULTIPLE. The default setting of MP_SINGLE_THREAD is no, so the default behavior of the threads library is MPI_THREAD_MULTIPLE. MPI-IO and MPI one-sided communication will not operate if MP_SINGLE_THREAD is set to yes. Because PE MPI thread behavior is determined by MP_SINGLE_THREAD before MPI_INIT_THREAD is called, the argument given in the call to MPI_INIT_THREAD has no effect.
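The mapping described above can be summarized in a few lines of C. This is only an illustration of the rule, not part of the PE library. The helper names thread_level_for and current_thread_level are hypothetical, and the exact lowercase comparison with "yes" is an assumption, since PE may accept other spellings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: names the thread level that corresponds to a
   given MP_SINGLE_THREAD value. NULL (variable not set) is treated as
   the default, "no". Only the exact string "yes" selects FUNNELED;
   that strict comparison is an assumption of this sketch. */
const char *thread_level_for(const char *mp_single_thread)
{
    if (mp_single_thread != NULL && strcmp(mp_single_thread, "yes") == 0)
        return "MPI_THREAD_FUNNELED";
    return "MPI_THREAD_MULTIPLE";
}

/* Reads the variable from the current environment. */
const char *current_thread_level(void)
{
    return thread_level_for(getenv("MP_SINGLE_THREAD"));
}
```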

Collective communication calls

Collective communication calls must meet the MPI standard requirement that all participating tasks execute collective communication calls on any given communicator in the same order. If collective communication calls are made on multiple threads, it is your responsibility to ensure the proper sequencing or to use distinct communicators.
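One possible way to enforce sequencing within a task is a turn-based gate built from a mutex and a condition variable. The sketch below is not a PE facility. The names turn_gate, gate_run, and gate_demo are hypothetical, and the recorded turn number stands in for a collective call on a shared communicator. Every task would have to assign the same turns to the same calls.

```c
#include <pthread.h>
#include <stdlib.h>

#define GATE_THREADS 3

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int next_turn;            /* the turn that may proceed now */
} turn_gate;

typedef struct {
    turn_gate *gate;
    int turn;
    int *order;               /* log of turns in execution order */
    int *order_len;
} gate_job;

/* Block until it is `turn`'s go, then run fn(arg) while holding the gate. */
static void gate_run(turn_gate *g, int turn, void (*fn)(void *), void *arg)
{
    pthread_mutex_lock(&g->lock);
    while (g->next_turn != turn)
        pthread_cond_wait(&g->changed, &g->lock);
    fn(arg);                  /* stand-in for one collective call */
    g->next_turn++;
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
}

static void record_turn(void *arg)
{
    gate_job *job = arg;

    job->order[(*job->order_len)++] = job->turn;
}

static void *gate_thread(void *arg)
{
    gate_job *job = arg;

    gate_run(job->gate, job->turn, record_turn, job);
    return NULL;
}

/* Starts the threads in reverse order; the gate still forces turns
   0, 1, 2. Returns 1 if the turns ran in order, 0 otherwise. */
int gate_demo(void)
{
    turn_gate gate;
    pthread_t tid[GATE_THREADS];
    gate_job job[GATE_THREADS];
    int order[GATE_THREADS] = { -1, -1, -1 };
    int order_len = 0;
    int in_order = 1;
    int i;

    pthread_mutex_init(&gate.lock, NULL);
    pthread_cond_init(&gate.changed, NULL);
    gate.next_turn = 0;

    for (i = GATE_THREADS - 1; i >= 0; i--) {
        job[i] = (gate_job){ &gate, i, order, &order_len };
        /* A missing turn would leave the other threads waiting forever,
           so this sketch treats a creation failure as fatal. */
        if (pthread_create(&tid[i], NULL, gate_thread, &job[i]) != 0)
            abort();
    }
    for (i = 0; i < GATE_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < GATE_THREADS; i++)
        if (order[i] != i)
            in_order = 0;

    pthread_cond_destroy(&gate.changed);
    pthread_mutex_destroy(&gate.lock);
    return in_order;
}
```

Using a distinct communicator for each thread avoids the need for such a gate altogether.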

Support for M:N threads

By default, user threads are created with process contention scope, and M user threads are mapped to N kernel threads. The values of the ratio M:N and the default contention scope are settable by AIX environment variables. The service threads created by MPI, POE, and LAPI have system contention scope, that is, they are mapped 1:1 to kernel threads.

Any user-created thread will be converted to a system contention scope thread when it makes its first MPI call.
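If you prefer to request system contention scope explicitly when you create a thread, rather than relying on the conversion, the standard POSIX attribute call can be used. The following is a minimal sketch. The helper names scoped_thread and create_system_scope_thread are hypothetical.

```c
#include <pthread.h>

/* Trivial thread body; a real thread would make its MPI calls here. */
static void *scoped_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Hypothetical helper: create a thread with system contention scope.
   Returns the scope stored in the attribute object, or -1 on error. */
int create_system_scope_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int scope = -1;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0 ||
        pthread_attr_getscope(&attr, &scope) != 0 ||
        pthread_create(&tid, &attr, scoped_thread, NULL) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return scope;
}
```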

Checkpoint/restart limitations

64-bit application considerations

Support for 64-bit applications is provided in the MPI threads library only. 64-bit MPL support is not available. You can choose 64-bit support by specifying -q64 as a compiler flag. All objects in a 64-bit environment must be compiled with -q64. You cannot call a 32-bit library from a 64-bit application, nor can you call a 64-bit library from a 32-bit application.

Integers passed to the MPI library are always 32 bits long. If you use the FORTRAN compiler option -qintsize=8 to set your default integer length, you must declare your MPI integer arguments as INTEGER*4.

As defined by the MPI standard, the count argument in MPI send and receive calls is a 4-byte signed integer. To send or receive extremely large messages, you may need to construct your own datatype (for example, a "page" datatype of 4096 contiguous bytes).
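The arithmetic behind the "page" datatype can be sketched as follows. The names PAGE_BYTES, page_split, and split_into_pages are hypothetical. In a real program, the page datatype itself would typically be built with MPI_Type_contiguous(4096, MPI_BYTE, ...) and MPI_Type_commit, which are not shown here.

```c
#include <limits.h>
#include <stdint.h>

#define PAGE_BYTES 4096u      /* size of the illustrative "page" datatype */

typedef struct {
    int pages;                /* count to pass with the page datatype */
    int tail_bytes;           /* leftover bytes to send as plain bytes */
    int ok;                   /* 0 if even the page count overflows an int */
} page_split;

/* Hypothetical helper: split a byte length into whole pages plus a
   remainder, so that each count fits in a 4-byte signed integer. */
page_split split_into_pages(uint64_t total_bytes)
{
    page_split s = { 0, 0, 0 };
    uint64_t pages = total_bytes / PAGE_BYTES;

    if (pages > (uint64_t)INT_MAX)
        return s;
    s.pages = (int)pages;
    s.tail_bytes = (int)(total_bytes % PAGE_BYTES);
    s.ok = 1;
    return s;
}
```

For example, a 6 GB buffer (6442450944 bytes) is too large to describe as a byte count, but it is only 1572864 pages.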

The FORTRAN compilation scripts mpxlf_r, mpxlf90_r, and mpxlf95_r set the include path for 'mpif.h' to: /usr/lpp/ppe.poe/include/thread64 or /usr/lpp/ppe.poe/include/thread, as appropriate. Do not add a separate include path to 'mpif.h' in your compiler scripts, as an incorrect version of 'mpif.h' could be found.

The AIX 64-bit address space is large enough to remove any limitations on the number of memory segments that can be used, so the information in Available virtual memory segments does not apply to the 64-bit library.

MP_WAIT_MODE: the nopoll option

MP_WAIT_MODE=nopoll is supported as an option in the threads library only. It causes a blocking MPI call to go into a system wait after one millisecond of polling without a message being received. It may reduce CPU consumption for applications that post a receive call on a separate thread. It is recommended that MP_CSS_INTERRUPT=yes be set when the nopoll wait is selected, so that the system wait can be interrupted by the arrival of a message. Otherwise, the nopoll wait is interrupted at the timing interval set by MP_POLLING_INTERVAL.

