This subsection describes additional programming considerations that apply when you use POE with the (non-threads) MPI/MPL signal-handling library.
The signal-handling library is functionally stabilized. Because MPL is only in the signal-handling library, continued use of MPL is incompatible with exploiting MPI enhancements. For example, the signal-handling library does not support checkpoint/restart capabilities. To use the checkpoint/restart functions, you must compile with and use the threads library.
POE sets up its environment using the entry point poe_remote_main, which is linked using the AIX -binitfini option. The poe_remote_main entry point initializes the message passing library, sets up signal handlers, and sets up an atexit routine.
Only a subset of MPL message passing is allowed in handlers created by the MPL Receive and Call function (mpc_rcvncall or MP_RCVNCALL). MPI calls in these handlers are not supported. The threads library does not provide any facility that is comparable to the MPL Receive and Call function. Consider using LAPI or the MPI one-sided functions if this kind of facility is needed.
POE links in the following routines when your executable is compiled with mpcc, mpxlf, or mpCC. These routines are specific to the signal-handling environment.
POE initializes the parallel message passing library and determines that all nodes can communicate successfully before the user main() program gains control. As a result, any program compiled with the POE compiler scripts must be run under the control of POE and is not suitable as a serial program.
If communication initialization fails, the parallel task is terminated with an appropriate exit code.
POE handles the following asynchronous signals by installing signal handlers that use the sa_handler format:
These handlers perform cleanup and exit with a code of (128+signal). You can install your own signal handler for any or all of these signals. If you want the application to exit after you catch the signal, call the function pm_child_sig_handler(signal,NULL,NULL). The prototype for this function is in the file /usr/lpp/ppe.poe/include/pm_util.h.
In addition, the message passing library sets up signal handlers for SIGALRM, SIGIO, and SIGPIPE to manage message-passing activity. A user program may install a handler for any or all of these signals, but it should save the address of the POE signal handler and invoke that handler before returning to the interrupted code. The sigaction() function returns the required structure through its old-action argument. Also set the SA_RESTART flag, and set the handler's signal mask so that all signals are masked while the signal handler is running.
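The save-and-invoke pattern described above can be sketched with plain POSIX calls. This is a minimal illustration, not POE code: the names install_chained_handler, chained_handler, previous_action, and user_signal_count are hypothetical, the sketch tracks a single signal, and it assumes the previously installed handler uses the sa_handler format.

```c
#define _XOPEN_SOURCE 600  /* expose sigaction and SA_RESTART on strict compilers */
#include <signal.h>
#include <stddef.h>

/* Action that was in place before ours (in a POE program, the library's). */
static struct sigaction previous_action;

/* Counts deliveries; stands in for the application's own work. */
static volatile sig_atomic_t user_signal_count = 0;

/* Application handler: do the user's work, then chain to the saved handler. */
static void chained_handler(int signo)
{
    user_signal_count++;

    /* Assumption: the saved handler uses the sa_handler format. The default
       and ignore dispositions are not callable, so they are skipped. */
    if ((previous_action.sa_flags & SA_SIGINFO) == 0 &&
        previous_action.sa_handler != SIG_DFL &&
        previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(signo);
    }
}

/* Install chained_handler for signo and save the old action.
   Call this once per process. Returns 0 on success, -1 on failure. */
int install_chained_handler(int signo)
{
    struct sigaction action;

    action.sa_handler = chained_handler;
    action.sa_flags = SA_RESTART;   /* restart interrupted calls where possible */
    sigfillset(&action.sa_mask);    /* mask all signals while the handler runs */

    return sigaction(signo, &action, &previous_action);
}
```

A program would call install_chained_handler(SIGIO), for example, after the message passing library has been initialized, so that the saved action is the library's handler.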
The following signals are used and specifically handled by the message-passing library in a signal-handling environment:
If the user application catches this signal but doesn't do interval timing, it should call the registered message passing signal handler before returning to the main code.
Do not block this signal for more than a few milliseconds.
The message passing library uses an interval timer to manage message traffic, specifically to ensure that messages progress even when message passing calls are not being made. When this interval timer expires, a SIGALRM signal is sent to the program, interrupting whatever computation is in progress. The message passing library has a signal handler installed; it normally handles the signal and returns to the user's program without the program's knowledge. However, the following library and system calls are interrupted and do not complete normally. The user is responsible for testing whether an interrupt occurred and for recovering from it. In many cases, retrying the call is sufficient.
With the exception of exec, sleep, and system, the routines listed above set the system error indicator (the variable errno) to EINTR, which can be tested by the user's program. See Sample replacement select program.
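Retrying after EINTR usually takes the form of a short loop around the interrupted call. The sketch below is an illustration under stated assumptions: sleep_through_interrupts is a hypothetical helper, and nanosleep is used only because it is a standard POSIX call that reports EINTR and the time left; it is not taken from the list of affected routines.

```c
#define _XOPEN_SOURCE 600  /* expose nanosleep on strict compilers */
#include <errno.h>
#include <time.h>

/* Sleep for the full requested time, resuming after each signal interruption.
   Returns 0 on success, or -1 with errno set for errors other than EINTR. */
int sleep_through_interrupts(time_t seconds, long nanoseconds)
{
    struct timespec request;
    struct timespec remaining;

    request.tv_sec = seconds;
    request.tv_nsec = nanoseconds;

    while (nanosleep(&request, &remaining) == -1) {
        if (errno != EINTR) {
            return -1;          /* a real error; the caller can inspect errno */
        }
        request = remaining;    /* interrupted: continue with the time left */
    }
    return 0;
}
```

The same shape applies to other calls that fail with EINTR: test errno, and repeat the call only when the failure was an interruption.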
Normal file read and write are restarted automatically by AIX, and should not require any special treatment by the user.
The fork and system calls create a new task in which the interval timer is still running. If a fork is followed by an exec (which is what system does), the signal handler for the timer is overlaid, and the task is terminated when the interval timer expires.
To handle this for the system call, temporarily turn the interval timer off before the call using ualarm(0,0), and then turn the timer on again after the call using, for example, ualarm(500000, 500000).
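As a rough sketch, the timer can be paused around system in a small wrapper. The function name system_with_timer_paused and the constant TIMER_INTERVAL_USEC are hypothetical, and the 500000-microsecond interval is simply the example value from the text, not a documented library setting.

```c
#define _XOPEN_SOURCE 600  /* expose ualarm on strict compilers */
#include <stdlib.h>
#include <unistd.h>

/* Interval used to re-arm the timer, in microseconds. This is the example
   value from the text, not a documented library setting. */
#define TIMER_INTERVAL_USEC 500000

/* Run a shell command with the interval timer switched off, then re-arm it.
   Returns whatever system() returns. */
int system_with_timer_paused(const char *command)
{
    int status;

    ualarm(0, 0);                /* cancel the running interval timer */
    status = system(command);    /* the new task starts without a pending SIGALRM */
    ualarm(TIMER_INTERVAL_USEC, TIMER_INTERVAL_USEC);  /* restart the timer */

    return status;
}
```

If the original interval must be preserved exactly, one option is to capture it with getitimer(ITIMER_REAL, ...) before the call and restore it with setitimer afterward, assuming the library's timer is the process's real-time interval timer.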
To handle the interval timer for a forked child, turn off the interval timer in the child using ualarm(0,0).
Further restrictions on fork follow.
As described earlier, if a task forks, the forked child inherits the running timer. The timer should be turned off before forking another program. If the forked child does not exec another program, be aware that the atexit routine registered for the parent is also inherited by the child. In most cases, the atexit routine requests that POE terminate the parent task. A forked child should therefore terminate with an _exit(0) system call to prevent the atexit routine from being called. Also, if the parent terminates before the child, the child task is not cleaned up by POE.
A forked child must not call the message passing library.
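The fork restrictions above can be combined into one pattern: the child stops the inherited timer, does only serial work, and leaves through _exit(0), while the parent waits for it. The following is a minimal sketch; run_in_forked_child and example_child_work are hypothetical names, and the work function is a placeholder.

```c
#define _XOPEN_SOURCE 600  /* expose ualarm on strict compilers */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the child's serial work. It must not call the
   message passing library. */
void example_child_work(void)
{
}

/* Fork a child that runs child_work() and wait for it to finish.
   Returns the child's exit code, or -1 if fork or waitpid fails or the
   child did not exit normally. */
int run_in_forked_child(void (*child_work)(void))
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1) {
        return -1;
    }

    if (pid == 0) {
        ualarm(0, 0);   /* child: stop the inherited interval timer */
        child_work();
        _exit(0);       /* _exit skips the inherited atexit routine */
    }

    /* Parent: wait for the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Waiting in the parent keeps the parent alive until the child has finished, which avoids leaving a child that POE would not clean up.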