Purpose
Starts user-initiated checkpointing.
Library
libmpi_r.a
C synopsis
#include <pm_ckpt.h>
int mpc_init_ckpt(int flags);
FORTRAN synopsis
i = MP_INIT_CKPT(%val(j))
Parameters
In C, flags can be set to MP_CUSER, which indicates complete user-initiated checkpointing, or MP_PUSER, which indicates partial user-initiated checkpointing.
In FORTRAN, j should be set to 0 (the value of MP_CUSER) or 1 (the value of MP_PUSER).
Description
MP_INIT_CKPT starts complete or partial user-initiated checkpointing. The checkpoint file name consists of the base name provided by the MP_CKPTFILE and MP_CKPTDIR environment variables, with a suffix of the task ID and a numeric checkpoint tag to differentiate it from an earlier checkpoint file.
If the MP_CKPTFILE environment variable is not specified, a default base name is constructed: poe.ckpt.tag, where tag is an integer that allows multiple versions of checkpoint files to exist. The file name specified by MP_CKPTFILE may include the full path to the location where the checkpoint files will reside, in which case the MP_CKPTDIR variable is ignored. If MP_CKPTDIR is not defined and MP_CKPTFILE does not specify a full path name, MP_CKPTFILE is used as a relative path name from the original working directory of the task.
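The precedence between MP_CKPTFILE and MP_CKPTDIR described above can be sketched in a few lines of C. This is an illustration only: ckpt_base_name is a hypothetical helper, not part of the library, and it makes three assumptions that the text does not state: an empty variable is treated like an unset one, the default stem is also placed under MP_CKPTDIR when that variable is set, and the task ID and checkpoint tag suffixes are left out because their exact layout is not specified here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the PE library: mirrors the documented
 * precedence of MP_CKPTFILE and MP_CKPTDIR when forming the base name.
 * Returns 0 on success, -1 if the result does not fit in out. */
int ckpt_base_name(const char *ckptfile, const char *ckptdir,
                   char *out, size_t outsz)
{
    int n;

    assert(out != NULL && outsz > 0);

    /* Unset (or, by assumption, empty) MP_CKPTFILE: use the default stem. */
    if (ckptfile == NULL || ckptfile[0] == '\0')
        ckptfile = "poe.ckpt";

    if (ckptfile[0] == '/' || ckptdir == NULL || ckptdir[0] == '\0') {
        /* Full path: MP_CKPTDIR is ignored. No MP_CKPTDIR: the name stays
         * relative to the original working directory of the task. */
        n = snprintf(out, outsz, "%s", ckptfile);
    } else {
        /* Relative name with MP_CKPTDIR set: place it in that directory. */
        n = snprintf(out, outsz, "%s/%s", ckptdir, ckptfile);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A caller would typically pass getenv("MP_CKPTFILE") and getenv("MP_CKPTDIR") as the first two arguments.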
Notes
Complete user-initiated checkpointing is a synchronous operation. All tasks of the parallel program must call MP_INIT_CKPT. MP_INIT_CKPT suspends the calling thread until all other tasks have also called it. Other threads in the task are not suspended. After all tasks of the application have issued MP_INIT_CKPT, a local checkpoint is taken of each task.
In partial user-initiated checkpointing, one task of the parallel program calls MP_INIT_CKPT, thus invoking a checkpoint on the entire application. A checkpoint is performed asynchronously on all other tasks. The thread that called MP_INIT_CKPT is suspended until the checkpoint is taken. Other threads in the task are not suspended.
Upon returning from the MP_INIT_CKPT call, the application continues to run. It may, however, be a restarted application that is now running, rather than the original, if the program was restarted from a checkpoint file.
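As a rough sketch of where a complete user-initiated checkpoint request might sit in an application, the fragment below calls mpc_init_ckpt(MP_CUSER) every few iterations of a work loop. The names run_with_checkpoints and USE_PE_CKPT are hypothetical and exist only for this illustration. When USE_PE_CKPT is not defined, a stand-in stub replaces the real routine so the fragment compiles outside Parallel Environment; the stub's return value is arbitrary and says nothing about the real routine's return codes.

```c
#include <stdio.h>

#ifdef USE_PE_CKPT            /* hypothetical build flag: define it under PE */
#include <pm_ckpt.h>
#else
/* Stand-ins so the sketch compiles outside Parallel Environment. The two
 * values are those given for FORTRAN callers in the Parameters section. */
#define MP_CUSER 0
#define MP_PUSER 1
static int mpc_init_ckpt(int flags) { (void)flags; return 0; }
#endif

/* Hypothetical driver: runs nsteps iterations and requests a complete
 * checkpoint every interval steps. Returns the number of requests made. */
int run_with_checkpoints(int nsteps, int interval)
{
    int step, requests = 0;

    for (step = 1; step <= nsteps; step++) {
        /* ... one iteration of application work goes here ... */

        if (interval > 0 && step % interval == 0) {
            /* Every task must reach this call; the calling thread is
             * suspended until all tasks have called it. Execution after
             * the call may be in a restarted copy of the application. */
            int rc = mpc_init_ckpt(MP_CUSER);
            printf("step %d: mpc_init_ckpt returned %d\n", step, rc);
            requests++;
        }
    }
    return requests;
}
```

Placing the call at the end of an iteration, when all tasks are at the same logical step, is one simple way to ensure that every task issues the request.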
If several threads in a task call MP_INIT_CKPT using the same flag, the calls are serialized.
The task that calls MP_INIT_CKPT does not need to be an MPI program.
There are certain limitations associated with checkpointing an application. See Checkpoint/restart limitations for more information.
For general information on checkpointing and restarting programs, see IBM Parallel Environment for AIX: Operation and Use, Volume 1.
For more information on the use of LoadLeveler and checkpointing, see IBM LoadLeveler for AIX 5L: Using and Administering.
Return values
Examples
C Example
#include <pm_ckpt.h>
int mpc_init_ckpt(int flags);
FORTRAN Example
i = MP_INIT_CKPT(%val(j))
Related information
Commands:
Subroutines: