IBM Books

MPI Programming Guide

MP_INIT_CKPT, mpc_init_ckpt

Purpose

Starts user-initiated checkpointing.

Library

libmpi_r.a

C synopsis

#include <pm_ckpt.h>
int mpc_init_ckpt(int flags);

FORTRAN synopsis

i = MP_INIT_CKPT(%val(j))

Parameters

In C, flags can be set to MP_CUSER, which indicates complete user-initiated checkpointing, or MP_PUSER, which indicates partial user-initiated checkpointing.

In FORTRAN, j should be set to 0 (the value of MP_CUSER) or 1 (the value of MP_PUSER).

Description

MP_INIT_CKPT starts complete or partial user-initiated checkpointing. The checkpoint file name consists of the base name provided by the MP_CKPTFILE and MP_CKPTDIR environment variables, with a suffix of the task ID and a numeric checkpoint tag to differentiate it from an earlier checkpoint file.

If the MP_CKPTFILE environment variable is not specified, a default base name is constructed: poe.ckpt.tag, where tag is an integer that allows multiple versions of checkpoint files to exist. The file name specified by MP_CKPTFILE may include the full path of the directory where the checkpoint files will reside, in which case the MP_CKPTDIR variable is ignored. If MP_CKPTDIR is not defined and MP_CKPTFILE does not specify a full path name, MP_CKPTFILE is used as a relative path name from the original working directory of the task.
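The naming rules above can be sketched in C as follows. This is only an illustration and not part of the library: the helper names ckpt_base_path and ckpt_base_path_from_env are hypothetical, a "full path" is assumed to mean a name beginning with "/", and combining MP_CKPTDIR with the default base name when MP_CKPTFILE is unset is also an assumption. The task ID and checkpoint tag suffixes are added by the system and are not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the Parallel Environment library.
 * Resolves the checkpoint base path from the values of MP_CKPTFILE (file)
 * and MP_CKPTDIR (dir); either may be NULL if the variable is unset.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int ckpt_base_path(const char *file, const char *dir, char *out, size_t outlen)
{
    int n;

    assert(out != NULL && outlen > 0);

    if (file == NULL || file[0] == '\0')
        file = "poe.ckpt";   /* default base name; the tag is appended later */

    if (file[0] == '/' || dir == NULL || dir[0] == '\0') {
        /* Full path (assumed: leading '/'): MP_CKPTDIR is ignored.
           No MP_CKPTDIR: the name stays relative to the task's
           original working directory. */
        n = snprintf(out, outlen, "%s", file);
    } else {
        size_t dlen = strlen(dir);
        const char *sep = (dir[dlen - 1] == '/') ? "" : "/";
        n = snprintf(out, outlen, "%s%s%s", dir, sep, file);
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Convenience wrapper that reads the two environment variables. */
int ckpt_base_path_from_env(char *out, size_t outlen)
{
    return ckpt_base_path(getenv("MP_CKPTFILE"), getenv("MP_CKPTDIR"),
                          out, outlen);
}
```

A helper of this kind may be useful for logging where checkpoint files are expected to appear before the checkpoint is requested.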

Notes

Complete user-initiated checkpointing is a synchronous operation. All tasks of the parallel program must call MP_INIT_CKPT. The call suspends the calling thread until all other tasks have also called MP_INIT_CKPT. Other threads in the task are not suspended. After all tasks of the application have issued MP_INIT_CKPT, a local checkpoint is taken of each task.

In partial user-initiated checkpointing, one task of the parallel program calls MP_INIT_CKPT, thus invoking a checkpoint on the entire application. A checkpoint is performed asynchronously on all other tasks. The thread that called MP_INIT_CKPT is suspended until the checkpoint is taken. Other threads in the task are not suspended.
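The difference between the two modes in terms of which tasks issue the call can be sketched as below. The helper task_calls_ckpt and its initiator parameter are hypothetical; how an application picks the initiating task for a partial checkpoint is its own decision. The macro definitions are stand-ins that use the values given under Parameters, for builds where pm_ckpt.h is not available.

```c
#include <assert.h>

/* Stand-ins for the constants in <pm_ckpt.h>, using the values given
   under Parameters; they are skipped when the real header defines them. */
#ifndef MP_CUSER
#define MP_CUSER 0   /* complete user-initiated checkpointing */
#define MP_PUSER 1   /* partial user-initiated checkpointing */
#endif

/*
 * Hypothetical helper: returns 1 if the task with ID task_id should call
 * mpc_init_ckpt(flags), 0 otherwise. initiator is the single task the
 * application has chosen to start a partial checkpoint; it is ignored
 * for a complete checkpoint.
 */
int task_calls_ckpt(int flags, int task_id, int initiator)
{
    assert(flags == MP_CUSER || flags == MP_PUSER);

    if (flags == MP_CUSER)
        return 1;                 /* complete: every task must call */
    return task_id == initiator;  /* partial: exactly one task calls */
}
```

In an application, a task would call mpc_init_ckpt(flags) only when this helper returns 1; in the partial case, the other tasks are checkpointed without making the call themselves.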

When MP_INIT_CKPT returns, the application continues to run. If the program was restarted from a checkpoint file, however, the code that resumes after the call belongs to the restarted application rather than the original; a return value of 1 indicates this case.

If several threads in a task call MP_INIT_CKPT using the same flag, the calls are serialized.

The task that calls MP_INIT_CKPT does not need to be an MPI program.

There are certain limitations associated with checkpointing an application. See Checkpoint/restart limitations for more information.

For general information on checkpointing and restarting programs, see IBM Parallel Environment for AIX: Operation and Use, Volume 1.

For more information on the use of LoadLeveler and checkpointing, see IBM LoadLeveler for AIX 5L: Using and Administering.

Return values

0
indicates successful completion.

1
indicates that a restart operation occurred.

-1
indicates that an error occurred. A message describing the error will be issued.
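
A caller will usually want to distinguish the three documented return values, in particular to detect that execution is resuming after a restart. The following is a minimal sketch; the enum ckpt_outcome and both helper functions are hypothetical names introduced for illustration, and the treatment of values other than 0, 1, and -1 is an assumption, since none are documented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the documented return values. */
enum ckpt_outcome {
    CKPT_TAKEN,       /* 0: successful completion, original run continues */
    CKPT_RESTARTED,   /* 1: a restart operation occurred */
    CKPT_FAILED,      /* -1: an error occurred, a message is issued */
    CKPT_UNEXPECTED   /* any other value (not documented; assumption) */
};

/* Maps the return value of mpc_init_ckpt to an outcome. */
enum ckpt_outcome classify_ckpt_rc(int rc)
{
    switch (rc) {
    case 0:  return CKPT_TAKEN;
    case 1:  return CKPT_RESTARTED;
    case -1: return CKPT_FAILED;
    default: return CKPT_UNEXPECTED;
    }
}

/* Returns a short description suitable for a log line. */
const char *ckpt_outcome_text(enum ckpt_outcome o)
{
    static const char *const text[] = {
        "checkpoint taken, original run continues",
        "running after restart from a checkpoint file",
        "checkpoint failed, see the issued message",
        "unexpected return value"
    };

    assert((int)o >= 0 && (int)o <= (int)CKPT_UNEXPECTED);
    return text[o];
}
```

With pm_ckpt.h included, the result of mpc_init_ckpt(MP_CUSER) could be passed directly to classify_ckpt_rc. On CKPT_RESTARTED, an application might, for example, reopen files or reestablish other state that is not preserved across a restart.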

Examples

C Example

#include <pm_ckpt.h>
int rc;
rc = mpc_init_ckpt(MP_CUSER);   /* complete checkpoint: every task makes this call */

FORTRAN Example

i = MP_INIT_CKPT(%val(j))
