IBM Books

IBM General Parallel File System for AIX: Data Management API Guide

DMAPI failure model for GPFS

The failure model in XDSM is intended for a single-node system. It distinguishes two types of failure:

DM application failure
The DM application has failed, but the file system works normally. Recovery entails restarting the DM application, which then continues handling events. Unless the DM application recovers, events may remain pending indefinitely.

Total system failure
The file system has failed. All non-persistent DMAPI resources are lost. The DM application itself may or may not have failed. Sessions are not persistent, so recovery of events is not necessary. When the file system restarts, it cleans up its own state; the DM application is not involved in this cleanup.

The XDSM failure model is too simple for GPFS. Because GPFS is a multi-node environment, it may fail on one node but survive on the other nodes. This type of failure is called single-node failure (or partial system failure). GPFS is built to survive and recover from single-node failures without meaningfully affecting file access on the surviving nodes.
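
The distinction between the three failure types can be illustrated with a small toy model. This is only a sketch for illustration and is not part of DMAPI or GPFS: the names Failure and classify_failure, and the idea of describing the cluster as a per-node up/down map, are assumptions made for this example. File system state takes precedence over DM application state, since in a total system failure the DM application may or may not have failed.

```python
from enum import Enum


class Failure(Enum):
    # Hypothetical labels for this illustration only.
    NONE = "no failure"
    DM_APPLICATION = "DM application failure"
    SINGLE_NODE = "single-node (partial system) failure"
    TOTAL_SYSTEM = "total system failure"


def classify_failure(fs_up_by_node, dm_app_running):
    """Classify a failure from per-node file system state (toy model).

    fs_up_by_node:  dict mapping a node name to True if the file system
                    is running on that node.
    dm_app_running: True if the DM application is running.
    """
    if not fs_up_by_node:
        raise ValueError("at least one node is required")
    failed = [node for node, up in fs_up_by_node.items() if not up]
    if len(failed) == len(fs_up_by_node):
        # File system is down everywhere; the DM application state
        # does not change the classification.
        return Failure.TOTAL_SYSTEM
    if failed:
        # Some nodes failed while others survive. The single-node XDSM
        # model cannot express this case. (Simplification: any partial
        # outage is reported under this one label.)
        return Failure.SINGLE_NODE
    if not dm_app_running:
        # File system works normally; only the DM application failed.
        return Failure.DM_APPLICATION
    return Failure.NONE


print(classify_failure({"n1": True, "n2": True}, dm_app_running=False).value)
print(classify_failure({"n1": False, "n2": True}, dm_app_running=True).value)
print(classify_failure({"n1": False}, dm_app_running=True).value)
```

With a single node in the map, only the two XDSM cases can occur; the single-node (partial system) case appears only when at least two nodes are modeled.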

