IBM General Parallel File System for AIX: Data Management API Guide
The failure model in XDSM is intended for a single-node system.
There are two types of failure:
- DM application failure
  The DM application has failed, but the file system works normally.
  Recovery entails restarting the DM application, which then continues handling
  events. Unless the DM application recovers, events may remain pending
  indefinitely.
- Total system failure
  The file system has failed. All non-persistent DMAPI resources are
  lost. The DM application itself may or may not have failed.
  Sessions are not persistent, so recovery of events is not necessary.
  The file system cleans its state when it is restarted. There is no
  involvement of the DM application in such cleanup.
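
The difference between the two failure types can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyFileSystem`, `ToyDMApp`, the session name `"hsm"`, and the event names are hypothetical and are not part of DMAPI or GPFS. The model shows that events stay pending while the DM application is down, and that a file system restart discards sessions together with their events.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Hypothetical stand-in for a file system holding non-persistent DMAPI state."""
    # Maps a session name to the events still waiting for a response.
    sessions: dict = field(default_factory=dict)

    def post_event(self, session, event):
        self.sessions.setdefault(session, []).append(event)

    def restart(self):
        # Total system failure: sessions are not persistent, so they are
        # simply gone after the restart. The DM application plays no part.
        self.sessions.clear()


@dataclass
class ToyDMApp:
    """Hypothetical DM application that answers events for one session."""
    session: str
    running: bool = True

    def handle_pending(self, fs):
        # A failed application handles nothing; events remain pending.
        if not self.running:
            return 0
        pending = fs.sessions.get(self.session, [])
        handled = len(pending)
        pending.clear()
        return handled


fs = ToyFileSystem()
app = ToyDMApp("hsm")

# DM application failure: the file system keeps working and queues events.
app.running = False
fs.post_event("hsm", "read")
fs.post_event("hsm", "write")
stuck = app.handle_pending(fs)
still_pending = len(fs.sessions["hsm"])

# Recovery: the restarted application continues handling events.
app.running = True
recovered = app.handle_pending(fs)

# Total system failure: restart wipes the session and its events.
fs.post_event("hsm", "truncate")
fs.restart()
after_restart = dict(fs.sessions)
```

In this model the two events posted during the application outage are handled only after the application is running again, while the event posted before the file system restart is never recovered, because its session no longer exists.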
The simplistic XDSM failure model is inadequate for GPFS. Because GPFS is a
multi-node environment, it may fail on one node but survive on the other
nodes. This type of failure is called a single-node failure
(or partial system failure). GPFS is built to survive and
recover from single-node failures without meaningfully affecting file access
on surviving nodes.
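
As a rough illustration of a single-node failure, the sketch below models a cluster as a set of nodes that are either up or down. `ToyCluster` and the node names are hypothetical and do not correspond to any GPFS interface. The sketch assumes that file access on a node depends only on that node being up.

```python
class ToyCluster:
    """Hypothetical multi-node file system; not a GPFS interface."""

    def __init__(self, node_names):
        # Every node starts in the "up" state.
        self.up = {name: True for name in node_names}

    def fail_node(self, name):
        # The file system fails on this node only.
        self.up[name] = False

    def can_access_files(self, name):
        # Assumption: access on a node depends only on that node being up.
        return self.up[name]

    def surviving_nodes(self):
        return [name for name, ok in self.up.items() if ok]


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.fail_node("node2")  # single-node (partial system) failure

survivors = cluster.surviving_nodes()
access = {name: cluster.can_access_files(name) for name in cluster.up}
```

After `node2` fails, the other two nodes still report file access, which is the situation the single-node XDSM model has no way to describe.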