NERSC — Powering Scientific Discovery Since 1974

Introduction to Scientific I/O

Parallel HDF5

The HDF5 library can be compiled to provide parallel support using the MPI library. The HDF Group maintains a special tutorial on parallel topics.

An HDF5 file can be opened in parallel from an MPI application by specifying a parallel 'file driver' with an MPI communicator and info structure. This information is communicated to HDF5 through a 'property list,' a special HDF5 structure that is used to modify the default behavior of the library. In the following code, a file access property list is created and set to use the MPI-IO file driver:

/* create the file in parallel */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl_id, mpi_comm, mpi_info);
file_id = H5Fcreate("myparfile.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);
H5Pclose(fapl_id); /* the property list can be released once the file is created */

The MPI-IO file driver defaults to independent mode, in which each processor accesses the file independently and any conflicting accesses are resolved by the underlying parallel file system. Another option for independent access is the MPI-POSIX file driver, which bypasses the MPI-IO layer and uses POSIX I/O calls (e.g. write) that are coordinated internally by HDF5. In some scenarios this lighter-weight MPI-POSIX driver performs better, especially on systems with a poorly implemented MPI-IO library. To use the MPI-POSIX file driver, replace the H5Pset_fapl_mpio call above with:

H5Pset_fapl_mpiposix(fapl_id, mpi_comm, use_gpfs_hints);

If the parameter use_gpfs_hints is set to 1, additional optimizations for the GPFS filesystem are enabled. The /project filesystem on Franklin and Hopper uses GPFS, but the HDF5 optimizations for GPFS are currently not functional; an ongoing project aims to resolve this.

Alternatively, the MPI-IO file driver can be set to collective mode to enable collective buffering. This is not specified during file creation, but rather during each transfer (read or write) to a dataset, by passing a dataset transfer property list to the read or write call. The following code shows how to write collectively to 'dataset0' in our previous example file:

dxpl_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);

/* describe a 1D array of count elements on this processor
   (count, offset, and dims are of type hsize_t) */
memspace = H5Screate_simple(1, &count, NULL);

/* map this processor's elements into the shared file */
dims = mpi_size * count;
filespace = H5Screate_simple(1, &dims, NULL);
offset = mpi_rank * count;
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &count, NULL);

H5Dwrite(dset_id, H5T_NATIVE_FLOAT, memspace, filespace, dxpl_id, somedata0);

When accessing a dataset in parallel, two dataspaces are necessary: one to specify the shape of the data in each processor's memory, and another to specify the layout of the data in the file. In the above code, each processor holds the same number, count, of elements in a 1D array, and each writes into a 1D array in the file at a position determined by its processor rank. The layout is specified with a 'hyperslab' in HDF5; although this regular 1D array is perhaps the simplest possible example, more complicated layouts are described in the next section.