NERSC: Powering Scientific Discovery Since 1974

Introduction to Scientific I/O

Parallel HDF5

The HDF5 library can be compiled to provide parallel support using the MPI library. The HDF Group maintains a special tutorial on parallel topics.

An HDF5 file can be opened in parallel from an MPI application by specifying a parallel 'file driver' with an MPI communicator and info structure. This information is communicated to HDF5 through a 'property list,' a special HDF5 structure that is used to modify the default behavior of the library. In the following code, a file access property list is created and set to use the MPI-IO file driver:

/* create the file in parallel */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl_id, mpi_comm, mpi_info);
file_id = H5Fcreate("myparfile.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);

The MPI-IO file driver defaults to independent mode, where each processor can access the file independently and any conflicting accesses are handled by the underlying parallel file system. Alternatively, the MPI-IO file driver can be set to collective mode to enable collective buffering. This is not specified during file creation, but rather during each transfer (read or write) to a dataset, by passing a dataset transfer property list to the read or write call. The following code shows how to write collectively to 'dataset0' in our previous example file:

dxpl_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);

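/* describe a 1D array of count elements on this processor
   (count, total, and offset are scalars of type hsize_t;
   total is a name introduced here for the file array length) */
memspace = H5Screate_simple(1, &count, NULL);

/* map this processor's elements into the shared file */
total = mpi_size * count;
filespace = H5Screate_simple(1, &total, NULL);
offset = mpi_rank * count;
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &count, NULL);

/* dset_id is assumed to refer to 'dataset0', already created or opened in file_id */
H5Dwrite(dset_id, H5T_NATIVE_FLOAT, memspace, filespace, dxpl_id, somedata0);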

When accessing a dataset in parallel, two dataspaces are necessary: one to specify the shape of the data in each processor's memory, and another to specify the layout of the data in the file. In the above code, each processor holds the same number of elements, count, in a 1D array, and writes them into a 1D array in the file at an offset determined by its rank. The layout is specified with a 'hyperslab' in HDF5. This regular 1D array is perhaps the simplest possible case; more complicated layouts are described in the next section.
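
The layout arithmetic behind this hyperslab can be checked without MPI or HDF5. The following is a minimal sketch in plain C, assuming every rank holds exactly count elements, as in the example above. The helper names rank_offset, file_extent, and print_layout are hypothetical and are not part of the HDF5 API. They only reproduce the start offset and total length that would be passed to H5Sselect_hyperslab and H5Screate_simple.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: first element in the shared 1D file array that
   belongs to mpi_rank, assuming every rank holds exactly count elements. */
unsigned long long rank_offset(int mpi_rank, unsigned long long count)
{
    assert(mpi_rank >= 0);
    return (unsigned long long)mpi_rank * count;
}

/* Hypothetical helper: total number of elements in the file array. */
unsigned long long file_extent(int mpi_size, unsigned long long count)
{
    assert(mpi_size > 0);
    return (unsigned long long)mpi_size * count;
}

/* Print the half-open element range [start, end) written by each rank. */
void print_layout(int mpi_size, unsigned long long count)
{
    unsigned long long total = file_extent(mpi_size, count);

    for (int rank = 0; rank < mpi_size; rank++) {
        unsigned long long start = rank_offset(rank, count);
        printf("rank %d writes elements [%llu, %llu) of %llu\n",
               rank, start, start + count, total);
    }
}
```

Calling print_layout(4, 3) from a main function, for instance, lists four consecutive, non-overlapping ranges of three elements that together cover a file array of twelve elements. Because the ranges do not overlap, no two ranks touch the same elements of the file.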