Chombo I/O Benchmark

The I/O benchmark creates Chombo data structures, fills them with arbitrary data, and writes a single HDF5 file. Currently the benchmark only writes an HDF5 data file; it does not read data. Inputs to the benchmark include the grids file, the number of components, the replication factor, and the ordering of the boxes. The grids file contains the box descriptions, but no data. The number of components is the number of values per cell of each box; increasing this value increases the size of each write to disk. The replication factor controls the global size of the problem and is described below. Options controlling the ordering of the boxes determine which processor is assigned which boxes, and can therefore affect I/O performance.
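To see why the number of components drives the size of each write, here is a rough per-box estimate as a Python sketch. It assumes 8-byte (double precision) values and ignores ghost cells and HDF5 metadata; the function box_write_bytes and the 32x32x32 box are hypothetical illustrations, not part of the benchmark.

```python
def box_write_bytes(nx: int, ny: int, nz: int, ncomp: int, bytes_per_value: int = 8) -> int:
    """Approximate payload bytes written for one 3D box: cells x components x value size.

    Assumes 8-byte (double precision) values by default; ghost cells and
    HDF5 metadata are ignored.
    """
    return nx * ny * nz * ncomp * bytes_per_value


# Hypothetical 32x32x32 box: doubling the component count doubles the bytes written.
print(box_write_bytes(32, 32, 32, 1))  # 262144
print(box_write_bytes(32, 32, 32, 2))  # 524288
```

Under these assumptions the size of each write grows linearly with the number of components.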

Compiling Chombo I/O Benchmark

To compile, create a file Chombo/lib/mk/Make.defs.local. Examples of this file for different machines are in the directory Chombo/lib/mk/local. Machines we have not compiled on will almost certainly require some modifications to the Make.defs.local file. A parallel version of HDF5 must be installed on the system, and the Make.defs.local file can be modified to point to the HDF5 installation. On most of the NERSC machines, to load the parallel version of HDF5, try: module load hdf5_par

Next, change directory to Chombo/benchmark/IOMI.

The compile command is:
make all DIM=3 MPI=TRUE DEBUG=FALSE OPT=HIGH

DIM=3: compile for three dimensions.
MPI=TRUE: compile for parallel execution.
DEBUG=FALSE: do not compile with -g.
OPT=HIGH: compile with the best optimization flags and turn off assertions.

This will create an executable in Chombo/benchmark/IOMI.
The exact name of the executable depends on the OS name, compilers, and certain build flags. On franklin.nersc.gov, the executable name is iomi3d.Linux.64.CC.ftn.OPTHIGH.MPI.ex

For example, compiling could be as simple as:

cp Chombo/lib/mk/local/Make.defs.franklin Chombo/lib/mk/Make.defs.local
cd Chombo/benchmark/IOMI
make all DIM=3 MPI=TRUE DEBUG=FALSE OPT=HIGH

Running the Chombo I/O Benchmark

The replication factor is the number of times the code replicates the base ABR file (s64x64x64L2r4i80b8-32o0f0.00100p1.abr) in each direction. A replication factor of 2 2 1 creates two identical copies of the grids along each of the X and Y directions and one along the Z direction, yielding a problem 4 times the size of the one produced by a replication factor of 1 1 1.
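The size multiplier is simply the product of the three per-direction factors. A minimal Python sketch, assuming the factors are given in X, Y, Z order as in the example above; replication_multiplier is a hypothetical helper, not part of the benchmark code.

```python
from math import prod


def replication_multiplier(rx: int, ry: int, rz: int) -> int:
    """Return how many copies of the base ABR grid set the replicated problem
    contains, i.e. its size relative to a replication factor of 1 1 1."""
    if min(rx, ry, rz) < 1:
        raise ValueError("replication factors must be positive integers")
    return prod((rx, ry, rz))


print(replication_multiplier(2, 2, 1))  # 4
print(replication_multiplier(1, 1, 1))  # 1
```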

Sample batch scripts for submitting parallel jobs are included in the Chombo/benchmark/IOMI/benchmarkRuns directory, for example: sample_batch_franklin. Note that the sample scripts delete the output file phi.hdf5 after the job executes.

There are also input files in the benchmarkRuns directory; they differ only in the replication factor. The choice of number of processors and replication factor depends on the machine and the goal of the experiment. On franklin, one example experiment was a "weak-scaling" series such as:
# of processors         replication factor
  64                         2 2 2
 128                         4 2 2
 256                         4 4 2
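Each row of the table above keeps the amount of base-grid data per processor fixed, which is what makes it a weak-scaling series. A short Python check, assuming the problem size scales with the product of the three factors and that boxes are spread evenly across processors:

```python
from math import prod

# (processors, replication factor in X Y Z) from the franklin weak-scaling table.
runs = [
    (64, (2, 2, 2)),
    (128, (4, 2, 2)),
    (256, (4, 4, 2)),
]

ratios = []
for procs, factors in runs:
    copies = prod(factors)  # copies of the base grid set in the replicated problem
    ratios.append(copies / procs)
    print(f"{procs:4d} processors  {factors}  copies={copies:2d}  copies/processor={copies / procs}")

# Weak scaling: the work per processor should stay fixed as the run grows.
assert len(set(ratios)) == 1
```

In every row each processor holds one eighth of a base grid set.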

The input file points to the location of the grids file, which is in ABR format, for example: Chombo/benchmark/IOMI/s64x64x64L2r4i80b8-32o0f0.00100p1.abr

Output

Chombo uses HDF5 to read and write data. The benchmark also reads two small text files as input:

input -- contains basic parameters
ABR file -- contains the description of the grids (no data)

The output that we are trying to benchmark is a single HDF5 file. The run also produces two small files per rank, pout.x and time.table.x (where x is the processor rank ID), as well as other diagnostic files.

Timing

For this benchmark, we are concerned with the write time. The output code also performs some computation that does not directly write to disk, but we currently include it in the total labeled the "write time". The write time to be reported is the value labeled "OutputData" in the file timmy.txt. For example, the following table shows that the average write time over all processors was 30.96 seconds.
         Totals        WC per.      count        avg         min         max  
============================================================================
           Everything  100.00%          1      32.77 [     32.76,      32.77]
----------------------------------------------------------------------------
           read input    0.58%          1       0.19 [      0.19,       0.19]
      create unit DBL    0.01%          1       0.00 [      0.00,       0.02]
        reLoadBalance    0.27%          1       0.09 [      0.08,       0.09]
            GridStats    0.00%          1       0.00 [      0.00,       0.00]
       replicateGrids    0.12%          1       0.04 [      0.04,       0.04]
             initData    4.24%          1       1.39 [      1.35,       1.43]
        ComputeSumRHS    0.00%          0       0.00 [      0.00,       0.00]
        ComputeSumPHI    0.00%          0       0.00 [      0.00,       0.00]
           ComputePDV    0.00%          0       0.00 [      0.00,       0.00]
           OutputData   94.49%          1      30.96 [     30.92,      31.01]
              Cleanup    0.28%          1       0.09 [      0.09,       0.10]
============================================================================
T0         table tots:  99.99%          8      32.76 [     32.66,      32.87]
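Because the reported number is a single row of timmy.txt, it can be pulled out with a short script. The Python sketch below assumes the row layout shown in the sample table above (name, percentage, count, average, then [min, max]); find_timer and TimerRow are hypothetical helpers, not part of Chombo.

```python
import re
from typing import NamedTuple


class TimerRow(NamedTuple):
    """One row of the timer summary table (times in seconds)."""
    name: str
    percent: float  # "WC per." column: share of total wall-clock time
    count: int
    avg: float
    tmin: float
    tmax: float


# Assumed row layout, e.g.:
#   "OutputData   94.49%          1      30.96 [     30.92,      31.01]"
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<pct>[\d.]+)%\s+(?P<count>\d+)\s+(?P<avg>[\d.]+)"
    r"\s*\[\s*(?P<min>[\d.]+),\s*(?P<max>[\d.]+)\s*\]"
)


def find_timer(text: str, name: str) -> TimerRow:
    """Return the first row in `text` whose timer name equals `name`."""
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m and m["name"] == name:
            return TimerRow(
                name,
                float(m["pct"]),
                int(m["count"]),
                float(m["avg"]),
                float(m["min"]),
                float(m["max"]),
            )
    raise KeyError(f"timer {name!r} not found")


# Two rows copied from the sample table above.
sample = """\
           Everything  100.00%          1      32.77 [     32.76,      32.77]
            OutputData   94.49%          1      30.96 [     30.92,      31.01]
"""
row = find_timer(sample, "OutputData")
print(f"write time: avg {row.avg} s, min {row.tmin} s, max {row.tmax} s")
```

For a real run, pass the contents of timmy.txt (for example, pathlib.Path("timmy.txt").read_text()) instead of the sample string.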

Further timing information can be found in the time.table.x files. Some rough memory usage information can be found in the pout.0 file.

Download Chombo I/O Benchmark

ChomboIOBench_Nov27_2007.tar.gz