MAESTRO: Low Mach Number Flow With AMR
MAESTRO is a low Mach number astrophysics code used to simulate the evolution leading up to a Type Ia supernova explosion. The NERSC-6 MAESTRO simulation examines convection in a white dwarf and includes both explicit and implicit time-stepping schemes.
Although adaptive mesh refinement (AMR) is a powerful technique that can reduce the resources needed to solve otherwise intractable problems in many computational science fields, it brings additional performance challenges that are worth testing in a workload-driven evaluation environment. In particular, AMR can add non-uniform memory access and irregular inter-processor communication to what might otherwise have been relatively regular and uniform access and communication patterns. MAESTRO can decompose a uniform grid of constant resolution into a specified number of non-overlapping grid patches, but in the NERSC-6 benchmark it does not refine the grid.
MAESTRO is also used as an IO benchmark, and the NERSC-6 tests will be run in two ways: once with IO turned on and once with IO turned off. The runs without IO are used to calculate the SSP (Sustained System Performance), while the runs with IO are intended to show the expansion factor from including IO in the calculation as well as the percentage of time the application spends doing IO.
MAESTRO uses a new algorithm specifically designed to neglect acoustic waves when modeling low Mach number convective flows, while retaining the compressibility effects due to nuclear reactions and background stratification that arise during the centuries-long period prior to explosion. MAESTRO allows large time steps, finite-amplitude density and temperature perturbations, and hydrostatic evolution of the base state of the star.
The basic discretization in MAESTRO combines a symmetric operator-split treatment of chemistry and transport with a density-weighted approximate projection method to impose the evolution constraint. The resulting integration proceeds on the time scale of the relatively slow advective transport, while the faster diffusion and chemistry processes are treated implicitly in time. This integration scheme is embedded in an adaptive mesh refinement algorithm based on a hierarchical system of rectangular, non-overlapping grid patches at multiple levels with different resolutions; in the NERSC-6 benchmark, however, the grid does not adapt. The elliptic equations that arise, such as the one solved in the projection, are handled with a multigrid solver.
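To illustrate the multigrid idea only, here is a minimal 1-D V-cycle for the model problem -u'' = f on [0, 1] with u(0) = u(1) = 0. MAESTRO's solver works on 3-D block-structured BoxLib data; the weighted-Jacobi smoother, full-weighting restriction, linear interpolation, and all function names below are textbook choices of our own, not MAESTRO's. Each cycle smooths the error, restricts the residual to a grid with twice the spacing, solves the coarse error equation recursively, interpolates the correction back, and smooths again.

```python
import numpy as np


def residual(u, f, h):
    """r = f - A u, where (A u)_i = -(u_{i-1} - 2 u_i + u_{i+1}) / h^2; boundary entries stay 0."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] + (u[:-2] - 2.0 * u[1:-1] + u[2:]) / h**2
    return r


def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps on the interior points (returns a new array)."""
    for _ in range(sweeps):
        jacobi = 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])  # uses only old values
        u = u.copy()
        u[1:-1] = (1.0 - omega) * u[1:-1] + omega * jacobi
    return u


def restrict(r):
    """Full-weighting restriction from 2m+1 fine points to m+1 coarse points."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc


def prolong(ec):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    e = np.zeros(2 * (len(ec) - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e


def v_cycle(u, f, h):
    """One V-cycle for A u = f on a grid with 2^k + 1 points."""
    if len(u) == 3:
        # Coarsest grid: a single interior unknown, solved exactly.
        u = u.copy()
        u[1] = 0.5 * (u[0] + u[2] + h**2 * f[1])
        return u
    u = smooth(u, f, h)                           # pre-smoothing
    rc = restrict(residual(u, f, h))              # residual moved to the coarse grid
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)  # coarse-grid error equation A e = r
    u = u + prolong(ec)                           # coarse-grid correction
    return smooth(u, f, h)                        # post-smoothing


# Usage: -u'' = pi^2 sin(pi x), whose exact solution is sin(pi x).
n = 64
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n + 1)
for _ in range(20):
    u = v_cycle(u, f, 1.0 / n)
print(np.max(np.abs(u - np.sin(np.pi * x))))  # small: only the O(h^2) discretization error remains
```

The appeal of multigrid is that each cycle costs work proportional to the number of unknowns while reducing the error by a roughly grid-independent factor.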
MAESTRO uses BoxLib, a foundation library of Fortran90 modules from LBNL that facilitates development of block-structured finite difference algorithms with rich data structures for describing operations that take place on data defined in regions of index space that are unions of non-intersecting rectangles. It is particularly useful in unsteady problems where the regions of interest may change in response to an evolving solution.
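The central BoxLib abstraction can be sketched in a few lines of Python. BoxLib itself is Fortran90, and the names Box, chop, and is_disjoint here are hypothetical: a box is a rectangle in integer index space, and the data at one level live on a union of boxes that must not intersect. The sketch chops a 64^3 domain into 32^3 boxes, the box size used by the benchmark's base case.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import List, Optional, Tuple

Index = Tuple[int, int, int]


@dataclass(frozen=True)
class Box:
    """Rectangle of cells in 3-D index space; lo and hi corners are inclusive."""
    lo: Index
    hi: Index

    def num_cells(self) -> int:
        n = 1
        for l, h in zip(self.lo, self.hi):
            n *= max(0, h - l + 1)
        return n

    def intersect(self, other: "Box") -> Optional["Box"]:
        """Overlap of two boxes, or None if they share no cell."""
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        if any(l > h for l, h in zip(lo, hi)):
            return None
        return Box(lo, hi)


def chop(domain: Box, size: int) -> List[Box]:
    """Cover the domain with boxes of at most `size` cells per side."""
    starts = [range(l, h + 1, size) for l, h in zip(domain.lo, domain.hi)]
    boxes = []
    for lo in product(*starts):
        hi = tuple(min(s + size - 1, d) for s, d in zip(lo, domain.hi))
        boxes.append(Box(lo, hi))
    return boxes


def is_disjoint(boxes: List[Box]) -> bool:
    """True if no two boxes share a cell (the non-intersection requirement)."""
    return all(a.intersect(b) is None for a, b in combinations(boxes, 2))


domain = Box((0, 0, 0), (63, 63, 63))
level = chop(domain, 32)
print(len(level), is_disjoint(level))                           # 8 True
print(sum(b.num_cells() for b in level) == domain.num_cells())  # True
```

Operations such as ghost-cell filling reduce to intersecting boxes like these, which is why a compact box calculus is at the heart of the library.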
The code consists of TBD lines of Fortran90 source and uses MPI.
MAESTRO is brought to you by Ann Almgren and John Bell of CCSE/LBL and Mike Zingale of SUNYSB.
Relationship to NERSC Workload
MAESTRO is used at NERSC in supernova ignition studies under SciDAC, INCITE, and base allocations.
Parallelization in MAESTRO uses a 3-D domain decomposition in which data and work are apportioned with a coarse-grained distribution strategy that balances the load and minimizes communication costs. Options for data distribution include a knapsack algorithm, with an additional step to balance communication, and Morton-ordered space-filling curves. Studies have shown that the low Mach number algorithm reaches a solution roughly two orders of magnitude faster than a more traditional compressible reacting flow solver.
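The two distribution strategies can be sketched as follows, assuming each box carries a scalar work estimate. The Morton variant orders boxes along a Z-order space-filling curve and cuts the curve into contiguous pieces of roughly equal work, which tends to keep neighboring boxes on the same processor. For the knapsack variant we use a greedy largest-first heuristic as a stand-in; BoxLib's actual knapsack code and its extra communication-balancing step are not reproduced. All function names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def morton_key(i: int, j: int, k: int, bits: int = 10) -> int:
    """Interleave the bits of (i, j, k) into one Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b + 2)
    return key


def morton_distribute(coords: Sequence[Tuple[int, int, int]],
                      work: Sequence[float], nprocs: int) -> List[int]:
    """Sort boxes along the Morton curve, then cut the curve into nprocs
    contiguous pieces of roughly equal total work."""
    order = sorted(range(len(coords)), key=lambda n: morton_key(*coords[n]))
    total = float(sum(work))
    owner = [0] * len(coords)
    done = 0.0
    for n in order:
        # A box goes to the processor whose share of the curve its start falls in.
        owner[n] = min(nprocs - 1, int(done * nprocs / total))
        done += work[n]
    return owner


def knapsack_distribute(work: Sequence[float], nprocs: int) -> List[int]:
    """Greedy largest-first heuristic: give each box, heaviest first,
    to the currently least-loaded processor."""
    heap = [(0.0, p) for p in range(nprocs)]
    heapq.heapify(heap)
    owner = [0] * len(work)
    for n in sorted(range(len(work)), key=lambda n: -work[n]):
        load, p = heapq.heappop(heap)
        owner[n] = p
        heapq.heappush(heap, (load + work[n], p))
    return owner


# 2x2x2 boxes with equal work on 2 processors: the Morton cut splits along k.
coords = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]
print(morton_distribute(coords, [1.0] * 8, 2))   # [0, 1, 0, 1, 0, 1, 0, 1]
print(knapsack_distribute([5, 4, 3, 3, 2, 1], 2))  # [0, 1, 1, 0, 1, 0]: loads 9 and 9
```

The trade-off is visible even here: the knapsack heuristic balances load well but ignores locality, while the Morton cut keeps spatial neighbors together at some cost in balance when box workloads differ.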
The MAESTRO communication topology is quite unusual.
Obtaining the Code
The NERSC-6 procurement web site is no longer available. Two forms must be filled out to access this code. Contact the NERSC consultants (address on the NERSC home page) for access.
Building the Code
Edit the compiler/architectures section of the file fParallel_new/MAESTRO/wdconvect/GNUmakefile with the appropriate settings. A number of architecture definitions already exist. No other changes to this file are necessary.
Next edit fParallel_new/mk/GMakedefs.mak with the compiler flags for the given architecture.
cd to fParallel_new/MAESTRO/wdconvect and type 'gmake'. If all goes smoothly, an executable called main...mpi.exe will be created and copied to the top-level 'run' directory.
Within the build directory fParallel_new/MAESTRO/wdconvect, a directory 't' is created to hold the .o files. Typing 'gmake clean' will remove the executable and .o files.
Build-Related Files in this distribution
The top-level directory contains a README and the following four directories:
fParallel_new - source code
run - example batch scripts and input files; the executable is copied to this directory
sample_outputs - sample output files
verify - a verification script
Running the Code
Run MAESTRO from the top-level 'run' directory after building the code. All necessary input files are contained in this directory. Each MAESTRO simulation must be passed an inputs file as an argument, and that inputs file names a 'grid' file. For example, a small test case that runs on one or a few processors can be run as:
mpirun -np 1 executableName inputs_3d_64cubed
If you examine the inputs file 'inputs_3d_64cubed', the first parameter listed is 'test_set = "grid_3d_64cubed"'. This file describes the grid layout for a given problem and must be available in the same directory as the executable. The model file 'model_6.e8.hse' and the file 'helm_table.dat' must also be available.
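A missing grid, model, or table file is an easy way to lose a batch slot, so a quick pre-flight check can help. The sketch below is hypothetical: it assumes only that the inputs file has one `key = value` pair per line, as in the test_set line quoted above (comments starting with '!', namelist delimiters such as '&PROBIN' and '/', and trailing commas are skipped), and it hard-codes the two extra file names given in this README.

```python
import os
import re


def parse_inputs(lines):
    """Collect `key = value` pairs, ignoring '!' comments, namelist
    delimiters such as '&PROBIN' and '/', and trailing commas."""
    params = {}
    for line in lines:
        line = line.split("!", 1)[0].strip()
        m = re.match(r"^(\w+)\s*=\s*(.+?),?\s*$", line)
        if m:
            params[m.group(1).lower()] = m.group(2).strip().strip("\"'")
    return params


def missing_run_files(inputs_path, run_dir="."):
    """List the files the run needs (per this README) that are absent from run_dir."""
    with open(inputs_path) as fh:
        params = parse_inputs(fh)
    required = [params["test_set"]] if "test_set" in params else []
    required += ["model_6.e8.hse", "helm_table.dat"]
    return [name for name in required
            if not os.path.isfile(os.path.join(run_dir, name))]


# Usage, from the top-level 'run' directory:
#   print(missing_run_files("inputs_3d_64cubed"))  # [] when everything is in place
```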
Run the medium case as
mpirun -np 512 executableName inputs_medium
Run the large case as
mpirun -np 2048 executableName inputs_large
The code writes to standard output and also produces a results file named after the inputs file (e.g., results_inputs_medium.txt) that is used to verify the results. The standard output reports the run time of various routines. The time labeled "NERSC_TIME" is the total run time minus initialization time and should be reported in the Application Benchmark table.
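When collecting timings from many runs, it helps to pull NERSC_TIME out of the job output automatically. The exact format of that line is not given here, so the hypothetical helper below assumes only that the label is followed, on the same line, by a number (for example `NERSC_TIME = 123.4`); it returns the last such value in the text.

```python
import re

# Assumed format: the NERSC_TIME label followed, on the same line, by a number.
_NERSC_TIME = re.compile(r"NERSC_TIME[^0-9\n]*([0-9]+(?:\.[0-9]*)?(?:[eE][-+]?[0-9]+)?)")


def nersc_time(stdout_text):
    """Return the last NERSC_TIME value found in a job's standard output."""
    matches = _NERSC_TIME.findall(stdout_text)
    if not matches:
        raise ValueError("no NERSC_TIME line found")
    return float(matches[-1])


# Usage (hypothetical file name):
#   with open("maestro_medium.out") as fh:
#       print(nersc_time(fh.read()))
```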
You may experiment with the following setting on your platform. The code developers report that on some platforms the code may run faster if the following line in fParallel_new/boxlib/multifab.f90 is changed from '.false.' to '.true.':
logical, parameter, private :: Do_AllToAllV = .false.
This parameter controls whether the code uses point-to-point sends/receives or an all-to-all collective (MPI_Alltoallv) for some MPI communications.
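Both settings deliver the same data; what changes is the MPI call pattern. With point-to-point messaging each rank posts one send and one receive per partner, while an Alltoallv-style exchange packs all outgoing data into one buffer described by per-destination counts and offsets and makes a single collective call. The pure-Python sketch below (no MPI involved, function names hypothetical) shows that packing and the transpose the exchange performs.

```python
from itertools import accumulate


def alltoallv_layout(messages):
    """Pack one rank's outgoing data for an Alltoallv-style exchange.

    messages[d] holds the values this rank sends to rank d. Returns the single
    packed send buffer plus the per-destination counts and displacements
    (offsets into the buffer), the same triple MPI_Alltoallv takes.
    """
    sendcounts = [len(m) for m in messages]
    sdispls = list(accumulate(sendcounts, initial=0))[:-1]  # exclusive prefix sum
    sendbuf = [v for m in messages for v in m]
    return sendbuf, sendcounts, sdispls


def exchange(all_messages):
    """What the exchange delivers, however it is implemented:
    received[dst][src] == all_messages[src][dst]."""
    nranks = len(all_messages)
    return [[all_messages[src][dst] for src in range(nranks)] for dst in range(nranks)]


buf, counts, displs = alltoallv_layout([[1.0, 2.0], [], [3.0, 4.0, 5.0]])
print(buf, counts, displs)  # [1.0, 2.0, 3.0, 4.0, 5.0] [2, 0, 3] [0, 2, 2]
```

Whether one collective beats many point-to-point messages depends on the interconnect and the MPI implementation, which is why the setting is left for experimentation.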
Memory Required By The Sample Problems:
The following table gives a rough idea of how to fit the MAESTRO code on a system. The memory footprint will vary somewhat from machine to machine.
input_file            total memory
inputs_3d_64cubed     ~400 MB
inputs_3d_128cubed    ~3200 MB
The 'base case' runs are weak scaling and are set up to use sixteen 32^3 'boxes' per processor, which corresponds to roughly 800 MB/core.
For the official benchmarks (the medium and large cases), the simulation must run for 10 steps; this is controlled by the 'max_step' parameter in the inputs file. When testing memory requirements, the Offeror may set max_step to a smaller value to shorten the run.
The MAESTRO medium benchmark runs on a 512x512x1024 grid. Its memory footprint should scale with the total number of grid points relative to the 64^3 and 128^3 small cases. This case is targeted to run on 512 processors.
The MAESTRO large case uses a 1024^3 grid and is targeted to run on 2048 processors.
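The scaling rule can be checked with a few lines of arithmetic. This sketch assumes memory grows linearly with the number of grid points, anchored at the ~400 MB listed for inputs_3d_64cubed; the function name and defaults are our own.

```python
def estimate_memory(grid, nprocs, ref_points=64**3, ref_mb=400.0, box=32):
    """Scale the ~400 MB listed for inputs_3d_64cubed linearly with grid points.

    Returns (total MB, MB per core, number of box^3 boxes per core)."""
    points = grid[0] * grid[1] * grid[2]
    total_mb = ref_mb * points / ref_points
    boxes = points // box**3
    return total_mb, total_mb / nprocs, boxes / nprocs


print(estimate_memory((128, 128, 128), 1))        # (3200.0, 3200.0, 64.0)
print(estimate_memory((512, 512, 1024), 512))     # medium: (409600.0, 800.0, 16.0)
print(estimate_memory((1024, 1024, 1024), 2048))  # large: (1638400.0, 800.0, 16.0)
```

Both official cases come out at sixteen 32^3 boxes and roughly 800 MB per core, consistent with the weak-scaling base case described above; the 128^3 line reproduces the table entry.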
IO in MAESTRO is turned on or off with the inputs-file parameter 'chk_int'. When 'chk_int' is set to -1, no checkpoint files are written. When 'chk_int' is set to a number x greater than 0, the code writes a checkpoint dump at the beginning of the run and every x time steps. Two additional inputs files for the IO runs, 'inputs_medium_io' and 'inputs_large_io', are provided in the 'run' directory with 'chk_int' set to 5. Each run therefore produces 3 checkpoint dumps, at steps 0, 5, and 10, in the directories chk0000/, chk0005/, and chk0010/ respectively. MAESTRO uses a one-file-per-processor-per-variable approach to IO.
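The checkpoint schedule can be written down directly. This sketch assumes any chk_int of 0 or below disables checkpointing (the text specifies only -1) and that dumps land only on multiples of chk_int; whether MAESTRO also writes a final dump when max_step is not a multiple of chk_int is not stated here. The four-digit zero padding is inferred from the directory names above.

```python
def checkpoint_steps(chk_int, max_step):
    """Steps that get a checkpoint: none when chk_int is -1 (assumed: any value <= 0),
    otherwise step 0 and every chk_int steps up to max_step."""
    if chk_int <= 0:
        return []
    return list(range(0, max_step + 1, chk_int))


def checkpoint_dirs(chk_int, max_step):
    """Directory names such as chk0005, zero-padded to four digits."""
    return [f"chk{step:04d}" for step in checkpoint_steps(chk_int, max_step)]


print(checkpoint_dirs(5, 10))   # ['chk0000', 'chk0005', 'chk0010']
print(checkpoint_dirs(-1, 10))  # []
```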
A medium-case checkpoint dump run on 512 processors produces 39 GB of data in ~2560 files. Apart from a few header files, each file is tens of MB in size.
A large-case checkpoint dump run on 2048 processors produces 153 GB of data in ~10240 files, again with most files tens of MB in size.
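The figures above imply about five files per processor per dump and a mean file size of roughly 15 MB in both cases. The sketch below reproduces that arithmetic; it assumes 1 GB = 1024 MB, ignores the few header files, and assumes every dump in a run is the same size, so a run with three dumps writes three times the per-dump volume.

```python
def checkpoint_volume(gb_per_dump, files_per_dump, nprocs, dumps=3):
    """Files per processor per dump, mean file size in MB (1 GB = 1024 MB assumed,
    header files ignored), and total GB written by a run with `dumps` dumps."""
    files_per_proc = files_per_dump / nprocs
    mean_file_mb = gb_per_dump * 1024 / files_per_dump
    return files_per_proc, mean_file_mb, gb_per_dump * dumps


print(checkpoint_volume(39, 2560, 512))     # medium: (5.0, 15.6, 117)
print(checkpoint_volume(153, 10240, 2048))  # large:  (5.0, 15.3, 459)
```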
IO tests should be run with commands like the following:
mpirun -np 512 executableName inputs_medium_io
mpirun -np 2048
In the verify/ directory, a Python script called verify.py compares results to an accepted results set. The script examines the values of a few variables over the entire grid and finds the maximum difference between the accepted results and a given run's results. The difference should be less than ~10^-2.
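The comparison rule can be sketched as below. verify.py's file format, and whether it uses absolute or relative differences, are not described here, so this sketch assumes the results have already been read into dictionaries mapping a variable name to a flat list of values, and uses absolute differences. The variable names in the usage lines are hypothetical.

```python
import math


def max_difference(reference, candidate):
    """Largest absolute difference over every variable and grid value.
    A NaN anywhere counts as an infinite difference so it cannot slip through."""
    worst = 0.0
    for name, ref_values in reference.items():
        values = candidate[name]
        if len(values) != len(ref_values):
            raise ValueError(f"{name}: {len(values)} values, reference has {len(ref_values)}")
        for r, c in zip(ref_values, values):
            diff = abs(r - c)
            if math.isnan(diff):
                return math.inf
            worst = max(worst, diff)
    return worst


def verifies(reference, candidate, tol=1e-2):
    """True if every value is within tol of the accepted results."""
    return max_difference(reference, candidate) < tol


ref = {"density": [1.0, 2.0], "temperature": [3.0]}
print(verifies(ref, {"density": [1.0, 2.005], "temperature": [3.001]}))  # True
print(verifies(ref, {"density": [1.0, 2.0], "temperature": [3.5]}))      # False
```

The explicit NaN check matters because Python's max() silently ignores a NaN compared against a number, which would otherwise let a broken run pass.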
The verification test is run as:
>> python verify.py accepted_results_medium.txt results.txt
where "accepted_results_medium.txt" is a reference file in the verify directory generated on Franklin at NERSC. If results do not verify, try a lower optimization level. If results do not verify at any optimization level, you can try one of the other sample result files from bgp, ranger, or bassi (only a few problem sizes are available).
This is Version 1.0 of the NERSC MAESTRO benchmark, released on September 4, 2008.
Record of Formal Questions and Answers
No entries as yet.
- J. B. Bell, A. J. Aspden, M. S. Day, M. J. Lijewski, "Numerical simulation of low Mach number reacting flows," Journal of Physics: Conference Series 78 (2007) 012004.
- A. S. Almgren, J. B. Bell, M. Zingale, "MAESTRO: A Low Mach Number Stellar Hydrodynamics Code," Journal of Physics: Conference Series 78 (2007) 012085.