ASCR Requirements Worksheet
1.1. Project Information - Simulation and Analysis of Reacting Flows
|Document Prepared By||John Bell|
|Project Title||Simulation and Analysis of Reacting Flows|
|Principal Investigator||John Bell|
|Participating Organizations||Lawrence Berkeley National Laboratory|
|Funding Agencies||DOE SC|
2. Project Summary & Scientific Objectives for 2011-2014
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for 2011-2014. Please list one or two specific goals you hope to reach by 2014.
The objective of this project is to develop new simulation methodology for multiphysics applications. We use an integrated approach in which we consider the mathematical formulation, discretizations, and software issues together. In particular, we develop mathematical formulations that reflect the relationship between scales in the underlying problem, and then develop discretizations of those models that incorporate the mathematical structure of the underlying processes. We implement these algorithms in the context of an evolving software infrastructure that facilitates implementation of the methodology on HPC architectures. As a part of this process we conduct scientific investigations in the respective application areas, which are the major consumers of our computational resources.
Specific areas we are currently targeting are combustion, porous media flow, and astrophysics. The focus of our work in combustion is on high-fidelity simulations of high-pressure flames with detailed chemistry and transport. We use a low Mach number formulation for these simulations that exploits the separation of scales between the flame dynamics and acoustic wave propagation. The simulations also use adaptive mesh refinement to focus computational resolution near the extremely thin reaction zones characteristic of high-pressure flames.
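The time-step benefit of a low Mach number formulation can be estimated with a short worked equation. This is a generic sketch, not a statement about the specific time-step control in LMC: it assumes an explicit compressible scheme limited by the acoustic CFL condition, a low Mach number scheme limited only by the advective CFL condition, and the same CFL number and mesh spacing \Delta x in both. The symbols u (fluid speed), c (sound speed), M (Mach number), and the value M = 0.01 are illustrative assumptions.

```latex
\Delta t_{\mathrm{comp}} \lesssim \frac{\Delta x}{|u| + c},
\qquad
\Delta t_{\mathrm{LM}} \lesssim \frac{\Delta x}{|u|},
\qquad
\frac{\Delta t_{\mathrm{LM}}}{\Delta t_{\mathrm{comp}}}
  \approx \frac{|u| + c}{|u|}
  = 1 + \frac{1}{M},
\quad M = \frac{|u|}{c}.
% Illustrative value: M = 0.01 gives a ratio of about 101,
% i.e. roughly one hundred times fewer time steps over the same simulated interval.
```

The estimate ignores the extra cost per step of the elliptic solves that a low Mach number method requires, so the net speedup is smaller than the ratio of time steps alone.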
Our work on subsurface flow focuses on the development of adaptive algorithms for multiphase, multicomponent, non-isothermal flows. Our formulation splits the parabolic behavior of the pressure from the advection-dominated behavior of the chemical components of the mixture and the enthalpy. The overall structure of the system is determined by the characterization of the phase equilibrium properties of the mixture. Applications of this methodology include carbon sequestration and environmental remediation.
Our work in astrophysics currently focuses on the simulation of Type Ia supernovae (SNIa). Our goal is to provide end-to-end simulations. Our particular focus is on the simulation of the convection processes leading up to ignition. For these simulations, we use a low Mach number formulation for stratified flows. This capability enables us to follow several hours of the evolution of the star leading up to ignition, which would be infeasible with a standard compressible formulation. The conditions of the star at ignition, including the ignition location and the levels of convective turbulence, play a critical role in the subsequent explosion of the star. We have also developed a compressible code that includes a turbulent flame model for following the evolution of the star after ignition through the explosion. We are currently developing the methodology to map between these two codes. We are also investigating X-ray bursts, convection in massive stars, core-collapse supernovae, and cosmological simulations with the methodology we are developing.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
Our work focuses on the development of multiphysics simulation codes within CCSE at LBNL. The names and application areas of these codes are:
- LMC: low Mach number combustion
- PMAMR: porous media
- MAESTRO: low Mach number astrophysics
- CASTRO: compressible astrophysics
- NYX: computational cosmology
While each of these codes is used to simulate flows in a different application area, they share a number of common features:
- Built on CCSE's well-established BoxLib framework
- Rely on iterative linear solvers for constant and/or variable coefficient elliptic and parabolic equations based on geometric multigrid
- Implemented on 2D or 3D adaptive grid hierarchies (block-structured grid AMR)
We are currently using a hierarchical parallelization strategy based on a combination of MPI + OpenMP. At a coarse-grained level we distribute patches to nodes using MPI. At a fine-grained level within the node we use OpenMP directives to parallelize operations on patches. We have demonstrated that this approach is effective on current HPC architectures, and it should be extensible to many more cores per node.
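The coarse-grained level of this strategy, assigning grid patches to MPI ranks, can be illustrated with a minimal sketch. It is not the distribution algorithm used in BoxLib; it assumes a simple greedy heuristic (largest patch first, onto the least loaded rank) and uses cell count as a stand-in for work. The function name `distribute_patches` and the patch sizes are hypothetical.

```python
import heapq


def distribute_patches(patch_cells, n_ranks):
    """Assign patches to ranks with a greedy largest-first heuristic.

    patch_cells: list of cell counts, one per patch (assumed proxy for work).
    n_ranks:     number of MPI ranks (hypothetical; no MPI calls are made here).
    Returns (owner, loads): owner maps patch index -> rank, loads[r] is the
    total number of cells assigned to rank r.
    """
    # Min-heap of (current load, rank) so the least loaded rank pops first.
    heap = [(0, rank) for rank in range(n_ranks)]
    heapq.heapify(heap)

    owner = {}
    # Visit patches from largest to smallest.
    for pid in sorted(range(len(patch_cells)), key=lambda i: -patch_cells[i]):
        load, rank = heapq.heappop(heap)
        owner[pid] = rank
        heapq.heappush(heap, (load + patch_cells[pid], rank))

    loads = [0] * n_ranks
    for pid, rank in owner.items():
        loads[rank] += patch_cells[pid]
    return owner, loads


if __name__ == "__main__":
    # Illustrative mix of 16^3 and 32^3 patches on two ranks.
    patches = [16**3, 32**3, 16**3, 32**3, 16**3, 16**3]
    owner, loads = distribute_patches(patches, 2)
    print(owner, loads)
```

In a real code each rank would then hold only its own patches and exchange boundary data with neighbors; the sketch only shows the assignment step.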
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
|Facilities Used or Using||NERSC, OLCF, NSF Centers|
|Architectures Used or Using||Cray XT, IBM Power, Linux Cluster|
|Total Computational Hours Used per Year||40M Core-Hours|
|NERSC Hours Used in 2010||10M Core-Hours|
|Number of Cores Used in Typical Production Run||2K - 24K|
|Wallclock Hours of Single Typical Production Run||200-1000|
|Total Memory Used per Run||1K-24K GB|
|Minimum Memory Required per Core||0.5 GB|
|Total Data Read & Written per Run||2000 - 60000 GB|
|Size of Checkpoint File(s)||100 - 500 GB|
|Amount of Data Moved In/Out of NERSC||1000 GB per month|
|On-Line File Storage Required (For I/O from a Running Job)||5 TB and 3000-12000 Files|
|Off-Line Archival Storage Required||300 TB and 1500 Files|
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
VisIt, C++, F90, MPI + OpenMP, htar
4. HPC Requirements in 2014
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
|Computational Hours Required per Year||300 M|
|Anticipated Number of Cores to be Used in a Typical Production Run||25K-200K|
|Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above||
|Anticipated Total Memory Used per Run||10K - 100K GB|
|Anticipated Minimum Memory Required per Core||0.5 GB|
|Anticipated total data read & written per run||10000-100000 GB|
|Anticipated size of checkpoint file(s)||500-4000 GB|
|Anticipated Amount of Data Moved In/Out of NERSC||1000 GB per month|
|Anticipated On-Line File Storage Required (For I/O from a Running Job)||20-40 TB and 5000 - 20000 Files|
|Anticipated Off-Line Archival Storage Required||1200 TB and 5000 Files|
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
We anticipate two types of changes to our methodology over the next 5 years.
1. The development of higher-order discretization approaches. Currently the numerics in our production computations are second-order accurate in both space and time. We plan to develop higher-order versions of our methodology. Higher-order methods will allow us to trade increased floating-point work for less memory and reduced communication.
2. The development of in situ analysis capabilities. The increase in compute capability relative to I/O capacity suggests that we need to integrate at least some of the data analysis into the simulation directly to reduce I/O volume.
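The memory argument behind item 1 can be made concrete with a rough scaling estimate. This is a textbook-style sketch under stated assumptions, not a measurement from these codes: it assumes a smooth solution, a uniform 3D mesh with spacing h, a discretization error of the form C h^p with an order-independent constant C, and an illustrative 16-fold reduction in the error tolerance \epsilon.

```latex
\epsilon \approx C\,h^{p}
\;\Longrightarrow\;
h \approx \left(\frac{\epsilon}{C}\right)^{1/p},
\qquad
N_{\mathrm{cells}} \propto h^{-3} \propto \epsilon^{-3/p}.
% Reducing \epsilon by a factor of 16:
%   p = 2:  N_cells grows by 16^{3/2} = 64
%   p = 4:  N_cells grows by 16^{3/4} = 8
```

Under these assumptions a fourth-order method needs far fewer cells, and therefore less memory and less data to communicate, for the same accuracy gain. In practice the constant C and the work per cell both depend on the order, so the actual savings would have to be measured.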
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).
It is difficult to assess the minimum memory requirement per core for our current methodology.
We do need to maintain a fairly large memory per node. For example, for a 32-core node, we anticipate requiring at least 8 GB for the node. This level could be pushed lower (or, alternatively, spread over more cores), but doing so would require a more advanced programming model.
4d. Please list any new software, services, or infrastructure support you will need through 2014.
In addition to the software listed above, we anticipate a need for programming model / middleware support for in situ analysis. There are several potential strategies for how this could be done; the details depend on the characteristics of the architecture.
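One possible in situ strategy, reducing each time step to a small summary inside the simulation loop instead of writing the full field, can be sketched as follows. This is only an illustration under stated assumptions: the function names `in_situ_summary` and `summarize_run`, the choice of statistics (min, max, mean, histogram), and the flat list-of-floats field layout are all hypothetical and are not part of any existing middleware.

```python
import math


def in_situ_summary(field, n_bins=8, lo=0.0, hi=1.0):
    """Reduce one time step's field (a flat list of floats) to a small summary.

    Values outside [lo, hi) are clamped into the first or last histogram bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    total = 0.0
    fmin, fmax = math.inf, -math.inf
    for v in field:
        total += v
        fmin = min(fmin, v)
        fmax = max(fmax, v)
        b = int((v - lo) / width)
        b = min(max(b, 0), n_bins - 1)
        counts[b] += 1
    return {"min": fmin, "max": fmax, "mean": total / len(field), "hist": counts}


def summarize_run(steps, **kwargs):
    """Apply the reduction to every step as it is produced.

    steps: any iterable yielding one field per time step, so the full
    time history never has to be held in memory or written to disk.
    """
    return [in_situ_summary(field, **kwargs) for field in steps]


if __name__ == "__main__":
    # Stand-in for a simulation: three tiny "fields" produced one at a time.
    fake_steps = ([0.1 * s + 0.05 * i for i in range(10)] for s in range(3))
    for summary in summarize_run(fake_steps):
        print(summary)
```

The output per step has a fixed, small size regardless of the number of cells, which is the property that reduces I/O volume; a production version would also need a parallel reduction across ranks.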
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Our methodology is based on a block-structured adaptive mesh refinement strategy. In block-structured AMR methods, regions requiring additional refinement are grouped into large grid patches. Each of these patches is at least 16^3, often larger. This type of approach lends itself naturally to a hierarchical approach to parallelism. We are currently using a combination of MPI + OpenMP. At a coarse-grained level we distribute patches to nodes using MPI. At a fine-grained level within the node we use OpenMP directives to parallelize operations on patches. We have demonstrated that this approach is effective on current HPC architectures, and it should be extensible to many more cores per node.
However, we note that OpenMP is not an ideal programming model for fine-grained parallelism. We believe that with improved programming models this type of MPI + X strategy would be much more effective. Key issues are:
1. A lighter-weight thread model that does not include the overheads of OpenMP
2. Programming model support for expressing data layout, to avoid the performance penalties associated with non-uniform memory access within a node
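The fine-grained level of the MPI + X strategy described above, splitting the work on one patch among threads, can be sketched in a few lines. This is an illustration only: it mimics the even, contiguous split of a loop over the planes of a patch, similar in spirit to a static loop schedule, and uses Python threads purely to show the decomposition (they do not provide real speedup for this kind of work). The names `static_chunks` and `scale_patch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n_planes, n_threads):
    """Split range(n_planes) into n_threads contiguous, near-equal chunks."""
    base, extra = divmod(n_planes, n_threads)
    chunks = []
    start = 0
    for t in range(n_threads):
        # The first `extra` threads take one additional plane.
        size = base + (1 if t < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def scale_patch(patch, factor, n_threads):
    """Multiply every cell of a patch by `factor`, one slab of planes per thread.

    patch: list of planes, each plane a flat list of floats (assumed layout).
    Each thread writes only to its own planes, so no locking is needed.
    """
    def work(bounds):
        lo, hi = bounds
        for k in range(lo, hi):
            patch[k] = [factor * v for v in patch[k]]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list(...) forces completion and re-raises any worker exception.
        list(pool.map(work, static_chunks(len(patch), n_threads)))
    return patch


if __name__ == "__main__":
    # A 16^3 patch stored as 16 planes of 16*16 cells.
    patch = [[1.0] * (16 * 16) for _ in range(16)]
    print(static_chunks(16, 3))
    scale_patch(patch, 2.0, 3)
    print(patch[0][0], patch[15][255])
```

With a minimum patch size of 16^3 there are at least 16 planes to divide, which is why the number of threads that can be kept busy by a one-dimensional split of a single patch is limited and why data layout within the node matters as core counts grow.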
New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve by 2014 with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
A 50X increase in HPC compute resources would enable significant progress in each of our application areas. One significant change associated with this level of resources would be the ability to integrate uncertainty quantification into our methodology.
Combustion: Fully investigate the behavior of new alternative fuels with detailed chemistry in the high-pressure environments characteristic of realistic combustion systems. Quantify the role of uncertainty in rate parameters on the predictive ability of simulations and use simulations to reduce uncertainty in those data. Address key modeling issues related to the development of engineering design models for next generation combustion systems.
Porous media flow: Quantify the role of subsurface uncertainty on the prediction of contaminant plumes and carbon sequestration strategies.
Astrophysics: Perform high-fidelity simulations of a variety of different supernova phenomena and compare with observations. Perform detailed cosmological simulations and compare to observations to reduce uncertainty in key cosmological parameters.
Achieving any of these goals would require balanced growth in the available resources. It would also require a significant shift in our programming models. The two main issues there would be effective utilization of higher levels of intranode concurrency and tools for enabling in situ analysis of results as part of the overall simulation.