Sanjiva Lele, Johan Larsson
ASCR Requirements Worksheet
1.1. Project Information - Shocks, turbulence and material interfaces
Document Prepared By: Sanjiva Lele, Johan Larsson
Project Title: Shocks, turbulence and material interfaces
Principal Investigator: Sanjiva Lele
Participating Organizations: Stanford University
Funding Agencies: DOE SC  DOE NNSA  NSF  NOAA  NIH  Other:
2. Project Summary & Scientific Objectives for 2011-2014
Please give a brief description of your project, highlighting its computational aspect, and outline its scientific objectives for 2011-2014. Please list one or two specific goals you hope to reach by 2014.
Broadly speaking, we are interested in fluid flows where interactions between shock waves, material interfaces, and compressible turbulence (or a subset of these) occur. The main objective is to elucidate the fundamental physics of these problems, which is still not fully understood. For this reason we focus on canonical problems, e.g., isotropic turbulence passing through a normal shock. While this canonical problem rarely occurs in nature, it isolates the interaction process from other effects. Studying this canonical case experimentally is devilishly difficult, and hence numerical simulations from (essentially) first principles are the main hope for progress.
We started in 2008 by computing on 1200x384^2 grids, for which we could reach a Reynolds number of 40. The modest size of these runs allowed us to run 15 cases to map out the parameter space. In 2009/2010 we ran on 2250x1024^2 grids at a Reynolds number of 75 (4 cases only). This has allowed us to quantify the effect of the Reynolds number, at least to a first approximation.
For 2011-2014, one objective is to run a single case (or possibly two) at a Reynolds number of 100, on an approximately 4000x2048^2 grid. This would really nail down the Reynolds number effect for this problem. A second objective (which we have already started on) is to work on the modeling of these problems in the context of large eddy simulation. This requires a large number of runs at very modest sizes, say 100-1000 cores.
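As a rough cost estimate (assuming, as a back-of-the-envelope sketch, explicit CFL-limited time stepping, so the finer transverse grid also roughly doubles the number of time steps), the planned Reynolds-number-100 case is about an order of magnitude more expensive than one of the current Reynolds-number-75 runs:

    \frac{4000 \times 2048^2}{2250 \times 1024^2} \approx 7.1 \ \text{(grid points)},
    \qquad 7.1 \times 2 \approx 14 \ \text{(cost per case)}.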
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid).
We use two different codes, both of which solve the weak form of the compressible Navier-Stokes equations (5 coupled nonlinear PDEs). Both codes use MPI and domain decomposition. Both codes are fully explicit in space and time, with no large matrix inversions.
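For reference (the exact formulation, nondimensionalization, and discretized weak form used in each code may differ), the five equations in a standard conservative form are:

    \begin{aligned}
    &\partial_t \rho + \partial_{x_j}(\rho u_j) = 0,\\
    &\partial_t (\rho u_i) + \partial_{x_j}\!\left(\rho u_i u_j + p\,\delta_{ij} - \tau_{ij}\right) = 0, \qquad i = 1,2,3,\\
    &\partial_t E + \partial_{x_j}\!\left[(E + p)\,u_j - u_i \tau_{ij} + q_j\right] = 0,
    \end{aligned}

with density \rho, velocity u_i, pressure p, total energy per unit volume E, viscous stress \tau_{ij}, and heat flux q_j.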
The main code for canonical high-fidelity work is the Hybrid code, which uses a structured Cartesian grid, 5th/6th-order solution-adaptive numerics (WENO at shocks, central differences elsewhere), and explicit 4-stage Runge-Kutta time stepping. This code has been used in production runs on up to 65,536 cores (BG/P) and 12,288 cores (Franklin). Typical run times are 12-24 hours. The largest grid sizes have been 2250x1024^2.
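As an illustration only, the toy sketch below shows the switching idea on a 1-D model problem (the inviscid Burgers equation): a simple smoothness sensor selects a dissipative upwind flux near sharp gradients and a non-dissipative central flux elsewhere, advanced with classical 4-stage Runge-Kutta. This is a minimal stand-in, not the actual Hybrid scheme, which uses 5th-order WENO, 6th-order central differences, and 3-D MPI domain decomposition.

    // Toy 1-D sketch of the solution-adaptive switching idea (illustration only).
    #include <cmath>
    #include <cstdio>
    #include <vector>

    static std::vector<double> rhs(const std::vector<double>& u, double dx) {
        const int n = static_cast<int>(u.size());
        std::vector<double> f(n), du(n);
        for (int i = 0; i < n; ++i) f[i] = 0.5 * u[i] * u[i];     // Burgers flux
        for (int i = 0; i < n; ++i) {
            const int im = (i - 1 + n) % n, ip = (i + 1) % n;     // periodic grid
            // Sensor: a jump that is large relative to the local magnitude
            // flags a "shocked" point.
            const bool shocked =
                std::fabs(u[ip] - u[im]) > 0.5 * (std::fabs(u[i]) + 1e-12);
            double dfdx;
            if (shocked)   // dissipative upwind difference (stand-in for WENO)
                dfdx = (u[i] >= 0.0) ? (f[i] - f[im]) / dx : (f[ip] - f[i]) / dx;
            else           // central difference (stand-in for high-order central)
                dfdx = (f[ip] - f[im]) / (2.0 * dx);
            du[i] = -dfdx;
        }
        return du;
    }

    int main() {
        const int n = 200;
        const double pi = std::acos(-1.0), dx = 2.0 * pi / n, dt = 0.2 * dx;
        std::vector<double> u(n);
        for (int i = 0; i < n; ++i) u[i] = 1.0 + 0.5 * std::sin(i * dx);

        for (int step = 0; step < 400; ++step) {                  // classical RK4
            std::vector<double> k1 = rhs(u, dx), u2(n), u3(n), u4(n);
            for (int i = 0; i < n; ++i) u2[i] = u[i] + 0.5 * dt * k1[i];
            std::vector<double> k2 = rhs(u2, dx);
            for (int i = 0; i < n; ++i) u3[i] = u[i] + 0.5 * dt * k2[i];
            std::vector<double> k3 = rhs(u3, dx);
            for (int i = 0; i < n; ++i) u4[i] = u[i] + dt * k3[i];
            std::vector<double> k4 = rhs(u4, dx);
            for (int i = 0; i < n; ++i)
                u[i] += dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
        }
        std::printf("u[0] after 400 steps = %f\n", u[0]);
        return 0;
    }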
We have recently started using the Charles code, which is unstructured and relatively low-order (4th-order in optimal situations, generally lower). This code uses a low-order version of the solution-adaptive Hybrid algorithm for accurate handling of shocks and turbulence. It has been used in production runs on up to 12,288 cores. The code has broad capabilities, including chemical reactions, and is therefore (naturally) not quite as fast or accurate as the Hybrid code.
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
The main limitation is the human time required for post-processing and analysis. Running large simulations has become quite routine for us, but the scientific value of these runs is only realized when we really analyze the data in depth. We have ready-made scripts and codes that extract the obvious statistics we want to look at, but analyzing and extracting things in a more interactive, curiosity-driven way is time-consuming. This is not specific to NERSC.
One NERSC-specific issue we have faced over the last year or so is the need to adjust various MPI settings on Franklin (e.g., MPICH_UNEX_BUFFER_SIZE). This is highly annoying as a user, since you never know with certainty that a submitted run will actually work. Not a big deal, but it would be better if resolved.
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate requirements for future years. If you are uncertain about any item, please provide your best estimate as a starting point for discussions.
Facilities Used or Using: NERSC  OLCF  ALCF  NSF Centers  Other: Stanford
Architectures Used or Using: Cray XT  IBM Power  BlueGene  Linux Cluster  GPUs  Other: Dell
Total Computational Hours Used per Year: 15M core-hours
NERSC Hours Used in 2010: 4.6M core-hours
Number of Cores Used in Typical Production Run: 4,096-16,384
Wallclock Hours of Single Typical Production Run: 12-24
Total Memory Used per Run: ___ GB
Minimum Memory Required per Core: 0.5 GB (runs); as much as possible (post-processing)
Total Data Read & Written per Run: 1000 GB
Size of Checkpoint File(s): 40 GB
Amount of Data Moved In/Out of NERSC: approximately 100 GB per year (rough estimate)
On-Line File Storage Required (For I/O from a Running Job): 1-2 TB and 100 files
Off-Line Archival Storage Required: 5 TB and 1000 files
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
4. HPC Requirements in 2014
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please provide your best estimate as a starting point for discussions at the workshop.
Computational Hours Required per Year: 10M
Anticipated Number of Cores to be Used in a Typical Production Run: 4K-8K (most runs), 100K (single run)
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above: 12-24 hours
Anticipated Total Memory Used per Run: 500-1000 GB
Anticipated Minimum Memory Required per Core: 0.5-2 GB
Anticipated Total Data Read & Written per Run: 2000 GB
Anticipated Size of Checkpoint File(s): 40 GB
Anticipated Amount of Data Moved In/Out of NERSC: as little as possible
Anticipated On-Line File Storage Required (For I/O from a Running Job): 5 TB and 100 files
Anticipated Off-Line Archival Storage Required: 25 TB and 5000 files
4b. What changes to codes, mathematical methods, and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?
The main changes are in the physical modeling, so they will not really change the algorithms. One real unknown is related to combustion modeling: we currently handle this by keeping a large (say 1 GB) pre-computed table of chemical data on each core. This approach is memory-intensive and memory-limited by nature, so it would require cores with at least 2 GB of memory to remain feasible. We have started looking into hybrid programming models, where a single copy of the table is kept on each node and OpenMP is used to parallelize among the cores on that node. This is still in the early stages, so we do not really know where we will be in 3-5 years.
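A minimal sketch of that hybrid layout is shown below, assuming one MPI rank per node with OpenMP threads sharing a single read-only copy of the table; the table size, its contents, and the summation standing in for a table lookup are all placeholders rather than our production chemistry code.

    // Sketch only: one chemistry table per node, shared by OpenMP threads,
    // with MPI used between nodes. Build with, e.g., "mpicxx -fopenmp".
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // One table per rank (i.e., per node when running one rank per node),
        // instead of one per core; the production table is on the order of 1 GB.
        const long long table_size = 1LL << 24;          // placeholder, ~128 MB
        std::vector<double> table(table_size, 1.0);

        // All OpenMP threads on the node read the same shared table.
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (long long i = 0; i < table_size; ++i)
            local += table[i];                           // stands in for a lookup

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("threads per node = %d, sum = %g\n",
                        omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }

The design point this illustrates is memory, not speed: the table is stored once per node rather than once per core, which is what would relax the 2 GB/core requirement mentioned above.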
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).
If we still use pre-computed and tabulated chemistry, then 2 GB/core. Otherwise, 0.5 GB/core will be sufficient.
4d. Please list any new software, services, or infrastructure support you will need through 2014.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
We are still in the early stages. One student has modified a code to use OpenMP within the node and MPI between nodes, and is currently running on a local Stanford cluster. We have not yet adopted this in our major codes. Through our PSAAP-sponsored project we interact with computer scientists who are developing the domain-specific language Liszt, which would allow a single code to be ported to CPUs or GPUs in a manner transparent to the scientist. This is still a work in progress.
5. New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve by 2014 with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
With 50X the resources and, crucially, the human time to make efficient use of them, we could:
1) Really nail down the physics of shock/turbulence interaction, thereby paving the way for predictive engineering models.
2) Compute a fully resolved supersonic turbulent reacting flow case, which would become an extremely useful benchmark for combustion modeling.
Both of these would increase our physical understanding and stimulate developments in modeling.


