NERSC: Powering Scientific Discovery Since 1974

Bruce Palmer

Case Study Worksheet

Project Information - Development and Test of an IO API for the Global Cloud Resolving Model

Document Prepared By Bruce Palmer
Project Title Development and Test of an IO API for the Global Cloud Resolving Model
Principal Investigator Karen Schuchardt
Participating Organizations PNNL, CSU, NERSC
Science Category Climate Environmental Science Biological Sciences
Funding Agencies DOE SC DOE NSA NSF NOAA NIH Other:

Project Summary (Scientific Objectives)

Please give a brief description of your project and its scientific objectives for the next 3-5 years.

This project provides an IO API for the Global Cloud Resolving Model (GCRM) being developed by Dave Randall and his collaborators. The API offers a simple interface that lets users add data fields to be written out through the API, modify which fields are actually written to files, specify how often files are written, and specify the number of files and the fields each one contains. The API is also designed to write files in a standard format (NetCDF) and to include enough additional information on the grid to support further analysis and visualization without referring back to the original GCRM code. A major focus of the project is writing data at high bandwidth. The current minimum bandwidth required for a run with the GCRM is 5 GB/s, but much higher rates may be needed as the GCRM develops.
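The configuration workflow described above (add fields, choose which are written, set the write frequency, assign fields to files) can be sketched as follows. This is a minimal illustration only: the class and method names (IOConfig, register_field, add_file, due), the field names, and the file names are all hypothetical and are not taken from the actual GCRM IO API.

```python
from dataclasses import dataclass, field


@dataclass
class OutputFile:
    """One output file: the fields it holds and how often it is written."""
    name: str
    frequency: int                      # write every `frequency` model steps
    fields: list = field(default_factory=list)


class IOConfig:
    """Hypothetical registry for illustration; not the GCRM IO API itself."""

    def __init__(self):
        self.registered = {}            # field name -> description
        self.files = {}                 # file name -> OutputFile

    def register_field(self, name, description):
        """Make a field known to the API; this alone does not write it."""
        self.registered[name] = description

    def add_file(self, name, frequency, fields):
        """Declare an output file, its write frequency, and its fields."""
        unknown = [f for f in fields if f not in self.registered]
        if unknown:
            raise KeyError(f"unregistered fields: {unknown}")
        self.files[name] = OutputFile(name, frequency, list(fields))

    def due(self, step):
        """Return {file name: fields} for every file due at this step."""
        return {f.name: f.fields for f in self.files.values()
                if step % f.frequency == 0}


config = IOConfig()
config.register_field("temperature", "air temperature")
config.register_field("pressure", "air pressure")
config.register_field("vorticity", "relative vorticity")

# Only two of the three registered fields are actually written.
config.add_file("thermo.nc", frequency=10, fields=["temperature", "pressure"])
config.add_file("temperature_hourly.nc", frequency=60, fields=["temperature"])
```

In this sketch, changing which fields are written or how often only requires editing the add_file calls; the code that registers fields is left untouched.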

Current HPC Usage and Methods

Facilities Used NERSC NCCS ACLF NSF Centers Other: EMSL
Architectures Used Cray XT IBM Power BlueGene Linux Cluster Other:
Total Computational Hours Used per Year Core-Hours NERSC Hours Used per Year 0 Core-Hours
Number of Cores Used in Typical Production Run
Wallclock Hours of Single Typical Production Run
Total Memory Used per Run GB Minimum Memory Required per Core GB
Total Data Read & Written per Run GB Size of Checkpoint File(s) 500 GB
Amount of Data Moved In/Out of NERSC GB How Often
On-Line File Storage Required (Directly Accessible from a Running Job) GB Files
Off-Line Archival Storage Required GB Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

Most data requirements, runtimes, etc. are specified by GCRM (see Dave Randall). 
 
Successful execution of GCRM will require large amounts of archival storage for results.

Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

GCRM code (being developed at CSU) 
Hydrostatic simulation code 
Multilevel grid test code 
 
These codes all simulate dynamics on the surface of the sphere. The current goal is to simulate at a resolution corresponding to 42 million surface cells and 100 vertical levels.
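To give a sense of scale, the short calculation below estimates the size of a single three-dimensional field at the target resolution and the time needed to write it at the 5 GB/s minimum bandwidth quoted in the project summary. It assumes one double-precision (8-byte) value per cell per level and decimal gigabytes; both are assumptions made for illustration, not figures from the GCRM.

```python
CELLS = 42_000_000        # surface cells at the target resolution
LEVELS = 100              # vertical levels
BYTES_PER_VALUE = 8       # assumption: double precision
MIN_BANDWIDTH = 5e9       # bytes per second (5 GB/s, decimal)


def field_bytes(cells, levels, bytes_per_value):
    """Size in bytes of one field with a single value per cell per level."""
    return cells * levels * bytes_per_value


size = field_bytes(CELLS, LEVELS, BYTES_PER_VALUE)
seconds = size / MIN_BANDWIDTH

print(f"{size / 1e9:.1f} GB per field, {seconds:.2f} s at 5 GB/s")
# prints "33.6 GB per field, 6.72 s at 5 GB/s"
```

Under these assumptions a single field is already much larger than 4 GB, and a snapshot containing several such fields takes tens of seconds to write even at the minimum required bandwidth.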

Please list the known limitations, obstacles, and bottlenecks of currently available HPC systems and resources, in particular those at NERSC.

The main bottleneck is achieving high levels of IO bandwidth. We have done substantial work to aggregate data so that only a subset of nodes executes large contiguous writes, but the IO bandwidths we achieve are still not a significant fraction of the theoretical maximum.
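The aggregation idea can be illustrated without MPI by computing a write plan: consecutive ranks are grouped, one rank per group acts as the writer, and each writer receives a single contiguous byte range in the file. The grouping rule, the function name aggregation_plan, and the dictionary layout are assumptions made for this sketch; an actual implementation would gather the data with MPI and write it through a parallel IO library.

```python
def aggregation_plan(block_sizes, n_writers):
    """Group consecutive ranks and give each group one contiguous file range.

    block_sizes[r] is the number of bytes owned by rank r. The first rank
    of each group is chosen as the writer (an arbitrary choice made here).
    """
    n_ranks = len(block_sizes)
    if not 1 <= n_writers <= n_ranks:
        raise ValueError("n_writers must be between 1 and the number of ranks")

    base, extra = divmod(n_ranks, n_writers)
    plan = []
    start = 0       # first rank of the current group
    offset = 0      # file offset where the current group's data begins
    for w in range(n_writers):
        count = base + (1 if w < extra else 0)
        members = list(range(start, start + count))
        nbytes = sum(block_sizes[r] for r in members)
        plan.append({"writer": members[0], "members": members,
                     "offset": offset, "nbytes": nbytes})
        start += count
        offset += nbytes
    return plan


# Eight ranks with 4 bytes each, aggregated onto two writers.
plan = aggregation_plan([4] * 8, n_writers=2)
for entry in plan:
    print(entry)
```

With this plan, eight small writes become two larger ones that do not overlap and together cover the file without gaps.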

HPC Usage and Methods for the Next 3-5 Years

Anticipated changes to codes, mathematical methods and/or algorithms needed to achieve this project's scientific objectives.

Parallel IO libraries (parallel NetCDF, NetCDF4, HDF5) that can achieve a significant fraction of the theoretical IO bandwidth for large contiguous writes to a file 
 
Parallel IO libraries that fully support 64-bit offsets, etc., for very large (> 4 GB) files
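The reason 64-bit offsets matter for files larger than 4 GB can be shown in a few lines: a byte position at or beyond 4 GiB (2^32 bytes) cannot be stored in an unsigned 32-bit integer, while a 64-bit integer holds it easily. The helper names below are hypothetical and the example is independent of any particular IO library.

```python
import struct

GIB = 2**30


def fits_32bit(offset):
    """Return True if a byte offset can be stored as an unsigned 32-bit int."""
    try:
        struct.pack("<I", offset)
        return True
    except struct.error:
        return False


def pack_offset(offset):
    """Store a byte offset as an unsigned 64-bit little-endian integer."""
    return struct.pack("<Q", offset)


print(fits_32bit(4 * GIB - 1))   # last offset addressable with 32 bits
print(fits_32bit(4 * GIB))       # first offset that needs more than 32 bits
print(len(pack_offset(5 * GIB))) # bytes used by the 64-bit representation
```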

Computational Hours Required per Year
Anticipated Number of Cores to be Used in a Typical Production Run
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above
Anticipated Total Memory Used per Run GB
Anticipated Minimum Memory Required per Core GB
Anticipated total data read & written per run GB
Anticipated size of checkpoint file(s) GB
Anticipated On-Line File Storage Required (Directly Accessible from a Running Job) GB Files
Anticipated Off-Line Archival Storage Required 1000-8000 GB ~100000 Files

Known or Anticipated architectural requirements (e.g., 2 GB memory/core).

As mentioned, most memory and computation requirements are dictated by the GCRM.

Please list any additional required or important software, services, or infrastructure beyond those listed in the previous section.

Large amounts of archival storage and a high-bandwidth interconnect between the archive and the filesystem.

It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

No plan so far. However, the current IO API is fairly modular and should be a good starting point for incorporating multicore programming ideas as they become available.

What Do You Need from NERSC?

Please tell us what you need from NERSC to meet your project's computing needs over the next 3-5 years. Also please feel free to make any general comments.

Reliable, high-performance parallel IO libraries that support very large files.