NERSC: Powering Scientific Discovery Since 1974

Edward Patton (Lee)

Case Study Worksheet

Project Information - Influences of the Boundary Layer Flow on Vegetation-Air Exchanges of Energy, Water and Carbon Dioxide

Document Prepared By Edward Patton (for Xuhui Lee)
Project Title Influences of the Boundary Layer Flow on Vegetation-Air Exchanges of Energy, Water and Carbon Dioxide
Principal Investigator Xuhui Lee
Participating Organizations Yale University 
National Center for Atmospheric Research
Science Category Climate Environmental Science Biological Sciences
Funding Agencies DOE SC DOE NNSA NSF NOAA NIH Other:

Project Summary (Scientific Objectives)

Please give a brief description of your project and its scientific objectives for the next 3-5 years.

The objective of this project is to establish a mechanistic understanding of the interplay among flow 
heterogeneity in the atmospheric boundary layer (ABL), land surface heterogeneity, and the vegetation-air 
fluxes of energy, water and CO2. The project will investigate mechanisms by which mesoscale motions in 
the ABL influence vegetation-air exchange. It will also quantify how heterogeneity affects the predictions
of the 1D column models used in regional- and global-scale climate models. It is hypothesized that two
important ABL processes, entrainment and flow heterogeneity, bias both observational and model estimates
of vegetation-air exchange, and that the degree of bias differs between active (heat and water) and
passive (CO2) scalars.

Current HPC Usage and Methods

Facilities Used NERSC NCCS ACLF NSF Centers Other: DOD centers
Architectures Used Cray XT IBM Power BlueGene Linux Cluster Other: SGI Altix
Total Computational Hours Used per Year 1,000,000 Core-Hours
NERSC Hours Used per Year 500,000 Core-Hours
Number of Cores Used in Typical Production Run 64-4096
Wallclock Hours of Single Typical Production Run 12-600
Total Memory Used per Run 2-1000 GB
Minimum Memory Required per Core 2 GB
Total Data Read & Written per Run 4-2000 GB
Size of Checkpoint File(s) 1-500 GB
Amount of Data Moved In/Out of NERSC 600 GB
How Often infrequently
On-Line File Storage Required (Directly Accessible from a Running Job) 2 GB, 20 Files
Off-Line Archival Storage Required 20(?) GB, 200(?) Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

We use the large-eddy simulation (LES) code developed in-house at NCAR. The code predicts turbulence and its response to forcing typical of, and at scales appropriate to, the Earth's atmospheric boundary layer. The model predicts time-dependent velocity and scalar fields by numerically integrating the Navier-Stokes equations and conservation equations for heat and any number of additional scalars on a Cartesian grid. We enforce incompressibility through the continuity equation by solving a Poisson equation for pressure. The basic numerical algorithm is a mixed pseudo-spectral/finite-difference scheme with third-order Runge-Kutta time stepping.

We use periodic boundary conditions in the horizontal and a radiation condition at the top boundary. Our surface boundary condition iterates to compute the exchange at the ground from the overlying wind speed and the effects of buoyancy. Land-surface boundary conditions vary with the problem being addressed. Sometimes we use fixed/specified boundary conditions at the ground, but we have also coupled the LES code to a land-surface model (like those used in weather forecasting and climate models) to allow time-dependent and spatially varying surface boundary conditions based on predicted soil moisture and temperature, canopy photosynthesis, and atmospheric demand. For this, we use the NOAH land-surface model, which is the primary land-surface model providing the lower boundary in NCAR's Weather Research and Forecasting (WRF) model.

The code is written in FORTRAN90 and uses MPI to decompose the domain in two of the three spatial dimensions. We use MPI-IO to read and write 3D volumes of data containing at least five 8-byte real variables. NCAR's MOZART chemical mechanism has been integrated into the LES code, so turbulence/chemistry coupling problems could require 3D volumes of nearly 60 8-byte real variables.
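
The pressure step of a mixed pseudo-spectral/finite-difference code can be illustrated with a minimal sketch. It assumes a domain periodic in x and y, second-order finite differences on uniformly spaced interior nodes in z, and homogeneous Dirichlet conditions (p = 0) at the bottom and top; these boundary conditions, the grid layout, and the function name `solve_poisson` are illustrative assumptions, not the NCAR code's actual choices. Each horizontal Fourier mode decouples into an independent tridiagonal system in z, solved here with the Thomas algorithm vectorized over all wavenumbers.

```python
import numpy as np


def solve_poisson(f, Lx, Ly, Lz):
    """Solve lap(p) = f on an (nx, ny, nz) grid periodic in x and y.

    Hypothetical sketch: FFT (pseudo-spectral) derivatives in x and y,
    second-order finite differences in z on nz interior nodes, and
    p = 0 assumed at z = 0 and z = Lz.
    """
    nx, ny, nz = f.shape
    dz = Lz / (nz + 1)
    # Angular wavenumbers for the periodic horizontal directions.
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=Lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=Ly / ny)
    k2 = kx[:, None] ** 2 + ky[None, :] ** 2   # shape (nx, ny)

    fh = np.fft.fft2(f, axes=(0, 1))           # transform every z level
    off = 1.0 / dz**2                          # sub- and super-diagonal
    diag = -2.0 / dz**2 - k2                   # diagonal, one per mode

    # Thomas algorithm: forward sweep, vectorized over all (kx, ky) modes.
    cp = np.empty((nx, ny, nz))
    dp = np.empty_like(fh)
    cp[:, :, 0] = off / diag
    dp[:, :, 0] = fh[:, :, 0] / diag
    for k in range(1, nz):
        denom = diag - off * cp[:, :, k - 1]
        cp[:, :, k] = off / denom
        dp[:, :, k] = (fh[:, :, k] - off * dp[:, :, k - 1]) / denom

    # Back substitution.
    ph = np.empty_like(fh)
    ph[:, :, -1] = dp[:, :, -1]
    for k in range(nz - 2, -1, -1):
        ph[:, :, k] = dp[:, :, k] - cp[:, :, k] * ph[:, :, k + 1]

    return np.real(np.fft.ifft2(ph, axes=(0, 1)))


# Usage: check against a manufactured solution (sizes are illustrative).
Lx, Ly, Lz = 2.0, 1.0, 1.0
nx, ny, nz = 16, 8, 63
x = np.arange(nx) * Lx / nx
y = np.arange(ny) * Ly / ny
z = np.arange(1, nz + 1) * Lz / (nz + 1)
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
p_exact = np.sin(2 * np.pi * X / Lx) * np.cos(2 * np.pi * Y / Ly) * np.sin(np.pi * Z / Lz)
f = -((2 * np.pi / Lx) ** 2 + (2 * np.pi / Ly) ** 2 + (np.pi / Lz) ** 2) * p_exact
p = solve_poisson(f, Lx, Ly, Lz)
print("max error:", np.max(np.abs(p - p_exact)))
```

The horizontal derivatives are exact for resolved Fourier modes, so the remaining error comes from the second-order vertical differencing and shrinks as nz grows.
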
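
The iterative surface boundary condition is not specified in detail above. A common approach in LES, assumed here, is Monin-Obukhov similarity: the friction velocity u* and the Obukhov length L each depend on the other, so they are found by fixed-point iteration. The Businger-Dyer stability function, the constants, and the function names below are assumptions for illustration, not taken from the NCAR code.

```python
import math

KAPPA = 0.4   # von Karman constant (assumed value)
GRAV = 9.81   # gravitational acceleration, m s^-2


def psi_m(zeta):
    """Integrated momentum stability correction (Businger-Dyer form, assumed)."""
    if zeta < 0.0:  # unstable (surface heating)
        x = (1.0 - 16.0 * zeta) ** 0.25
        return (2.0 * math.log((1.0 + x) / 2.0) + math.log((1.0 + x * x) / 2.0)
                - 2.0 * math.atan(x) + math.pi / 2.0)
    return -5.0 * zeta  # stable or neutral


def friction_velocity(U, z, z0, heat_flux, theta0=300.0, tol=1e-10, max_iter=200):
    """Iterate u* and the Obukhov length L to a mutually consistent pair.

    U: resolved wind speed (m/s) at height z (m); z0: roughness length (m);
    heat_flux: surface kinematic heat flux w'theta' (K m/s);
    theta0: reference potential temperature (K).
    """
    log_term = math.log(z / z0)
    ustar = KAPPA * U / log_term  # neutral log-law first guess
    if heat_flux == 0.0:
        return ustar              # neutral: no buoyancy correction needed
    for _ in range(max_iter):
        obukhov = -ustar**3 * theta0 / (KAPPA * GRAV * heat_flux)
        new = KAPPA * U / (log_term - psi_m(z / obukhov))
        if abs(new - ustar) < tol:
            return new
        ustar = new
    raise RuntimeError("surface-layer iteration did not converge")


# Usage with illustrative values: 5 m/s wind at 10 m over 0.1 m roughness.
neutral = friction_velocity(5.0, 10.0, 0.1, 0.0)
unstable = friction_velocity(5.0, 10.0, 0.1, 0.2)   # surface heating
stable = friction_velocity(5.0, 10.0, 0.1, -0.01)   # surface cooling
print(neutral, unstable, stable)
```

Buoyancy enters only through the Obukhov length: surface heating increases u* relative to the neutral value and surface cooling reduces it, which is why the exchange cannot be computed from wind speed alone.
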
The problems we have been running at NERSC (without complex chemistry) typically range in size from 256^3 to 1024^3 grid points on 64 to 4096 CPUs. We have tested the code on Franklin with problems from 2048^3 up to 3072^3 grid points (the largest problem we could fit on Franklin before it was upgraded to quad-core processors with double the memory), running on up to 16384 CPUs.
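
To make these problem sizes concrete, the sketch below splits an N^3 grid over a 2D process grid and estimates the size of one 3D volume of 8-byte reals. It assumes, hypothetically, that x and y are the decomposed dimensions and z is kept whole on each rank, and that 4096 and 16384 ranks are arranged as 64 x 64 and 128 x 128 grids; the function names are illustrative and are not the NCAR code's decomposition routines.

```python
def block_range(n, nparts, index):
    """Start and length of block `index` when n points are split over
    nparts ranks as evenly as possible (the first n % nparts blocks get
    one extra point)."""
    base, extra = divmod(n, nparts)
    start = index * base + min(index, extra)
    length = base + (1 if index < extra else 0)
    return start, length


def local_shape(n, px, py, rank):
    """Local (x, y, z) block for `rank` on a px-by-py process grid.
    Hypothetical layout: x and y are split, z is kept whole, and ranks
    are numbered with the x index varying fastest."""
    ix, iy = rank % px, rank // px
    _, lx = block_range(n, px, ix)
    _, ly = block_range(n, py, iy)
    return lx, ly, n


def volume_bytes(n, nvars, bytes_per_value=8):
    """Size of one 3D volume holding nvars real fields on an n^3 grid."""
    return n**3 * nvars * bytes_per_value


print(local_shape(1024, 64, 64, rank=0))      # (16, 16, 1024)
print(local_shape(3072, 128, 128, rank=0))    # (24, 24, 3072)
print(volume_bytes(1024, 5) / 2**30, "GiB")   # 40.0 GiB
print(volume_bytes(1024, 60) / 2**30, "GiB")  # 480.0 GiB
```

The last two lines show why volume output is costly: a 1024^3 grid with five variables is already 40 GiB per volume, and the roughly 60 variables needed with MOZART chemistry would raise that twelvefold.
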
 
The version of our LES code with the 2D MPI decomposition does not currently permit complex topography. For complex topography, an older version of the LES code uses MPI in only one spatial dimension, and the topography can only be 2D. This code is still pseudo-spectral in the horizontal directions and finite difference in the vertical; it uses a curvilinear coordinate system and transforms the equations into a Cartesian frame. We are actively extending our 2D MPI version from its current ability to investigate boundary layers over flat terrain to a more general system able to handle 3D terrain that is either fixed in time (turbulent flow over 3D hills) or time-varying (mimicking turbulent flow over 3D water waves). This code is about two times slower than our flat-terrain code because we need to iterate for pressure.
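
The curvilinear approach can be illustrated with a terrain-following coordinate. This sketch assumes a Gal-Chen and Somerville style linear stretching, zeta = H (z - h(x)) / (H - h(x)), with 2D terrain h(x) under a flat domain top at height H; the mapping, the Gaussian hill, and all function names are assumptions for illustration, and the NCAR code's actual transformation may differ. The metric factors returned here are the terms that appear when the equations are rewritten in the computational frame.

```python
import math


def to_computational(z, h, H):
    """Map physical height z above terrain of height h to zeta in [0, H]."""
    return H * (z - h) / (H - h)


def to_physical(zeta, h, H):
    """Inverse mapping from computational zeta back to physical z."""
    return h + zeta * (H - h) / H


def dzeta_dz(h, H):
    """Metric factor: vertical compression of the grid over high terrain."""
    return H / (H - h)


def dzeta_dx(z, h, dhdx, H):
    """Metric factor: tilt of constant-zeta surfaces at fixed physical z."""
    return H * (z - H) * dhdx / (H - h) ** 2


def hill(x, h0=100.0, a=500.0):
    """Hypothetical 2D Gaussian hill height (m); parameters are illustrative."""
    return h0 * math.exp(-((x / a) ** 2))


def hill_slope(x, h0=100.0, a=500.0):
    """Analytic slope dh/dx of the hypothetical hill."""
    return -2.0 * x / a**2 * hill(x, h0, a)


# Usage: a point 50 m above the local surface at several positions.
H = 2000.0
for x in (0.0, 500.0, 1500.0):
    h, s = hill(x), hill_slope(x)
    z = h + 50.0
    print(x, round(to_computational(z, h, H), 3),
          round(dzeta_dz(h, H), 4), round(dzeta_dx(z, h, s, H), 5))
```

Because the metric factors vary in space, the pressure equation in the transformed frame is generally no longer separable into independent vertical problems per horizontal Fourier mode, which is consistent with needing to iterate for pressure.
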

Please list the known limitations/obstacles/bottlenecks of currently available HPC systems, and in particular, those at NERSC.

Our calculations are typically limited by wallclock time because we need to integrate for long durations. With fixed forcing, simulations are usually integrated forward for two hours (simulated time) to allow the turbulence to develop and come into equilibrium with the forcing. Only then do the analyses begin; depending on the case being simulated, stable statistics require about five large-eddy turnover times of averaging, which usually requires an additional 2-10 hours of simulation. With three-second time steps (set by a fixed CFL number), 4-12 hours of simulated time means roughly 4,800 to 14,400 time steps per run. Increasingly, the field is moving toward studying turbulence driven by time-varying forcing (e.g., a full diurnal cycle). With time-varying forcing, averages must be formed as ensemble averages over multiple lengthy calculations.
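
The time-step arithmetic can be made explicit. This sketch assumes an advective CFL limit of the form dt = C / max(|u|/dx, |v|/dy, |w|/dz); the CFL number, velocities, and grid spacings in the usage lines are hypothetical placeholders, not values from our runs, and the function names are illustrative.

```python
import math


def cfl_timestep(umax, vmax, wmax, dx, dy, dz, cfl=0.5):
    """Largest advective time step allowed by a fixed CFL number (assumed form)."""
    inv = max(abs(umax) / dx, abs(vmax) / dy, abs(wmax) / dz)
    if inv == 0.0:
        raise ValueError("all velocities are zero; CFL limit is undefined")
    return cfl / inv


def steps_needed(spinup_hours, averaging_hours, dt_seconds):
    """Number of time steps to cover spin-up plus the averaging period."""
    total_seconds = (spinup_hours + averaging_hours) * 3600.0
    return math.ceil(total_seconds / dt_seconds)


# Hypothetical daytime grid vs. a finer grid for a stable night-time case.
day_dt = cfl_timestep(12.0, 4.0, 5.0, dx=40.0, dy=40.0, dz=20.0)
night_dt = cfl_timestep(12.0, 4.0, 1.0, dx=10.0, dy=10.0, dz=5.0)
print(day_dt, night_dt)

# Step counts for the 2 h spin-up plus 2 h or 10 h of averaging at 3 s steps.
print(steps_needed(2, 2, 3.0), steps_needed(2, 10, 3.0))
```

Refining the grid for the smaller night-time eddies shrinks the allowed step in proportion, so a full diurnal cycle multiplies both the cost per step and the number of steps.
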
 
As our problems get larger, we prefer not to write many 3D volumes because they are extremely large and I/O is time-consuming. Instead, we rely on writing a restart file at designated checkpoint times, and for this to work the machines need to be reliable enough that we can count on the computation actually reaching each checkpoint. This has been our biggest issue with Franklin. If a single node drops out for any reason, there is no way to recover (as far as we have been able to determine), and we lose the entire calculation since the last restart. When running on 16,000 CPUs, the likelihood of one node dropping out during a 12-hour period is far from negligible.
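
The risk of losing a run to a single node failure can be quantified under a simple model, assumed here: node failures are independent and exponentially distributed with a per-node mean time between failures (MTBF). The MTBF and checkpoint-write time below are hypothetical placeholders, not measured Franklin statistics, and four cores per node is assumed. The same model gives Young's first-order estimate of a checkpoint interval, sqrt(2 C M), where C is the time to write a checkpoint and M is the MTBF of the whole job.

```python
import math


def prob_any_failure(nodes, hours, node_mtbf_hours):
    """P(at least one of `nodes` independent nodes fails within `hours`),
    assuming exponentially distributed failures with the given per-node MTBF."""
    system_rate = nodes / node_mtbf_hours  # expected failures per hour for the job
    return 1.0 - math.exp(-system_rate * hours)


def young_interval(checkpoint_hours, system_mtbf_hours):
    """Young's first-order estimate of the optimal time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_hours * system_mtbf_hours)


# Hypothetical inputs for illustration only.
nodes = 16384 // 4          # assumes four cores per node
node_mtbf = 5 * 8760.0      # hypothetical per-node MTBF of five years, in hours
p12 = prob_any_failure(nodes, 12.0, node_mtbf)
system_mtbf = node_mtbf / nodes
interval = young_interval(0.25, system_mtbf)   # hypothetical 15-minute checkpoint write
print(f"P(failure in 12 h) = {p12:.2f}, checkpoint every ~{interval:.1f} h")
```

Because the job's failure rate scales with node count, even very reliable individual nodes add up to a substantial chance of losing a long run on thousands of nodes, which is why frequent restart files matter.
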

HPC Usage and Methods for the Next 3-5 Years

Anticipated changes to codes, mathematical methods and/or algorithms needed to achieve this project's scientific objectives.

We plan to analyze complete diurnal cycles (i.e., full-day simulations). Turbulence changes character dramatically over a full day. In particular, the scales of motion change with atmospheric stability (large during the day, much smaller at night), so grid requirements change, time steps become smaller, and so on. Calculations to date typically use semi-constant forcing and run until the turbulence is in equilibrium with that forcing. Moving to the diurnal-cycle paradigm implies time-varying forcing (the sun comes up, the sun goes down), which means the calculations will require much longer integration times. Boundary-layer parameterizations need to transition naturally across these regimes, and calculations such as those we propose will permit natural development and testing of those parameterizations.

Computational Hours Required per Year 1-10 million
Anticipated Number of Cores to be Used in a Typical Production Run 1024-16384
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above
Anticipated Total Memory Used per Run GB
Anticipated Minimum Memory Required per Core GB
Anticipated total data read & written per run GB
Anticipated size of checkpoint file(s) GB
Anticipated On-Line File Storage Required (Directly Accessible from a Running Job) GB Files
Anticipated Off-Line Archival Storage Required GB Files

Known or Anticipated architectural requirements (e.g., 2 GB memory/core).

Please list any additional required or important software, services, or infrastructure beyond those listed in the previous section.

It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

What Do You Need from NERSC?

Please tell us what you need from NERSC to meet your project's computing needs over the next 3-5 years. Also please feel free to make any general comments.