The enhanced Lin-Rood dynamical core shows excellent scaling (~80% efficiency)
up to 1820 processors on the NERSC Cray T3E and IBM SP2.
Doug Rotman, Art Mirin, John Tannahill, José Milovich, and Phil Duffy,
Lawrence Livermore National Laboratory
Research Objectives
In collaboration with NCAR and NASA, as well as Lawrence Berkeley, Los Alamos, Oak Ridge,
and Argonne National Laboratories, we are developing, implementing, and
enhancing the computational capabilities of the next-generation NCAR Community
Climate System Model (CCSM). In particular, we are improving the performance
and scalability of the NASA Lin-Rood dynamical core in the Community Climate
Model (CCM3/4) and the barotropic solver in the Parallel Ocean Program
(POP) model.
Computational Approach
We are expanding the capabilities of the Lin-Rood dynamical core by
implementing a 2D message-passing domain decomposition along with enhanced
use of OpenMP within each processing node. The horizontal discretization is
built on flux-form semi-Lagrangian transport algorithms, which have been
extended to the shallow-water dynamical framework. The piecewise parabolic
method (PPM) serves as the 1D building block for the multidimensional
dynamics and transport. The dynamics use a 2D latitude/altitude domain
decomposition; the data are then transposed to a 2D latitude/longitude
decomposition for the column physics calculations, which need entire
vertical columns on one processor, and transposed back to
latitude/altitude for the next dynamics step.
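The transpose between the two layouts can be illustrated with a serial sketch. It assumes, for simplicity, that both layouts keep the same latitude blocks, so only processors in the same latitude row exchange data (with the same processors in both layouts, `px` equals `pz`); a real transpose library would presumably do this with an MPI all-to-all within each row. The `(nlat, nlon, nlev)` field shape, the dictionary-of-blocks layout, and all function names are hypothetical, and NumPy slicing stands in for message passing.

```python
import numpy as np


def block_ranges(n, p):
    """Split n points into p contiguous, nearly equal (start, stop) blocks."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for r in range(p):
        stop = start + base + (1 if r < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def decompose_lat_lev(field, py, pz):
    """Dynamics layout: rank (j, k) owns latitude block j, level block k,
    and every longitude (full latitude circles)."""
    nlat, _, nlev = field.shape
    lat_r, lev_r = block_ranges(nlat, py), block_ranges(nlev, pz)
    return {
        (j, k): field[lat_r[j][0]:lat_r[j][1], :, lev_r[k][0]:lev_r[k][1]].copy()
        for j in range(py)
        for k in range(pz)
    }


def transpose_to_lat_lon(dyn, py, pz, px, nlon):
    """Physics layout: rank (j, i) owns latitude block j, longitude block i,
    and every level (complete columns). Only ranks in latitude row j trade data."""
    lon_r = block_ranges(nlon, px)
    phys = {}
    for j in range(py):
        for i, (lo, hi) in enumerate(lon_r):
            # Collect this longitude slab from each level block, in level order.
            pieces = [dyn[(j, k)][:, lo:hi, :] for k in range(pz)]
            phys[(j, i)] = np.concatenate(pieces, axis=2)
    return phys


def transpose_to_lat_lev(phys, py, px, pz, nlev):
    """Inverse transpose: regather full longitude circles for each level block."""
    lev_r = block_ranges(nlev, pz)
    dyn = {}
    for j in range(py):
        for k, (lo, hi) in enumerate(lev_r):
            pieces = [phys[(j, i)][:, :, lo:hi] for i in range(px)]
            dyn[(j, k)] = np.concatenate(pieces, axis=1)
    return dyn


# Usage: 12 latitudes x 16 longitudes x 6 levels on a 3 x 2 process grid.
field = np.arange(12 * 16 * 6, dtype=float).reshape(12, 16, 6)
dyn = decompose_lat_lev(field, py=3, pz=2)
phys = transpose_to_lat_lon(dyn, py=3, pz=2, px=2, nlon=16)
back = transpose_to_lat_lev(phys, py=3, px=2, pz=2, nlev=6)
```

A round trip (dynamics to physics and back) should reproduce every dynamics block exactly, which is the kind of correctness check a transpose library test would make.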
To improve the performance and scalability of the POP barotropic solver, we
are trying two approaches: (1) parallelize the baroclinic solver (which
scales well) using the usual 2D horizontal ocean domain decomposition, but
run the barotropic solver on a small number of processors to reduce
communication latency and time; and (2) implement a new solver that reduces
communication by reorganizing the calculation to make maximum use of local
cell information when updating the barotropic velocities. This new solver
uses wave front recursion to eliminate the interior variables within each
domain, which yields a reduced system of equations involving only the
variables that are communicated across nearest-neighbor domain boundaries.
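The idea behind approach (2), eliminating interior unknowns locally and leaving a small system on the boundary unknowns, can be sketched with standard Schur-complement (static condensation) elimination. This is a stand-in, not the wave front recursion itself: it uses a hypothetical 1D tridiagonal operator in place of the 2D five-point barotropic operator, dense NumPy solves, and invented function names. In a parallel run, each pass of the subdomain loop would execute on its own processor without communication; only the reduced interface system couples the domains.

```python
import numpy as np


def model_matrix(n, shift=0.1):
    """Hypothetical 1D stand-in for the barotropic operator: tridiag(-1, 2+shift, -1)."""
    A = np.diag(np.full(n, 2.0 + shift))
    A += np.diag(np.full(n - 1, -1.0), 1) + np.diag(np.full(n - 1, -1.0), -1)
    return A


def subdomains(n, interfaces):
    """Contiguous runs of interior indices separated by interface points."""
    cuts = [-1] + sorted(interfaces) + [n]
    return [list(range(a + 1, b)) for a, b in zip(cuts[:-1], cuts[1:]) if b - a > 1]


def schur_solve(A, f, interfaces):
    n = len(f)
    G = sorted(interfaces)
    S = A[np.ix_(G, G)].copy()   # starts as A_GG, becomes the reduced operator
    g = f[G].copy()
    local = []
    for dom in subdomains(n, interfaces):
        # Local elimination: uses only this subdomain's rows and its interfaces.
        A_II = A[np.ix_(dom, dom)]
        A_IG = A[np.ix_(dom, G)]
        A_GI = A[np.ix_(G, dom)]
        X = np.linalg.solve(A_II, A_IG)    # A_II^{-1} A_IG
        y = np.linalg.solve(A_II, f[dom])  # A_II^{-1} f_I
        S -= A_GI @ X
        g -= A_GI @ y
        local.append((dom, X, y))
    u = np.zeros(n)
    u[G] = np.linalg.solve(S, g)           # small system on interface unknowns only
    for dom, X, y in local:
        u[dom] = y - X @ u[G]              # recover interiors, again purely local
    return u


# Usage: 20 unknowns split into 4 subdomains by 3 interface points.
A = model_matrix(20)
f = np.linspace(0.0, 1.0, 20)
u = schur_solve(A, f, interfaces=[4, 9, 14])
```

The reduced matrix `S` is only 3 x 3 here, which illustrates why this kind of reorganization cuts communication: only interface values ever need to cross domain boundaries.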
Accomplishments
We have carried out timing and scalability studies of the Lin-Rood
dynamical core on a variety of platforms across the DOE complex. On the
NERSC IBM SP2 and Cray T3E it showed excellent scaling (~80% efficiency) up
to 1820 processors. Parallel efficiency and performance began to drop
sharply as we moved to 22 latitude subdomains, the maximum allowed at this
resolution. When run in a full climate model, this dynamical core is
coupled with a column physics package that updates the state variables. We
tested the transpose libraries for correctness and analyzed transpose
timings to ensure efficient parallel execution.
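The "~80% efficiency" figure is presumably a relative parallel efficiency: measured speedup divided by ideal speedup with respect to a smaller reference run, E = (p_ref * T_ref) / (p * T_p). A minimal sketch of that definition follows; the timings are invented for illustration and are not the T3E or SP2 measurements.

```python
def parallel_efficiency(p_ref, t_ref, p, t):
    """Relative efficiency of a run on p processors (wall time t) versus a
    reference run on p_ref processors (wall time t_ref); ideal scaling gives 1.0."""
    return (p_ref * t_ref) / (p * t)


# Hypothetical timings: (processors, seconds per simulated day). NOT measured data.
timings = [(64, 100.0), (128, 52.0), (256, 31.25)]
p0, t0 = timings[0]
for p, t in timings:
    print(f"{p:5d} processors: efficiency {parallel_efficiency(p0, t0, p, t):.2f}")
```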
For the POP barotropic solver, we have implemented the wave front recursion
elimination solver on a single processor using a five-point stencil. We
have also completed tests that use different decompositions for the
baroclinic and barotropic solvers in the ocean model. Running the
barotropic solver on fewer processors increased POP throughput by 25%,
because the solver then exchanges fewer but larger messages and so incurs
less total latency.
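Why fewer, larger messages help can be seen from the standard latency-bandwidth (Hockney, or alpha-beta) cost model, in which every message pays a fixed latency plus its size divided by the bandwidth. The sketch below uses made-up machine parameters and payloads, not measured T3E or SP2 values, and it ignores the extra per-processor computation that comes with using fewer processors.

```python
def message_time(num_messages, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta model: each message pays a fixed latency, and the payload is
    charged at the link bandwidth regardless of how it is split up."""
    return num_messages * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical machine parameters, not measurements.
ALPHA = 20e-6   # seconds of latency per message
BETA = 100e6    # bytes per second

payload = 80_000  # bytes exchanged per solver iteration (illustrative)
many_small = message_time(100, payload, ALPHA, BETA)  # 100 messages of 800 bytes
few_large = message_time(10, payload, ALPHA, BETA)    # 10 messages of 8,000 bytes
```

Both cases move the same 80,000 bytes; only the latency term differs, which is why consolidating a latency-bound solver's small messages can pay off.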
Significance
NASA, NCAR, and DOE are jointly developing a next-generation Community
Climate System Model, which will incorporate a higher degree of physical
consistency than is realized in the current generation of spectral and
finite-difference models. This project aims to enable model simulations on
large parallel machines so that more, and longer, climate simulations can be
carried out.