FES Requirements Worksheet

1.1. Project Information - Center for Plasma Edge Simulation

Document Prepared By

CS Chang

Project Title

Center for Plasma Edge Simulation

Principal Investigator

CS Chang

Participating Organizations

New York University, ORNL, PPPL, LBNL, MIT, Columbia U., Rutgers U., Lehigh U., Georgia Tech, Auburn U., U. Colorado, U. California at Irvine, Caltech, Hinton Associates

Funding Agencies

 DOE SC  DOE NSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

Further develop the XGC large-scale edge kinetic codes for higher-fidelity simulation of the electromagnetic, multiscale edge physics in ITER. Using the kinetic codes, perform integrated simulations coupling kinetic, MHD, neutral-particle, and atomic physics for a higher-fidelity understanding of the multiscale edge physics and of the wall heat load.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

Our current primary codes are the XGC0 and XGC1 particle-in-cell codes. Particle motion is described by Lagrangian equations of motion in a 3D cylindrical coordinate system in realistic toroidal geometry with a magnetic separatrix, advanced with either Runge-Kutta or predictor-corrector methods. The field quantities are evaluated on a grid mesh using linear multigrid PETSc solvers. The scale of a simulation is characterized by the grid size, which then determines the number of particles. Parallelism is expressed as an MPI/OpenMP hybrid.
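
For illustration, the sketch below shows the push-and-deposit cycle described above for a single species, using a predictor-corrector step in cylindrical (R, Z, phi) coordinates. It is not XGC source code: the drift model, particle loading, grid, and constants are placeholders, and the field solve that PETSc performs in the production codes is only indicated by a comment.

    /* Illustrative sketch only (not XGC source): one predictor-corrector
     * push in cylindrical (R, Z, phi) coordinates with a toy drift model,
     * followed by linear deposition onto a 1D radial grid.  The production
     * codes use realistic diverted geometry and PETSc multigrid solvers.  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define NP  1000            /* illustrative particle count             */
    #define NR  64              /* illustrative radial grid size           */
    #define DT  1.0e-7          /* illustrative time step (s)              */
    #define PI  3.14159265358979

    typedef struct { double R, Z, phi, v_par; } Particle;  /* guiding center */

    /* Toy drift: circular motion about a magnetic axis at (R0, 0). */
    static void drift(const Particle *p, double R0, double *dR, double *dZ)
    {
        *dR = -(p->Z) * 1.0e4;
        *dZ =  (p->R - R0) * 1.0e4;
    }

    int main(void)
    {
        double R0 = 1.7, a = 0.6;          /* toy major/minor radius (m)    */
        double density[NR] = {0.0};
        Particle *p = malloc(NP * sizeof *p);

        for (int i = 0; i < NP; ++i) {     /* load a toy annulus of markers */
            double r  = a * (0.5 + 0.4 * rand() / (double)RAND_MAX);
            double th = 2.0 * PI * rand() / (double)RAND_MAX;
            p[i] = (Particle){ R0 + r * cos(th), r * sin(th), 0.0, 1.0e5 };
        }

        for (int i = 0; i < NP; ++i) {
            double dR, dZ, dRp, dZp;
            Particle pred = p[i];

            drift(&p[i], R0, &dR, &dZ);    /* predictor step                */
            pred.R += DT * dR;
            pred.Z += DT * dZ;

            drift(&pred, R0, &dRp, &dZp);  /* corrector step                */
            p[i].R   += 0.5 * DT * (dR + dRp);
            p[i].Z   += 0.5 * DT * (dZ + dZp);
            p[i].phi += DT * p[i].v_par / p[i].R;   /* toroidal streaming   */

            /* Linear (cloud-in-cell) deposition onto the radial grid. */
            double r = hypot(p[i].R - R0, p[i].Z) / a * (NR - 1);
            int j = (int)r;
            if (j >= 0 && j < NR - 1) {
                double w = r - j;
                density[j]     += 1.0 - w;
                density[j + 1] += w;
            }
        }

        /* Here the deposited charge would be handed to the PETSc multigrid
         * solve for the potential, then gathered back to the particles.    */
        printf("density[NR/2] = %g\n", density[NR / 2]);
        free(p);
        return 0;
    }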

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

The current limitation we are trying to overcome is the algorithm that enables fully electromagnetic turbulence simulation within the 5D gyrokinetic formalism. This is tied to the time-step resolution of the fast electron motion, and hence to computing power. Presently, a 5D ion full-f simulation of the DIII-D edge plasma requires 20 hours on 100,000 processor cores; other machines can require more computing power. A fluid-kinetic or split-weight electron simplification technique can significantly reduce the computing-power requirement, demanding only 2 times more processor cores instead of a factor of 60 (full electron kinetics in deuteron plasmas). The electromagnetic simulation of the DIII-D edge plasma thus requires 200K processor cores on Franklin for one-day completion. With higher computing power, the electrons could be simulated with the full velocity distribution function instead of the simplified velocity-space function. An ITER simulation will require about 1 million processor cores for a 20-hour run, assuming the current linear scalability holds.
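
The short sketch below only makes the arithmetic quoted above explicit (100,000 cores for 20 hours, a factor of ~60 for full electron kinetics, a factor of 2 for the split-weight treatment, and roughly 1 million cores for 20 hours for ITER). Attributing the factor of ~60 to the square root of the deuteron-to-electron mass ratio, sqrt(3672) ~ 61, is an assumption added here for context, not an additional result.

    /* Back-of-envelope sketch of the scaling numbers quoted above.         */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double ion_cores  = 100000.0;   /* DIII-D full-f ion run (cores)    */
        double ion_hours  = 20.0;       /* quoted wallclock (hours)         */
        double mass_ratio = 3672.0;     /* deuteron/electron (assumption)   */

        printf("full-f ion run:         %.1e core-hours\n",
               ion_cores * ion_hours);
        printf("full kinetic electrons: ~%.0fx more cores\n",
               sqrt(mass_ratio));
        printf("split-weight electrons: %.0f cores (2x)\n",
               2.0 * ion_cores);
        printf("ITER, linear scaling:   %.1e core-hours (1M cores x 20 h)\n",
               1.0e6 * 20.0);
        return 0;
    }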

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 65,000,000 Core-Hours

NERSC Hours Used in 2009

 8,000,000 Core-Hours

Number of Cores Used in Typical Production Run

 15,000 - 170,000

Wallclock Hours of Single Typical Production Run

 20-100

Total Memory Used per Run

 40 GB

Minimum Memory Required per Core

0.3 GB

Total Data Read & Written per Run

 5,000 GB

Size of Checkpoint File(s)

1,000 GB

Amount of Data Moved In/Out of NERSC

 10 GB per day

On-Line File Storage Required (For I/O from a Running Job)

 4 TB and 3,000 Files

Off-Line Archival Storage Required

 1 TB and 30 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

As systems become larger, early detection of processor failures is an important issue for an HPC code such as XGC1. Detection software that can be compiled and run together with an HPC code would be quite helpful. If the software could replace the faulty node with another one, that would be even better; otherwise, the run can be stopped for a restart.
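
A minimal application-level pattern of this kind is sketched below, under the assumption that the surviving ranks can still communicate: every N steps the ranks agree on a global error flag and, if it is set, checkpoint and exit cleanly for a restart. The routine names and interval are placeholders, and transparent replacement of a failed node would still require the center-provided service described above.

    /* Minimal sketch of an application-level safety net (not XGC code).    */
    #include <mpi.h>
    #include <stdio.h>

    static int local_health_ok(int step)   /* placeholder health check      */
    {
        (void)step;
        return 1;                           /* 0 would signal a fault        */
    }

    static void write_checkpoint(int rank, int step)
    {
        /* In production this would be a parallel restart dump.             */
        printf("rank %d: checkpoint at step %d\n", rank, step);
    }

    int main(int argc, char **argv)
    {
        int rank, local_ok, global_ok;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < 1000; ++step) {
            /* ... particle push and field solve would go here ...          */
            local_ok = local_health_ok(step);
            if (step % 100 == 0) {                  /* periodic agreement    */
                MPI_Allreduce(&local_ok, &global_ok, 1, MPI_INT, MPI_MIN,
                              MPI_COMM_WORLD);
                if (!global_ok) {
                    write_checkpoint(rank, step);   /* stop for a restart    */
                    break;
                }
            }
        }

        MPI_Finalize();
        return 0;
    }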

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate as a starting point for discussions at the workshop.

Computational Hours Required per Year

 500,000,000

Anticipated Number of Cores to be Used in a Typical Production Run

 1,000,000

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

 20-100

Anticipated Total Memory Used per Run

 100,000 GB

Anticipated Minimum Memory Required per Core

 0.1 GB

Anticipated total data read & written per run

 25,000 GB

Anticipated size of checkpoint file(s)

 5,000 GB

Anticipated Amount of Data Moved In/Out of NERSC

50 GB per day

Anticipated On-Line File Storage Required (For I/O from a Running Job)

5 TB and 3,000 Files

Anticipated Off-Line Archival Storage Required

10 TB and 100 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?

A new physics algorithm, currently under development, is needed that utilizes full-f ions and delta-f electrons. Fully parallelized particle and grid data are needed for enhanced data locality and a reduced memory requirement.
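
The sketch below illustrates, with hypothetical structure and field names rather than the actual XGC data structures, the kind of per-rank layout this implies: full-f ion markers with fixed weights, delta-f electron markers with evolving weights, and a locally owned portion of the field grid on each MPI rank. The marker counts are illustrative only.

    /* Hypothetical per-rank data layout (not the XGC structures).          */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {                 /* full-f ion marker                   */
        double R, Z, phi, v_par, mu;
        double weight;               /* fixed full-f marker weight          */
    } IonMarker;

    typedef struct {                 /* delta-f electron marker             */
        double R, Z, phi, v_par, mu;
        double w;                    /* evolving delta-f weight (~df/f)     */
    } ElectronMarker;

    typedef struct {                 /* state owned by one MPI rank         */
        IonMarker      *ions;        size_t n_ions;
        ElectronMarker *electrons;   size_t n_electrons;
        double         *phi_local;   size_t n_grid_local;  /* local grid    */
    } DomainPartition;

    int main(void)
    {
        DomainPartition d = {0};
        d.n_ions = 1000000;  d.n_electrons = 1000000;  d.n_grid_local = 50000;
        d.ions      = calloc(d.n_ions,       sizeof *d.ions);
        d.electrons = calloc(d.n_electrons,  sizeof *d.electrons);
        d.phi_local = calloc(d.n_grid_local, sizeof *d.phi_local);

        printf("per-rank particle memory: %.2f GB\n",
               (d.n_ions * sizeof(IonMarker) +
                d.n_electrons * sizeof(ElectronMarker)) / 1.0e9);

        free(d.ions); free(d.electrons); free(d.phi_local);
        return 0;
    }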

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).

0.1 GB memory/core

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

Since XGC1 is anticipated to run near the maximal capacity of the new machines over the next 5 years, efficient fault-tolerance services will be needed.

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

We have been quite successful in adapting to the current level of multi-core architecture by using the MPI/OpenMP hybrid mode. We find that all-OpenMP operation within a node is not the optimal solution on the 12-core XT5; instead, the best solution was two OpenMP processes per node. As the number of cores per processing element increases, our low-level strategy is to find the highest-performance mixture of OpenMP and MPI per node. At a higher level, our 3-5 year strategy is to develop asynchronous algorithms for effective utilization of many heterogeneous cores. A run-time scheduler such as StarPU will be used to coordinate and map threads onto computational resources. Another approach is the incorporation of partitioned global address space (PGAS) languages, which offer a means of expressing data locality.
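
The following minimal sketch (placeholder particle work, not XGC code) shows the basic hybrid pattern we rely on: each MPI process owns its slice of the particles and pushes them with an OpenMP-threaded loop, with the split between processes and threads per node left to the job launcher.

    /* Minimal MPI/OpenMP hybrid sketch with a placeholder particle loop.   */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NP_LOCAL 100000          /* particles owned by this MPI rank    */

    int main(int argc, char **argv)
    {
        int rank, provided;
        double *x = malloc(NP_LOCAL * sizeof *x);

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < NP_LOCAL; ++i) x[i] = (double)i;

        #pragma omp parallel for schedule(static)   /* threaded push        */
        for (int i = 0; i < NP_LOCAL; ++i)
            x[i] += 1.0e-3 * x[i];                  /* placeholder work     */

        if (rank == 0)
            printf("thread level %d, %d OpenMP threads per rank\n",
                   provided, omp_get_max_threads());

        MPI_Finalize();
        free(x);
        return 0;
    }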
 
We are also looking into GPGPUs. Sparse matrix-vector multiply has been demonstrated on GPGPUs and is supported by an optimized library from NVIDIA. Similarly, multigrid has been demonstrated to have an efficient implementation on GPGPUs. These are expected to be useful for the XGC particle-in-cell codes.
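
As a concrete reference for the kernel we expect to offload, the sketch below gives a plain CPU version of the sparse matrix-vector product in CSR format; the GPU implementations provided by the vendor libraries are not reproduced here, and the 3x3 matrix is only an example.

    /* CPU reference sketch of the CSR sparse matrix-vector product.        */
    #include <stdio.h>

    static void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                         const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* 3x3 example: [[2,1,0],[0,3,0],[1,0,4]] stored in CSR form.       */
        int    row_ptr[] = {0, 2, 3, 5};
        int    col_idx[] = {0, 1, 1, 0, 2};
        double val[]     = {2, 1, 3, 1, 4};
        double x[]       = {1, 1, 1}, y[3];

        spmv_csr(3, row_ptr, col_idx, val, x, y);
        printf("y = [%g %g %g]\n", y[0], y[1], y[2]);   /* expect [3 3 5]   */
        return 0;
    }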

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

We will have higher-fidelity 5D gyrokinetic simulations of the ITER plasma, including kinetic electrons and electromagnetic turbulence, in realistic diverted magnetic field geometry. This will allow us to understand many physical phenomena that are necessary for successful ITER research but are still unknown, and that have not been accessible with the present reduced models.
 
The ability to handle very large jobs and more CPU hours are important for our project.