

FES Requirements Worksheet

1.1. Project Information - Center for Simulation of Wave-Plasma Interactions (aka RF SciDAC)

Document Prepared By

Lee Berry, Paul Bonoli, David Green

Project Title

Center for Simulation of Wave-Plasma Interactions (aka RF SciDAC)

Principal Investigator

Paul Bonoli

Participating Organizations

Massachusetts Institute of Technology, Princeton Plasma Physics Laboratory, Oak Ridge National Laboratory, Tech-X Corporation, CompX Corporation, Lodestar Research Corporation, and General Atomics.

Funding Agencies

 DOE SC  DOE NNSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

The overarching goal of the project is to quantitatively understand how high-power (tens of MW) radio frequency (RF) waves in the ion cyclotron range of frequencies (ICRF) and the lower hybrid range of frequencies (LHRF) propagate from an external antenna and how they are subsequently absorbed in a tokamak plasma. This capability is needed to understand how to optimally use this power to heat the plasma, drive current, control plasma profiles, control plasma stability, and avoid parasitic losses in fusion plasmas, including ITER. The problem is computationally intensive because of its non-linear, 3D, multiscale character. Present 3D runs for linear and quasi-linear models of the plasma core can take 100,000-200,000 processor hours on today's machines. Three-dimensional non-linear runs that couple the core and edge (including the antenna) are expected to take 10-100 times more cycles. Specific scientific objectives include: 
• Coupled core-to-edge simulations that lead to an increased understanding of parasitic losses in the boundary plasma between the RF antenna and the core plasma. 
• Simulations of core interactions of RF power with energetic electrons and ions to understand how these species affect power flow in the confined plasma. 
• RF effects on fast-particle-driven instabilities, to understand whether these interactions increase (or decrease) the instability drive that can lead to reduced fusion power. 
To support these goals, we will have to develop improved algorithms that provide the physics fidelity, resolution, and/or statistics needed to address these issues and that efficiently utilize new computer architectures. 

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

A separate file will be sent. 

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Queue times for medium-sized jobs, system stability, and software stability (compiler versions, etc.). These seem to be issues at all HPC centers. 

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other: Local Clusters

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 1,500,000 Core-Hours

NERSC Hours Used in 2009

 1,546,052 Core-Hours

Number of Cores Used in Typical Production Run

5000

Wallclock Hours of Single Typical Production Run

4

Total Memory Used per Run

 5000 GB

Minimum Memory Required per Core

 1 GB

Total Data Read & Written per Run

 0.1 GB

Size of Checkpoint File(s)

 GB

Amount of Data Moved In/Out of NERSC

 2 GB per month

On-Line File Storage Required (For I/O from a Running Job)

 0.1 TB and 100 Files

Off-Line Archival Storage Required

 TB and  Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

Specialized (and also old) graphics libraries such as NCAR Graphics and PGPLOT often lag behind compiler changeovers.

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

50,000,000

Anticipated Number of Cores to be Used in a Typical Production Run

2,000,000

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

4

Anticipated Total Memory Used per Run

 600,000 GB

Anticipated Minimum Memory Required per Core

 1 GB

Anticipated total data read & written per run

1,200,000  GB

Anticipated size of checkpoint file(s)

 600,000 GB

Anticipated Amount of Data Moved In/Out of NERSC

 100 GB per month

Anticipated On-Line File Storage Required (For I/O from a Running Job)

10 TB and 10 Files

Anticipated Off-Line Archival Storage Required

 10 TB and 1000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Implementation of a next-level checkpoint-restart and error-detection model will be a major component. Schemes such as diskless checkpointing (sketched below), minimizing checkpoint size, and maintaining redundant hardware will be required as the processor count and, equivalently, the hardware failure rate rise. This will be especially important if the run time of our at-scale jobs grows. Fortunately, we anticipate reasonably short (~4 hr) run-time requirements. 
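
As a concrete illustration only, the following is a minimal sketch of one possible diskless-checkpointing scheme, in which paired MPI ranks mirror each other's state in memory so that a single-node failure can be recovered without disk I/O. The names (checkpoint_to_buddy, field_state, NFIELD) are illustrative placeholders, not code from any CSWPI application.

/*
 * Sketch of "diskless" (in-memory buddy) checkpointing: each rank keeps a
 * copy of a partner rank's state, so restart after a single-node failure
 * does not require disk I/O.  All names here are illustrative.
 */
#include <mpi.h>
#include <stdlib.h>

#define NFIELD 1024   /* size of the local state to protect (illustrative) */

static void checkpoint_to_buddy(const double *local, double *buddy_copy,
                                int n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Pair ranks (0<->1, 2<->3, ...); each pair exchanges state copies. */
    int buddy = (rank % 2 == 0) ? rank + 1 : rank - 1;
    if (buddy >= size) buddy = rank;   /* odd rank count: keep own copy */

    MPI_Sendrecv(local, n, MPI_DOUBLE, buddy, 0,
                 buddy_copy, n, MPI_DOUBLE, buddy, 0,
                 comm, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double *field_state = calloc(NFIELD, sizeof(double)); /* this rank's state */
    double *buddy_copy  = calloc(NFIELD, sizeof(double)); /* partner's state   */

    /* ... time stepping / matrix fill would go here ... */

    /* Periodically mirror state into the buddy's memory instead of disk. */
    checkpoint_to_buddy(field_state, buddy_copy, NFIELD, MPI_COMM_WORLD);

    free(field_state);
    free(buddy_copy);
    MPI_Finalize();
    return 0;
}

In practice the mirrored copy would be combined with parity or encoding across larger rank groups to tolerate more than one simultaneous failure.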
 
Separation of the on-node programming model from the inter-node programming model, i.e., removing MPI from on-node tasks in preparation for accelerators and high per-node core counts; a hybrid sketch is given below. While some progress will be made here, this will be a long-term effort and will depend on the hardware definition. 
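
The sketch below shows the intended separation under the assumption of an MPI + OpenMP hybrid: MPI is initialized for funneled threading and used only for inter-node communication, while on-node work is expressed with OpenMP threads. The work loop and names (local_sum, NLOCAL) are placeholders, not code from AORSA or TORIC.

/*
 * Hybrid model sketch: MPI between nodes, OpenMP threads within a node.
 * Only the master thread makes MPI calls (MPI_THREAD_FUNNELED).
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NLOCAL 1000000L   /* per-rank work size (illustrative) */

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* On-node parallelism: OpenMP threads only, no MPI inside this region. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < NLOCAL; i++)
        local_sum += 1.0 / (double)(i + 1 + rank);

    /* Inter-node parallelism: a single MPI call made by the master thread. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}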

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).

Continued efficient performance of dense-matrix factorization libraries, together with file systems able to handle large restart files, is needed (the matrix and its factors are the only practical restart data for the largest problems). Without extensive code development, CPU (rather than GPU) capability needs to remain high for AORSA/TORIC-LH. For the particle codes, DC/ORBIT-RF, less development is required, and more GPU capability can readily be used.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

A logical continuation and expansion of the present software, services, and infrastructure would be reasonable, assuming that the restart capability can be accommodated within the anticipated increases in file-system capability. However, hardware and compiler support for PGAS languages such as Coarray Fortran and UPC would be desirable. 

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

Our first need is that efficient libraries be developed; FFTs, sparse, block, and dense matrix factorizations, and parallel function evaluation are prominent needs. Second, we have identified CPU-intensive kernels in our codes, with local data needs, that would benefit significantly from GPU technology. Examples include the double integral in the matrix fill for AORSA and a 4D sum in AORSA; the latter has already been demonstrated on ORNL machines with GPUs (an illustrative offload sketch follows). Some effort to realize this potential is planned within the present project, but additional resources will be required for a production implementation.  
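
Purely as an illustration of the kind of kernel offload intended, the sketch below moves a generic 4D sum onto an accelerator using OpenMP target directives; this is one possible route, not the GPU implementation demonstrated at ORNL, and the summand term() and problem size N are hypothetical stand-ins for the actual integrand and grid.

/*
 * Illustrative offload of a 4-D sum using OpenMP target directives.
 * Falls back to the host if no accelerator is available.
 */
#include <stdio.h>

#define N 64   /* points per dimension (illustrative) */

#pragma omp declare target
static double term(int i, int j, int k, int l)
{
    /* Stand-in for the real summand / integrand. */
    return 1.0 / (double)(1 + i + j + k + l);
}
#pragma omp end declare target

int main(void)
{
    double sum = 0.0;

    /* Collapse the four nested loops and reduce on the device. */
    #pragma omp target teams distribute parallel for collapse(4) \
            reduction(+:sum) map(tofrom: sum)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                for (int l = 0; l < N; l++)
                    sum += term(i, j, k, l);

    printf("4-D sum = %f\n", sum);
    return 0;
}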
 
But even this type of effort depends on what the hardware will be. A general strategy will be to separate inter-node communications from intra-node memory management; that is, to use MPI only for inter-node communications. This should provide a good starting point for future development. However, even this step will be limited by resources. 

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

The objectives stated in Section 2 and the resources listed in Section 4 reflect an 'expanded HPC resources' set of goals and requirements. More resources, with processor counts approaching 1,000,000, are required to resolve the 3D, non-linear physics that we believe controls the wave-plasma interactions in the three objectives stated above, namely: 
• Coupled core-to-edge simulations that lead to an increased understanding of parasitic losses in the boundary plasma between the RF antenna and the core plasma. 
• Simulations of core interactions of RF power with energetic electrons and ions to understand how these species affect power flow in the confined plasma. 
• RF effects on fast-particle-driven instabilities, to understand whether these interactions increase (or decrease) the instability drive that can lead to reduced fusion power.