
FES Requirements Worksheet

1. Project Information

Document Prepared By

Linda Sugiyama

Project Title

Title

Principal Investigator

Linda Sugiyama

Participating Organizations

test data

Funding Agencies

 DOE SC  DOE NNSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

Extended MHD simulation using the M3D code investigates magnetically confined fusion plasmas in toroidal configurations. A major focus is the study of realistically shaped plasmas (D-shaped cross sections with one or two X-points on the plasma boundary, near the corners of the D) with a freely moving boundary, surrounded by a "vacuum" region that is in turn surrounded by a solid wall. Additional outer vacuum-wall systems can exist. Previous studies have concentrated on plasmas bounded by a rigid conducting wall, but a freely moving boundary introduces important new physics, including a natural source of magnetic chaos near the plasma edge that couples into the plasma core. Recent results at high resolution (due to both improved algorithms and computers with many more available processors) have allowed nonlinear simulations of experimental plasmas using realistic or nearly realistic values of the plasma resistivity, a long-sought goal for fusion MHD simulations. Although not studied in detail, the present code and its computational algorithms are capable of handling MHD turbulence in existing plasmas (toroidal harmonics up to at least n=40, poloidal harmonics at least 4X higher, and a radial grid on the order of half the thermal ion gyroradius).
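As a rough numerical illustration of the radial-resolution criterion above (assumed, illustrative parameters; not taken from an actual M3D run):

import math

# Assumed, illustrative plasma parameters (not from an actual M3D run):
T_i_keV = 10.0        # ion temperature [keV]
B_T     = 5.0         # magnetic field strength [T]
a_m     = 2.0         # plasma minor radius [m]
m_i     = 3.34e-27    # deuteron mass [kg]
e       = 1.602e-19   # elementary charge [C]

T_i_J = T_i_keV * 1.0e3 * e                   # ion temperature in joules
rho_i = math.sqrt(m_i * T_i_J) / (e * B_T)    # thermal ion gyroradius [m]

dr = 0.5 * rho_i          # radial grid spacing of about half the gyroradius
n_radial = a_m / dr       # radial grid points across the minor radius

print(f"rho_i ~ {rho_i*1e3:.1f} mm, dr ~ {dr*1e3:.1f} mm, "
      f"~{n_radial:.0f} radial grid points")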
 
Extension of the physics of the MHD plasma model will become a major focus over the next 3-5 years, since MHD does not completely describe the actual plasma and the differences will become much clearer as more simulations are carried out; both the physics and the computational aspects of the extension will be central. Additional physics in extended MHD will also contribute directly to turbulence, in particular anisotropic plasma temperature (different values along and across the strong magnetic field; the two temperatures can be modeled as separate fluids). Another focus will be the MHD simulation of next-generation fusion experiments such as ITER, whose large size and low collisionality mean that much higher spatial resolution will be needed. A third is the tighter coupling of MHD and particle codes at the time-step level, as currently being developed in the SciDAC CPES project and later the FSP (e.g., M3D and XGC); a schematic of such a coupling loop is sketched below.
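For concreteness only, the following is a conceptual sketch of time-step-level fluid/particle coupling; the function names, exchanged quantities, and toy updates are placeholders, not the actual M3D/XGC or CPES interfaces.

import numpy as np

# Placeholder stand-ins for the two codes; not the real M3D or XGC interfaces.
def particle_advance(particles, fields, dt):
    """Push particles in the current fields (toy update)."""
    return particles + dt * np.mean(fields)

def compute_closure(particles):
    """Form moments (e.g., pressures, heat fluxes) fed back to the fluid (toy)."""
    return np.full(3, np.mean(particles))

def fluid_advance(fields, closure, dt):
    """Advance the MHD fields one step using the kinetic closure terms (toy)."""
    return fields + dt * (closure - fields)

fields, particles = np.zeros(3), np.ones(10)
dt, nsteps = 0.01, 100

for step in range(nsteps):
    particles = particle_advance(particles, fields, dt)  # kinetic step
    closure   = compute_closure(particles)               # moments for the fluid
    fields    = fluid_advance(fields, closure, dt)       # MHD step at the same dt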

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

M3D code: MPP version and OpenMP version 
Toroidal configuration. Plasma surrounded by an MHD vacuum, bounded by a rigid wall 
Finite volume (triangles) in 2D poloidal planes, linear or higher order 
Unstructured grid 
Fourier or finite difference in toroidal angle (schematic sketch after this list) 
MPP via PETSc MPI library 
OpenMP version uses its own subroutines, including plotting 
MPP visualization output written in HDF5; AVS/Express interface
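A simplified, assumed sketch of the field representation implied by the list above (nodal values on an unstructured poloidal mesh combined with a Fourier series in the toroidal angle); this is not M3D's actual data layout:

import numpy as np

# Schematic only: nodal coefficients on an unstructured poloidal mesh plus a
# Fourier series in the toroidal angle phi (assumed sizes).
n_vertices = 5000    # vertices of the triangular poloidal mesh (assumed)
n_max      = 40      # highest toroidal harmonic retained

rng = np.random.default_rng(0)
a = rng.standard_normal((n_max + 1, n_vertices))  # cos(n*phi) coefficients
b = rng.standard_normal((n_max + 1, n_vertices))  # sin(n*phi) coefficients

def field_at(phi):
    """Evaluate f at every poloidal-mesh vertex for one toroidal angle phi."""
    n = np.arange(n_max + 1)[:, None]             # harmonic numbers as a column
    return (a * np.cos(n * phi) + b * np.sin(n * phi)).sum(axis=0)

f = field_at(0.3)    # field values on the whole poloidal plane at phi = 0.3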

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Difficulty in: 
a. Running moderate-size jobs (a few hundred to a few thousand processors) for long wall-clock times, to follow the nonlinear time evolution to saturation. Jobs of 800 to 2000 processors have a wall-clock turnaround time that is too long for practical full runs. (This job size should scale well.) The memory requirement per core and per run is small and is set by the job speed; approximately 3000 grid points per processor gives a reasonable wall-clock time (a rough sizing sketch follows this list). 
b. Running many related small jobs simultaneously. 
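A rough sizing example for the ~3000-grid-points-per-processor rule of thumb in item (a); the mesh sizes are illustrative assumptions, not an actual M3D grid:

# Rough sizing from the ~3000-grid-points-per-processor rule of thumb.
points_per_plane = 50_000   # vertices per poloidal plane (assumed)
n_planes         = 48       # toroidal planes retained (assumed)
points_per_proc  = 3_000    # target that gives a reasonable wall-clock time

total_points = points_per_plane * n_planes
n_procs = total_points // points_per_proc
print(f"{total_points:,} grid points -> roughly {n_procs:,} processors")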

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 Core-Hours

NERSC Hours Used in 2009

 0 Core-Hours

Number of Cores Used in Typical Production Run

 432-768

Wallclock Hours of Single Typical Production Run

 200-300

Total Memory Used per Run

 GB

Minimum Memory Required per Core

 GB

Total Data Read & Written per Run

 22 GB

Size of Checkpoint File(s)

 0.44 GB

Amount of Data Moved In/Out of NERSC

 GB per  

On-Line File Storage Required (For I/O from a Running Job)

 TB and  Files

Off-Line Archival Storage Required

 TB and  Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

PETSc MPI library 
HDF5 
AVS/Express 
VisIt 

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

 

Anticipated Number of Cores to be Used in a Typical Production Run

 

Anticipated Wallclock Hours of a Typical Production Run Using the Number of Cores Given Above

 

Anticipated Total Memory Used per Run

 GB

Anticipated Minimum Memory Required per Core

 GB

Anticipated total data read & written per run

 GB

Anticipated size of checkpoint file(s)

 GB

Anticipated Amount of Data Moved In/Out of NERSC

 GB per  

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 TB and  Files

Anticipated Off-Line Archival Storage Required

 TB and  Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Physics will emphasize more turbulent and chaotic simulations in more complicated configurations. The number of theoretical and computational unknowns makes it difficult to predict future requirements. 
Denser, nonuniform unstructured spatial grids will need better search and connection methods (an illustration of grid search follows this list); possibly 3D elements to replace the Fourier dependence in the toroidal direction. 
Coupled MHD and particle codes, or internal particle models, will receive greater emphasis.
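As one possible illustration of the kind of search machinery that denser unstructured grids may require (an assumption, not M3D's current method), a spatial tree gives fast nearest-vertex lookup:

import numpy as np
from scipy.spatial import cKDTree

# Illustration only: nearest-vertex lookup on a dense, nonuniform unstructured
# poloidal mesh; M3D's actual search/connection scheme may differ.
rng = np.random.default_rng(1)
vertices = rng.uniform(-1.0, 1.0, size=(200_000, 2))  # (R, Z) vertex positions (assumed)

tree = cKDTree(vertices)                 # build once per mesh
queries = rng.uniform(-1.0, 1.0, size=(1_000, 2))
dist, idx = tree.query(queries)          # nearest mesh vertex for each query point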

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).

Not known.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

*If computers become multi-core, a multi-core PETSc will be needed. 
*Better visualization for large parallel runs with many time slices, especially 3D. 
*Porting of the MPP code version to run on standard clusters (or the cloud?) - the major stumbling block now is that the MPI checkpoint writes fail on non-MPP file systems. This is critical for validation and verification; a number of users are interested in applying M3D at a few hundred processors on their local clusters. 
*Development of tests for random processor failure (if possible). While MHD is typically sensitive to spurious numbers generated in any processor (the job will usually blow up), some may creep in and be difficult to detect. Some instances occur at 400-500 processors; the problem will be proportional to the number of processors. A minimal sketch of such a check is given after this list. 
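A minimal sketch of such a cross-processor sanity check (an assumed approach, written with mpi4py for brevity; the production code is Fortran/MPI and the threshold is a placeholder):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def spurious_value_check(local_field, max_abs=1.0e12):
    """Flag non-finite or implausibly large local values and agree globally."""
    local_bad = int(not np.all(np.isfinite(local_field))
                    or np.max(np.abs(local_field)) > max_abs)
    n_bad_ranks = comm.allreduce(local_bad, op=MPI.SUM)
    if n_bad_ranks > 0 and comm.rank == 0:
        print(f"warning: suspect data on {n_bad_ranks} rank(s) at this step")
    return n_bad_ranks == 0

# Example: test the local solution array each step before writing a checkpoint.
ok = spurious_value_check(np.zeros(1000))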

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

M3D is designed to separate the physics from the computational algorithms. The OpenMP version of M3D should help develop a mixed MPP/multi-core computation model. The MPP and OpenMP versions share the same Fortran code that describes the physics. Operators, global operations (e.g., max, min, volume integrals), and matrix solves call subroutines that use the appropriate algorithm. The Fortran version also preserves a significant part of the matrix arithmetic structure from the early vector code. A multi-core version of PETSc would be the simplest solution. A schematic illustration of this operator-backend separation is sketched below. 
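A schematic illustration (in Python for brevity; the actual M3D implementation is Fortran and these interfaces are hypothetical) of physics code written once against interchangeable serial and MPI backends:

import numpy as np

# Hypothetical backend interface: the physics layer calls global_max and
# volume_integral without knowing whether the run is serial or MPI-distributed.
class SerialOps:
    def global_max(self, local_values):
        return np.max(local_values)
    def volume_integral(self, local_values, local_volumes):
        return np.sum(local_values * local_volumes)

class MPIOps:
    def __init__(self, comm):
        from mpi4py import MPI     # imported here so the serial path needs no MPI
        self._MPI = MPI
        self.comm = comm
    def global_max(self, local_values):
        return self.comm.allreduce(np.max(local_values), op=self._MPI.MAX)
    def volume_integral(self, local_values, local_volumes):
        return self.comm.allreduce(np.sum(local_values * local_volumes),
                                   op=self._MPI.SUM)

def physics_step(ops, temperature, volumes):
    """Physics-level code written once against the backend interface."""
    return ops.global_max(temperature), ops.volume_integral(temperature, volumes)

# Serial usage; an MPI run would construct MPIOps(MPI.COMM_WORLD) instead.
t_max, energy = physics_step(SerialOps(), np.ones(100), np.full(100, 0.01))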

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

Understand the edge of a fusion plasma well enough to gain practical control of edge instabilities (suppress dangerous large instabilities, while allowing small oscillations that remove impurities and promote a favorable plasma steady state). Begin to understand the edge-generated chaos in fusion plasmas and its importance to the core plasma and to global energy and particle confinement. Confinement is the main unknown that makes it difficult to design a fusion reactor (or a next-step burning device). 
 
Expanded resources: long wall-clock-time jobs with more checkpoints saved; the capability to run multiple jobs, both small and large (parameter scans, comparisons of different physics models); and visualization and other analysis tools for large jobs.