
FES Requirements Worksheet

1. Project Information

Document Prepared By

Linda Sugiyama

Project Title

Title

Principal Investigator

Linda Sugiyama

Participating Organizations

test data

Funding Agencies

 DOE SC  DOE NNSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

Extended MHD simulation using the M3D code investigates magnetically confined fusion plasmas in toroidal configurations. A major focus is the study of realistically shaped plasmas (D-shaped cross sections with one or two X-points on the plasma boundary, near the corners of the D) with a freely moving boundary, surrounded by a "vacuum" region that is in turn surrounded by a solid wall. Additional outer vacuum-wall systems can exist. Previous studies have concentrated on plasmas bounded by a rigid conducting wall, but a freely moving boundary introduces important new physics, including a natural source of magnetic chaos near the plasma edge that couples into the plasma core. Recent results at high resolution (due to both improved algorithms and computers with many more available processors) have allowed nonlinear simulations of experimental plasmas using realistic or nearly realistic values of the plasma resistivity, a long-sought goal for fusion MHD simulations. Although not studied in detail, the present code and its computational algorithms are capable of handling MHD turbulence in existing plasmas (toroidal harmonics up to at least n=40, poloidal harmonics at least 4X higher, and a radial grid on the order of half the thermal ion gyroradius).
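As a rough numerical illustration of the radial-resolution criterion above (assumed, illustrative parameters; not taken from an actual M3D run):

import math

# Assumed, illustrative plasma parameters (not from an actual M3D run):
T_i_keV = 10.0        # ion temperature [keV]
B_T     = 5.0         # magnetic field strength [T]
a_m     = 2.0         # plasma minor radius [m]
m_i     = 3.34e-27    # deuteron mass [kg]
e       = 1.602e-19   # elementary charge [C]

T_i_J = T_i_keV * 1.0e3 * e                   # ion temperature in joules
rho_i = math.sqrt(m_i * T_i_J) / (e * B_T)    # thermal ion gyroradius [m]

dr = 0.5 * rho_i          # radial grid spacing of about half the gyroradius
n_radial = a_m / dr       # radial grid points across the minor radius

print(f"rho_i ~ {rho_i*1e3:.1f} mm, dr ~ {dr*1e3:.1f} mm, "
      f"~{n_radial:.0f} radial grid points")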
 
Extension of the physics of the MHD plasma model will become a major focus over the next 3-5 years, since MHD does not completely describe the actual plasma and the differences will become much clearer as more simulations are carried out; both the physics and the computational aspects of the extension will be central. Additional physics in extended MHD will also contribute directly to turbulence, in particular anisotropic plasma temperature (different values along and across the strong magnetic field; the two temperatures can be modeled as separate fluids). Another focus will be the MHD simulation of next-generation fusion experiments such as ITER, whose large size and low collisionality mean that much higher spatial resolution will be needed. A third is the tighter coupling of MHD and particle codes at the time-step level, as currently being developed in the SciDAC CPES project and later the FSP (e.g., M3D and XGC); a schematic of such a coupling loop is sketched below.
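For concreteness only, the following is a conceptual sketch of time-step-level fluid/particle coupling; the function names, exchanged quantities, and toy updates are placeholders, not the actual M3D/XGC or CPES interfaces.

import numpy as np

# Placeholder stand-ins for the two codes; not the real M3D or XGC interfaces.
def particle_advance(particles, fields, dt):
    """Push particles in the current fields (toy update)."""
    return particles + dt * np.mean(fields)

def compute_closure(particles):
    """Form moments (e.g., pressures, heat fluxes) fed back to the fluid (toy)."""
    return np.full(3, np.mean(particles))

def fluid_advance(fields, closure, dt):
    """Advance the MHD fields one step using the kinetic closure terms (toy)."""
    return fields + dt * (closure - fields)

fields, particles = np.zeros(3), np.ones(10)
dt, nsteps = 0.01, 100

for step in range(nsteps):
    particles = particle_advance(particles, fields, dt)  # kinetic step
    closure   = compute_closure(particles)               # moments for the fluid
    fields    = fluid_advance(fields, closure, dt)       # MHD step at the same dt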

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

M3D code: MPP version and OpenMP version 
Toroidal configuration. Plasma surrounded by an MHD vacuum, bounded by a rigid wall 
Finite volume (triangles) in 2D poloidal planes, linear or higher order 
Unstructured grid 
Fourier or finite difference in toroidal angle (schematic sketch after this list) 
MPP via PETSc MPI library 
OpenMP version uses its own subroutines, including plotting 
MPP visualization output written in HDF5; AVS/Express interface
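A simplified, assumed sketch of the field representation implied by the list above (nodal values on an unstructured poloidal mesh combined with a Fourier series in the toroidal angle); this is not M3D's actual data layout:

import numpy as np

# Schematic only: nodal coefficients on an unstructured poloidal mesh plus a
# Fourier series in the toroidal angle phi (assumed sizes).
n_vertices = 5000    # vertices of the triangular poloidal mesh (assumed)
n_max      = 40      # highest toroidal harmonic retained

rng = np.random.default_rng(0)
a = rng.standard_normal((n_max + 1, n_vertices))  # cos(n*phi) coefficients
b = rng.standard_normal((n_max + 1, n_vertices))  # sin(n*phi) coefficients

def field_at(phi):
    """Evaluate f at every poloidal-mesh vertex for one toroidal angle phi."""
    n = np.arange(n_max + 1)[:, None]             # harmonic numbers as a column
    return (a * np.cos(n * phi) + b * np.sin(n * phi)).sum(axis=0)

f = field_at(0.3)    # field values on the whole poloidal plane at phi = 0.3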

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Difficulty in: 
a. Running moderate-size jobs (a few hundred to a few thousand processors) for long wall-clock times, to follow the nonlinear time evolution to saturation. Jobs of 800 to 2000 processors have a wall-clock turnaround time that is too long for practical full runs. (This job size should scale well.) The memory requirement per core and per run is small and is set by the job speed; approximately 3000 grid points per processor gives a reasonable wall-clock time (a rough sizing sketch follows this list). 
b. Running many related small jobs simultaneously. 
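A rough sizing example for the ~3000-grid-points-per-processor rule of thumb in item (a); the mesh sizes are illustrative assumptions, not an actual M3D grid:

# Rough sizing from the ~3000-grid-points-per-processor rule of thumb.
points_per_plane = 50_000   # vertices per poloidal plane (assumed)
n_planes         = 48       # toroidal planes retained (assumed)
points_per_proc  = 3_000    # target that gives a reasonable wall-clock time

total_points = points_per_plane * n_planes
n_procs = total_points // points_per_proc
print(f"{total_points:,} grid points -> roughly {n_procs:,} processors")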

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 Core-Hours

NERSC Hours Used in 2009

 0 Core-Hours

Number of Cores Used in Typical Production Run

 432-768

Wallclock Hours of Single Typical Production Run

 200-300

Total Memory Used per Run

 GB

Minimum Memory Required per Core

 GB

Total Data Read & Written per Run

 22 GB

Size of Checkpoint File(s)

 0.44 GB

Amount of Data Moved In/Out of NERSC

 GB per  

On-Line File Storage Required (For I/O from a Running Job)

 TB and  Files

Off-Line Archival Storage Required

 TB and  Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

PETSc MPI library 
HDF5 
AVS/Express 
VisIt 

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

 

Anticipated Number of Cores to be Used in a Typical Production Run

 

Anticipated Wallclock Hours of a Typical Production Run Using the Number of Cores Given Above

 

Anticipated Total Memory Used per Run

 GB

Anticipated Minimum Memory Required per Core

 GB

Anticipated total data read & written per run

 GB

Anticipated size of checkpoint file(s)

 GB

Anticipated Amount of Data Moved In/Out of NERSC

 GB per  

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 TB and  Files

Anticipated Off-Line Archival Storage Required

 TB and  Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Physics will emphasize more turbulent and chaotic simulations in more complicated configurations. The number of theoretical and computational unknowns makes it difficult to predict future requirements. 
Denser, nonuniform unstructured spatial grids will need better search and connection methods (an illustration of grid search follows this list); possibly 3D elements to replace the Fourier dependence in the toroidal direction. 
Coupled MHD and particle codes, or internal particle models, will receive greater emphasis.
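As one possible illustration of the kind of search machinery that denser unstructured grids may require (an assumption, not M3D's current method), a spatial tree gives fast nearest-vertex lookup:

import numpy as np
from scipy.spatial import cKDTree

# Illustration only: nearest-vertex lookup on a dense, nonuniform unstructured
# poloidal mesh; M3D's actual search/connection scheme may differ.
rng = np.random.default_rng(1)
vertices = rng.uniform(-1.0, 1.0, size=(200_000, 2))  # (R, Z) vertex positions (assumed)

tree = cKDTree(vertices)                 # build once per mesh
queries = rng.uniform(-1.0, 1.0, size=(1_000, 2))
dist, idx = tree.query(queries)          # nearest mesh vertex for each query point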

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).

Not known.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

*If computers become multi-core, a multi-core PETSc will be needed. 
*Better visualization for large parallel runs with many time slices, especially 3D. 
*Porting of the MPP code version to run on standard clusters (or the cloud?) - the major stumbling block now is that the MPI checkpoint writes fail on non-MPP file systems. This is critical for validation and verification; a number of users are interested in applying M3D at a few hundred processors on their local clusters. 
*Development of tests for random processor failure (if possible). While MHD is typically sensitive to spurious numbers generated in any processor (the job will usually blow up), some may creep in and be difficult to detect. Some instances occur at 400-500 processors; the problem will be proportional to the number of processors. A minimal sketch of such a check is given after this list. 
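A minimal sketch of such a cross-processor sanity check (an assumed approach, written with mpi4py for brevity; the production code is Fortran/MPI and the threshold is a placeholder):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def spurious_value_check(local_field, max_abs=1.0e12):
    """Flag non-finite or implausibly large local values and agree globally."""
    local_bad = int(not np.all(np.isfinite(local_field))
                    or np.max(np.abs(local_field)) > max_abs)
    n_bad_ranks = comm.allreduce(local_bad, op=MPI.SUM)
    if n_bad_ranks > 0 and comm.rank == 0:
        print(f"warning: suspect data on {n_bad_ranks} rank(s) at this step")
    return n_bad_ranks == 0

# Example: test the local solution array each step before writing a checkpoint.
ok = spurious_value_check(np.zeros(1000))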

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

M3D is designed to separate the physics from the computational algorithms. The OpenMP version of M3D should help develop a mixed MPP/multi-core computation model. The MPP and OpenMP versions share the same Fortran code that describes the physics. Operators, global operations (e.g., max, min, volume integrals), and matrix solves call subroutines that use the appropriate algorithm. The Fortran version also preserves a significant part of the matrix arithmetic structure from the early vector code. A multi-core version of PETSc would be the simplest solution. A schematic illustration of this operator-backend separation is sketched below. 
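A schematic illustration (in Python for brevity; the actual M3D implementation is Fortran and these interfaces are hypothetical) of physics code written once against interchangeable serial and MPI backends:

import numpy as np

# Hypothetical backend interface: the physics layer calls global_max and
# volume_integral without knowing whether the run is serial or MPI-distributed.
class SerialOps:
    def global_max(self, local_values):
        return np.max(local_values)
    def volume_integral(self, local_values, local_volumes):
        return np.sum(local_values * local_volumes)

class MPIOps:
    def __init__(self, comm):
        from mpi4py import MPI     # imported here so the serial path needs no MPI
        self._MPI = MPI
        self.comm = comm
    def global_max(self, local_values):
        return self.comm.allreduce(np.max(local_values), op=self._MPI.MAX)
    def volume_integral(self, local_values, local_volumes):
        return self.comm.allreduce(np.sum(local_values * local_volumes),
                                   op=self._MPI.SUM)

def physics_step(ops, temperature, volumes):
    """Physics-level code written once against the backend interface."""
    return ops.global_max(temperature), ops.volume_integral(temperature, volumes)

# Serial usage; an MPI run would construct MPIOps(MPI.COMM_WORLD) instead.
t_max, energy = physics_step(SerialOps(), np.ones(100), np.full(100, 0.01))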

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

Understand the edge of a fusion plasma well enough to gain practical control of edge instabilities (suppress dangerous large instabilities, while allowing small oscillations that remove impurities and promote a favorable plasma steady state). Begin to understand the edge-generated chaos in fusion plasmas and its importance to the core plasma and to global energy and particle confinement. Confinement is the main unknown that makes it difficult to design a fusion reactor (or a next-step burning device). 
 
Expanded resources: long wall-clock-time jobs with more checkpoints saved; the capability to run multiple jobs, both small and large (parameter scans, comparisons of different physics models); and visualization and other analysis tools for large jobs.