Alex Friedman

FES Requirements Worksheet

1.1. Project Information - Simulation of intense beams for heavy-ion-fusion science (HEDLP / Inertial Fusion Energy)

Document Prepared By	Alex Friedman
Project Title	Simulation of intense beams for heavy-ion-fusion science (HEDLP / Inertial Fusion Energy)
Principal Investigator	Alex Friedman
Participating Organizations	LBNL, LLNL, PPPL, Univ. of Maryland
Funding Agencies	DOE SC DOE NSA NSF NOAA NIH Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

The US Heavy Ion Fusion Science Virtual National Laboratory (HIFS-VNL), a collaboration of LBNL, LLNL, and PPPL, conducts research on the science of intense heavy-ion beams, on high-energy-density physics (HEDP) and especially Warm Dense Matter (WDM) generated by ion beams, and on target physics for ion-beam-driven Inertial Fusion Energy. Ongoing experiments are therefore focused on generating, compressing, and focusing space-charge-dominated beams and using them to heat thin foils, the evolution and properties of which are then measured. To further this work, a new accelerator facility, the Neutralized Drift Compression Experiment-II (NDCX-II), is under construction at LBNL, with completion planned for 2012. Obtaining maximum benefit from these experiments is a key near-term goal of the simulation program. Recent simulation efforts in support of NDCX-II have concentrated on developing the physics and engineering design of the NDCX-II accelerator, on identifying favorable operating points, and on planning the Warm Dense Matter experiments to be done with its beam once operations ensue. As we transition to support of the actual experiments, the scope of our beam simulations must expand greatly, with more emphasis on detailed simulations of the actual beam as realized and its interaction with the target. This will include extensive studies of the coupled dynamics of the beam and the neutralizing plasma (which allows the beam to be compressed to a compact volume in space and time), and routine coupling of the ion beam data at the target plane from our main beam physics code, Warp, into our target physics codes ALE-AMR (run at NERSC) and Hydra (run at LLNL).

Intense ion beams are non-neutral plasmas and exhibit collective, nonlinear dynamics that must be understood using the kinetic models of plasma physics. This physics is rich and subtle: a wide range in spatial and temporal scales is involved, and effects associated with instabilities and non-ideal processes must be understood. In addition, multispecies effects must be modeled during the interaction of the beams with any stray electrons in the accelerator and with the neutralizing plasma in the target chamber. The models must account for the complex set of interactions among the various species, with the walls, and with the applied and self-fields. Finally, oscillations of the beam core and “mismatches” of the beam confinement can dilute the beam phase space and can parametrically pump particles into a low-density outlying, or “halo”, population; this physics imposes stringent requirements on numerical noise, requiring good particle statistics and mesh resolution. A blend of numerical techniques, all centered around the Particle-In-Cell method, are being used to tackle these problems, including: electrostatic and electromagnetic solvers, adaptive mesh refinement, cut-cell boundaries, large time step particle pusher, implicit field solvers and particle pushers.

Ion beams have a long memory, and initialization of a simulation at mid-system with an idealized particle distribution is often unsatisfactory; thus, a key goal is to develop and extensively exploit an integrated and detailed source-to-target beam simulation capability. The breadth of numerical algorithms that have been implemented in our codes to enable progressing toward high fidelity end-to-end calculations includes: (a) “computationally intensive,” based on massively parallel calculations using the lowest level of approximation with explicit solvers on uniform grids, small time steps, and large numbers of macroparticles; (b) “algorithmically intensive,” based on moderately large parallel calculations using the highest level of approximation with implicit solvers, AMR grids, large time steps, and moderate numbers of macroparticles; and (c) mid-range between computational intensity and algorithmic complexity by using combinations of the above-mentioned techniques. With the anticipated drastic increase in cores per node, decrease in memory per core, and increase in the computational cost of data movement, it will be important to preserve this level of flexibility in choice of algorithms to the user and developer, as it is very possible that one combination of algorithms will perform better on one machine while a different combination will be more adapted to another.

In order to determine material properties (such as equation of state) in the warm dense matter regime (WDM), we need to use simulation codes to help interpret experimental diagnostics. The WDM regime is at the boundaries of solid state physics and plasma physics, between non-degenerate material and degenerate, between ionized and neutral, between liquid and vapor. To understand and simulate the results of experiments that volumetrically heat material, many processes must be combined into a single simulation. These include, phase changes, ionization processes, shock processes, spall processes, and droplet formation, to name a few. Our goal is to fully characterize the experiments that will be conducted on the Neutralized Drift Compression Experiment II (NDCX-II) that is now being constructed at LBNL. This accelerator will use compressed ion beams to rapidly (sub-ns) heat material targets to temperatures of about 10,000 degrees K. The material is heated fast enough that although its temperature is well above the vaporization temperature its inertia will keep it (at least for an inertial confinement time of ~1 ns or so) at solid density. The experiment will be to record the response of the material (e.g. temperature, density, velocity) to infer the equation of state and other properties.

One particular aspect that we will focus on in this simulation project is the addition of surface tension effects to the 3D hydrodynamics code ALE-AMR. The formation of droplets has been one of the dominant features of data from the predecessor to NDCX-II, NDCX-I, yet there are no available codes that can describe the interplay between surface tension, inertial, and van der waals forces that form the droplets. The droplets are predicted to be on the 0.1 micron scale for NDCX-II, yet the gradient scale length for the ion deposition (i.e. the transverse focal spot size) is on the the 500 micron scale. So our intent is to model the formation of droplets initially on a microscopic patch of the target, but eventually increase the computational domain of the simulation until the entire beam heated part of the foil is included in the simulation.

Within 3 to 5 years, we expect that end-to-end simulations of the beam from source through target, and of the detailed target response, will be routinely carried out at high fidelity. We will also be exploring extensions to NDCX-II, and/or a follow-on facility, requiring extensive use of ensembles of runs, as has been done for the baseline design of NDCX-II.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

Our main ion-beam code, named Warp, was originally developed to simulate beam transport in induction accelerators for heavy-ion fusion (HIF). In recent years, however, the physics models in the code have been generalized, so that Warp can model beam injection, complicated boundary conditions, high-density plasmas, a wide variety of lattice elements, and the nonideal physics of beams interacting with walls and plasmas. The code now has an extensive international user base and is being applied to projects far removed from the HIF community.

Warp uses a flexible multi-species particle-in-cell model to describe beam dynamics and the electrostatic or electromagnetic fields in particle accelerators, particularly those driven by induction modules. While the core routines of Warp solve finite-difference representations of Maxwell's equations and relativistic or non-relativistic motion equations, the code also uses a large collection of subordinate models to describe lattice elements and such physical processes as beam injection, desorption, and ionization. The representation of particles by a much smaller number of "macroparticles" can be derived from Boltzmann's equation, describing the evolution of a population of particles interacting by collisions and the collective fields.

Warp is a 3-D time-dependent multiple-species particle-in-cell (PIC) code, with the addition of a Warped-coordinate particle advance to treat particles in a curved beam pipe. Self-fields are obtained via Poisson equations for the scalar and vector potentials or full electromagnetic via Maxwell equations. Simplified models are available for the self- magnetic and inductive forces. Time-dependent applied external fields can be specified through the Python user interface. Warp also has 2-D models, using Cartesian or cylindrical geometry, as well as a module representing the beam with a 4-D Vlasov formulation and with low-order moment equations. Models are available for background gas, wall effects, stray electrons, space-charge-limited and source-limited emission, and atomic processes such as charge exchange. Elaborate initialization and run-time options allow realistic modeling of induction accelerators. A beam may be initialized with one of many analytic distributions or with a distribution synthesized from experimental data, or ions can be emitted from a flat or curved diode surface. Lattice-element fields may be represented by several options, from simple hard-edge analytic forms to first-principles 3-D calculations. Poisson's equation can be solved using several methods, including FFT, Multigrid, and AMR/Multigrid. The EM solver can also use MR. With multigrid, the Shortley-Weller method for the subgrid-resolution description of conductors allows the use of complicated boundary conditions.

Parallelization of Warp is done using domain decomposition with MPI. Warp uses independent spatial decompositions for particles and for field quantities, allowing the particle and field advances to be load-balanced independently, and we recently added more general decompositions. In transverse-slice 2-D runs, the field solution is duplicated in each node, but solved in parallel by processors within a node.
The size and duration of Warp jobs varies tremendously, depending on such factors as problem dimensionality, grid size, duration, particle count, and the physical processes being modeled. However, with our generalized decomposition, we do not foresee any limitation resulting from the code's architecture. For a 3-D test problem using 512x512x512 cells, we have demonstrated excellent parallel scaling of the electromagnetic PIC capability, up to nearly 35,000 processors.

Our Warp projects tend not to be data intensive, using modest amounts of memory but requiring many time steps. We typically run 2-D and 3-D, with of order 100 grid cells along each axis. Higher resolution and large 3-D simulations typically have a mesh of order 100s by 100s by 1000s of grid cells. The data is either a single point per cell, or a 3-D vector per cell. Typically of order 1,000,000 particles are used, with 13 or more variables per particle. We currently use 512 to 1024 processors for typical Franklin runs and 4096 for a few key runs with fine grids and an augmented number of particles.

ALE-AMR is a new code that combines Arbitrary Lagrangian Eulerian (ALE) hydrodynamics with Adaptive Mesh Refinement (AMR) to connect the continuum to microstructural regimes. The code is unique in its ability to model hot radiating plasmas and cold fragmenting solids. Numerical techniques were developed for many of the physics packages to work efficiency on a dynamically moving and adapting mesh. A flexible strength/failure framework allows for pluggable material models. Parallelism is currently MPI-only, although are investigating hybrid models.

We also employ the Hydra code, a mature and capable 3-D radiation hydrodynamics code. This code is run on LLNL computers, which are accessed from LBNL and LLNL by members of our group.

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Our computer time allocation is currently a limiting factor. For some years our requirements for NERSC time were modest (and our allocation commensurate, at about 100,000 cru’s). We were concentrating on NDCX-II machine design using simplified tools such as the 1-D beam physics code (ASP), and on Warm Dense Matter simulations using both Hydra (at LLNL) and a specialized 1-D code (Dish). More recently, two developments compel us to carry out far more demanding simulations at NERSC: (1) the need to capture beam-in-plasma effects (requiring far greater numbers of simulation particles, and a smaller timestep size and grid spacing); and (2) the introduction of the ALE-AMR code into our group (requiring fines zoning for realistic problems). A NISE allocation of 500,000 cru’s has been an enormous boon.

Regarding NERSC-specific issues, the long batch queues and the limitations on interactive use have been impeding factors. Interactivity is useful for debugging and for diagnostics development, because Warp produces many of its diagnostics on-line (we thereby avoid many massive data transfers, offloading mostly processed data). The Lawrencium and Fusion clusters at LBNL have been more convenient, especially for multi-week runs, but of course both are limited in capacity, capability, and availability.

The initialization time of the ALE-AMR code on Franklin is problematic. Our colleagues at LLNL have run the same code on Livermore machines with an Infiniband interconnect and the initialization time is 10 orders of magnitude smaller than the initialization time on Franklin.

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using	NERSC OLCF ACLF NSF Centers Other: LBNL (Lawrencium) & PPPL clusters
Architectures Used	Cray XT IBM Power BlueGene Linux Cluster Other:
Total Computational Hours Used per Year	200k to 300k Core-Hours
NERSC Hours Used in 2009	74k (200k so far in 2010) Core-Hours
Number of Cores Used in Typical Production Run	128 (8 simultaneous cases)
Wallclock Hours of Single Typical Production Run	35 (256 cases total)
Total Memory Used per Run	60 GB
Minimum Memory Required per Core	1 GB
Total Data Read & Written per Run	100 GB
Size of Checkpoint File(s)	<100 GB
Amount of Data Moved In/Out of NERSC	10 GB per month
On-Line File Storage Required (For I/O from a Running Job)	0.1 TB and 500 Files
Off-Line Archival Storage Required	1 TB and 5000 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

The Warp code is deeply tied to Python. The main requirement for it is to have the capability of dynamic loading of shared objects on the compute nodes. A secondary requirement, related to the start up time, is having an efficient mechanism for python on all of the nodes to read in the sizable number of start up scripts and dynamic objects. As noted above, the requirement of rapidly distributing code to all nodes and effecting initialization is shared by ALE-AMR.

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year	60,000,000
Anticipated Number of Cores to be Used in a Typical Production Run	10k/case (>=1 case/run)
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above	1-10/case
Anticipated Total Memory Used per Run	600/case GB
Anticipated Minimum Memory Required per Core	1 GB
Anticipated total data read & written per run	100/case GB
Anticipated size of checkpoint file(s)	<100/case GB
Anticipated Amount of Data Moved In/Out of NERSC	100 GB per month
Anticipated On-Line File Storage Required (For I/O from a Running Job)	10 TB and 1000 Files
Anticipated Off-Line Archival Storage Required	100 TB and 10000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

The main components of Warp’s core loop are those of a standard PIC code: field solve, gathering of field onto particles, particles push, charge or current deposition on grid. The main challenge will be to ensure that the code runs efficiently on upcoming manycores/GPU architectures, where data locality will be of prime importance. Electromagnetic solve (based on FDTD explicit solvers), gathering of fields, particle push and charge or current deposition are local and should not be especially problematic, assuming that the particles are sorted in sufficiently compact grid blocks. Poisson solves, which are based on FFT and multigrid solvers, might be more challenging.

We are incorporating a surface tension model into the ALE-AMR code so we can use this code to investigate the formation of droplets in the NDCX experiments. We are investigating methods to include a surface tension model through operator splitting of the physics.

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).

No special needs. As for many PIC codes, we benefit from fast gather and scatter-add.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

Nothing unusual.

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

Because our central paradigm is the particle-in-cell method, we will partner with others in that community and the computer science community, to develop, test, and implement multi-level decompositions that make effective use of the new architectures that emerge. We plan to extract kernels of the main components and analyze their projected performance on those architectures, and select the most effective methods.

The paradigm is essentially local, but is complicated by the need for a scatter-add deposition of particle currents onto a mesh. Thus integrity of the mesh-based source terms must be preserved by any new algorithm. Initial efforts at porting PIC methods to GPU architectures have been encouraging. However, while our beams are non-neutral or neutralized plasmas, they are not homogeneous plasmas, and boundary condition handling can be nontrivial. We must also deal with collision events (between charged particles, and with neutrals), and must collect more extensive particle diagnostics than on most other PIC codes. These attributes will need to be accommodated.

Much of our work has used an electrostatic field model, which is appropriate for our conditions. However, such a model requires a global field solution; even though we use multi-grid methods to effect this solution, it may remain a hindrance to performance with very large numbers of cores. Thus we may shift emphasis to an explicit electromagnetic Maxwell-Vlasov formulation. Given the denser neutralizing plasmas anticipated in future experiments, the timestep size may not be any smaller for EM than it would be for ES.

New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

Our principal goals, and activities in support of those goals, over the next five years are to:

(1) Optimize the performance of the NDCX-II ion beam facility for each class of planned target experiments; achieve quantitative agreement with measurements; develop improved machine configurations and operating points. To accomplish these goals, we plan to use Warp to simulate the NDCX-II beams from source to target, in full kinetic detail, and including first-principles modeling of beam neutralization by plasma. The output from an ensemble of Warp runs (representing characteristic shot-to-shot variations) will be used as input to target simulations using ALE-AMR and other codes.

(2) Develop enhanced versions of NDCX-II (the machine is designed to be extensible and reconfigurable), and carry out studies to define a next-step ion beam facility beyond NDCX-II. To accomplish these goals, much of the work will involve iterative optimization employing Warp runs that assume ideal beam neutralization downstream of the accelerator. At present, optimization is achieved by running ensembles, since the particle statistics currently employed are not conducive to gradient-based methods; we hope the new computer resources will enable us to use such methods.

(3) Carry out detailed target simulations in the Warm Dense Matter regime using the ALE-AMR code, including surface tension effects, liquid-vapor coexistence, and accurate models of both the driving beam and the target geometry. For this we will need to make multiple runs (so as to capture shot-to-shot variations), and to both develop and employ synthetic diagnostics (so as to enable direct comparison with experiments). The new science that will be revealed is the physics of the transition from the liquid to vapor state of a volumetrically superheated material, where droplets are formed, and where the physics of phase transitions, surface tension and hydrodynamics all are playing significant roles in the dynamics. These simulations will enable calculations of equation of state and other material properties, and will be of interest in their own right for their illumination of the science of droplet formation.

The "expanded HPC resources" that are important for our project, with regard to the Warp elements, we most need more CPU hours, and more throughput for multiple small jobs (ensembles of runs are important to our research goals). While we do intend to run the occasional very large job as a convergence check, the majority of resources we expect to use will be devoted to the classes of runs listed in the summary table (4a).