Cameron Geddes
1.1. Project Information - Particle simulation of laser wakefield particle acceleration
| Document Prepared By | Cameron Geddes |
| Project Title | Particle simulation of laser wakefield particle acceleration |
| Principal Investigator | Cameron Geddes |
| Participating Organizations | LBNL, Tech-X |
| Funding Agencies | DOE SC DOE NSA NSF NOAA NIH Other: DARPA |
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
Laser-plasma acceleration of charged particles shows great promise for reducing the cost and size of next-generation electron and positron accelerators for the DOE high energy physics program. Because plasmas are not subject to the electrical breakdown that limits conventional accelerators, accelerating gradients thousands of times those of conventional accelerators have been obtained using the electric field of a plasma wave (wakefield) driven by an intense laser. Such accelerators will be important to scale beyond TeV energies for high energy physics and to provide brighter and smaller (laboratory- and hospital-scale) radiation sources including free-electron lasers, Thomson sources, and ultrafast THz.
Project emphasis is on simulating experiments at LBNL's LOASIS program, and planning future experiments, to quantitatively understand internal dynamics, evaluate controlled injection, understand beam propagation, and improve interpretation of diagnostics. The plasma interaction in this regime is fully nonlinear, and particle distribution effects are important, making simulation essential but challenging. Time-explicit particle-in-cell (PIC) simulations have played a key role in supporting the rapid progress of laser plasma accelerators, including the physics behind the first narrow energy spread beams (2004) and GeV e-beams in 3 cm (2006). Applications of laser-plasma accelerators to colliders and advanced radiation sources (such as x-ray free-electron lasers or Thomson scattering generation of gamma-rays) require high-quality electron beams, and we therefore work both to improve numerical modeling of beam injection and propagation (2008, 2009) and to use the codes to understand how controlled injection of particles can improve beam quality (2008).
We are now designing next-generation experiments at 10 GeV and staging of multiple LWFAs for the BELLA PW laser facility in progress at LBNL, as well as controlled injection experiments to increase beam quality and stability. Over the next 3 years the BELLA facility will become operational; it is designed to test collider-relevant accelerator stages. In the 3-5 year time frame, the simulations will therefore be developed to accurately design efficient 10 GeV stages for this facility, to understand staging of multiple modules and transport of the low emittance bunches required for collider and other applications, and to design injectors to create the required low emittance bunches. Other laser facilities are being planned in the same time frame. Electron beam driven plasma accelerators are also being developed, including the new FACET facility at SLAC, as detailed by Tsung et al. To address these goals, the project uses codes which include three-dimensional time-explicit particle-in-cell, laser envelope particle-in-cell, fluid plasma models, and Lorentz boosted simulation frames, as described in detail below. There are other laser plasma accelerator simulation projects at NERSC using codes with similar algorithms, including OSIRIS.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
The main project code, VORPAL, is a parallel framework for finite-difference time domain (FDTD) simulations of fields and particles of various types, employing a variety of algorithms. Its algorithms and computational requirements are representative of PIC codes used in other projects such as OSIRIS. Fields and fluids are represented on a structured Cartesian mesh, while particles move through space. For this project, VORPAL is being used to model laser-plasma interactions and the associated particle acceleration. These simulations are electromagnetic (full Maxwell's equations are used) and relativistic. The electron plasma is usually represented by particles via PIC (particle-in-cell), but can also be represented as a cold, charged fluid. In addition to the standard algorithm, which fully resolves the laser wavelength, laser fields can now be represented approximately by an "envelope" model for increased speed, which interacts with a full PIC treatment of the wakefields in the plasma. Computations can be conducted in a Lorentz boosted computational frame, which minimizes the disparity in scales between the laser and plasma and hence reduces the number of required time steps. Field-induced tunneling ionization and electron-impact ionization can be included. Particle tracks can be exported for radiation calculations. Collisions can also be modeled.
In simulating electromagnetics, VORPAL uses an explicit stencil to update E and B fields with 2nd-order accuracy on the standard dual Yee mesh. Relativistic particle motion is modeled by the 2nd-order leap-frog algorithm, and VORPAL uses high-order spline-based shapes for the current deposition. Linear finite difference operators are appropriately centered in space and time, to obtain global 2nd-order accuracy. E and B fields are leap-frogged in time. Particle (or fluid) and field updates are also leap-frogged. These operators are all local, which enables local communication via MPI and excellent scaling up to 8000 processors on Franklin.
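The staggered, leap-frogged field update described above can be illustrated with a minimal one-dimensional vacuum sketch. This is not VORPAL code: the function names, the periodic boundary, the Gaussian test pulse, and the normalized units (c = 1, dx = 1) are all assumptions made for the illustration, and particles and currents are omitted.

```python
import math

def yee_step(e_y, b_z, courant):
    """Advance 1D vacuum fields by one leap-frog step (periodic, c = 1).

    e_y[i] lives at x = i*dx and at integer time levels;
    b_z[i] lives at x = (i + 1/2)*dx and at half-integer time levels.
    courant = c*dt/dx must not exceed 1 for stability in 1D.
    """
    n = len(e_y)
    # Faraday's law, dBz/dt = -dEy/dx, centered on the half node i + 1/2.
    for i in range(n):
        b_z[i] -= courant * (e_y[(i + 1) % n] - e_y[i])
    # Ampere's law in vacuum, dEy/dt = -dBz/dx, centered on node i.
    # b_z[i - 1] wraps to b_z[-1] at i = 0, giving the periodic boundary.
    for i in range(n):
        e_y[i] -= courant * (b_z[i] - b_z[i - 1])

def gaussian(x, x0=40.0, width=4.0):
    """Hypothetical test pulse shape."""
    return math.exp(-((x - x0) / width) ** 2)

def run(n_cells=128, n_steps=40, courant=1.0):
    """Propagate a right-moving pulse; return (initial Ey, final Ey)."""
    dx = 1.0
    dt = courant * dx
    # Right-moving wave with c = 1: Ey = f(x - t) and Bz = f(x - t).
    e_y = [gaussian(i * dx) for i in range(n_cells)]
    # Bz is staggered half a step back in time (t = -dt/2).
    b_z = [gaussian((i + 0.5) * dx + 0.5 * dt) for i in range(n_cells)]
    e_start = list(e_y)
    for _ in range(n_steps):
        yee_step(e_y, b_z, courant)
    return e_start, e_y

e_start, e_end = run()
```

Because each update touches only nearest neighbours, the same stencil can be applied independently on each subdomain once edge values have been exchanged, which is the locality the paragraph above refers to. With `courant=1.0` the 1D scheme moves the pulse exactly one cell per step, which makes a convenient correctness check; production runs in 2D and 3D need a smaller Courant number.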
For simulations using the laser envelope model, the Trilinos library suite (Aztec) is used to iteratively solve a Crank-Nicolson treatment of the field update.
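As a rough illustration of what a Crank-Nicolson update involves, the sketch below applies it to a simple stand-in model, the 1D equation i da/dt = -d2a/dx2 with a = 0 outside the grid. The model equation, the function names, and the direct tridiagonal (Thomas) solve are assumptions chosen to keep the example self-contained; the actual envelope equation is more involved, and the source states that its linear system is solved iteratively with Aztec rather than directly.

```python
import cmath
import math

def crank_nicolson_step(a, r):
    """Advance i*da/dt = -d2a/dx2 by one step, with r = dt / (2*dx**2).

    Solves (I + i*dt/2*H) a_new = (I - i*dt/2*H) a_old, where H = -d2/dx2
    and a = 0 outside the grid, using the Thomas tridiagonal algorithm.
    """
    n = len(a)
    diag = 1 + 2j * r   # diagonal of the implicit (left-hand) matrix
    off = -1j * r       # its sub- and super-diagonal entries
    # Explicit half: build the right-hand side (I - i*dt/2*H) a_old.
    rhs = []
    for i in range(n):
        left = a[i - 1] if i > 0 else 0j
        right = a[i + 1] if i < n - 1 else 0j
        rhs.append((1 - 2j * r) * a[i] + 1j * r * (left + right))
    # Implicit half: forward elimination ...
    c = [0j] * n
    d = [0j] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    # ... followed by back substitution.
    new = [0j] * n
    new[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        new[i] = d[i] - c[i] * new[i + 1]
    return new

def norm2(a):
    """Sum of |a|^2 over the grid."""
    return sum(abs(v) ** 2 for v in a)

def demo(n=64, n_steps=50, r=0.4):
    """Evolve a hypothetical Gaussian pulse with a linear phase."""
    start = [math.exp(-((i - 32) / 5.0) ** 2) * cmath.exp(0.3j * i)
             for i in range(n)]
    a = list(start)
    for _ in range(n_steps):
        a = crank_nicolson_step(a, r)
    return start, a

start, end = demo()
```

The scheme averages the operator between the old and new time levels, so each step requires a linear solve. For this stand-in equation the update is unitary, so `norm2` is preserved to rounding error for any step size, which is one reason the method is attractive for envelope-type equations.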
The project also makes use of other particle-in-cell and fluid codes which use similar computational methods to VORPAL. A primary example is Warp, which is both a code and a general-purpose framework for parallel three-dimensional particle-in-cell simulations of beams in accelerators, plasmas, laser-plasma systems, non-neutral plasma traps, sources, and other applications. It contains multiple field solvers (electrostatic FFT, multigrid, electromagnetic), internal conductors (cut-cell method with electrostatic solver), surface physics (space-charge limited emission, secondary emission of electrons or gas from impact of electrons or ions), and volumetric ionization. It employs advanced methods such as cut-cell boundaries, and has in particular been used to develop Lorentz boosted computational frame techniques and Adaptive Mesh Refinement.
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
We have run production simulations on up to 11k processors, with excellent scaling, and are limited in going further primarily by allocated processor hours and by machine availability (that is, it is easier to schedule 11k processors for 24 hours than 33k for 8 hours). Related PIC codes (OSIRIS) have demonstrated excellent scaling to hundreds of thousands of processors on a warm plasma test problem. A primary issue to be resolved (as noted elsewhere) is the scaling of parallel I/O (such as H5) on such machines, especially for non-constant processor domain sizes.
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
| Facilities Used or Using | NERSC OLCF ALCF NSF Centers Other: |
| Architectures Used | Cray XT IBM Power BlueGene Linux Cluster Other: |
| Total Computational Hours Used per Year | 3400000 Core-Hours |
| NERSC Hours Used in 2009 | 3400000 Core-Hours |
| Number of Cores Used in Typical Production Run | 4000 |
| Wallclock Hours of Single Typical Production Run | 24 |
| Total Memory Used per Run | 100 GB |
| Minimum Memory Required per Core | <1 GB |
| Total Data Read & Written per Run | 2000 GB |
| Size of Checkpoint File(s) | 50 GB |
| Amount of Data Moved In/Out of NERSC | 5000 GB per year |
| On-Line File Storage Required (For I/O from a Running Job) | 2 GB and 200 Files |
| Off-Line Archival Storage Required | 20 GB and 20000 Files |
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
HDF5 and assistance in tuning and working with it for large jobs, file system services and tuning assistance, VisIt, IDL. Visualization work and assistance in visualizing and analyzing large datasets, and in extracting physics data from them. Aztec/Trilinos, FFTW libraries.
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
| Computational Hours Required per Year | 150000000 |
| Anticipated Number of Cores to be Used in a Typical Production Run | 500000 |
| Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above | 12 |
| Anticipated Total Memory Used per Run | 100000 GB |
| Anticipated Minimum Memory Required per Core | <1 GB |
| Anticipated total data read & written per run | 50000 GB |
| Anticipated size of checkpoint file(s) | 1000 GB |
| Anticipated On-Line File Storage Required (For I/O from a Running Job) | 50 GB and 200 Files |
| Anticipated Amount of Data Moved In/Out of NERSC | 10000 GB per year |
| Anticipated Off-Line Archival Storage Required | 200 GB and 20000 Files |
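The figures in the tables of Sections 3c and 4a can be cross-checked with a little arithmetic. The sketch below derives core-hours per run, the number of typical runs the yearly allocation would cover, and the average memory per core. The dictionary keys and the `summarize` helper are names invented for this illustration, and the wallclock entries are assumed to be in hours.

```python
# Values copied from the tables in Sections 3c (current) and 4a (5-year).
current = {"cores": 4000, "wall_hours": 24,
           "hours_per_year": 3_400_000, "memory_gb": 100}
future = {"cores": 500_000, "wall_hours": 12,
          "hours_per_year": 150_000_000, "memory_gb": 100_000}

def summarize(row):
    """Derive per-run and per-core quantities from one table column."""
    core_hours_per_run = row["cores"] * row["wall_hours"]
    return {
        "core_hours_per_run": core_hours_per_run,
        "runs_per_year": row["hours_per_year"] / core_hours_per_run,
        "memory_gb_per_core": row["memory_gb"] / row["cores"],
    }

for label, row in (("current", current), ("5-year", future)):
    print(label, summarize(row))

# Growth in the yearly allocation between the two tables.
growth = future["hours_per_year"] / current["hours_per_year"]
print(f"allocation growth: {growth:.1f}x")
```

Under these assumptions, a typical run costs 96,000 core-hours today and 6,000,000 core-hours at the projected scale, so the allocations cover roughly 35 and 25 typical runs per year respectively. Average memory per core is 0.025 GB and 0.2 GB, consistent with the "<1 GB" entries, and the allocation grows by about 44x.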
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
Multiple models may need to be integrated to handle the physics of each stage of the accelerator. This may include explicit PIC in the laboratory frame to handle the injection (creation) of the particle beam; envelope, quasistatic, or boosted frame PIC for long accelerator stages; and fluid models for the plasma structure together with PIC or other representations of the accelerated beam to reduce noise. Radiation and scattering contributions to beam quality will need to be modeled as well. Free-space propagation modules will be used to model the drift of the beams between stages in the multi-stage collider designs to be developed, and for focusing.
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).
Please see case study of D. Bruhwiler and F. Tsung for additional detail. Advanced accelerator simulations generally are not memory bound, and use large numbers of processors with relatively little memory/processor. Communication of the edge information from each processor to processors handling neighboring domains is required each step, so that communication is important (and may need to be multi-layered on many-core or GPU systems).
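The per-step edge communication can be pictured with a serial emulation of a 1D domain decomposition. In the sketch below each subdomain carries one guard cell on each side, the guards are filled from the neighbouring subdomains, and a three-point stencil is then applied locally. All function names are hypothetical; in a production code the copies in `exchange_guards` would be MPI messages between neighbouring ranks, and the stencil would be the field or particle update.

```python
def smooth_global(values):
    """Reference: periodic 3-point smoothing on the undivided array."""
    n = len(values)
    return [(values[i - 1] + 2 * values[i] + values[(i + 1) % n]) / 4
            for i in range(n)]

def split(values, n_domains):
    """Cut the array into equal subdomains, each padded with two guard cells."""
    if len(values) % n_domains != 0:
        raise ValueError("array length must be divisible by n_domains")
    size = len(values) // n_domains
    return [[0.0] + values[k * size:(k + 1) * size] + [0.0]
            for k in range(n_domains)]

def exchange_guards(domains):
    """Fill guard cells from neighbours (stand-in for MPI send/receive)."""
    n = len(domains)
    for k, dom in enumerate(domains):
        dom[0] = domains[(k - 1) % n][-2]   # last interior cell on the left
        dom[-1] = domains[(k + 1) % n][1]   # first interior cell on the right

def smooth_local(dom):
    """Apply the stencil to interior cells using only local data."""
    return [(dom[i - 1] + 2 * dom[i] + dom[i + 1]) / 4
            for i in range(1, len(dom) - 1)]

def smooth_decomposed(values, n_domains):
    """Split, exchange edges, update locally, and reassemble."""
    domains = split(values, n_domains)
    exchange_guards(domains)
    return [v for dom in domains for v in smooth_local(dom)]

values = [float((i * i) % 7) for i in range(12)]
result = smooth_decomposed(values, 3)
```

The decomposed result matches the global one only because the guard cells were refreshed before the update. The data exchanged per step scales with the subdomain surface while the work scales with its volume, so communication cost grows in relative terms as subdomains shrink at high core counts.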
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Parallel file I/O such as H5 must be scaled to tens to hundreds of thousands of processors, and must be made robust to varying mesh sizes on different processors.
Error checking and job-relaunch services that detect if a job has terminated partway through and automatically restart it will become more important as jobs take up increasing numbers of nodes, with corresponding increase in failure possibilities.
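The relaunch behaviour described above can be sketched as a loop that restarts from the most recent checkpoint whenever a run stops early. This is a toy model under stated assumptions: the function name, its parameters, and the simulated failures are invented for the illustration, and a real service would detect termination through the batch system and resubmit the job.

```python
def run_with_relaunch(total_steps, checkpoint_every, fail_at, max_launches=10):
    """Run to total_steps, restarting from the last checkpoint after failures.

    fail_at is a collection of step numbers at which a simulated failure
    occurs once, just before that step would be computed.
    """
    checkpoint = 0              # last step safely written to disk
    pending = set(fail_at)      # each simulated failure happens only once
    steps_computed = 0          # total work done, including repeated steps
    for launch in range(1, max_launches + 1):
        step = checkpoint       # every launch resumes from the checkpoint
        failed = False
        while step < total_steps:
            if step in pending:
                pending.discard(step)
                failed = True   # stand-in for a node failure
                break
            step += 1
            steps_computed += 1
            if step % checkpoint_every == 0:
                checkpoint = step
        if not failed:
            return {"launches": launch, "steps_computed": steps_computed}
    raise RuntimeError("job did not finish within max_launches")

report = run_with_relaunch(total_steps=100, checkpoint_every=10,
                           fail_at=[37, 85])
print(report)
```

In this example the two failures cost 12 repeated steps (7 after the failure at step 37 and 5 after the one at step 85), so 112 steps are computed to advance 100. The trade-off is between checkpoint frequency, which costs I/O time, and the work lost per failure, which grows with the checkpoint interval.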
To allow simulations to predictively guide experiments, scans of parameter space are needed (as are conducted in experiments), which will require the above job monitoring services together with automation to generate and run sequentially large numbers of jobs, and to extract the data from them.
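A parameter scan of this kind amounts to generating one job description per combination of input values. The sketch below builds such a list. The parameter names and values are placeholders invented for the illustration and are not taken from the project; job submission and data extraction are left out.

```python
import itertools

def build_scan(parameters):
    """Return one job description per combination of parameter values.

    parameters maps a parameter name to the list of values to scan.
    """
    names = sorted(parameters)   # fixed order so job numbering is repeatable
    combos = itertools.product(*(parameters[name] for name in names))
    jobs = []
    for index, combo in enumerate(combos):
        jobs.append({
            "name": f"scan_{index:04d}",
            "settings": dict(zip(names, combo)),
        })
    return jobs

# Hypothetical scan: three plasma densities and two laser amplitudes.
scan = build_scan({
    "plasma_density_cm3": [1e18, 2e18, 4e18],
    "laser_a0": [1.0, 2.0],
})
for job in scan:
    print(job["name"], job["settings"])
```

Each entry could then be written to an input file and handed to a batch queue. Since the job count is the product of the list lengths, even a modest scan over several parameters produces many runs, which is why the text above couples scans to automated monitoring.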
Parallel visualization and analytics tools must be further developed, to provide similar functionality to well-known serial tools such as IDL/Matlab while providing access to petascale datasets. This is a serious challenge as even simple operators such as smoothing require communication or guard cells. The tools need to be robust and script-callable so as to be integrated into a batch workflow providing automatic analysis after batch compute jobs.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Tech-X is working actively to develop VORPAL for GPUs and for other advanced architectures, and preliminary results are very promising. For further details see the input of D. Bruhwiler. Related PIC codes such as UPIC (with which we collaborate under SciDAC) have also shown good results on GPU architectures, as has VPIC on Cell (Roadrunner). The internal structure of the codes must be reorganized in some cases to take advantage of these architectures. With that reorganization, the PIC algorithm is well situated to take advantage of these new architectures.
New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
Based on past simulations, and assuming that recently developed algorithms reduce cost at the same time as increased resources become available, we anticipate that of order 50x scaling in computational resources will be needed to accurately design collider-scale stages.
Past explicit PIC simulations of 100 MeV experiments required ~Mhour simulations because of the requirement to resolve the laser wavelength (µm) over the acceleration length (mm). The plasma length and diameter must both increase to increase beam energy, so meter-scale 10 GeV BELLA and collider-relevant stages would require at least Ghours with no new computational models, even without accounting for the increase in resolution needed to improve accuracy in emittance transport.
Newly developed computational models will be used in conjunction with new computers to simulate the required stages. Laser envelope models reduce costs by not resolving the laser period, while performing calculations in a Lorentz boosted frame reduces cost by reducing the disparity in scale between the laser and plasma characteristic lengths. These models can save of order 10,000x in computational resources for meter-scale 10 GeV stages.
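To give a feel for where the boosted-frame savings come from, the sketch below uses the standard special-relativity scaling. In a frame moving with the laser at Lorentz factor gamma, the laser wavelength is stretched by gamma*(1 + beta), the plasma is contracted by gamma, and the plasma approaches the laser so that their closing speed is (1 + beta)*c. The number of time steps therefore falls by about (gamma*(1 + beta))^2. The function names, the choice gamma = 50, the 1 µm wavelength, and 20 cells per wavelength are assumptions for illustration, not values from the project.

```python
import math

def boosted_frame_speedup(gamma):
    """Reduction in time steps for a frame boosted along the laser axis."""
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return (gamma * (1.0 + beta)) ** 2

def time_steps(length_m, wavelength_m, cells_per_wavelength, gamma=1.0):
    """Estimate steps needed for the laser to cross the plasma.

    Assumes the time step is tied to the cell size (dt ~ dx / c) and the
    cell size to the laser wavelength; gamma = 1 is the laboratory frame.
    """
    lab_steps = cells_per_wavelength * length_m / wavelength_m
    return lab_steps / boosted_frame_speedup(gamma)

# Hypothetical meter-scale stage driven by a micron-scale laser.
lab = time_steps(1.0, 1e-6, 20)
boosted = time_steps(1.0, 1e-6, 20, gamma=50.0)
print(f"lab frame: {lab:.3g} steps, boosted frame: {boosted:.3g} steps")
print(f"speedup: {boosted_frame_speedup(50.0):.0f}x")
```

With these assumed inputs the step count drops from about 2e7 to about 2e3, a factor close to 10^4. That is the same order as the savings quoted above, although the quoted figure also reflects envelope models, and the gamma used in practice depends on the plasma and laser parameters.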
Using these models, it is anticipated that 10 GeV stages can be simulated using ~50x resources with order 10x the resolution of current simulations, and this will be needed to provide accurate modeling of emittance transport. At the same time, multiple 3D runs will be possible both to explore parameter space to improve beam quality and to simulate staging of the beam through multiple modules for high energies. Many runs at modest resolution will be made possible, and will be very important to allow simulations to predictively explore parameter space to guide experiments rather than being restricted to a few runs conducted after experiments to explain results. High resolution simulations of the particle injector will also be conducted to determine what combination of techniques is required to produce the required emittances for colliders and other applications.


