Lie-Quan Lee
HEP Case Study Worksheet
1. Project Information - Advanced Modeling for Particle Accelerators
Document Prepared By | Lie-Quan Lee
Project Title | Advanced Modeling for Particle Accelerators
Principal Investigator | Kwok Ko
Participating Organizations | SLAC, BNL, FNAL, ORNL, TJNAF
Funding Agencies | DOE SC, DOE NSA, NSF, NOAA, NIH, Other:
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
The goal of this project is to carry out design and optimization for accelerator development and to support research in accelerator science aligned with the HEP mission.
Scientists at BNL, FNAL, ORNL, SLAC, and TJNAF have identified significant computational tasks for accelerator science and development in the next three to five years.
Compact Linear Collider (CLIC) - (1) Simulate HOM damping in the heavily damped Accelerating Structures (AS) and in the heavily damped, over-moded Power Extraction and Transfer Structures (PETS). (2) Evaluate the dark current. (3) Simulate two-beam acceleration (AS coupled with PETS).
LHC Collimator and Crab Cavity for upgrade - (1) Determine the broadband impedance of collimators which dominate the LHC impedance budget using T3P. (2) Optimize the couplers of the current crab cavity design using Omega3P. These studies are important for beam stability in the LHC upgrade.
RHIC and LHC Beam Dynamics - (1) Wire compensator simulation: Long-range beam-beam interactions are expected to degrade beam quality in the LHC. The effects of the wire compensator will be investigated and compared with experimental results. (2) Calculate the beam emittance during luminosity stores in RHIC and the LHC. (3) Explore electron lens compensation of head-on beam-beam interactions in RHIC and the LHC. (4) Investigate the impact of beam-beam interactions on designs of the LHC Interaction Region upgrade.
ILC Electron Cloud Instability - The build-up of the electron cloud in the ILC damping ring may cause beam instability and hence a reduction in luminosity. Simulations of electron cloud build-up, and determination of the onset of single-bunch fast head-tail and coupled-bunch instabilities using the 3D code CMAD, will shed light on the phenomenon and subsequently lead to improved designs of the ILC damping rings.
ILC electron gun - The new design will eliminate a damping ring by providing a flat beam. We will use Pic3P to study the emittance dependence on the quantum efficiency of the cathode.
High gradient structure R&D - Theoretical and experimental efforts are under way to develop high gradient X-band structures for the future TeV linear collider. Track3P will be used to analyze the dark current in various high gradient structure options.
Muon Collider Cavity - The muon cooling cavity operates under strong external magnetic fields. It has been observed that these fields can enhance multipacting activity and dark current heating. Numerical simulations over a broad design parameter space will be carried out to optimize the external magnetic field map and the cavity shape so as to minimize dark current and multipacting.
Photonic Band Gap (PBG) Structures - The PBG accelerating structure is an advanced accelerator concept to generate high field gradients for acceleration of particles. Continuing efforts will be carried out for the MIT PBG higher-order-mode (HOM) calculation in the microwave regime and the SLAC PBG coupler design in the optical regime.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
1) Omega3P. Omega3P is a parallel finite-element eigensolver for accelerator cavity analysis. Its algorithms include explicitly and implicitly restarted Lanczos for real generalized eigenvalue problems, second-order Arnoldi for complex quadratic eigenvalue problems, and inverse iteration and Jacobi-Davidson for complex nonlinear eigenvalue problems. It also includes sparse direct solvers and Krylov subspace methods with a spectral multilevel preconditioner for the shifted linear systems.
The number of elements (Ne) in the mesh and the basis function order (p) of each element together determine the matrix size N. For example, with 2nd-order basis functions (p=2), the matrix size is roughly N = 6.2 * Ne, and with p=3 it is roughly N = 18 * Ne.
The code uses MPI for coarse-grain parallelism. For the basic linear algebra operations, we often use shared-memory versions to further exploit on-node parallelism.
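The matrix-size rule of thumb quoted for Omega3P can be written as a small helper. This is only a sketch: the function name and the lookup table are illustrative, and the two coefficients are the approximate values given in this worksheet rather than exact counts for any particular mesh.

```python
# Approximate matrix unknowns per mesh element, as quoted in this worksheet.
# Keys are the basis function order p; values are the N / Ne ratios.
UNKNOWNS_PER_ELEMENT = {2: 6.2, 3: 18.0}


def estimate_matrix_size(num_elements: int, order: int) -> int:
    """Return the rough matrix size N for a mesh with `num_elements` elements."""
    if order not in UNKNOWNS_PER_ELEMENT:
        raise ValueError(f"no estimate available for basis order p={order}")
    return round(UNKNOWNS_PER_ELEMENT[order] * num_elements)


# Example: a hypothetical 10-million-element mesh.
for p in (2, 3):
    print(f"p={p}: N is roughly {estimate_matrix_size(10_000_000, p):,}")
```

Going from p=2 to p=3 on the same mesh therefore increases the matrix size by roughly a factor of three.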
2) T3P. T3P is a finite-element time-domain analysis code for simulating wakefields due to a beam transit and for studying self-consistent particle-field interactions. It uses the same finite-element discretization in space as Omega3P and the Newmark-beta scheme for implicit time-stepping. A linear system is solved at each time step. The size of the linear system can be estimated in the same way as the matrix size in Omega3P, namely N = 6.2 * Ne. However, the matrix in T3P is symmetric positive definite. We use the conjugate gradient method with a block Jacobi preconditioner, where each core owns one block and performs an incomplete factorization of it. The method has proven to be very efficient and scalable.
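The solver strategy described for T3P can be illustrated with a small serial sketch of conjugate gradient with a block Jacobi preconditioner. This is not the T3P implementation: the test matrix is a stand-in 1D model problem, the blocks are processed in a loop rather than one per core, and each block is factorized exactly with Cholesky instead of incompletely, purely to keep the example short. All function names are illustrative.

```python
import numpy as np


def block_jacobi_factors(A, n_blocks):
    """Cholesky-factorize each diagonal block of A (one block per 'core')."""
    bounds = np.linspace(0, A.shape[0], n_blocks + 1).astype(int)
    return [(lo, hi, np.linalg.cholesky(A[lo:hi, lo:hi]))
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def apply_block_jacobi(factors, r):
    """Solve M z = r, where M is the block-diagonal part of A."""
    z = np.empty_like(r)
    for lo, hi, L in factors:
        y = np.linalg.solve(L, r[lo:hi])      # forward solve with L
        z[lo:hi] = np.linalg.solve(L.T, y)    # backward solve with L^T
    return z


def pcg(A, b, factors, rtol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_block_jacobi(factors, r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * b_norm:
            return x, it
        z = apply_block_jacobi(factors, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


# Stand-in SPD system: 1D Laplacian plus a small multiple of the identity.
n = 64
A = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
factors = block_jacobi_factors(A, n_blocks=4)
x, iters = pcg(A, b, factors)
print("iterations:", iters, "residual:", np.linalg.norm(b - A @ x))
```

Because each block is factorized and applied independently, the preconditioner needs no communication between blocks, which is one reason this kind of scheme maps naturally onto a one-block-per-core layout.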
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
In solving various eigenvalue problems with Omega3P, we need to solve a series of highly indefinite linear systems, because the eigenvalues of interest are interior and a spectral transformation is needed. One way to solve those highly indefinite linear systems is to use sparse direct solvers. Unfortunately, sparse direct solvers suffer from imbalanced and non-scalable per-node memory usage. Scalability of per-node memory usage is defined similarly to scalability of speed, but it focuses on how memory is consumed by the application code rather than on how fast the code executes.
The amount of physical memory available on each node is certainly the single most important constraint on the problem size we can simulate. At the same time, we are actively developing software and algorithmic solutions to address the memory-usage scalability issue. The spectral multilevel preconditioner is one successful result of these research activities: it increases the problem size we can solve by one order of magnitude. In addition, working with scientists from the SciDAC math CETs, we are developing a general-purpose hybrid linear solver for our simulations, which will further improve the scalability of memory usage.
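To make the link between interior eigenvalues and indefinite linear systems concrete, the following is a minimal sketch of shift-invert inverse iteration on a small dense model problem. The matrices, the shift value, and the use of a dense solve at every step are assumptions made for illustration only; a production code would factorize the sparse shifted matrix once and reuse the factors.

```python
import numpy as np

# Stand-in generalized eigenproblem K x = lambda M x (1D Laplacian, identity mass).
n = 50
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n)

# Shift placed in the interior of the spectrum, which lies in (0, 4).
sigma = 1.9
shifted = K - sigma * M

# The shifted matrix has eigenvalues of both signs, i.e. it is indefinite.
shifted_eigs = np.linalg.eigvalsh(shifted)
is_indefinite = shifted_eigs.min() < 0.0 < shifted_eigs.max()

# Inverse iteration: repeatedly solve (K - sigma M) y = M x.
# The iterate converges to the eigenvector whose eigenvalue is closest to sigma.
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
for _ in range(100):
    x = np.linalg.solve(shifted, M @ x)
    x /= np.linalg.norm(x)

# Rayleigh quotient recovers the eigenvalue of the original problem.
lam = (x @ K @ x) / (x @ M @ x)

exact = np.linalg.eigvalsh(K)
nearest = exact[np.argmin(np.abs(exact - sigma))]
print("indefinite:", is_indefinite, "computed:", lam, "reference:", nearest)
```

Every iteration requires a solve with the indefinite matrix K - sigma M, which is why the memory behaviour of the linear solver dominates the cost of this approach.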
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using | NERSC, OLCF, ALCF, NSF Centers, Other:
Architectures Used | Cray XT, IBM Power, BlueGene, Linux Cluster, Other:
Total Computational Hours Used per Year | 9,200,000 Core-Hours
NERSC Hours Used in 2009 | 1,200,000 Core-Hours
Number of Cores Used in Typical Production Run | 8000
Wallclock Hours of Single Typical Production Run | 12
Total Memory Used per Run | 6000 GB
Minimum Memory Required per Core | 1.5 GB
Total Data Read & Written per Run | 1000 GB
Size of Checkpoint File(s) | 50 GB
Amount of Data Moved In/Out of NERSC | 200 GB per 3 months
On-Line File Storage Required (For I/O from a Running Job) | 1 GB and 100 Files
Off-Line Archival Storage Required | 10 GB and 1000 Files
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
ParaView is the software we use for visualization and analysis. We would like to do visualization and analysis at NERSC, but this requires both software (ParaView) and hardware support (the ability to run parallel jobs). Visualization is becoming increasingly difficult because of the size of the resulting data.
Parallel netCDF is the I/O library we use for checkpointing and output data.
Making emerging architectures available at the center would give users a concerted way to explore future architectures and programming models. A center has a clear advantage of scale compared with individual teams acting alone.
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year | 50,000,000 Core-Hours
Anticipated Number of Cores to be Used in a Typical Production Run | 100,000
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above | 24 hours
Anticipated Total Memory Used per Run | 75,000 GB
Anticipated Minimum Memory Required per Core | 1.5 GB
Anticipated total data read & written per run | 10,000 GB
Anticipated size of checkpoint file(s) | 500 GB
Anticipated On-Line File Storage Required (For I/O from a Running Job) | 10 GB and 100 Files
Anticipated Amount of Data Moved In/Out of NERSC | 1000 GB per 3 months
Anticipated Off-Line Archival Storage Required | 100 GB and 1000 Files
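As a rough cross-check of the figures in the tables for Sections 3c and 4a, the short sketch below derives the core-hours consumed by one typical production run, the number of such runs the yearly allocation would cover, and the average memory per core. It assumes the wallclock entries are in hours and that the whole allocation is spent on typical production runs, which is a simplification; the dictionary keys and function name are illustrative.

```python
# Values copied from the worksheet tables (current usage and 5-year estimate).
current = {"cores": 8_000, "wall_hours": 12,
           "core_hours_per_year": 9_200_000, "memory_gb": 6_000}
future = {"cores": 100_000, "wall_hours": 24,
          "core_hours_per_year": 50_000_000, "memory_gb": 75_000}


def summarize(req):
    """Derive per-run cost and average memory per core from the table values."""
    core_hours_per_run = req["cores"] * req["wall_hours"]
    return {
        "core_hours_per_run": core_hours_per_run,
        # Upper bound: assumes every core-hour goes to typical production runs.
        "runs_per_year": req["core_hours_per_year"] / core_hours_per_run,
        # Mean over all cores; not the same quantity as the per-core minimum.
        "avg_memory_per_core_gb": req["memory_gb"] / req["cores"],
    }


for label, req in (("current", current), ("in 5 years", future)):
    s = summarize(req)
    print(f"{label}: {s['core_hours_per_run']:,} core-hours/run, "
          f"about {s['runs_per_year']:.0f} runs/year, "
          f"{s['avg_memory_per_core_gb']:.2f} GB/core on average")
```

Under these assumptions a single anticipated production run costs 25 times as many core-hours as a current one, while the average memory per core stays the same.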
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
1) A more accurate workload and communication model in the partitioning scheme, for better load balancing and domain decomposition
2) A more scalable linear solver, which is the computational kernel of our finite-element simulation suite
3) We are exploring the application of the discontinuous Galerkin method to accelerator modeling. This is a high-risk, high-yield research activity. If successful, it could greatly improve the scalability of the simulations.
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 μs).
Our two codes represent two extremes of computational requirements.
For Omega3P, we want 64 GB to 128 GB of memory per node (not per core).
For T3P, 2 GB of memory per core is sufficient, but we want to emphasize the need for support of large jobs.
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Infrastructure support for remote and interactive visualization will be critical since the resulting data will be so large that it is very inefficient to transfer data from NERSC to home institutions for postprocessing.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Our simulation code is designed to be highly modular. Components can easily be swapped for alternatives, and new components providing the same functionality can be plugged in. For example, we can easily incorporate a new linear solver or preconditioner, which are usually the keys to higher scalability and performance. If new components emerge that use many-core or heterogeneous architectures more efficiently, we can quickly adopt them in our code. In addition, we are actively exploring GPU computing and other alternatives ourselves. This summer we hired an intern to explore iterative linear solvers on GPUs.
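The kind of modularity described here can be sketched as a solver registry in which the calling code selects a linear solver by name. This is a hypothetical illustration of the design idea only: the registry, the function names, and the two toy solvers are not taken from the actual simulation suite.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
Vector = List[float]
Solver = Callable[[Matrix, Vector], Vector]

# Hypothetical registry mapping a solver name to its implementation.
SOLVERS: Dict[str, Solver] = {}


def register_solver(name: str) -> Callable[[Solver], Solver]:
    """Decorator that plugs a solver into the registry under `name`."""
    def decorator(fn: Solver) -> Solver:
        SOLVERS[name] = fn
        return fn
    return decorator


@register_solver("jacobi")
def jacobi(A: Matrix, b: Vector, sweeps: int = 200) -> Vector:
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        # Every entry is updated from the previous iterate.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x


@register_solver("gauss_seidel")
def gauss_seidel(A: Matrix, b: Vector, sweeps: int = 200) -> Vector:
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Entries are updated in place, reusing the newest values.
            x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


def solve_linear_system(A: Matrix, b: Vector, solver_name: str) -> Vector:
    """The caller depends only on the registry, not on a specific solver."""
    return SOLVERS[solver_name](A, b)


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_jacobi = solve_linear_system(A, b, "jacobi")
x_gs = solve_linear_system(A, b, "gauss_seidel")
print(x_jacobi, x_gs)
```

With this structure, a solver written for a GPU or many-core node could be registered under a new name without touching the code that calls `solve_linear_system`.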
New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
With access to 50X the HPC resources, we could achieve the following scientific progress:
1) accurately predict wakefield effects of beam-environment interactions with realistic bunch-size for large complex accelerator structures to understand performance of the accelerator and to provide information for further design optimization
2) model self-consistent field-particle interactions in space-charge dominated devices such as electron sources over long time scales with high accuracy to provide capabilities for designing high-quality and high-brightness beams for basic and applied scientific research
3) understand dark-currents and RF breakdown issues that limit accelerating structures operated at high gradients so as to provide insights for designing more efficient accelerating structures
Importance of "expanded HPC resources":
More per-node memory will be extremely helpful in the foreseeable future in the frequency-domain analysis using Omega3P.
More CPU hours and the ability to handle large jobs are very important in time-domain analysis using T3P.


