NERSC: Powering Scientific Discovery Since 1974

Panagiotis Spentzouris

HEP Case Study Worksheet

1. Project Information - Community Petascale Project for Accelerator Science and Simulation (ComPASS)

Document Prepared By

Panagiotis Spentzouris

Project Title

Community Petascale Project for Accelerator Science and Simulation (ComPASS)

Principal Investigator

Panagiotis Spentzouris

Participating Organizations

Argonne National Laboratory
Brookhaven National Laboratory
Fermi National Accelerator Laboratory
Lawrence Berkeley National Laboratory
Stanford Linear Accelerator Center
Tech-X Corporation
Thomas Jefferson National Accelerator Facility
University of California, Los Angeles

Funding Agencies

 DOE SC  DOE NSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

Particle accelerators are critical to scientific discovery in the DOE program in America and indeed the world. The development and optimization of accelerators is essential for advancing our understanding of the fundamental properties of matter, energy, space, and time, and for enabling research in materials sciences, chemistry, geosciences, and aspects of biosciences.  
 
The High Energy Physics (HEP) program uses accelerators to answer fundamental questions about nature, such as the origin of mass and the asymmetry between matter and antimatter, and to search for new particles, new symmetries, and possible extra dimensions of space. In the DOE 15-year plan for HEP, the first two action items call for full support of the program of the Large Hadron Collider (LHC) at CERN and for the establishment of leadership in the R&D effort to design and build the proposed International Linear Collider (ILC) on U.S. soil. Even with the current HEP budget difficulties, the recent report of the Particle Physics Project Prioritization Panel (P5) emphasizes in its recommendations the need to maintain leadership at both the energy and the intensity frontiers of accelerator science. At the same time, it is imperative to maximize the physics reach of the ongoing DOE/HEP program, which involves optimizing the performance of the Fermilab Tevatron. Furthermore, DOE/HEP is supporting a world-class R&D program to develop new accelerator technologies, including laser wakefield and plasma wakefield accelerators, as well as other types of advanced accelerator concepts.
 
Under SciDAC1, AST, the predecessor project to ComPASS, produced a powerful suite of parallel simulation tools representing a paradigm shift in computational accelerator science. Simulations that used to take weeks or more now take hours, and simulations once thought impossible are now performed routinely. Many of these successful applications used NERSC facilities, and their development benefited from the NERSC infrastructure.
 
Because of the complexity, precision, and beam intensity requirements of next-generation accelerators, our paradigm has to change from single-machine, single-component simulations to end-to-end, multi-physics simulations. In FY09, ComPASS will continue to develop applications in a comprehensive, integrated accelerator simulation environment. These applications include large-scale electromagnetic modeling of SRF cavities (ILC design) for the Fermilab proton driver (Project-X) design, with realistic cavity shapes and misalignments; assessment of the impact of wakefields on beam dynamics; and multi-physics, multi-bunch modeling of the Fermilab Main Injector and Booster, for performance optimization under the current and Project-X operating conditions. We will also focus on design optimization of accelerator components with complicated geometries, such as the LHC crab cavity, which includes couplers with very fine features, and we will perform beam-beam and electron-cloud simulations to help understand and optimize LHC machine performance. Our applications emphasize the interaction of beam dynamics and electromagnetics codes. In addition, the project will assist the development of advanced accelerator concepts by providing real-time or near-real-time feedback between simulation and advanced accelerator experiments.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

The ComPASS project funds the development of three broad categories of codes, for: a) machine design and optimization, b) component design and optimization, and c) support of new accelerator techniques and technologies. The last two categories are covered in the other case study sheets, so here we focus on (a). The main codes are Synergia and ML/Impact (both multi-physics frameworks), and BeamBeam3D and NIMZOVICH (both single-purpose codes). The codes use an electrostatic particle-in-cell model with structured grids, with different decomposition strategies and solver implementations:
a. Decomposition. Depending on the physics of the problem, the codes may use domain decomposition, particle decomposition, or hybrid decomposition. There may be communication of particle data, grid data, or both. Particle movement between Poisson solves may be slight or large; hence, some codes use a particle manager and others do not.
b. Solvers. Our codes use spectral-based, finite-difference-based, and hybrid discretizations with FFT- and multigrid-based solvers.
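
As an illustration of the spectral approach named in (b), the following minimal Python sketch deposits macroparticle charge on a periodic 1D grid with cloud-in-cell weighting and then solves the Poisson equation with an FFT. It is not taken from Synergia, ML/Impact, BeamBeam3D, or NIMZOVICH: the function names, the 1D periodic geometry, the normalized units (permittivity set to 1), and the small radix-2 FFT are all assumptions made to keep the example self-contained. Production codes work in 3D, may use other boundary conditions, and rely on optimized FFT libraries.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1.0 if inverse else -1.0
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def deposit_cic(positions, charge, n_cells, length):
    """Cloud-in-cell deposit: split each macroparticle's charge linearly
    between its two nearest grid points. Returns charge density per cell."""
    dx = length / n_cells
    rho = [0.0] * n_cells
    for x in positions:
        s = (x % length) / dx          # position in units of cells
        i = int(s)
        frac = s - i                   # fractional distance past grid point i
        i %= n_cells                   # wrap periodically
        rho[i] += charge * (1.0 - frac) / dx
        rho[(i + 1) % n_cells] += charge * frac / dx
    return rho


def solve_poisson_periodic(rho, length):
    """Solve d^2(phi)/dx^2 = -rho on a periodic grid (permittivity = 1).

    In Fourier space this is phi_hat(k) = rho_hat(k) / k^2. The k = 0 (mean)
    mode is set to zero, which corresponds to a neutralizing background.
    """
    n = len(rho)
    rho_hat = fft([complex(r) for r in rho])
    phi_hat = [0j] * n
    for m in range(1, n):
        mode = m if m <= n // 2 else m - n      # signed mode number
        k = 2.0 * math.pi * mode / length
        phi_hat[m] = rho_hat[m] / (k * k)
    phi = fft(phi_hat, inverse=True)
    return [p.real / n for p in phi]            # normalize the inverse FFT
```

In a domain-decomposed code, each rank would own a slab of `rho` and `phi` and exchange data during the transform; in a particle-decomposed code, each rank would deposit its own particles onto a full copy of the grid and the copies would be summed before the solve.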
 
Depending on the type of algorithm, we have different grid size limitations (due to memory requirements): a typical large grid is 1024^3 for the first scheme (both particles and grids distributed) and 256^3 for the second (just the grid). This results in requirements of 10M to 100M macroparticles, depending on the type of simulation. Parallelism is expressed using MPI.
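
The grid and macroparticle counts quoted above translate into memory roughly as follows. This is back-of-envelope arithmetic only: it assumes double-precision (8-byte) values, one scalar field per grid point, and six phase-space coordinates per macroparticle, none of which is stated in this worksheet. Real codes hold additional arrays (several field components, FFT work buffers, particle identifiers), so actual footprints can be larger.

```python
GIB = 2 ** 30  # bytes per gibibyte


def grid_bytes(cells_per_side, bytes_per_value=8, fields=1):
    """Bytes needed for `fields` scalar fields on a cubic grid (assumed doubles)."""
    return cells_per_side ** 3 * bytes_per_value * fields


def particle_bytes(n_particles, coords=6, bytes_per_value=8):
    """Bytes needed for macroparticle phase-space coordinates (assumed 6 doubles)."""
    return n_particles * coords * bytes_per_value


if __name__ == "__main__":
    for n in (256, 1024):
        print(f"{n}^3 grid: {grid_bytes(n) / GIB:.3f} GiB per scalar field")
    for count in (10_000_000, 100_000_000):
        print(f"{count:,} macroparticles: {particle_bytes(count) / GIB:.2f} GiB")
```

Under these assumptions, a 1024^3 grid takes 8 GiB per scalar field and a 256^3 grid takes 0.125 GiB, while 100M macroparticles take about 4.8 GB. This suggests why the grid size that fits depends on whether the grid is distributed across ranks.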

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Support of shared libraries (for framework applications). 

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  Local Development Clusters

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 10M Core-Hours

NERSC Hours Used in 2009

 2.5M Core-Hours

Number of Cores Used in Typical Production Run

1k-10k (application dependent)

Wallclock Hours of Single Typical Production Run

48

Total Memory Used per Run

 1.5 to 16 GB

Minimum Memory Required per Core

 0.5 to 2 GB

Total Data Read & Written per Run

100 GB

Size of Checkpoint File(s)

1 GB

Amount of Data Moved In/Out of NERSC

 100 GB per week

On-Line File Storage Required (For I/O from a Running Job)

 1 GB and 100000 Files

Off-Line Archival Storage Required

 10 GB and 200000 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

Functional parallel HDF5; shared-library versions of standard libraries such as FFTW, HDF5, etc.

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

3M

Anticipated Number of Cores to be Used in a Typical Production Run

10k-100k

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

48-72

Anticipated Total Memory Used per Run

 2 to 32 GB

Anticipated Minimum Memory Required per Core

 0.5 to 2 GB

Anticipated total data read & written per run

 200 GB

Anticipated size of checkpoint file(s)

 2 GB

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 2 GB and 10000 Files

Anticipated Amount of Data Moved In/Out of NERSC

200 GB per week

Anticipated Off-Line Archival Storage Required

 20 GB and 200000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Algorithmic improvements to allow for longer time steps; improvements in solver strong scaling.

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).

2 GB memory per core or better.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

Shared libraries; better queue organization for development and test jobs (to reduce dependence on local development resources). Ensemble runs with workflows for design parameter optimization will require support for error detection and recovery.

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

We are currently starting our research program on how to use GPUs effectively. Our applications (for machine design and optimization) have two main components: particle tracking and field solves. Our efforts to date have demonstrated that we can do efficient tracking with high-order optics on GPUs. We are investigating field solves on GPUs and hybrid schemes involving a mixture of conventional processors and GPUs. We will need more information on the architecture of future machines incorporating GPUs in order to design efficient multi-level parallelism schemes.
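
Particle tracking suits GPUs because, between field solves, each particle is advanced through the lattice independently of the others. The Python sketch below shows that structure with two textbook maps, a paraxial drift and a thin-lens sextupole kick (the lowest-order nonlinear element). It is an illustration under stated assumptions, not the project's tracking code or its GPU implementation: the function names, the (x, x', y, y') coordinate tuple, and the integrated strength `k2l` are hypothetical choices for this example.

```python
def drift(particle, length):
    """Paraxial drift: positions advance by length times the transverse slopes."""
    x, xp, y, yp = particle
    return (x + length * xp, xp, y + length * yp, yp)


def thin_sextupole(particle, k2l):
    """Thin-lens sextupole kick with integrated strength k2l (assumed units m^-2).

    Positions are unchanged; the slopes receive a kick quadratic in (x, y).
    """
    x, xp, y, yp = particle
    return (x, xp - 0.5 * k2l * (x * x - y * y), y, yp + k2l * x * y)


def track(particles, lattice, turns=1):
    """Apply every (element, parameter) pair of the lattice to every particle.

    The outer loop carries no dependence between particles, so on a GPU each
    iteration could be assigned to its own thread.
    """
    tracked = []
    for particle in particles:
        for _ in range(turns):
            for element, parameter in lattice:
                particle = element(particle, parameter)
        tracked.append(particle)
    return tracked


if __name__ == "__main__":
    # Hypothetical two-element lattice: a 2 m drift followed by a sextupole.
    lattice = [(drift, 2.0), (thin_sextupole, 10.0)]
    bunch = [(1e-3, 0.0, 2e-3, 0.0), (0.0, 1e-3, 0.0, -1e-3)]
    for final in track(bunch, lattice, turns=3):
        print(final)
```

A field solve, by contrast, couples all particles through the grid, which is one reason the two components call for different parallelization strategies.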

New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

Access to 50X resources will allow us to: 
a) deploy multi-scale, multi-physics beam dynamics simulations to predict beam loss and resulting activation in Intensity Frontier accelerators, covering the full range of scales relevant to the problem: from 10^-3 m beams, to 10 m wakefields, to propagation over many 10^3 m. Such simulations will be of significant importance for the design and operation of the machines in FNAL's short- and mid-term plans.
b) deploy multi-scale, multi-physics beam dynamics simulations to help maximize luminosity in Energy Frontier accelerators. Such simulations will be important for maximizing the output of the last years of the Tevatron, diagnosing potential LHC problems, and contributing to the design of the next-generation lepton collider.
 
Importance of "expanded HPC resources": more CPU hours, more throughput for small jobs, more memory.