NERSC: Powering Scientific Discovery Since 1974

Doug Toussaint

HEP Case Study Worksheet

1.1. Project Information - Quantum Chromodynamics with three flavors of dynamical quarks

Document Prepared By

Doug Toussaint

Project Title

Quantum Chromodynamics with three flavors of dynamical quarks

Principal Investigator

Doug Toussaint

Participating Organizations

University of Arizona, University of Utah, Indiana University, University of California Santa Barbara, Washington University, University of the Pacific

Funding Agencies

 DOE SC  DOE NSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

We are engaged in a broad research program to study quantum chromodynamics (QCD), including the dynamical effects of three flavors of quarks. Although there is little doubt that QCD is the correct theory of the strong interactions, non-perturbative QCD calculations are crucial for testing the weak-interaction part of the Standard Model: without such calculations, strong-interaction effects completely obscure the weak physics one is trying to study. At present the only means of carrying out non-perturbative QCD calculations from first principles and with controlled errors is through large-scale numerical simulations. These simulations are needed to obtain a quantitative understanding of the physical phenomena controlled by the strong interactions, to determine a number of the basic parameters of the Standard Model, and to make precise tests of the Standard Model's range of validity. Despite the many successes of the Standard Model, high energy physicists believe that understanding physics at the shortest distances will require a more general theory, one that unifies all four fundamental forces of nature. The Standard Model is expected to be a limiting case of this more general theory, just as classical mechanics is a limiting case of quantum mechanics. A central objective of the experimental program in high energy physics, and of lattice QCD simulations, is to determine the range of validity of the Standard Model and to search for new physics beyond it. Thus, QCD simulations play an important role in efforts to obtain a deeper understanding of the fundamental laws of physics.

Together with the Fermilab Theory group, our collaboration has been carrying out a long-term, successful project aimed at determining to high precision several key parameters of the Standard Model, the CKM matrix elements, which control weak transitions between quark flavors.
We are especially interested in the least well-known CKM parameters, those involving the bottom and charm quarks. Determining them requires precise experimental measurements of the decays of particles containing a heavy quark, together with precise numerical simulations of the strong-interaction environment in which these decays occur. The precision of the resulting CKM matrix elements depends on the precision of both experiment and theory, so it is essential that theoretical precision keep pace with experiment.
 
Our current calculations are based on the large library of gauge field configurations (lattices) that we have generated over the past several years. These configurations are snapshots of the QCD vacuum. To determine the quantities needed for the CKM matrix elements, we follow the propagation and transitions of heavy and light quarks on these gauge field configurations and average the results over the ensemble of configurations. We have obtained excellent results thus far \cite{Vub,Vcb}, using gauge field ensembles with lattice spacings ranging from 0.09 to 0.15 fm. But we need better precision, and getting it requires smaller lattice spacings and lighter quark masses. For this reason we have begun processing our 0.06 fm ensembles. Our lattice artifacts decrease approximately as the square of the lattice spacing, so going from 0.09 fm to 0.06 fm should reduce errors from this source by about a factor of two, since $(0.09/0.06)^2 = 2.25$. Our 0.06 fm ensembles were generated with light sea quarks (up, down, and strange), with the strange quark mass approximately at its physical value and with equal up and down masses ranging from 0.1 to 0.4 times the strange quark mass. We need all of them in order to extrapolate (a "chiral extrapolation") to the physical up and down quark masses (approximately $m_s/27$, or about $0.037\,m_s$). The precision of this extrapolation depends critically on the precision of the calculation at the smallest light quark mass.
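Why the smallest light quark mass matters so much can be illustrated with a toy weighted least-squares fit. This is a minimal sketch under loudly stated assumptions: the observable is taken to be exactly linear in $x = m_l/m_s$ (real analyses fit chiral perturbation theory forms that include chiral logarithms), the "data" are synthetic, and the error bars are hypothetical. The function names are illustrative, not part of any existing code. The script compares halving the error at the lightest ensemble ($x = 0.1$) with halving it at the heaviest ($x = 0.4$).

```python
"""Toy chiral extrapolation with a weighted straight-line fit in x = m_l/m_s.

Assumption: the observable is linear in x. Real analyses fit chiral perturbation
theory forms (including chiral logarithms); the data below are synthetic.
"""
from typing import List, Sequence, Tuple

Cov = List[List[float]]


def weighted_linear_fit(xs: Sequence[float], ys: Sequence[float],
                        sigmas: Sequence[float]) -> Tuple[float, float, Cov]:
    """Fit y = c0 + c1*x by weighted least squares; return (c0, c1, covariance)."""
    s = sx = sxx = sy = sxy = 0.0
    for x, y, sig in zip(xs, ys, sigmas):
        w = 1.0 / sig ** 2               # weight = inverse variance
        s += w
        sx += w * x
        sxx += w * x * x
        sy += w * y
        sxy += w * x * y
    det = s * sxx - sx * sx
    c0 = (sxx * sy - sx * sxy) / det
    c1 = (s * sxy - sx * sy) / det
    # Parameter covariance = inverse of the normal matrix [[s, sx], [sx, sxx]].
    cov = [[sxx / det, -sx / det], [-sx / det, s / det]]
    return c0, c1, cov


def extrapolate(x0: float, c0: float, c1: float, cov: Cov) -> Tuple[float, float]:
    """Fitted value at x0 and its one-sigma uncertainty."""
    value = c0 + c1 * x0
    var = cov[0][0] + 2.0 * x0 * cov[0][1] + x0 * x0 * cov[1][1]
    return value, var ** 0.5


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.4]                  # light/strange mass ratios quoted in the text
    ys = [1.0 + 0.5 * x for x in xs]      # synthetic, exactly linear "measurements"
    x_phys = 1.0 / 27.2                   # approximate physical m_l/m_s
    scenarios = {
        "equal errors": [0.01, 0.01, 0.01],       # hypothetical error bars
        "lightest halved": [0.005, 0.01, 0.01],
        "heaviest halved": [0.01, 0.01, 0.005],
    }
    for label, sigmas in scenarios.items():
        c0, c1, cov = weighted_linear_fit(xs, ys, sigmas)
        value, err = extrapolate(x_phys, c0, c1, cov)
        print(f"{label:16s}: f(x_phys) = {value:.4f} +/- {err:.4f}")
```

With these made-up inputs, halving the error at $x = 0.1$ shrinks the extrapolated uncertainty noticeably more than halving it at $x = 0.4$, because the lightest point lies closest to the extrapolation target near $x \approx 1/27$.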
 
We are currently implementing an improved discretization of the quark fields that reduces discretization errors at a given lattice spacing, at a cost of two to three times more computation per lattice site. It should also allow us to simulate near the physical light quark mass, reducing the need for a chiral extrapolation. With the combination of smaller lattice spacings, an improved discretization, larger physical volumes, and higher statistics, we hope to achieve one percent accuracy for many of the hadronic matrix elements needed to determine fundamental parameters of the Standard Model, or perhaps to expose deviations from Standard Model predictions.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

We use the MILC collaboration's QCD code suite, which uses MPI for parallel processing on almost all machines. The simulations use a uniform four-dimensional grid, with sizes in current simulations ranging from 16x16x16x48 to 64x64x64x144. The most challenging simulations, and those that drive our planning, are on the larger grids. The basic simulation algorithm is a molecular dynamics evolution of the gauge field configurations through a fictitious simulation time, usually combined with a Metropolis accept/reject step to compensate for step-size errors. The quark fields are formally integrated out, leading to a nonlocal force. Computing this force requires solving a sparse linear system, which is done with the conjugate gradient algorithm.
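The molecular dynamics plus Metropolis scheme described above can be sketched in its simplest (hybrid Monte Carlo) form. This is an illustrative toy, not the MILC implementation: the gauge action and the nonlocal quark force are replaced by an assumed Gaussian action $S(q) = \sum_i q_i^2/2$ on a handful of real variables, and all function names and parameters are hypothetical. It shows the structure: refresh momenta, integrate Hamilton's equations with a leapfrog integrator in fictitious time, then accept or reject with $\min(1, e^{-\Delta H})$ so that step-size errors do not bias the sampled distribution.

```python
"""Toy hybrid Monte Carlo: leapfrog molecular dynamics + Metropolis test.

Assumption: a Gaussian action S(q) = sum(q^2)/2 stands in for the gauge
action plus the quark contribution, so the exact answer is <q^2> = 1.
"""
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def leapfrog(q: Vec, p: Vec, grad_s: Callable[[Vec], Vec], eps: float,
             n_steps: int) -> Tuple[Vec, Vec]:
    """Integrate dq/dt = p, dp/dt = -dS/dq over fictitious time eps * n_steps."""
    q, p = q[:], p[:]
    g = grad_s(q)
    p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]      # initial half step in p
    for step in range(n_steps):
        q = [qi + eps * pi for qi, pi in zip(q, p)]          # full step in q
        g = grad_s(q)
        scale = eps if step < n_steps - 1 else 0.5 * eps     # final p step is a half step
        p = [pi - scale * gi for pi, gi in zip(p, g)]
    return q, p


def hmc_step(q: Vec, action: Callable[[Vec], float], grad_s: Callable[[Vec], Vec],
             eps: float, n_steps: int, rng: random.Random) -> Tuple[Vec, bool]:
    """One trajectory: momentum refresh, MD evolution, Metropolis accept/reject."""
    p = [rng.gauss(0.0, 1.0) for _ in q]
    h_old = 0.5 * sum(pi * pi for pi in p) + action(q)
    q_new, p_new = leapfrog(q, p, grad_s, eps, n_steps)
    h_new = 0.5 * sum(pi * pi for pi in p_new) + action(q_new)
    # Accepting with probability min(1, exp(-dH)) removes the integrator's step-size bias.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False


def toy_action(q: Vec) -> float:
    return 0.5 * sum(x * x for x in q)


def toy_grad(q: Vec) -> Vec:
    return list(q)


def run_toy_hmc(n_sites: int = 16, n_traj: int = 2200, n_therm: int = 200,
                eps: float = 0.1, n_steps: int = 10, seed: int = 12345) -> Tuple[float, float]:
    """Return (acceptance rate, mean of q^2 per site after thermalization)."""
    rng = random.Random(seed)
    q = [0.0] * n_sites
    accepted, total_q2, n_meas = 0, 0.0, 0
    for traj in range(n_traj):
        q, ok = hmc_step(q, toy_action, toy_grad, eps, n_steps, rng)
        accepted += ok
        if traj >= n_therm:
            total_q2 += sum(x * x for x in q) / n_sites
            n_meas += 1
    return accepted / n_traj, total_q2 / n_meas


if __name__ == "__main__":
    acc, mean_q2 = run_toy_hmc()
    print(f"acceptance = {acc:.3f}, <q^2> = {mean_q2:.3f} (exact: 1)")
```

In a real lattice QCD run the cost is dominated not by this bookkeeping but by the force, which requires a sparse linear solve at every molecular dynamics step.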
 
Analysis of the stored configurations requires computing time comparable to that needed to generate them. The overwhelming majority of the analysis time is spent in the conjugate gradient solution of the sparse matrix systems.
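A matrix-free conjugate gradient solver, the kernel that dominates both generation and analysis, can be sketched as follows. CG requires a Hermitian positive-definite matrix, such as the $M^\dagger M$ combination used for staggered quarks. As a stand-in we assume a free, real-valued operator $A = m^2 - \nabla^2$ on a small periodic 4D lattice, with no gauge links and no color or complex structure; the function names are hypothetical. Solving $Ax = b$ for a point source $b$ is the toy analogue of computing a quark propagator.

```python
"""Matrix-free conjugate gradient on a small periodic 4D lattice (toy operator)."""
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def make_lattice_operator(dims: Sequence[int], mass: float) -> Callable[[Vec], Vec]:
    """Return x -> (m^2 - lattice Laplacian) x with periodic boundaries."""
    sites = list(itertools.product(*[range(d) for d in dims]))
    index = {s: i for i, s in enumerate(sites)}
    neighbours = []                       # forward/backward neighbour in each direction
    for s in sites:
        nbrs = []
        for mu, d in enumerate(dims):
            for shift in (1, -1):
                t = list(s)
                t[mu] = (t[mu] + shift) % d
                nbrs.append(index[tuple(t)])
        neighbours.append(nbrs)
    diag = mass * mass + 2.0 * len(dims)

    def apply(x: Vec) -> Vec:
        # Sparse stencil: each row touches only the site and its 2*len(dims) neighbours.
        return [diag * x[i] - sum(x[j] for j in neighbours[i]) for i in range(len(x))]

    return apply


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a: Callable[[Vec], Vec], b: Vec, tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[Vec, int]:
    """Solve A x = b for symmetric positive-definite A; return (x, iterations)."""
    x = [0.0] * len(b)
    r = b[:]                              # residual b - A x for the initial guess x = 0
    p = r[:]
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:
            return x, k
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    apply_a = make_lattice_operator((4, 4, 4, 4), mass=0.5)
    source = [0.0] * 256
    source[0] = 1.0                       # point source at the origin
    x, iters = conjugate_gradient(apply_a, source)
    print(f"converged in {iters} iterations; x[origin] = {x[0]:.6f}")
```

Each iteration costs one sparse operator application plus a few global dot products; on a distributed lattice the former needs neighbour (halo) exchanges and the latter needs global reductions.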

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

The limits are, more or less in order, interprocessor communication speed, memory bandwidth within a node, and CPU speed. In most of our computations, input/output is not a major bottleneck. 
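Why interprocessor communication heads this list can be seen from a rough surface-to-volume estimate. The sketch below assumes a hypothetical 8x8x8x12 layout of MPI ranks for the 64x64x64x144 grid, which happens to give 6,144 ranks, the NERSC core count quoted in the table below; the actual MILC layout may differ. It counts, per rank, how many neighbouring sites must be received from other ranks for a stencil reaching `depth` sites in each direction (depth 3 is shown only as an illustrative longer-range case). Function names are illustrative.

```python
"""Surface-to-volume estimate for a 4D domain decomposition (hypothetical layout)."""
from math import prod
from typing import Sequence, Tuple


def local_dims(global_dims: Sequence[int], rank_grid: Sequence[int]) -> Tuple[int, ...]:
    """Per-rank sublattice dimensions; each global extent must divide evenly."""
    if len(global_dims) != len(rank_grid):
        raise ValueError("dimension mismatch")
    for extent, ranks in zip(global_dims, rank_grid):
        if extent % ranks:
            raise ValueError(f"{extent} sites cannot be split evenly over {ranks} ranks")
    return tuple(extent // ranks for extent, ranks in zip(global_dims, rank_grid))


def halo_to_volume(local: Sequence[int], depth: int = 1) -> float:
    """Ratio of off-rank neighbour sites to local sites for a stencil of given depth.

    Each direction contributes two faces (forward and backward) of
    depth * volume / extent sites each. Assumes every direction is split across
    ranks (so all faces are remote) and depth <= each local extent.
    """
    volume = prod(local)
    halo = sum(2 * depth * volume // extent for extent in local)
    return halo / volume


if __name__ == "__main__":
    global_dims = (64, 64, 64, 144)   # largest grid quoted in section 3a
    rank_grid = (8, 8, 8, 12)          # hypothetical layout: 8*8*8*12 = 6144 ranks
    local = local_dims(global_dims, rank_grid)
    print("ranks:", prod(rank_grid), "local sublattice:", local, "sites/rank:", prod(local))
    for depth in (1, 3):
        print(f"depth {depth}: halo/volume = {halo_to_volume(local, depth):.2f}")
```

Under these assumptions each rank holds only 6,144 sites, and even a nearest-neighbour stencil needs about 0.92 remote sites per local site, so every operator application moves nearly as much data between processors as it processes locally.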

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other: Fermilab computing system

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 60,000,000 (approx) Core-Hours

NERSC Hours Used in 2009

 13,000,000 Core-Hours

Number of Cores Used in Typical Production Run

128-16,384 (BlueGene/P), 6,144 (NERSC)

Wallclock Hours of Single Typical Production Run

8

Total Memory Used per Run

 70 GB

Minimum Memory Required per Core

GB

Total Data Read & Written per Run

 30 GB

Size of Checkpoint File(s)

10 GB

Amount of Data Moved In/Out of NERSC

 100 GB per  week

On-Line File Storage Required (For I/O from a Running Job)

 0.5 GB and  100 Files

Off-Line Archival Storage Required

 600 GB (mostly at other centers) and 100,000 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

 

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

300,000,000

Anticipated Number of Cores to be Used in a Typical Production Run

20,000

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

8

Anticipated Total Memory Used per Run

 300 GB

Anticipated Minimum Memory Required per Core

 GB

Anticipated total data read & written per run

 90 GB

Anticipated size of checkpoint file(s)

 30 GB

Anticipated On-Line File Storage Required (For I/O from a Running Job)

1.5 GB and  100 Files

Anticipated Amount of Data Moved In/Out of NERSC

300 GB per  week

Anticipated Off-Line Archival Storage Required

 2000 GB and  100000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

We will need to modify our codes to handle 100-core chips and/or GPUs.

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

 

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

We are investigating the use of GPUs, but do not yet know whether they will run our projects efficiently or how difficult they will be to use.

New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

We hope to reduce theoretical uncertainties in determining fundamental parameters of the Standard Model to the one percent level, or to below the corresponding experimental uncertainties. This should help physicists make sense of some of the seemingly arbitrary features of our current theoretical picture.
 
Roughly, we expect the amount of data we need to manipulate to increase by about a factor of three (memory per run, file sizes, transfers to and from NERSC), while CPU requirements will grow by a much larger factor; we could easily make use of a 50X increase.