John Ludlow
FES Requirements Worksheet
1.1. Project Information: Atomic and Molecular Physics for Controlled Fusion Energy
Document Prepared By 
John Ludlow 
Project Title 
Atomic and Molecular Physics for Controlled Fusion Energy 
Principal Investigator 
John Ludlow 
Participating Organizations 
Auburn University 
Funding Agencies 
DOE SC DOE NSA NSF NOAA NIH Other: EAEC, IAEA 
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project, highlighting its computational aspect, and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
We plan to apply existing, and develop new, theoretical and computational methods that make use of the world's most advanced computing platforms to calculate atomic and molecular collision processes of key importance to ongoing research in controlled fusion energy. We are supported by grants from the US Department of Energy, the European Atomic Energy Community, and the International Atomic Energy Agency. We currently have computing allocations at NERSC in Oakland, CA; NICS in Knoxville, TN; NCCS in Oak Ridge, TN; and ALCF in Argonne, IL, with pending proposals to HECToR in Edinburgh, UK, and PRACE in Jülich, DE.
Over the next five years we shall continue to provide the fundamental collisional rates required for light-element, fusion-related species. These include electron-impact excitation and ionization of atoms, diatomic molecules, and their ions; dielectronic recombination of atomic ions; and heavy-particle-impact excitation, ionization, and charge transfer with atoms and diatomic molecules. We will also develop our collision codes to progress gradually to heavy-element, fusion-related species such as Mo, Xe, and W. We will continue to employ a variety of perturbative distorted-wave and non-perturbative close-coupling methods, depending on the complexity of, and the accuracy required for, the atomic or molecular system under consideration.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
Our most computationally demanding codes use the R-Matrix with Pseudo-States (RMPS) and Time-Dependent Close-Coupling (TDCC) methods to solve the Schrödinger and Dirac equations for a range of collisional processes in fusion-related atoms and molecules.
The RMPS method is a basis-set approach to solving the time-independent Schrödinger and Dirac equations. Starting from accurate structure calculations, the method requires the formation and diagonalization of large symmetric/Hermitian matrices for all of their eigenvalues and eigenvectors. Current problems need up to 50,000 cores.
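To make the core numerical task concrete, the sketch below computes every eigenvalue and eigenvector of a small real symmetric matrix with the cyclic Jacobi method, in plain Python. It is a serial toy under stated assumptions, not the RMPS implementation: production runs would call a distributed ScaLAPACK eigensolver (for example PDSYEVD, or PZHEEVD for the complex Hermitian case), and the function name `jacobi_eigh` is a hypothetical illustration.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """All eigenvalues and eigenvectors of a real symmetric matrix.

    Returns (w, v): w[k] is an eigenvalue and column k of v (v[i][k])
    is the matching normalized eigenvector. Cyclic Jacobi rotations
    zero the off-diagonal elements until their norm drops below tol.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy; the caller's matrix is untouched
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that annihilates a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A J (update columns p and q)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A (update rows p and q)
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # Accumulate eigenvectors: V <- V J
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v
```

For instance, `jacobi_eigh([[2.0, 1.0], [1.0, 2.0]])` yields the eigenvalues 1 and 3. Jacobi is used here only because it is short and easy to check; each sweep costs O(n^3) work, and the divide-and-conquer ScaLAPACK drivers are far better suited to matrices of RMPS size.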
The TDCC method is a lattice approach to solving the time-dependent Schrödinger and Dirac equations. The method employs a mixture of explicit and implicit propagation techniques on multidimensional grids whose dimensionality is related to the number of active electrons in the problem. The TDCC2d codes for electron-impact single ionization use 2,500 cores, the TDCC3d codes for electron-impact double ionization use 125,000 cores, and the TDCC4d codes for bare-ion-impact double ionization use 50,000 cores; a planned TDCC6d code for bare-ion double charge transfer will need 1,000,000 cores.
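As an illustration of what an implicit propagation step involves, the following sketch advances a one-dimensional wavefunction with the Crank-Nicolson scheme, `(1 + i*dt*H/2) psi(t+dt) = (1 - i*dt*H/2) psi(t)`, in atomic units with `H = -1/2 d^2/dx^2 + V(x)` on a uniform grid and `psi = 0` just outside the grid ends. It is a toy under these assumptions: the actual TDCC codes propagate coupled multidimensional grids, and we do not claim they use this particular scheme. All function names are hypothetical.

```python
import cmath


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system whose sub- and super-diagonals are constant."""
    n = len(diag)
    c = [0j] * n
    d = [0j] * n
    c[0] = upper / diag[0]
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):  # forward elimination
        denom = diag[j] - lower * c[j - 1]
        c[j] = upper / denom
        d[j] = (rhs[j] - lower * d[j - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        x[j] = d[j] - c[j] * x[j + 1]
    return x


def crank_nicolson_step(psi, v, h, dt):
    """Advance psi by one step of i dpsi/dt = H psi (atomic units).

    H = -1/2 d^2/dx^2 + V(x) uses the 3-point Laplacian with psi = 0
    just outside the grid, so H is a real symmetric tridiagonal matrix.
    """
    n = len(psi)
    off = -0.5 / h**2                                # off-diagonal of H
    diag_h = [1.0 / h**2 + v[j] for j in range(n)]   # diagonal of H
    # Right-hand side: (1 - i dt H / 2) psi
    rhs = []
    for j in range(n):
        hpsi = diag_h[j] * psi[j]
        if j > 0:
            hpsi += off * psi[j - 1]
        if j < n - 1:
            hpsi += off * psi[j + 1]
        rhs.append(psi[j] - 0.5j * dt * hpsi)
    # Left-hand side (1 + i dt H / 2): tridiagonal with constant off-diagonals
    a = 0.5j * dt * off
    b = [1.0 + 0.5j * dt * hd for hd in diag_h]
    return thomas_solve(a, b, a, rhs)


def norm(psi, h):
    """Discrete squared L2 norm, sum |psi|^2 h."""
    return sum(abs(p) ** 2 for p in psi) * h


def gaussian_packet(x, k0, width=1.0):
    """Hypothetical initial state: Gaussian wavepacket with mean momentum k0."""
    return [cmath.exp(-((xj / width) ** 2) / 2.0 + 1j * k0 * xj) for xj in x]
```

Because the Crank-Nicolson propagator is the Cayley transform of a Hermitian matrix, it is unitary, so `norm` stays constant to round-off while the packet moves; in one dimension each step costs only O(n) work for the tridiagonal solve.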
Both the RMPS and TDCC suites of codes are written in Fortran, using MPI to implement parallelism. The codes are developed within our group and use externally written library routines from LAPACK, ScaLAPACK, NAG, and parallel NAG. Visualization employs library routines from NCAR and GNUPLOT, with MPEG movies produced for the time-dependent codes.
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
The RMPS codes will have to exploit further levels of parallelism in the formation of the Hamiltonian matrices, since the calculations for heavy species such as W are at least an order of magnitude larger in scale than those for light fusion-related species.
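One common way to parallelize Hamiltonian formation further is to have each process compute only the matrix elements it will hold in ScaLAPACK's two-dimensional block-cyclic layout, so the matrix never has to be gathered or redistributed before diagonalization. The sketch below assumes that strategy; it is not a description of the current RMPS codes. `matrix_element` is a hypothetical stand-in for the expensive integral evaluation, and the index arithmetic mirrors ScaLAPACK's INDXG2P/INDXG2L tool routines, rewritten with 0-based indices and the first block placed on process (0, 0).

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Process-grid coordinates (prow, pcol) that own global element (i, j).

    0-based version of ScaLAPACK's INDXG2P logic, assuming the first
    row and column blocks are stored on process (0, 0).
    """
    return (i // mb) % nprow, (j // nb) % npcol


def local_index(g, blk, nprocs):
    """0-based local index of global row/column g (cf. ScaLAPACK INDXG2L)."""
    return (g // (blk * nprocs)) * blk + g % blk


def form_local_block(n, mb, nb, nprow, npcol, myrow, mycol, matrix_element):
    """Evaluate only the n x n Hamiltonian elements owned by (myrow, mycol).

    matrix_element(i, j) is a hypothetical stand-in for the expensive
    integral evaluation. Returns {(local_row, local_col): value}.
    """
    local = {}
    for i in range(n):
        for j in range(n):
            if owner(i, j, mb, nb, nprow, npcol) != (myrow, mycol):
                continue  # another process computes this element
            key = (local_index(i, mb, nprow), local_index(j, nb, npcol))
            local[key] = matrix_element(i, j)
    return local
```

For example, with n = 10, 2x2 blocks, and a 2x2 process grid, global element (5, 3) lands on process (0, 1) at local position (3, 1). The loops still scan all global indices, which is cheap next to the element evaluations; a production code would iterate over its owned blocks directly.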
The TDCC codes will evolve to treat collisional problems of higher dimensionality and will be extended to address more complicated molecular problems. Lattice calculations involve a high level of communication between processors, and this communication load grows as these multidimensional problems are distributed over ever larger numbers of processors.
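The growth in communication can be estimated with a surface-to-volume argument. Assume, for illustration only, an n^d lattice split into p = q^d equal cubic subdomains and a nearest-neighbour finite-difference stencil that needs `width` ghost layers on each of the 2d faces (edge and corner exchanges ignored). Each process then owns (n/q)^d points but must receive 2d * width * (n/q)^(d-1) ghost points per step, so the communication-to-computation ratio is 2d * width * q / n, which grows as p^(1/d). The helper below is hypothetical and simply evaluates this ratio.

```python
def halo_ratio(n, d, p, width=1):
    """Ratio of ghost (communicated) points to owned points per subdomain.

    Assumes an n^d lattice split into p = q^d equal cubic subdomains and a
    stencil needing `width` ghost layers on each of the 2*d faces.
    """
    q = round(p ** (1.0 / d))
    if q ** d != p:
        raise ValueError("p must be a perfect d-th power")
    if n % q:
        raise ValueError("n must be divisible by p**(1/d)")
    side = n // q                              # points per dimension per subdomain
    owned = side ** d                          # interior points a process updates
    ghosts = 2 * d * width * side ** (d - 1)   # points received from neighbours
    return ghosts / owned
```

For a hypothetical 384^3 grid, going from 512 to 262,144 subdomains raises the ratio from 0.125 to 1.0: at the larger count every process exchanges as many ghost points as it owns, which illustrates why interconnect performance matters increasingly for lattice codes at scale.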
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using 
NERSC OLCF ALCF NSF Centers Other: 
Architectures Used 
Cray XT IBM Power BlueGene Linux Cluster Other: 
Total Computational Hours Used per Year 
Core-Hours 
NERSC Hours Used in 2009 
Core-Hours 
Number of Cores Used in Typical Production Run 

Wallclock Hours of Single Typical Production Run 

Total Memory Used per Run 
GB 
Minimum Memory Required per Core 
GB 
Total Data Read & Written per Run 
GB 
Size of Checkpoint File(s) 
GB 
Amount of Data Moved In/Out of NERSC 
GB per 
On-Line File Storage Required (For I/O from a Running Job) 
TB and Files 
Off-Line Archival Storage Required 
TB and Files 
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
LAPACK, ScaLAPACK, NAG, parallel NAG, NCAR
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year 

Anticipated Number of Cores to be Used in a Typical Production Run 

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above 

Anticipated Total Memory Used per Run 
GB 
Anticipated Minimum Memory Required per Core 
GB 
Anticipated total data read & written per run 
GB 


Anticipated size of checkpoint file(s) 
GB 
Anticipated Amount of Data Moved In/Out of NERSC 
GB per 
Anticipated On-Line File Storage Required (For I/O from a Running Job) 
TB and Files 
Anticipated Off-Line Archival Storage Required 
TB and Files 
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
We anticipate needing parallel I/O and a further reduction in the per-core memory footprint of the codes.
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 1 μs).
At least 8 GB memory per core will be needed for the RMPS codes. Low latency will be important for the TDCC codes to ensure efficient communication.
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
We need ScaLAPACK routines for matrix diagonalization of nonsymmetric real and complex matrices. We would also like to have ScaLAPACK FFT routines to supplement the current NAG parallel FFT routines.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s to 1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
We shall continue to experiment with OpenMP to assess whether it helps with nested Fortran DO loops on small numbers of cores. For larger numbers of cores and for GPUs, we will begin experimenting with the CUDA Fortran approach to nested Fortran DO loops.
New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
With access to 50 times the current HPC resources, electron-impact excitation and ionization of the near-neutral stages of heavy elements such as Mo, Xe, and W could be studied. This would benefit impurity transport research on existing tokamaks and on future machines such as ITER. Bare-ion collisions with atoms and molecules, including double charge transfer, would also become possible.
An order of magnitude more CPU hours, large-memory nodes, large scratch space, and quicker throughput for large jobs are all important for our future computational research.