HaiPing Cheng
BES Requirements Worksheet
1.1. Project Information 
Document Prepared By 
HaiPing Cheng 
Project Title 

Principal Investigator 
HaiPing Cheng 
Participating Organizations 
University of Florida 
Funding Agencies 
DOE SC, DOE NNSA, NSF, NOAA, NIH, Other: 
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project, highlighting its computational aspect, and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
The five-year research plan of my group includes the following subjects:
1. Electronic and magnetic properties of nanostructured surfaces and of interfaces between soft and hard materials.
2. Quantum transport through tunneling junctions; semiclassical and classical charge transport in organic/bio materials.
3. Multiscale and multiphenomena simulations of composite materials and complex systems.
4. Impurity effects in high-Tc superconductors.
Quantum calculations based on DFT are essential to our research projects. System sizes will routinely include 100-1000 atoms, or 1000-10000 electrons.
We study the fundamental interactions of molecules, clusters, and nanocrystals with surfaces; the properties of molecular wires and nanowires; and magnetic materials and tunneling junctions, using high-accuracy electronic structure calculations, large-scale MD methods, and Green's function techniques (both equilibrium and nonequilibrium). The research program includes simulations of the adsorption of nanoclusters/nanotubes on surfaces, of the interplay between the structure and transport properties of molecular nanowires, and of the mechanical, electronic, and magnetic properties of materials. Specifically, we want to continue our investigations of 1) interactions between particles (organic and inorganic) and surfaces (carbon, BN, Ni, Au, Fe); 2) properties of molecular/nanojunctions (molecules, layered materials, and magnetic particles); 3) self-assembled molecular monolayers on metal surfaces; and 4) interactions between H2O and SiO2 (surfaces, nanopores, channels). We aim to understand processes such as electron transfer, metal penetration through organic spacers, hydrogen dissociation on nanocluster arrays, the electronic, transport, and mechanical properties of nanowires, hydrolytic weakening, chemical bond breaking and formation (water molecules and films on surfaces), and crack propagation in SiO2.

To accomplish our goals, we will use existing computational methods and develop new computational models and algorithms. We have, in the past few years, developed three interfaces: 1) an interface that combines DFT with all valence electrons and pseudopotentials with a DFT-jellium model, 2) an interface that bridges quantum MD and classical MD with emphasis on mechanical properties, and 3) an interface between molecular dynamics and the finite element method. Currently, we are working on a new computer architecture for multiscale simulations that combines classical and quantum methods, and on PAW potentials and GW calculations for a GPL code, PWSCF.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments, e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid).
1. PWSCF, VASP, BOLSDMD: solve the Kohn-Sham equations for finite systems using a plane-wave basis set in conjunction with the pseudopotential method.

PWSCF (Plane-Wave Self-Consistent Field) is a set of programs for electronic structure calculations within Density-Functional Theory and Density-Functional Perturbation Theory, using a plane-wave basis set and pseudopotentials. PWscf is released under the GNU General Public License. My group is currently implementing the self-consistent quasiparticle GW method in the PWSCF package (supported by NSF); our development will also be distributed via the GNU GPL. We are also testing and constructing a PAW potential library for PWSCF.

Similar to PWSCF, VASP implements Kohn-Sham density functional theory within a plane-wave basis. A set of single-particle Schrodinger equations is solved iteratively until self-consistency in the calculated electronic density is reached. The single-particle wavefunctions are expanded in a three-dimensional Fourier series. A large number of numerical techniques are employed; a summary is given at http://cms.mpi.univie.ac.at/vasp/vasp/node193.html. The most critical limiting numerical technique for large runs is matrix diagonalization, i.e., the calculation of all eigenvectors of a dense Hermitian matrix. The matrix is tridiagonalized using the method implemented in ScaLAPACK, and the eigenvectors are solved for using the divide-and-conquer approach.

BOLSDMD solves partial differential equations (the density functional theory Kohn-Sham equations) for the electronic structure of finite systems, as well as coupled ordinary differential equations for the nuclear dynamics. It employs FFTs to compute the Hamiltonian in a real/reciprocal dual space to speed up "on-the-fly" first-principles molecular dynamics. The wave functions are expanded in a plane-wave basis set in conjunction with the pseudopotential method. The code is parallelized using MPI with spatial decomposition. In addition, the code uses a modified block-Davidson diagonalization and an iterative density-mixing scheme to solve for the lowest eigenvalues and eigenfunctions. Hybrid BOMD combines BOLSDMD with classical MD to perform multiscale, hybrid quantum-classical simulations; it is an important development for bridging different length scales. These codes run on Franklin.
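The real/reciprocal dual-space technique described above can be illustrated with a minimal one-dimensional sketch (a generic toy example, not code from BOLSDMD or PWSCF): the kinetic term is applied by multiplying by k^2/2 in reciprocal space via FFT, while a local potential is a pointwise multiplication in real space.

```python
import numpy as np

def apply_hamiltonian(psi, v_real, dx):
    """Apply H = -1/2 d^2/dx^2 + V(x) to psi on a periodic 1-D grid.

    Kinetic term: multiply by k^2/2 in reciprocal space (via FFT);
    local potential term: pointwise multiply in real space.
    """
    n = psi.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)  # plane-wave vectors of the grid
    kinetic = np.fft.ifft(0.5 * k**2 * np.fft.fft(psi))
    return kinetic + v_real * psi

# Sanity check: a single plane wave exp(i*k0*x) with V = 0 is an
# eigenstate of H with eigenvalue k0^2 / 2.
n, L = 64, 10.0
dx = L / n
x = dx * np.arange(n)
k0 = 2.0 * np.pi / L * 3            # an allowed wavevector (3rd grid mode)
psi = np.exp(1j * k0 * x)
hpsi = apply_hamiltonian(psi, np.zeros(n), dx)
assert np.allclose(hpsi, 0.5 * k0**2 * psi)
```

The same pattern, with 3-D FFTs and a nonlocal pseudopotential term, is what makes plane-wave codes spend most of their time in FFT and dense linear algebra kernels.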
2. SMEAGOL, TransiGator, TranSIESTA: Green's function + DFT in a local basis for quantum transport calculations.

TransiGator computes the current of electrons through molecular junctions in general. In particular, it was developed to study two-dimensional atomic interfaces. The two-dimensional capability of the code allows more realistic modeling of systems such as molecular self-assembled monolayers (SAMs); in this way, the intermolecular interactions between neighboring molecules are fully accounted for within the accuracy of the quantum level of theory employed. The code takes as input the quantum-mechanical description of the system, which can be generated by any commercially available implementation of density functional theory (DFT) in a localized basis set.
3. DL_POLY: classical molecular dynamics.
4. CASINO: diffusion quantum Monte Carlo for electronic systems.
5. SIESTA, Gaussian, CRYSTAL: solve the Kohn-Sham equations using a local basis set.
6. MOPAC: semiempirical method, i.e., Hartree-Fock with approximations.

These codes are all parallelized with MPI.
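As a sketch of the Green's function transport approach used by the codes in item 2 above, the Landauer transmission of a toy single-level junction between two semi-infinite 1-D tight-binding leads can be computed from T(E) = Gamma_L |G(E)|^2 Gamma_R. This is a generic NEGF textbook illustration under stated model assumptions, not code from SMEAGOL, TransiGator, or TranSIESTA.

```python
import numpy as np

def transmission(E, eps=0.0, t=1.0, t_c=1.0):
    """Landauer transmission T(E) = Gamma_L |G|^2 Gamma_R for a single
    level (onsite energy eps) coupled symmetrically (coupling t_c) to two
    semi-infinite 1-D tight-binding leads (hopping t).  Valid for energies
    inside the lead band, |E| < 2t.
    """
    # Retarded surface Green's function of a semi-infinite 1-D chain
    g_s = (E - 1j * np.sqrt(4.0 * t**2 - E**2)) / (2.0 * t**2)
    sigma = t_c**2 * g_s                     # lead self-energy
    gamma = (1j * (sigma - np.conj(sigma))).real  # broadening, > 0
    G = 1.0 / (E - eps - 2.0 * sigma)        # retarded Green's function
    return float(gamma**2 * abs(G)**2)

# Symmetric resonant case: a uniform chain transmits perfectly.
assert abs(transmission(0.0) - 1.0) < 1e-12
# An off-resonant level scatters: 0 < T < 1.
assert 0.0 < transmission(0.0, eps=0.5) < 1.0
```

Production codes replace the scalar level and analytic self-energies with DFT Hamiltonian blocks and numerically computed lead surface Green's functions, but the T(E) formula is the same.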
Codes in development:
1. Plane-wave transport based on PWSCF and scattering theory (MPI, OpenMP?)
2. OPAL architecture (MPI-2)
3. QPscfGW
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
1. Long queue waits when using > 128 CPUs.
2. Limited time per run; we need continuous run times of 1 week to 10 days.
3. Code optimization: e.g., VASP as optimized by Paul Kent is good, but similar effort should be extended to VASP 5 and PWSCF.
4. Computer architecture and math library optimization. For example, PWSCF on Bassi (IBM POWER5) is much faster than on Franklin (Cray XT4, Opteron) or Hopper (Cray XT5, Opteron); optimized math libraries sometimes yield as much as a 50% performance enhancement (MKL etc. vs. ordinary LAPACK/ScaLAPACK). Reports on the Internet also claim that Intel CPU + Intel compilers + MKL generally produces much faster code than Opteron + Intel compilers + MKL or Opteron + PathScale + ACML.
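The math-library point above can be illustrated with a small generic sketch (not a PWSCF benchmark): the same matrix product computed with a textbook triple loop and with NumPy's BLAS-backed matmul gives identical results, while the BLAS version is typically orders of magnitude faster thanks to cache blocking and vectorization.

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Textbook triple-loop matrix multiply, no optimized BLAS."""
    n, m, p = a.shape[0], a.shape[1], b.shape[1]
    c = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 100))
b = rng.standard_normal((100, 100))

t0 = time.perf_counter()
c_naive = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b                      # dispatched to the linked BLAS
t_blas = time.perf_counter() - t0

assert np.allclose(c_naive, c_blas)  # same answer, very different speed
```

The same effect, at much larger scale, is what separates a tuned MKL/ESSL/LibSci build of a plane-wave code from one linked against reference LAPACK/ScaLAPACK.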
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate as a starting point for discussions.
Facilities Used or Using 
NERSC OLCF ALCF NSF Centers Other: UFL/HPC 
Architectures Used 
Cray XT IBM Power BlueGene Linux Cluster Other: 
Total Computational Hours Used per Year 
3,000,000 core-hours 
NERSC Hours Used in 2009 
~2,000,000 core-hours 
Number of Cores Used in Typical Production Run 
5,000-20,000 
Wallclock Hours of Single Typical Production Run 

Total Memory Used per Run 
500 GB 
Minimum Memory Required per Core 
1.5-2 GB 
Total Data Read & Written per Run 
100 GB 
Size of Checkpoint File(s) 
100-200 GB 
Amount of Data Moved In/Out of NERSC 
2 TB per month 
OnLine File Storage Required (For I/O from a Running Job) 
1 GB and Files 
OffLine Archival Storage Required 
30 GB and Files 
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
1. Hardware: one CPU + 512 GPU cores + 64 GB RAM (for example) per node, such that the number of GPU cores and the total memory are suitable for most calculations; fast interconnect between nodes so that internode parallelization can still be employed when necessary. 2. Software: a special version of MPI that enables parallelization between GPUs but with shared memory, so that the computation-intensive parts run within the GPUs while the CPU only controls the execution flow of the code (GPU behavior).
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate as a starting point for discussions at the workshop.
Computational Hours Required per Year 
5-10 million 
Anticipated Number of Cores to be Used in a Typical Production Run 
64-512 
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above 
5000 
Anticipated Total Memory Used per Run 
500-1000 GB 
Anticipated Minimum Memory Required per Core 
4 GB 
Anticipated total data read & written per run 
1000 GB 
Anticipated size of checkpoint file(s) 
1000 GB 
Anticipated OnLine File Storage Required (For I/O from a Running Job) 
3 GB and Files 
Anticipated Amount of Data Moved In/Out of NERSC 
100 GB per day 
Anticipated OffLine Archival Storage Required 
100-500 GB and Files 
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?
Code development (see above)
Code optimization
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 μs).
1. Hardware: one CPU + 512 GPU cores + 64 GB RAM (for example) per node, such that the number of GPU cores and the total memory are suitable for most calculations; fast interconnect between nodes so that internode parallelization can still be employed when necessary. 2. Software: a special version of MPI that enables parallelization between GPUs but with shared memory, so that the computation-intensive parts run within the GPUs while the CPU only controls the execution flow of the code (GPU behavior).
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Working with us on code optimization
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s to 1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Our strategy targets the following environment: 1. Hardware: one CPU + 512 GPU cores + 64 GB RAM (for example) per node, such that the number of GPU cores and the total memory are suitable for most calculations; fast interconnect between nodes so that internode parallelization can still be employed when necessary. 2. Software: a special version of MPI that enables parallelization between GPUs but with shared memory, so that the computation-intensive parts run within the GPUs while the CPU only controls the execution flow of the code (GPU behavior).
New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
1. We will be able to continue what we are doing at larger scale (many calculations in my group are currently done on local clusters).
2. With 50x more CPU time, we could start to do free-energy calculations based on DFT. We would also be able to study reaction processes that involve collective dynamics; for example, the dynamics of proton transfer in liquid water.
3. We will be able to study, via first-principles simulations, the binding between inhibitors and HIV protease, and the effects of mutation.
4. We will be able to study multiphysics processes using OPAL, for example charge transport in solution upon interaction with light, with the inclusion of atomic motion, charge migration, and conductance change in real time.