NERSC: Powering Scientific Discovery Since 1974

Hai-Ping Cheng

BES Requirements Worksheet

1. Project Information

Document Prepared By

Hai-Ping Cheng

Project Title

 

Principal Investigator

Hai-Ping Cheng

Participating Organizations

University of Florida

Funding Agencies

 DOE SC   DOE NNSA   NSF   NOAA   NIH   Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

The five-year research plan of my group includes the following subjects:
1. Electronic and magnetic properties of nanostructured surfaces and of interfaces between soft and hard materials.
2. Quantum transport through tunneling junctions, and semiclassical and classical charge transport in organic/bio materials.
3. Multi-scale and multi-phenomena simulations of composite materials and complex systems.
4. Impurity effects in high-Tc superconductors.
 
Quantum calculations based on DFT are essential to our research projects. System sizes will routinely range from 100-1,000 atoms, or 1,000-10,000 electrons.
 
We study the fundamental interactions of molecules, clusters, and nanocrystals with surfaces, the properties of molecular and nano-wires, and magnetic materials and tunneling junctions, using high-accuracy electronic structure calculations, large-scale MD methods, and Green's function techniques (both equilibrium and non-equilibrium). The research program includes simulations of the adsorption of nano-clusters/nano-tubes on surfaces, of the interplay between the structure and transport properties of molecular nano-wires, and of the mechanical, electronic, and magnetic properties of materials. Specifically, we want to continue our investigations of 1) interactions between particles (organic and inorganic) and surfaces (carbon, BN, Ni, Au, Fe); 2) properties of molecular/nano-junctions (molecules, layered materials, and magnetic particles); 3) self-assembled molecular monolayers on metal surfaces; and 4) interactions between water (H2O) and SiO2 (surfaces, nanopores, channels). We aim to understand processes such as electron transfer, metal penetration through organic spacers, hydrogen dissociation on nano-cluster arrays, the electronic, transport, and mechanical properties of nano-wires, and hydrolytic weakening, chemical bond breaking and formation (water molecules and films on surfaces), and crack propagation in SiO2.

To accomplish our goals, we will use existing computational methods and develop new computational models and algorithms. We have, in the past few years, developed three interfaces: 1) an interface that combines all-valence-electron, pseudopotential DFT with a DFT-jellium model; 2) an interface that bridges quantum MD and classical MD with emphasis on mechanical properties; and 3) an interface between molecular dynamics and the finite element method. Currently, we are working on a new code architecture for multi-scale simulations that combines classical and quantum methods, and on PAW potentials and GW calculations for a GPL code, PWSCF.
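Purely as an illustration of the kind of coupling these interfaces perform (this is not the group's actual scheme), the minimal Python force-mixing sketch below shows the basic idea: atoms in a small embedded region receive forces from an expensive "quantum" model while the rest use a cheap classical model. Both force routines are hypothetical placeholders.

import numpy as np

def classical_forces(positions):
    # placeholder classical force field (harmonic restraint toward the origin)
    return -0.5 * positions

def quantum_forces(positions):
    # placeholder standing in for a DFT force evaluation on the embedded region
    return -1.0 * positions

n_atoms = 100
qm_atoms = np.arange(10)                    # hypothetical embedded "quantum" region
positions = np.random.random((n_atoms, 3))

forces = classical_forces(positions)                      # cheap model everywhere
forces[qm_atoms] = quantum_forces(positions[qm_atoms])    # override in the QM region
# 'forces' would then drive an ordinary MD integration step.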

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

1. PWSCF, VASP, BO-LSD-MD (run on Franklin): these codes solve the Kohn-Sham equations for finite systems using a plane-wave basis set in conjunction with the pseudopotential method.

PWSCF (Plane-Wave Self-Consistent Field) is a set of programs for electronic structure calculations within Density-Functional Theory and Density-Functional Perturbation Theory, using a plane-wave basis set and pseudopotentials. PWSCF is released under the GNU General Public License. My group is currently implementing the self-consistent quasiparticle GW method in the PWSCF package (supported by NSF); our development will also be distributed under the GNU GPL. We are also testing and constructing a PAW potential library for PWSCF.

Similar to PWSCF, VASP implements Kohn-Sham density functional theory in a plane-wave basis. A set of single-particle Schrodinger equations is solved iteratively until self-consistency in the calculated electronic density is reached. The single-particle wavefunctions are expanded in a three-dimensional Fourier series. A large number of numerical techniques are employed; a summary is given at http://cms.mpi.univie.ac.at/vasp/vasp/node193.html. The most critical limiting numerical technique for large runs is matrix diagonalization, i.e., the calculation of all eigenvectors of a dense Hermitian matrix. The matrix is tridiagonalized using the method implemented in ScaLAPACK, and the eigenvectors are obtained using the divide-and-conquer approach.

BO-LSD-MD solves partial differential equations (the density functional theory Kohn-Sham equations) for the electronic structure of finite systems, as well as coupled ordinary differential equations for the nuclear dynamics. It employs FFTs to apply the Hamiltonian in a real/reciprocal dual-space representation to speed up "on-the-fly" first-principles molecular dynamics. The wave functions are expanded in a plane-wave basis set in conjunction with the pseudopotential method. The code is parallelized using MPI with spatial decomposition. In addition, the code uses a modified Block-Davidson diagonalization and an iterative density-mixing scheme to solve for the lowest eigenvalues and eigenfunctions. Hybrid BOMD combines BO-LSD-MD with classical MD to perform multiscale, hybrid quantum-classical simulations; it is an important development for bridging different length scales.
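For illustration only (this is not code from PWSCF, VASP, or BO-LSD-MD), the Python sketch below builds a toy one-dimensional plane-wave Hamiltonian for an assumed model cosine potential and diagonalizes it with a LAPACK divide-and-conquer driver, i.e., the same class of dense Hermitian eigensolve described above. The box length, basis size, and potential strength are arbitrary.

import numpy as np
from scipy.linalg import eigh   # driver="evd" maps to the LAPACK *heevd divide-and-conquer routine

L_box = 10.0             # box length (arbitrary units)
Npw = 128                # number of plane waves / real-space grid points
x = np.linspace(0.0, L_box, Npw, endpoint=False)
G = 2.0 * np.pi * np.fft.fftfreq(Npw, d=L_box / Npw)   # reciprocal lattice vectors

# Model external potential on the real-space grid (assumption: a simple cosine).
V_real = -2.0 * np.cos(2.0 * np.pi * x / L_box)
V_G = np.fft.fft(V_real) / Npw          # Fourier components V(G)

# Kinetic energy is diagonal in the plane-wave basis: T_GG = G^2 / 2.
T = np.diag(0.5 * G**2)

# Potential matrix elements: V_{G,G'} = V(G - G'), indexed on the FFT grid.
idx = np.arange(Npw)
V = V_G[(idx[:, None] - idx[None, :]) % Npw]

H = T + V                               # dense Hermitian Hamiltonian
eps, psi_G = eigh(H, driver="evd")      # divide-and-conquer eigensolver
print("lowest eigenvalues of the model Hamiltonian:", eps[:4])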
 
2. SMEAGOL, Trans-iGator, TranSIESTA: Green's function + DFT in a local basis for quantum transport calculations. Trans-iGator computes the current of electrons through molecular junctions in general; in particular, it was developed to study two-dimensional atomic interfaces. The two-dimensional capability of the code allows a more realistic modeling of systems such as molecular self-assembled monolayers (SAMs). In this way, the intermolecular interactions between neighboring molecules are fully accounted for within the accuracy of the quantum level of theory employed. The code takes as input the quantum-mechanical information of the system, which can be generated by any commercially available implementation of density-functional theory (DFT) in a localized basis set.
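As a hedged illustration of the Green's-function approach (not the actual SMEAGOL or Trans-iGator implementation), the Python sketch below evaluates the Landauer transmission T(E) = Tr[Gamma_L G Gamma_R G+] for a short one-dimensional tight-binding junction, using the analytic surface Green's function of semi-infinite leads. The hopping value and chain length are arbitrary.

import numpy as np

t = 1.0          # lead, coupling, and device hopping (assumed units)
n_dev = 4        # number of device sites
eta = 1e-6       # small positive broadening for the retarded Green's function

def lead_surface_gf(E):
    """Retarded surface Green's function of a semi-infinite 1-D tight-binding chain."""
    z = E + 1j * eta
    s = np.sqrt(z * z - 4.0 * t * t)
    g = (z - s) / (2.0 * t * t)
    return g if g.imag <= 0 else (z + s) / (2.0 * t * t)

H_dev = -t * (np.eye(n_dev, k=1) + np.eye(n_dev, k=-1))   # device Hamiltonian

def transmission(E):
    g_s = lead_surface_gf(E)
    sigma_L = np.zeros((n_dev, n_dev), dtype=complex)
    sigma_R = np.zeros((n_dev, n_dev), dtype=complex)
    sigma_L[0, 0] = t * t * g_s            # left-lead self-energy on the first site
    sigma_R[-1, -1] = t * t * g_s          # right-lead self-energy on the last site
    gamma_L = 1j * (sigma_L - sigma_L.conj().T)
    gamma_R = 1j * (sigma_R - sigma_R.conj().T)
    G = np.linalg.inv((E + 1j * eta) * np.eye(n_dev) - H_dev - sigma_L - sigma_R)
    return np.trace(gamma_L @ G @ gamma_R @ G.conj().T).real

# For a perfect chain the transmission should be ~1 inside the band |E| < 2t.
for E in (-1.0, 0.0, 1.0):
    print(E, transmission(E))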
 
3. DL_POLY: classical molecular dynamics.
 
4. CASINO: diffusion quantum Monte Carlo for electronic systems.
 
5. SIESTA, Gaussian, CRYSTAL: Kohn-Sham equations using local basis sets.
 
6. MOPAC: semiempirical method, i.e., Hartree-Fock with approximations. These codes are all parallelized with MPI.
 
Codes in development:
1. Plane-wave transport based on PWSCF and scattering theory (MPI, possibly OpenMP)
2. OPAL architecture (MPI-2)
3. QPscfGW

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

1. Long queue waits when using more than 128 cores.
2. Limited time per run; we need continuous run times of 1 week to 10 days.
3. Code optimization: VASP as optimized by Paul Kent is good, but similar effort should be extended to VASP 5 and PWSCF.
4. Computer architecture and math library optimization. For example, PWSCF on Bassi (IBM POWER5) is much faster than on Franklin (Cray XT4, Opteron) or Hopper (Cray XT5, Opteron), and optimized math libraries can sometimes yield as much as a 50% performance improvement (e.g., MKL vs. a generic LAPACK/ScaLAPACK). Reports on the Internet also claim that Intel CPUs + Intel compilers + MKL generally produce much faster code than Opteron + Intel compilers + MKL or Opteron + PathScale + ACML.
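As a minimal illustration of the kind of library-level comparison we mean (assuming a Python/SciPy stack linked against the site's tuned BLAS/LAPACK; the matrix size is arbitrary), the sketch below times two LAPACK drivers for the dense Hermitian eigensolve that dominates our plane-wave runs.

import time
import numpy as np
from scipy.linalg import eigh

n = 2000
A = np.random.rand(n, n) + 1j * np.random.rand(n, n)
H = (A + A.conj().T) / 2.0               # random dense Hermitian test matrix

for driver in ("ev", "evd"):             # QR-based vs. divide-and-conquer LAPACK drivers
    t0 = time.perf_counter()
    eigh(H, driver=driver)
    print(f"driver={driver}: {time.perf_counter() - t0:.2f} s")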

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  UFL/HPC

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

3,000,000 Core-Hours

NERSC Hours Used in 2009

~2,000,000 Core-Hours

Number of Cores Used in Typical Production Run

5,000-20,000

Wallclock Hours of Single Typical Production Run

 

Total Memory Used per Run

 500 GB

Minimum Memory Required per Core

 1.5-2 GB

Total Data Read & Written per Run

100 GB

Size of Checkpoint File(s)

 100-200 GB

Amount of Data Moved In/Out of NERSC

2 TB per month

On-Line File Storage Required (For I/O from a Running Job)

1 GB and  Files

Off-Line Archival Storage Required

 30 GB and  Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

1. Hardware: for example, one CPU plus 512 GPU cores and 64 GB of RAM per node, so that the number of GPU cores and the total memory are suitable for most calculations, with a fast interconnect between nodes so that inter-node parallelization can still be employed when necessary. 2. Software: a version of MPI that enables parallelization across GPUs with shared memory, so that the computation-intensive parts run on the GPUs while the CPU only controls the execution flow of the code (GPU behavior).

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

5-10 million

Anticipated Number of Cores to be Used in a Typical Production Run

64-512

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

5000

Anticipated Total Memory Used per Run

 500-1000 GB

Anticipated Minimum Memory Required per Core

4 GB

Anticipated total data read & written per run

 1000 GB

Anticipated size of checkpoint file(s)

 1000 GB

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 3 GB and  Files

Anticipated Amount of Data Moved In/Out of NERSC

100 GB per  day

Anticipated Off-Line Archival Storage Required

100-500 GB and  Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Code development (see above) 
Code optimization

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).

1. Hardware: for example, one CPU plus 512 GPU cores and 64 GB of RAM per node, so that the number of GPU cores and the total memory are suitable for most calculations, with a fast interconnect between nodes so that inter-node parallelization can still be employed when necessary. 2. Software: a version of MPI that enables parallelization across GPUs with shared memory, so that the computation-intensive parts run on the GPUs while the CPU only controls the execution flow of the code (GPU behavior).

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

Working with us on code optimization 

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

Our strategy is based on the following. 1. Hardware: for example, one CPU plus 512 GPU cores and 64 GB of RAM per node, so that the number of GPU cores and the total memory are suitable for most calculations, with a fast interconnect between nodes so that inter-node parallelization can still be employed when necessary. 2. Software: a version of MPI that enables parallelization across GPUs with shared memory, so that the computation-intensive parts run on the GPUs while the CPU only controls the execution flow of the code (GPU behavior).
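A hypothetical sketch of this execution model is shown below (assuming mpi4py and CuPy as stand-ins for the vendor-provided MPI and GPU libraries, and that the number of ranks divides the grid size): one MPI rank is bound to each GPU, the compute-intensive kernels (here batched FFTs, a stand-in for the plane-wave FFT workload) run on the GPU, and only small reductions cross the interconnect.

from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()   # bind this rank to one GPU

# Each rank owns a slab of a global grid (simple 1-D domain decomposition; assumes size divides n).
n = 128
local = cp.random.random((n // size, n, n)) + 1j * cp.random.random((n // size, n, n))

# The GPU does the heavy lifting (batched 2-D FFTs over the local slab);
# the CPU rank only orchestrates and performs a small global reduction.
local_energy = float(cp.sum(cp.abs(cp.fft.fftn(local, axes=(1, 2)))**2))
total = comm.allreduce(local_energy, op=MPI.SUM)
if rank == 0:
    print("global |FFT|^2 sum:", total)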

New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

1. We will be able to do at NERSC what we are currently doing (many calculations in my group are done using local clusters).
2. With 50x more CPU time, we can start to do free-energy calculations based on DFT. We will also be able to study reaction processes that involve collective dynamics; for example, the dynamics of proton transfer in liquid water.
3. We will be able to study, via first-principles simulations, the binding between inhibitors and HIV protease and the effects of mutations.
4. We will be able to study multi-physics processes using OPAL; for example, charge transport in solution upon interaction with light, including atomic motion, charge migration, and conductance changes in real time.