BES Requirements Worksheet

1.1. Project Information - The Nature of the Mineral-Water Interface: A Molecular Simulation and Spectroscopic Investigation

Document Prepared By

Randy Cygan

Project Title

The Nature of the Mineral-Water Interface: A Molecular Simulation and Spectroscopic Investigation

Principal Investigator

Randy Cygan

Participating Organizations

Sandia National Laboratories 
Michigan State University 
Northwestern University 
Purdue University

Funding Agencies

 DOE SC  DOE NNSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

Classical simulation using approximate energy expressions remains the computational method of choice for atomistic simulations of complex systems such as the clay-water interface. We are well equipped for these simulations at Sandia, with a dedicated 100-processor cluster and access to institutional computing clusters. We use the LAMMPS code for molecular dynamics (MD) simulations, as well as the Forcite module of the Materials Studio software for efficient parameter development. However, we will continue to use quantum methods, both as a tool in force field development and as a means to study reactivity at clay surfaces. We employ density functional theory (DFT) methods using the freely available VASP code and the commercially available DMol3 module of Materials Studio. Using ab initio MD (AIMD), we can directly compare dynamical properties from quantum and classical MD simulations.
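As a minimal sketch of the kind of classical-vs-quantum comparison described above, the following Python snippet estimates a self-diffusion coefficient from the mean-squared displacement of a trajectory via the Einstein relation (MSD ~ 6Dt at long times); the same analysis applies to classical MD and AIMD output. The synthetic random-walk trajectories, time steps, and array layout are illustrative assumptions, not project data.

# Sketch: self-diffusion coefficient from the Einstein relation, a
# dynamical property comparable between classical MD and AIMD runs.
# The synthetic trajectories below are stand-ins for real MD output.
import numpy as np

def diffusion_coefficient(traj, dt_fs):
    """Estimate D (units of 1e-5 cm^2/s) from a trajectory of shape
    (frames, atoms, 3) with unwrapped coordinates in Angstroms."""
    disp = traj - traj[0]                            # displacement from frame 0
    msd = np.mean(np.sum(disp**2, axis=-1), axis=1)  # MSD per frame (A^2)
    t = np.arange(len(msd)) * dt_fs                  # time axis (fs)
    half = len(msd) // 2                             # fit the diffusive regime
    slope = np.polyfit(t[half:], msd[half:], 1)[0]   # d(MSD)/dt in A^2/fs
    return slope / 6.0 * 1.0e4                       # A^2/fs -> 1e-5 cm^2/s

# Random-walk stand-ins for a classical and an ab initio trajectory
classical = np.cumsum(np.random.normal(0, 0.05, (5000, 64, 3)), axis=0)
aimd = np.cumsum(np.random.normal(0, 0.05, (2000, 64, 3)), axis=0)
print(diffusion_coefficient(classical, dt_fs=1.0))
print(diffusion_coefficient(aimd, dt_fs=0.5))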
 
Specific technical goals include the development of an accurate classical force field to describe the edge structure of clay minerals, and the development of large-scale models of ion adsorption onto the basal and edge structures of muscovite, validated by spectroscopy.
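One common route to the force-field goal is to fit classical pair parameters against quantum reference energies. The sketch below assumes a Buckingham functional form and synthetic stand-in "DFT" data; a real workflow would instead use reference energies from VASP or DMol3 scans.

import numpy as np
from scipy.optimize import least_squares

def buckingham(params, r):
    """Buckingham pair energy A*exp(-r/rho) - C/r**6 (kcal/mol, r in Angstroms)."""
    A, rho, C = params
    return A * np.exp(-r / rho) - C / r**6

# Stand-in for a DFT distance scan; a real workflow would read energies
# computed along, e.g., an ion-surface approach path.
r_ref = np.linspace(2.0, 6.0, 25)
e_ref = buckingham([2.0e4, 0.28, 150.0], r_ref) + np.random.normal(0, 0.05, r_ref.size)

# Least-squares fit of the three parameters to the reference energies
fit = least_squares(lambda p: buckingham(p, r_ref) - e_ref,
                    x0=[1.0e4, 0.3, 100.0], bounds=(0.0, np.inf))
print("fitted A, rho, C:", fit.x)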

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid).

Classical molecular dynamics using the LAMMPS and Forcite codes
up to 250K-atom systems
up to 50M time steps

Quantum density functional theory (DFT) using the VASP and DMol3 codes
up to 400-atom systems
up to 60K time steps for AIMD

Quantum Hartree-Fock using the Gaussian code
up to 200-atom clusters

All processing is MPI-based.
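For context on how MPI parallelism is typically expressed in particle codes like those above, here is a toy Python (mpi4py) sketch of a rank-based decomposition and global reduction; it is illustrative only and not drawn from LAMMPS, VASP, or Gaussian internals.

# Toy MPI decomposition: split atoms across ranks, reduce a partial sum.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_atoms = 250_000                                         # scale from the entry above
local = np.array_split(np.arange(n_atoms), nprocs)[rank]  # this rank's atoms

# Each rank computes a partial result for its atoms (placeholder work),
# then a global reduction combines contributions, as in a distributed
# force or energy evaluation.
partial = float(len(local))              # stand-in for a per-rank energy sum
total = comm.allreduce(partial, op=MPI.SUM)

if rank == 0:
    print(f"{nprocs} ranks handled {int(total)} atoms")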

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Practical limitations include institutional queue priorities and processor/time limits.
Limits on the number of atoms and on the ability to include critical chemistry.
Visualization software for large-atom systems.

Nothing NERSC-specific.

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  Institutional machines and local clusters

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

900K Core-Hours

NERSC Hours Used in 2009

 0 Core-Hours

Number of Cores Used in Typical Production Run

48

Wallclock Hours of Single Typical Production Run

96

Total Memory Used per Run

 20 GB

Minimum Memory Required per Core

 1 GB

Total Data Read & Written per Run

 2 GB

Size of Checkpoint File(s)

2 GB

Amount of Data Moved In/Out of NERSC

 GB per  

On-Line File Storage Required (For I/O from a Running Job)

 0.005 GB and 20 Files

Off-Line Archival Storage Required

 4 GB and 8000 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

None 

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate as a starting point for discussions at the workshop.

Computational Hours Required per Year

2M

Anticipated Number of Cores to be Used in a Typical Production Run

200

Anticipated Wallclock Hours to be Used in a Typical Production Run Using the Number of Cores Given Above

96

Anticipated Total Memory Used per Run

 10 GB

Anticipated Minimum Memory Required per Core

 5 GB

Anticipated total data read & written per run

20 GB

Anticipated size of checkpoint file(s)

 5 GB

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 0.005 GB and  20 Files

Anticipated Amount of Data Moved In/Out of NERSC

 GB per  

Anticipated Off-Line Archival Storage Required

 10 GB and 20000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Vectorization using GPUs
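As a minimal sketch of what such GPU vectorization could look like, the following uses the CuPy array library with an all-pairs Lennard-Jones energy as a stand-in kernel; the parameters and system size are illustrative assumptions, and a production code would use neighbor lists rather than the naive O(N^2) form.

# Hypothetical sketch of GPU vectorization: an all-pairs Lennard-Jones
# energy evaluated on the GPU with CuPy. Parameters are illustrative only.
import cupy as cp

def lj_energy(pos, epsilon=0.155, sigma=3.17, cutoff=10.0):
    """Total LJ energy (kcal/mol) for positions of shape (N, 3) in Angstroms."""
    diff = pos[:, None, :] - pos[None, :, :]       # (N, N, 3) displacements
    r2 = cp.sum(diff * diff, axis=-1)              # squared pair distances
    i, j = cp.triu_indices(pos.shape[0], k=1)      # unique i < j pairs
    r2 = r2[i, j]
    r2 = r2[r2 < cutoff * cutoff]                  # spherical cutoff
    sr6 = (sigma * sigma / r2) ** 3
    return float(cp.sum(4.0 * epsilon * (sr6 * sr6 - sr6)))

pos = cp.random.uniform(0.0, 40.0, size=(2048, 3))   # toy configuration
print(lj_energy(pos))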

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

GPU-based algorithms 

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

No 

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

Increased CPU hours with an associated increase in memory.

Will lead to improved uncertainty quantification with a larger number of realizations; important for performance assessment projects.

Will allow improved fidelity in capturing the full chemistry associated with systems involving large molecules and complex substrates.

Longer simulation times provide the opportunity to capture critical chemistry and to better validate models against spectroscopy and experiment.