Andrew Felmy
BES Requirements Worksheet
1.1. Project Information - Computational Studies in Molecular Geochemistry
Document Prepared By | Andrew Felmy
Project Title | Computational Studies in Molecular Geochemistry
Principal Investigator | Andrew Felmy
Participating Organizations | Pacific Northwest National Laboratory
Funding Agencies | DOE SC, DOE NNSA, NSF, NOAA, NIH, Other:
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
Our effort consists of molecular-level simulations in key areas of geochemistry. These areas include:
Interactions between Fe(III)-reducing bacteria and iron oxides - We are using ab initio molecular modeling calculations of the structures of anthraquinone reductant species and Fe(III)-oxide surfaces to produce a kinetic model of heterogeneous electron transfer reactions, with implications for biogeochemical activity in subsurface environments.
Mineral surface interactions - This project is directed at providing a molecular-scale understanding of surface complexation reactions at oxide, oxyhydroxide, and silicate mineral surfaces through the use of molecular modeling calculations.
Supercritical behavior of ore-forming fluids - This project uses first-principles molecular dynamics simulations to improve our understanding of the natural processes that lead to the concentration of metal species in natural waters and the deposition of ore-rich formations.
The reaction specificity of nanoparticles in solution (application to the reaction of nanoparticulate iron and iron compounds with chlorinated hydrocarbons) - This project uses first-principles simulations to model and characterize the mechanism of the dissociative reduction of halogenated hydrocarbons by iron-oxide and nanoparticulate iron systems.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
Classical molecular dynamics, ab initio molecular dynamics, electronic structure theory (e.g. MP2, CCSD(T), DFT)
These calculations are performed using NWChem, Molpro, and various research codes.
The parallelism in these programs is expressed using Global Arrays (GA) tools, MPI, and hybrid MPI/OpenMP.
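For illustration, the sketch below shows the style of hybrid MPI/OpenMP parallelism referred to here: MPI ranks partition a global index space, OpenMP threads share each rank's block, and partial results are combined with an all-reduce. This is a minimal, self-contained example, not an excerpt from NWChem or the research codes; the loop body, problem size, and summed quantity are placeholders.

    /* Minimal sketch of hybrid MPI/OpenMP parallelism (illustrative only; not
     * code from NWChem).  Ranks split the index space, threads share a rank's
     * block, and an all-reduce combines the partial sums. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000   /* placeholder problem size */

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* Request thread support so OpenMP threads can coexist with MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Each rank owns a contiguous block of the global index space. */
        long lo = (long)rank * N / nranks;
        long hi = (long)(rank + 1) * N / nranks;

        double local = 0.0;
        /* Threads within a rank share that rank's block of work. */
        #pragma omp parallel for reduction(+:local)
        for (long i = lo; i < hi; i++) {
            double x = (double)i / N;
            local += x * x;   /* stand-in for a per-element energy/integral term */
        }

        /* Combine partial results across ranks (the all-reduce noted in 3b). */
        double total = 0.0;
        MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %f (%d ranks x %d threads)\n",
                   total, nranks, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

Launched with, for example, 8 MPI ranks and OMP_NUM_THREADS=4, such a job would occupy 32 cores in the ranks-times-threads layout typical of hybrid runs.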
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
Memory speed, interconnect speed, algorithm scaling, point-to-point latency, point-to-point bandwidth, broadcast performance, and all-reduce performance.
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using | NERSC, OLCF, ALCF, NSF Centers, Other: EMSL MSCF
Architectures Used | Cray XT, IBM Power, BlueGene, Linux Cluster, Other:
Total Computational Hours Used per Year | 700,000 core-hours
NERSC Hours Used in 2009 | 500,000 core-hours
Number of Cores Used in Typical Production Run | 100–1,000
Wallclock Hours of Single Typical Production Run | 100
Total Memory Used per Run | 10 GB – 1 TB
Minimum Memory Required per Core | 2 GB
Total Data Read & Written per Run | 5 GB – 1 TB
Size of Checkpoint File(s) | 5 GB – 1 TB
Amount of Data Moved In/Out of NERSC | GB per
On-Line File Storage Required (For I/O from a Running Job) | GB and Files
Off-Line Archival Storage Required | GB and Files
Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
Global Arrays
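As background for this requirement, the sketch below illustrates the Global Arrays (GA) programming model on which NWChem's parallelism is built: a logically shared multidimensional array is physically distributed across ranks, each rank can update the patch it owns directly, and any rank can fetch any patch one-sidedly. This is a minimal illustration rather than production code; the array dimensions, fill pattern, and memory-allocator sizes are arbitrary placeholders, and exact header and call details may differ between GA versions.

    /* Minimal sketch of the Global Arrays (GA) one-sided model (illustrative
     * only; sizes and fill pattern are arbitrary). */
    #include <stdio.h>
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        GA_Initialize();
        MA_init(C_DBL, 1000000, 1000000);   /* local buffer space used by GA */

        int me = GA_Nodeid();

        /* A 1000 x 1000 double array, physically distributed over all ranks
         * but addressed through a single global index space. */
        int dims[2]  = {1000, 1000};
        int chunk[2] = {-1, -1};            /* let GA pick the distribution */
        int g_a = NGA_Create(C_DBL, 2, dims, "work", chunk);
        GA_Zero(g_a);

        /* Each rank writes directly into the patch it owns. */
        int lo[2], hi[2], ld[1];
        double *ptr;
        NGA_Distribution(g_a, me, lo, hi);
        if (hi[0] >= lo[0] && hi[1] >= lo[1]) {
            NGA_Access(g_a, lo, hi, &ptr, ld);
            for (int i = 0; i <= hi[0] - lo[0]; i++)
                for (int j = 0; j <= hi[1] - lo[1]; j++)
                    ptr[i * ld[0] + j] = (double)me;
            NGA_Release_update(g_a, lo, hi);
        }
        GA_Sync();

        /* Any rank can fetch any patch one-sidedly, without the owner's help. */
        double buf[4];
        int rlo[2] = {0, 0}, rhi[2] = {1, 1}, rld[1] = {2};
        NGA_Get(g_a, rlo, rhi, buf, rld);
        if (me == 0)
            printf("corner patch: %g %g %g %g\n", buf[0], buf[1], buf[2], buf[3]);

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }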
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year | 10–20 million core-hours
Anticipated Number of Cores to be Used in a Typical Production Run | 1,000–20,000 cores
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above | 100 hours
Anticipated Total Memory Used per Run | 10 GB – 25 TB
Anticipated Minimum Memory Required per Core | 1 GB
Anticipated Total Data Read & Written per Run | 10 GB – 25 TB
Anticipated Size of Checkpoint File(s) | 10 GB – 25 TB
Anticipated On-Line File Storage Required (For I/O from a Running Job) | 5,000 GB and 500 files
Anticipated Amount of Data Moved In/Out of NERSC | GB per
Anticipated Off-Line Archival Storage Required | GB and Files
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.
Highly scalable implementations of AIMD with hybrid DFT and MP2 are being developed.
Highly scalable implementations of high-level ab initio methods are being developed.
Current implementations readily scale to at least 30,000 cores.
Hybrid MPI/OpenMP, GA/OpenMP, MPI/CUDA, and GA/CUDA algorithms are being developed to improve FLOP rates.
A DOE ASCR research project, "Automatic transformation of MPI programs," is being pursued with S. Baden (UCSD) and D. Quinlan (LLNL) to improve the overlap of computation and communication in NWChem (a sketch of this overlap pattern is given below).
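To make the last point concrete, the sketch below shows the kind of computation/communication overlap such a transformation targets, written with non-blocking MPI in a simple one-dimensional halo exchange: communication is initiated first, interior work that does not depend on the incoming data proceeds while messages are in flight, and the boundary update completes after the wait. The decomposition, buffer sizes, and compute_interior routine are illustrative placeholders, not NWChem code.

    /* Minimal sketch of overlapping computation with communication via
     * non-blocking MPI (illustrative only; not NWChem code). */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1000000   /* placeholder local problem size */

    /* Stand-in for work that does not depend on the incoming halo data. */
    static void compute_interior(double *u, int n) {
        for (int i = 1; i < n - 1; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double *u = calloc(N, sizeof *u);
        int left  = (rank - 1 + nranks) % nranks;
        int right = (rank + 1) % nranks;
        double halo[2];
        MPI_Request reqs[4];

        /* 1. Start the halo exchange with non-blocking sends/receives ... */
        MPI_Irecv(&halo[0], 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&halo[1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&u[0],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&u[N - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

        /* 2. ... overlap it with computation on data already held locally ... */
        compute_interior(u, N);

        /* 3. ... then wait for the messages and finish the boundary update. */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        u[0]     = 0.5 * (halo[0] + u[1]);
        u[N - 1] = 0.5 * (u[N - 2] + halo[1]);

        free(u);
        MPI_Finalize();
        return 0;
    }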
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Tools for hybrid programming (e.g., OpenCL, HPPM?).
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
Yes, see 4b
5. New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
Predictive models could be developed for mineral-water and nanoparticle-water interfaces.
Access to large numbers of cores (>10,000) for extended periods of time.


