Jeffrey B. Neaton
BES Requirements Worksheet
1.1. Project Information - Charge Transport and Excited States at Interfaces in Nanostructured Materials
Document Prepared By | Jeffrey B. Neaton
Project Title | Charge Transport and Excited States at Interfaces in Nanostructured Materials
Principal Investigator | Jeffrey B. Neaton
Participating Organizations | Lawrence Berkeley National Laboratory
Funding Agencies | DOE SC DOE NSA NSF NOAA NIH Other:
2. Project Summary & Scientific Objectives for the Next 5 Years
Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.
Over the next 3-5 years, we seek to develop and apply first-principles density functional theory (DFT) and many-body perturbation theory to understand the nature of excited states and charge dynamics at interfaces in nanoscale materials.
Nanostructures are often distinguished by their large surface-to-volume ratios, which, upon integration into devices, can lead to a high density of nanoscale interfaces. However, the relationship between the structure and electronic properties of these interfaces and device function is not yet well understood. This is particularly the case for solar energy applications, such as photovoltaic conversion and multielectron catalysis, where knowledge of electronic excited states -- often poorly described by ground-state DFT -- is central to evaluating and understanding the efficacy of newly synthesized nanomaterials.
Mid-range goals include the study of light-matter interactions and charge transport -- key processes fundamental to solar energy conversion -- in the following three contexts: (a) nonequilibrium charge dynamics in molecular junctions, including photo-assisted transport; (b) chemical contributions to surface-enhanced Raman scattering of organic molecules at metal surfaces; and (c) optical absorption and charge separation in covalently joined donor-acceptor organic and inorganic multicomponent nanosystems.
3. Current HPC Usage and Methods
3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)
We work with a variety of DFT and excited-state codes, summarized below. These codes spend the majority of their time on vector products, matrix diagonalization and inversion, and fast Fourier transforms (FFTs); a toy sketch of the central plane-wave kernel follows the list.
VASP: Plane-wave pseudopotential DFT code. 1-300 atoms. Parallelism: MPI (OpenMPI). Number of processors: 100-1,500. Typical run time: 0.3-12 hours.
PWSCF: Plane-wave pseudopotential DFT code. 1-300 atoms. Parallelism: MPI (OpenMPI). Number of processors: 100-1,500. Typical run time: 0.3-12 hours.
SIESTA: Local-orbital pseudopotential DFT code. 1-2,000 atoms. Parallelism: MPI (OpenMPI). Number of processors: 100-1,000. Typical run time: 0.3-12 hours.
BerkeleyGW: Plane-wave pseudopotential excited-state code. 1-100 atoms. Parallelism: MPI (OpenMPI). Number of processors: 500-10,000. Typical run time: 0.3-8 hours.
SCARLET: Local-orbital pseudopotential scattering-state transport code. 100-500 atoms. Parallelism: MPI (OpenMPI). Number of processors: 100-500. Typical run time: 0.3-12 hours.
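To make the FFT-heavy character of these codes concrete, the following minimal, serial NumPy sketch (not taken from any of the codes above; all array names and grid sizes are illustrative) applies a plane-wave Hamiltonian to a wavefunction. The kinetic term is diagonal in reciprocal space and the local potential is diagonal in real space, so each application costs a pair of 3D FFTs -- the kernel that, together with dense linear algebra, dominates our runtime.

# Toy sketch of the plane-wave Hamiltonian application H|psi> = T|psi> + V_loc|psi>.
import numpy as np

def apply_hamiltonian(psi_g, g2, v_loc_r):
    """psi_g: wavefunction coefficients on a 3D reciprocal-space grid.
    g2: |G|^2 on the same grid (kinetic prefactor, atomic units).
    v_loc_r: local potential on the real-space grid."""
    t_psi_g = 0.5 * g2 * psi_g          # kinetic term, diagonal in G-space
    psi_r = np.fft.ifftn(psi_g)         # transform to real space
    v_psi_r = v_loc_r * psi_r           # local potential, diagonal in real space
    v_psi_g = np.fft.fftn(v_psi_r)      # transform back to reciprocal space
    return t_psi_g + v_psi_g

# Minimal usage on a 32^3 grid with a random potential and wavefunction.
n = 32
g = np.fft.fftfreq(n) * 2.0 * np.pi * n            # toy reciprocal-space sampling
gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
g2 = gx**2 + gy**2 + gz**2
rng = np.random.default_rng(0)
psi_g = rng.standard_normal((n, n, n)) + 1j * rng.standard_normal((n, n, n))
v_loc_r = rng.standard_normal((n, n, n))
h_psi_g = apply_hamiltonian(psi_g, g2, v_loc_r)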
3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?
In terms of our codes, we are ultimately bound by read/write I/O and by large distributed matrix multiplications, inversions, and FFTs.
There are two NERSC-specific issues that can result in a large time-to-solution or present barriers to attempting large calculations.
First, there are long queue times for jobs that use fewer than a few thousand cores and that request more than 30 minutes in total. The bulk of our calculations fall in an intermediate range, requiring roughly 1,000 processors for a few hours of wallclock time. A quick turnaround for tens of simultaneous jobs of this size would greatly enhance our productivity.
Second, for our largest simulations, the 0.5 TB limit on scratch space presents a barrier. We would need more scratch space to run tens of simultaneous jobs. Relatedly, regarding medium-term storage, we have also found retrieval of large files from HPSS inconvenient (timeouts, inability to predict file transfer times, etc.).
3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.
Facilities Used or Using | NERSC OLCF ALCF NSF Centers Other:
Architectures Used | Cray XT IBM Power BlueGene Linux Cluster Other:
Total Computational Hours Used per Year | 5,000,000 Core-Hours
NERSC Hours Used in 2009 | 2,000,000 Core-Hours
Number of Cores Used in Typical Production Run | 1,000
Wallclock Hours of Single Typical Production Run | 6
Total Memory Used per Run | 200 GB
Minimum Memory Required per Core | 1 GB
Total Data Read & Written per Run | 20 GB
Size of Checkpoint File(s) | No checkpointing
Amount of Data Moved In/Out of NERSC | 15 GB per month
On-Line File Storage Required (For I/O from a Running Job) | 0.2 GB
Off-Line Archival Storage Required | 0.1 GB
3d. Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.
4. HPC Requirements in 5 Years
4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.
Computational Hours Required per Year | 15,000,000
Anticipated Number of Cores to be Used in a Typical Production Run | 5,000
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above | 6 hours
Anticipated Total Memory Used per Run | 1,000 GB
Anticipated Minimum Memory Required per Core | 2 GB
Anticipated Total Data Read & Written per Run | 200 GB
Anticipated Size of Checkpoint File(s) | GB
Anticipated On-Line File Storage Required (For I/O from a Running Job) | 10 GB
Anticipated Amount of Data Moved In/Out of NERSC | 100 GB per month
Anticipated Off-Line Archival Storage Required | 10 GB
4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?
Algorithm development, as well as the development of new and efficient approximations, will be extremely important to achieving our goals.
A significant bottleneck for our plane-wave pseudopotential excited-state calculations is a required sum over a large number of unoccupied states. The number of unoccupied states required grows linearly with the volume of the system supercell. For large systems, generation of this unoccupied subspace, which relies on iterative diagonalization techniques (usually conjugate gradient), will become prohibitive. Alternative approaches that avoid such sums are now being proposed, but the community is only in the early stages of evaluating their efficiency.
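To illustrate the structure of this bottleneck, the toy Python sketch below (our own illustration, not BerkeleyGW code: a random Hermitian matrix stands in for a real Hamiltonian, and a made-up diagonal operator stands in for the transition matrix elements) evaluates a second-order sum-over-states expression. Both the cost and the convergence of the result depend directly on how many unoccupied states are generated and retained in the sum.

# Toy sum-over-states calculation showing the dependence on the number of empty states.
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_occ = 200, 20
h = rng.standard_normal((n_basis, n_basis))
h = 0.5 * (h + h.T)                              # Hermitian toy "Hamiltonian"
eps, states = np.linalg.eigh(h)                  # toy "bands" and "wavefunctions"
d = rng.standard_normal(n_basis)                 # toy diagonal transition operator

def sum_over_states(n_empty):
    """Second-order-perturbation-like sum over the lowest n_empty empty states."""
    total = 0.0
    for v in range(n_occ):                       # occupied states
        for c in range(n_occ, n_occ + n_empty):  # unoccupied states
            m = states[:, v] @ (d * states[:, c])   # toy matrix element <v|d|c>
            total += m**2 / (eps[v] - eps[c])
    return total

# Convergence (and cost) with the number of empty states kept in the sum:
for n_empty in (20, 60, 120, 180):
    print(n_empty, sum_over_states(n_empty))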
4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 μs).
We will require at least 2 GB of memory per core. High-bandwidth communication will also continue to be important, as the matrix algebra and FFTs underlying our codes are distributed.
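As a rough illustration of why interconnect bandwidth matters, the sketch below (a hypothetical mpi4py/NumPy example, not our production Fortran codes) performs a block-distributed matrix multiply in which every step passes a block of one matrix around a ring of ranks; the data volume moved per step ties performance to the network as much as to the floating-point rate.

# Minimal sketch of the communication pattern behind distributed dense linear algebra.
# Assumes mpi4py and NumPy; run with e.g.: mpirun -n 4 python this_file.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

n = 512                                    # global matrix dimension (divisible by nproc)
nb = n // nproc                            # rows per rank
rng = np.random.default_rng(rank)
a_block = rng.standard_normal((nb, n))     # this rank's rows of A
b_block = rng.standard_normal((nb, n))     # this rank's rows of B
c_block = np.zeros((nb, n))                # this rank's rows of C = A @ B

owner = rank                               # which row block of B we currently hold
for _ in range(nproc):
    cols = slice(owner * nb, (owner + 1) * nb)
    c_block += a_block[:, cols] @ b_block  # local compute on the block we hold
    # pass our B block to the next rank, receive one from the previous rank
    b_block = comm.sendrecv(b_block, dest=(rank + 1) % nproc,
                            source=(rank - 1) % nproc)
    owner = (owner - 1) % nproc

print(f"rank {rank}: local block norm {np.linalg.norm(c_block):.3f}")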
4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.
Down the road, OS-based checkpointing would be helpful.
4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.
We have not yet explored alternatives to MPI. We would be open to doing so if provided access to the appropriate resources.
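One direction we could imagine exploring -- sketched below purely for illustration, not something we currently use -- is a hybrid model that keeps MPI between nodes while delegating on-node parallelism to threaded math libraries (e.g., OpenMP-based BLAS/FFT). The hypothetical mpi4py/NumPy example stands in for that pattern: one MPI rank per node, with the node's cores used by the threaded library.

# Illustrative hybrid sketch: MPI across ranks, threaded BLAS within a rank.
# Assumes mpi4py and a NumPy linked against a threaded BLAS (thread count set
# via OMP_NUM_THREADS in the job script, not in this code).
import os
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Node-level parallelism: this dense multiply is threaded by the BLAS library.
n = 1024
a = np.random.default_rng(rank).standard_normal((n, n))
local = a @ a.T

# Inter-node parallelism: a single MPI reduction combines per-rank results.
global_trace = comm.allreduce(np.trace(local), op=MPI.SUM)
if rank == 0:
    print("threads per rank:", os.environ.get("OMP_NUM_THREADS", "library default"))
    print("sum of local traces:", global_trace)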
5. New Science With New Resources
To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?
Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).
Breakthroughs in our fundamental understanding of matter at the nanoscale through computation have good potential to contribute to society-level problems. For example, a fundamental global challenge is to develop a scalable technology, with nontoxic, abundant components, for efficiently and inexpensively harvesting solar photon energy and then converting it into convenient forms for storage and transportation. Nanostructure-based organic and inorganic solar cells are now being explored for this purpose. Fundamentally different from more expensive silicon solar cells, these devices will inevitably rely on a high density of nanoscale interfaces to separate and transport electrons and holes. Despite their promise, there exists little microscopic intuition or theory to guide material and device design for such systems. A primary reason is the absence of a quantitative picture of the excited-state electronic structure underlying key processes in solar energy conversion -- absorption, charge separation, charge transport, and charge collection. Fundamental research in this area -- illustrating how interfacial electronic structure is connected to function -- is needed.
In the next few years, we seek to develop and apply first-principles density functional theory (DFT) and many-body perturbation theory to understand the nature of excited states and charge dynamics at the interfaces that pervade these nanoscale materials. These systems are challenging, both intuitively and computationally: they are traditionally difficult to probe experimentally, and they are often of a size just beyond the reach of conventional quantum chemistry approaches and solid-state DFT techniques. For validation, our work evolves in the context of existing collaborations between the PI and experimental groups -- at LBNL and elsewhere -- on systems ranging from the idealized to the more complex.
A 50x increase in allocated CPU time and storage, rapidly and easily accessible through many several-thousand-processor jobs of moderate length (3-6 hours), would allow us to assess much more rapidly and effectively the level of theory required to obtain good agreement between theory and experiment -- and to understand interfacial electronic structure -- for larger systems and at higher levels of theory, including those most directly relevant to contemporary experiments in solar energy conversion. To reiterate, higher throughput for a larger number of medium-sized jobs would be most desirable.


