NERSC: Powering Scientific Discovery Since 1974

Lin-Wang Wang

BES Requirements Worksheet

1.1. Project Information - Electronic structures of nanosystems

Document Prepared By

Lin-Wang Wang

Project Title

Electronic structures of nanosystems

Principal Investigator

Lin-Wang Wang

Participating Organizations

Lawrence Berkeley National Laboratory

Funding Agencies

 DOE SC  DOE NSA  NSF  NOAA  NIH  Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

The overall goal of this project is to understand the electronic structures and optical properties of nanosystems in order to support their use in energy applications. We would like to understand the band alignment in heterojunction nanostructures, the carrier dynamics and electron-phonon interaction in such systems, exciton formation and binding energy, nanocontacts and interfaces, the internal electric field in nanosystems, the overall band diagram of a nanodevice, surface atomic structures and electronic states, and defect states.  
 
In terms of computation, we use many different codes: VASP, PEtot, LS3DF, Escan, the charge patching method, and PW-transport codes. They have different requirements for the number of processors and run time.  
 
Two examples of what we would like to understand in the next 5 years: (1) surface structures and states, and their impact on internal carrier dynamics and trapping; (2) electron-phonon interaction and its consequences for carrier dynamics, and exciton transport and dissociation.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

VASP: Planewave pseudopotential code. Number of processors: 200-1,000.  
Typical run time: 0.3 to 8 hours.  
 
PEtot: Planewave pseudopotential code. Number of processors: 200-1,000.  
Typical run time: 0.3 to 8 hours.  
 
Escan: Planewave non-self-consistent code. Number of processors: ~500.  
Typical run time: 0.3 to 8 hours.  
 
LS3DF: Divide-and-Conquer code. Number of processors: 1,000 to 50,000.  
Typical run time: 1 to 8 hours.  
 
Other codes. Number of processors: 10-200.  
Typical run time: 10 minutes to 2 hours.  

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Queue waiting time for small jobs.  
Inability to run jobs that use many processors but need only a short run time.  
 
Very often, we need to run relatively small jobs for relatively short times, but we need very quick turnaround because we run them very often, and what we run next depends on the results we get. We would like to treat the main computer as an interactive tool. Not all of our jobs need a large number of processors, but we do not want to migrate our jobs to smaller clusters. We would like to use the same platform as for our large jobs.  

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using

 NERSC  OLCF  ALCF  NSF Centers  Other:  

Architectures Used

 Cray XT  IBM Power  BlueGene  Linux Cluster  Other:  

Total Computational Hours Used per Year

 2 million core-hours

NERSC Hours Used in 2009

 1 million core-hours

Number of Cores Used in Typical Production Run

 8 to 2,000

Wallclock Hours of Single Typical Production Run

 8 minutes to 8 hours

Total Memory Used per Run

 1 to 100 GB

Minimum Memory Required per Core

 1 GB

Total Data Read & Written per Run

 from 0 to 30 GB

Size of Checkpoint File(s)

 Not really doing this 

Amount of Data Moved In/Out of NERSC

 5 GB/month

On-Line File Storage Required (For I/O from a Running Job)

 0.5 GB and 2,000 Files

Off-Line Archival Storage Required

 2 GB and 1,000 Files

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

One comment: questionnaires like this often ask for the typical size, time, etc. But for materials science simulations, there is no typical size or time, and no typical answer. The size, time, etc. vary a lot; there is a broad distribution, typically from 0 (size, time) to the maximum we can afford to run (or for which the code still works). This does not mean the largest run is the most important one. Sometimes we have many small runs and interactive runs. They are as important as the large ones. Providing a better environment to accommodate these small jobs is as important as taking care of the bigger jobs.  

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year

 5-10 M

Anticipated Number of Cores to be Used in a Typical Production Run

 1 to 10,000

Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above

 0.5 to 10 hours

Anticipated Total Memory Used per Run

 100 GB

Anticipated Minimum Memory Required per Core

 2 GB

Anticipated total data read & written per run

 20 GB

Anticipated size of checkpoint file(s)

 None

Anticipated On-Line File Storage Required (For I/O from a Running Job)

 4 GB and 10,000 Files

Anticipated Amount of Data Moved In/Out of NERSC

 20 GB per month

Anticipated Off-Line Archival Storage Required

 10 GB and 10,000 Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years.

Mostly physics-based algorithm changes and new methodology development.  
We will also test communication schemes other than MPI.

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 μs).

Please keep 2 GB memory/core.

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

 

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

Yes, we would like to test GPUs. But that is a major shift in parallelization. Without additional resources, it is difficult for us to do that by ourselves.  

New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).

That might allow us to seriously study nanocrystal surface passivation. However, I think an increase in HPC resources alone will not be enough. There needs to be corresponding code and algorithm development, because the old codes and algorithms will not work no matter how much HPC capacity increases. As a matter of fact, current DFT codes already cannot take full advantage of the available HPC resources.  
 
It might also allow us to study carrier dynamics via time-domain simulation (for both electron transport following the Schrödinger equation and nuclear dynamics).