NERSC: Powering Scientific Discovery Since 1974

Teresa Head-Gordon

Case Study Worksheet

Project Information - Advanced Theoretical Models to Characterize the Alzheimers Abeta Peptide

Document Prepared By: Teresa Head-Gordon
Project Title: Advanced Theoretical Models to Characterize the Alzheimers Abeta Peptide
Principal Investigator: Teresa Head-Gordon
Participating Organizations: LBNL, UC-Berkeley
Science Category: Climate Environmental Science Biological Sciences
Funding Agencies: DOE SC DOE NSA NSF NOAA NIH Other:

Project Summary (Scientific Objectives)

Please give a brief description of your project and its scientific objectives for the next 3-5 years.

Alzheimer's is a neurodegenerative disease linked to the aggregation and amyloid fibril formation of a set of short ~40 residue peptides, amyloid beta, which are known to be highly prone to fibrilization in vitro and in vivo. Although early attention focused on the toxicity of the amyloid fibrils as the cause of disease, it is now hypothesized that oligomers (on the order of ~6 peptides) formed during early aggregation are actually the major toxic species. Thus there is a need to develop an understanding of the entire aggregation process that ultimately leads to the specific structure of the final amyloid fibril, starting with the monomer through to these oligomer structures. 
 
Given the possible toxicity of the earlier protofibril states, the focus is now to understand which structural aspects of the ordered fibril are prevalent in the monomer, and ultimately how the Abeta monomers assemble in the early phases, as proposed by photo-induced cross-linking experiments, and then into the highly ordered mesoscopic fibril suggested by solid-state NMR experimental models. Our preliminary results using a coarse-grained model have allowed us to explore many interesting aspects of the mesoscopic protofibril and its critical nucleus. However, the coarse-grained model is not adequate for addressing some molecular questions posed by experiments. We will pursue a first phase of study to determine which sequence attributes and specific molecular interactions stabilize structure in the monomer in aqueous solution.
 
We propose to use molecular dynamics simulations, combined with accelerated convergence algorithms and the most recent generations of polarizable protein and water force fields, to characterize the structural ensembles and thermodynamics of the amyloid beta monomer. We wish to understand whether structure in the monomeric peptide is well-defined enough to promote ordered, stable oligomers. The proposed work will contribute to our knowledge of the primary sequence and structural factors that ultimately govern the aggregation process in amyloid beta, and should eventually make it possible to develop new protein engineering strategies for reducing aggregation and therefore disease virulence. Our findings are also expected to impact research in biotechnology, where protein aggregation is a bottleneck in the manufacture of pharmaceutical proteins, and in materials science, where amyloid fibrils are being investigated for use as possible nanomaterials.

Current HPC Usage and Methods

Facilities Used
  • NERSC
NCCS ALCF NSF Centers Other:
Architectures Used
  • Cray XT
  • IBM Power
BlueGene
  • Linux Cluster
Other:
Total Computational Hours Used per Year: Core-Hours
NERSC Hours Used per Year: 1.05M Core-Hours
Number of Cores Used in Typical Production Run: 1,664
Wallclock Hours of Single Typical Production Run: 8
Total Memory Used per Run: GB
Minimum Memory Required per Core: GB
Total Data Read & Written per Run: GB
Size of Checkpoint File(s): GB
Amount of Data Moved In/Out of NERSC: GB
How Often:
On-Line File Storage Required (Directly Accessible from a Running Job): 0.5 GB Files
Off-Line Archival Storage Required: GB Files
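
As a rough cross-check of the usage figures above, the reported core count and wallclock time give the cost of one typical production run, and dividing the yearly NERSC hours by that cost gives an upper bound on the number of such runs per year. This is only a sketch: it assumes every run uses exactly the reported typical values and that the whole allocation goes to production runs, neither of which the worksheet states. The variable names are illustrative.

```python
# Values taken from the worksheet entries above.
cores_per_run = 1664                # cores in a typical production run
wallclock_hours_per_run = 8         # wallclock hours of a typical production run
nersc_core_hours_per_year = 1.05e6  # NERSC core-hours used per year

# Cost of a single typical run, in core-hours.
core_hours_per_run = cores_per_run * wallclock_hours_per_run

# Upper bound on typical runs per year (assumption: the entire
# allocation is spent on runs of exactly this size).
runs_per_year = nersc_core_hours_per_year / core_hours_per_run

print(f"core-hours per run: {core_hours_per_run}")
print(f"typical runs per year (upper bound): {runs_per_year:.1f}")
```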

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

LAPACK, ScaLAPACK, FFTW

Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

AMBER 10 
 
The underlying molecular dynamics engine is a particle-based algorithm; the code is largely written in Fortran 77/90 and uses MPI in most basic applications. An optimized suite of code exploits particle mesh Ewald to give O(N log N) scaling of energy and force evaluations and is parallelized with MPI to efficiently exploit distributed-memory architectures. Overlaid on this fine-grained parallelization is another layer of (trivial) coarse-grained parallelization involving the replica exchange sampling algorithm, which runs N independent simulations (each at a different temperature) that involve infrequent communication (on the order of milliseconds) to swap state point information (positions and velocities of all atoms). An earlier version of the code platform, AMBER9.0, is currently available on Franklin at NERSC-LBNL, and thus it is already established that this project will make effective use of the supercomputing facility requested. We have compiled the most recent version, AMBER10.0. NERSC also maintains a page that shows an example of using job steps to accomplish long simulations in smaller blocks on the NERSC queues, and job restarting is well facilitated in AMBER10.0.
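
The swap step of temperature replica exchange described above can be sketched as follows. This is a minimal illustration of the standard Metropolis exchange criterion, not AMBER's implementation: the function names, the Boltzmann constant units (kcal/(mol*K)), and the alternating-neighbour pairing controlled by `offset` are assumptions made for the example. In a real run each replica would be an independent MPI-parallel simulation, and only the swap decision would require communication.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K) (assumed unit system)

def swap_probability(temp_i, temp_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    energy_i is the potential energy of the configuration currently
    at temperature temp_i, and likewise for j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def attempt_swaps(temps, energies, rng, offset=0):
    """Attempt swaps between neighbouring temperatures.

    energies[k] is the energy of the configuration at temps[k].
    Returns a list `order` where order[k] is the index of the
    configuration that sits at temps[k] after the swap attempts.
    """
    order = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(temps[k], temps[k + 1], energies[a], energies[b])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order

# Usage with made-up temperatures (K) and energies (kcal/mol).
temps = [300.0, 310.0, 320.0, 330.0]
energies = [-105.0, -103.5, -101.0, -100.2]
print(attempt_swaps(temps, energies, random.Random(0)))
```

Alternating `offset` between 0 and 1 on successive exchange attempts lets every neighbouring pair be tried over time while keeping the pairs within one attempt independent of each other.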
 
Molecular dynamics simulations numerically integrate Newton's equations of motion with very short (~1 fs) timesteps in order to evolve a molecular system of interest in time.
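
To make the time-stepping idea concrete, here is a minimal sketch of one common integration scheme, velocity Verlet, applied to a single particle in a one-dimensional harmonic well. The scheme, the reduced units, and all function names are assumptions chosen for illustration; the worksheet does not say which integrator the production code uses.

```python
import math

def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Advance one particle in 1D by n_steps of size dt; return (x, v)."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

def harmonic_force(x, k=1.0):
    """Force of a harmonic spring with constant k (toy potential)."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Usage in reduced units: start at x = 1 at rest and integrate to t = 10.
x, v = velocity_verlet(1.0, 0.0, 1.0, harmonic_force, dt=0.01, n_steps=1000)
print(x, math.cos(10.0))       # numerical vs. analytical position
print(total_energy(x, v))      # should stay close to the initial 0.5
```

The timestep has to be small compared with the fastest motion in the system, which is why atomistic simulations with fast bond vibrations use steps on the order of a femtosecond.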
 
In addition to the standard double-precision arithmetic needed to integrate the equations of motion, the calculation of the long-range electrostatic interactions is achieved via the Particle Mesh Ewald algorithm. This O(N log N) algorithm uses an FFT of atomic partial charges interpolated to grid points to determine inverse-space Coulomb energies and forces.
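
For reference, the Ewald decomposition that Particle Mesh Ewald builds on can be written as below. This is the standard textbook form in Gaussian units for a charge-neutral periodic box; the symbols (splitting parameter \alpha, box volume V, lattice vectors \mathbf{n}, reciprocal vectors \mathbf{k}, structure factor S) are notation introduced here and do not appear in the worksheet.

```latex
\begin{aligned}
E_{\text{elec}} &= E_{\text{dir}} + E_{\text{rec}} + E_{\text{self}}, \\
E_{\text{dir}} &= \frac{1}{2} \sum_{i,j,\mathbf{n}}^{\prime}
  q_i q_j \,
  \frac{\operatorname{erfc}\!\left(\alpha \,|\mathbf{r}_{ij} + \mathbf{n}|\right)}
       {|\mathbf{r}_{ij} + \mathbf{n}|}, \\
E_{\text{rec}} &= \frac{2\pi}{V} \sum_{\mathbf{k} \neq 0}
  \frac{e^{-k^{2}/4\alpha^{2}}}{k^{2}} \,
  \left| S(\mathbf{k}) \right|^{2},
  \qquad
  S(\mathbf{k}) = \sum_{j} q_j \, e^{\, i \mathbf{k} \cdot \mathbf{r}_j}, \\
E_{\text{self}} &= -\frac{\alpha}{\sqrt{\pi}} \sum_{j} q_j^{2}.
\end{aligned}
```

The prime on the direct sum excludes the i = j term when \mathbf{n} = 0. The direct-space term decays quickly and is truncated at a cutoff. The particle-mesh step approximates S(\mathbf{k}) by spreading the charges onto a regular grid and taking a discrete Fourier transform with an FFT, so the reciprocal-space cost grows as O(N log N) when the number of grid points is kept proportional to the number of atoms.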

Please list the known limitations, obstacles, or bottlenecks of resources on currently available HPC systems, in particular those at NERSC.

HPC Usage and Methods for the Next 3-5 Years

Anticipated changes to codes, mathematical methods and/or algorithms needed to achieve this project's scientific objectives.

Computational Hours Required per Year
Anticipated Number of Cores to be Used in a Typical Production Run
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above
Anticipated Total Memory Used per Run GB
Anticipated Minimum Memory Required per Core GB
Anticipated total data read & written per run GB
Anticipated size of checkpoint file(s) GB
Anticipated On-Line File Storage Required (Directly Accessible from a Running Job) GB Files
Anticipated Off-Line Archival Storage Required GB Files

Known or Anticipated architectural requirements (e.g., 2 GB memory/core).

Please list any additional required or important software, services, or infrastructure beyond those listed in the previous section.

It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

What Do You Need from NERSC?

Please tell us what you need from NERSC to meet your project's computing needs over the next 3-5 years. Also please feel free to make any general comments.