NERSC Initiative for Scientific Exploration (NISE) 2010 Awards

Computational Prediction of Protein-Protein Interactions

Harley McAdams, Stanford University

Associated NERSC Project: Computational Prediction of Transcription Factor Binding Sites (m926), Principal Investigator: Harley McAdams

NISE Award: 1,880,000 Hours
Award Date: February 2010

Interactions between proteins underlie much of the biology of the cell. For example, mammalian brains rely on protein-protein interactions to transmit signals, and many drugs act against harmful bacteria by interfering with the function of their proteins. Experimentally determining whether two proteins interact is often labor-intensive and expensive, which makes computational methods that screen for proteins likely to interact highly desirable.

My group is developing novel algorithms to predict interactions between proteins and their cognate DNA binding sites. The computational work we propose here would allow us to extend our algorithm to predict protein-protein interactions.

We predict protein-DNA interactions using a physics-based calculation of the electrostatics of binding pockets between proteins (specifically, DNA-binding regulatory proteins) and DNA. We use a machine learning procedure to predict the structural hotspots within the binding pocket. We now propose to use a similar technique to infer the affinity of interaction between two protein surfaces, for example FtsZ and FtsA, two proteins that are part of the bacterial cell division machinery. This problem differs from the case of protein-DNA interactions, where the region of interaction is known from experiment. In the more general case of protein-protein interactions, any exposed patch on the protein surface is a potential contact region for interacting with other proteins.

To address this problem, we will generalize our protein-DNA interaction algorithm with a new statistical module for predicting regions within proteins that are likely candidates for protein-protein interactions. Since interacting proteins depend on the mutual compatibility of their surface patches to mediate the interaction, such patches are evolutionarily co-conserved. To detect these co-conserved patches, we will search for overrepresented fragments in protein structures, identifying the surface patches that are the likely candidates for protein-protein interactions. Once conserved surface patches have been identified, we will be able to apply a procedure similar to the one we now use for protein-DNA interactions, except that many more possible interactions mediated by the surface patches must be considered, which raises the computational requirement substantially. In the protein-DNA case, only a single surface patch on the DNA and on the protein is considered, while in the more general protein-protein case, multiple patches on each interacting partner must be considered. This results in a combinatorial increase in computation: the number of candidate patch pairings grows with the product of the patch counts on the two partners. Because each pairing can be evaluated independently of the others, the workload is particularly well suited to parallelization.
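The growth in work, and why it parallelizes well, can be illustrated with a minimal Python sketch. This is not the project's algorithm: the patch labels, the net-charge values, and the `toy_score` function are hypothetical placeholders invented for illustration. The sketch only shows that every patch on one protein is paired with every patch on the other, and that each pair is scored independently of the rest.

```python
from itertools import product


def patch_pairs(patches_a, patches_b):
    """Enumerate every (patch on A, patch on B) combination.

    The result has len(patches_a) * len(patches_b) entries, which is the
    combinatorial growth described in the text.
    """
    return list(product(patches_a, patches_b))


def toy_score(pair):
    """Hypothetical stand-in for a real binding-affinity calculation.

    Each patch is a (label, net_charge) tuple; the score is the product of
    the two net charges, so oppositely charged patches score lowest (best).
    """
    (_, charge_a), (_, charge_b) = pair
    return charge_a * charge_b


def rank_pairs(patches_a, patches_b, score=toy_score, map_fn=map):
    """Score all patch pairs and return (score, pair) tuples, best first.

    Each pair is scored independently, so `map_fn` can be replaced by any
    parallel map with the same calling convention.
    """
    pairs = patch_pairs(patches_a, patches_b)
    scores = list(map_fn(score, pairs))
    return sorted(zip(scores, pairs))


# Made-up example patches: (label, net charge). Not real data.
protein_a = [("P1", 2.0), ("P2", -1.0), ("P3", 0.5)]
protein_b = [("Q1", -3.0), ("Q2", 1.0)]

ranked = rank_pairs(protein_a, protein_b)
for pair_score, (patch_a, patch_b) in ranked:
    print(patch_a[0], patch_b[0], pair_score)
```

Because `rank_pairs` applies the scoring function through `map_fn`, the built-in `map` could be swapped for the `map` method of a `concurrent.futures` executor, or for a scatter across compute nodes, without changing the rest of the code. With several patches per protein and many candidate protein pairs, the number of independent scoring tasks multiplies quickly, which is what makes a large parallel allocation useful.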