Science Highlights: Biological and Environmental Research
Recognition and Classification of Protein Folds in Complete Genomes
PCM uses the predicted secondary structure and the local correlation of the hydrophobicity of amino acid residues to align two protein sequences. Global alignment is performed by dynamic programming (the Needleman-Wunsch algorithm) with no penalties for terminal gaps. Statistical significance is estimated with a Z score. In contrast to a classic hard cutoff, a heuristic cutoff is introduced and applied to predict protein folds reliably. The single-linkage clustering algorithm was optimized to group proteins with the same fold. For fold recognition, each gene product of a complete genome is compared with a library of known folds (a non-redundant set of known protein structures). For fold classification, the sequences of gene products in a complete genome are compared in an all-against-all manner and then clustered by the single-linkage algorithm.
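The alignment and significance steps described above can be illustrated with a minimal Python sketch. It is not the PCM implementation: the per-residue score (`1 - |x - y|` on a single numeric property value), the gap penalty, and the use of shuffled sequences as the background for the Z score are all assumptions made for illustration, and the function names `align_score` and `z_score` are hypothetical.

```python
import random
import statistics


def align_score(a, b, gap=-1.0):
    """Needleman-Wunsch alignment score with free terminal gaps.

    a, b: lists of numbers, one (hypothetical) property value per residue.
    The match score 1 - |x - y| and the gap penalty are assumptions.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero, so leading gaps cost nothing.
    prev = [0.0] * (m + 1)
    best_last_col = prev[m]  # best score seen so far in the last column
    for i in range(1, n + 1):
        curr = [0.0] * (m + 1)
        for j in range(1, m + 1):
            match = prev[j - 1] + 1.0 - abs(a[i - 1] - b[j - 1])
            curr[j] = max(match, prev[j] + gap, curr[j - 1] + gap)
        best_last_col = max(best_last_col, curr[m])
        prev = curr
    # Trailing gaps are free: take the best cell in the last row or column.
    return max(best_last_col, max(prev))


def z_score(a, b, n_shuffles=100, seed=0):
    """Z score of the real alignment against shuffled versions of b.

    Shuffling as the background model is an assumption of this sketch.
    """
    rng = random.Random(seed)
    real = align_score(a, b)
    shuffled = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        background.append(align_score(a, shuffled))
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    return (real - mu) / sigma if sigma > 0 else 0.0
```

Leaving the first row and column at zero and taking the maximum over the last row and column is one common way to make terminal gaps free while still aligning the sequences end to end in between.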
A pilot version of PCM was developed and tested on a data set of 64 proteins and about 400 structural homologues. The optimal physical property of amino acid residues, a method for secondary structure prediction, and the alignment and scoring schemes were determined during test experiments. On the test protein set, the method performed well in fold recognition compared with advanced sequence- and structure-based fold recognition methods. It was applied to trial fold recognition in the Methanococcus jannaschii genome and predicted the folds of several hypothetical proteins. Later, the method was applied to protein fold classification, that is, clustering proteins into groups with similar three-dimensional folds. An appropriate clustering scheme was developed, and for the test data set it recovered most of the known well-populated folds as distinct clusters.
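The clustering step can be sketched in a few lines of Python. Single-linkage clustering at a fixed threshold is equivalent to finding connected components of the graph whose edges are the pairs scoring at or above the cutoff. The sketch below assumes pairwise scores (for example, Z scores) are already available; it uses a plain fixed cutoff rather than the heuristic cutoff mentioned in the text, whose details are not given here, and the function name `single_linkage` is hypothetical.

```python
def single_linkage(n_items, scores, cutoff):
    """Group items 0..n_items-1 by single linkage at a fixed cutoff.

    scores: dict mapping (i, j) index pairs to a similarity score.
    Two items share a cluster if a chain of pairs with score >= cutoff
    connects them. The fixed cutoff is an assumption of this sketch.
    """
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in scores.items():
        if s >= cutoff:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i in range(n_items):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Illustrative pairwise scores for five hypothetical proteins.
example_scores = {(0, 1): 8.0, (1, 2): 6.5, (2, 3): 1.0, (3, 4): 7.2, (0, 4): 0.5}
example_clusters = single_linkage(5, example_scores, cutoff=5.0)
```

In this example, proteins 0 and 2 end up in the same cluster even though they were never compared directly, because both are linked to protein 1. That chaining behavior is the defining property of single linkage.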
The explosion in the number of genome sequences during the past several years has made functional characterization of gene products an overwhelming task. Classical sequence similarity methods predict the functions of proteins that have high sequence similarity to proteins with known functions. Threading, or aligning a sequence to a three-dimensional structure, can detect the structural similarity of proteins with low sequence identity, but it requires knowledge of structure and is limited by the number of known structures. We developed a new method, PCM, for detecting similarity of protein folds on the basis of protein sequences and sequence-based properties only. It does not require known structures for fold recognition and extends the sequence similarity boundary of sequence comparison methods. Therefore, it may predict the structure and function of many hypothetical proteins by exploring their structural relationships. In fold classification, PCM helps to identify new targets for experimental determination of new folds by detecting fold similarity within groups of hypothetical proteins.
Igor Grigoriev and Sung-Hou Kim, "Protein fold classification based on local correlation of hydrophobicity," Proc. Pacific Symposium on Biocomputing 2000 (submitted).
Chao Zhang and Sung-Hou Kim, "The anatomy of protein beta-topology: I. Single beta-sheet. II. Beta-barrels and beta-sandwiches," J. Mol. Biol. (submitted).