Prediction of potential GPI-modification sites in proprotein sequences

B Eisenhaber; P Bork; F Eisenhaber

doi:10.1006/jmbi.1999.3069

J Mol Biol. 1999 Sep 24;292(3):741-58. doi: 10.1006/jmbi.1999.3069.

Authors

B Eisenhaber¹, P Bork, F Eisenhaber

Affiliation

¹ European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, D-69012, Federal Republic of Germany.

PMID: 10497036

Abstract

Glycosylphosphatidylinositol (GPI) lipid anchoring is a common posttranslational modification known mainly from extracellular eukaryotic proteins. Attachment of the GPI moiety to the carboxyl terminus (omega-site) of the polypeptide follows proteolytic cleavage of a C-terminal propeptide. For the first time, a new prediction technique locating potential GPI-modification sites in precursor sequences has been applied to large-scale protein sequence database searches. The composite prediction function (with separate parametrisation for metazoan and protozoan proteins) consists of terms evaluating both amino acid type preferences at sequence positions near a supposed omega-site and the concordance with general physical properties encoded in multi-residue correlations within the motif sequence. The latter terms are especially successful in rejecting inappropriate sequences from consideration. The algorithm has been validated with a self-consistency test and two jack-knife tests on the learning set of fully annotated sequences from the SWISS-PROT database, as well as with a newly created database "big-Pi" (more than 300 GPI-motif mutations extracted from original literature sources). The accuracy of predicting the effect of mutations in the GPI sequence motif was above 83%. Lists of potential precursor proteins which are non-annotated in SWISS-PROT and SPTrEMBL are presented on the WWW page http://www.embl-heidelberg.de/beisenha/gpi/gpi_prediction.html. The algorithm has been implemented in the prototype software "big-Pi predictor", which may find application as a genome annotation and target selection tool.
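The composite function described above combines a term for residue preferences at the candidate omega-site with terms for physical properties of the surrounding motif. The following is only a minimal sketch of that general idea, not the published big-Pi scoring function: the residue weights, the spacer length, the hydrophobic-tail criterion, and all function names are hypothetical and chosen purely for illustration.

```python
# Toy composite score for a candidate omega-site.
# All weights and thresholds below are invented for illustration;
# they are NOT the big-Pi parameters.
OMEGA_PREFERENCE = {"S": 1.0, "N": 1.0, "G": 0.8, "A": 0.8, "D": 0.6, "C": 0.6}
HYDROPHOBIC = set("AVLIMFWC")


def profile_term(seq, omega):
    """Residue-type preference at the candidate omega-site (assumed weights)."""
    return OMEGA_PREFERENCE.get(seq[omega], -2.0)


def physical_term(seq, omega, spacer=8, min_tail=10, min_hydrophobic=0.6):
    """Penalty (<= 0) when the region after an assumed spacer is short or not hydrophobic."""
    tail = seq[omega + 1 + spacer:]
    if len(tail) < min_tail:
        return -5.0  # tail too short to act as a hydrophobic C-terminal segment
    fraction = sum(r in HYDROPHOBIC for r in tail) / len(tail)
    if fraction >= min_hydrophobic:
        return 0.0
    return -5.0 * (min_hydrophobic - fraction)


def best_omega_site(seq, window=40):
    """Scan the last `window` residues and return (index, score) of the best candidate."""
    start = max(0, len(seq) - window)
    scored = [
        (profile_term(seq, i) + physical_term(seq, i), i)
        for i in range(start, len(seq))
    ]
    score, index = max(scored)
    return index, score


if __name__ == "__main__":
    # Synthetic precursor: Ser at index 31, 8-residue spacer, 15-residue Leu tail.
    toy = "M" + "K" * 30 + "S" + "GGTPEDKR" + "L" * 15
    print(best_omega_site(toy))
```

In this sketch the physical-property term can only lower a score, which mirrors the abstract's remark that such terms mainly serve to reject unsuitable sequences; a real predictor would fit both kinds of terms to a learning set.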

MeSH terms

Algorithms
Carrier Proteins / chemistry
Databases as Topic
Folate Receptors, GPI-Anchored
Glycosylphosphatidylinositols / chemistry*
Humans
Lipids / chemistry
Mutation
Protein Isoforms
Protein Precursors / chemistry*
Protein Processing, Post-Translational
Protozoan Proteins / chemistry
Receptors, Cell Surface*
Sequence Analysis

Substances

Carrier Proteins
Folate Receptors, GPI-Anchored
Glycosylphosphatidylinositols
Lipids
Protein Isoforms
Protein Precursors
Protozoan Proteins
Receptors, Cell Surface