Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins

Proteins. 2001 Feb 1;42(2):148-63. doi: 10.1002/1097-0134(20010201)42:2<148::aid-prot20>3.0.co;2-r.

Abstract

An all-against-all protein structure comparison using the Combinatorial Extension (CE) algorithm applied to a representative set of PDB structures revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method to identify conserved amino acid positions and residue-dependent property clusters within these subsequences starting with structure alignments. Each of the subsequences is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each one of the subsequences are used in scoring for conserved key amino acid positions (CKAAPs). We have set the top 20% of the high-scoring positions in each substructure to be CKAAPs. It is hypothesized that CKAAPs may be responsible for the common folding patterns in either a local or global view of the protein-folding pathway. Where a significant number of structures exist, CKAAPs have also been identified in structure alignments of complete polypeptide chains from the same protein family or superfamily. Evidence to support the presence of CKAAPs comes from other computational approaches and experimental studies of mutation and protein-folding experiments, notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non-CKAAPs is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure prediction studies and is dependent on the availability of structures and an accurate structure alignment methodology.
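
The selection step described above (score each aligned position from weighted homologues, then keep the top 20% as CKAAPs) can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the score used here, one minus the normalized Shannon entropy of the weighted residue frequencies in each column, is an assumption swapped in for illustration and is not the authors' scoring scheme. The function names `column_scores` and `top_fraction_positions`, the gap handling, the tie-breaking rule, and the toy alignment are likewise hypothetical.

```python
import math
from collections import defaultdict


def column_scores(alignment, weights=None):
    """Score each alignment column for conservation.

    Assumed score (not from the paper): 1 - normalized Shannon entropy of
    the weighted residue frequencies, scaled by the non-gap weight fraction.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if weights is None:
        # Uniform weights; a real pipeline would down-weight near-duplicates.
        weights = [1.0] * len(alignment)
    weight_sum = sum(weights)

    scores = []
    for col in range(n_cols):
        freq = defaultdict(float)
        for seq, w in zip(alignment, weights):
            if seq[col] != "-":  # gaps do not count as residues
                freq[seq[col]] += w
        total = sum(freq.values())
        if total == 0:
            scores.append(0.0)  # all-gap column
            continue
        entropy = -sum((f / total) * math.log(f / total) for f in freq.values())
        conservation = 1.0 - entropy / math.log(20)  # 20 standard amino acids
        scores.append(conservation * total / weight_sum)
    return scores


def top_fraction_positions(scores, fraction=0.20):
    """Return 0-based indices of the highest-scoring positions.

    The 20% cutoff follows the abstract; rounding to the nearest whole
    position and breaking ties by lower index are choices made for this sketch.
    """
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Toy alignment of one subsequence with four hypothetical homologues.
toy_alignment = [
    "ACDEFGHIKL",
    "ACDEYGHLKL",
    "ACNEFGRIKV",
    "AQDEFGHIRL",
    "ATDEWAKIKL",
]
ckaap_like = top_fraction_positions(column_scores(toy_alignment))
print(ckaap_like)  # [0, 3]
```

In this toy case only columns 0 and 3 are invariant, so they are the two positions (20% of ten) that are selected. A fixed top-fraction cutoff always returns the same proportion of positions per substructure, regardless of how conserved the substructure is overall.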

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Amino Acids / chemistry*
  • Bacterial Proteins / chemistry
  • Calcium / metabolism
  • Conserved Sequence*
  • Immunoglobulins / chemistry*
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Conformation*
  • Protein Engineering
  • Protein Folding
  • Repressor Proteins / chemistry
  • Sequence Homology, Amino Acid
  • Troponin C / chemistry
  • Viral Proteins / chemistry
  • Viral Regulatory and Accessory Proteins

Substances

  • Amino Acids
  • Bacterial Proteins
  • IgG Fc-binding protein, Streptococcus
  • Immunoglobulins
  • Repressor Proteins
  • Troponin C
  • Viral Proteins
  • Viral Regulatory and Accessory Proteins
  • phage repressor proteins
  • Calcium