Multiple sequence threading: an analysis of alignment quality and stability

J Mol Biol. 1997 Jun 27;269(5):902-43. doi: 10.1006/jmbi.1997.1008.

Abstract

Methods that compare a protein sequence directly to a structure can be divided into those that construct a molecular model (threading methods) and those that perform a sequence alignment with the structure encoded as a sequence of structural states (one-dimensional/three-dimensional (1D/3D) matching). The former take into account the internal packing of the molecule but the latter do not. On the other hand, it is simple to include multiple sequence data in a 1D/3D comparison but difficult in a threading method. Here, a protein sequence/structure alignment method is described that combines matching of predicted and observed residue exposure and of predicted and observed secondary structure (1D/3D) with pairwise packing interactions in the core (threading). Using a variety of distantly related and analogous protein structures, the multiple sequence threading (MST) method was compared to a single sequence threading (SST) method (which uses complex potentials of mean force) and also to a multiple sequence alignment (MSA) program. The MST method produced alignments that were better than the best obtainable with either the SST or the MSA method. The method was stable to errors in both secondary structure prediction and predicted exposure, and also under variation of the key parameters (fully described in an Appendix). The contribution of the pairwise term was small, but without it the correct alignments were less stable and structurally unreasonable deletions were observed when matching against larger structures. Using the parameters derived for alignment, the method was able to recognise related folds in the structure databank with a specificity comparable to that of other methods.
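
To make the combination of terms concrete, the following is a minimal illustrative sketch of how a score for one candidate sequence/structure alignment might add per-position 1D/3D terms (exposure and secondary structure agreement) to a weakly weighted pairwise term over core contacts. It is not the published MST scoring function: the weights, the gap penalty, the functional forms, the use of a family-averaged hydrophobicity value, and all names (`W_EXPOSURE`, `W_SECSTR`, `W_PAIR`, `GAP_PENALTY`, `position_score`, `pair_score`, `alignment_score`) are assumptions chosen for illustration. The actual parameters are those described in the paper's Appendix.

```python
from typing import List, Optional, Tuple

# Hypothetical weights, NOT the values from the paper's Appendix.
W_EXPOSURE = 1.0   # weight of the exposure-agreement term (1D/3D)
W_SECSTR = 1.0     # weight of the secondary-structure-agreement term (1D/3D)
W_PAIR = 0.2       # small weight for the pairwise packing term (threading)
GAP_PENALTY = 1.0  # cost of leaving a structure position unmatched

# Sequence position: (predicted exposure in [0, 1], predicted secondary
# structure state, mean hydrophobicity in [0, 1]). In a multiple sequence
# setting these would be derived from the aligned family (an assumption here).
SeqPos = Tuple[float, str, float]
# Structure position: (observed exposure in [0, 1], observed state).
StrPos = Tuple[float, str]


def position_score(seq_pos: SeqPos, str_pos: StrPos) -> float:
    """1D/3D term: agreement of predicted and observed exposure and state."""
    pred_exp, pred_ss, _ = seq_pos
    obs_exp, obs_ss = str_pos
    exposure_term = 1.0 - abs(pred_exp - obs_exp)
    secstr_term = 1.0 if pred_ss == obs_ss else 0.0
    return W_EXPOSURE * exposure_term + W_SECSTR * secstr_term


def pair_score(a: SeqPos, b: SeqPos) -> float:
    """Toy pairwise term: reward two hydrophobic positions in core contact."""
    return a[2] * b[2]


def alignment_score(
    alignment: List[Optional[int]],
    sequence: List[SeqPos],
    structure: List[StrPos],
    core_contacts: List[Tuple[int, int]],
) -> float:
    """Score an alignment given as: structure index -> sequence index or None."""
    total = 0.0
    for str_idx, seq_idx in enumerate(alignment):
        if seq_idx is None:
            total -= GAP_PENALTY  # unmatched structure position
        else:
            total += position_score(sequence[seq_idx], structure[str_idx])
    for i, j in core_contacts:
        si, sj = alignment[i], alignment[j]
        if si is not None and sj is not None:
            total += W_PAIR * pair_score(sequence[si], sequence[sj])
    return total


# Toy data: two buried helical positions in contact, one exposed coil position.
sequence = [(0.1, "H", 0.9), (0.2, "H", 0.8), (0.9, "C", 0.1)]
structure = [(0.1, "H"), (0.2, "H"), (0.8, "C")]
core_contacts = [(0, 1)]

in_register = alignment_score([0, 1, 2], sequence, structure, core_contacts)
shifted = alignment_score([1, 2, None], sequence, structure, core_contacts)
```

In this toy case the in-register alignment scores higher than the shifted one, mostly through the 1D/3D terms, with the pairwise term adding only a small amount. That mirrors, in a very simplified way, the abstract's observation that the pairwise contribution is small. Finding the best alignment under such a score is a separate search problem that this sketch does not address.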

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence*
  • Amino Acids / chemistry
  • Bacterial Proteins / chemistry
  • Escherichia coli Proteins*
  • Hemoglobins / chemistry
  • Hydroxymethylbilane Synthase*
  • Immunoglobulins / chemistry
  • Leghemoglobin / chemistry
  • Models, Molecular
  • Models, Theoretical
  • Molecular Sequence Data
  • Myoglobin / chemistry
  • Pattern Recognition, Automated
  • Protein Conformation*
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / classification
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Amino Acids
  • Bacterial Proteins
  • Escherichia coli Proteins
  • Hemoglobins
  • Immunoglobulins
  • Leghemoglobin
  • Myoglobin
  • Proteins
  • Hydroxymethylbilane Synthase
  • hemC protein, E coli

Associated data

  • GENBANK/E64429
  • GENBANK/S50177
  • GENBANK/S50178
  • GENBANK/S59519
  • PDB/1AAK
  • PDB/1ACF
  • PDB/1BTN
  • PDB/1CDG
  • PDB/1CHD
  • PDB/1COT
  • PDB/1CPC
  • PDB/1HAR
  • PDB/1LIS
  • PDB/1LLA
  • PDB/1MBA
  • PDB/1NPK
  • PDB/1NTR
  • PDB/1SCU
  • PDB/2FCR
  • PDB/2NAC
  • PDB/2PNA
  • PDB/2TGI
  • PDB/3CHY
  • PDB/3DPA
  • PDB/3FAB
  • PDB/3HHR
  • PDB/4FGF
  • PDB/4FXN
  • PDB/4TRX
  • PDB/7RSA