SATCHMO: sequence alignment and tree construction using hidden Markov models

Bioinformatics. 2003 Jul 22;19(11):1404-11. doi: 10.1093/bioinformatics/btg158.

Abstract

Motivation: Aligning multiple proteins based on sequence information alone is challenging if sequence identity is low or there is a significant degree of structural divergence. We present a novel algorithm (SATCHMO) that is designed to address this challenge. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree. The alignment at a given node contains all sequences within its sub-tree, and predicts which positions in those sequences are alignable and which are not. Aligned regions therefore typically get shorter on a path from a leaf to the root as sequences diverge in structure. Current methods either regard all positions as alignable (e.g. ClustalW), or align only those positions believed to be homologous across all sequences (e.g. profile HMM methods); by contrast SATCHMO makes different predictions of alignable regions in different subgroups. SATCHMO generates profile hidden Markov models at each node; these are used to determine branching order, to align sequences and to predict structurally alignable regions.

Results: In experiments on the BAliBASE benchmark alignment database, SATCHMO is shown to perform comparably to ClustalW and the UCSC SAM HMM software. Results using SATCHMO to identify protein domains are demonstrated on potassium channels, with implications for the mechanism by which tumor necrosis factor alpha affects potassium current.

Availability: The software is available for download from http://www.drive5.com/lobster/index.htm

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Gene Expression Profiling / methods*
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical*
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Software*

Substances

  • Proteins