Representation of amino acids as five-bit or three-bit patterns for filtering protein databases

Bioinformatics. 2001 Aug;17(8):676-85. doi: 10.1093/bioinformatics/17.8.676.

Abstract

Motivation: We propose representing amino acids by bit-patterns so they may be used in a filter algorithm for similarity searches over protein databases, to rapidly eliminate non-homologous regions of database sequences. The filter algorithm would be based on dynamic programming optimization. It would have the advantage over previous filter algorithms that its substitution scoring function distinguishes between conservative and non-conservative amino acid substitutions.
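To illustrate how a bit-pattern representation could separate conservative from non-conservative substitutions, here is a minimal sketch. The abstract does not specify the scoring function, so scoring by the count of matching versus mismatching bits is an assumption. The name `bit_score` is hypothetical, and the five-bit codes in `CODES` are invented placeholders, not the published patterns.

```python
# Hypothetical 5-bit codes, invented for illustration only
# (these are NOT the patterns reported in the paper).
CODES = {"L": 0b00001, "I": 0b00011, "V": 0b00111, "D": 0b11000, "E": 0b11100}
NBITS = 5


def bit_score(a: str, b: str) -> int:
    """Assumed score: number of matching bits minus number of mismatching bits."""
    mismatches = bin(CODES[a] ^ CODES[b]).count("1")  # XOR marks differing bits
    return NBITS - 2 * mismatches


if __name__ == "__main__":
    for pair in ("LL", "LI", "LD"):
        print(pair, bit_score(pair[0], pair[1]))
```

Under these placeholder codes an identity scores highest, a substitution between residues with similar codes scores lower, and a substitution between residues with dissimilar codes scores lowest. The XOR and bit count are cheap operations, which is what would make such a score attractive in a filter step.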

Results: Simulated annealing was used to search for the best five-bit or three-bit patterns to represent amino acids, where similar amino acids were given similar bit-patterns. The similarity between amino acids was estimated from the BLOSUM45 matrix. The Escherichia coli PhoE precursor and the bacteriophage PA2 LC precursor were then aligned with amino acids represented by these five-bit and three-bit patterns. The resulting alignments were nearly the same as those obtained when BLOSUM45 was used to score substitutions.
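The search described above might be sketched as follows. This is not the authors' C implementation. The cost function (similarity weighted by Hamming distance), the single-bit-flip move, the linear cooling schedule, and all names (`anneal`, `cost`, `TOY_SIM`) are assumptions. `TOY_SIM` is a made-up four-residue table standing in for BLOSUM45, whose values are not reproduced here.

```python
import math
import random
from itertools import combinations


def hamming(x: int, y: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(x ^ y).count("1")


def cost(codes, sim):
    """Assumed objective: similar pairs (positive sim) are penalized for
    differing bits; dissimilar pairs (negative sim) are rewarded for them."""
    return sum(sim[frozenset(p)] * hamming(codes[p[0]], codes[p[1]])
               for p in combinations(sorted(codes), 2))


def anneal(residues, sim, nbits, steps=5000, t0=2.0, seed=0):
    rng = random.Random(seed)
    # Codes need not be distinct: with 3 bits, 20 residues must share 8 codes.
    codes = {r: rng.randrange(2 ** nbits) for r in residues}
    cur = cost(codes, sim)
    best, best_cost = dict(codes), cur
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-9               # linear cooling (assumed)
        r = rng.choice(residues)
        old = codes[r]
        codes[r] = old ^ (1 << rng.randrange(nbits))  # flip one random bit
        new = cost(codes, sim)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(codes), cur
        else:
            codes[r] = old                            # reject: undo the flip
    return best, best_cost


# Toy similarity table (illustrative values, NOT taken from BLOSUM45).
TOY_SIM = {frozenset("LI"): 2, frozenset("DE"): 2,
           frozenset("LD"): -2, frozenset("LE"): -2,
           frozenset("ID"): -2, frozenset("IE"): -2}

best, best_cost = anneal(list("LIDE"), TOY_SIM, nbits=3)
print({r: format(c, "03b") for r, c in best.items()}, best_cost)
```

A full run would replace `TOY_SIM` with similarities derived from BLOSUM45 for all 20 amino acids and set `nbits` to 5 or 3.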

Availability: The C code of the optimization algorithm that searches for the optimal bit-pattern representation of amino acids is available from the authors upon request.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acid Substitution
  • Amino Acids*
  • Bacteriophages / genetics
  • Computational Biology
  • Conserved Sequence
  • Databases, Protein*
  • Escherichia coli / genetics
  • Escherichia coli Proteins
  • Porins / chemistry
  • Porins / genetics
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Amino Acid
  • Viral Proteins / chemistry
  • Viral Proteins / genetics

Substances

  • Amino Acids
  • Escherichia coli Proteins
  • Porins
  • Viral Proteins
  • PhoE protein, E coli