Large scale bacterial gene discovery by similarity search

Nat Genet. 1994 Jun;7(2):205-14. doi: 10.1038/ng0694-205.

Abstract

DNA sequencing efforts frequently uncover genes other than the targeted ones. We have used rapid database scanning methods to search for undescribed eubacterial and archaeal protein-coding frames in regions flanking known genes. By searching all prokaryotic DNA sequences not marked as coding for proteins or stable RNAs against the protein databases, we have identified more than 450 new examples of bacterial proteins, as well as a smaller number of possible revisions to known proteins, at a surprisingly high rate of one new protein or revision for every 24 initial DNA sequences or 8,300 nucleotides examined. Seven proteins are members of families that have not been described in prokaryotic sequences. We also describe 49 re-interpretations of existing sequence data of particular biological significance.
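The preprocessing implied by the abstract can be illustrated with a minimal sketch: locate the stretches of a DNA entry that carry no protein or stable-RNA annotation, then translate them in all six reading frames so they can be compared against a protein database. The abstract does not name the scanning tool or its parameters, so the database comparison itself is omitted. All function names, the half-open coordinate convention, and the `min_len` threshold are hypothetical choices made for this illustration, not the authors' implementation.

```python
from itertools import product

# Standard genetic code in NCBI TCAG order (assumption: the amino acid
# assignments of the bacterial code match the standard table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def unannotated_regions(length, features):
    """Return half-open (start, end) intervals not covered by any feature.

    `features` is a list of (start, end) intervals already marked as coding
    for proteins or stable RNAs (hypothetical input format).
    """
    gaps, cursor = [], 0
    for start, end in sorted(features):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        gaps.append((cursor, length))
    return gaps


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to conceptual translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def open_frames(protein, min_len=30):
    """Split a translation at stop codons and keep segments >= min_len residues."""
    return [seg for seg in protein.split("*") if len(seg) >= min_len]
```

In this sketch, each interval returned by `unannotated_regions` would be sliced from the entry, passed through `six_frame_translations`, and the segments kept by `open_frames` would serve as queries for the protein database search.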

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / genetics*
  • Codon / genetics
  • DNA, Bacterial / genetics*
  • Databases, Factual*
  • Genes, Bacterial*
  • Molecular Sequence Data
  • Open Reading Frames
  • Protein Biosynthesis
  • Sequence Homology, Amino Acid

Substances

  • Bacterial Proteins
  • Codon
  • DNA, Bacterial