Ten years of bacterial genome sequencing: comparative-genomics-based discoveries

Funct Integr Genomics. 2006 Jul;6(3):165-85. doi: 10.1007/s10142-006-0027-2. Epub 2006 May 12.

Abstract

It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. This review addresses a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation even amongst genomes of the same species, is far greater than was previously thought. The classical textbook view of evolution, which relies on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome, has changed. One of the most obvious conclusions from examining the sequences of several hundred bacterial genomes is the enormous amount of diversity, even among different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpreting the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.

Publication types

  • Comparative Study
  • Historical Article
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Bacterial Vaccines / biosynthesis
  • Computer Simulation
  • Databases, Nucleic Acid / statistics & numerical data
  • Genetic Code
  • Genetic Variation
  • Genome, Bacterial*
  • Genomic Islands
  • Genomics / methods*
  • History, 20th Century
  • Intellectual Property
  • Interspersed Repetitive Sequences
  • Models, Genetic
  • Phylogeny
  • Proteome / analysis
  • Sequence Analysis, DNA

Substances

  • Bacterial Vaccines
  • Proteome