Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes

Proteomics. 2016 Jan;16(2):226-40. doi: 10.1002/pmic.201500263. Epub 2015 Nov 23.

Abstract

Sustained innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unequally distributed across the phylogenetic tree: a few phyla contain the majority, while the rest have only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations is erroneous. Proteogenomics uses protein-level experimental observations to annotate protein-coding genes on a genome-wide scale. Its benefits include the discovery and correction of gene annotations regardless of their phylogenetic conservation. This allows not only the detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy specific. The chances of encountering such genes are higher in rare phyla, which comprise a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and we urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein-coding genes.

Keywords: Archaea; Bacteria; Gene prediction; Genome annotation; Microbiology; Phylogeny.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Archaeal Proteins / genetics*
  • Bacterial Proteins / genetics*
  • Gene Transfer, Horizontal
  • Genome, Archaeal
  • Genome, Bacterial
  • Humans
  • Molecular Sequence Annotation
  • Open Reading Frames
  • Phylogeny
  • Proteome / genetics*
  • Proteomics*

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Proteome