Proteogenomics: needs and roles to be filled by proteomics in genome annotation

Brief Funct Genomic Proteomic. 2008 Jan;7(1):50-62. doi: 10.1093/bfgp/eln010. Epub 2008 Mar 10.

Abstract

While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation--the process of identifying genes and assigning function to each gene in a genome sequence--provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.
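To illustrate the idea of using measured peptides to verify coding regions, the following is a minimal sketch and not the authors' method. It assumes the standard genetic code, translates a DNA sequence in all six reading frames, and reports where identified peptide sequences occur by exact string match. Real proteogenomic pipelines match tandem mass spectra against a translated database with a search engine and statistical scoring. The function names (`six_frame_translations`, `find_peptide_hits`) and the toy sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(genome):
    """Map (strand, offset) to the translated sequence for all six frames."""
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def find_peptide_hits(genome, peptides):
    """Return (peptide, strand, frame offset, amino-acid index) for exact matches."""
    hits = []
    for (strand, offset), protein in six_frame_translations(genome).items():
        for peptide in peptides:
            start = protein.find(peptide)
            while start != -1:
                hits.append((peptide, strand, offset, start))
                start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGGCCAAGTTTGGTTAA"  # hypothetical sequence, encodes M-A-K-F-G-stop
    for hit in find_peptide_hits(toy_genome, ["AKFG", "TKLG"]):
        print(hit)
```

In this sketch a peptide hit in a given frame is evidence that the corresponding stretch of genomic sequence is translated. A hit outside any predicted gene, or in a different frame from the predicted one, would flag a candidate for a missed or mis-called gene.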

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Alternative Splicing
  • Codon, Initiator
  • Codon, Terminator
  • Genes
  • Genome*
  • Genome, Bacterial
  • Genomics*
  • Mass Spectrometry
  • Open Reading Frames
  • Proteomics / methods*

Substances

  • Codon, Initiator
  • Codon, Terminator