An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome

PLoS One. 2010 Jan 28;5(1):e8949. doi: 10.1371/journal.pone.0008949.

Abstract

Background: Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, which limits their usefulness as a gene discovery tool.

Results: Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides [false discovery rate (FDR) < 5%] in an MS dataset derived from two human breast epithelial cell lines. A subset of these was then validated by a different MS technique. Two of the validated peptides correspond to novel isoforms of heterogeneous nuclear ribonucleoproteins (hnRNPs), while the rest correspond to novel loci.
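The abstract does not say exactly how the genome was translated. A common approach, assumed here rather than taken from the paper, is a six-frame translation (three reading frames on each strand) followed by an in silico trypsin digest, so that observed spectra can be matched to peptides from any potential coding region. The sketch below shows only that database-building step; the helper names (`six_frame_translation`, `tryptic_peptides`) and the minimum peptide length are hypothetical, and spectrum matching, FDR estimation, and annotation filtering are not shown.

```python
# Minimal sketch (assumption): six-frame translation plus a simple trypsin digest.
# Not the paper's pipeline; helper names and min_len are illustrative only.

# Standard genetic code, indexed in the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. containing N) become 'X'.

    A trailing partial codon is ignored; stop codons are kept as '*'.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 on the forward strand and -1..-3 on the reverse strand."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def tryptic_peptides(protein: str, min_len: int = 6) -> list[str]:
    """Split at stop codons, then cleave after K or R unless the next residue is P."""
    peptides = []
    for segment in protein.split("*"):
        start = 0
        for i, aa in enumerate(segment):
            next_aa = segment[i + 1] if i + 1 < len(segment) else ""
            if aa in "KR" and next_aa != "P":
                peptides.append(segment[start:i + 1])
                start = i + 1
        peptides.append(segment[start:])
    # Very short peptides are rarely informative in a genome-wide search.
    return [p for p in peptides if len(p) >= min_len]


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGAAACGTTGA").items():
        print(frame, protein, tryptic_peptides(protein, min_len=1))
```

For example, the forward frame +1 of `ATGAAACGTTGA` translates to `MKR*`, which digests into `MK` and `R`. At genome scale, a real pipeline would stream each chromosome rather than hold all six frames in memory, and would record genomic coordinates for every peptide so that matches can later be checked against existing annotation.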

Conclusions: MS technology can be used for ab initio gene discovery in human data. Because this approach rests on different underlying assumptions from other gene-finding methods, it identifies protein-coding genes that those techniques miss. As MS technology continues to evolve, such approaches will become increasingly powerful.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Cell Line, Tumor
  • Genome, Human*
  • Humans
  • Mass Spectrometry / methods*
  • Molecular Sequence Data
  • Sequence Homology, Amino Acid