Iterative genome correction largely improves proteomic analysis of nonmodel organisms

J Proteome Res. 2014 Jun 6;13(6):2724-34. doi: 10.1021/pr500369b. Epub 2014 May 19.

Abstract

The current application and development of proteomic studies typically depend on the availability of sequenced genomes. Protein identification from peptides detected by liquid chromatography-tandem mass spectrometry is limited by the absence of sequenced genomes for many nonmodel organisms. In this study, we demonstrate a new strategy, based on our stable, accurate, and error-tolerant FANSe (Fast and Accurate mapping tool for Nucleotide Sequencing datasets) mapping algorithm, that corrects genome sequences iteratively. To evaluate the efficiency of the corrected genome databases in proteomic studies, MS/MS spectra of the whole proteome extracted from a Bacillus pumilus strain lacking a complete genome sequence were searched against two protein sequence databases: one derived from the complete reference genome sequence of a homologous bacterium, and one derived from the corrected genome sequence. The results indicated that the corrected protein sequence database could significantly facilitate peptide/protein identification. Importantly, this strategy can help to detect novel peptide variants. This strategy of genome correction will promote the development of functional proteomics in nonmodel organisms.
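
The idea of correcting a reference genome iteratively can be illustrated with a toy sketch. It is not the authors' pipeline: a naive mismatch-tolerant mapper stands in for FANSe, whose algorithm is not described in this abstract, and the names `map_read`, `correct_once`, and `iterative_correct`, the strict-majority rule, and the example sequences are all assumptions made for illustration. Reads from the target strain are mapped onto a homologous reference, each covered position is replaced by the majority base, and the process repeats so that reads which were too divergent to map in one round may map in the next.

```python
from collections import Counter


def map_read(reference, read, max_mismatches):
    """Naive stand-in mapper (NOT FANSe): return the first offset with the
    fewest mismatches, or None if every offset exceeds max_mismatches."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def correct_once(reference, reads, max_mismatches=1):
    """One correction round: pile up mapped reads, take the majority base."""
    piles = [Counter() for _ in reference]
    for read in reads:
        pos = map_read(reference, read, max_mismatches)
        if pos is None:
            continue  # read is too divergent to map in this round
        for i, base in enumerate(read):
            piles[pos + i][base] += 1
    corrected = []
    for ref_base, pile in zip(reference, piles):
        if pile:
            base, count = pile.most_common(1)[0]
            # Assumed rule: replace only on a strict majority of the votes.
            corrected.append(base if 2 * count > sum(pile.values()) else ref_base)
        else:
            corrected.append(ref_base)  # uncovered position stays unchanged
    return "".join(corrected)


def iterative_correct(reference, reads, max_rounds=10, max_mismatches=1):
    """Repeat correction rounds until the sequence stops changing."""
    current = reference
    for _ in range(max_rounds):
        updated = correct_once(current, reads, max_mismatches)
        if updated == current:
            break
        current = updated
    return current


# Hypothetical toy data: the strain differs from the reference at two sites.
strain = "ACGTTGCAAGTC"
reference = "ACGTTGCTAGTG"
reads = [strain[i:i + 6] for i in range(len(strain) - 5)]

first_round = correct_once(reference, reads)
corrected = iterative_correct(reference, reads)
```

In this toy case the last read carries two mismatches against the original reference and is rejected in the first round, so only one site is fixed. Once that site is corrected the read maps with a single mismatch, and the second round fixes the remaining site. A corrected genome of this kind would then be translated into the protein sequence database used for the MS/MS search.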

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacillus / genetics*
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • Base Sequence
  • Databases, Protein
  • Genetic Variation
  • Genome, Bacterial*
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation
  • Molecular Sequence Data
  • Peptide Fragments / chemistry
  • Peptide Mapping
  • Proteome / chemistry
  • Proteome / genetics*
  • Proteomics
  • Sequence Analysis, DNA
  • Tandem Mass Spectrometry

Substances

  • Bacterial Proteins
  • Peptide Fragments
  • Proteome