Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra

Mol Cell Proteomics. 2009 Jan;8(1):53-69. doi: 10.1074/mcp.M800103-MCP200. Epub 2008 Aug 14.

Abstract

Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (its spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomics searches against six-frame translations of genomic sequences, which may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
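The two-step idea in the abstract (generate a spectral dictionary, then match it against the database) can be illustrated with a toy sketch. This is not the MS-Dictionary algorithm, whose generation and scoring methods are not described in the abstract. The sketch assumes integer (nominal) residue masses, a spectrum reduced to a set of prefix-mass peaks, a score that simply counts matched prefix masses, and brute-force enumeration. The names `spectral_dictionary`, `match_dictionary`, `peaks`, `parent_mass`, and `min_score` are hypothetical.

```python
from typing import Dict, List, Set

# Nominal (integer) residue masses. I/L and K/Q share a nominal mass,
# so this toy alphabet keeps only L and K (an assumption of the sketch).
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def spectral_dictionary(peaks: Set[int], parent_mass: int, min_score: int) -> List[str]:
    """Enumerate every peptide whose residues sum to parent_mass and whose
    prefix masses coincide with at least min_score peaks (toy scoring)."""
    found: List[str] = []

    def extend(prefix: str, mass: int, score: int) -> None:
        if mass == parent_mass:
            if score >= min_score:
                found.append(prefix)
            return
        for residue, residue_mass in RESIDUE_MASS.items():
            new_mass = mass + residue_mass
            if new_mass > parent_mass:
                continue  # overshoots the parent mass, prune this branch
            # Only proper prefixes are scored; the full peptide mass is not a peak.
            hit = 1 if new_mass < parent_mass and new_mass in peaks else 0
            extend(prefix + residue, new_mass, score + hit)

    extend("", 0, 0)
    return sorted(found)


def match_dictionary(dictionary: List[str], proteins: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, for each dictionary peptide found in the database, the proteins containing it."""
    matches: Dict[str, List[str]] = {}
    for name, sequence in proteins.items():
        # Collapse I->L and Q->K so the database uses the same toy alphabet.
        normalized = sequence.replace("I", "L").replace("Q", "K")
        for peptide in dictionary:
            if peptide in normalized:
                matches.setdefault(peptide, []).append(name)
    return matches


if __name__ == "__main__":
    # Made-up spectrum: prefix masses of the peptide GASP (57, 128, 215), parent mass 312.
    toy_peaks = {57, 128, 215}
    toy_database = {"prot1": "MKGASPLT"}  # made-up protein sequence
    dictionary = spectral_dictionary(toy_peaks, parent_mass=312, min_score=2)
    print(dictionary)
    print(match_dictionary(dictionary, toy_database))
```

Lowering `min_score` enlarges the dictionary, since peptides that explain fewer peaks are admitted. The database step remains a plain exact-match lookup, in contrast to scoring each database peptide against the spectrum.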

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Databases, Protein*
  • Genome, Human
  • Humans
  • Molecular Sequence Data
  • Peptides / analysis
  • Peptides / chemistry*
  • Sequence Analysis, Protein / methods*
  • Shewanella / chemistry
  • Tandem Mass Spectrometry / methods*

Substances

  • Peptides