Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra

Sangtae Kim; Nitin Gupta; Nuno Bandeira; Pavel A Pevzner

doi:10.1074/mcp.M800103-MCP200

Mol Cell Proteomics. 2009 Jan;8(1):53-69. doi: 10.1074/mcp.M800103-MCP200. Epub 2008 Aug 14.

Authors

Sangtae Kim¹, Nitin Gupta, Nuno Bandeira, Pavel A Pevzner

Affiliation

¹ Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.

Abstract

Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (its spectral dictionary) are generated and then quickly matched against the database. We present a new algorithm, MS-Dictionary, for efficiently generating spectral dictionaries and demonstrate that it can identify spectra that are missed by database search. We argue that MS-Dictionary enables proteogenomics searches against six-frame translations of genomic sequences, which may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.
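
The matching step described in the abstract, where a pre-generated set of candidate peptides is looked up in a protein database, can be illustrated with a minimal sketch. This is not the MS-Dictionary implementation: the function name `match_dictionary`, the toy peptides, and the toy protein sequences are all hypothetical, and the sketch assumes the spectral dictionary has already been generated by some other means. It only shows why exact lookup of a fixed set of strings can be fast: the dictionary is hashed once, and each protein is scanned with a sliding window.

```python
from collections import defaultdict


def match_dictionary(dictionary, proteins):
    """Find exact occurrences of dictionary peptides in protein sequences.

    dictionary: iterable of candidate peptide strings (assumed precomputed).
    proteins:   mapping of protein id -> amino acid sequence.
    Returns a dict mapping each matched peptide to a list of
    (protein_id, offset) pairs.
    """
    # Group candidate peptides by length so each window size is scanned once.
    by_length = defaultdict(set)
    for peptide in dictionary:
        by_length[len(peptide)].add(peptide)

    hits = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for k, peptides in by_length.items():
            # Slide a window of length k and test set membership (hash lookup).
            for i in range(len(sequence) - k + 1):
                window = sequence[i:i + k]
                if window in peptides:
                    hits[window].append((protein_id, i))
    return dict(hits)


# Hypothetical toy data, for illustration only.
dictionary = {"PEPTIDE", "PEPTLDE", "SEQVENCE"}
proteins = {"prot1": "MKPEPTIDERK", "prot2": "AAPEPTLDEAA"}
hits = match_dictionary(dictionary, proteins)
```

In this sketch the cost of the scan depends on the database size and the number of distinct peptide lengths, not on the number of candidate peptides, since each window is checked with a single set lookup.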

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Databases, Protein*
Genome, Human
Humans
Molecular Sequence Data
Peptides / analysis
Peptides / chemistry*
Sequence Analysis, Protein / methods*
Shewanella / chemistry
Tandem Mass Spectrometry / methods*

Substances

Peptides
