A new probabilistic database search algorithm for ETD spectra

J Proteome Res. 2009 Jun;8(6):3198-205. doi: 10.1021/pr900153b.

Abstract

Peptide characterization using electron transfer dissociation (ETD) is an important analytical tool for protein identification. The fragmentation observed in ETD spectra is complementary to that seen with the traditional dissociation method, collision activated dissociation (CAD). ETD extends the scope and complexity of the peptides that can be studied by mass spectrometry-based methods. For example, ETD has been shown to be particularly useful for the study of post-translationally modified peptides. To take advantage of the power provided by ETD, it is important to have an ETD-specific database search engine, an integral tool of mass spectrometry-based analytical proteomics. In this paper, we report on our development of a database search engine that uses ETD spectra and protein sequence databases to identify peptides. The search engine is based on probabilistic modeling of the shared peaks count and the shared peaks intensity between the spectra and the peptide sequences. The shared peaks count accounts for the cumulative variations from amino acid sequences, while the shared peaks intensity models the variations between the candidate sequence and product ion intensities. To demonstrate the utility of this algorithm for searching real-world data, we present the results of applying this model to two high-throughput data sets. Both data sets were obtained from yeast whole cell lysates. The first data set was obtained from a sample digested by Lys-C, and the second from a sample digested by trypsin. We searched the data sets against a combined forward and reversed yeast protein database to estimate false discovery rates. We compare the search results from the new method with the results from OMSSA, a search engine often employed for ETD spectra. Our findings show that, overall, the new model performs comparably to OMSSA at low false discovery rates. At the same time, we demonstrate that there are substantial differences from the OMSSA results on subsets of the data. Therefore, we conclude that the new model can be considered complementary to previously developed models.
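
The two quantities named in the abstract, and the forward/reversed database strategy for estimating false discovery rates, can be illustrated with a minimal sketch. It is not the paper's probabilistic model, which the abstract does not specify. The function names `shared_peaks` and `decoy_fdr`, the 0.5 m/z matching tolerance, the rule of taking the most intense observed peak in the window, and the decoys/targets ratio as the FDR estimate are all assumptions made for illustration only.

```python
from bisect import bisect_left


def shared_peaks(observed, theoretical_mz, tol=0.5):
    """Count theoretical fragment m/z values matched by an observed peak.

    observed: list of (mz, intensity) tuples sorted by mz.
    theoretical_mz: predicted fragment m/z values for a candidate peptide.
    tol: matching tolerance in m/z units (assumed value, not from the paper).
    Returns (shared peaks count, summed intensity of the matched peaks).
    """
    mzs = [mz for mz, _ in observed]
    count = 0
    intensity = 0.0
    for t in theoretical_mz:
        # First observed peak at or above the lower edge of the window.
        i = bisect_left(mzs, t - tol)
        best = None
        while i < len(mzs) and mzs[i] <= t + tol:
            # Assumption: keep the most intense observed peak in the window.
            if best is None or observed[i][1] > observed[best][1]:
                best = i
            i += 1
        if best is not None:
            count += 1
            intensity += observed[best][1]
    return count, intensity


def decoy_fdr(psms, threshold):
    """Estimate FDR from a combined forward/reversed database search.

    psms: list of (score, is_decoy) pairs; a higher score is a better match.
    Uses the common decoys/targets ratio above the threshold (an assumption;
    the abstract does not state which estimator was used).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    spectrum = [(100.0, 10.0), (200.2, 5.0), (300.0, 1.0)]
    print(shared_peaks(spectrum, [100.1, 200.0, 400.0]))
    hits = [(10, False), (9, False), (8, True), (7, False), (6, True)]
    print(decoy_fdr(hits, threshold=7))
```

In a sketch like this, each spectrum would be scored against every candidate peptide from both the forward and the reversed sequences, and the score threshold would then be lowered until `decoy_fdr` reaches the desired rate.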

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Databases, Protein*
  • Metalloendopeptidases / metabolism
  • Myoglobin / chemistry
  • Myoglobin / metabolism
  • Peptide Fragments / chemistry
  • Peptide Fragments / metabolism
  • Proteomics / methods*
  • ROC Curve
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / metabolism
  • Tandem Mass Spectrometry / methods*
  • Trypsin / metabolism

Substances

  • Myoglobin
  • Peptide Fragments
  • Saccharomyces cerevisiae Proteins
  • Trypsin
  • Metalloendopeptidases
  • peptidyl-Lys metalloendopeptidase