A new probabilistic database search algorithm for ETD spectra

Rovshan G Sadygov; David M Good; Danielle L Swaney; Joshua J Coon

doi:10.1021/pr900153b

J Proteome Res. 2009 Jun;8(6):3198-205. doi: 10.1021/pr900153b.

Authors

Rovshan G Sadygov¹, David M Good, Danielle L Swaney, Joshua J Coon

Affiliation

¹ Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555, USA.

Abstract

Peptide characterization using electron transfer dissociation (ETD) is an important analytical tool for protein identification. The fragmentation observed in ETD spectra is complementary to that seen when using the traditional dissociation method, collision activated dissociation (CAD). Applications of ETD enhance the scope and complexity of the peptides that can be studied by mass spectrometry-based methods. For example, ETD is shown to be particularly useful for the study of post-translationally modified peptides. To take advantage of the power provided by ETD, it is important to have an ETD-specific database search engine, an integral tool of mass spectrometry-based analytical proteomics. In this paper, we report on our development of a database search engine using ETD spectra and protein sequence databases to identify peptides. The search engine is based on the probabilistic modeling of shared peaks count and shared peaks intensity between the spectra and the peptide sequences. The shared peaks count accounts for the cumulative variations from amino acid sequences, while shared peaks intensity models the variations between the candidate sequence and product ion intensities. To demonstrate the utility of this algorithm for searching real-world data, we present the results of applications of this model to two high-throughput data sets. Both data sets were obtained from yeast whole cell lysates. The first data set was obtained from a sample digested by Lys-C, and the second data set was obtained by a digestion using trypsin. We searched the data sets against a combined forward and reversed yeast protein database to estimate false discovery rates. We compare the search results from the new methods with the results from a search engine often employed for ETD spectra, OMSSA. Our findings show that overall the new model performs comparably to OMSSA for low false discovery rates. At the same time, we demonstrate that there are substantial differences with OMSSA for results on subsets of data. Therefore, we conclude the new model can be considered as being complementary to previously developed models.
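
The forward-plus-reversed database search mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`make_decoy`, `estimate_fdr`, `score_cutoff_for_fdr`), the input layout (a list of `(score, is_decoy)` pairs), and the estimator (decoy hits divided by forward hits above a score threshold) are all assumptions chosen for illustration. The abstract does not state which FDR formula was used.

```python
from typing import List, Optional, Tuple

# Hypothetical input layout: one (score, is_decoy) pair per peptide-spectrum match.
PSM = Tuple[float, bool]


def make_decoy(sequence: str) -> str:
    """Build a decoy entry by reversing a protein sequence (assumed scheme)."""
    return sequence[::-1]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR as decoy hits / forward hits among matches scoring >= threshold.

    This is one common target-decoy estimator; other variants exist.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    toy = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
    print(make_decoy("PEPTIDE"))
    print(estimate_fdr(toy, 6.0))
    print(score_cutoff_for_fdr(toy, 0.01))
```

Comparing two search engines at a fixed FDR, as the abstract describes for the new model and OMSSA, would then amount to counting forward matches above each engine's own cutoff.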

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Animals
Databases, Protein*
Metalloendopeptidases / metabolism
Myoglobin / chemistry
Myoglobin / metabolism
Peptide Fragments / chemistry
Peptide Fragments / metabolism
Proteomics / methods*
ROC Curve
Saccharomyces cerevisiae Proteins / chemistry
Saccharomyces cerevisiae Proteins / metabolism
Tandem Mass Spectrometry / methods*
Trypsin / metabolism

Substances

Myoglobin
Peptide Fragments
Saccharomyces cerevisiae Proteins
Trypsin
Metalloendopeptidases
peptidyl-Lys metalloendopeptidase
