Experiment-specific estimation of peptide identification probabilities using a randomized database

Roger Higdon; Jason M Hogan; Natali Kolker; Gerald van Belle; Eugene Kolker

doi:10.1089/omi.2007.0040

OMICS. 2007 Winter;11(4):351-65. doi: 10.1089/omi.2007.0040.

Authors

Roger Higdon¹, Jason M Hogan, Natali Kolker, Gerald van Belle, Eugene Kolker

Affiliation

¹ Seattle Children's Hospital and Regional Medical Center, Seattle, WA 98101, USA.

PMID: 18092908

Abstract

Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparison of high-throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across the experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low-probability identifications when there are a large number of peptide identifications and exclude high-probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. They are shown to closely approximate "true" probabilities based on known standard protein mixtures across different experiments. Probabilities generated by the earlier Peptide_Prophet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database. However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.
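
The contrast between a cumulative FDR and a per-score identification probability can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the logistic model is fit by plain gradient ascent on a single score, and the Loess step and the LIPS model are omitted. It also assumes that the reshuffled database is the same size as the original and is searched separately, so that incorrect matches to the original database are expected to score like reshuffled matches. Under that assumption, if q(s) is the fitted probability that a match with score s came from the original database, the probability that an original match is correct is roughly 1 - (1 - q(s)) / q(s).

```python
import math


def cumulative_fdr(orig_scores, rand_scores, threshold):
    """Estimated FDR among original-database matches scoring >= threshold."""
    n_orig = sum(1 for s in orig_scores if s >= threshold)
    n_rand = sum(1 for s in rand_scores if s >= threshold)
    if n_orig == 0:
        return 0.0
    return min(1.0, n_rand / n_orig)


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_origin_model(orig_scores, rand_scores, iters=2000, lr=1.0):
    """Fit q(s) = P(match is from the original database | score s).

    Logistic regression on one standardized score, fit by gradient
    ascent on the mean log-likelihood (an illustrative choice).
    """
    xs = list(orig_scores) + list(rand_scores)
    ys = [1.0] * len(orig_scores) + [0.0] * len(rand_scores)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mean) / sd for x in xs]
    a = b = 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for z, y in zip(zs, ys):
            resid = y - _sigmoid(a + b * z)
            grad_a += resid
            grad_b += resid * z
        a += lr * grad_a / n
        b += lr * grad_b / n
    return lambda s: _sigmoid(a + b * (s - mean) / sd)


def peptide_probability(q):
    """Convert q(s) into an estimated probability of a correct match.

    Assumes equal-sized original and reshuffled databases (see lead-in).
    """
    if q <= 0.0:
        return 0.0
    return max(0.0, 1.0 - (1.0 - q) / q)
```

Calling `fit_origin_model(orig, rand)` returns a function of the score; passing its value through `peptide_probability` gives a per-identification estimate, whereas `cumulative_fdr` describes only the whole set of matches above a threshold.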

Publication types

Comparative Study
Evaluation Study
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Databases, Protein*
Models, Biological
Peptides / chemistry*
Probability
Random Allocation
Regression Analysis
Software

Substances

Peptides

Grants and funding

GM076680-01A1/GM/NIGMS NIH HHS/United States