pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics

J Proteome Res. 2019 Jul 5;18(7):2747-2758. doi: 10.1021/acs.jproteome.8b00993. Epub 2019 Jun 24.

Abstract

As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a threshold to estimate the false discovery rate and then filters out those identifications beyond the threshold. However, the incorrect identifications that remain within the threshold are still unknown, so further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and further proved that a validation method with lower FPR and FNR leads to identifications with higher sensitivity and precision. We then proposed a validation method named pValid that combines an open database search with a machine-learning-based theoretical spectrum prediction strategy. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an FPR of 0.03% and an FNR of 1.79% on average, both superior to the other four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid successfully flagged the highest number of incorrect identifications among all five methods. Further considering its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
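
To make the two quantities discussed above concrete, the following is a minimal Python sketch, not the pValid implementation. It assumes the common target-decoy estimate (decoys divided by targets among identifications at or above a score cutoff, with higher scores being better) and assumes a convention in which a "positive" is an identification that passes validation, so that FPR is the fraction of incorrect identifications that pass and FNR is the fraction of correct identifications that are rejected. The abstract does not spell out these conventions, and the function names and input layouts are hypothetical.

```python
from typing import List, Optional, Tuple


def target_decoy_threshold(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms holds (score, is_decoy) pairs; higher score means a better match
    (an assumption of this sketch). The FDR at a cutoff is estimated as
    decoys / targets among all identifications scoring >= the cutoff.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate a cutoff only after all identifications tied at this score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


def validation_error_rates(
    is_correct: List[bool], passed: List[bool]
) -> Tuple[float, float]:
    """Return (FPR, FNR) of a validation method on labeled identifications.

    Assumed convention: FPR = incorrect identifications that pass / all
    incorrect; FNR = correct identifications that fail / all correct.
    """
    pairs = list(zip(is_correct, passed))
    n_incorrect = sum(1 for c, _ in pairs if not c)
    n_correct = sum(1 for c, _ in pairs if c)
    false_pos = sum(1 for c, p in pairs if not c and p)
    false_neg = sum(1 for c, p in pairs if c and not p)
    fpr = false_pos / n_incorrect if n_incorrect else 0.0
    fnr = false_neg / n_correct if n_correct else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(target_decoy_threshold(toy, max_fdr=0.1))
    print(validation_error_rates([True, True, False, False],
                                 [True, False, True, False]))
```

Under this sketch, identifications at or above the returned cutoff are the ones "within the threshold"; a validation method is then judged by how few incorrect ones it lets through and how few correct ones it rejects.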

Keywords: false-negative rate; false-positive rate; tandem mass spectrometry; target-decoy approach; validation methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Peptides / analysis*
  • Proteome / analysis
  • Proteomics / methods*
  • Reproducibility of Results
  • Scientific Experimental Error
  • Sensitivity and Specificity
  • Validation Studies as Topic*

Substances

  • Peptides
  • Proteome