Uncertainty estimation of predictions of peptides' chromatographic retention times in shotgun proteomics

Bioinformatics. 2017 Feb 15;33(4):508-513. doi: 10.1093/bioinformatics/btw619.

Abstract

Motivation: Liquid chromatography is frequently used to reduce the complexity of peptide mixtures in shotgun proteomics. For such systems, the time when a peptide is released from a chromatography column and registered in the mass spectrometer is referred to as the peptide's retention time. Using heuristics or machine learning techniques, previous studies have demonstrated that it is possible to predict the retention time of a peptide from its amino acid sequence. In this paper, we apply Gaussian Process Regression to the feature representation of a previously described predictor, Elude. Using this framework, we demonstrate that it is possible to estimate the uncertainty of the prediction made by the model. Here we show how this uncertainty relates to the actual error of the prediction.
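
For orientation, the textbook predictive equations of Gaussian Process Regression are sketched below. The notation is an assumption made here for illustration and is not taken from the paper: X holds the training peptides' feature vectors, y their observed retention times, k is the kernel function, K = k(X, X) is the training kernel matrix, k_* = k(X, x_*) for a new peptide x_*, and \sigma_n^2 is the noise variance.

```latex
\begin{align}
  \mu_*      &= \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y} \\
  \sigma_*^2 &= k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*
\end{align}
```

In this formulation \mu_* is the predicted retention time and \sigma_* is the predicted standard deviation, that is, the per-peptide uncertainty estimate.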

Results: In our experiments, we observe a strong correlation between the estimated uncertainty provided by Gaussian Process Regression and the actual prediction error. This relation provides us with a new means of assessing the predictions. We demonstrate how to select a subset of the peptides with a lower prediction error than the whole set. We also demonstrate how such predicted standard deviations can be used for designing adaptive windowing strategies.
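
The two uses of the predicted standard deviation described above can be illustrated with a minimal sketch. It assumes that each peptide already has a predicted mean and standard deviation and that the predictive distribution is Gaussian. The function names, the peptide values, and the 95% coverage level are hypothetical choices for illustration, not the procedure or data of the paper.

```python
from statistics import NormalDist


def select_confident(predictions, keep_fraction=0.5):
    """Keep the fraction of peptides with the smallest predicted std."""
    ranked = sorted(predictions, key=lambda p: p[2])
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def adaptive_window(mean, std, coverage=0.95):
    """Return a retention time window whose width scales with the predicted std.

    Assumes a Gaussian predictive distribution, so the window is
    mean +/- z * std, with z taken from the standard normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return (mean - z * std, mean + z * std)


# Hypothetical (peptide, predicted mean RT in minutes, predicted std) triples.
predictions = [
    ("PEPTIDEK", 42.0, 1.5),
    ("LVNELTEFAK", 61.3, 0.8),
    ("AGFAGDDAPR", 18.7, 3.2),
    ("YLYEIAR", 55.1, 0.5),
]

# Subset selection: keep the half of the peptides with the lowest uncertainty.
confident = select_confident(predictions, keep_fraction=0.5)

# Adaptive windowing: uncertain peptides get wide windows, confident ones narrow.
windows = {pep: adaptive_window(mu, sd) for pep, mu, sd in predictions}

for pep, (low, high) in windows.items():
    print(f"{pep}: {low:.2f} to {high:.2f} min")
```

Compared with a single fixed-width window for all peptides, this per-peptide window is narrower wherever the model is confident and wider wherever it is not.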

Contact: [email protected].

Availability and implementation: Our software and the data used in our experiments are publicly available and can be downloaded from https://github.com/statisticalbiotechnology/GPTime.

MeSH terms

  • Amino Acid Sequence
  • Chromatography, Liquid / methods
  • Mass Spectrometry / methods
  • Models, Theoretical*
  • Peptides / chemistry*
  • Proteomics / methods*
  • Software*
  • Uncertainty*

Substances

  • Peptides