Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry

Bioinformatics. 2013 Apr 1;29(7):821-9. doi: 10.1093/bioinformatics/btt056. Epub 2013 Feb 17.

Abstract

Motivation: Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale, high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends the recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainty in the number and identity of PTMs in the input data. This modeling approach enables iPTMClust to provide a confidence score for each modification site, which can be used to fine-tune and interpret the resulting PTM predictions.
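
As a rough illustration of what "non-parametric" means for uncertainty in the number of PTMs, the sketch below samples cluster labels from a Chinese restaurant process (CRP), a standard prior in which the number of clusters is not fixed in advance. The abstract does not state which prior iPTMClust uses, so the CRP, the function name `crp_assignments`, and the concentration parameter `alpha` are assumptions made for illustration only and are not the authors' implementation.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Hypothetical illustration: each item stands in for one PTM prediction
    and each cluster for one candidate modification type. alpha > 0 is an
    assumed concentration parameter; larger values favor more clusters.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

The number of occupied clusters is an outcome of the sampling rather than an input, which is the property that distinguishes this kind of prior from methods such as k-means, where the number of clusters must be chosen beforehand.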

Results: The primary goal of iPTMClust is to improve the quality of PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide study and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study.

Availability: Our algorithm is implemented in Java and is freely available for academic use from http://genes.toronto.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Cluster Analysis
  • Fungal Proteins / metabolism
  • Humans
  • Phosphopeptides / chemistry
  • Protein Interaction Mapping
  • Protein Processing, Post-Translational*
  • Proteome / metabolism
  • Proteomics / methods
  • Statistics, Nonparametric
  • Tandem Mass Spectrometry*

Substances

  • Fungal Proteins
  • Phosphopeptides
  • Proteome