Focus on the spectra that matter by clustering of quantification data in shotgun proteomics

Nat Commun. 2020 Jun 26;11(1):3234. doi: 10.1038/s41467-020-17037-3.

Abstract

In shotgun proteomics, the analysis of label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes low sensitivity in differential expression analysis. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow, thereby preventing valuable information from being discarded in the identification stage. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. The resulting data reduction shortens search time, so we can now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Quandenser+Triqler outperforms the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins for all tested datasets. Software is available for all major operating systems at https://github.com/statisticalbiotechnology/quandenser, under the Apache 2.0 license.
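The quantification-first idea can be illustrated with a toy example. This is only a minimal sketch and not Quandenser's actual algorithm: it assumes MS1 features are given as (run, m/z, retention time, intensity) tuples, groups them across runs by single linkage within hypothetical m/z and retention-time tolerances, and summarizes each group as a per-run intensity vector before any identification takes place. The function names, the tolerance values, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_features(features, ppm_tol=10.0, rt_tol=0.5):
    """Single-linkage grouping of (run, mz, rt, intensity) features.

    Two features are linked when their m/z values differ by at most
    ppm_tol (parts per million) and their retention times by at most
    rt_tol (minutes). Both tolerances are illustrative assumptions.
    """
    order = sorted(range(len(features)), key=lambda i: features[i][1])
    parent = list(range(len(features)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(order)):
        i = order[a]
        mz_i, rt_i = features[i][1], features[i][2]
        for b in range(a + 1, len(order)):
            j = order[b]
            mz_j, rt_j = features[j][1], features[j][2]
            if (mz_j - mz_i) / mz_i * 1e6 > ppm_tol:
                break  # sorted by m/z, so no later feature can match
            if abs(rt_j - rt_i) <= rt_tol:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i in range(len(features)):
        clusters[find(i)].append(features[i])
    return list(clusters.values())


def summarize(cluster, runs):
    """Per-run summed intensity for one cluster; 0.0 means not observed."""
    totals = dict.fromkeys(runs, 0.0)
    for run, _mz, _rt, intensity in cluster:
        totals[run] += intensity
    return [totals[r] for r in runs]


# Hypothetical features from two runs: (run, m/z, retention time, intensity).
features = [
    ("runA", 500.2500, 20.1, 1.0e6),
    ("runB", 500.2502, 20.3, 2.1e6),
    ("runA", 742.4100, 35.0, 4.0e5),
    ("runB", 742.4103, 35.2, 3.9e5),
    ("runB", 500.2501, 48.0, 7.0e4),
]
runs = ["runA", "runB"]
clusters = cluster_features(features)
profiles = [summarize(c, runs) for c in clusters]
```

In a workflow of this kind, only clusters whose intensity profiles look interesting would need to be passed on to a (possibly expensive) identification search, which is where the reduction in search effort would come from.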

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Databases, Protein
  • Escherichia coli Proteins / metabolism
  • HeLa Cells
  • Humans
  • Peptides / metabolism
  • Proteasome Endopeptidase Complex / metabolism
  • Proteome / metabolism
  • Proteomics*
  • Saccharomyces cerevisiae / metabolism
  • Software
  • Ubiquitin / metabolism

Substances

  • Escherichia coli Proteins
  • Peptides
  • Proteome
  • Ubiquitin
  • Proteasome Endopeptidase Complex