Statistical Models for the Analysis of Isobaric Tags Multiplexed Quantitative Proteomics

J Proteome Res. 2017 Sep 1;16(9):3124-3136. doi: 10.1021/acs.jproteome.6b01050. Epub 2017 Aug 18.

Abstract

Mass spectrometry is being used to identify protein biomarkers that can facilitate the development of drug treatments. Mass spectrometry-based labeling proteomic experiments produce complex, hierarchical data, often from studies with small sample sizes. The generalized linear model (GLM) is the most popular approach in proteomics for comparing protein abundances between groups. However, the GLM does not address all the complexities of proteomics data, such as repeated measures and variance heterogeneity. Linear models for microarray data (LIMMA) and mixed models are two approaches that can address some of these complexities and provide better statistical estimates. We compared these three statistical models (GLM, LIMMA, and mixed models) under two normalization approaches (quantile normalization and median sweeping) to determine when each approach performs best for tagged proteins. We evaluated the methods using a spiked-in data set of known protein abundances, a systemic lupus erythematosus (SLE) data set, and simulated data from multiplexed labeling experiments that use tandem mass tags (TMT). Data are available via ProteomeXchange with identifier PXD005486. We found median sweeping to be the preferred approach to data normalization; with this normalization, findings overlapped across all methods, with the GLM findings being a subset of the mixed-model findings. We conclude that the mixed model had the best type I error with median sweeping, whereas LIMMA had the better overall statistical properties regardless of normalization approach.
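The abstract names median sweeping but does not spell out its steps. The sketch below shows one common formulation, assumed here rather than taken from the paper: log2-transform the reporter ion intensities, subtract each spectrum's median across channels, summarize each protein as the per-channel median of its spectra, and then subtract each channel's median across proteins. The function name `median_sweep`, the argument names, and the toy values are hypothetical.

```python
import numpy as np


def median_sweep(intensities, protein_ids):
    """Median-sweeping sketch (assumed formulation, not the paper's code).

    intensities: (n_spectra, n_channels) positive reporter ion intensities.
    protein_ids: length n_spectra sequence mapping each spectrum to a protein.
    Returns (proteins, abundances); abundances is (n_proteins, n_channels).
    """
    log_int = np.log2(np.asarray(intensities, dtype=float))

    # Sweep 1: remove each spectrum's median across channels.
    centered = log_int - np.median(log_int, axis=1, keepdims=True)

    # Summarize: per-protein, per-channel median over that protein's spectra.
    protein_ids = np.asarray(protein_ids)
    proteins = np.unique(protein_ids)
    summary = np.vstack(
        [np.median(centered[protein_ids == p], axis=0) for p in proteins]
    )

    # Sweep 2: remove each channel's median across proteins.
    abundances = summary - np.median(summary, axis=0, keepdims=True)
    return proteins, abundances


# Toy input: 5 spectra, 3 channels, 3 proteins (values are made up).
toy_intensities = [
    [1000.0, 1200.0, 900.0],
    [2000.0, 2500.0, 1800.0],
    [500.0, 450.0, 700.0],
    [800.0, 820.0, 790.0],
    [300.0, 600.0, 310.0],
]
toy_proteins = ["P1", "P1", "P2", "P3", "P3"]
proteins, abundances = median_sweep(toy_intensities, toy_proteins)
```

The resulting protein-by-channel matrix of relative log2 abundances is the kind of input that a GLM, LIMMA, or mixed model would then be fit to; the first sweep makes the result insensitive to the overall intensity of each spectrum.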

Keywords: TMT; biomarkers; mixed models; proteomics; statistical models.

MeSH terms

  • Blood Proteins / chemistry
  • Blood Proteins / isolation & purification*
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Escherichia coli Proteins / chemistry
  • Escherichia coli Proteins / isolation & purification*
  • Humans
  • Lupus Erythematosus, Systemic / blood
  • Lupus Erythematosus, Systemic / diagnosis
  • Lupus Erythematosus, Systemic / genetics*
  • Lupus Erythematosus, Systemic / pathology
  • Models, Statistical*
  • Protein Array Analysis / statistics & numerical data*
  • Proteomics / methods
  • Proteomics / statistics & numerical data
  • Staining and Labeling / methods

Substances

  • Blood Proteins
  • Escherichia coli Proteins