MSstatsTMT: Statistical Detection of Differentially Abundant Proteins in Experiments with Isobaric Labeling and Multiple Mixtures

Ting Huang; Meena Choi; Manuel Tzouros; Sabrina Golling; Nikhil Janak Pandya; Balazs Banfai; Tom Dunkley; Olga Vitek

doi:10.1074/mcp.RA120.002105

Mol Cell Proteomics. 2020 Oct;19(10):1706-1723. doi: 10.1074/mcp.RA120.002105. Epub 2020 Jul 17.

Authors

Ting Huang¹, Meena Choi¹, Manuel Tzouros², Sabrina Golling², Nikhil Janak Pandya², Balazs Banfai², Tom Dunkley², Olga Vitek³

Affiliations

¹ Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
² Roche Pharma Research and Early Development, Pharmaceutical Sciences-BiOmics and Pathology, Roche Innovation Center Basel, Hoffmann-La Roche Ltd, Basel, Switzerland.
³ Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.

Abstract

Tandem mass tag (TMT) is a multiplexing technology widely used in proteomic research. It enables relative quantification of proteins from multiple biological samples in a single MS run with high efficiency and high throughput. However, experiments often require more biological replicates or conditions than can be accommodated by a single run, and involve multiple TMT mixtures and multiple runs. Such larger-scale experiments combine sources of biological and technical variation in patterns that are complex, unique to TMT-based workflows, and challenging for the downstream statistical analysis. These patterns cannot be adequately characterized by statistical methods designed for other technologies, such as label-free proteomics or transcriptomics. This manuscript proposes a general statistical approach for relative protein quantification in MS-based experiments with TMT labeling. It is applicable to experiments with multiple conditions, multiple biological replicate runs, multiple technical replicate runs, and unbalanced designs. It is based on a flexible family of linear mixed-effects models that handle complex patterns of technical artifacts and missing values. The approach is implemented in MSstatsTMT, a freely available open-source R/Bioconductor package compatible with data processing tools such as Proteome Discoverer, MaxQuant, OpenMS, and SpectroMine. Evaluation on a controlled mixture, simulated datasets, and three biological investigations with diverse designs demonstrated that MSstatsTMT balanced the sensitivity and the specificity of detecting differentially abundant proteins in large-scale experiments with multiple biological mixtures.
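
As a rough illustration of what a linear mixed-effects model for a single protein could look like in a multi-mixture design, the sketch below uses a generic textbook form. It is not taken from the abstract, which does not state the model specification. The symbols, the indices, and the choice of a random mixture effect with a fixed condition effect are assumptions made here for illustration only.

```latex
% Illustrative sketch only: a generic linear mixed-effects model for one protein.
% All symbols are assumptions, not the specification used by MSstatsTMT.
% y_{mcb}: log-scale abundance in mixture m, condition c, biological replicate b
\begin{align*}
  y_{mcb} &= \mu + \mathrm{Mixture}_m + \mathrm{Condition}_c + \varepsilon_{mcb}, \\
  \mathrm{Mixture}_m &\sim \mathcal{N}(0, \sigma^2_{M})
    && \text{(random effect shared by all channels of mixture } m\text{)}, \\
  \varepsilon_{mcb} &\sim \mathcal{N}(0, \sigma^2)
    && \text{(residual biological and technical variation)}.
\end{align*}
% Differential abundance between conditions c and c' is then a test of
% H_0 : \mathrm{Condition}_c - \mathrm{Condition}_{c'} = 0.
```

Under these assumptions, the random mixture term absorbs variation common to all samples measured together in one TMT mixture, so comparisons between conditions are not confounded with mixture-to-mixture differences.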

Keywords: Mass spectrometry; TMT; bioinformatics software; hypothesis testing; mathematical modeling; multiple mixtures; protein quantification; quantification; statistics.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Humans
Isotope Labeling*
Proteome / metabolism*
Proteomics
Statistics as Topic*
Tandem Mass Spectrometry*

Substances

Proteome

Grants and funding

R01 LM013115/LM/NLM NIH HHS/United States