We developed a normalization method that uses the expression levels of a panel of endogenous proteins as normalization standards (hereafter EPNS). We tested the validity of the method using two sets of tandem mass tag (TMT)-labeled data and found that it effectively reduced global intensity bias at the protein level. The coefficient of variation (CV) of the overall median was reduced by 55% and 82% on average, compared with reductions of 72% and 86% after upper-quartile normalization. Furthermore, we used differential protein expression analysis and statistical learning to identify biomarkers for colorectal cancer in a CPTAC data set. The expression changes of a panel of proteins, including NUP205, GTPBP4, CNN2, GNL3, and S100A11, correlated highly with colorectal cancer. Using these five proteins as model features, a random forest model achieved a maximum area under the curve (AUC) of 0.9998 with EPNS-normalized data, compared with 0.9739 with the raw data. Thus, the EPNS-based normalization method reduced global intensity bias and is applicable to quantitative proteomic analysis.
Keywords: biomarker; endogenous protein normalization; global bias; quantitative proteomics.
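To make the CV-of-median metric concrete, the following is a minimal sketch on simulated data, not the published EPNS procedure. The panel choice (`panel_rows`), the scaling rule (dividing each sample by its panel median relative to the cross-sample mean), and all function names are illustrative assumptions; the abstract does not specify how the endogenous panel is selected or how the scaling factors are computed.

```python
import numpy as np

def cv_of_sample_medians(intensity):
    """CV (sample std / mean) of the per-sample median intensity."""
    medians = np.median(intensity, axis=0)
    return medians.std(ddof=1) / medians.mean()

def normalize_by_reference_panel(intensity, panel_rows):
    """Scale each sample (column) by a factor derived from a reference panel.

    Assumption: the factor is the panel's median intensity in that sample
    divided by the mean of those panel medians across samples.
    """
    panel_level = np.median(intensity[panel_rows, :], axis=0)
    factors = panel_level / panel_level.mean()
    return intensity / factors

# Simulated proteins-by-samples matrix with a sample-wide (global) bias.
rng = np.random.default_rng(0)
n_proteins, n_samples = 500, 10
true_abundance = rng.lognormal(mean=10.0, sigma=1.0, size=(n_proteins, 1))
sample_bias = rng.uniform(0.5, 2.0, size=(1, n_samples))
noise = rng.lognormal(mean=0.0, sigma=0.05, size=(n_proteins, n_samples))
raw = true_abundance * sample_bias * noise

# Hypothetical panel: the first 20 proteins, assumed stably expressed.
panel_rows = np.arange(20)
normalized = normalize_by_reference_panel(raw, panel_rows)

cv_raw = cv_of_sample_medians(raw)
cv_norm = cv_of_sample_medians(normalized)
print(f"CV of sample medians: raw={cv_raw:.3f}, normalized={cv_norm:.3f}")
print(f"Relative reduction: {100 * (1 - cv_norm / cv_raw):.1f}%")
```

In this simulation the panel proteins share the same global bias as all other proteins, so dividing by the panel-derived factor removes most of the between-sample spread in the medians. Real data would need a panel whose members are verified to be stably expressed across conditions.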