Protein differential expression analysis plays an important role in understanding molecular mechanisms and the pathogenesis of complex diseases. With the rapid development of mass spectrometry, shotgun proteomics using spectral counts has become a prevailing method for the quantitative analysis of complex protein mixtures. Existing methods for differential protein expression analysis typically operate at the single-protein level. However, it is well known that proteins interact with each other when they function in biological processes. In this study, focusing on biological network modules, we proposed a negative binomial generalized linear model for differential expression analysis of spectral count data in shotgun proteomics. To show the efficacy of the model for protein expression analysis at the level of protein modules, we conducted two simulation studies: one using synthetic data sets generated from a theoretical distribution of count data, and one using a real data set with shuffled counts. We then applied our method to a colorectal cancer data set and a non-small cell lung cancer data set. Compared with single-protein analysis methods, the module-based statistical model, which takes into account the interactions among proteins, identified subtle but coordinated changes at the systems level more effectively.
Keywords: biological network module; differential expression analysis; negative binomial model; shotgun proteomics; spectral count.
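
For orientation, a module-level negative binomial generalized linear model for spectral counts can be sketched in the following generic form. This is only an illustrative formulation under stated assumptions, not the exact specification used in this study: the indices, the offset $N_j$, the protein-specific intercepts $\alpha_i$, the shared module effect $\beta$, and the dispersion $\phi$ are all notation introduced here for illustration.

```latex
% Illustrative sketch only; all symbols are assumptions, not taken from the paper.
% i = 1..m indexes proteins in one network module, j = 1..n indexes samples.
% Y_{ij}: spectral count of protein i in sample j
% N_j:    total spectral count of sample j (assumed normalization offset)
% x_j:    group indicator (0 = control, 1 = case)
\begin{align}
  Y_{ij} &\sim \mathrm{NB}(\mu_{ij}, \phi),
  & \operatorname{Var}(Y_{ij}) &= \mu_{ij} + \phi\,\mu_{ij}^{2}, \\
  \log \mu_{ij} &= \log N_j + \alpha_i + \beta\, x_j,
  & H_0 &: \beta = 0 .
\end{align}
```

In this sketch, a single coefficient $\beta$ is shared by all proteins in the module, so testing $H_0$ (for example with a likelihood ratio test) pools evidence across the module. This is one way that small but coordinated changes could become detectable even when no individual protein reaches significance on its own.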