Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector μ of a p-dimensional random vector X is μ0 . Proteins within the same biological pathway may correlate with one another in a complicated way, and type I error rates can be inflated if such correlations are incorrectly assumed to be absent. The inflation tends to be more pronounced when the sample size is very small or there is a large amount of missingness in the data, as is frequently the case in proteomic discovery studies. To tackle these challenges, we propose a regularized Hotelling's T2 (RHT) statistic together with a non-parametric testing procedure, which effectively controls the type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the RHT statistic under pertinent assumptions and compare the test performance with four existing methods through simulation examples. We apply the RHT test to a hormone therapy proteomics data set, and identify several interesting biological pathways for which blood serum concentrations changed following hormone therapy initiation.
Keywords: Hotelling’s T2; pathway analysis; proteomics; regularization.