Evaluation of a statistical equivalence test applied to microarray data

J Biopharm Stat. 2010 Mar;20(2):240-66. doi: 10.1080/10543400903572738.

Abstract

Microarray technology is commonly used to identify differentially expressed (DE) genes across conditions. A related problem, rarely discussed but equally important, is the identification of commonly or constantly expressed genes across different organs, tissues, or species. A common practice in the literature for such studies is to apply a differential expression analysis and conclude that a gene is unchanged if there is no statistical evidence of differential expression. However, genes that are not statistically significantly DE could be (1) truly non-DE genes or (2) truly DE genes that the test of differential expression failed to detect because of a lack of power resulting from a high noise level or a lack of replication. Therefore, treating genes that are not statistically significantly DE as non-DE genes risks including genes that are truly DE, with no control over such errors. We argue that if one wants to identify genes that are truly non-DE, one needs to show statistical evidence through valid statistical tests with appropriate type I error rate control. In this paper, we consider the identification of non-DE genes through statistical equivalence tests under the framework of multiple testing. In particular, we consider the average equivalence criterion and study the power and false discovery rate (FDR) of the standard average equivalence test, the "two one-sided tests" (TOST) procedure, through extensive simulation studies based on real microarray data sets. We study the effects of various factors that can affect the power and FDR of the equivalence test, including the proportion of non-DE genes. We also compare the ROC curves of the equivalence test with those of the naive method of selecting genes that are not statistically significantly DE.
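
As a rough illustration of the approach the abstract describes, the sketch below applies a per-gene TOST to two groups of log-scale expression values and then adjusts the resulting p-values for multiple testing. It is not the authors' implementation. Several choices are assumptions made here for illustration only: the function names `tost_pvalues` and `benjamini_hochberg`, the pooled-variance two-sample t statistic, the equivalence margin `delta = 1.0`, the Benjamini-Hochberg adjustment as the FDR procedure, and the simulated data.

```python
import numpy as np
from scipy import stats


def tost_pvalues(x, y, delta):
    """Per-gene TOST p-values for average equivalence (illustrative sketch).

    x, y  : arrays of shape (genes, replicates), log-scale expression.
    delta : equivalence margin on the log scale (an assumed, user-chosen value).
    Assumes equal variances in the two groups (pooled-variance t statistic).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]
    df = n1 + n2 - 2
    diff = x.mean(axis=1) - y.mean(axis=1)
    pooled_var = ((n1 - 1) * x.var(axis=1, ddof=1)
                  + (n2 - 1) * y.var(axis=1, ddof=1)) / df
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    # One-sided test 1: H0 is diff <= -delta, rejected for large t.
    p_lower = stats.t.sf((diff + delta) / se, df)
    # One-sided test 2: H0 is diff >= +delta, rejected for small t.
    p_upper = stats.t.cdf((diff - delta) / se, df)
    # Equivalence is declared only if both one-sided tests reject,
    # so the TOST p-value is the larger of the two.
    return np.maximum(p_lower, p_upper)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


# Simulated example: 1000 genes, 6 replicates per group; the first 200
# genes are truly DE with a shift of 2 on the log scale (made-up values).
rng = np.random.default_rng(0)
x = rng.normal(8.0, 0.2, size=(1000, 6))
y = rng.normal(8.0, 0.2, size=(1000, 6))
y[:200] += 2.0

q = benjamini_hochberg(tost_pvalues(x, y, delta=1.0))
print("genes declared equivalent at FDR 0.05:", int((q <= 0.05).sum()))
```

Here the null hypothesis for each gene is non-equivalence, so a small adjusted p-value is positive evidence that the mean difference lies within the margin. This is the opposite of the naive method, which treats a failure to reject "no difference" as evidence of no change.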

Publication types

  • Evaluation Study

MeSH terms

  • Animals
  • Computer Simulation
  • Data Interpretation, Statistical
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Regulation
  • Humans
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • ROC Curve
  • Reproducibility of Results
  • Sample Size