A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data

Biostatistics. 2007 Oct;8(4):744-55. doi: 10.1093/biostatistics/kxm002. Epub 2007 Jan 22.

Abstract

Advances in experimental technologies have made it feasible to collect measurements on a large number of variables. When these variables are simultaneously screened by a statistical test, it is necessary to adjust for multiple hypothesis testing. The false discovery rate has been proposed and widely used to address this issue. A related problem is the estimation of the proportion of true null hypotheses. A long-standing difficulty with this problem is the identifiability of the nonparametric model. In this study, we propose a moment-based method coupled with sample splitting for estimating this proportion. If the p values from the alternative hypothesis are homogeneously distributed, then the proposed method resolves the identifiability problem and achieves its best performance. When the p values from the alternative hypothesis are heterogeneously distributed, we propose to approximate this mixture distribution so that identifiability can be achieved. Theoretical aspects of the approximation error are discussed. The proposed estimation method is completely nonparametric and simple, with an explicit formula. Simulation studies show that the proposed method performs favorably compared with existing methods. The method is applied to two microarray gene expression data sets.
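The abstract does not state the explicit formula, so the sketch below does not reproduce the proposed estimator or its sample-splitting step. As a point of reference only, it illustrates the general idea of estimating the proportion of true nulls from p values with two well-known simple estimators swapped in for illustration: a first-moment estimate, min(1, 2 x mean(p)), and a threshold-based estimate of the Storey type. Both rely on null p values being uniform on (0, 1). All function names, the Beta(0.2, 1) alternative, and the simulation settings are assumptions made for this example. Because the alternative distribution is left unspecified, both estimators are typically biased upward, which is one face of the identifiability problem the abstract refers to.

```python
import numpy as np


def pi0_first_moment(p):
    """First-moment estimate of the null proportion.

    Null p values are Uniform(0, 1) with mean 1/2, so if alternative
    p values are concentrated near 0, mean(p) is roughly pi0 / 2.
    This is NOT the estimator proposed in the paper.
    """
    p = np.asarray(p, dtype=float)
    return float(min(1.0, 2.0 * p.mean()))


def pi0_threshold(p, lam=0.5):
    """Threshold-based (Storey-type) estimate of the null proportion.

    The fraction of p values above `lam` is divided by (1 - lam),
    the probability that a null p value exceeds `lam`.
    """
    p = np.asarray(p, dtype=float)
    return float(min(1.0, np.mean(p > lam) / (1.0 - lam)))


def simulate_p_values(m=10000, pi0=0.8, a=0.2, b=1.0, seed=0):
    """Simulate a two-component mixture of p values (hypothetical setup).

    Nulls are Uniform(0, 1); alternatives are Beta(a, b), which puts
    most of its mass near 0 when a < 1.
    """
    rng = np.random.default_rng(seed)
    m0 = int(round(pi0 * m))
    null_p = rng.uniform(size=m0)
    alt_p = rng.beta(a, b, size=m - m0)
    return np.concatenate([null_p, alt_p])


p_values = simulate_p_values()
est_moment = pi0_first_moment(p_values)
est_threshold = pi0_threshold(p_values)
print("first-moment estimate:", round(est_moment, 3))
print("threshold estimate:   ", round(est_threshold, 3))
```

In this simulated setting the true proportion is 0.8, and both estimates should land somewhat above it because alternative p values still contribute mass away from 0.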

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biometry
  • Confidence Intervals
  • Data Interpretation, Statistical
  • Gene Expression Profiling / statistics & numerical data*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*