An investigation of two multivariate permutation methods for controlling the false discovery proportion

Stat Med. 2007 Oct 30;26(24):4428-40. doi: 10.1002/sim.2865.

Abstract

Identifying genes that are differentially expressed between classes of samples is an important objective of many microarray experiments. Because thousands of genes are typically considered, there is a tension between identifying as many of the truly differentially expressed genes as possible and limiting the number of identified genes that are not really differentially expressed (false discoveries). Controlling the false discovery proportion (FDP), the proportion of identified genes that are false discoveries, is therefore a goal of interest. In this paper, two multivariate permutation methods for controlling the FDP are investigated. One is based on a multivariate permutation testing (MPT) method that probabilistically controls the number of false discoveries, and the other is based on the Significance Analysis of Microarrays (SAM) procedure, which provides an estimate of the FDP. Both methods account for the correlations among the genes. We find that the ability of the methods to control the proportion of false discoveries varies substantially depending on implementation characteristics. For example, for both methods one can proceed from the most significant gene to the least significant gene until the estimated FDP is just above the targeted level ('top-down' approach), or from the least significant gene to the most significant gene until the estimated FDP is just below the targeted level ('bottom-up' approach). We find that the top-down MPT-based method probabilistically controls the FDP, whereas our implementation of the top-down SAM-based method does not. Bottom-up MPT-based or SAM-based methods can result in poor control of the FDP.
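
The difference between the 'top-down' and 'bottom-up' stopping rules can be illustrated with a minimal sketch. It assumes a sequence of estimated FDP values, one per candidate gene-list size, ordered from the most significant gene to the least significant. The function names, the exact stopping convention (whether the gene at which the threshold is crossed is kept), and the numbers are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Sequence


def top_down_cutoff(est_fdp: Sequence[float], target: float) -> int:
    """Walk from the most significant gene downward and stop at the first
    list size whose estimated FDP exceeds the target.
    Returns the number of genes kept (assumed convention: the gene that
    triggers the exceedance is not kept)."""
    kept = 0
    for value in est_fdp:
        if value > target:
            break
        kept += 1
    return kept


def bottom_up_cutoff(est_fdp: Sequence[float], target: float) -> int:
    """Walk from the least significant gene upward and stop at the first
    list size whose estimated FDP is at or below the target.
    Returns the number of genes kept."""
    for size in range(len(est_fdp), 0, -1):
        if est_fdp[size - 1] <= target:
            return size
    return 0


# Hypothetical estimated FDP for gene lists of size 1..6 (not real data).
# The estimates are not monotone in list size, which is what makes the
# two rules disagree.
est_fdp = [0.00, 0.05, 0.15, 0.12, 0.08, 0.20]
target = 0.10

print(top_down_cutoff(est_fdp, target))   # 2
print(bottom_up_cutoff(est_fdp, target))  # 5
```

In this toy input the bottom-up rule keeps a longer gene list because it passes over an earlier point where the estimated FDP already exceeded the target. This shows only that the two rules can select different lists; it does not reproduce the paper's findings about FDP control.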

Publication types

  • Comparative Study

MeSH terms

  • Breast Neoplasms / genetics
  • False Positive Reactions
  • Female
  • Gene Expression Profiling / statistics & numerical data*
  • Genes, BRCA1
  • Genes, BRCA2
  • Humans
  • Models, Statistical
  • Multivariate Analysis
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*