Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation

Biostatistics. 2009 Jan;10(1):60-9. doi: 10.1093/biostatistics/kxn015. Epub 2008 Jun 6.

Abstract

In most analyses of large-scale genomic data sets, differential expression is assessed by testing for differences in the mean of the distributions between 2 groups. Tomlins and others (2005) recently reported a different pattern of differential expression, in which only a fraction of samples in one group show overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for assessing this type of expression, called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.
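
The outlier pattern described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper. It only assumes, for illustration, that a hypothetical fraction of case samples is shifted upward while the rest follow the control distribution, and it compares a plain mean difference with the share of case samples exceeding a high control quantile. All function names, parameter values, and the Gaussian choice are assumptions made for this example.

```python
import random
import statistics


def simulate_outlier_gene(n_control=50, n_case=50, outlier_fraction=0.2,
                          shift=4.0, seed=0):
    """Simulate one gene under an assumed two-component mixture for cases.

    Controls are drawn from N(0, 1). Each case sample is drawn from
    N(shift, 1) with probability `outlier_fraction`, else from N(0, 1).
    These distributions are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
    case = [
        rng.gauss(shift, 1.0) if rng.random() < outlier_fraction
        else rng.gauss(0.0, 1.0)
        for _ in range(n_case)
    ]
    return control, case


def fraction_above_control_quantile(control, case, q=0.95):
    """Share of case samples above an empirical control quantile.

    The threshold is the order statistic at index int(q * n) of the
    sorted controls (capped at the largest control value).
    """
    ordered = sorted(control)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    threshold = ordered[idx]
    return sum(x > threshold for x in case) / len(case)


def mean_difference(control, case):
    """Difference in group means, the quantity a mean-based test targets."""
    return statistics.mean(case) - statistics.mean(control)


if __name__ == "__main__":
    control, case = simulate_outlier_gene()
    print("mean difference:", mean_difference(control, case))
    print("fraction above control 95% quantile:",
          fraction_above_control_quantile(control, case))
```

When only a small share of case samples is shifted, the mean difference is diluted by the unshifted majority, whereas the exceedance fraction responds directly to the shifted subgroup. This is the intuition behind treating the case distribution as a mixture.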

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biometry / methods
  • Confidence Intervals
  • False Positive Reactions
  • Gene Expression Profiling / methods*
  • Humans
  • Male
  • Models, Statistical*
  • Multivariate Analysis
  • Oligonucleotide Array Sequence Analysis / methods
  • Prostatic Neoplasms / genetics
  • Reproducibility of Results
  • Sample Size
  • Statistics, Nonparametric*
  • Uncertainty