Nested, non-parametric, correlative analysis of microarrays for heterogenous phenotype characterization

Stat Med. 2007 Feb 28;26(5):1090-101. doi: 10.1002/sim.2596.

Abstract

We present a non-parametric approach for qualitatively selecting candidate genes to characterize several criteria that are nested among genes selected on the basis of their individual, similar effects upon an array-wide closeness measure. In this setting, a goal is to obtain a reliable characterization of phenotypes, based on very high-dimensional data from a few samples. As opposed to a distance-based approach, the proposed measure defines closeness based on gene signal profiles (functionals) rather than on isolated (numerical) differences in each gene between samples. By using such a measure to characterize intensity differences, we effectively separate biological from artifactual variation in expression, due to tissue effects or signal calibration. Based on this measure, we successively examine the significance of the following: a set of similarly behaved genes relative to all arrayed genes, a set of candidate genes relative to similarly behaved genes, individual candidate genes relative to non-candidates, and the direction, as over- or under-expressed, of candidate genes. In each setting, sample pairs are the units of analysis, with U-statistics the theoretical framework. We illustrate the method on a microarray experiment, where the goal is to select sets of genes that characterize a type of skin cancer and its histological subtypes.
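The abstract does not spell out the closeness measure or the test statistics, so what follows is only a minimal sketch under stated assumptions. It illustrates two ideas from the abstract: sample pairs as the units of analysis, and a U-statistic that averages a symmetric kernel over all pairs. The kernel here is a Spearman rank correlation between two arrays' gene profiles. That choice is our assumption and stands in for the paper's profile-based closeness measure; it is not the authors' definition. The function names `average_ranks`, `profile_closeness`, and `pairwise_u_statistic` are hypothetical. Rank correlation does not change when one array is rescaled by any strictly increasing function. This suggests how a profile-based measure can discount calibration artifacts that would distort isolated per-gene numerical differences.

```python
import math
from itertools import combinations
from typing import Callable, Optional, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def profile_closeness(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical kernel (an assumption, not the paper's measure):
    Spearman rank correlation between two samples' gene profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / math.sqrt(vx * vy)


def pairwise_u_statistic(
    samples: Sequence[Sequence[float]],
    genes: Optional[Sequence[int]] = None,
    kernel: Callable[[Sequence[float], Sequence[float]], float] = profile_closeness,
) -> float:
    """Average a symmetric kernel over all unordered sample pairs (degree-2 U-statistic)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    idx = range(len(samples[0])) if genes is None else genes
    # Restrict every sample to the same gene set, in the same order.
    profiles = [[s[g] for g in idx] for s in samples]
    values = [kernel(a, b) for a, b in combinations(profiles, 2)]
    return sum(values) / len(values)


# Toy data, made up for illustration: 3 samples x 5 genes.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 4.0, 9.0, 16.0, 25.0],  # sample 0 after a monotone "recalibration" (squaring)
    [5.0, 4.0, 3.0, 2.0, 1.0],    # reversed profile
]
print(profile_closeness(samples[0], samples[1]))  # 1.0
print(pairwise_u_statistic(samples))              # -0.3333333333333333
```

You can compare `pairwise_u_statistic(samples, genes=candidate_set)` with the value computed over all arrayed genes. This loosely mirrors the first nested comparison in the abstract, where a gene set is judged relative to all arrayed genes. The paper's significance testing is not reproduced here.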

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Heterogeneity*
  • Humans
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Phenotype*
  • Skin Neoplasms / classification
  • Skin Neoplasms / genetics
  • Statistics, Nonparametric*
  • United States