We present a non-parametric approach for qualitatively selecting candidate genes according to several nested criteria, applied within the genes selected for their similar individual effects on an array-wide closeness measure. The goal in this setting is a reliable characterization of phenotypes from very high-dimensional data measured on few samples. Unlike distance-based approaches, the proposed measure defines closeness in terms of gene signal profiles (functionals) rather than isolated numerical differences for each gene between samples. By using this measure to characterize intensity differences, we separate biological variation in expression from artifactual variation due to tissue effects or signal calibration. Based on this measure, we successively examine the significance of: (i) a set of similarly behaving genes relative to all arrayed genes; (ii) a set of candidate genes relative to the similarly behaving genes; (iii) individual candidate genes relative to non-candidates; and (iv) the direction of each candidate gene, as over- or under-expressed. In each setting, pairs of samples are the units of analysis, and U-statistics provide the theoretical framework. We illustrate the method on a microarray experiment whose goal is to select sets of genes that characterize a type of skin cancer and its histological subtypes.
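As a hedged illustration of treating sample pairs as the units of analysis, the sketch below computes an order-2 U-statistic: the average of a kernel over all unordered pairs of samples. The kernel `profile_closeness` (Pearson correlation between two arrays' profiles over a common gene set) is our assumption, chosen only because it depends on the shape of a profile rather than on gene-by-gene differences; it is not necessarily the closeness measure used in this work. All function names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def profile_closeness(x, y):
    """Hypothetical profile-based kernel: Pearson correlation between two
    samples' expression profiles over the same genes. It is unchanged when
    either profile is rescaled by a positive constant and shifted, which
    mimics an array-wide calibration difference."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def euclidean_distance(x, y):
    """Distance built from isolated gene-by-gene differences, for contrast."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_u_statistic(samples, kernel):
    """Order-2 U-statistic: mean of kernel(x_i, x_j) over all pairs i < j."""
    n = len(samples)
    total = sum(kernel(a, b) for a, b in combinations(samples, 2))
    return total / comb(n, 2)


if __name__ == "__main__":
    base = [1.0, 3.0, 2.0, 5.0, 4.0]
    # The same biological profile read on a differently calibrated array.
    recalibrated = [2 * v + 5 for v in base]
    print(profile_closeness(base, recalibrated))   # 1.0
    print(euclidean_distance(base, recalibrated))  # nonzero: calibration shows up
    print(pairwise_u_statistic([base, recalibrated], profile_closeness))  # 1.0
```

Because the correlation kernel ignores a positive rescaling and shift of either array, the recalibrated profile stays maximally close to the original, whereas the Euclidean distance reports a large difference. The nested significance tests listed above are not reproduced in this sketch.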
Copyright (c) 2006 John Wiley & Sons, Ltd.