Identifying splits with clear separation: a new class discovery method for gene expression data

Bioinformatics. 2001;17 Suppl 1:S107-14. doi: 10.1093/bioinformatics/17.suppl_1.s107.

Abstract

We present a new class discovery method for microarray gene expression data. Based on a collection of gene expression profiles from different tissue samples, the method searches for binary class distinctions in the set of samples that show clear separation in the expression levels of specific subsets of genes. The method can find several mutually independent class distinctions, which most commonly used clustering algorithms cannot easily provide. Each class distinction can be biologically interpreted in terms of its supporting genes. The mathematical characterization of the favored class distinctions is based on statistical concepts. By analyzing three data sets from cancer gene expression studies, we demonstrate that our method is able to detect biologically relevant structures, such as cancer subtypes, in an unsupervised fashion.
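The abstract does not state the exact statistical criterion. The Python sketch below only illustrates the general idea of searching over binary splits of the samples and scoring each split by how cleanly a small subset of genes separates the two classes. The scoring rule is a simple stand-in chosen for illustration and is not the authors' criterion. It takes a per-gene non-overlap gap scaled by that gene's standard deviation and averages it over the `k` best genes. The function names and the parameters `k`, `min_size` and `top` are hypothetical.

```python
from itertools import combinations
from statistics import pstdev


def separation_gap(values, labels):
    """Gap between the two classes for one gene, scaled by the gene's overall
    (population) standard deviation. Positive only when the classes do not
    overlap, i.e. every value of one class lies strictly above every value of
    the other; otherwise 0.0. Hypothetical stand-in criterion."""
    a = [v for v, lab in zip(values, labels) if lab]
    b = [v for v, lab in zip(values, labels) if not lab]
    gap = max(min(a) - max(b), min(b) - max(a))
    spread = pstdev(values)
    if gap <= 0 or spread == 0:
        return 0.0
    return gap / spread


def score_split(expr, labels, k):
    """Score a split by the mean scaled gap of its k best-separating genes
    (the 'supporting genes' in this sketch)."""
    gaps = sorted((separation_gap(row, labels) for row in expr), reverse=True)
    return sum(gaps[:k]) / k


def best_splits(expr, k=2, min_size=2, top=3):
    """Exhaustively enumerate binary splits of the samples (columns of expr,
    a genes x samples list of lists) and return the highest-scoring ones as
    (score, labels) pairs. Does not enforce independence between splits."""
    n_samples = len(expr[0])
    seen = set()
    results = []
    for size in range(min_size, n_samples - min_size + 1):
        for group in combinations(range(n_samples), size):
            labels = tuple(i in group for i in range(n_samples))
            # A split and its complement describe the same class distinction,
            # so keep one canonical orientation.
            key = min(labels, tuple(not x for x in labels))
            if key in seen:
                continue
            seen.add(key)
            results.append((score_split(expr, key, k), key))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]
```

With a toy matrix in which two genes are low in samples 0 to 2 and high in samples 3 to 5, the top-ranked split is that one. Exhaustive enumeration grows roughly as 2^n in the number of samples, so this brute-force search is only practical for a handful of samples. A method for real data sets would presumably need a heuristic or more restricted search.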

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology
  • Databases, Factual
  • Gene Expression
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Leukemia / genetics
  • Melanoma / genetics
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Oncogenes