The application of rule-based methods to class prediction problems in genomics

J Comput Biol. 2003;10(5):689-98. doi: 10.1089/106652703322539033.

Abstract

We propose a method for constructing classifiers using logical combinations of elementary rules. The method is a form of rule-based classification, which has been widely discussed in the literature. In this work we focus specifically on issues that arise in the context of classifying cell samples based on RNA or protein expression measurements. The basic idea is to specify elementary rules that exhibit a locally strong pattern in favor of a single class. Strict admissibility criteria are imposed to produce a manageable universe of elementary rules. Then the elementary rules are combined using a set covering algorithm to form a composite rule that achieves a perfect fit to the training data. The user has explicit control over a parameter that determines the composite rule's level of redundancy and parsimony. This built-in control, along with the simplicity of interpreting the rules, makes the method particularly useful for classification problems in genomics. We demonstrate the new method using several microarray datasets and examine its generalization performance. We also draw comparisons to other machine-learning strategies such as CART, ID3, and C4.5.
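The set-covering step can be illustrated with a small sketch. The abstract does not give the paper's actual algorithm, so what follows is only a generic greedy set multicover heuristic under stated assumptions: each elementary rule is represented by the set of training samples it covers, all rules shown favor the same class, and an integer `redundancy` stands in for the user-controlled parameter. The function name, the rule names, and the data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_multicover(rule_cover: Dict[str, Set[int]],
                      samples: Set[int],
                      redundancy: int = 1) -> List[str]:
    """Greedily pick rules until every sample is covered `redundancy` times.

    rule_cover maps a (hypothetical) elementary rule name to the indices of
    the training samples it covers. Raises ValueError if the requested
    coverage cannot be reached with the available rules.
    """
    need = {s: redundancy for s in samples}  # coverage still required per sample
    remaining = dict(rule_cover)             # rules not yet selected
    chosen: List[str] = []

    def gain(name: str) -> int:
        # Number of still-deficient samples this rule would help cover.
        return sum(1 for s in remaining[name] if need.get(s, 0) > 0)

    while any(n > 0 for n in need.values()):
        if not remaining:
            raise ValueError("cannot reach the requested redundancy")
        # Sorting the names makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            raise ValueError("cannot reach the requested redundancy")
        for s in remaining.pop(best):
            if need.get(s, 0) > 0:
                need[s] -= 1
        chosen.append(best)
    return chosen


# Hypothetical elementary rules for one class, with the samples each covers.
rules = {
    "geneA>2.0": {0, 1, 2},
    "geneB<0.5": {2, 3},
    "geneC>1.1": {0, 3},
    "geneD<3.0": {1, 3},
}
samples = {0, 1, 2, 3}

parsimonious = greedy_multicover(rules, samples, redundancy=1)
redundant = greedy_multicover(rules, samples, redundancy=2)
```

In this toy example, a redundancy of 1 yields a two-rule composite, while a redundancy of 2 requires all four rules. This mirrors the trade-off between parsimony and redundancy described above, though the paper's own criteria for admissibility and rule selection may differ.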

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Breast Neoplasms / genetics
  • Classification*
  • Computational Biology / methods
  • Female
  • Gene Expression Regulation
  • Genomics*
  • Humans
  • Lymphoma / classification
  • Lymphoma / genetics
  • Models, Genetic
  • Proteins / genetics
  • RNA / genetics

Substances

  • Proteins
  • RNA