Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours?

Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:3573-6. doi: 10.1109/IEMBS.2011.6090596.

Abstract

Gene expression and genome-wide association data have given researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models for phenotypic classification in this area, substantial dimensionality reduction through stringent filtering and/or feature selection is often deemed imperative. This work challenges that presumption through both theoretical and empirical analysis. It demonstrates that, with a proper compromise between the structure of the selected model and the number of features, better performance can be achieved even at high dimensionality. Including many genes/variants in the classification rules can help shed new light on the analysis of complex traits, which are typically determined by many causal variants with small effect sizes.
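
The trade-off between model structure and number of features can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes a hypothetical setting in which 200 independent Gaussian features each carry the same small class shift (0.15 standard deviations), and it swaps in a deliberately simple rule, a nearest-centroid classifier, as the "structure". The sample sizes, the effect size, and all function names are illustrative assumptions.

```python
import random


def simulate(n_per_class, n_features, effect, rng):
    """Two Gaussian classes; class 1 is shifted by `effect` on every feature (assumed setup)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * effect, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def class_means(X, y, label):
    """Per-feature mean of the rows that belong to `label`."""
    rows = [x for x, lab in zip(X, y) if lab == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y, k=None):
    """Estimate both centroids; keep the k features with the largest mean gap (all if k is None)."""
    m0, m1 = class_means(X, y, 0), class_means(X, y, 1)
    gap = [abs(b - a) for a, b in zip(m0, m1)]
    idx = sorted(range(len(gap)), key=lambda j: gap[j], reverse=True)
    if k is not None:
        idx = idx[:k]
    return idx, m0, m1


def accuracy(model, X, y):
    """Fraction of samples assigned to the nearer centroid of their true class."""
    idx, m0, m1 = model
    correct = 0
    for x, lab in zip(X, y):
        # Linear score equivalent to comparing Euclidean distances to the two
        # centroids: positive means the sample is closer to the class-1 centroid.
        score = sum((m1[j] - m0[j]) * (x[j] - (m0[j] + m1[j]) / 2) for j in idx)
        correct += int((score > 0) == (lab == 1))
    return correct / len(y)


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = simulate(100, 200, 0.15, rng)
X_test, y_test = simulate(400, 200, 0.15, rng)

acc_few = accuracy(fit_nearest_centroid(X_train, y_train, k=5), X_test, y_test)
acc_all = accuracy(fit_nearest_centroid(X_train, y_train), X_test, y_test)
print(f"top 5 features:   {acc_few:.3f}")
print(f"all 200 features: {acc_all:.3f}")
```

In this assumed setting, stringent filtering discards most of the signal because no single feature is informative on its own, whereas the simple rule applied to all features accumulates many small effects. The picture could differ with correlated features, sparse effects, or a more flexible model, so this is a toy illustration and not evidence for the abstract's claim.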

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Behavior*
  • Disease*
  • Humans
  • Models, Theoretical*