Gaining Confidence on Molecular Classification through Consensus Modeling and Validation

Toxicol Mech Methods. 2006;16(2-3):59-68. doi: 10.1080/15376520600558259.

Abstract

Current advances in genomics, proteomics, and metabonomics are expected to yield a constellation of benefits for human health. Molecular classification that applies supervised learning methods to omics data is playing a growing role in clinical applications. However, the utility of a molecular classifier cannot be fully appreciated unless its quality is carefully validated. Clinical omics data are usually noisy, with far more independent variables than subjects and, possibly, a skewed subject distribution. Under these conditions, a consensus approach holds an advantage over a single classifier. The focus of this review is therefore on how to validate a molecular classifier using Decision Forest (DF), a robust consensus approach. We recommend that a molecular classifier be assessed with respect to overall prediction accuracy, prediction confidence, and chance correlation, all of which can be readily achieved in DF. The commonalities and differences between external validation and cross-validation are also discussed to guide the prospective use of these methods in validating a DF classifier. In addition, the advantages of using consensus approaches to identify potential biomarkers are rationalized. Although specific DF examples are used in this review, the rationales and recommendations provided should be equally applicable to other consensus methods.
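The three assessment criteria named above (overall prediction accuracy, prediction confidence, and chance correlation) can be illustrated with a minimal Python sketch. It is not the DF implementation: single-feature threshold classifiers stand in for decision trees, the toy data are synthetic, and all function names (`fit_stump`, `fit_consensus`, `confidence`, `loo_accuracy`, `chance_correlation`) are hypothetical. The confidence formula |p - 0.5| / 0.5 and the use of leave-one-out cross-validation with a label-permutation test are assumptions chosen for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Member classifier (stand-in for a tree): threshold one feature
    at the midpoint between the two class means."""
    m0 = mean(row[feature] for row, label in zip(X, y) if label == 0)
    m1 = mean(row[feature] for row, label in zip(X, y) if label == 1)
    threshold = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1
    return lambda row: 1.0 if sign * (row[feature] - threshold) > 0 else 0.0


def fit_consensus(X, y, features):
    """Consensus model: average the votes of members built on distinct features."""
    members = [fit_stump(X, y, f) for f in features]
    return lambda row: mean(m(row) for m in members)


def confidence(p):
    """Assumed confidence measure: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return abs(p - 0.5) / 0.5


def loo_accuracy(X, y, features):
    """Overall prediction accuracy estimated by leave-one-out cross-validation."""
    hits = 0
    for i in range(len(X)):
        model = fit_consensus(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features)
        hits += int((model(X[i]) > 0.5) == (y[i] == 1))
    return hits / len(X)


def chance_correlation(X, y, features, n_perm=200, seed=0):
    """Permutation test: how often do shuffled labels match the observed accuracy?"""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y, features)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(loo_accuracy(X, y_perm, features))
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, p_value


# Synthetic toy data: 20 subjects, 3 features, class 1 shifted upward by 3.
y = [0] * 10 + [1] * 10
X = [[((i * 7 + j * 3) % 5) / 5 + 3 * y[i] for j in range(3)] for i in range(20)]
features = [0, 1, 2]

observed, p_value = chance_correlation(X, y, features)
model = fit_consensus(X, y, features)
confidences = [confidence(model(row)) for row in X]
print(observed, p_value, min(confidences))
```

A small permutation p-value indicates that the observed accuracy is unlikely to arise from chance correlation, while the per-subject confidence values show how strongly the members agree on each prediction. An odd number of members is used here so that the averaged vote never falls exactly on 0.5.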