Tree-based methods have become one of the most flexible, intuitive, and powerful data-analytic tools for exploring complex data structures, and they provide a natural framework for creating patient subgroups for risk classification. In this article, we review methodological and practical aspects of tree-based methods, with a focus on diagnostic classification (binary outcome) and prognostication (censored survival outcome). Creating an ensemble of trees improves prediction accuracy and addresses the instability of a single tree. We describe ensemble methods that rely on resampling from the original data. Finally, we present methods to identify a representative tree from the ensemble that can be used for clinical decision-making. The methods are illustrated using data on ischemic heart disease classification and data from the Systolic Blood Pressure Intervention Trial (SPRINT) on adverse events in patients with high blood pressure.
Keywords: classification; clinical decision-making; coronary artery disease; hypertension; risk.