Background: This manuscript describes an approach for analyzing large volumes of disparate clinical data to identify the factor(s) most strongly related to a meaningful clinical outcome, in this case the quality of life of cancer patients. Relationships between clinical and quality of life variables were evaluated using the EORTC QLQ-C30 global health domain, a validated surrogate variable for overall cancer patient well-being.
Methods: A cross-sectional study design was used to evaluate the determinants of global health in cancer patients who initiated treatment at two regional medical centers between January 2001 and December 2009. Variables analyzed included 15 EORTC QLQ-C30 scales, age at diagnosis, gender, disease status (newly diagnosed or recurrent), and stage. A decision tree algorithm, which may be unfamiliar to practicing clinicians, was used to evaluate the relative contribution of individual parameters in classifying a clinically meaningful functional endpoint, here the global health of a patient.
Findings: Multiple patient characteristics were identified as important contributors. Fatigue, in particular, emerged as the most prevalent indicator of cancer patients' quality of life in 16 of 23 clinically relevant subsets. This analysis allowed results to be stated in a clinically intuitive rule-set format using the language and quantities of the Quality of Life (QoL) tool itself.
Interpretation: Applying classification algorithms to a large data set identified fatigue as a root factor driving global health and overall QoL. The analysis illustrates how mining clinical data sets can uncover critical clinical insights that are immediately applicable to patient care.
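
For readers unfamiliar with decision trees, the following minimal sketch illustrates the general idea of ranking candidate variables by how well they split patients into outcome classes, and then reading the result back as rules. It is not the authors' implementation: the Gini impurity criterion, the function names, the depth limit, and the six toy records (with invented scores and "high"/"low" global health labels) are all assumptions made purely for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (feature, threshold, weighted impurity) of the best binary
    split, or None if no feature has two distinct values."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_rules(rows, labels, features, depth=0, max_depth=2, path=()):
    """Recursively split the data; return a list of (conditions, class) rules."""
    split = None
    if gini(labels) > 0.0 and depth < max_depth:
        split = best_split(rows, labels, features)
    if split is None:
        # Leaf: predict the majority class of the records that reached it.
        return [(path, Counter(labels).most_common(1)[0][0])]
    f, t, _ = split
    rules = []
    for op, keep in (("<=", lambda v: v <= t), (">", lambda v: v > t)):
        sub = [(r, y) for r, y in zip(rows, labels) if keep(r[f])]
        rules += build_rules([r for r, _ in sub], [y for _, y in sub],
                             features, depth + 1, max_depth,
                             path + (f"{f} {op} {t:g}",))
    return rules

# Hypothetical toy records, NOT study data: invented scale scores and ages.
rows = [
    {"fatigue": 11, "pain": 50, "age": 61},
    {"fatigue": 22, "pain": 17, "age": 48},
    {"fatigue": 33, "pain": 67, "age": 70},
    {"fatigue": 56, "pain": 33, "age": 55},
    {"fatigue": 67, "pain": 17, "age": 66},
    {"fatigue": 89, "pain": 83, "age": 52},
]
labels = ["high", "high", "high", "low", "low", "low"]  # global health class
features = ["fatigue", "pain", "age"]

rules = build_rules(rows, labels, features)
for conditions, outcome in rules:
    print("IF", " AND ".join(conditions), "THEN global health =", outcome)
```

In this toy data the fatigue score alone separates the two classes, so the sketch selects it as the first split and emits two rules phrased in the units of the (hypothetical) questionnaire scores, which mirrors the rule-set style of reporting described in the Findings.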