Background: This paper explores and evaluates the application of classical and dominance-based rough set theory (RST) for the development of data-driven prognostic classification models for hospice referral. In this work, rough set based models are compared with other data-driven methods with respect to two factors related to clinical credibility: accuracy and accessibility. Accessibility refers to the ability of the model to provide traceable, interpretable results and use data that is relevant and simple to collect.
Methods: We utilize retrospective data from 9,103 terminally ill patients to demonstrate the design and implementation RST- based models to identify potential hospice candidates. The classical rough set approach (CRSA) provides methods for knowledge acquisition, founded on the relational indiscernibility of objects in a decision table, to describe required conditions for membership in a concept class. On the other hand, the dominance-based rough set approach (DRSA) analyzes information based on the monotonic relationships between condition attributes values and their assignment to the decision class. CRSA decision rules for six-month patient survival classification were induced using the MODLEM algorithm. Dominance-based decision rules were extracted using the VC-DomLEM rule induction algorithm.
Results: The RST-based classifiers are compared with other predictive and rule based decision modeling techniques, namely logistic regression, support vector machines, random forests and C4.5. The RST-based classifiers demonstrate average AUC of 69.74 % with MODLEM and 71.73 % with VC-DomLEM, while the compared methods achieve average AUC of 74.21 % for logistic regression, 73.52 % for support vector machines, 74.59 % for random forests, and 70.88 % for C4.5.
Conclusions: This paper contributes to the growing body of research in RST-based prognostic models. RST and its extensions posses features that enhance the accessibility of clinical decision support models. While the non-rule-based methods-logistic regression, support vector machines and random forests-were found to achieve higher AUC, the performance differential may be outweighed by the benefits of the rule-based methods, particularly in the case of VC-DomLEM. Developing prognostic models for hospice referrals is a challenging problem resulting in substandard performance for all of the evaluated classification methods.