Identifying unreliable predictions in clinical risk models

Paul D Myers; Kenney Ng; Kristen Severson; Uri Kartoun; Wangzhi Dai; Wei Huang; Frederick A Anderson; Collin M Stultz

doi:10.1038/s41746-019-0209-7


NPJ Digit Med. 2020 Jan 23;3:8. doi: 10.1038/s41746-019-0209-7. eCollection 2020.

Authors

Paul D Myers¹, Kenney Ng², Kristen Severson², Uri Kartoun², Wangzhi Dai¹, Wei Huang³, Frederick A Anderson³, Collin M Stultz¹,⁴,⁵

Affiliations

¹ Department of Electrical Engineering and Computer Science and Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, MA USA.
² Center for Computational Health, IBM Research, Cambridge, MA USA.
³ Center for Outcomes Research, University of Massachusetts Medical School, Worcester, MA USA.
⁴ Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA USA.
⁵ Division of Cardiology, Massachusetts General Hospital, Boston, MA USA.

Abstract

The ability to identify patients who are likely to have an adverse outcome is an essential component of good clinical care. Therefore, predictive risk stratification models play an important role in clinical decision making. Determining whether a given predictive model is suitable for clinical use usually involves evaluating the model's performance on large patient datasets using standard statistical measures of success (e.g., accuracy and discriminatory ability). However, because these metrics are averages over patients with a wide range of characteristics, it is difficult to determine from these measures alone whether a prediction for an individual patient should be trusted. In this paper, we introduce a new method for identifying patient subgroups in which a predictive model's performance is expected to be poor, thereby highlighting when a given prediction is likely to be misleading and should not be trusted. The resulting "unreliability score" can be computed for any clinical risk model and remains applicable in the presence of large class imbalance, a situation often encountered in healthcare. Using data from more than 40,000 patients in the Global Registry of Acute Coronary Events (GRACE), we demonstrate that patients with high unreliability scores form a subgroup in which the predictive model has both decreased accuracy and decreased discriminatory ability.
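The abstract does not describe how the unreliability score itself is computed, so the sketch below illustrates only the evaluation in its final sentence and the point that averaged metrics can hide a poorly served subgroup. Given any per-patient score (here a hypothetical `unreliability` list standing in for the paper's score), it compares accuracy and AUC (a common measure of discriminatory ability) in the high-score subgroup against the remaining patients. The function names, the 0.5 classification threshold, the subgroup cutoff, and the data are all assumptions made up for illustration; they are not GRACE data or the authors' implementation.

```python
from typing import Dict, List, Sequence

# Toy data (hypothetical, for illustration only): outcome labels,
# model-predicted risks, and a stand-in per-patient unreliability score.
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
risk = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.3, 0.6, 0.2, 0.4]
unreliability = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.8, 0.9, 0.7, 0.8]


def auc(ys: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC by comparing every (positive, negative) pair.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties in predicted risk count as one half.
    """
    pos = [s for y, s in zip(ys, scores) if y == 1]
    neg = [s for y, s in zip(ys, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(ys: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded prediction matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(ys, scores))
    return correct / len(ys)


def subgroup_metrics(ys: Sequence[int], scores: Sequence[float],
                     unrel: Sequence[float], cutoff: float) -> Dict[str, Dict[str, float]]:
    """Split patients by unreliability score and evaluate each subgroup."""
    groups: Dict[str, List[tuple]] = {"high": [], "low": []}
    for y, r, u in zip(ys, scores, unrel):
        groups["high" if u > cutoff else "low"].append((y, r))
    result = {}
    for name, rows in groups.items():
        if not rows:  # skip an empty subgroup rather than divide by zero
            continue
        g_ys = [y for y, _ in rows]
        g_rs = [r for _, r in rows]
        result[name] = {"n": len(rows), "accuracy": accuracy(g_ys, g_rs), "auc": auc(g_ys, g_rs)}
    return result


res = subgroup_metrics(labels, risk, unreliability, cutoff=0.5)
print("overall:", accuracy(labels, risk), round(auc(labels, risk), 3))
print("by subgroup:", res)
```

On these toy values the overall AUC is about 0.90 and overall accuracy is 0.7, yet the four high-score patients get an AUC of 0.5 (chance level) and an accuracy of 0.25. This is the kind of gap that a single averaged metric conceals.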

Keywords: Predictive markers; Prognostic markers; Risk factors.