Inter-Rater Reliability and Impact of Disagreements on Acute Physiology and Chronic Health Evaluation IV Mortality Predictions

Crit Care Explor. 2019 Oct 30;1(10):e0059. doi: 10.1097/CCE.0000000000000059. eCollection 2019 Oct.

Abstract

Acute Physiology and Chronic Health Evaluation is a well-validated method to risk-adjust ICU patient outcomes. However, its predictions may be affected by imperfect inter-rater reliability for manually entered elements. We evaluated inter-rater reliability for Acute Physiology and Chronic Health Evaluation IV manually entered elements among clinician abstractors and assessed the impact of disagreements on mortality predictions.

Design: Cross-sectional.

Setting: Academic medical center.

Subjects: Patients admitted to five adult ICUs.

Interventions: None.

Measurements and main results: Acute Physiology and Chronic Health Evaluation IV manually entered elements were abstracted from a selection of charts (n = 41) by two clinician "raters" trained in Acute Physiology and Chronic Health Evaluation IV methodology. Rater agreement (%) was determined for each manually entered element, including Acute Physiology and Chronic Health Evaluation diagnosis, Glasgow Coma Scale score, admission source, chronic conditions, elective/emergency surgery, and ventilator use. Cohen's kappa (κ) was calculated for nominal elements and the intraclass correlation coefficient for continuous elements. The impact of manually entered element choices on Acute Physiology and Chronic Health Evaluation IV mortality predictions was computed using published Acute Physiology and Chronic Health Evaluation IV equations, and observed-to-expected hospital mortality ratios were compared between raters. Most inconsistency among manually entered elements was due to disagreement in the choice of Glasgow Coma Scale (63.8% agreement, 0.83 intraclass correlation coefficient), Acute Physiology and Chronic Health Evaluation diagnosis (68.3% agreement, 0.67 kappa), and admission source (90.2% agreement, 0.85 kappa). Glasgow Coma Scale disagreements produced a significant difference in predicted mortality between raters (observed-to-expected mortality ratio 1.009 for Rater 1 vs 1.134 for Rater 2; p < 0.05). Differences related to Acute Physiology and Chronic Health Evaluation diagnosis or admission source disagreements were negligible. The new "unable to score" choice for Glasgow Coma Scale was used for 18% of Glasgow Coma Scale measurements but accounted for 63% of "major" Glasgow Coma Scale disagreements and 50% of the overall difference in Acute Physiology and Chronic Health Evaluation-predicted mortality between raters.
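For readers unfamiliar with the agreement statistics named above, the following is a minimal Python sketch of percent agreement, Cohen's kappa for a nominal element, and an observed-to-expected mortality ratio. It is illustrative only and is not the study's analysis code: the function names, the admission-source labels, and all input values are hypothetical, and the published Acute Physiology and Chronic Health Evaluation IV equations that generate the predicted probabilities are not reproduced here.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Percentage of charts on which the two raters chose the same value."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater1, rater2))
    return 100.0 * matches / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning nominal categories."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def observed_to_expected(died, predicted_probability):
    """Observed deaths divided by the sum of predicted death probabilities."""
    expected = sum(predicted_probability)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected


# Hypothetical admission-source choices for six charts (not study data).
rater1 = ["ED", "ED", "Floor", "OR", "ED", "Floor"]
rater2 = ["ED", "Floor", "Floor", "OR", "ED", "ED"]
agreement = percent_agreement(rater1, rater2)
kappa = cohens_kappa(rater1, rater2)

# Hypothetical outcomes (1 = died) and predicted probabilities (not study data).
oe_ratio = observed_to_expected([1, 0, 0, 1], [0.5, 0.25, 0.25, 1.0])
```

Because kappa corrects observed agreement for agreement expected by chance, an element can show fairly high percent agreement and still have a modest kappa when one category dominates. Comparing observed-to-expected ratios between raters requires recomputing each patient's predicted probability from each rater's entries, a step this sketch takes as given.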

Conclusions: Inconsistent use among raters of the new "unable to score" choice for Glasgow Coma Scale introduced in Acute Physiology and Chronic Health Evaluation IV was responsible for important decreases in the reliability of both the Glasgow Coma Scale and Acute Physiology and Chronic Health Evaluation IV mortality predictions in our study. We present a Glasgow Coma Scale algorithm, developed after the study, to improve reliability in the use of this new "unable to score" choice.

Keywords: Acute Physiology and Chronic Health Evaluation; Glasgow Coma Scale; hospital mortality; intensive care units; outcome assessment (healthcare); predictive scoring systems; reproducibility of results; statistical models; telemedicine/tele-intensive care unit.