Objective: Vital signs-based models are complicated by repeated measures per patient and frequently missing data. This paper investigated the impacts of common vital signs modeling assumptions during clinical deterioration prediction model development.
Study design and setting: Electronic medical record (EMR) data from five Australian hospitals (1 January 2019 to 31 December 2020) were used. Summary statistics for each observation's prior vital signs were created. Missing data patterns were investigated using boosted decision trees, and missing values were then imputed with common methods. Two example models predicting in-hospital mortality were developed: logistic regression and eXtreme Gradient Boosting. Model discrimination and calibration were assessed using the C-statistic and nonparametric calibration plots.
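As a rough illustration of the summary-statistic step, the sketch below derives features for each observation from that patient's earlier, non-missing values of one vital sign. It is a minimal sketch and not the authors' implementation: the function name `prior_summary`, the chosen statistics (count, mean, minimum, maximum, standard deviation), and the example heart-rate values are all assumptions, since the abstract does not specify them.

```python
from statistics import mean, pstdev
from typing import Optional


def prior_summary(values: list[Optional[float]]) -> list[dict]:
    """Summarise, for each observation, the non-missing values recorded before it.

    `values` is one patient's time-ordered series for a single vital sign,
    with None marking a missing measurement. (Hypothetical helper; the
    statistics chosen here are assumptions, not taken from the paper.)
    """
    rows = []
    history: list[float] = []  # non-missing values seen so far
    for value in values:
        # Features use only earlier observations, so the current value
        # (and anything after it) never leaks into its own summary.
        rows.append({
            "n_prior": len(history),
            "prior_mean": mean(history) if history else None,
            "prior_min": min(history) if history else None,
            "prior_max": max(history) if history else None,
            "prior_sd": pstdev(history) if len(history) > 1 else None,
            "is_missing": value is None,  # missingness flag for this observation
        })
        if value is not None:
            history.append(value)
    return rows


# Illustrative (made-up) heart-rate series with one missing measurement.
heart_rate = [80.0, None, 90.0, 100.0]
features = prior_summary(heart_rate)
```

Keeping an explicit missingness flag alongside the summaries allows the pattern of missing values to be examined separately from whichever imputation method is applied afterwards.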
Results: The data contained 5,620,641 observations from 342,149 admissions. Missing vitals were associated with observation frequency, vital sign variability, and patient consciousness. Summary statistics improved discrimination slightly for logistic regression and markedly for eXtreme Gradient Boosting. The choice of imputation method led to notable differences in model discrimination and calibration. Model calibration was generally poor.
Conclusion: Summary statistics and imputation methods can improve model discrimination and reduce bias during model development, but it is questionable whether these differences are clinically significant. Researchers should consider why data are missing during model development and how this may impact clinical utility.
Keywords: Area under curve; Clinical deterioration; Clinical prediction model; Computerized medical record systems; Early warning score; Missing data.
Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.