Electronic medical record (EMR) data present many opportunities for population health research. However, the use of EMR data for population risk models can be impeded by a high proportion of missingness in key patient variables. Common approaches such as complete case analysis and multiple imputation may not be appropriate for some population health initiatives that require a single, complete analytic data set. In this study, we demonstrate a sequential hot-deck imputation (HDI) procedure to address missingness in a set of cardiometabolic measures in an EMR data set. We assessed the performance of sequential HDI for the individual variables and for a commonly used composite risk score. A data set of cardiometabolic measures based on EMR data from 2 large urban hospitals was used to create a benchmark data set with simulated missingness. Sequential HDI was applied, and the resulting data were used to calculate atherosclerotic cardiovascular disease risk scores. The performance of the imputation approach was assessed using a set of metrics that evaluate the distribution and validity of the imputed data. Of the 567,841 patients, 65% had at least 1 missing cardiometabolic measure. Sequential HDI produced distributions of variables and risk scores that reflected those in the simulated data while retaining correlation. When stratified by age and sex, risk scores were plausible and captured patterns expected in the general population. Sequential HDI was shown to be a suitable approach to multivariate missingness in EMR data. It could benefit population health research by providing a straightforward, computationally nonintensive approach to missing EMR data that results in a single analytic data set.
Copyright © 2024 Wolters Kluwer Health, Inc. All rights reserved.
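The abstract does not spell out the imputation steps, so the following is only a minimal sketch of what a sequential hot-deck procedure might look like, not the authors' implementation. It assumes that "sequential" means variables are imputed one at a time in a fixed order, that donors are drawn at random from patients in the same matching cell, and that variables completed in earlier steps can be used for matching in later steps. The function name `sequential_hot_deck`, the `steps` argument, the field names, and the toy patient values are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def sequential_hot_deck(records, steps, seed=0):
    """Fill missing values (None) one variable at a time.

    `steps` is an ordered list of (variable, match_keys) pairs. Each
    recipient draws a value at random from donors in the same cell
    (identical values on match_keys). Assumption: match_keys may include
    variables completed in earlier steps.
    """
    rng = random.Random(seed)
    data = [dict(r) for r in records]  # work on a copy of the input
    for var, match_keys in steps:
        # Donors are records whose value for `var` was actually observed.
        pools = defaultdict(list)
        for original, current in zip(records, data):
            if original[var] is not None:
                cell = tuple(current[k] for k in match_keys)
                pools[cell].append(original[var])
        # Fallback when a cell has no donor: any observed value of `var`.
        fallback = [v for pool in pools.values() for v in pool]
        for row in data:
            if row[var] is None:
                cell = tuple(row[k] for k in match_keys)
                donors = pools.get(cell) or fallback
                if donors:  # leave the gap if no donor exists at all
                    row[var] = rng.choice(donors)
    return data


# Made-up toy records; None marks a missing measure.
patients = [
    {"age_group": "40-59", "sex": "F", "bmi_cat": "obese", "sbp": 138},
    {"age_group": "40-59", "sex": "F", "bmi_cat": None, "sbp": None},
    {"age_group": "60-79", "sex": "M", "bmi_cat": "normal", "sbp": 124},
    {"age_group": "60-79", "sex": "M", "bmi_cat": None, "sbp": 131},
]
steps = [
    ("bmi_cat", ["age_group", "sex"]),
    ("sbp", ["age_group", "sex", "bmi_cat"]),  # matches on the imputed bmi_cat
]
completed = sequential_hot_deck(patients, steps)
```

Because every imputed value is copied from a real patient in the same cell, the completed data stay within the observed range of each variable, and the output is a single analytic data set rather than a set of multiply imputed copies. Continuous matching variables would need to be binned before use, which this sketch leaves to the caller.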