Objective: To develop a machine learning framework to forecast emergency department (ED) crowding and to evaluate model performance under spatial and temporal data drift.
Materials and methods: We obtained 4 datasets, defined by location (1, a large academic hospital; 2, a rural hospital) and time period: pre-coronavirus disease (COVID) (January 1, 2019-February 1, 2020) and COVID-era (May 15, 2020-February 1, 2021). Our primary target was a binary outcome equal to 1 if the number of patients with acute respiratory illness who had been boarding in the ED for more than 4 h was above a prescribed historical percentile. We trained a random forest and used the area under the curve (AUC) to evaluate out-of-sample performance in 2 experiments: (1) we evaluated the impact of sudden temporal drift by training models on pre-COVID data and testing them on COVID-era data, and (2) we evaluated the impact of spatial drift by testing models trained at location 1 on data from location 2, and vice versa.
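As an illustration of the target definition and the evaluation metric described above, the following is a minimal sketch and not the authors' implementation. The 75th percentile, the toy counts, the toy scores, and all function names are assumptions chosen for the example; the abstract does not state the percentile used or the time resolution of the counts. In practice the scores would come from the trained model (a random forest in this study), and a drift experiment would compare the AUC of a model scored on its own location or period against the AUC of a model trained elsewhere.

```python
import math

def percentile_threshold(history, pct):
    """Nearest-rank percentile of historical boarding counts (0 < pct <= 100)."""
    ordered = sorted(history)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def crowding_labels(counts, threshold):
    """Binary target: 1 if the boarding count is above the threshold, else 0."""
    return [1 if c > threshold else 0 for c in counts]

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical historical counts of patients boarding for more than 4 h.
history = [0, 1, 1, 2, 2, 3, 4, 5]
# Assumed percentile; the study's actual value is not given in the abstract.
threshold = percentile_threshold(history, 75)

# Hypothetical out-of-sample counts and model scores for the same periods.
counts = [1, 4, 2, 5, 3, 6]
labels = crowding_labels(counts, threshold)
scores = [0.2, 0.7, 0.4, 0.9, 0.6, 0.5]

test_auc = auc(labels, scores)
print(f"threshold={threshold}, labels={labels}, AUC={test_auc:.3f}")
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a library routine would normally be used on real data.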
Results: The baseline AUC values for ED boarding ranged from 0.54 (pre-COVID at location 2) to 0.81 (COVID-era at location 1). Models trained with pre-COVID data performed similarly to COVID-era models (0.82 vs 0.78 at location 1). Models that were transferred from location 2 to location 1 performed worse than models trained at location 1 (0.51 vs 0.78).
Discussion and conclusion: Our results demonstrate that ED boarding is a predictable metric for ED crowding, that models were not significantly impacted by temporal data drift, and that any attempt at implementation must account for spatial data drift.
Keywords: COVID-19; data drift; emergency department boarding; emergency medicine; machine learning.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.