Development and Temporal Validation of an Electronic Medical Record-Based Insomnia Prediction Model Using Data from a Statewide Health Information Exchange

J Clin Med. 2023 May 5;12(9):3286. doi: 10.3390/jcm12093286.

Abstract

This study aimed to develop and temporally validate an electronic medical record (EMR)-based insomnia prediction model. In this nested case-control study, we analyzed EMR data from 2011-2018 obtained from a statewide health information exchange. The study sample included 19,843 insomnia cases and 19,843 controls matched by age, sex, and race. Models using different ML techniques were trained to predict insomnia using demographics, diagnosis, and medication order data from two surveillance periods: -1 to -365 days and -180 to -365 days before the first documentation of insomnia. Separate models were also trained with patient data from three time periods (2011-2013, 2011-2015, and 2011-2017). After selecting the best model, predictive performance was evaluated on holdout patients as well as patients from subsequent years to assess the temporal validity of the models. An extreme gradient boosting (XGBoost) model outperformed all other classifiers. XGboost models trained on 2011-2017 data from -1 to -365 and -180 to -365 days before index had AUCs of 0.80 (SD 0.005) and 0.70 (SD 0.006), respectively, on the holdout set. On patients with data from subsequent years, a drop of at most 4% in AUC is observed for all models, even when there is a five-year difference between the collection period of the training and the temporal validation data. The proposed EMR-based prediction models can be used to identify insomnia up to six months before clinical detection. These models may provide an inexpensive, scalable, and longitudinally viable method to screen for individuals at high risk of insomnia.

Keywords: electronic medical records; insomnia; machine learning; sleep; temporal validation.

Grants and funding