Aim: To use electronic health record (EHR) data to develop a scalable and transferable model that predicts the 6-month risk of diabetic ketoacidosis (DKA)-related hospitalization or emergency care in youth with type 1 diabetes (T1D).

Methods: To obtain a shareable predictive model, we engineered features from EHR data mapped to the T1D Exchange Quality Improvement Collaborative (T1DX-QI) data schema, which is used by 60+ U.S. diabetes centers. We selected a compact set of 15 features (e.g., demographics and factors related to diabetes management) to yield "explainable AI" predictions of DKA risk over a 6-month horizon. We used an ensemble of gradient-boosted, tree-based models trained on data collected from September 1, 2017, to November 1, 2022 (3097 unique patients; 24,638 clinical encounters) at a tertiary care pediatric diabetes clinic network in the midwestern United States.

Results: We rank-ordered the top 10, 25, 50, and 100 highest-risk youth in an out-of-sample test set; these groups yielded average precisions of 0.96, 0.81, 0.75, and 0.70, respectively. For the top 100 individuals, the model's lift relative to random selection was 19. The model identified the average time between DKA episodes, the time since the last DKA episode, and T1D duration as the three most important features for predicting DKA risk.

Conclusions: Our DKA risk model effectively predicts youths' relative risk of hospitalization for DKA and is readily deployable to other diabetes centers that map their diabetes data to the T1DX-QI schema. This model may facilitate the development of targeted interventions for the youth at highest risk of DKA. Future work will add novel features such as device data, social determinants of health, and diabetes self-management behaviors.
Keywords: artificial intelligence; diabetic ketoacidosis; machine learning; population health; type 1 diabetes.
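The rank-based precision and lift figures reported in the Results can be made concrete with a small sketch. The abstract does not state exactly how average precision or lift were computed, so the functions below assume common definitions: precision@k is the share of true DKA events among the k highest-scored youth, average precision@k averages precision at each rank up to k that holds a true event, and lift@k is precision@k divided by the overall event rate (the expected precision of a random selection of k youth). The function names and toy cohort are hypothetical and are not drawn from the study.

```python
from typing import List, Sequence, Tuple


def rank_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, int]]:
    """Pair each predicted risk with its observed outcome (1 = DKA event), highest risk first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Share of the k highest-risk individuals who actually had the outcome."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of individuals")
    top_k = rank_by_score(scores, labels)[:k]
    return sum(label for _, label in top_k) / k


def average_precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Mean of precision@i over ranks i <= k where a true event occurs (assumed definition)."""
    top_k = rank_by_score(scores, labels)[:k]
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(top_k, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """precision@k divided by the base event rate, i.e., the expected precision of random selection."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when no events are observed")
    return precision_at_k(scores, labels, k) / base_rate


if __name__ == "__main__":
    # Hypothetical toy cohort of 10 youth: model risk scores and observed DKA outcomes.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(precision_at_k(toy_scores, toy_labels, 2))  # 0.5
    print(average_precision_at_k(toy_scores, toy_labels, 3))
    print(lift_at_k(toy_scores, toy_labels, 2))  # 2.5
```

Under this assumed definition, a lift of 19 for the top 100 means that group contained roughly 19 times as many DKA events as an equally sized random sample of the test set would be expected to contain.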