Predictive analytics can identify people with HIV who are currently retained in care but at risk of future disengagement, allowing retention interventions to be prioritized. We used machine learning methods to develop predictive models of retention in care, defined as no more than a 12-month gap between HIV care appointments, in the Center for AIDS Research Network of Integrated Clinical Systems (CNICS) cohort. Data were split longitudinally into derivation and validation cohorts. We created logistic regression (LR), random forest (RF), and gradient boosted machine (XGB) models within a discrete-time survival analysis framework and compared their performance to a baseline model that included only demographics, viral suppression, and retention history. A total of 21,267 patients with 507,687 visits from 2007 to 2018 were included. The LR model outperformed the baseline model (AUC 0.68 [0.67-0.70] vs. 0.60 [0.59-0.62], P < 0.001). RF and XGB models performed similarly to the LR model. Top features in the LR model included retention history, age, and viral suppression.
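As an illustration of the retention definition and the discrete-time framing described above, the following is a minimal sketch and not the study's actual pipeline. It assumes that "12 months" can be approximated as 365 days and that each interval between consecutive visits forms one discrete-time row; the function names and the example visit dates are hypothetical.

```python
from datetime import date

# Assumption: a "12-month gap" is approximated as 365 days.
MAX_GAP_DAYS = 365


def gaps_in_days(visits):
    """Return the gaps (in days) between consecutive visits, after sorting."""
    ordered = sorted(visits)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def is_retained(visits, max_gap_days=MAX_GAP_DAYS):
    """True if no gap between consecutive visits exceeds max_gap_days."""
    return all(gap <= max_gap_days for gap in gaps_in_days(visits))


def person_period_rows(visits, max_gap_days=MAX_GAP_DAYS):
    """Build discrete-time rows as (interval_index, event) tuples.

    event is 1 for the first interval whose gap exceeds the threshold
    (a lapse in retention) and 0 otherwise. Rows after the first event
    are dropped, as is usual in a discrete-time survival layout.
    """
    rows = []
    for index, gap in enumerate(gaps_in_days(visits)):
        event = int(gap > max_gap_days)
        rows.append((index, event))
        if event:
            break
    return rows


# Hypothetical visit history for one patient.
example_visits = [
    date(2010, 1, 1),
    date(2010, 7, 1),
    date(2012, 1, 1),
    date(2012, 6, 1),
]

print(gaps_in_days(example_visits))
print(is_retained(example_visits))
print(person_period_rows(example_visits))
```

In a layout like this, each row would carry the covariates known at the start of the interval (for example retention history, age, and viral suppression), and a binary classifier such as logistic regression fit on the rows estimates the per-interval hazard of a lapse.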
Keywords: Machine learning; Predictive analytics; Retention in care.
© 2022. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.