Background: Reducing care lapses for people living with HIV is critical to ending the HIV epidemic and beneficial for their health. Predictive modeling can identify clinical factors associated with HIV care lapses. Previous studies have identified these factors within a single clinic or across a national network of clinics, but public health strategies to improve retention in care in the United States are often implemented within a regional jurisdiction (eg, a city or county).
Objective: We sought to build predictive models of HIV care lapses using a large, multisite, noncurated database of electronic health records (EHRs) in Chicago, Illinois.
Methods: We used 2011-2019 data from the Chicago Area Patient-Centered Outcomes Research Network (CAPriCORN), a database spanning multiple health systems and covering the majority of the 23,580 people with an HIV diagnosis living in Chicago. CAPriCORN uses a hash-based data deduplication method to follow people across multiple Chicago health care systems with different EHRs, providing a unique citywide view of retention in HIV care. From the database, we used diagnosis codes, medications, laboratory tests, demographics, and encounter information to build predictive models. Our primary outcome was a lapse in HIV care, defined as more than 12 months between consecutive HIV care encounters. We built logistic regression, random forest, elastic net logistic regression, and XGBoost models using all variables and compared their performance with that of a baseline logistic regression model containing only demographics and retention history.
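The lapse outcome and the cross-system linkage can be made concrete with a minimal sketch. It is illustrative only: the abstract does not describe CAPriCORN's hashing scheme, so the keyed SHA-256 hash over normalized first name, last name, and date of birth (`link_key`, with a hypothetical `SECRET_KEY`) is an assumption, as is reading "more than 12 months" as the next encounter falling after the same calendar date 12 months later. All function names are hypothetical.

```python
import calendar
import hashlib
import hmac
from collections import defaultdict
from datetime import date

# Hypothetical shared secret; NOT CAPriCORN's actual key or scheme.
SECRET_KEY = b"example-shared-key"


def link_key(first: str, last: str, dob: date) -> str:
    """Keyed hash of normalized identifiers (assumed scheme, for illustration)."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob.isoformat()}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def find_lapses(encounters):
    """Flag gaps of more than 12 months between consecutive HIV care encounters.

    `encounters` holds (first, last, dob, site, encounter_date) tuples that may
    come from different health systems. Records sharing a hash are merged into
    one person. Returns {link_key: [(gap_start, gap_end), ...]}.
    """
    by_person = defaultdict(list)
    for first, last, dob, _site, when in encounters:
        by_person[link_key(first, last, dob)].append(when)

    lapses = {}
    for key, dates in by_person.items():
        dates.sort()
        # Assumed rule: a lapse means the next visit falls strictly after
        # the date exactly 12 calendar months past the previous visit.
        lapses[key] = [
            (prev, nxt)
            for prev, nxt in zip(dates, dates[1:])
            if nxt > add_months(prev, 12)
        ]
    return lapses
```

For example, three encounters for one person recorded at two sites (January 2015, June 2015, and September 2016) hash to a single key, and only the June 2015 to September 2016 gap is flagged. Real linkage must also cope with typos, name changes, and missing identifiers, which exact hashing of this kind does not handle.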
Results: We included people living with HIV who had at least 2 HIV care encounters in the database, yielding 16,930 people with 191,492 encounters. All models outperformed the baseline logistic regression model, with the greatest improvement from the XGBoost model (area under the receiver operating characteristic curve 0.776, 95% CI 0.768-0.784 vs 0.674, 95% CI 0.664-0.683; P<.001). Top predictors included history of care lapses, being seen by an infectious disease provider (vs a primary care provider), site of care, Hispanic ethnicity, and previous HIV laboratory testing. The random forest model (area under the receiver operating characteristic curve 0.751, 95% CI 0.742-0.759) identified age, insurance type, and chronic comorbidities (eg, hypertension) as important variables in predicting a care lapse.
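The area under the receiver operating characteristic curve (AUC) reported above is the probability that a randomly chosen case with a lapse receives a higher predicted risk than a randomly chosen case without one. The sketch below computes it from ranks (the Mann-Whitney form, with ties counted as one-half) and adds a percentile bootstrap 95% CI. The bootstrap is an assumption for illustration: the abstract does not say how the reported CIs or P value were obtained, and DeLong's method is a common alternative. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC for 0/1 labels; tied scores receive their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Point AUC plus a percentile bootstrap CI (assumed method, for illustration)."""
    point = auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack one of the classes
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return point, (stats[lo_idx], stats[hi_idx])
```

This version resamples individual observations. Because one person can contribute many encounters, resampling whole people (a cluster bootstrap) would usually be the more appropriate choice for data like these.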
Conclusions: We used a real-world approach to leverage the full scope of data available in modern EHRs to predict HIV care lapses. Our findings reinforce previously known factors, such as a history of care lapses, while also showing the importance of laboratory testing, chronic comorbidities, sociodemographic characteristics, and clinic-specific factors for predicting care lapses for people living with HIV in Chicago. We provide a framework for others to use EHR data from multiple health care systems within a single city to examine lapses in care, which will aid jurisdictional efforts to improve retention in HIV care.
Keywords: Chicago; EHR; HIV; HIV care continuum; electronic health record; lapse in care; people living with HIV; predictive model; retention in care.
©Joseph A Mason, Eleanor E Friedman, Samantha A Devlin, John A Schneider, Jessica P Ridgway. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 17.05.2023.