Objectives: This study aimed to develop a machine learning (ML) model to predict disengagement from HIV care, high viral load or death among people living with HIV (PLHIV), with the goal of enabling proactive support interventions in Tanzania. The algorithm addressed three common challenges in applying ML to electronic medical record (EMR) data: (1) imbalanced outcome distribution; (2) heterogeneity across multisite EMR data and (3) evolving virological suppression thresholds.
Design: Observational study using a national EMR database.
Setting: Conducted in two regions in Tanzania, using data from the National HIV Care database.
Participants: The study included over 6 million HIV care visit records from 295 961 PLHIV in two regions, drawn from Tanzania's National HIV Care database and covering January 2015 to May 2023.
Results: Our ML model effectively identified PLHIV at increased risk of adverse outcomes. Key predictors included past disengagement from care, antiretroviral therapy (ART) status (which tracks a patient's engagement with ART across visits), age and time on ART. The downsampling approach we implemented managed the imbalanced outcome data effectively and reduced prediction bias. Site-specific algorithms outperformed a universal approach, highlighting the importance of tailoring ML models to local contexts. A sensitivity analysis confirmed the model's robustness to changes in viral load suppression thresholds.
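As an illustration of the downsampling idea mentioned above, the following is a minimal sketch, not the study's implementation. The abstract does not specify the sampling scheme, so this assumes simple random undersampling of the majority class to a 1:1 ratio, as would typically be applied to training data only. The function name `downsample_majority` and the field names `visit_id` and `adverse_outcome` are hypothetical.

```python
import random
from collections import Counter


def downsample_majority(records, label_key="adverse_outcome", seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    Assumption: binary labels (0/1) and a 1:1 target ratio; the study's actual
    ratio and sampling scheme are not stated in the abstract.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    # Order the two classes by size so the smaller one is kept in full.
    minority, majority = sorted([positives, negatives], key=len)
    # Sample without replacement from the larger class.
    sampled = rng.sample(majority, k=len(minority))
    balanced = minority + sampled
    rng.shuffle(balanced)
    return balanced


# Toy data (fabricated for illustration only): 5% of visits carry the outcome.
visits = [
    {"visit_id": i, "adverse_outcome": 1 if i % 20 == 0 else 0}
    for i in range(1000)
]
balanced = downsample_majority(visits)
counts = Counter(v["adverse_outcome"] for v in balanced)
```

In this toy example the 50 positive records are all retained and 50 of the 950 negative records are sampled, so a classifier trained on `balanced` sees both classes equally often. Evaluation would still need to use the original, unbalanced distribution.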
Conclusions: ML models leveraging large-scale databases of patient data offer significant potential to identify PLHIV for interventions to enhance engagement in HIV care in resource-limited settings. Tailoring algorithms to local contexts and flexibility towards evolving clinical guidelines are essential for maximising their impact.
Keywords: HIV & AIDS; electronic health records; machine learning.
© Author(s) (or their employer(s)) 2024. Re-use permitted under CC BY-NC. No commercial re-use. Published by BMJ.