Background: Effective measures exist to prevent the spread of HIV. However, the identification of patients who are candidates for these measures can be a challenge. A machine learning model to predict risk for HIV may enhance patient selection for proactive outreach.
Setting: Using data from the electronic health record at Parkland Health, 1 of the largest public healthcare systems in the country, a machine learning model is created to predict incident HIV cases. The study cohort includes any patient aged 16 or older from 2015 to 2019 (n = 458,893).
Methods: Implementing a 70:30 ratio random split of the data into training and validation sets with an incident rate <0.08% and stratified by incidence of HIV, the model is evaluated using a k-fold cross-validated (k = 5) area under the receiver operating characteristic curve leveraging Light Gradient Boosting Machine Algorithm, an ensemble classifier.
Results: The light gradient boosting machine produces the strongest predictive power to identify good candidates for HIV PrEP. A gradient boosting classifier produced the best result with an AUC of 0.88 (95% confidence interval: 0.86 to 0.89) on the training set and 0.85 (95% confidence interval: 0.81 to 0.89) on the validation set for a sensitivity of 77.8% and specificity of 75.1%.
Conclusions: A gradient boosting model using electronic health record data can be used to identify patients at risk of acquiring HIV and implemented in the clinical setting to build outreach for preventative interventions.
Copyright © 2024 The Author(s). Published by Wolters Kluwer Health, Inc.