Background: This study aimed to develop machine learning (ML) algorithms and evaluate their performance in predicting viral suppression among people living with HIV (PWH) statewide in South Carolina (SC).
Methods: The study population, extracted from the electronic reporting system in SC, comprised adult PWH who were diagnosed between 2005 and 2021. Viral suppression was defined as viral load <200 copies/mL. The predictors, including socio-demographics, historical viral load indicators (e.g., viral rebound), comorbidities, healthcare utilization, and annual county-level factors (e.g., social vulnerability), were measured in each 4-month window. Using historical information from different numbers of lagged time windows (1, 3, or 5 lagged windows, with 4 months as the unit), both traditional and ML approaches (e.g., Long Short-Term Memory network [LSTM]) were applied to predict viral suppression. Prediction performance was compared across models using area under the curve (AUC), recall, precision, F1 score, and Youden index.
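As an illustration of the lagged-window design and the threshold-based metrics named in the Methods, the following is a minimal sketch rather than the study's actual pipeline. The function names, the toy suppression history, and the choice of viral suppression as the positive class are assumptions made for the example; only the metric definitions (recall, precision, F1 score, and Youden index = sensitivity + specificity - 1) are standard.

```python
from typing import Dict, List, Sequence


def lagged_rows(history: Sequence[int], n_lags: int) -> List[Dict[str, object]]:
    """Build one row per 4-month window that has n_lags earlier windows.

    history[t] is 1 if the person was virally suppressed in window t, else 0.
    (Hypothetical encoding; the study used many more predictors.)
    """
    rows = []
    for t in range(n_lags, len(history)):
        rows.append({
            # lags[0] is lag 1 (the window immediately before t), and so on
            "lags": [history[t - k] for k in range(1, n_lags + 1)],
            "target": history[t],
        })
    return rows


def _ratio(num: int, den: int) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision, F1, and Youden index, with 1 (suppressed) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    recall = _ratio(tp, tp + fn)          # sensitivity
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "youden": recall + specificity - 1.0,
    }


if __name__ == "__main__":
    # Toy history over five 4-month windows, using 3 lagged windows.
    for row in lagged_rows([1, 1, 0, 1, 1], n_lags=3):
        print(row)
    print(classification_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1]))
```

AUC is omitted from the sketch because it is computed from predicted probabilities across all thresholds, whereas the four metrics above depend on a single classification threshold.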
Results: Machine learning approaches outperformed the generalized linear mixed model. In all three lagged analyses, covering a total of 15,580 PWH, the LSTM algorithm (lag 1: AUC=0.858; lag 3: AUC=0.877; lag 5: AUC=0.881) outperformed all other methods in terms of AUC for predicting viral suppression. The top-ranking predictors common to the different models included historical information on viral suppression, viral rebound, and viral blips in the lag-1 time window. Inclusion of county-level variables did not improve model prediction accuracy.
Conclusion: Supervised machine learning algorithms may offer better performance for risk prediction of viral suppression than traditional statistical methods.
Copyright © 2024 Wolters Kluwer Health, Inc. All rights reserved.