Machine learning approaches identify immunologic signatures of total and intact HIV DNA during long-term antiretroviral therapy

bioRxiv [Preprint]. 2024 Jun 25:2023.11.16.567386. doi: 10.1101/2023.11.16.567386.

Abstract

Understanding the interplay between the HIV reservoir and the host immune system may yield insights into HIV persistence during antiretroviral therapy (ART) and inform strategies for a cure. Here, we applied machine learning approaches to cross-sectional high-parameter HIV reservoir and immunology data in order to characterize host-reservoir associations and generate new hypotheses about HIV reservoir biology. High-dimensional immunophenotyping, quantification of HIV-specific T cell responses, and measurement of genetically intact and total HIV proviral DNA frequencies were performed on peripheral blood samples from 115 people with HIV (PWH) on long-term ART. Analysis demonstrated that both intact and total proviral DNA frequencies were positively correlated with T cell activation and exhaustion. Years of ART and select bifunctional HIV-specific CD4 T cell responses were negatively correlated with the percentage of intact proviruses. A Leave-One-Covariate-Out (LOCO) inference approach identified specific HIV reservoir and clinical-demographic parameters, such as age and biological sex, that were particularly important in predicting immunophenotypes. Overall, immune parameters were more strongly associated with total HIV proviral frequencies than intact proviral frequencies. Uniquely, however, expression of the IL-7 receptor alpha chain (CD127) on CD4 T cells was more strongly correlated with the intact reservoir. Unsupervised dimension reduction analysis identified two main clusters of PWH with distinct immune and reservoir characteristics. Using reservoir correlates identified in these initial analyses, decision tree methods were employed to visualize relationships among multiple immune and clinical-demographic parameters and the HIV reservoir. Finally, using random splits of our data as training-test sets, machine learning algorithms predicted with approximately 70% accuracy whether a given participant had qualitatively high or low levels of total or intact HIV DNA. The techniques described here may be useful for assessing global patterns within the increasingly high-dimensional data used in HIV reservoir and other studies of complex biology.
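
The Leave-One-Covariate-Out (LOCO) idea named in the abstract can be illustrated with a minimal sketch. The abstract does not state which predictive model or software the authors used, so everything below is an assumption made for illustration: the k-nearest-neighbour predictor is a stand-in, the data are synthetic, and the names `knn_predict` and `loco_importance` are hypothetical. The sketch fits a model on a random training split, refits it with one covariate removed, and reports the median increase in absolute held-out error as that covariate's importance.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k=5):
    """Predict the outcome for x as the mean outcome of its k nearest training rows."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )[:k]
    return statistics.fmean(target for _, target in nearest)


def loco_importance(X, y, test_fraction=0.3, k=5, seed=0):
    """Return {column index: median increase in held-out absolute error when dropped}."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test, train = indices[:n_test], indices[n_test:]

    def held_out_errors(columns):
        # Refit (here: rebuild the k-NN training set) using only the given columns.
        train_X = [[X[i][c] for c in columns] for i in train]
        train_y = [y[i] for i in train]
        return [
            abs(y[i] - knn_predict(train_X, train_y, [X[i][c] for c in columns], k))
            for i in test
        ]

    all_columns = list(range(len(X[0])))
    full_errors = held_out_errors(all_columns)
    importance = {}
    for j in all_columns:
        reduced_errors = held_out_errors([c for c in all_columns if c != j])
        # LOCO statistic: per-sample excess error of the reduced model, summarized by the median.
        importance[j] = statistics.median(
            r - f for r, f in zip(reduced_errors, full_errors)
        )
    return importance


# Synthetic data (not from the study): column 0 drives the outcome, column 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(115)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]

importance = loco_importance(X, y)
print(importance)
```

Under these assumptions, a clearly positive value suggests that dropping the covariate degrades held-out prediction, while a value near or below zero suggests the covariate adds little. A real analysis would repeat this over many random splits and attach confidence intervals to each estimate.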

Publication types

  • Preprint