Different features identified by machine learning associated with the HIV compartmentalization in semen

Infect Genet Evol. 2022 Mar:98:105224. doi: 10.1016/j.meegid.2022.105224. Epub 2022 Jan 23.

Abstract

Genetic compartmentalization of HIV in semen has been observed in previous studies. However, the genetic signatures associated with compartmentalization in semen are only beginning to be explored. A total of 2071 partial HIV env sequences from paired blood and semen specimens were collected from 42 persons with HIV (24 with subtype B, 18 with subtype C). The subtype B and subtype C sequence datasets were each divided into a compartmentalization group and a no-compartmentalization group using genetic compartmentalization tests. These datasets were used to construct a machine learning (ML) metadataset. AAIndex metrics were adopted as quantitative measures of the biophysicochemical properties of each amino acid. Five algorithms were tested, all of which are implemented in the caret package. For subtype B, accuracy was 0.87 (range: 0.80-0.92) for the compartmentalization group and 0.69 (range: 0.58-0.79) for the no-compartmentalization group. A similar pattern was seen for subtype C, where accuracy was 0.74 (range: 0.64-0.83) for the compartmentalization group and 0.50 (range: 0.39-0.61) for the no-compartmentalization group. The model identified the six env features most significant in distinguishing between proviruses in blood and semen in subtypes B and C. These features are related to CD4 binding, glycosylation sites and coreceptor selection, which are in turn associated with viral compartmentalization in semen. In summary, we describe a machine learning model that distinguishes semen-tropic virus based on env sequences and identify six important features. This ML approach and these models can help us better understand the semen-tropic virus phenotype, and therefore its reservoir component, guiding a new study direction toward eradication of the HIV reservoir.
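
The encoding step described in the abstract, in which each amino acid is replaced by a quantitative physicochemical value, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say which AAIndex entries were used, so the Kyte-Doolittle hydropathy scale (AAIndex entry KYTJ820101) stands in as one example property. The function names, the choice to score gaps and unknown residues as 0.0, and the toy sequences are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (AAIndex entry KYTJ820101).
# Used only as an example property; the abstract does not name the
# AAIndex metrics that were actually selected.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq, scale=KYTE_DOOLITTLE, missing=0.0):
    """Map each residue of an amino acid sequence to its scale value.

    Gaps ("-") and unknown residues get `missing` (an assumption of
    this sketch, not something stated in the abstract).
    """
    return [scale.get(residue.upper(), missing) for residue in seq]


def encode_alignment(seqs, scale=KYTE_DOOLITTLE):
    """Encode aligned sequences into a row-per-sequence feature matrix.

    Each alignment column becomes one numeric feature, so all
    sequences must have the same aligned length.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [encode_sequence(s, scale) for s in seqs]


if __name__ == "__main__":
    # Hypothetical toy fragments, not data from the study.
    toy = {"blood_1": "CTRPNNNTRK", "semen_1": "CTRPG-NTRK"}
    matrix = encode_alignment(list(toy.values()))
    for name, row in zip(toy, matrix):
        print(name, row)
```

A matrix of this shape, with one row per sequence and one column per alignment position, is the kind of input a classifier would take, with blood or semen origin as the label.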

Keywords: HIV-1; Machine learning; Semen; Sequence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • HIV Infections / virology*
  • HIV-1 / isolation & purification*
  • Humans
  • Machine Learning*
  • Semen / virology*
  • Viral Load*