Evaluation of sequence ambiguities of the HIV-1 pol gene as a method to identify recent HIV-1 infection in transmitted drug resistance surveys

Infect Genet Evol. 2013 Aug:18:125-31. doi: 10.1016/j.meegid.2013.03.050. Epub 2013 Apr 11.

Abstract

Identification of recent HIV infection within populations is a public health priority for accurate estimation of HIV incidence rates and transmitted drug resistance at the population level. Determining HIV incidence rates by prospective follow-up of HIV-uninfected individuals is challenging, and serological assays have important limitations. HIV diversity within an infected host increases with duration of infection. We explore a simple bioinformatics approach to assess viral diversity by determining the percentage of ambiguous base calls in sequences derived from standard genotyping of HIV-1 protease and reverse transcriptase. Sequences from 691 recently infected (≤1 year) and chronically infected (>1 year) individuals from Sweden, Vietnam and Ethiopia were analyzed for ambiguity. A significant difference (p<0.0001) in the proportion of ambiguous bases was observed between sequences from individuals with recent and chronic infection in both HIV-1 subtype B and non-B infection, consistent with previous studies. In our analysis, a cutoff of <0.47% ambiguous base calls identified recent infection with a sensitivity and specificity of 88.8% and 74.6%, respectively. In addition, 1,728 protease and reverse transcriptase sequences from 36 surveys of transmitted HIV drug resistance performed following World Health Organization guidance were analyzed for ambiguity. The 0.47% ambiguity cutoff was applied and survey sequences were classified as likely derived from recently or chronically infected individuals. Overall, 71% of patients were classified as likely to have been infected within one year of genotyping, but results varied considerably among surveys. This bioinformatics approach may provide supporting population-level information to identify recent infection, but its application is limited by infection with more than one viral variant, decreasing viral diversity in advanced disease and technical aspects of population-based sequencing. Standardization of sequencing techniques and base calling and the addition of other parameters such as CD4 cell count may address some of the technical limitations and increase the usefulness of the approach.
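The ambiguity measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the choices to count N as an ambiguous call and to exclude gap characters from the denominator are assumptions made here, since the abstract does not specify them. Only the 0.47% cutoff and the recent/chronic labels come from the text.

```python
# IUPAC codes for a mixture of two or more nucleotides.
# Assumption: N is counted as ambiguous (not stated in the abstract).
AMBIGUOUS_CODES = set("RYKMSWBDHVN")

# Cutoff reported in the abstract, in percent of base calls.
CUTOFF_PERCENT = 0.47


def percent_ambiguous(sequence: str) -> float:
    """Return the percentage of base calls that are ambiguous.

    Assumption: non-letter characters (gaps, whitespace) are ignored.
    """
    bases = [b for b in sequence.upper() if b.isalpha()]
    if not bases:
        raise ValueError("sequence contains no base calls")
    n_ambiguous = sum(1 for b in bases if b in AMBIGUOUS_CODES)
    return 100.0 * n_ambiguous / len(bases)


def classify(sequence: str, cutoff: float = CUTOFF_PERCENT) -> str:
    """Label a sequence 'recent' if its ambiguity is below the cutoff."""
    return "recent" if percent_ambiguous(sequence) < cutoff else "chronic"


def sensitivity_specificity(labelled, cutoff: float = CUTOFF_PERCENT):
    """Compute (sensitivity, specificity) for detecting recent infection.

    `labelled` is an iterable of (sequence, is_recent) pairs, where
    is_recent is the known infection status.
    """
    tp = fn = tn = fp = 0
    for sequence, is_recent in labelled:
        called_recent = classify(sequence, cutoff) == "recent"
        if is_recent and called_recent:
            tp += 1
        elif is_recent:
            fn += 1
        elif called_recent:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one recent and one chronic sequence")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy 1,000-base sequences, not real HIV-1 pol data.
    clean = "ACGT" * 250                    # 0 ambiguous calls
    mixed = "ACGT" * 245 + "RYKMS" * 4      # 20 ambiguous calls
    print(classify(clean), classify(mixed))
```

Because the cutoff is a parameter, the same functions could be used to scan alternative thresholds against a reference set with known infection dates, which is how a sensitivity and specificity pair such as the one reported above would be obtained in principle.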

Keywords: Ambiguity; Bioinformatics; HIV; Incidence; Resistance; Viral diversity.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chronic Disease
  • Databases, Genetic
  • Drug Resistance, Viral
  • Female
  • Genes, pol*
  • HIV Infections / classification
  • HIV Infections / epidemiology
  • HIV Infections / transmission
  • HIV Infections / virology*
  • HIV-1 / genetics*
  • Humans
  • Male
  • Sequence Alignment
  • Sequence Analysis, RNA