Understanding genetic variation in human immunodeficiency virus (HIV) is clinically and immunologically important for patient treatment and vaccine development. We investigated the longitudinal intra-host genetic variation of HIV in over 3,000 individuals in the US National HIV Surveillance System with at least four reported HIV-1 polymerase (pol) sequences. In this population, we identified 149 putative instances of superinfection (i.e., an individual sequentially infected with genetically divergent, polyphyletic viruses). Unexpectedly, we discovered a group of 240 individuals whose consecutively sampled viral strains diverged by >0.015 substitutions/site despite remaining monophyletic in the phylogeny. Viruses in some of these individuals had a maximum genetic divergence approaching that found between two random, unrelated HIV-1 subtype B pol sequences within the US population. Individuals with these highly divergent viruses tended to have been diagnosed nearly a decade earlier in the epidemic than people with superinfection or people whose viruses showed less intra-host genetic variation, and they had distinct transmission risk factor profiles. To better understand this genetic variation in cases with extremely divergent, monophyletic viruses, we performed molecular clock phylogenetic analysis. Our findings suggest that, as in hepatitis C virus infection, extremely divergent HIV lineages can be maintained within an individual and reemerge over a period of years.
Keywords: HIV; genetic variation; molecular evolution; superinfection.