Global analysis of sequence diversity within HIV-1 subtypes across geographic regions

Future Virol. 2012 May;7(5):505-517. doi: 10.2217/fvl.12.37.

Abstract

AIMS: HIV-1 sequence diversity can affect host immune responses and phenotypic characteristics such as antiretroviral drug resistance. Current HIV-1 sequence diversity classification uses phylogeny-based methods to identify subtypes and recombinants, which may overlook distinct subpopulations within subtypes. While local epidemic studies have characterized sequence-level clustering within subtypes using phylogeny, identification of new genotype-phenotype associations is based on mutational correlations at individual sequence positions. We perform a systematic, global analysis of position-specific pol gene sequence variation across geographic regions within HIV-1 subtypes to characterize subpopulation differences that may be missed by standard subtyping methods and sequence-level phylogenetic clustering analyses.

MATERIALS & METHODS: Analysis was performed on a large, globally diverse, cross-sectional pol sequence dataset. Sequences were partitioned into subtypes and into geographic subpopulations within subtypes. For each subtype, we identified positions that varied according to geography, using VESPA (viral epidemiology signature pattern analysis) to identify sequence signature differences and a likelihood ratio test adjusted for multiple comparisons to characterize differences in amino acid (AA) frequencies, including minority mutations. The Synonymous Non-synonymous Analysis Program (SNAP) was used to explore the role of evolutionary selection within subtype C.

RESULTS: In 7693 protease (PR) and reverse transcriptase (RT) sequences from untreated patients in multiple geographic regions, 11 PR and 11 RT positions exhibited sequence signature differences within subtypes. Thirty-six PR and 80 RT positions exhibited within-subtype geography-dependent differences in AA distributions, including minority mutations, at both conserved and variable loci. Among subtype C samples from India and South Africa, nine PR and nine RT positions had significantly different AA distributions, including one PR and five RT positions that differed in consensus AA between regions. A selection analysis of subtype C using SNAP demonstrated that estimated rates of nonsynonymous and synonymous mutations are consistent with the possibility of positive selection across geographic subpopulations within subtypes.

CONCLUSION: We characterized systematic genotypic pol differences across geographic regions within subtypes that are not captured by the subtyping nomenclature. Awareness of such differences may improve the interpretation of future studies determining the phenotypic consequences of genetic backgrounds.
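
The position-by-position comparison described in the methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not specify the exact form of the likelihood ratio test or the multiple-comparison adjustment, so the sketch assumes a G-test of independence on AA counts with a Bonferroni correction. The function names (`chi2_sf`, `g_test`, `compare_regions`) and the input format (equal-length, pre-aligned AA strings for two regions of one subtype) are hypothetical, and the flag for differing most-common residues is only a loose stand-in for a signature-pattern comparison.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Chi-square survival function for integer df (closed-form series)."""
    if df <= 0 or x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def g_test(counts_a, counts_b):
    """Likelihood ratio (G) test that two regions share one AA distribution."""
    residues = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    n = n_a + n_b
    g = 0.0
    for aa in residues:
        col = counts_a[aa] + counts_b[aa]
        for obs, row in ((counts_a[aa], n_a), (counts_b[aa], n_b)):
            if obs:  # zero counts contribute nothing to G
                g += 2 * obs * math.log(obs * n / (row * col))
    return g, chi2_sf(g, len(residues) - 1)


def compare_regions(seqs_a, seqs_b, alpha=0.05):
    """Return (position, G, p, consensus_differs) for significant positions.

    Assumption: Bonferroni correction over all alignment positions.
    """
    length = len(seqs_a[0])
    hits = []
    for pos in range(length):
        ca = Counter(s[pos] for s in seqs_a)
        cb = Counter(s[pos] for s in seqs_b)
        g, p = g_test(ca, cb)
        if p <= alpha / length:
            differs = ca.most_common(1)[0][0] != cb.most_common(1)[0][0]
            hits.append((pos + 1, g, p, differs))
    return hits
```

Calling `compare_regions(region_a, region_b)` on two lists of aligned sequences returns the 1-based positions whose AA distributions differ after correction, together with a flag for whether the most common residue also differs. Because the test uses the full count table at each position, low-frequency (minority) residues contribute to the statistic even when the most common residue is the same in both regions.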