Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID

Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):20166-71. doi: 10.1073/pnas.1110064108. Epub 2011 Nov 30.

Abstract

Viruses can create complex genetic populations within a host, and deep sequencing technologies allow extensive sampling of these populations. Limitations of these technologies, however, potentially bias this sampling, particularly when a PCR step precedes the sequencing protocol. Typically, an unknown number of templates are used in initiating the PCR amplification, and this can lead to unrecognized sequence resampling creating apparent homogeneity; also, PCR-mediated recombination can disrupt linkage, and differential amplification can skew allele frequency. Finally, misincorporation of nucleotides during PCR and errors during the sequencing protocol can inflate diversity. We have solved these problems by including a random sequence tag in the initial primer such that each template receives a unique Primer ID. After sequencing, repeated identification of a Primer ID reveals sequence resampling. These resampled sequences are then used to create an accurate consensus sequence for each template, correcting for recombination, allelic skewing, and misincorporation/sequencing errors. The resulting population of consensus sequences directly represents the initial sampled templates. We applied this approach to the HIV-1 protease (pro) gene to view the distribution of sequence variation of a complex viral population within a host. We identified major and minor polymorphisms at coding and noncoding positions. In addition, we observed dynamic genetic changes within the population during intermittent drug exposure, including the emergence of multiple resistant alleles. These results provide an unprecedented view of a complex viral population in the absence of PCR resampling.
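The consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the Primer ID tag has already been extracted from each read, that reads sharing a tag are pre-aligned and of equal length, and that a tag must be seen at least `min_reads` times before a consensus is called. The function names, the `min_reads` threshold of 3, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Return the most common base at each position.

    Assumes equal-length, pre-aligned reads. Ties are broken in favor of
    the base seen first, which is an arbitrary choice for this sketch.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def primer_id_consensus(reads, min_reads=3):
    """Collapse resampled reads into one consensus sequence per Primer ID.

    reads: iterable of (primer_id, sequence) pairs.
    min_reads: hypothetical cutoff; tags seen fewer times are discarded
    because a reliable consensus cannot be formed from them.
    """
    groups = defaultdict(list)
    for primer_id, seq in reads:
        groups[primer_id].append(seq)
    return {
        primer_id: consensus(seqs)
        for primer_id, seqs in groups.items()
        if len(seqs) >= min_reads
    }


# Toy input: three templates, one of which was sampled only once.
reads = [
    ("AAGT", "CCTCAG"),
    ("AAGT", "CCTCAG"),
    ("AAGT", "CCTTAG"),  # one read carries a PCR or sequencing error
    ("GGCA", "CCTCAA"),
    ("GGCA", "CCTCAA"),
    ("GGCA", "CCTCAA"),
    ("TTAC", "CCTCAG"),  # singleton tag, dropped by the cutoff
]
result = primer_id_consensus(reads)
```

In this toy input the seven reads collapse to two consensus sequences, one per template with enough resampling, and the single erroneous base in the `AAGT` group is outvoted. Counting consensus sequences rather than raw reads is what lets allele frequencies reflect the sampled templates instead of amplification depth.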

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Base Sequence
  • Codon / genetics
  • DNA Primers / metabolism*
  • DNA, Complementary / biosynthesis
  • Drug Resistance, Multiple, Viral / drug effects
  • Drug Resistance, Multiple, Viral / genetics
  • Genes, Viral / genetics*
  • Genetic Variation / drug effects
  • HIV Protease / genetics*
  • HIV-1 / drug effects
  • HIV-1 / enzymology
  • HIV-1 / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Linkage Disequilibrium / genetics
  • Molecular Sequence Data
  • Phylogeny
  • Protease Inhibitors / pharmacology
  • RNA, Viral / genetics
  • Templates, Genetic

Substances

  • Codon
  • DNA Primers
  • DNA, Complementary
  • Protease Inhibitors
  • RNA, Viral
  • HIV Protease
  • p16 protease, Human immunodeficiency virus 1