"Deep" sequencing accuracy and reproducibility using Roche/454 technology for inferring co-receptor usage in HIV-1

David J H F Knapp; Rachel A McGovern; Art F Y Poon; Xiaoyin Zhong; Dennison Chan; Luke C Swenson; Winnie Dong; P Richard Harrigan

doi:10.1371/journal.pone.0099508

PLoS One. 2014 Jun 24;9(6):e99508. doi: 10.1371/journal.pone.0099508. eCollection 2014.

Authors

David J H F Knapp¹, Rachel A McGovern¹, Art F Y Poon¹, Xiaoyin Zhong¹, Dennison Chan¹, Luke C Swenson¹, Winnie Dong¹, P Richard Harrigan²

Affiliations

¹ BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada.
² BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada; Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada.

Abstract

Next-generation ("deep") sequencing has increasing applications both clinically and in disparate fields of research. This study investigates the accuracy and reproducibility of deep sequencing as applied to co-receptor prediction using the V3 loop of Human Immunodeficiency Virus-1 (HIV-1). Despite increasing use in HIV co-receptor prediction, the accuracy and reproducibility of deep sequencing technology, and the factors that can affect them, have received only limited investigation. To address this, repeated deep sequencing results were generated using the Roche GS-FLX (454) from a number of sources, including a non-homogeneous clinical sample (N = 47 replicates over 18 deep sequencing runs) and a large clinical cohort from the MOTIVATE and A400129 studies (N = 1521). For repeated measurements of the non-homogeneous clinical sample, increasing input copy number both decreased variance in the measured proportion of non-R5-using virus (p<<0.001 and 0.02 for single replicates and triplicates, respectively) and increased measured viral diversity (p<0.001; multiple measures). Compared to sequences with ≥1% abundance, sequences with a mean abundance below 1% showed a 2-fold increase in median coefficient of variation (CV) in repeated measurements of the non-homogeneous clinical sample, and a 2.7-fold increase in CV in the MOTIVATE/A400129 dataset. One unexpected source of error was read position, with low-accuracy reads occurring more frequently towards the edges of sequencing regions (p<<0.001). Overall, the primary source of variability was sampling error caused by low input copy number and low minority species prevalence, though other sources of error, including sequence-intrinsic, temporal, and read-position-related errors, were also detected.
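The link between input copy number, minority species prevalence, and variability can be illustrated with a simple binomial sampling model. The sketch below is an assumption made for illustration, not the analysis used in the study: it treats each input copy as an independent draw, so a variant with true proportion p sampled from n copies has an expected CV of sqrt((1 - p) / (n * p)). The function name `binomial_cv` and the example proportions and copy numbers are hypothetical.

```python
import math


def binomial_cv(true_proportion: float, input_copies: int) -> float:
    """Expected coefficient of variation of an observed variant proportion.

    Assumes pure binomial sampling of `input_copies` independent templates
    (an illustrative simplification; PCR and sequencing errors are ignored).
    """
    if not 0 < true_proportion <= 1:
        raise ValueError("true_proportion must be in (0, 1]")
    if input_copies <= 0:
        raise ValueError("input_copies must be positive")
    # SD of the observed proportion is sqrt(p * (1 - p) / n); dividing by the
    # mean p gives CV = sqrt((1 - p) / (n * p)).
    return math.sqrt((1 - true_proportion) / (input_copies * true_proportion))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for copies in (100, 1000, 10000):
        for proportion in (0.005, 0.02, 0.20):
            cv = binomial_cv(proportion, copies)
            print(f"copies={copies:>5}  proportion={proportion:.3f}  CV={cv:.3f}")
```

Under this model, CV rises as either the input copy number or the variant's prevalence falls, which is consistent in direction with the higher CV reported for sequences below 1% abundance. It does not reproduce the reported fold changes, which also reflect the other error sources described in the abstract.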

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

HIV Infections / virology*
HIV-1 / genetics*
High-Throughput Nucleotide Sequencing / methods*
Humans
Reproducibility of Results
Sequence Analysis, RNA / methods*

Grants and funding

200802HFE/Canadian Institutes of Health Research/Canada