A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing

Tian-Hao Zhang; Nicholas C Wu; Ren Sun

doi:10.1186/s12864-016-2388-9

BMC Genomics. 2016 Feb 12:17:108. doi: 10.1186/s12864-016-2388-9.

Authors

Tian-Hao Zhang^{1,2,3}, Nicholas C Wu^{4,5,6}, Ren Sun^{7,8}

Affiliations

¹ Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, 90095, CA, USA.
² School of Life Science, Fudan University, Shanghai, 200433, China.
³ Molecular Biology Institute, University of California, Los Angeles, 90095, CA, USA.
⁴ Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, 90095, CA, USA.
⁵ Molecular Biology Institute, University of California, Los Angeles, 90095, CA, USA.
⁶ Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, 92037, CA, USA.
⁷ Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, 90095, CA, USA.
⁸ Molecular Biology Institute, University of California, Los Angeles, 90095, CA, USA.

Abstract

Background: The high error rate of next-generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. Two sequencing library preparation strategies are commonly employed to improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e. primer ID or UID). Here, we constructed a homogeneous library from a single clone and compared the variant calling accuracy of these error-correction methods.

Results: We comprehensively described the strengths and pitfalls of these methods. We found that both the read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was less effective at correcting substitution errors. In addition, we observed that when the read quality was poor, the tag-clustering method led to huge coverage loss. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that quality score filtering yielded a minor, yet statistically significant, improvement in the error-correction methods tested in this study.

Conclusion: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e. sequencing coverage requirement) and detection sensitivity.
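The two strategies compared above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, whose implementation details the abstract does not give. The function names, the choice to mask disagreeing bases with `N`, the rule of discarding pairs of unequal length, and the `min_cluster_size` threshold of 3 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Complement table for reverse-complementing the reverse read.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def pair_consensus(forward, reverse, mask="N"):
    """Read-pairing sketch: keep a base only where both reads of a pair agree.

    Assumes the two reads cover the same amplicon end to end.
    """
    rc = reverse_complement(reverse)
    if len(forward) != len(rc):
        # Assumption: a length mismatch suggests an indel error, so drop the pair.
        return None
    return "".join(f if f == r else mask for f, r in zip(forward, rc))


def tag_consensus(tagged_reads, min_cluster_size=3):
    """Tag-clustering sketch: group reads by tag, then take a per-position majority.

    tagged_reads is an iterable of (tag, sequence) tuples.
    Clusters smaller than min_cluster_size (a hypothetical threshold) are discarded.
    """
    clusters = defaultdict(list)
    for tag, seq in tagged_reads:
        clusters[tag].append(seq)

    consensus = {}
    for tag, seqs in clusters.items():
        if len(seqs) < min_cluster_size:
            continue  # too few reads to vote; this is where coverage is lost
        # Keep only reads of the most common length so columns line up.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        seqs = [s for s in seqs if len(s) == length]
        consensus[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    # A substitution in the forward read is masked because the mate disagrees.
    print(pair_consensus("ACGAAC", reverse_complement("ACGTAC")))
    # One tag has three reads and yields a consensus; the singleton tag is dropped.
    reads = [("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACCT"), ("GGGC", "ACGT")]
    print(tag_consensus(reads))
```

In this sketch, a tag contributes a consensus only if enough reads share it. That is one plausible way for tag-clustering to lose coverage, for example if errors inside the tag split a cluster into smaller ones.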
