Low-, high-coverage, and two-stage DNA sequencing in the design of the genetic association study

Genet Epidemiol. 2017 Apr;41(3):187-197. doi: 10.1002/gepi.22015. Epub 2016 Nov 4.

Abstract

The next-generation sequencing-based genetic association study (GAS) is a powerful tool for identifying candidate disease variants and genomic regions. Low-coverage sequencing is inexpensive but inadequate for calling rare variants, whereas high-coverage sequencing detects essentially every variant but at a high cost. Two-stage sequencing may be an economical way to conduct GAS without losing power. In two-stage sequencing, an affordable number of samples are sequenced at high coverage to serve as the reference panel, which is then used for imputation in a larger sample sequenced at low coverage. As unit sequencing costs continue to decrease, investigators can now conduct GAS with more flexible sequencing depths. Here, we systematically evaluate the effect of read depth and sample size on variant discovery power and association power for study designs using low-coverage, high-coverage, and two-stage sequencing. We consider 12 low-coverage, 12 high-coverage, and 51 two-stage design scenarios, with read depth varying from 0.5× to 80×. With state-of-the-art simulation and analysis packages and in-house scripts, we simulate the complete study process from DNA sequencing to SNP (single nucleotide polymorphism) calling and association testing. Our results show that, with appropriate allocation of sequencing effort, two-stage sequencing is an effective approach for conducting GAS. We provide practical guidelines for investigators to plan an optimal sequencing-based GAS, including two-stage sequencing designs, given their specific constraints on sequencing investment.
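
The trade-off between panel size, read depth, and sample size in a two-stage design can be illustrated with a small budget calculation. The sketch below is not the authors' simulation pipeline. It assumes, purely for illustration, that sequencing effort is proportional to the number of samples multiplied by read depth, and every name in it (TwoStageDesign, low_coverage_samples, enumerate_designs, the budget unit) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStageDesign:
    """Hypothetical container for one two-stage design."""
    n_high: int        # samples sequenced at high coverage (reference panel)
    depth_high: float  # read depth of the reference panel, e.g. 30 for 30x
    n_low: int         # samples sequenced at low coverage (to be imputed)
    depth_low: float   # read depth of the low-coverage samples

    def effort(self) -> float:
        # ASSUMPTION: effort (and cost) scales as samples x depth.
        return self.n_high * self.depth_high + self.n_low * self.depth_low


def low_coverage_samples(budget, n_high, depth_high, depth_low):
    """Number of low-coverage samples affordable after paying for the panel."""
    remaining = budget - n_high * depth_high
    if remaining < 0:
        return 0
    return int(remaining // depth_low)


def enumerate_designs(budget, panel_sizes, depth_high, low_depths):
    """List every panel-size / low-depth combination that fits the budget."""
    designs = []
    for n_high in panel_sizes:
        for depth_low in low_depths:
            n_low = low_coverage_samples(budget, n_high, depth_high, depth_low)
            if n_low > 0:
                designs.append(
                    TwoStageDesign(n_high, depth_high, n_low, depth_low)
                )
    return designs


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    for d in enumerate_designs(
        budget=30000, panel_sizes=[100, 500], depth_high=30, low_depths=[0.5, 4]
    ):
        print(d, "effort =", d.effort())
```

Under this simplified model, a larger reference panel or a deeper low-coverage stage directly reduces the number of samples available for association testing, which is the allocation question the study examines by simulation.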

Keywords: next-generation sequencing; rare variant; sequencing cost; study design.

MeSH terms

  • Genome, Human*
  • Genome-Wide Association Study / methods*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / standards*
  • Humans
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Research Design*
  • Sample Size