On optimal pooling designs to identify rare variants through massive resequencing

Genet Epidemiol. 2011 Apr;35(3):139-47. doi: 10.1002/gepi.20561. Epub 2011 Jan 19.

Abstract

The advent of next-generation sequencing technologies has facilitated the detection of rare variants. Despite significant cost reductions, sequencing remains expensive for large-scale studies. In this article, we examine DNA pooling as a cost-effective strategy for rare variant detection. We consider the optimal number of individuals in a DNA pool for detecting an allele with a specific minor allele frequency (MAF) under a given coverage depth and detection threshold. We find that, at a fixed coverage depth and detection threshold, the optimal number of individuals in a pool is insensitive to the MAF. In addition, when individuals contribute equally to each pool, the total number of individuals across pools required by an optimal design to detect a variant with a desired power is similar at different coverage depths. When the contributions are more variable, more individuals tend to be needed at higher coverage depths. Our study provides general guidelines on using DNA pooling for more cost-effective identification of rare variants.
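
The quantities named in the abstract (pool size, MAF, coverage depth, detection threshold, power) can be related in a simple toy calculation. The sketch below is an illustration only and is not the model used in the article: it assumes diploid individuals, exactly equal contributions to the pool, no sequencing error, a binomial number of variant chromosomes per pool, and a binomial number of variant reads given the pooled allele fraction. The function names `pool_detection_prob` and `power_across_pools` are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pool_detection_prob(n_individuals, maf, depth, threshold):
    """Probability that one pool shows at least `threshold` variant reads.

    Toy assumptions (not taken from the article): diploid individuals,
    equal contributions to the pool, no sequencing error.
    """
    chromosomes = 2 * n_individuals
    total = 0.0
    for k in range(1, chromosomes + 1):
        # Fraction of pooled DNA carrying the variant when k copies are present.
        frac = k / chromosomes
        # Probability that at least `threshold` of `depth` reads carry the variant.
        p_reads = sum(
            binom_pmf(r, depth, frac) for r in range(threshold, depth + 1)
        )
        total += binom_pmf(k, chromosomes, maf) * p_reads
    return total


def power_across_pools(p_single, n_pools):
    """Probability that at least one of `n_pools` independent pools detects the variant."""
    return 1 - (1 - p_single) ** n_pools


if __name__ == "__main__":
    # Scan a few pool sizes at an illustrative MAF, depth, and threshold.
    for n in (5, 10, 20, 40):
        p = pool_detection_prob(n, maf=0.005, depth=100, threshold=3)
        print(n, round(p, 4), round(power_across_pools(p, 10), 4))
```

Scanning pool sizes in this way shows the trade-off the abstract refers to: larger pools are more likely to contain a variant chromosome, but each copy then accounts for a smaller share of the reads at a fixed depth and threshold.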

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Disease / genetics
  • Gene Frequency
  • Genetic Variation*
  • Genome-Wide Association Study / statistics & numerical data
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Models, Statistical
  • Molecular Epidemiology / statistics & numerical data
  • Polymorphism, Single Nucleotide
  • Probability
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data