Rare variant association testing by adaptive combination of P-values

PLoS One. 2014 Jan 15;9(1):e85728. doi: 10.1371/journal.pone.0085728. eCollection 2014.

Abstract

With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequency (MAF) < 1%) associated with diseases. Testing each variant site individually is known to be underpowered, and therefore many methods have been proposed to test for the association of a group of variants with phenotypes by pooling the signals of the variants in a chromosomal region. However, this pooling strategy inevitably leads to the inclusion of a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the [Formula: see text]-MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675-685) and propose an approach (named 'adaptive combination of P-values for rare variant association testing', abbreviated as 'ADA') that adaptively combines per-site P-values with weights based on MAFs. Before combining P-values, we first impose a truncation threshold upon the per-site P-values, to guard against the noise caused by the inclusion of neutral variants. This ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
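
The idea of truncating per-site P-values, weighting them by MAF, and choosing the truncation threshold adaptively can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the weight function, the combining statistic, or the candidate thresholds, so everything here is an assumption made for illustration: the weight 1/sqrt(MAF(1-MAF)), the weighted sum of -log(P), the permutation-based minimum-P adjustment, and the names `maf_weight`, `truncated_score`, and `adaptive_min_p`. Per-site P-values are assumed to be computed elsewhere and to be strictly positive.

```python
import math


def maf_weight(maf):
    # ASSUMPTION: inverse standard deviation of the allele indicator,
    # which up-weights rarer variants. The abstract only says the
    # weights are "based on MAFs".
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def truncated_score(pvals, mafs, threshold):
    # Weighted sum of -log(p) over sites whose P-value falls below the
    # truncation threshold; sites at or above it contribute nothing.
    return sum(
        maf_weight(m) * -math.log(p)
        for p, m in zip(pvals, mafs)
        if p < threshold
    )


def adaptive_min_p(obs_pvals, perm_pvals, mafs, thresholds):
    # obs_pvals:  per-site P-values from the observed data.
    # perm_pvals: list of per-site P-value lists, one per phenotype
    #             permutation (hypothetical input, computed elsewhere).
    n_perm = len(perm_pvals)
    replicates = [obs_pvals] + list(perm_pvals)  # index 0 is observed
    scores = [
        [truncated_score(pv, mafs, t) for t in thresholds]
        for pv in replicates
    ]

    def min_p(b):
        # Smallest permutation P-value of replicate b across thresholds,
        # ranking b against the permutation replicates other than itself.
        best = 1.0
        for j in range(len(thresholds)):
            others = [
                scores[k][j] for k in range(1, n_perm + 1) if k != b
            ]
            exceed = sum(s >= scores[b][j] for s in others)
            best = min(best, (1 + exceed) / (len(others) + 1))
        return best

    obs_min = min_p(0)
    perm_mins = [min_p(b) for b in range(1, n_perm + 1)]
    # Adjust for having searched over several thresholds.
    hits = sum(m <= obs_min for m in perm_mins)
    return (1 + hits) / (n_perm + 1)
```

A call such as `adaptive_min_p(obs, perms, mafs, [0.10, 0.15, 0.20])` returns a single region-level P-value. The threshold grid in that call is arbitrary, and the nested ranking is quadratic in the number of permutations, so this form is only suitable for small illustrative inputs.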

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Computer Simulation
  • Data Interpretation, Statistical
  • Gene Frequency
  • Genetic Association Studies / methods*
  • Heart Diseases / genetics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Linear Models
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA*
  • Software