A hybrid likelihood model for sequence-based disease association studies

PLoS Genet. 2013;9(1):e1003224. doi: 10.1371/journal.pgen.1003224. Epub 2013 Jan 24.

Abstract

In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitations of classical single-marker association analysis for rare variants have been a challenge in such studies, and a new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is burden-style collapsing, which combines rare variant data within individuals, across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one, a MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.
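To make the idea of burden-style collapsing concrete, the following is a minimal sketch of a generic carrier-based burden test, not the hybrid likelihood model proposed in the paper. It assumes genotypes are given as per-individual lists of rare-allele counts for one gene or gene set, collapses each individual to a carrier indicator, and compares carrier counts between cases and controls with a two-sided Fisher exact test. The function names (`collapse_carriers`, `fisher_exact_two_sided`, `burden_test`) and the toy data are hypothetical and chosen only for illustration.

```python
from math import comb


def collapse_carriers(genotypes):
    """Collapse each individual's rare-variant genotypes to a 0/1 carrier flag.

    `genotypes` is a list of rows, one per individual; each row holds the
    rare-allele count (0, 1, or 2) at every variant site in the gene or gene set.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def burden_test(case_genotypes, control_genotypes):
    """Compare the number of rare-variant carriers in cases versus controls."""
    case_carriers = sum(collapse_carriers(case_genotypes))
    control_carriers = sum(collapse_carriers(control_genotypes))
    return fisher_exact_two_sided(
        case_carriers,
        len(case_genotypes) - case_carriers,
        control_carriers,
        len(control_genotypes) - control_carriers,
    )


# Toy data (made up): 3 cases and 3 controls typed at 3 rare variant sites.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
p_value = burden_test(cases, controls)
```

A test of this kind uses only whether an individual carries rare variants and ignores where in the gene they fall, which is the information the abstract's position-distribution component is described as adding.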

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Exome
  • Genetic Association Studies*
  • Genome, Human
  • Humans
  • Likelihood Functions
  • Mitogen-Activated Protein Kinase Kinases* / genetics
  • Mitogen-Activated Protein Kinase Kinases* / metabolism
  • Models, Genetic*
  • Models, Theoretical
  • Polymorphism, Single Nucleotide
  • Probability
  • Signal Transduction / genetics*

Substances

  • Mitogen-Activated Protein Kinase Kinases