Analysis of case-control association studies with known risk variants

Noah Zaitlen; Bogdan Pasaniuc; Nick Patterson; Samuela Pollack; Benjamin Voight; Leif Groop; David Altshuler; Brian E Henderson; Laurence N Kolonel; Loic Le Marchand; Kevin Waters; Christopher A Haiman; Barbara E Stranger; Emmanouil T Dermitzakis; Peter Kraft; Alkes L Price

doi:10.1093/bioinformatics/bts259

Bioinformatics. 2012 Jul 1;28(13):1729-37. doi: 10.1093/bioinformatics/bts259. Epub 2012 May 3.

Authors

Noah Zaitlen¹, Bogdan Pasaniuc, Nick Patterson, Samuela Pollack, Benjamin Voight, Leif Groop, David Altshuler, Brian E Henderson, Laurence N Kolonel, Loic Le Marchand, Kevin Waters, Christopher A Haiman, Barbara E Stranger, Emmanouil T Dermitzakis, Peter Kraft, Alkes L Price

Affiliation

¹ Department of Epidemiology, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

Abstract

Motivation: The question of how best to use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature.
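For readers unfamiliar with liability threshold modeling, the following is a minimal sketch of the classical model only, not of the LTSOFT implementation. It assumes that liability is standard normal in the population and that an individual is affected when liability exceeds a threshold fixed by the disease prevalence K. The function names are hypothetical and chosen for illustration; the truncated-normal means are standard results.

```python
from statistics import NormalDist
from typing import Tuple

# Assumption: liability follows a standard normal distribution in the population.
STD_NORMAL = NormalDist()


def liability_threshold(prevalence: float) -> float:
    """Threshold T such that P(liability > T) equals the prevalence K."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def mean_liability(prevalence: float) -> Tuple[float, float]:
    """Mean liability of cases and of controls (truncated-normal means)."""
    t = liability_threshold(prevalence)
    density = STD_NORMAL.pdf(t)
    case_mean = density / prevalence              # E[L | L > T]
    control_mean = -density / (1.0 - prevalence)  # E[L | L <= T]
    return case_mean, control_mean


if __name__ == "__main__":
    # Illustrative prevalences only; they are not taken from the paper.
    for k in (0.01, 0.10):
        t = liability_threshold(k)
        case_mean, control_mean = mean_liability(k)
        print(f"K={k:.2f} T={t:.3f} "
              f"E[L|case]={case_mean:.3f} E[L|control]={control_mean:.3f}")
```

The lower the prevalence, the further the mean liability of cases sits from that of controls, which gives some intuition for why a conditioning strategy that ignores prevalence can behave differently for rare diseases.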

Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple datasets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics, whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants.

Availability: LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Case-Control Studies
Genetic Association Studies
Genome-Wide Association Study*
Humans
Logistic Models
Models, Statistical
Polymorphism, Single Nucleotide*
Prevalence
Risk
Software
