Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression

Brief Bioinform. 2022 Mar 10;23(2):bbab572. doi: 10.1093/bib/bbab572.

Abstract

With the increasing volume of available human sequencing data, analysis incorporating external controls has become a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious associations due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio, and they might become anti-conservative when the case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type I error under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently utilize all available cases and controls to accelerate the discovery of disease-associated genes.
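To illustrate why a CLR model tolerates different case-control ratios across ancestry-matched strata, the score test behind a weighted burden statistic can be sketched as follows. Conditioning on the number of cases in each stratum removes the stratum-specific intercepts, so each case is compared only with subjects from its own stratum. This is a minimal sketch, not the authors' implementation. It assumes no covariates, the exact conditional likelihood evaluated at beta = 0 (every size-m subset of a stratum is equally likely to be the case set), and Beta(1, 25) minor-allele-frequency weights, a common burden/SKAT default assumed here rather than taken from the paper. The names `beta_weight` and `clr_burden_test` are hypothetical.

```python
import math
from collections import defaultdict


def beta_weight(maf, a=1.0, b=25.0):
    # Unnormalized Beta(a, b) density at the MAF; a common scale factor
    # on all weights cancels out of the standardized score statistic.
    return maf ** (a - 1) * (1 - maf) ** (b - 1)


def clr_burden_test(genotypes, status, strata, a=1.0, b=25.0):
    """Weighted burden score test under CLR (hypothetical sketch, no covariates).

    genotypes: per-subject lists of minor-allele counts (0/1/2)
    status:    per-subject 0/1 labels (1 = case)
    strata:    per-subject ancestry-matched stratum labels
    Returns (z, two-sided p-value).
    """
    n = len(status)
    n_var = len(genotypes[0])
    # Pooled minor allele frequency for each variant
    mafs = [sum(g[j] for g in genotypes) / (2.0 * n) for j in range(n_var)]
    weights = [beta_weight(m, a, b) for m in mafs]
    # Weighted burden score for each subject
    burden = [sum(w * x for w, x in zip(weights, g)) for g in genotypes]

    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)

    score, info = 0.0, 0.0
    for idx in groups.values():
        n_s = len(idx)
        m_s = sum(status[i] for i in idx)
        if n_s < 2 or m_s == 0 or m_s == n_s:
            continue  # stratum with only cases or only controls carries no information
        g_bar = sum(burden[i] for i in idx) / n_s
        # Observed case burden minus its conditional expectation in this stratum
        score += sum(burden[i] for i in idx if status[i] == 1) - m_s * g_bar
        # Variance of a sum over a random size-m_s subset drawn without replacement
        ss = sum((burden[i] - g_bar) ** 2 for i in idx)
        info += m_s * (n_s - m_s) / (n_s * (n_s - 1)) * ss

    if info == 0:
        return float("nan"), float("nan")
    z = score / math.sqrt(info)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Two strata with different case-control ratios (2:2 and 1:5), one rare variant
genotypes = [[1], [1], [0], [0], [1], [0], [0], [0], [0], [0]]
status = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
strata = ["A"] * 4 + ["B"] * 6
z, p = clr_burden_test(genotypes, status, strata)
```

In this toy example the statistic works out to z = 11/sqrt(17), about 2.67. Each stratum contributes its own expected case burden and variance, so unequal case-control ratios across strata need no special handling.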

Keywords: common controls; conditional logistic regression; matched analysis; population stratification; rare variant association tests.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Computer Simulation
  • Humans
  • Logistic Models
  • Models, Genetic*