Permutation testing in the presence of polygenic variation

Genet Epidemiol. 2015 May;39(4):249-58. doi: 10.1002/gepi.21893. Epub 2015 Mar 10.

Abstract

This article discusses problems with and solutions to performing valid permutation tests for quantitative trait loci in the presence of polygenic effects. Although permutation testing is a popular approach for determining the statistical significance of a test statistic with an unknown distribution (for instance, the maximum of multiple correlated statistics or some omnibus test statistic for a gene, gene-set, or pathway), naive application of permutations may result in an invalid test. The risk of performing an invalid permutation test is particularly acute in complex trait mapping, where polygenicity may combine with a structured population resulting from the presence of families, cryptic relatedness, admixture, or population stratification. I give both analytical derivations and a conceptual understanding of why typical permutation procedures fail and suggest an alternative permutation-based algorithm, MVNpermute, that succeeds. In particular, I examine the case where a linear mixed model is used to analyze a quantitative trait and show that both phenotype and genotype permutations may result in an invalid permutation test. I provide a formula that predicts the amount of inflation of the type I error rate depending on the degree of misspecification of the covariance structure of the polygenic effect and the heritability of the trait. I validate this formula with simulations, showing that the permutation distribution matches the theoretical expectation and that my suggested permutation-based test obtains the correct null distribution. Finally, I discuss situations where naive permutations of the phenotype or genotype are valid and the applicability of the results to other test statistics.
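The abstract does not spell out the steps of MVNpermute, so the following is only an illustrative sketch of one generic way to permute a correlated quantitative trait: whiten the phenotype with the Cholesky factor of an assumed covariance matrix, permute the approximately exchangeable residuals, and map them back. The function name `decorrelated_permutations`, the sibling-pair covariance, and the heritability value are hypothetical choices made for this example, and the covariance is assumed known rather than estimated.

```python
import numpy as np

def decorrelated_permutations(y, X, sigma, n_perm, rng):
    """Hypothetical helper: permute whitened residuals, then restore correlation.

    Assumes y ~ MVN(X @ beta, sigma) under the null, with sigma known.
    """
    L = np.linalg.cholesky(sigma)          # sigma = L @ L.T
    yw = np.linalg.solve(L, y)             # whitened phenotype
    Xw = np.linalg.solve(L, X)             # whitened covariates
    # Ordinary least squares on whitened data equals generalized least squares.
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid_w = yw - Xw @ beta               # roughly exchangeable under the null
    fitted = X @ beta
    out = np.empty((n_perm, y.shape[0]))
    for i in range(n_perm):
        # Shuffle the whitened residuals, then reintroduce the covariance.
        out[i] = fitted + L @ rng.permutation(resid_w)
    return out

# Illustrative data: 50 sibling pairs, assumed heritability 0.6.
rng = np.random.default_rng(0)
n_pairs, h2 = 50, 0.6
kinship = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
sigma = h2 * kinship + (1.0 - h2) * np.eye(2 * n_pairs)
X = np.ones((2 * n_pairs, 1))              # intercept only
y = rng.multivariate_normal(np.zeros(2 * n_pairs), sigma)
y_perm = decorrelated_permutations(y, X, sigma, n_perm=200, rng=rng)
```

Each row of `y_perm` can then be fed to the same test statistic as the observed phenotype to build a reference distribution. When `sigma` is the identity and `X` is an intercept only, the procedure reduces to an ordinary phenotype permutation, which matches the abstract's remark that naive permutation is valid in some situations.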

Keywords: QTL; family studies; permutation test; polygenic effect; population structure; type I error rate.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Genotype
  • Humans
  • Linear Models
  • Models, Genetic*
  • Multifactorial Inheritance / genetics*
  • Phenotype
  • Probability
  • Quantitative Trait Loci*
  • Software*