Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Genetics. 2016 Dec;204(4):1379-1390. doi: 10.1534/genetics.116.189712. Epub 2016 Oct 21.

Abstract

A typical genome-wide association study tests the correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these approaches do not consider the effects of population structure. As a result, they may produce a substantial number of false positive identifications. Here, we introduce a new methodology, GAMMA (generalized analysis of molecular variance for mixed-model analysis), which is capable of simultaneously analyzing many phenotypes while correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to reflect true biological mechanisms.
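As an illustration of the baseline described in the abstract's first sentence (one phenotype tested against each genotype in turn), the following is a minimal sketch on simulated data. It is not GAMMA and applies no correction for population structure. The function name, sample sizes, effect size, and causal SNP index are all hypothetical choices made for this example.

```python
import numpy as np

def single_phenotype_scan(genotypes, phenotype):
    """Return one t-statistic per SNP from a per-SNP Pearson correlation test.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2)
    phenotype: (n_individuals,) array of trait values
    Note: no population-structure correction is applied here.
    """
    n = genotypes.shape[0]
    g = genotypes - genotypes.mean(axis=0)   # center each SNP column
    y = phenotype - phenotype.mean()         # center the phenotype
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    # Convert correlation to a t-statistic with n - 2 degrees of freedom
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

# Simulated data (all values are assumptions for illustration)
rng = np.random.default_rng(0)
n_individuals, n_snps, causal = 200, 50, 7
genotypes = rng.binomial(2, 0.5, size=(n_individuals, n_snps)).astype(float)

# Implant a strong true effect at one SNP, plus unit-variance noise
phenotype = 2.0 * genotypes[:, causal] + rng.normal(size=n_individuals)

t_stats = single_phenotype_scan(genotypes, phenotype)
top_snp = int(np.argmax(np.abs(t_stats)))
print("SNP with the largest |t|:", top_snp)
```

Because the individuals here are simulated as unrelated, the scan has no structure-induced inflation to contend with. With related or stratified samples, this uncorrected test is the kind of analysis the abstract says can yield false positives.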

Keywords: mixed models; multivariate analysis; population structure.

MeSH terms

  • Algorithms*
  • Animals
  • Genome-Wide Association Study / methods*
  • Humans
  • Mice
  • Phenotype*
  • Polymorphism, Single Nucleotide
  • Population / genetics
  • Sensitivity and Specificity
  • Yeasts / genetics