A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects

Genet Epidemiol. 2007 Apr;31(3):238-51. doi: 10.1002/gepi.20205.

Abstract

The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene-gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree -structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal "pruned" tree nested within the tree resultant from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene-environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research.

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Cholelithiasis / genetics*
  • Environment
  • Genetics, Population
  • Humans
  • Linear Models*
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics
  • Regression Analysis