Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis

Epidemiology. 2016 Nov;27(6):870-8. doi: 10.1097/EDE.0000000000000548.

Abstract

There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Atherosclerosis / etiology
  • Bayes Theorem
  • Cluster Analysis*
  • Data Interpretation, Statistical*
  • Environmental Exposure / adverse effects
  • Epidemiologic Research Design*
  • Female
  • Follow-Up Studies
  • Gene-Environment Interaction*
  • Genetic Predisposition to Disease
  • Humans
  • Middle Aged
  • Models, Statistical*
  • Regression Analysis
  • Risk Factors