Unified variable selection in semi-parametric models

Stat Methods Med Res. 2017 Dec;26(6):2821-2831. doi: 10.1177/0962280215610928. Epub 2015 Oct 20.

Abstract

We propose a Bayesian variable selection method for semi-parametric models, with applications to genetic and epigenetic data (e.g., single nucleotide polymorphisms and DNA methylation, respectively). The data are individually standardized to reduce heterogeneity and to facilitate simultaneous selection of categorical (single nucleotide polymorphisms) and continuous (DNA methylation) variables. The Gaussian reproducing kernel is applied to the transformed data to evaluate the joint effect of the variables, which may include complex interactions between, e.g., single nucleotide polymorphisms and DNA methylation. Indicator variables are introduced into the model for the purpose of variable selection. The method is demonstrated and evaluated using simulations under different scenarios. We apply the method to identify informative DNA methylation sites and single nucleotide polymorphisms in a set of genes based on their joint effect on allergic sensitization. The selected single nucleotide polymorphisms and methylation sites have the potential to serve as early markers for allergy prediction, and consequently to benefit medical and clinical research aimed at preventing allergy before its manifestation.
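
As an illustration of the ingredients named in the abstract (standardization, a Gaussian kernel, and indicator variables), the following is a minimal sketch in Python, not the authors' implementation. Several details are assumptions: the abstract does not give the kernel parametrization, so the form exp(-sum_k delta_k (z_ik - z_jk)^2 / rho) is used here, with `delta` as hypothetical 0/1 inclusion indicators and `rho` as a hypothetical bandwidth. The additive 0/1/2 coding of single nucleotide polymorphisms and the toy data are also assumed. The sketch covers only how the kernel depends on the indicators, not the Bayesian sampling of those indicators.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # a constant column stays at zero after centering
    return (X - mu) / sd

def gaussian_kernel(Z, delta, rho=1.0):
    """Gaussian kernel using only the columns whose indicator is 1.

    Assumed form: K[i, j] = exp(-sum_k delta[k] * (Z[i, k] - Z[j, k])**2 / rho).
    """
    delta = np.asarray(delta, dtype=float)
    Zs = Z * delta                      # zero out excluded variables
    sq = np.sum(Zs ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Zs @ Zs.T
    d2 = np.maximum(d2, 0.0)            # guard against tiny negative round-off
    return np.exp(-d2 / rho)

# Toy data (assumed): 6 subjects, 2 SNPs coded 0/1/2, 2 methylation levels in [0, 1].
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(6, 2))
meth = rng.uniform(0.0, 1.0, size=(6, 2))
Z = standardize(np.hstack([snps, meth]))

# Kernel with all four variables included, and with only the first SNP
# and the first methylation site included.
K_all = gaussian_kernel(Z, delta=[1, 1, 1, 1])
K_sub = gaussian_kernel(Z, delta=[1, 0, 1, 0])
```

Setting an indicator to 0 removes that variable from the squared distance, so it no longer contributes to the kernel-based joint effect. In a full Bayesian treatment, the indicators would be sampled together with the other model parameters.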

Keywords: Bayesian methods; DNA methylation; Gaussian kernel; non-linear effects; reproducing kernel; single nucleotide polymorphisms; transformation; variable selection.

MeSH terms

  • Bayes Theorem
  • Biostatistics / methods*
  • Computer Simulation
  • CpG Islands
  • DNA Methylation
  • Epigenesis, Genetic
  • Humans
  • Hypersensitivity / genetics
  • Models, Genetic*
  • Models, Statistical*
  • Monte Carlo Method
  • Nonlinear Dynamics
  • Normal Distribution
  • Polymorphism, Single Nucleotide