Finite adaptation and multistep moves in the Metropolis-Hastings algorithm for variable selection in genome-wide association analysis

PLoS One. 2012;7(11):e49445. doi: 10.1371/journal.pone.0049445. Epub 2012 Nov 15.

Abstract

High-dimensional datasets with large amounts of redundant information are nowadays available for hypothesis-free exploration of scientific questions. A particular case is genome-wide association analysis, where variations in the genome are searched for effects on disease or other traits. Bayesian variable selection has been demonstrated as a possible analysis approach, which can account for the multifactorial nature of the genetic effects in a linear regression model. Yet, the computation presents a challenge and application to large-scale data is not routine. Here, we study aspects of the computation using the Metropolis-Hastings algorithm for the variable selection: finite adaptation of the proposal distributions, multistep moves for changing the inclusion state of multiple variables in a single proposal, and multistep move size adaptation. We also experiment with a delayed rejection step for the multistep moves. Results on simulated and real data show an increase in the sampling efficiency. We also demonstrate that, with application-specific proposals, the approach can overcome a specific mixing problem in real data with 3,822 individuals and 1,051,811 single nucleotide polymorphisms and uncover a variant pair with a synergistic effect on the studied trait. Moreover, we illustrate multimodality in the real dataset related to a restrictive prior distribution on the genetic effect sizes and advocate a more flexible alternative.
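
To make the idea of a multistep move concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a deliberately simple model that the abstract does not specify: unit noise variance, independent N(0, tau2) priors on the included effects (so the effects can be integrated out analytically), and an independent Bernoulli(incl_prob) prior on each inclusion indicator. The function names (chol_quad_logdet, log_posterior, multistep_mh, simulate) and all parameter values are hypothetical. Each proposal flips the inclusion state of several variables at once. Because the move size is drawn from a fixed distribution, the proposal is symmetric and the acceptance probability depends only on the posterior ratio. Finite adaptation, move size adaptation and delayed rejection are left out.

```python
import math
import random


def chol_quad_logdet(A, b):
    """For a symmetric positive-definite A, return (b^T A^{-1} b, log det A)."""
    k = len(A)
    L = [[0.0] * k for _ in range(k)]
    z = [0.0] * k
    for i in range(k):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
        # Forward substitution for L z = b, so that b^T A^{-1} b = z^T z.
        z[i] = (b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    return sum(v * v for v in z), 2.0 * sum(math.log(L[i][i]) for i in range(k))


def log_posterior(gamma, xtx, xty, tau2, incl_prob):
    """Unnormalised log posterior of an inclusion vector (assumed noise variance 1)."""
    idx = [j for j, g in enumerate(gamma) if g]
    k, p = len(idx), len(gamma)
    A = [[xtx[a][c] + (1.0 / tau2 if a == c else 0.0) for c in idx] for a in idx]
    quad, logdet = chol_quad_logdet(A, [xty[a] for a in idx])
    # Marginal likelihood with effects integrated out; constants in gamma dropped.
    log_marginal = 0.5 * quad - 0.5 * (k * math.log(tau2) + logdet)
    log_prior = k * math.log(incl_prob) + (p - k) * math.log(1.0 - incl_prob)
    return log_marginal + log_prior


def multistep_mh(xtx, xty, n_iter=3000, burn_in=1000, max_step=3,
                 tau2=1.0, incl_prob=0.1, seed=0):
    """Return post-burn-in inclusion frequencies from a multistep-flip sampler."""
    rng = random.Random(seed)
    p = len(xty)
    gamma = [False] * p
    current = log_posterior(gamma, xtx, xty, tau2, incl_prob)
    counts = [0] * p
    for it in range(n_iter):
        # Move size is drawn independently of the state, so the proposal is
        # symmetric and the Hastings ratio reduces to the posterior ratio.
        step = rng.randint(1, max_step)
        proposal = list(gamma)
        for j in rng.sample(range(p), step):
            proposal[j] = not proposal[j]
        candidate = log_posterior(proposal, xtx, xty, tau2, incl_prob)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            gamma, current = proposal, candidate
        if it >= burn_in:
            for j in range(p):
                counts[j] += gamma[j]
    return [c / (n_iter - burn_in) for c in counts]


def simulate(n=200, p=10, effects=None, seed=1):
    """Toy data (an assumption): independent Gaussian predictors, unit noise."""
    rng = random.Random(seed)
    effects = effects or {2: 1.0, 7: 1.0}
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * row[j] for j, b in effects.items()) + rng.gauss(0.0, 1.0)
         for row in X]
    xtx = [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return xtx, xty
```

Calling multistep_mh(*simulate()) returns one estimated inclusion frequency per variable. The sketch works from the cross-products X^T X and X^T y, so each posterior evaluation only needs a Cholesky factorisation whose size equals the number of currently included variables. That keeps the toy example cheap, but it would not by itself scale to a million markers.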

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Genetic
  • Genetic Variation*
  • Genome-Wide Association Study / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Linear Models
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics

Grants and funding

This work was supported by the Finnish Doctoral Programme in Computational Sciences FICS (http://fics.hiit.fi/; TP); and the Academy of Finland (http://www.aka.fi/; grant 218248 to AV, and Pubgensens project grant 129230 to AV). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.