Coronary artery disease (CAD) is one of the most common types of cardiovascular disease. Death from coronary heart disease is influenced by genetic factors in both women and men. In this article, we propose a novel Bayesian variable selection framework for identifying important genetic variants associated with CAD status. Conventional Bayesian variable selection methods treat each feature independently. Instead, we propose an innovative prior for the inclusion probabilities of genetic variants that accounts for their ordering structure. We assume that neighboring variants are more likely to be selected together, because they tend to be highly correlated and to have similar biological functions. Additionally, we propose to group participating subjects based on underlying population structure and to fit separate regressions, so that the regression coefficients can better reflect different disease risks in different population groups. Our approach borrows strength across these regression models through an innovative prior inspired by Markov random fields. As demonstrated in simulation studies, the proposed framework can improve variable selection and prediction performance. We also apply the proposed framework to the CATHeterization GENetics (CATHGEN) data with binary CAD status.
Keywords: Bayesian variable selection; CATHGEN; Coronary artery disease; Markov random field prior.
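The abstract does not give the exact form of the prior. As a hedged illustration only, a common generic way to encode both ideas is an Ising-type Markov random field prior on binary inclusion indicators. The first idea is that neighboring variants tend to be selected together. The second is that strength is borrowed across group-specific regressions. The sketch below assumes that form, which is not necessarily the authors' prior (theirs is placed on inclusion probabilities). The indicator layout `gamma[k][j]` (group k, variant j), the hypothetical parameters `a` (sparsity), `b` (coupling between adjacent variants j and j+1 within a group) and `c` (coupling of the same variant across groups), and the chain-plus-all-groups neighborhood are all illustrative assumptions.

```python
import math
from typing import List

# Illustrative Ising-type MRF prior on inclusion indicators (an assumption,
# not the paper's specification). gamma[k][j] = 1 means variant j is
# included in the regression fitted to population group k.


def mrf_log_prior(gamma: List[List[int]], a: float, b: float, c: float) -> float:
    """Unnormalized log prior.

    a: sparsity parameter (a < 0 favors selecting fewer variants)
    b: reward when adjacent variants j and j+1 are both selected in a group
    c: reward when the same variant j is selected in two different groups
    """
    n_groups, n_variants = len(gamma), len(gamma[0])
    total = 0.0
    for k in range(n_groups):
        for j in range(n_variants):
            total += a * gamma[k][j]
            if j + 1 < n_variants:  # chain neighborhood along the variant ordering
                total += b * gamma[k][j] * gamma[k][j + 1]
    for k in range(n_groups):  # every pair of groups shares each variant
        for l in range(k + 1, n_groups):
            for j in range(n_variants):
                total += c * gamma[k][j] * gamma[l][j]
    return total


def conditional_inclusion_prob(
    gamma: List[List[int]], k: int, j: int, a: float, b: float, c: float
) -> float:
    """P(gamma[k][j] = 1 | all other indicators) under the prior above,
    i.e. the logistic function of the local field, as used in a Gibbs step."""
    n_variants = len(gamma[0])
    field = a
    if j > 0:
        field += b * gamma[k][j - 1]
    if j + 1 < n_variants:
        field += b * gamma[k][j + 1]
    field += c * sum(gamma[l][j] for l in range(len(gamma)) if l != k)
    return 1.0 / (1.0 + math.exp(-field))


if __name__ == "__main__":
    # Two groups, four ordered variants; variant 1 is selected in both groups.
    gamma = [[0, 1, 0, 0], [0, 1, 0, 0]]
    print(conditional_inclusion_prob(gamma, 0, 2, -2.0, 1.0, 1.0))  # neighbor of a selected variant
    print(conditional_inclusion_prob(gamma, 0, 3, -2.0, 1.0, 1.0))  # no selected neighbor
```

With `a = -2` and `b = c = 1`, variant 2 in group 0 sits next to a selected variant, so its local field is -1 and its conditional inclusion probability is about 0.27. Variant 3 has no selected neighbor, so its field is -2 and its probability is about 0.12. This is the "neighbors are selected together" behavior, and a positive `c` similarly raises the probability of selecting a variant that other groups have already selected.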