Application of Bayesian networks to GAW20 genetic and blood lipid data

BMC Proc. 2018 Sep 17;12(Suppl 9):19. doi: 10.1186/s12919-018-0116-y. eCollection 2018.

Abstract

Background: Bayesian networks have been proposed as a way to identify possible causal relationships between measured variables based on their conditional dependencies and independencies. We explored the use of Bayesian network analyses applied to the GAW20 data to identify possible causal relationships between differential methylation of cytosine-phosphate-guanine dinucleotides (CpGs), single-nucleotide polymorphisms (SNPs), and a blood lipid trait (triglycerides [TGs]).

Methods: After initial exploratory linear regression analyses, 2 Bayesian network analyses were performed. First, we used the real data and modeled the effects of 4 CpGs previously found to be associated with TGs in the Genetics of Lipid Lowering Drugs and Diet Network Study (GOLDN). Second, we used the simulated data and modeled the effect of a fictional lipid-modifying drug with 5 known causal SNPs and 5 corresponding CpGs.

Results: In the real data, we show that relationships are present between the CpGs, TGs, and other variables: age, sex, and center. In the simulated data, we show, using linear regression, that no CpGs and only 1 SNP were associated with a change in TG levels, and, using Bayesian network analysis, that relationships are present between the change in TG levels and most SNPs, but not with CpGs.

Conclusions: Even when the causal relationships between variables are known, as with the simulated data, it is challenging to reproduce them in a Bayesian network if the relationships are not strong.
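
Illustration: The conditional dependence and independence idea behind Bayesian networks can be sketched with a toy example. The code below is not the analysis performed in this study. It assumes a hypothetical three-variable chain (SNP -> CpG -> TG) with made-up linear effects, a made-up allele frequency of 0.3, and Gaussian noise, and it uses a first-order partial correlation as a simple stand-in for a conditional independence test. The function names and parameters are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


def simulate_chain(n, effect, seed=0):
    """Toy chain SNP -> CpG -> TG; 'effect' is a hypothetical linear effect size."""
    rng = random.Random(seed)
    # Genotype coded as 0/1/2 copies of an allele with assumed frequency 0.3.
    snp = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    # CpG depends on the SNP; TG depends only on the CpG.
    cpg = [effect * s + rng.gauss(0, 1) for s in snp]
    tg = [effect * c + rng.gauss(0, 1) for c in cpg]
    return snp, cpg, tg


snp, cpg, tg = simulate_chain(n=5000, effect=0.8)
marginal = pearson(snp, tg)              # SNP and TG are marginally dependent
conditional = partial_corr(snp, tg, cpg)  # but roughly independent given CpG
print(f"corr(SNP, TG)       = {marginal:.3f}")
print(f"corr(SNP, TG | CpG) = {conditional:.3f}")
```

In this toy setting the SNP and TG are correlated, but the correlation largely disappears once the CpG is conditioned on, which is the kind of pattern a structure-learning procedure relies on. Lowering `effect` (for example to 0.1) shrinks the marginal correlation toward sampling noise, which is consistent with the difficulty of recovering weak relationships noted in the Conclusions.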