AnchorFCI: harnessing genetic anchors for enhanced causal discovery of cardiometabolic disease pathways

Front Genet. 2024 Dec 9:15:1436947. doi: 10.3389/fgene.2024.1436947. eCollection 2024.

Abstract

Introduction: Cardiometabolic diseases, a major global health concern, stem from complex interactions of lifestyle, genetics, and biochemical markers. While extensive research has revealed strong associations between various risk factors and these diseases, latent confounding and the limitations of existing causal discovery methods hinder our understanding of the underlying causal relationships, which are essential for mechanistic insight and for developing effective prevention and intervention strategies.

Methods: We introduce anchorFCI, a novel adaptation of the conservative Really Fast Causal Inference (RFCI) algorithm, designed to enhance robustness and discovery power in causal learning by strategically selecting and integrating reliable anchor variables from a set of variables known not to be caused by the variables of interest. This approach is well-suited to studies of phenotypic, clinical, and sociodemographic data, with genetic variables, which are recognized to be unaffected by these factors, serving as anchors. We demonstrate the method's effectiveness through simulation studies and a comprehensive causal analysis of the 2015 ISA-Nutrition dataset that combines anchorFCI for causal discovery with state-of-the-art effect identification tools from Judea Pearl's framework, yielding a robust, fully data-driven causal inference pipeline.
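To make concrete what "variables known not to be caused by the variables of interest" buys in a partial ancestral graph (PAG), the sketch below applies that knowledge to edge marks. If a phenotype x cannot be an ancestor of an anchor a, then on any edge between them the mark at x must be an arrowhead (a *-> x), since a tail at x would assert that x causes a. This is a minimal illustration under stated assumptions, not the anchorFCI implementation: the function names (`apply_anchor_knowledge`, `edge_str`), the dictionary encoding of marks, and the variable names (`G1`, `BMI`, `Glucose`) are hypothetical, and the anchor-selection and reliability steps described above are not shown.

```python
# Hypothetical encoding of PAG edge marks (not from the paper):
# 'o' = circle (undetermined), '>' = arrowhead, '-' = tail.
CIRCLE, ARROW, TAIL = "o", ">", "-"


def apply_anchor_knowledge(marks, anchors):
    """Orient edges using non-ancestral background knowledge (sketch).

    marks[(u, v)] holds the mark at v's end of the edge between u and v,
    so `u o-> v` is stored as marks[(u, v)] = '>' and marks[(v, u)] = 'o'.
    anchors: variables assumed not to be caused by any non-anchor variable.
    For an edge between anchor a and non-anchor x, x is not an ancestor
    of a, so the mark at x becomes an arrowhead. Returns a new dict and
    raises ValueError if an existing tail contradicts the knowledge.
    """
    out = dict(marks)  # leave the input graph untouched
    for (u, v), mark in marks.items():
        if u in anchors and v not in anchors:
            if mark == TAIL:
                # A tail at v would mean v is an ancestor of anchor u.
                raise ValueError(f"tail at {v} on {u}-{v} conflicts with anchor knowledge")
            out[(u, v)] = ARROW
    return out


def edge_str(marks, u, v):
    """Render the edge between u and v in PAG notation, e.g. 'G1 o-> BMI'."""
    left = {CIRCLE: "o", ARROW: "<", TAIL: "-"}[marks[(v, u)]]
    right = {CIRCLE: "o", ARROW: ">", TAIL: "-"}[marks[(u, v)]]
    return f"{u} {left}-{right} {v}"


if __name__ == "__main__":
    # Hypothetical skeleton: genetic anchor G1 o-o BMI o-o Glucose.
    pag = {
        ("G1", "BMI"): CIRCLE, ("BMI", "G1"): CIRCLE,
        ("BMI", "Glucose"): CIRCLE, ("Glucose", "BMI"): CIRCLE,
    }
    oriented = apply_anchor_knowledge(pag, anchors={"G1"})
    print(edge_str(oriented, "G1", "BMI"))       # G1 o-> BMI
    print(edge_str(oriented, "BMI", "Glucose"))  # BMI o-o Glucose
```

Only the mark at the phenotype's end is fixed; the mark at the anchor's end stays a circle, because the anchor may either cause the phenotype or share a latent confounder with it. In an FCI-style algorithm, such newly placed arrowheads would then feed the usual orientation rules, which this sketch omits.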

Results: Our simulation studies show that anchorFCI enhances robustness and discovery power while handling latent confounding, by integrating reliable anchor variables and their non-ancestral relationships. The 2015 ISA-Nutrition dataset analysis not only supports many established causal relationships but also elucidates their interconnections, providing a clearer understanding of the complex dynamics and multifaceted nature of cardiometabolic risk.

Discussion: AnchorFCI holds significant potential for reliable causal discovery in complex, multidimensional datasets. By effectively integrating non-ancestral knowledge and addressing latent confounding, it is well-suited for various applications requiring robust causal inference from observational studies, providing valuable insights in epidemiology, genetics, and public health.

Keywords: RFCI; cardiometabolic risk factors; causal discovery; causal effect identification; explainability; genetic anchors; partial ancestral graphs; unfaithfulness.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. We acknowledge support from the Open Access Publication Fund of the University of Münster. Also, this work has been funded in part by the Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research: Deep Insight 031L0267A) and the LOEWE program of the State of Hesse (Germany) in the research cluster Diffusible Signals (LOEWE/2/13/519/03/06.001 (0002)/74) to DH. This research was also supported by the São Paulo Research Foundation (FAPESP) grant #2022/03420–0 to MC and grant #2023/05857–9 to AC, as well as by São Paulo Municipal Health Department grant #2013–0.235.936–0; FAPESP grant #2017/05125–7; and National Council for Scientific and Technological Development (CNPq grant #402674/2016–2) to RMF. The funding sources were not involved in the study design, data collection, data analysis and interpretation, writing of the report, or the decision to submit the paper for publication.