Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease

Nora Verplaetse; Antoine Passemiers; Adam Arany; Yves Moreau; Daniele Raimondi

doi:10.1186/s13059-023-03064-y

Genome Biol. 2023 Oct 5;24(1):224. doi: 10.1186/s13059-023-03064-y.

Authors

Nora Verplaetse¹, Antoine Passemiers², Adam Arany², Yves Moreau², Daniele Raimondi³

Affiliations

¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
² Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
³ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.

Abstract

Background: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination (p ≫ n, i.e., far more variants p than samples n). Millions of variants are present in each individual, while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects.
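
As an illustration of why underdetermination is a bottleneck, the following minimal sketch (not taken from the paper) builds a toy genotype matrix with many more variants than samples and shows that two different weight vectors fit purely random labels exactly. The sizes, the 0/1/2 dosage encoding, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: far fewer samples than variants.
n_samples, n_variants = 20, 200

# Toy genotype dosages (0, 1, or 2 copies of the alternate allele); an assumed encoding.
X = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)

# Random case/control labels that carry no genetic signal at all.
y = rng.integers(0, 2, size=n_samples).astype(float)

# Minimum-norm least-squares solution of X w = y.
w_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any direction in the null space of X can be added without changing the fit.
# With full_matrices=True, the last rows of vt span that null space,
# because the rank of X is at most n_samples < n_variants.
_, _, vt = np.linalg.svd(X, full_matrices=True)
null_dir = vt[-1]
w_alt = w_min + 5.0 * null_dir

fit_min = np.allclose(X @ w_min, y)
fit_alt = np.allclose(X @ w_alt, y)
same_weights = np.allclose(w_min, w_alt)

print("minimum-norm solution fits training labels:", fit_min)
print("alternative solution fits training labels:", fit_alt)
print("the two weight vectors are identical:", same_weights)
```

Even a linear model can memorize noise in this regime, so the training data alone cannot single out one set of variant effects; more flexible nonlinear models face the same problem more severely unless their complexity is constrained or the cohort is enlarged.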

Results: We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case-control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis.
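
The abstract does not describe the architecture in detail, so the following is only a hedged sketch of one common way to sparsify a network along biological lines: a binary mask restricts the first layer so that each variant feeds only the hidden unit of the gene it belongs to. The variant-to-gene grouping, the `tanh` nonlinearity, and the names `build_gene_mask` and `sparse_forward` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def build_gene_mask(variant_to_gene, n_genes):
    """Return a (n_variants, n_genes) 0/1 mask with one allowed edge per variant."""
    n_variants = len(variant_to_gene)
    mask = np.zeros((n_variants, n_genes))
    mask[np.arange(n_variants), variant_to_gene] = 1.0
    return mask


def sparse_forward(X, W1, mask, w2, b2):
    """Forward pass: masked variant-to-gene layer, nonlinearity, then a linear readout."""
    gene_act = np.tanh(X @ (W1 * mask))   # each gene unit sees only its own variants
    logits = gene_act @ w2 + b2           # combine gene-level activations
    return 1.0 / (1.0 + np.exp(-logits))  # predicted probability of being a case


rng = np.random.default_rng(0)

# Hypothetical annotation: 6 variants assigned to 3 genes.
variant_to_gene = np.array([0, 0, 1, 1, 1, 2])
n_genes = 3

mask = build_gene_mask(variant_to_gene, n_genes)
W1 = rng.normal(size=mask.shape)   # dense parameter matrix, sparsified by the mask
w2 = rng.normal(size=n_genes)

# Toy genotype dosages for 4 individuals.
X = rng.integers(0, 3, size=(4, 6)).astype(float)

probs = sparse_forward(X, W1, mask, w2, 0.0)
n_active = int(mask.sum())

print("active first-layer weights:", n_active, "of", mask.size)
print("predicted probabilities:", probs)
```

In this toy setting the mask keeps 6 of the 18 possible first-layer weights, which is the sense in which sparsification controls model complexity while still allowing nonlinear combinations of variants within a gene.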

Conclusions: In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.

Keywords: Genome interpretation; Machine learning; Neural networks.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Humans
Inflammatory Bowel Diseases* / genetics
Neural Networks, Computer
Nonlinear Dynamics*
Phenotype
Sample Size

Grants and funding

U54 HG003067/HG/NHGRI NIH HHS/United States