Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease

Genome Biol. 2023 Oct 5;24(1):224. doi: 10.1186/s13059-023-03064-y.

Abstract

Background: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffer from underdetermination (the number of variants far exceeds the number of samples, p ≫ n). Millions of variants are present in each individual, while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects.
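
As a toy illustration of underdetermination (not taken from the paper), the sketch below uses a hypothetical cohort of n = 2 samples and p = 3 variants. The names `X`, `y`, `w_a`, `w_b`, and `predict` are assumptions made for this example. Two different additive weight vectors reproduce the phenotypes exactly, so the data alone cannot tell them apart.

```python
# Toy example: more variants (p = 3) than samples (n = 2).
# All values are made up for illustration.

def predict(X, w):
    """Additive (linear) prediction: one weighted sum per sample."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]

X = [[1, 0, 1],   # sample 1: carries variants 1 and 3
     [0, 1, 1]]   # sample 2: carries variants 2 and 3
y = [1, 1]        # observed phenotype scores

w_a = [1, 1, 0]   # explanation A: variants 1 and 2 matter
w_b = [0, 0, 1]   # explanation B: only variant 3 matters

fits_a = predict(X, w_a) == y
fits_b = predict(X, w_b) == y
print(fits_a, fits_b)
```

Both weight vectors fit the two samples perfectly, yet they disagree on which variants matter. Adding samples or constraining the model is what removes this ambiguity.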

Results: We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case-control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects in inflammatory bowel disease pathogenesis.
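
The abstract does not specify the architecture, so the following is only a minimal sketch of one way a network can be sparsified along biological lines. It assumes that each variant connects solely to a hidden unit for its own gene. The variant-to-gene mapping, the weights, and the function name `sparse_forward` are hypothetical and are not the authors' implementation.

```python
import math
import random

def sparse_forward(genotype, variant_to_gene, w_variant, w_gene):
    """Variants feed only their own gene unit; gene units feed one output.

    Returns the gene-unit activations and the predicted case probability.
    """
    gene_input = [0.0] * len(w_gene)
    for dosage, gene, w in zip(genotype, variant_to_gene, w_variant):
        gene_input[gene] += w * dosage          # no cross-gene connections
    gene_act = [math.tanh(v) for v in gene_input]  # nonlinearity per gene
    logit = sum(w * a for w, a in zip(w_gene, gene_act))
    return gene_act, 1.0 / (1.0 + math.exp(-logit))

# Hypothetical setup: 6 variants grouped into 3 genes.
variant_to_gene = [0, 0, 1, 1, 1, 2]
random.seed(0)
w_variant = [random.gauss(0, 1) for _ in variant_to_gene]
w_gene = [random.gauss(0, 1) for _ in range(3)]

genotype = [0, 1, 2, 0, 1, 1]  # toy allele dosages coded 0/1/2
gene_act, prob = sparse_forward(genotype, variant_to_gene, w_variant, w_gene)
print(gene_act, prob)
```

In this sketch the first layer holds 6 weights, where a fully connected layer of the same shape would hold 18. Restricting connections in this way is one means of controlling model complexity while keeping a nonlinear response within each gene.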

Conclusions: In this paper, we show that underdetermination is likely a major driver of the apparent optimality of additive modeling in clinical genetics today.

Keywords: Genome interpretation; Machine learning; Neural networks.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Inflammatory Bowel Diseases* / genetics
  • Neural Networks, Computer
  • Nonlinear Dynamics*
  • Phenotype
  • Sample Size