Constrained maximum entropy models to select genotype interactions associated with censored failure times

J Bioinform Comput Biol. 2018 Dec;16(6):1840024. doi: 10.1142/S0219720018400243. Epub 2018 Oct 30.

Abstract

We propose a novel screening method targeting genotype interactions associated with disease risks. The proposed method extends the maximum entropy conditional probability model to address disease occurrences over time. Continuous occurrence times are grouped into intervals. The model estimates the conditional distribution over the disease occurrence intervals given individual genotypes by maximizing the corresponding entropy subject to constraints linking genotype interactions to time intervals. The EM algorithm is employed to handle uncertain observations, namely those for which the disease occurrence is censored. A stepwise greedy search is proposed to screen a large number of candidate constraints. The minimum description length criterion is employed to select the optimal set of constraints. Extensive simulations show that about five quantile-dependent intervals are sufficient to categorize disease outcomes into different risk groups. Performance depends on sample size, number of genotypes, and minor allele frequencies. The proposed method outperforms the likelihood ratio test, Lasso, and a previous maximum entropy method restricted to binary (disease occurrence, non-occurrence) outcomes. Finally, a GWAS of type 1 diabetes patients is used to illustrate our method. Novel one-genotype and two-genotype interactions associated with neuropathy are identified.
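
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It only shows three of the ingredients under stated assumptions: cutting failure times into quantile-based intervals, the exponential form that a maximum entropy conditional distribution takes under linear constraints, and an EM-style reweighting for a censored subject. All function names, the feature layout, and the treatment of a right-censored subject (probability mass restricted to the censoring interval and later) are hypothetical choices made for illustration.

```python
import numpy as np

def quantile_intervals(times, n_intervals=5):
    """Return interior cut points at equally spaced quantiles of the times."""
    qs = np.linspace(0.0, 1.0, n_intervals + 1)[1:-1]
    return np.quantile(times, qs)

def assign_interval(times, cuts):
    """Map each time to an interval index in 0 .. len(cuts)."""
    return np.searchsorted(cuts, times, side="right")

def maxent_conditional(features, lambdas):
    """Maximum entropy form p(k | g) proportional to exp(sum_j lambda_j * f_j(g, k)).

    features: array of shape (K, J), one row per interval k (hypothetical layout).
    lambdas:  array of shape (J,), the Lagrange multipliers.
    """
    scores = features @ lambdas
    scores = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def censored_posterior(p, censor_interval):
    """E-step weights for a right-censored subject (assumed scheme):
    the event can only fall in the censoring interval or a later one."""
    w = np.zeros_like(p)
    w[censor_interval:] = p[censor_interval:]
    return w / w.sum()

# Usage: 100 subjects, five quantile-dependent intervals.
times = np.arange(1, 101, dtype=float)
cuts = quantile_intervals(times, n_intervals=5)
labels = assign_interval(times, cuts)

# With all multipliers at zero the model is uniform over the intervals.
p = maxent_conditional(np.zeros((5, 2)), np.zeros(2))
weights = censored_posterior(p, censor_interval=2)
```

In a full procedure the multipliers would be re-estimated after each E-step, and candidate constraints would be added greedily and scored by description length; those steps are omitted here because the abstract does not specify them.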

Keywords: EM algorithm; GWAS; Maximum entropy; censoring; Lagrange multiplier; stepwise greedy search.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Diabetes Mellitus, Type 1 / complications
  • Diabetes Mellitus, Type 1 / genetics*
  • Diabetic Nephropathies / genetics*
  • Entropy
  • Female
  • Genetic Predisposition to Disease / genetics*
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Male
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide