Assessing polyomic risk to predict Alzheimer's disease using a machine learning model

Alzheimers Dement. 2024 Dec;20(12):8700-8714. doi: 10.1002/alz.14319. Epub 2024 Nov 7.

Abstract

Introduction: Alzheimer's disease (AD) is the most common form of dementia in the elderly. Given that AD neuropathology begins decades before symptoms, there is a dire need for effective screening tools for early detection of AD to facilitate early intervention.

Methods: Here, we used tree-based and deep learning methods to train polyomic prediction models for AD affection status and age at onset, employing genomic, proteomic, metabolomic, and drug use data from UK Biobank. We used SHapley Additive exPlanations (SHAP) to determine feature importance.
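As a rough illustration of the attribution idea behind SHAP, the sketch below computes exact Shapley values for a small scoring function by enumerating every feature subset. The function `toy_score`, its feature names, its coefficients, and the all-zero baseline are hypothetical and chosen only for illustration; this is neither the study's model nor the `shap` library, which relies on faster algorithms for tree-based and deep models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this only works for a handful of features.
    """
    names = list(x)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (x[f] if f in coalition else baseline[f]) for f in names}
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (model(with_target) - model(without))
        values[target] = total
    return values


# Hypothetical toy score (an assumption, not the paper's model):
# two additive terms plus one interaction between the two proteins.
def toy_score(f):
    return 2.0 * f["genotype"] + 1.5 * f["protein_a"] + 0.5 * f["protein_a"] * f["protein_b"]


x = {"genotype": 1.0, "protein_a": 2.0, "protein_b": 1.0}
baseline = {"genotype": 0.0, "protein_a": 0.0, "protein_b": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split evenly between the two features involved in it.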

Results: Our best-performing polyomic model achieved an area under the receiver operating characteristic curve (AUROC) of 0.87. We identified the GFAP and CXCL17 proteins as the strongest predictors of AD besides apolipoprotein E (APOE) alleles. Increasing the number of cases by including "AD-by-proxy" cases did not improve AD prediction.
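For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The following minimal sketch computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not data from the study, and the function name `auroc` is an assumption of this sketch.

```python
def auroc(labels, scores):
    """Fraction of (case, control) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
result = auroc(labels, scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for large cohorts.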

Discussion: Among the four modalities, genomics and proteomics were the most informative based on AUROC. Our data suggest that two blood-based biomarkers (glial fibrillary acidic protein [GFAP] and CXCL17) may be effective for early presymptomatic prediction of AD.

Highlights: We developed a polyomic model to predict AD and age at onset (AAO) using omics and medication use data from electronic health records (EHR). We identified the GFAP and CXCL17 proteins as the strongest predictors of AD besides APOE alleles. Including "AD-by-proxy" cases in training did not improve AD prediction. Proteomics was the most informative modality overall for predicting affection status and AAO.

Keywords: Alzheimer's disease; machine learning; omics; polyomic model; prediction.

MeSH terms

  • Age of Onset
  • Aged
  • Alzheimer Disease* / diagnosis
  • Alzheimer Disease* / genetics
  • Apolipoproteins E / genetics
  • Biomarkers*
  • Chemokines, CXC / genetics
  • Deep Learning
  • Female
  • Genomics
  • Glial Fibrillary Acidic Protein
  • Humans
  • Machine Learning*
  • Male
  • Proteomics
  • ROC Curve

Substances

  • Biomarkers
  • Glial Fibrillary Acidic Protein
  • Apolipoproteins E
  • Chemokines, CXC
  • GFAP protein, human