Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data-A Systematic Review and Assessment

Am J Gastroenterol. 2024 Aug 1;119(8):1466-1482. doi: 10.14309/ajg.0000000000002870. Epub 2024 May 16.

Abstract

Introduction: Accurate risk prediction can facilitate screening and early detection of pancreatic cancer (PC). We conducted a systematic review to critically evaluate effectiveness of machine learning (ML) and artificial intelligence (AI) techniques applied to electronic health records (EHR) for PC risk prediction.

Methods: Ovid MEDLINE(R), Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, Ovid Cochrane Database of Systematic Reviews, Scopus, and Web of Science were searched for articles that utilized ML/AI techniques to predict PC, published between January 1, 2012, and February 1, 2024. Study selection and data extraction were conducted by 2 independent reviewers. Critical appraisal and data extraction were performed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies checklist. Risk of bias and applicability were examined using prediction model risk of bias assessment tool.

Results: Thirty studies including 169,149 PC cases were identified. Logistic regression was the most frequent modeling method. Twenty studies utilized a curated set of known PC risk predictors or those identified by clinical experts. ML model discrimination performance (C-index) ranged from 0.57 to 1.0. Missing data were underreported, and most studies did not implement explainable-AI techniques or report exclusion time intervals.

Discussion: AI/ML models for PC risk prediction using known risk factors perform reasonably well and may have near-term applications in identifying cohorts for targeted PC screening if validated in real-world data sets. The combined use of structured and unstructured EHR data using emerging AI models while incorporating explainable-AI techniques has the potential to identify novel PC risk factors, and this approach merits further study.

Publication types

  • Systematic Review

MeSH terms

  • Early Detection of Cancer / methods
  • Electronic Health Records*
  • Humans
  • Machine Learning*
  • Pancreatic Neoplasms* / diagnosis
  • Risk Assessment / methods