Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data: A Systematic Review and Assessment

Anup Kumar Mishra; Bradford Chong; Shivaram P Arunachalam; Ann L Oberg; Shounak Majumder

doi:10.14309/ajg.0000000000002870

Am J Gastroenterol. 2024 Aug 1;119(8):1466-1482. doi: 10.14309/ajg.0000000000002870. Epub 2024 May 16.

Authors

Anup Kumar Mishra¹, Bradford Chong¹, Shivaram P Arunachalam¹, Ann L Oberg², Shounak Majumder¹

Affiliations

¹ Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, Minnesota, USA.
² Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA.

PMID: 38752654
PMCID: PMC11296923 (available on 2025-08-01)

Abstract

Introduction: Accurate risk prediction can facilitate screening and early detection of pancreatic cancer (PC). We conducted a systematic review to critically evaluate the effectiveness of machine learning (ML) and artificial intelligence (AI) techniques applied to electronic health records (EHR) for PC risk prediction.

Methods: Ovid MEDLINE(R), Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, Ovid Cochrane Database of Systematic Reviews, Scopus, and Web of Science were searched for articles that utilized ML/AI techniques to predict PC, published between January 1, 2012, and February 1, 2024. Study selection and data extraction were conducted by 2 independent reviewers. Critical appraisal and data extraction were performed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). Risk of bias and applicability were examined using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results: Thirty studies including 169,149 PC cases were identified. Logistic regression was the most frequent modeling method. Twenty studies utilized a curated set of known PC risk predictors or those identified by clinical experts. ML model discrimination performance (C-index) ranged from 0.57 to 1.0. Missing data were underreported, and most studies did not implement explainable-AI techniques or report exclusion time intervals.
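For readers unfamiliar with the discrimination metric reported above, the following is a minimal sketch of how a C-index can be computed for a binary outcome (PC case vs. non-case), where it equals the area under the ROC curve. It is an illustration only, not the method of any reviewed study: the function name `c_index` and the toy scores and labels are hypothetical, and studies with time-to-event outcomes may have used a different definition (for example, Harrell's C).

```python
from itertools import product


def c_index(risk_scores, outcomes):
    """Concordance for a binary outcome (illustrative helper, not from the review).

    Returns the fraction of case/non-case pairs in which the case received
    the higher predicted risk. Tied scores count as half-concordant.
    """
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score, non_case_score in product(cases, non_cases):
        if case_score > non_case_score:
            concordant += 1.0
        elif case_score == non_case_score:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy data: predicted risks and observed outcomes (1 = PC case).
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(round(c_index(scores, labels), 3))
```

Under this definition a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases, which gives some context for the reported range of 0.57 to 1.0.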

Discussion: AI/ML models for PC risk prediction using known risk factors perform reasonably well and may have near-term applications in identifying cohorts for targeted PC screening if validated in real-world data sets. The combined use of structured and unstructured EHR data using emerging AI models while incorporating explainable-AI techniques has the potential to identify novel PC risk factors, and this approach merits further study.

Publication types

Systematic Review

MeSH terms

Early Detection of Cancer / methods
Electronic Health Records*
Humans
Machine Learning*
Pancreatic Neoplasms* / diagnosis
Risk Assessment / methods
