Predicting Incident Adenocarcinoma of the Esophagus or Gastric Cardia Using Machine Learning of Electronic Health Records

Gastroenterology. 2023 Dec;165(6):1420-1429.e10. doi: 10.1053/j.gastro.2023.08.011. Epub 2023 Aug 18.

Abstract

Background & aims: Tools that can automatically predict incident esophageal adenocarcinoma (EAC) and gastric cardia adenocarcinoma (GCA) using electronic health records to guide screening decisions are needed.

Methods: The Veterans Health Administration (VHA) Corporate Data Warehouse was accessed to identify Veterans with 1 or more encounters between 2005 and 2018. Patients diagnosed with EAC (n = 8430) or GCA (n = 2965) were identified in the VHA Central Cancer Registry and compared with 10,256,887 controls. Predictors included demographic characteristics, prescriptions, laboratory results, and diagnoses between 1 and 5 years before the index date. The Kettles Esophageal and Cardia Adenocarcinoma predictioN (K-ECAN) tool was developed and internally validated using simple random sampling imputation and extreme gradient boosting, a machine learning method. Training was performed in 50% of the data, preliminary validation in 25% of the data, and final testing in 25% of the data.
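The data partitioning and imputation steps described in the Methods can be illustrated with a minimal sketch. It assumes a simple shuffled 50%/25%/25% split and an imputation that fills each missing value with a random draw from the observed values of the same variable. The function names, the seed, and the toy hemoglobin values are hypothetical and are not taken from the study; the extreme gradient boosting step itself is not shown.

```python
import random


def split_50_25_25(ids, seed=0):
    """Shuffle ids and split them into training (50%), validation (25%), and test (25%) sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_valid = len(shuffled) // 4
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def random_sampling_impute(values, seed=0):
    """Replace each missing value (None) with a random draw from the observed values."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to sample from; return the input unchanged.
        return list(values)
    return [v if v is not None else rng.choice(observed) for v in values]


# Toy data (hypothetical): 1000 patient identifiers and one laboratory variable.
train, valid, test = split_50_25_25(range(1000))
hemoglobin = [13.5, None, 14.2, 12.8, None, 15.0]
imputed = random_sampling_impute(hemoglobin)
```

Drawing from observed values preserves the marginal distribution of each predictor, which is one plausible reason to prefer it over mean imputation; whether the study's procedure matches this sketch in detail is an assumption.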

Results: K-ECAN was well-calibrated and had better discrimination (area under the receiver operating characteristic curve [AuROC], 0.77) than previously validated models, such as the Nord-Trøndelag Health Study (AuROC, 0.68) and Kunzmann model (AuROC, 0.64), or published guidelines. Using only data from between 3 and 5 years before the index date diminished its accuracy slightly (AuROC, 0.75). When men were undersampled to simulate a non-VHA population, the AuROCs of the Nord-Trøndelag Health Study and Kunzmann model improved, but K-ECAN was still the most accurate (AuROC, 0.85). Although gastroesophageal reflux disease was strongly associated with EAC, it contributed only a small proportion of gain in information for prediction.
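For readers less familiar with the discrimination metric, the AuROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it by direct pairwise comparison and also shows one way men might be undersampled to change the sex mix of a cohort. The `sex` field, the `keep_fraction` parameter, and all toy values are hypothetical; the abstract does not state the sampling fraction used.

```python
import random


def auroc(labels, scores):
    """Fraction of case-control pairs in which the case scores higher (ties count as 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def undersample_men(records, keep_fraction, seed=0):
    """Keep all non-male records and a random subset of male records (assumed 'sex' field)."""
    rng = random.Random(seed)
    men = [r for r in records if r["sex"] == "M"]
    others = [r for r in records if r["sex"] != "M"]
    kept_men = rng.sample(men, int(len(men) * keep_fraction))
    return others + kept_men


# Toy example: 2 cases and 3 controls with hypothetical predicted risks.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auroc = auroc(labels, scores)  # 5 of 6 case-control pairs are ranked correctly

# Toy cohort that is 90% male, resampled to keep half of the men.
cohort = [{"sex": "M"}] * 90 + [{"sex": "F"}] * 10
resampled = undersample_men(cohort, keep_fraction=0.5)
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition concrete; a rank-based implementation would be needed at the scale of the cohort described here.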

Conclusions: K-ECAN is a novel, internally validated tool predicting incident EAC and GCA using electronic health records data. Further work is needed to validate K-ECAN outside VHA and to assess how best to implement it within electronic health records.

Keywords: Electronic Health Records; Esophageal Neoplasms; Gastroesophageal Reflux Disease; Mass Screening; Stomach Neoplasms.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adenocarcinoma* / diagnosis
  • Adenocarcinoma* / epidemiology
  • Adenocarcinoma* / pathology
  • Cardia / pathology
  • Electronic Health Records
  • Esophageal Neoplasms* / diagnosis
  • Esophageal Neoplasms* / epidemiology
  • Esophageal Neoplasms* / pathology
  • Esophagus
  • Humans
  • Machine Learning
  • Male
  • Stomach Neoplasms* / diagnosis
  • Stomach Neoplasms* / epidemiology
  • Stomach Neoplasms* / pathology

Supplementary concepts

  • Adenocarcinoma Of Esophagus