Using the Electronic Health Record to Develop a Gastric Cancer Risk Prediction Model

Gastro Hep Adv. 2024 Jul 14;3(7):910-916. doi: 10.1016/j.gastha.2024.07.001. eCollection 2024.

Abstract

Background and aims: Gastric cancer (GC) is a leading cause of cancer incidence and mortality globally. Population screening is limited by the low incidence and prevalence of GC in the United States. A risk prediction algorithm to identify high-risk patients allows for targeted GC screening. We aimed to determine the feasibility and performance of a logistic regression model based on electronic health records to identify individuals at high risk for noncardia gastric cancer (NCGC).

Methods: We included 614 patients who had a diagnosis of NCGC between ages 40 and 80 years and who were seen at our large, multistate tertiary medical center between 2010 and 2021. Controls without a diagnosis of NCGC were randomly selected in a 1:10 ratio of cases to controls. Missing data were handled by multiple imputation by chained equations, and logistic regression on the imputed datasets was used to estimate the probability of NCGC. Discrimination was assessed with the area under the curve, estimated using the 0.632 bootstrap estimator.
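The 0.632 estimator mentioned above blends an optimistic "apparent" estimate (model scored on the data it was fit to) with a pessimistic out-of-bag bootstrap estimate, using weights 0.368 and 0.632. The sketch below illustrates that idea for the area under the curve on synthetic data. It is not the authors' code: the data, the two predictors, the gradient-descent fitting routine, and every function name are assumptions made for illustration, and the imputation step is omitted entirely.

```python
import math
import random


def predict_one(w, xi):
    """Predicted probability for one row; w = [intercept, coef_1, ...]."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], xi))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Illustrative logistic regression fitted by batch gradient descent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_632(X, y, n_boot=30, seed=0):
    """Return (0.632 estimate, apparent AUC, mean out-of-bag AUC)."""
    rng = random.Random(seed)
    n = len(y)
    w_full = fit_logistic(X, y)
    apparent = auc([predict_one(w_full, xi) for xi in X], y)
    oob_aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        y_bag = [y[i] for i in idx]
        y_oob = [y[i] for i in oob]
        if len(set(y_bag)) < 2 or len(set(y_oob)) < 2:
            continue  # skip resamples that lack cases or controls
        w_b = fit_logistic([X[i] for i in idx], y_bag)
        oob_aucs.append(auc([predict_one(w_b, X[i]) for i in oob], y_oob))
    if not oob_aucs:
        raise ValueError("no usable bootstrap resamples")
    oob_mean = sum(oob_aucs) / len(oob_aucs)
    return 0.368 * apparent + 0.632 * oob_mean, apparent, oob_mean


# Synthetic data (assumed, not from the study): standardized age and a sex flag.
data_rng = random.Random(42)
X, y = [], []
for _ in range(220):
    age_z = data_rng.gauss(0.0, 1.0)
    male = 1.0 if data_rng.random() < 0.5 else 0.0
    p = 1.0 / (1.0 + math.exp(-(-2.8 + 0.8 * age_z + 0.7 * male)))
    X.append([age_z, male])
    y.append(1 if data_rng.random() < p else 0)

estimate, apparent_auc, oob_auc = auc_632(X, y)
print(f"apparent={apparent_auc:.3f} oob={oob_auc:.3f} 0.632={estimate:.3f}")
```

In a real analysis the resampling would typically be repeated within each imputed dataset and the results pooled; how the study combined these steps is not stated in the abstract.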

Results: The 0.632 estimate of the area under the curve was 0.731, indicating robust model performance. Probability of NCGC was higher with increasing age (odds ratio [OR] = 1.16; 95% confidence interval [CI]: 1.04-1.30), male sex (OR = 1.97; 95% CI: 1.64-2.36), Black (OR = 3.07; 95% CI: 2.46-3.83) or Asian race (OR = 4.39; 95% CI: 2.60-7.42), tobacco use (OR = 1.61; 95% CI: 1.34-1.94), anemia (OR = 1.35; 95% CI: 1.09-1.68), and pernicious anemia (OR = 6.12; 95% CI: 3.42-10.95).
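As a reading aid for the odds ratios above, the following sketch recovers each factor's log-odds coefficient and an approximate standard error from the reported OR and 95% CI, and shows how ORs multiply for a patient with several factors. It assumes Wald-type intervals and a model without interaction terms, neither of which the abstract confirms; the function and variable names are hypothetical. Age is left out because the abstract does not state the unit of the age increment, and absolute probabilities cannot be computed because the intercept is not reported.

```python
import math

# (OR, lower 95% limit, upper 95% limit) as reported in the abstract.
reported = {
    "male sex": (1.97, 1.64, 2.36),
    "Black race": (3.07, 2.46, 3.83),
    "Asian race": (4.39, 2.60, 7.42),
    "tobacco use": (1.61, 1.34, 1.94),
    "anemia": (1.35, 1.09, 1.68),
    "pernicious anemia": (6.12, 3.42, 10.95),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_summary(odds_ratio, lower, upper):
    """Coefficient and approximate SE on the log-odds scale (assumes a Wald CI)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


def combined_odds_ratio(factors):
    """Multiply the ORs of several factors (assumes no interaction terms)."""
    return math.exp(sum(math.log(reported[name][0]) for name in factors))


for name, (odds_ratio, lower, upper) in reported.items():
    beta, se = log_odds_summary(odds_ratio, lower, upper)
    print(f"{name}: beta={beta:.3f}, SE={se:.3f}")

print(f"male sex + tobacco use: OR={combined_odds_ratio(['male sex', 'tobacco use']):.2f}")
```

The combined figure is relative to a reference patient with neither factor and all other covariates held equal; it is an odds ratio, not a probability.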

Conclusion: We demonstrate the feasibility and good performance of an electronic health record-based logistic regression model for estimating the probability of NCGC. Future studies will refine and validate this model, with the ultimate aim of identifying a high-risk cohort that could be eligible for NCGC screening.

Keywords: Cancer Disparity; Electronic Health Record; High-Risk Individuals; Logistic Regression Model; Noncardia Gastric Cancer; Screening.