Development of an Electronic Health Record-Based Algorithm for Predicting Lung Cancer Screening Eligibility in the Population-Based Research to Optimize the Screening Process Lung Research Consortium

JCO Clin Cancer Inform. 2023 Sep:7:e2300063. doi: 10.1200/CCI.23.00063.

Abstract

Purpose: Lung cancer screening (LCS) guidelines in the United States recommend LCS for people aged 50-80 years with a smoking history of at least 20 pack-years who currently smoke or quit within the last 15 years. We tested the performance of simple smoking-related criteria derived from electronic health record (EHR) data, and we developed and tested a multivariable model for predicting LCS eligibility.
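As a minimal sketch, the guideline criteria can be expressed as a rule over self-reported smoking fields, similar in spirit to a survey-based reference standard. This is not the study's code: the `SmokingHistory` record and the function names are hypothetical, pack-years are computed as average packs per day times years smoked, and treating the age and 15-year boundaries as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    """Hypothetical record holding the fields needed to apply the eligibility rule."""
    age: int                                  # years
    packs_per_day: float                      # average packs smoked per day
    years_smoked: float                       # total years of smoking
    currently_smokes: bool
    years_since_quit: Optional[float] = None  # None if currently smoking or unknown


def pack_years(h: SmokingHistory) -> float:
    """Pack-years = average packs per day x number of years smoked."""
    return h.packs_per_day * h.years_smoked


def lcs_eligible(h: SmokingHistory) -> bool:
    """Age 50-80, at least 20 pack-years, and current smoking or quitting within
    the last 15 years (boundaries treated as inclusive by assumption)."""
    if not 50 <= h.age <= 80:
        return False
    if pack_years(h) < 20:
        return False
    if h.currently_smokes:
        return True
    # Former smoking: eligible only if the quit time is known and recent enough
    return h.years_since_quit is not None and h.years_since_quit <= 15


# Usage
print(lcs_eligible(SmokingHistory(62, 1.0, 30, currently_smokes=True)))        # True (30 pack-years)
print(lcs_eligible(SmokingHistory(62, 0.5, 30, False, years_since_quit=10)))   # False (15 pack-years)
print(lcs_eligible(SmokingHistory(70, 1.0, 25, False, years_since_quit=20)))   # False (quit > 15 years ago)
```

Real EHR smoking fields are often incomplete, which is one reason a rule like this may misclassify people when applied directly to EHR data rather than to survey responses.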

Methods: Analyses were completed within the Population-based Research to Optimize the Screening Process Lung Consortium (PROSPR-Lung). In our primary validity analyses, the reference standard for LCS eligibility was based on self-reported smoking data collected via survey. Within one PROSPR-Lung health system, we used a training data set and penalized multivariable logistic regression with the Least Absolute Shrinkage and Selection Operator (LASSO) to select EHR-based variables for the prediction model, including demographics, smoking history, diagnoses, and prescription medications. Model performance was assessed in a separate test data set. We also conducted an external validation analysis in a separate health system. We reported the AUC, as well as the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy at the cutoff value that maximizes the Youden Index.
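The abstract does not describe how the LASSO model was fit. The following is an illustrative from-scratch sketch of L1-penalized (LASSO) logistic regression using proximal gradient descent (ISTA), in which the soft-thresholding step sets some coefficients to exactly zero and thereby drops those variables from the model. The function names, learning rate, iteration count, penalty values, and toy features are all hypothetical; in practice one would use an established library and tune the penalty (for example by cross-validation).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink toward 0, snapping small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linear_predictor(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_lasso_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    lr: float = 0.1,
    n_iter: int = 2000,
) -> Tuple[float, List[float]]:
    """Minimize mean log-loss + lam * sum_j |w_j| by proximal gradient descent (ISTA).

    The intercept b is not penalized. Coefficients driven to exactly 0 correspond
    to variables the penalty has dropped from the model.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(linear_predictor(b, w, xi)) - yi  # predicted minus observed
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n  # plain gradient step for the unpenalized intercept
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return b, w


# Toy data (hypothetical): column 0 is an informative binary flag; columns 1-2
# are noise that is balanced within every (flag, label) group.
X: List[List[float]] = []
y: List[int] = []
for flag, label, reps in [(1, 1, 3), (1, 0, 1), (0, 0, 3), (0, 1, 1)]:
    for _ in range(reps):
        for noise in (1.0, -1.0):
            X.append([float(flag), noise, -noise])
            y.append(label)

b, w = fit_lasso_logistic(X, y, lam=0.01)
print([round(v, 2) for v in w])  # the two noise coefficients are exactly 0.0

b2, w2 = fit_lasso_logistic(X, y, lam=0.2)
print(w2)  # a strong penalty keeps every coefficient at 0.0 (intercept-only model)
```

The second fit shows the selection effect: when the penalty exceeds the pull that a variable's gradient can exert, its coefficient never leaves zero, so the variable is excluded from the model.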

Results: There were 14,214 individuals with survey data to assess LCS eligibility in the primary analyses. The overall performance for assigning LCS eligibility status, as measured by the AUC, was 0.940 and 0.938 at the two health systems. At the Youden Index cutoff value, performance metrics were as follows: accuracy, 0.896 and 0.855; sensitivity, 0.886 and 0.920; specificity, 0.896 and 0.850; PPV, 0.357 and 0.444; and NPV, 0.988 and 0.992.
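The Youden Index cutoff is the threshold on a model's predicted probability that maximizes J = sensitivity + specificity - 1; accuracy, PPV, and NPV are then computed at that threshold. A minimal sketch, assuming hypothetical scores and labels and treating "score >= cutoff" as predicted eligible (function names are illustrative):

```python
from typing import Dict, Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when 'score >= cutoff' is called eligible."""
    tp = fp = tn = fn = 0
    for s, truth in zip(scores, labels):
        predicted = s >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics (assumes both classes are present)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1,
    trying each observed score as a candidate cutoff (first maximum wins ties)."""
    best_cut, best_j = float("nan"), float("-inf")
    for c in sorted(set(scores)):
        m = metrics(*confusion(scores, labels, c))
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


# Hypothetical predicted probabilities and reference-standard labels
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1]
cut, j = youden_cutoff(scores, labels)
print(cut, round(j, 3))  # 0.35 0.6
print(metrics(*confusion(scores, labels, cut)))
```

Because accuracy = prevalence x sensitivity + (1 - prevalence) x specificity, accuracy at any cutoff lies between sensitivity and specificity, and when few people are eligible it sits close to specificity. Low prevalence is likewise why PPV can be modest while NPV is high.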

Conclusion: Our results suggest that health systems can use an EHR-derived multivariable prediction model to aid in the identification of those who may be eligible for LCS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aged, 80 and over
  • Early Detection of Cancer / methods
  • Electronic Health Records*
  • Humans
  • Lung
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / epidemiology
  • Middle Aged
  • Smoking / adverse effects
  • Smoking / epidemiology