Assessing Different Approaches to Leveraging Historical Smoking Exposure Data to Better Select Lung Cancer Screening Candidates: A Retrospective Validation Study

Daniel J Kats; Yosra Adie; Abdulhakim Tlimat; Peter J Greco; David C Kaelber; Yasir Tarabichi

doi:10.1093/ntr/ntaa192

Nicotine Tob Res. 2021 Aug 4;23(8):1334-1340. doi: 10.1093/ntr/ntaa192.

Authors

Daniel J Kats¹,², Yosra Adie³, Abdulhakim Tlimat², Peter J Greco¹,², David C Kaelber¹,², Yasir Tarabichi²,⁴

Affiliations

¹ School of Medicine, Case Western Reserve University, Cleveland, OH.
² Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, OH.
³ Center for Reducing Health Disparities, The MetroHealth System, Cleveland, OH.
⁴ Division of Pulmonary, Critical Care, and Sleep Medicine, The MetroHealth System, Cleveland, OH.

PMID: 32974635
DOI: 10.1093/ntr/ntaa192

Abstract

Introduction: There is mounting interest in the use of risk prediction models to guide lung cancer screening. Electronic health records (EHRs) could facilitate such an approach, but smoking exposure documentation is notoriously inaccurate. While the negative impact of inaccurate EHR data on screening practices reliant on dichotomized age and smoking exposure-based criteria has been demonstrated, less is known regarding its impact on the performance of model-based screening.

Aims and methods: Data were collected from a cohort of 37 422 ever-smokers between the ages of 55 and 74, seen at an academic safety-net healthcare system between 1999 and 2018. The National Lung Screening Trial (NLST) criteria and the PLCOM2012 and LCRAT lung cancer risk prediction models were validated against time to lung cancer diagnosis. Discrimination (area under the receiver operating characteristic curve [AUC]) and calibration were assessed. The effect of substituting the last documented smoking variables with differentially retrieved "history conscious" measures was also determined.
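
To make the contrast between "last documented" and "history conscious" smoking variables concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly cited NLST-style rule (age 55 to 74, at least 30 pack-years, currently smoking or quit within 15 years). The `SmokingRecord` type, the helper names, and the choice of "largest pack-years ever documented" as the history-conscious measure are all hypothetical illustrations; the abstract does not specify how the historical measures were retrieved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmokingRecord:
    """One dated smoking entry from a hypothetical EHR extract."""
    year: int
    pack_years: float
    years_since_quit: Optional[float]  # None means currently smoking


def nlst_eligible(age: int, pack_years: float,
                  years_since_quit: Optional[float]) -> bool:
    """Dichotomized NLST-style rule (assumed thresholds, see lead-in)."""
    recent = years_since_quit is None or years_since_quit <= 15
    return 55 <= age <= 74 and pack_years >= 30 and recent


def last_documented(records: List[SmokingRecord]) -> SmokingRecord:
    """Use only the most recent entry, as a naive EHR query would."""
    return max(records, key=lambda r: r.year)


def history_conscious(records: List[SmokingRecord]) -> SmokingRecord:
    """Assumed variant: keep the latest quit status but take the largest
    pack-years ever documented, since cumulative exposure cannot fall."""
    latest = last_documented(records)
    peak = max(r.pack_years for r in records)
    return SmokingRecord(latest.year, peak, latest.years_since_quit)


# Hypothetical patient whose documented pack-years dropped between visits.
records = [
    SmokingRecord(year=2012, pack_years=35.0, years_since_quit=None),
    SmokingRecord(year=2016, pack_years=20.0, years_since_quit=2.0),
]
age = 60

last = last_documented(records)
hist = history_conscious(records)
print(nlst_eligible(age, last.pack_years, last.years_since_quit))
print(nlst_eligible(age, hist.pack_years, hist.years_since_quit))
```

In this made-up example the two retrieval strategies disagree on eligibility, which is the kind of sensitivity to documentation history that a hard threshold can exhibit.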

Results: The PLCOM2012 and LCRAT models had AUCs of 0.71 (95% CI, 0.69 to 0.73) and 0.72 (95% CI, 0.70 to 0.74), respectively. Compared with the NLST criteria, PLCOM2012 had a significantly greater time-dependent sensitivity (69.9% vs. 64.5%, p < .01) and specificity (58.3% vs. 56.4%, p < .001). Unlike the NLST criteria, the performances of the PLCOM2012 and LCRAT models were not prone to historical variability in smoking exposure documentation.
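
For readers unfamiliar with the metrics reported above, the sketch below shows how discrimination (AUC) and sensitivity/specificity can be computed for a binary outcome. It is a simplification under stated assumptions: the study reports time-dependent measures against time to diagnosis, whereas this example ignores follow-up time and censoring. The function names, risk scores, and the 0.25 threshold are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(flags: Sequence[bool],
                            labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of a yes/no screening decision.
    Assumes at least one case and one non-case are present."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y == 1)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y == 1)
    tn = sum(1 for f, y in zip(flags, labels) if not f and y == 0)
    fp = sum(1 for f, y in zip(flags, labels) if f and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up model risk estimates and outcomes (1 = lung cancer diagnosed).
risk = [0.9, 0.8, 0.3, 0.2, 0.1]
outcome = [1, 0, 1, 0, 0]

model_auc = auc(risk, outcome)

# Dichotomize the model at a hypothetical risk threshold.
flags: List[bool] = [r >= 0.25 for r in risk]
sens, spec = sensitivity_specificity(flags, outcome)
print(round(model_auc, 3), round(sens, 3), round(spec, 3))
```

A model's continuous risk score can be evaluated by AUC directly, while comparing it with dichotomous criteria such as NLST requires choosing a threshold first, as the last step illustrates.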

Conclusions: Despite the inaccuracies of EHR-documented smoking histories, leveraging model-based lung cancer risk estimation may be a reasonable strategy for screening, and it offers greater value than the NLST criteria in the same setting.

Implications: EHRs are potentially well suited to aid in the risk-based selection of lung cancer screening candidates, but healthcare providers and systems may elect not to leverage EHR data due to prior work that has shown limitations in structured smoking exposure data quality. Our findings suggest that despite potential inaccuracies in the underlying EHR data, screening approaches that use multivariable models may perform significantly better than approaches that rely on simpler age and exposure-based criteria. These results should encourage providers to consider using pre-existing smoking exposure data with a model-based approach to guide lung cancer screening practices.

MeSH terms

Aged
Early Detection of Cancer*
Humans
Lung Neoplasms* / diagnosis
Lung Neoplasms* / epidemiology
Mass Screening
Middle Aged
Retrospective Studies
Risk Assessment
Smoking
Tomography, X-Ray Computed