Objectives: To validate and compare the performance of the Brock model and Lung CT Screening Reporting and Data System (Lung-RADS) on nodules detected by baseline CT screening.
Methods: We performed a secondary analysis of the Korean Lung Cancer Screening Project (K-LUCAS; ClinicalTrials.gov, NCT03394703), a nationwide, multicenter, prospective cohort study. From April 2017 to December 2018, low-dose CT screening was performed on high-risk subjects. Discrimination and calibration of Brock models 2a and 2b (i.e., full model without and with spiculation, respectively) were assessed, and discrimination was compared with that of Lung-RADS, which utilized subjective assessment categories 2b (b stands for benign) and 4X.
Results: Of the 13,150 subjects, 4578 were eligible (median age 62 years; 4458 men; 9929 nodules including 40 lung cancers). Areas under the receiver operating characteristic curve were 0.96 (95% confidence interval [CI] 0.92-0.99) for Brock model 2a, 0.96 (95% CI 0.92-0.99) for Brock model 2b, and 0.95 (95% CI 0.91-0.99) for Lung-RADS (p = 0.32 and p = 0.34, respectively). At an equivalent cutoff of 5%, Brock model 2b (sensitivity 87.5% [35/40]; specificity 93.6% [9259/9889]; positive predictive value [PPV] 5.3% [35/665]; negative predictive value [NPV] 99.9% [9259/9264]) and Lung-RADS (sensitivity 87.5% [35/40]; specificity 93.3% [9222/9889]; PPV 5.0% [35/702]; NPV 99.9% [9222/9227]) performed similarly well (all p > 0.05). The calibration performance of both Brock models 2a and 2b was poor (both p < 0.001).
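The per-nodule accuracy figures reported in the Results follow from four counts per test. As a minimal sketch (not the authors' analysis code), the Python below rebuilds each 2x2 table from the reported numerators and denominators and recomputes sensitivity, specificity, PPV, and NPV. The names `Confusion` and `confusion_from_counts` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table for a binary test evaluated per nodule (hypothetical helper)."""
    tp: int  # cancers classified positive
    fn: int  # cancers classified negative
    tn: int  # benign nodules classified negative
    fp: int  # benign nodules classified positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def confusion_from_counts(n_cancer: int, n_benign: int,
                          true_pos: int, true_neg: int) -> Confusion:
    """Derive the false negatives and false positives from the totals."""
    return Confusion(tp=true_pos, fn=n_cancer - true_pos,
                     tn=true_neg, fp=n_benign - true_neg)


# Counts taken from the Results: 9929 nodules, 40 of them lung cancers.
N_CANCER = 40
N_BENIGN = 9929 - N_CANCER  # 9889 benign nodules

brock_2b = confusion_from_counts(N_CANCER, N_BENIGN, true_pos=35, true_neg=9259)
lung_rads = confusion_from_counts(N_CANCER, N_BENIGN, true_pos=35, true_neg=9222)

for name, c in (("Brock 2b", brock_2b), ("Lung-RADS", lung_rads)):
    print(f"{name}: sensitivity {c.sensitivity:.1%}, "
          f"specificity {c.specificity:.1%}, "
          f"PPV {c.ppv:.1%}, NPV {c.npv:.1%}")
```

The low PPV alongside a high NPV reflects the low prevalence of cancer among the nodules (40 of 9929): even at a specificity above 93%, false positives far outnumber true positives.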
Conclusions: Lung-RADS, when reinforced with visual assessment-based categories, has a similar diagnostic performance to the Brock model for baseline CT scans.
Key points:
• Brock model 2b and Lung CT Screening Reporting and Data System (Lung-RADS) demonstrated a similar discrimination performance for lung cancer in the baseline CT screening (areas under the receiver operating characteristic curve 0.96 vs. 0.95; p = 0.34).
• When visual assessment-based categories were removed from Lung-RADS, specificity and positive predictive value were lower than those of Brock model 2b (p = 0.001 and p = 0.02, respectively).
• The Brock model showed poor calibration (p < 0.001).
Keywords: Diagnostic screening programs; Early detection of cancer; Lung neoplasms; Multidetector computed tomography; Statistical models.