Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography

Peter G Mikhael; Jeremy Wohlwend; Adam Yala; Ludvig Karstens; Justin Xiang; Angelo K Takigami; Patrick P Bourgouin; PuiYee Chan; Sofiane Mrah; Wael Amayri; Yu-Hsiang Juan; Cheng-Ta Yang; Yung-Liang Wan; Gigin Lin; Lecia V Sequist; Florian J Fintelmann; Regina Barzilay

doi:10.1200/JCO.22.01345

J Clin Oncol. 2023 Apr 20;41(12):2191-2200. doi: 10.1200/JCO.22.01345. Epub 2023 Jan 12.

Authors

Peter G Mikhael¹,², Jeremy Wohlwend¹,², Adam Yala¹,², Ludvig Karstens¹,², Justin Xiang¹,², Angelo K Takigami³,⁴, Patrick P Bourgouin³,⁴, PuiYee Chan⁵, Sofiane Mrah⁴, Wael Amayri⁴, Yu-Hsiang Juan⁶,⁷, Cheng-Ta Yang⁶,⁸, Yung-Liang Wan⁶,⁷, Gigin Lin⁶,⁷, Lecia V Sequist³,⁵, Florian J Fintelmann³,⁴, Regina Barzilay¹,²

Affiliations

¹ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.
² Jameel Clinic, Massachusetts Institute of Technology, Cambridge, MA.
³ Harvard Medical School, Boston, MA.
⁴ Department of Radiology, Massachusetts General Hospital, Boston, MA.
⁵ Department of Medicine, Massachusetts General Hospital, Boston, MA.
⁶ Chang Gung University, Taoyuan, Taiwan.
⁷ Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
⁸ Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan.

Abstract

Purpose: Low-dose computed tomography (LDCT) for lung cancer screening is effective, although most eligible people are not being screened. Tools that provide personalized future cancer risk assessment could focus approaches toward those most likely to benefit. We hypothesized that a deep learning model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data.

Methods: We developed a model called Sybil using LDCTs from the National Lung Screening Trial (NLST). Sybil requires only one LDCT and does not require clinical data or radiologist annotations; it can run in real time in the background on a radiology reading station. Sybil was validated on three independent data sets: a held-out set of 6,282 LDCTs from NLST participants, 8,821 LDCTs from Massachusetts General Hospital (MGH), and 12,280 LDCTs from Chang Gung Memorial Hospital (CGMH, which included people with a range of smoking history including nonsmokers).

Results: Sybil achieved area under the receiver-operator curves for lung cancer prediction at 1 year of 0.92 (95% CI, 0.88 to 0.95) on NLST, 0.86 (95% CI, 0.82 to 0.90) on MGH, and 0.94 (95% CI, 0.91 to 1.00) on CGMH external validation sets. Concordance indices over 6 years were 0.75 (95% CI, 0.72 to 0.78), 0.81 (95% CI, 0.77 to 0.85), and 0.80 (95% CI, 0.75 to 0.86) for NLST, MGH, and CGMH, respectively.
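For readers less familiar with the two metrics reported above, the following is a minimal Python sketch of how an area under the receiver-operator curve and a concordance index can be computed from risk scores. It is illustrative only: the function names and the toy data are hypothetical, the abstract does not state which concordance estimator the authors used (Harrell's C is assumed here), and none of this is the Sybil code or data.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harrell_c_index(times, events, risks):
    """Harrell's C (assumed estimator): among comparable pairs, the
    fraction in which the subject with the earlier observed event
    was assigned the higher risk. Pairs with tied times are skipped
    here for simplicity."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1, 0, 1]            # cancer within 1 year (1) or not (0)
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

toy_times = [1, 2, 3, 4, 6]                # years to diagnosis or last follow-up
toy_events = [1, 1, 0, 1, 0]               # 1 = diagnosed, 0 = censored
toy_risks = [0.9, 0.6, 0.7, 0.3, 0.1]

auc = roc_auc(toy_labels, toy_scores)
c_index = harrell_c_index(toy_times, toy_events, toy_risks)
```

Both metrics range from 0 to 1, with 0.5 corresponding to chance-level ranking. The AUC evaluates discrimination at a fixed horizon (here, 1 year), whereas the concordance index summarizes ranking across the whole follow-up period while accounting for censoring.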

Conclusion: Sybil can accurately predict an individual's future lung cancer risk from a single LDCT scan to further enable personalized screening. Future study is required to understand Sybil's clinical applications. Our model and annotations are publicly available.

[Media: see text].

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning*
Early Detection of Cancer / methods
Humans
Lung
Lung Neoplasms* / diagnostic imaging
Mass Screening / methods
Tomography, X-Ray Computed