Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning

Jordan Wong; Allan Fong; Nevin McVicar; Sally Smith; Joshua Giambattista; Derek Wells; Carter Kolbeck; Jonathan Giambattista; Lovedeep Gondara; Abraham Alexander

doi:10.1016/j.radonc.2019.10.019

Radiother Oncol. 2020 Mar:144:152-158. doi: 10.1016/j.radonc.2019.10.019. Epub 2019 Dec 5.

Affiliations

¹ BC Cancer - Vancouver Center, Canada.
² BC Cancer - Vancouver Center, Canada.
³ BC Cancer - Vancouver Center, Canada.
⁴ BC Cancer - Victoria Center, Canada.
⁵ Saskatchewan Cancer Agency, Regina, Canada; Limbus AI Inc., Regina, Canada.
⁶ BC Cancer - Victoria Center, Canada.
⁷ Limbus AI Inc., Regina, Canada.
⁸ Limbus AI Inc., Regina, Canada.
⁹ BC Cancer - Vancouver Center, Canada.
¹⁰ BC Cancer - Victoria Center, Canada.

PMID: 31812930
DOI: 10.1016/j.radonc.2019.10.019

Abstract

Background: Deep learning-based auto-segmented contours (DC) aim to alleviate labour-intensive contouring of organs at risk (OAR) and clinical target volumes (CTV). Most previous DC validation studies have a limited number of expert observers for comparison and/or use a validation dataset related to the training dataset. We determined whether DC models are comparable to Radiation Oncologist (RO) inter-observer variability on an independent dataset.

Methods: Expert contours (EC) were created by multiple ROs for central nervous system (CNS), head and neck (H&N), and prostate radiotherapy (RT) OARs and CTVs. DCs were generated using deep learning-based auto-segmentation software trained by a single RO on publicly available data. Contours were compared using Dice Similarity Coefficient (DSC) and 95% Hausdorff distance (HD).
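
For readers unfamiliar with the two metrics named in the Methods, the following is a minimal illustrative sketch, not the study's implementation. It assumes each contour is represented as a set of voxel coordinates, computes distances over all voxels rather than extracted surface points, and uses a nearest-rank percentile; the function names (`dice`, `hd95`, `percentile`) and the toy contours are hypothetical. Published tools differ in these details, so values from this sketch may not match theirs.

```python
import math


def dice(a, b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumption: two empty contours count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def percentile(values, pct):
    """Nearest-rank percentile (an assumed convention; others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(a, b):
    """95% Hausdorff distance: the larger of the two directed
    95th-percentile nearest-neighbour distances."""
    if not a or not b:
        raise ValueError("both contours must be non-empty")
    return max(
        percentile(directed_distances(a, b), 95),
        percentile(directed_distances(b, a), 95),
    )


# Toy example: a 4x4 "expert" square and an "auto" square shifted by one voxel.
expert = {(x, y) for x in range(4) for y in range(4)}
auto = {(x + 1, y) for x in range(4) for y in range(4)}

print(dice(expert, auto))  # 0.75
print(hd95(expert, auto))  # 1.0
```

In this toy case the two squares share 12 of their 16 voxels each, giving a DSC of 0.75, and the unmatched column lies one voxel away, giving a 95% HD of 1.0 voxel. Real contours would be compared in physical units (for example millimetres) after accounting for voxel spacing.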

Results: Sixty planning CT scans had 2-4 ECs, for a total of 60 CNS, 53 H&N, and 50 prostate RT contour sets. The mean DC and EC contouring times were 0.4 vs 7.7 min for CNS, 0.6 vs 26.6 min for H&N, and 0.4 vs 21.3 min for prostate RT contours. There were minimal differences in DSC and 95% HD involving DCs for OAR comparisons, but more noticeable differences for CTV comparisons.

Conclusions: The accuracy of DCs trained by a single RO is comparable to expert inter-observer variability for the RT planning contours in this study. Use of deep learning-based auto-segmentation in clinical practice will likely lead to significant benefits to RT planning workflow and resources.

Keywords: Machine learning; Radiotherapy.

MeSH terms

Deep Learning*
Head and Neck Neoplasms*
Humans
Male
Observer Variation
Organs at Risk
Radiotherapy Planning, Computer-Assisted