Background and purpose: Tools for auto-segmentation in radiotherapy are widely available, but guidelines for clinical implementation are missing. The goal was to develop a workflow for evaluating the performance of three commercial auto-segmentation tools and to select one candidate for clinical implementation.
Materials and methods: One hundred patients across six treatment sites (brain, head-and-neck, thorax, abdomen, and pelvis) were included. Three sets of AI-based contours for organs-at-risk (OARs) generated by three software tools, together with manually drawn expert contours, were blindly rated for contouring accuracy. The Dice similarity coefficient (DSC), the Hausdorff distance, and a dose/volume evaluation based on recalculation of the original treatment plan were assessed. Statistically significant differences were tested using the Kruskal-Wallis test and the post-hoc Dunn test with Bonferroni correction.
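To make the evaluation metrics concrete, a minimal Python sketch is given below. It assumes binary OAR masks on a common voxel grid and per-structure score lists for the three tools; the function names, variable names, example values, and the use of the scikit-posthocs package for the Dunn test are illustrative assumptions, not part of the published workflow.

```python
# Illustrative sketch only: computes the DSC and the symmetric Hausdorff distance
# between two binary OAR masks, then compares per-structure scores of three tools
# with a Kruskal-Wallis test followed by a Bonferroni-corrected Dunn post-hoc test.
# Mask shapes, voxel spacing handling, and example values are assumptions.
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from scipy.stats import kruskal
import scikit_posthocs as sp  # external package: scikit-posthocs

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for boolean masks a and b."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance between two non-empty masks, scaled by voxel spacing."""
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# Hypothetical per-structure DSC scores for the three tools (placeholder values).
dsc_tool_a = [0.82, 0.79, 0.85, 0.77]
dsc_tool_b = [0.74, 0.76, 0.78, 0.71]
dsc_tool_c = [0.73, 0.75, 0.72, 0.74]

# Global test for any difference between the three tools.
stat, p = kruskal(dsc_tool_a, dsc_tool_b, dsc_tool_c)
print(f"Kruskal-Wallis: H = {stat:.2f}, p = {p:.4f}")

# Pairwise post-hoc comparisons with Bonferroni correction.
p_matrix = sp.posthoc_dunn([dsc_tool_a, dsc_tool_b, dsc_tool_c], p_adjust="bonferroni")
print(p_matrix)
```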
Results: The mean DSC compared to expert contours, averaged over all OARs, was 0.80 ± 0.10, 0.75 ± 0.10, and 0.74 ± 0.11 for the three software tools. The physicians' ratings identified equivalent or superior performance of some AI-based contours compared with manual contours in the head (eye, lens, optic nerve, brain, chiasm), thorax (e.g., heart and lungs), and abdomen and pelvis (e.g., kidney, femoral head). For some OARs, the AI models provided results requiring only minor corrections. Bowel bag and stomach contours were not fit for direct use. In the interdisciplinary discussion, the physicians' rating was considered the most relevant criterion.
Conclusion: A comprehensive method for evaluation and clinical implementation of commercially available auto-segmentation software was developed. The in-depth analysis yielded clear instructions for clinical use within the radiotherapy department.
Keywords: Auto-segmentation; Deep Learning; Radiotherapy; Segmentation.