Expert-centered Evaluation of Deep Learning Algorithms for Brain Tumor Segmentation

Katharina V Hoebel; Christopher P Bridge; Sara Ahmed; Oluwatosin Akintola; Caroline Chung; Raymond Y Huang; Jason M Johnson; Albert Kim; K Ina Ly; Ken Chang; Jay Patel; Marco Pinho; Tracy T Batchelor; Bruce R Rosen; Elizabeth R Gerstner; Jayashree Kalpathy-Cramer

doi:10.1148/ryai.220231

Radiol Artif Intell. 2024 Jan;6(1):e220231. doi: 10.1148/ryai.220231.

Affiliation

¹ From the Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology (K.V.H., C.P.B., A.K., K.I.L., K.C., J.P., B.R.R., E.R.G., J.K.C.), and Stephen E. and Catherine Pappas Center for Neuro-Oncology (O.A., A.K., K.I.L., E.R.G.), Massachusetts General Hospital, 149 13th St, Charlestown, MA 02129; Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Mass (K.V.H., K.C., J.P.); MGH and BWH Center for Clinical Data Science, Boston, Mass (C.P.B., J.K.C.); Department of Radiation Oncology, Division of Radiation Oncology (S.A., C.C.); Department of Diagnostic Radiology, Division of Diagnostic Imaging (C.C.), and Department of Neuroradiology (J.M.J.), Division of Diagnostic Imaging, The University of Texas MD Anderson Cancer Center, Houston, Tex; Departments of Radiology (R.Y.H.) and Neurology (T.T.B.), Brigham and Women's Hospital, Boston, Mass; Department of Radiology and Advanced Imaging Research Center, University of Texas Southwestern Medical Center, Dallas, Tex (M.P.); and Department of Ophthalmology, University of Colorado Anschutz Medical Campus, Aurora, Colo (J.K.C.).

Abstract

Purpose: To present results from a literature survey on practices in deep learning segmentation algorithm evaluation and perform a study on expert quality perception of brain tumor segmentation.

Materials and Methods: A total of 180 articles reporting on brain tumor segmentation algorithms were surveyed for the reported quality evaluation. Additionally, ratings of segmentation quality on a four-point scale were collected from medical professionals for 60 brain tumor segmentation cases.

Results: In the surveyed articles, Dice score, sensitivity, and Hausdorff distance were the metrics most often used to report segmentation performance. Notably, only 2.8% of the articles included clinical experts' evaluation of segmentation quality. The experimental results revealed a low interrater agreement (Krippendorff α, 0.34) in experts' segmentation quality perception. Furthermore, the correlations between the ratings and commonly used quantitative quality metrics were low (Kendall tau between Dice score and mean rating, 0.23; Kendall tau between Hausdorff distance and mean rating, 0.51), with large variability among the experts.

Conclusion: The results demonstrate that quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences, and existing metrics do not capture the clinical perception of segmentation quality.

Keywords: Brain Tumor Segmentation, Deep Learning Algorithms, Glioblastoma, Cancer, Machine Learning

Clinical trial registration nos. NCT00756106 and NCT00662506

Supplemental material is available for this article.

© RSNA, 2023.
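For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how a Dice score and a Kendall tau rank correlation can be computed. It is not the authors' code: the function names, the tie-corrected (tau-b) variant, and every number in the example are assumptions chosen for illustration, not data from the study.

```python
from itertools import combinations
from math import sqrt


def dice_score(pred, ref):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation (the variant that corrects for ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pair tied in both variables: excluded from both factors
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Synthetic toy masks (hypothetical, flattened to one dimension).
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
overlap = dice_score(predicted, reference)

# Synthetic per-case Dice scores and mean expert ratings (hypothetical).
dice_per_case = [0.9, 0.8, 0.7, 0.6]
mean_rating = [4.0, 3.0, 3.5, 2.0]
tau = kendall_tau_b(dice_per_case, mean_rating)
```

A tau near 1 would mean the metric ranks cases in almost the same order as the raters do. The low values reported in the abstract indicate that the metric-based ranking and the experts' ranking often disagree.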

Publication types

Clinical Trial

MeSH terms

Algorithms
Benchmarking
Brain Neoplasms* / diagnostic imaging
Deep Learning*
Glioblastoma* / diagnostic imaging
Humans
