Purpose: To investigate the effect of data quality and quantity on the performance of deep learning (DL) models for dose prediction in intensity-modulated radiotherapy (IMRT) of esophageal cancer.
Material and methods: Two databases were used: a variable database (VarDB) with 56 clinical cases extracted retrospectively, including user-dependent variability in delineation and planning as well as different machines and beam configurations; and a homogenized database (HomDB), created to reduce this variability by re-contouring and re-planning all patients with a fixed class-solution protocol. Experiment 1 analysed the user-dependent variability, using 26 patients planned with the same machine and beam setup (E26-VarDB versus E26-HomDB). Experiment 2 increased the training set in increments of 10 patients (E16, E26, E36, E46, and E56) for both databases. Model evaluation metrics were the mean absolute error (MAE) for selected dose-volume metrics and the global MAE for all body voxels.
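As an illustration of the evaluation metrics named above, the following minimal Python sketch computes a global MAE over body voxels and three conventional dose-volume metrics (Dmean, Vx and Dx). It is not the authors' implementation: the function names, the flat-list representation of dose grids and masks, the assumption of equal-volume voxels, and the nearest-rank definition of Dx are all assumptions made here for clarity.

```python
import math
from statistics import mean


def _inside(dose, mask):
    """Return the dose values (Gy) of voxels where the mask is truthy."""
    return [d for d, m in zip(dose, mask) if m]


def global_mae(pred, ref, body_mask):
    """Mean absolute dose difference (Gy) over all voxels inside the body."""
    return mean(abs(p - r) for p, r, m in zip(pred, ref, body_mask) if m)


def d_mean(dose, mask):
    """Mean dose (Gy) in a structure, e.g. Dmean-heart."""
    return mean(_inside(dose, mask))


def v_x(dose, mask, threshold_gy):
    """Percentage of the structure receiving at least threshold_gy, e.g. V5-lungs."""
    values = _inside(dose, mask)
    return 100.0 * sum(d >= threshold_gy for d in values) / len(values)


def d_x(dose, mask, volume_percent):
    """Dose (Gy) received by at least volume_percent of the structure, e.g. D95-PTV.

    Assumption: nearest-rank definition on equal-volume voxels.
    """
    values = sorted(_inside(dose, mask), reverse=True)
    rank = max(1, math.ceil(volume_percent * len(values) / 100))
    return values[rank - 1]


def metric_mae(pred_values, ref_values):
    """MAE of one dose-volume metric across a set of test patients."""
    return mean(abs(p - r) for p, r in zip(pred_values, ref_values))
```

For each test patient, a metric such as `d_x(dose, ptv_mask, 95)` would be evaluated once on the predicted dose and once on the reference dose, and `metric_mae` would then average the absolute differences over patients.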
Results: In Experiment 1, E26-HomDB reduced the MAE for the considered dose-volume metrics compared to E26-VarDB (e.g. reductions of 0.2 Gy for D95-PTV, 1.2 Gy for Dmean-heart and 3.3% for V5-lungs). In Experiment 2, increasing the database size slightly improved performance for the HomDB models (e.g. a decrease in global MAE of 0.13 Gy for E56-HomDB versus E26-HomDB), but increased the error for the VarDB models (e.g. an increase in global MAE of 0.20 Gy for E56-VarDB versus E26-VarDB).
Conclusion: A small database may suffice to obtain good DL prediction performance, provided that homogeneous training data are used. Data variability reduces the performance of DL models, an effect that becomes more pronounced as the training set grows.
Keywords: Automatic planning; Deep learning; Esophageal cancer; IMRT; Radiotherapy.
Copyright © 2021 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.