Auto-segmentation of primary tumors in oropharyngeal cancer using PET/CT images is an unmet need that has the potential to improve radiation oncology workflows. In this study, we develop a series of deep learning models based on a 3D residual U-Net (ResUNet) architecture that segment oropharyngeal tumors with high performance, as demonstrated through internal and external validation on large-scale datasets (training size = 224 patients, testing size = 101 patients) as part of the 2021 HECKTOR Challenge. Specifically, we use ResUNet models with either 256 or 512 bottleneck layer channels, which achieve an internal validation (10-fold cross-validation) mean Dice similarity coefficient (DSC) of up to 0.771 and a median 95% Hausdorff distance (95% HD) as low as 2.919 mm. We employ label fusion ensemble approaches, including Simultaneous Truth and Performance Level Estimation (STAPLE) and a voxel-level threshold approach based on majority voting (AVERAGE), to generate consensus segmentations on the test data by combining the segmentations produced by the different models trained during cross-validation. Our best-performing ensembling approach (256 channels, AVERAGE) achieves a mean DSC of 0.770 and a median 95% HD of 3.143 mm in independent external validation on the test set. Our test-set DSC and 95% HD are within 0.01 and 0.06 mm, respectively, of those of the top-ranked model in the competition. The concordance of internal and external validation results suggests that our models are robust and generalize well to unseen PET/CT data. We propose that ResUNet models coupled with label fusion ensembling approaches are promising candidates for auto-segmentation of oropharyngeal primary tumors on PET/CT. Future investigations should identify the combination of channel configurations and label fusion strategies that maximizes segmentation performance.
Keywords: Auto-contouring; CT; Deep learning; Head and neck cancer; Oropharyngeal cancer; PET; Tumor segmentation.
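As an illustration of the voxel-level majority-voting fusion (AVERAGE) and the DSC metric named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each cross-validation model yields a binary mask of identical shape; the function names `average_fusion` and `dice`, the 0.5 threshold, the strict inequality (so exact ties are excluded), and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
import numpy as np

def average_fusion(masks, threshold=0.5):
    """Fuse binary masks by voxel-wise averaging followed by thresholding.

    With threshold=0.5 and a strict inequality, a voxel is kept only when
    more than half of the models label it as tumor (assumed convention).
    """
    stacked = np.stack([np.asarray(m, dtype=np.float32) for m in masks], axis=0)
    mean_vote = stacked.mean(axis=0)  # fraction of models voting "tumor"
    return (mean_vote > threshold).astype(np.uint8)

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

# Toy example: three hypothetical model outputs on a 1x1x4 volume.
a = np.array([[[1, 1, 0, 0]]])
b = np.array([[[1, 0, 1, 0]]])
c = np.array([[[1, 1, 0, 0]]])
truth = np.array([[[1, 1, 1, 0]]])

fused = average_fusion([a, b, c])
score = dice(fused, truth)
```

In practice the masks would be full 3D volumes resampled to a common grid. STAPLE, the other fusion method mentioned, instead weights each model by an estimated performance level and is not shown here.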