Deep learning (DL) tools developed on adult data sets may not generalize well to pediatric patients, posing potential safety risks. We evaluated the performance of TotalSegmentator, a state-of-the-art adult-trained CT organ segmentation model, on a subset of organs in a pediatric CT dataset and explored optimization strategies to improve pediatric segmentation performance. TotalSegmentator was retrospectively evaluated on abdominal CT scans from an external adult dataset (n = 300) and an external pediatric data set (n = 359). Generalizability was quantified by comparing Dice scores between adult and pediatric external data sets using Mann-Whitney U tests. Two DL optimization approaches were then evaluated: (1) 3D nnU-Net model trained on only pediatric data, and (2) an adult nnU-Net model fine-tuned on the pediatric cases. Our results show TotalSegmentator had significantly lower overall mean Dice scores on pediatric vs. adult CT scans (0.73 vs. 0.81, P < .001) demonstrating limited generalizability to pediatric CT scans. Stratified by organ, there was lower mean pediatric Dice score for four organs (P < .001, all): right and left adrenal glands (right adrenal, 0.41 [0.39-0.43] vs. 0.69 [0.66-0.71]; left adrenal, 0.35 [0.32-0.37] vs. 0.68 [0.65-0.71]); duodenum (0.47 [0.45-0.49] vs. 0.67 [0.64-0.69]); and pancreas (0.73 [0.72-0.74] vs. 0.79 [0.77-0.81]). Performance on pediatric CT scans improved by developing pediatric-specific models and fine-tuning an adult-trained model on pediatric images where both methods significantly improved segmentation accuracy over TotalSegmentator for all organs, especially for smaller anatomical structures (e.g., > 0.2 higher mean Dice for adrenal glands; P < .001).
Keywords: Age bias; Model generalizability; Multi-Organ image segmentation; Transfer learning.
© 2024. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.