Background: Medical image segmentation is an essential step in both clinical and research applications, and automated segmentation models, such as TotalSegmentator, have become ubiquitous. However, robust methods for validating the accuracy of these models remain limited, and manual inspection is often necessary before the segmentation masks produced by these models can be used.
Methods: To address this gap, we developed a novel validation framework for segmentation models that leverages data augmentation to assess model consistency. We produced segmentation masks for both the original and augmented scans and calculated alignment metrics between the mask of each original scan and the masks of its augmented versions.
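The consistency check described in the Methods can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the Dice coefficient stands in for the alignment metric, every augmented-scan mask has already been mapped back into the coordinate frame of the original scan, and the function names `dice` and `consistency_score` are hypothetical. It is not the authors' pipeline.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice overlap between two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks are empty: treat them as perfectly aligned.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def consistency_score(original_mask, augmented_masks):
    """Mean Dice between the original mask and each augmented-scan mask,
    plus the coefficient of variation of those scores.

    Assumes the augmented masks are already in the original scan's frame
    (i.e. any geometric augmentation has been inverted beforehand).
    """
    scores = np.array([dice(original_mask, m) for m in augmented_masks])
    mean = scores.mean()
    cv = scores.std() / mean if mean > 0 else float("inf")
    return mean, cv


# Toy example: a cube-shaped "organ" and two stand-in augmented masks.
original = np.zeros((8, 8, 8), dtype=bool)
original[2:6, 2:6, 2:6] = True
identical = original.copy()                # perfectly consistent prediction
shifted = np.roll(original, 1, axis=0)     # prediction off by one voxel

mean_dice, cv = consistency_score(original, [identical, shifted])
print(f"mean Dice = {mean_dice:.3f}, coefficient of variation = {cv:.3f}")
```

Under this reading, a high mean score with a low spread suggests that the model segments the scan consistently, which the abstract proposes as a proxy for segmentation quality when no ground truth is available.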
Results: Our results demonstrate a strong correlation between the segmentation quality of the original scan and the average alignment between the masks of the original and augmented CT scans. These results were further validated by supporting metrics, including the coefficient of variation and the average symmetric surface distance, indicating that agreement with augmented-scan segmentation masks is a valid proxy for segmentation quality.
Conclusions: Overall, our framework offers a pipeline for evaluating segmentation performance without relying on manually labeled ground truth data, establishing a foundation for future advancements in automated medical image analysis.
Keywords: AI; TotalSegmentator; augmentation; automated segmentation; evaluation; medical imaging.