Recent advancements in deep learning have significantly improved medical image segmentation. However, the generalization performance and potential risks of data-driven models remain insufficiently validated. Specifically, unrealistic segmentation predictions deviating from actual anatomical structures, known as a Seg-Hallucination, often occur in deep learning-based models. The Seg-Hallucinations can result in erroneous quantitative analyses and distort critical imaging biomarker information, yet effective audits or corrections to address these issues are rare. Therefore, we propose an automated Seg-Hallucination surveillance and correction (ASHSC) algorithm utilizing only 3D organ mask information derived from CT images without reliance on the ground truth. Two publicly available datasets were used in developing the ASHSC algorithm: 280 CT scans from the TotalSegmentator dataset for training and 274 CT scans from the Cancer Imaging Archive (TCIA) dataset for performance evaluation. The ASHSC algorithm utilizes a two-stage on-demand strategy with mesh-based convolutional neural networks and generative artificial intelligence. The segmentation quality level (SQ-level)-based surveillance stage was evaluated using the area under the receiver operating curve, sensitivity, specificity, and positive predictive value. The on-demand correction performance of the algorithm was assessed using similarity metrics: volumetric Dice score, volume error percentage, average surface distance, and Hausdorff distance. Average performance of the surveillance stage resulted in an AUROC of 0.94 ± 0.01, sensitivity of 0.82 ± 0.03, specificity of 0.90 ± 0.01, and PPV of 0.92 ± 0.01 for test dataset. After the on-demand refinement of the correction stage, all the four similarity metrics were improved compared to a single use of the AI-segmentation model. This study not only enhances the efficiency and reliability of handling the Seg-Hallucination but also eliminates the reliance on ground truth. The ASHSC algorithm offers intuitive 3D guidance for uncertainty regions, while maintaining manageable computational complexity. The SQ-level-based on-demand correction strategy adaptively minimizes uncertainties inherent in deep-learning-based organ masks and advances automated auditing and correction methodologies.
Keywords: AI audit; Seg-Hallucination; anomaly screening; segmentation; uncertainty.