The InterVision Framework: An Enhanced Fine-Tuning Deep Learning Strategy for Auto-Segmentation in Head and Neck

Byongsu Choi; Chris J Beltran; Sang Kyun Yoo; Na Hye Kwon; Jin Sung Kim; Justin Chunjoo Park

doi:10.3390/jpm14090979

The InterVision Framework: An Enhanced Fine-Tuning Deep Learning Strategy for Auto-Segmentation in Head and Neck

J Pers Med. 2024 Sep 15;14(9):979. doi: 10.3390/jpm14090979.

Authors

Byongsu Choi^{1

2

3}, Chris J Beltran¹, Sang Kyun Yoo^{2

3}, Na Hye Kwon^{2

3}, Jin Sung Kim^{2

3

4}, Justin Chunjoo Park¹

Affiliations

¹ Department of Radiation Oncology, Mayo Clinic, Jacksonville, FL 32224, USA.
² Yonsei Cancer Center, Department of Radiation Oncology, Yonsei Heavy Ion Therapy Research Institute, Yonsei University College of Medicine, Seoul 03722, Republic of Korea.
³ Medical Physics and Biomedical Engineering Lab (MPBEL), Yonsei University College of Medicine, Seoul 03722, Republic of Korea.
⁴ OncoSoft Inc., Seoul 03776, Republic of Korea.

Abstract

Adaptive radiotherapy (ART) workflows are increasingly adopted to achieve dose escalation and tissue sparing under dynamic anatomical conditions. However, recontouring and time constraints hinder the implementation of real-time ART workflows. Various auto-segmentation methods, including deformable image registration, atlas-based segmentation, and deep learning-based segmentation (DLS), have been developed to address these challenges. Despite the potential of DLS methods, clinical implementation remains difficult due to the need for large, high-quality datasets to ensure model generalizability. This study introduces an InterVision framework for segmentation. The InterVision framework can interpolate or create intermediate visuals between existing images to generate specific patient characteristics. The InterVision model is trained in two steps: (1) generating a general model using the dataset, and (2) tuning the general model using the dataset generated from the InterVision framework. The InterVision framework generates intermediate images between existing patient image slides using deformable vectors, effectively capturing unique patient characteristics. By creating a more comprehensive dataset that reflects these individual characteristics, the InterVision model demonstrates the ability to produce more accurate contours compared to general models. Models are evaluated using the volumetric dice similarity coefficient (VDSC) and the Hausdorff distance 95% (HD95%) for 18 structures in 20 test patients. As a result, the Dice score was 0.81 ± 0.05 for the general model, 0.82 ± 0.04 for the general fine-tuning model, and 0.85 ± 0.03 for the InterVision model. The Hausdorff distance was 3.06 ± 1.13 for the general model, 2.81 ± 0.77 for the general fine-tuning model, and 2.52 ± 0.50 for the InterVision model. The InterVision model showed the best performance compared to the general model. The InterVision framework presents a versatile approach adaptable to various tasks where prior information is accessible, such as in ART settings. This capability is particularly valuable for accurately predicting complex organs and targets that pose challenges for traditional deep learning algorithms.

Keywords: ART; Swin-Unet; auto-segmentation; deep learning; deform vector; head and neck; transformer.

Abstract

Grants and funding