Development and Validation of a Modality-Invariant 3D Swin U-Net Transformer for Liver and Spleen Segmentation on Multi-Site Clinical Bi-parametric MR Images

J Imaging Inform Med. 2024 Dec 20. doi: 10.1007/s10278-024-01362-w. Online ahead of print.

Abstract

To develop and validate a modality-invariant Swin U-Net Transformer (Swin UNETR) deep learning model for liver and spleen segmentation on abdominal T1-weighted (T1w) or T2-weighted (T2w) MR images from multiple institutions, in pediatric and adult patients with known or suspected chronic liver disease. In this IRB-approved retrospective study, clinical abdominal axial T1w and T2w MR images from pediatric and adult patients were retrieved from four study sites: Cincinnati Children's Hospital Medical Center (CCHMC), New York University (NYU), University of Wisconsin (UW), and University of Michigan / Michigan Medicine (UM). The whole liver and spleen were manually delineated to serve as ground-truth masks. We developed a 3D Swin UNETR with a modality-invariant training strategy, in which each patient's T1w and T2w MR images were treated as separate training samples. We conducted both internal and external validation experiments. A total of 241 T1w and 339 T2w MR sequences from 304 patients (age [mean ± standard deviation], 31.8 ± 20.3 years; 132 [43%] female) were included for model development. For liver segmentation, the Swin UNETR achieved a Dice similarity coefficient (DSC) of 0.95 ± 0.02 on T1w images and 0.93 ± 0.05 on T2w images, significantly better than a modality-invariant U-Net model (0.90 ± 0.05, p < 0.001 and 0.90 ± 0.13, p < 0.001, respectively). For spleen segmentation, the Swin UNETR achieved a DSC of 0.88 ± 0.12 on T1w images and 0.93 ± 0.10 on T2w images, again significantly outperforming the modality-invariant U-Net model (0.80 ± 0.18, p = 0.001 and 0.88 ± 0.12, p = 0.002, respectively). Our study demonstrates that a modality-invariant Swin UNETR model can segment the liver and spleen on routinely collected clinical bi-parametric abdominal MR images from pediatric and adult patients.
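
The abstract's central design choice, treating each patient's T1w and T2w series as independent training samples for a single network, can be sketched with off-the-shelf components. The following is a minimal illustration, not the authors' code: it uses MONAI's SwinUNETR, and the file names, patch size, feature size, and label count are all assumptions for demonstration.

```python
# A hedged sketch of the modality-invariant training setup described in the
# abstract: one shared 3D Swin UNETR receives T1w and T2w volumes as separate,
# unlabeled-by-modality training samples. Paths and hyperparameters are
# illustrative, not taken from the paper.
import torch
from monai.networks.nets import SwinUNETR

# One model for both modalities: 1 input channel (a single MR series),
# 3 output channels (background, liver, spleen). Older MONAI versions also
# require an `img_size` argument; input patch sides must be divisible by 32.
model = SwinUNETR(
    in_channels=1,
    out_channels=3,
    feature_size=48,
)

# Modality-invariant sampling: the T1w and T2w series of the same patient are
# listed as separate (image, label) pairs, each with its own manually drawn
# mask, so the model never sees an explicit modality flag. File names below
# are hypothetical.
train_pairs = [
    {"image": "patient001_T1w.nii.gz", "label": "patient001_T1w_mask.nii.gz"},
    {"image": "patient001_T2w.nii.gz", "label": "patient001_T2w_mask.nii.gz"},
    # ... one entry per available series, pooled across all four sites
]

# Sanity check: a random 96^3 patch passes through the network.
x = torch.randn(1, 1, 96, 96, 96)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 3, 96, 96, 96])
```

The design choice this mirrors is that pooling both modalities into one training stream forces the network to learn contrast-independent organ features, rather than maintaining separate T1w and T2w models.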

Keywords: Deep learning; Medical image segmentation; Transformer.
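
For completeness, the evaluation metric reported throughout the abstract, the Dice similarity coefficient, is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The snippet below is a generic reference implementation, not the authors' evaluation code; array shapes and the empty-mask convention are assumptions.

```python
# Dice similarity coefficient (DSC) between two binary 3D masks,
# computed per organ (e.g., once for liver, once for spleen).
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks A (pred) and B (truth)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy example: two 4x4x4 masks of 32 voxels each, overlapping in 16 voxels,
# give DSC = 2*16 / (32 + 32) = 0.5.
a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True
b = np.zeros((4, 4, 4), dtype=bool); b[1:3] = True
print(round(dice(a, b), 3))  # 0.5
```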