Multi-scale segmentation squeeze-and-excitation UNet with conditional random field for segmenting lung tumor from CT images

Comput Methods Programs Biomed. 2022 Jul:222:106946. doi: 10.1016/j.cmpb.2022.106946. Epub 2022 Jun 8.

Abstract

Background and objective: Lung cancer is among the diseases with the highest morbidity and mortality rates worldwide. Automatic segmentation of lung tumors from CT images is of great significance. However, segmentation is challenging because tumors vary in shape and size and are surrounded by complicated tissues.

Methods: We propose a multi-scale segmentation squeeze-and-excitation UNet with a conditional random field (M-SegSEUNet-CRF) to automatically segment lung tumors from CT images. M-SegSEUNet-CRF employs a multi-scale strategy to address the problem of variable tumor size. Segmentation SE blocks embedded in a 3D UNet use a spatially adaptive attention mechanism to highlight tumor regions. A densely connected CRF is further added to delineate tumor boundaries at a detailed level. In total, 759 CT scans of patients with lung cancer were used to train and evaluate the M-SegSEUNet-CRF model (456 for training, 152 for validation, and 151 for testing). In addition, the public NSCLC-Radiomics and LIDC datasets were used to assess the generalizability of the proposed method. The role of the different modules in the M-SegSEUNet-CRF model was analyzed through ablation experiments, and performance was compared with that of UNet, its variants, and other state-of-the-art models.
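The abstract does not describe the internals of the segmentation SE block, so the following is only a minimal sketch of one common form of spatially adaptive attention, the spatial squeeze-and-excitation gate, applied to a 3D feature map. The function name `spatial_se`, the `(C, D, H, W)` layout, and the use of a plain weight vector in place of a learned 1x1x1 convolution are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def spatial_se(features, weights, bias=0.0):
    """Reweight a (C, D, H, W) feature map with a per-voxel attention map.

    `weights` has shape (C,) and stands in for a learned 1x1x1 convolution
    (hypothetical parameters, not the authors' trained values).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Squeeze the channel axis: one logit per voxel, shape (D, H, W).
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    # Sigmoid turns the logits into attention values in (0, 1).
    attention = 1.0 / (1.0 + np.exp(-logits))
    # Excite: scale every channel by the same spatial attention map.
    return features * attention[np.newaxis], attention


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 16, 16))  # toy feature map: 8 channels, 4x16x16 voxels
w = rng.normal(size=8)               # toy channel-mixing weights
y, attn = spatial_se(x, w)
```

In a trained network, voxels with attention values near 1 pass through almost unchanged while values near 0 are suppressed, which is one way a block of this kind could highlight tumor regions.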

Results: M-SegSEUNet-CRF achieved a Dice coefficient of 0.851 ± 0.071, an intersection over union (IoU) of 0.747 ± 0.102, a sensitivity of 0.827 ± 0.108, and a positive predictive value (PPV) of 0.900 ± 0.107. Without the multi-scale strategy, the Dice coefficient dropped to 0.820 ± 0.115; without the CRF, it dropped to 0.842 ± 0.082; and without both, it dropped to 0.806 ± 0.120. M-SegSEUNet-CRF achieved a higher Dice coefficient than 3D UNet (0.782 ± 0.115) and its variants (ResUNet, 0.797 ± 0.132; DenseUNet, 0.792 ± 0.111; and UNETR, 0.794 ± 0.130). Although performance declines slightly as tumor volume decreases, M-SegSEUNet-CRF retains a clear advantage over the other models compared.
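The four reported metrics have standard definitions in terms of true-positive (TP), false-positive (FP), and false-negative (FN) voxel counts. The sketch below uses those standard definitions; the function name `segmentation_metrics` and the toy masks are illustrative assumptions, and the abstract does not state how the authors implemented or averaged the metrics.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Return Dice, IoU, sensitivity and PPV for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    tp = int(np.count_nonzero(pred & truth))   # tumor voxels correctly predicted
    fp = int(np.count_nonzero(pred & ~truth))  # predicted tumor, actually background
    fn = int(np.count_nonzero(~pred & truth))  # tumor voxels that were missed

    def ratio(num, den):
        # Undefined when the denominator is zero (e.g. both masks empty).
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
    }


# Toy 3D example: an 8-voxel tumor and a 12-voxel prediction that fully covers it.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
pred = np.zeros_like(truth)
pred[1:3, 1:3, 1:4] = True

metrics = segmentation_metrics(pred, truth)
# TP = 8, FP = 4, FN = 0, so Dice = 0.8, IoU = 2/3, sensitivity = 1.0, PPV = 2/3.
```

For a single scan, IoU = Dice / (2 − Dice). Means taken over many scans need not satisfy this identity exactly, which is consistent with the reported mean Dice and mean IoU differing slightly from it.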

Conclusions: Our M-SegSEUNet-CRF model improves the segmentation ability of UNet through the multi-scale strategy and the spatially adaptive attention mechanism. The CRF enables a more precise delineation of tumor boundaries. The M-SegSEUNet-CRF model integrates these characteristics and demonstrates outstanding performance in lung tumor segmentation. It could also be extended to other segmentation problems in medical imaging.

Keywords: 3D-UNet; Adaptive attention; Conditional random field; Deep learning; Image segmentation; Lung tumor; Squeeze-and-excitation.

MeSH terms

  • Humans
  • Image Processing, Computer-Assisted / methods
  • Lung Neoplasms* / diagnostic imaging
  • Tomography, X-Ray Computed*
  • Tumor Burden