Improving Image Segmentation with Contextual and Structural Similarity

Xiaoyang Chen; Qin Liu; Hannah H Deng; Tianshu Kuang; Henry Hung-Ying Lin; Deqiang Xiao; Jaime Gateno; James J Xia; Pew-Thian Yap

doi:10.1016/j.patcog.2024.110489

Pattern Recognit. 2024 Aug:152:110489. doi: 10.1016/j.patcog.2024.110489. Epub 2024 Apr 9.

Authors

Xiaoyang Chen¹, Qin Liu², Hannah H Deng³, Tianshu Kuang³, Henry Hung-Ying Lin³, Deqiang Xiao¹, Jaime Gateno³,⁴, James J Xia³,⁴, Pew-Thian Yap¹

Affiliations

¹ Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA.
² Department of Computer Science, University of North Carolina, Chapel Hill, 27599, NC, USA.
³ Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA.
⁴ Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, 10065, NY, USA.

PMID: 38645435
PMCID: PMC11027435 (available on 2025-08-01)
DOI: 10.1016/j.patcog.2024.110489

Abstract

Deep learning models for medical image segmentation are usually trained with voxel-wise losses, e.g., cross-entropy loss, focusing on unary supervision without considering inter-voxel relationships. This oversight potentially leads to semantically inconsistent predictions. Here, we propose a contextual similarity loss (CSL) and a structural similarity loss (SSL) to explicitly and efficiently incorporate inter-voxel relationships for improved performance. The CSL promotes consistency in predicted object categories for each image sub-region compared to ground truth. The SSL enforces compatibility between the predictions of voxel pairs by computing pair-wise distances between them, ensuring that voxels of the same class are close together whereas those from different classes are separated by a wide margin in the distribution space. The effectiveness of the CSL and SSL is evaluated using a clinical cone-beam computed tomography (CBCT) dataset of patients with various craniomaxillofacial (CMF) deformities and a public pancreas dataset. Experimental results show that the CSL and SSL outperform state-of-the-art regional loss functions in preserving segmentation semantics.

Keywords: Cone-beam computed tomography; Image segmentation; Inter-voxel relationships; Pancreas segmentation.
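
The two ideas described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the abstract does not give the exact definitions, so every concrete choice here is an assumption made for illustration. These include the function names, the Euclidean distance between predicted distributions, the hinge margin, square non-overlapping sub-regions, max pooling to obtain region-level class presence, and binary cross-entropy as the comparison.

```python
import numpy as np


def structural_similarity_sketch(probs, labels, margin=1.0):
    """Contrastive-style pairwise loss (illustrative assumption, not the paper's SSL).

    probs:  (N, C) per-voxel class probabilities.
    labels: (N,) integer ground-truth classes.
    margin: hypothetical hinge margin for different-class pairs.
    """
    # Pairwise Euclidean distances between predicted distributions, shape (N, N).
    diff = probs[:, None, :] - probs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    # Same-class pairs are pulled together; different-class pairs are
    # penalized only while they are closer than the margin.
    pair_loss = np.where(same, dist ** 2, np.maximum(0.0, margin - dist) ** 2)
    # Average over off-diagonal pairs (a voxel is not paired with itself).
    off_diag = ~np.eye(len(labels), dtype=bool)
    return pair_loss[off_diag].mean()


def contextual_similarity_sketch(probs, labels, region=2, eps=1e-7):
    """Region-level class-presence loss on a 2D slice (illustrative assumption).

    probs:  (H, W, C) per-pixel class probabilities.
    labels: (H, W) integer ground truth.
    region: hypothetical side length of square, non-overlapping sub-regions.
    """
    h, w, c = probs.shape
    assert h % region == 0 and w % region == 0, "slice must tile evenly"
    blocks = (h // region, region, w // region, region, c)
    # Predicted presence of each class per sub-region: max probability in the block.
    pred_presence = probs.reshape(blocks).max(axis=(1, 3))
    # Ground-truth presence: 1 if the class occurs anywhere in the block.
    onehot = np.eye(c)[labels]
    true_presence = onehot.reshape(blocks).max(axis=(1, 3))
    # Binary cross-entropy between predicted and true presence.
    pred_presence = np.clip(pred_presence, eps, 1.0 - eps)
    bce = -(true_presence * np.log(pred_presence)
            + (1.0 - true_presence) * np.log(1.0 - pred_presence))
    return bce.mean()


if __name__ == "__main__":
    labels_2d = np.array([[0, 0, 1, 1]] * 4)
    perfect = np.eye(2)[labels_2d]          # one-hot predictions
    uniform = np.full((4, 4, 2), 0.5)       # uninformative predictions
    for name, p in [("perfect", perfect), ("uniform", uniform)]:
        ssl = structural_similarity_sketch(p.reshape(-1, 2), labels_2d.ravel())
        csl = contextual_similarity_sketch(p, labels_2d)
        print(name, ssl, csl)
```

In this sketch both terms are near zero when predictions match the ground truth and grow when predictions are uninformative. In practice such terms would be added to a voxel-wise loss and written in a differentiable framework; the pairwise term costs O(N²) memory, so it would be applied to sampled voxels rather than a full volume.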
