Breast cancer remains a major global health concern and a leading cause of mortality among women. Automatic segmentation of breast ultrasound images can substantially improve the accuracy of breast cancer diagnosis. Prior research has demonstrated the effectiveness of both convolutional neural networks (CNNs) and transformers for this task, and some studies combine the two, using the transformer's ability to model long-range dependencies to offset the limited receptive field of CNNs. Many of these hybrids, however, force transformer blocks into CNN architectures, which introduces inconsistencies in the feature extraction process and ultimately yields suboptimal performance on the complex task of medical image segmentation. This paper presents CSAU-Net, a cross-scale attention-guided U-Net: a combined CNN-transformer structure that leverages the local detail representation of CNNs and the ability of transformers to model long-range dependencies. To integrate global contextual information, we propose a cross-scale cross-attention transformer block embedded within the skip connections of the U-shaped network. To further strengthen segmentation, we incorporate a gated dilated convolution (GDC) module and a lightweight channel self-attention transformer (LCAT) on the encoder side. Extensive experiments on three open-source datasets demonstrate that CSAU-Net surpasses state-of-the-art techniques for segmenting ultrasound breast lesions.
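The abstract does not specify the internals of the cross-scale cross-attention block, so the following PyTorch sketch is only one plausible reading: queries derived from a coarser encoder scale attend to tokens from the adjacent finer scale inside a skip connection, and the attended context is fused back into the fine-scale feature map. The class name, channel arguments, and residual fusion are all hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossScaleCrossAttention(nn.Module):
    """Hypothetical cross-scale cross-attention for a U-Net skip connection:
    coarse-scale tokens (queries) attend to fine-scale tokens (keys/values)."""

    def __init__(self, fine_ch: int, coarse_ch: int, dim: int = 128, heads: int = 4):
        super().__init__()
        self.q = nn.Linear(coarse_ch, dim)        # queries from the coarse scale
        self.kv = nn.Linear(fine_ch, 2 * dim)     # keys and values from the fine scale
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, fine_ch)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine: (B, C_f, H, W); coarse: (B, C_c, H/2, W/2), H and W even.
        B, Cf, H, W = fine.shape
        f = fine.flatten(2).transpose(1, 2)       # (B, H*W, C_f) fine tokens
        c = coarse.flatten(2).transpose(1, 2)     # (B, H*W/4, C_c) coarse tokens
        q = self.q(c)
        k, v = self.kv(f).chunk(2, dim=-1)
        ctx, _ = self.attn(q, k, v)               # coarse queries attend to fine tokens
        # Map the attended context back to fine channels and resolution, then fuse.
        ctx = self.out(ctx).transpose(1, 2).reshape(B, Cf, H // 2, W // 2)
        ctx = F.interpolate(ctx, size=(H, W), mode="bilinear", align_corners=False)
        return fine + ctx                         # residual fusion into the skip path


# Usage on dummy feature maps from two adjacent encoder scales:
block = CrossScaleCrossAttention(fine_ch=64, coarse_ch=128)
fine = torch.randn(1, 64, 32, 32)
coarse = torch.randn(1, 128, 16, 16)
print(block(fine, coarse).shape)  # torch.Size([1, 64, 32, 32])
```

Under these assumptions, the block injects global context from the deeper scale into the skip connection while preserving the fine-scale spatial detail via the residual path; the paper's actual design may differ in token direction, normalization, or fusion.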
Keywords: Breast lesion segmentation; Convolutional neural network; Deep learning; Transformer; Ultrasound imaging.
© 2025. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.