Recently, researchers have introduced Transformers into medical image segmentation networks to encode long-range dependencies, which compensates for the weakness of convolutional neural networks (CNNs) in global context modeling and thus improves segmentation performance. However, pairwise attention modeling between redundant visual tokens imposes a heavy computational burden, so the efficiency of the Transformer still needs to be improved. In this paper, we therefore propose ATTransUNet, a Transformer-enhanced hybrid architecture based on adaptive tokens for ultrasound and histopathology image segmentation. In the encoding stage of ATTransUNet, we introduce an Adaptive Token Extraction Module (ATEM), which mines a small number of important visual tokens from the image for self-attention modeling, thereby reducing model complexity and improving segmentation accuracy. In the decoding stage, we introduce a Selective Feature Reinforcement Module (SFRM) to reinforce the representation of, and attention to, key tissues or pathological features. The proposed ATTransUNet is evaluated on three medical image segmentation datasets. The results show that ATTransUNet achieves better segmentation performance than previous state-of-the-art models, and that it is also competitive in terms of network parameters and computation.
Keywords: Feature selection; Histopathology image segmentation; Learnable tokens; Transformer; U-Net; Ultrasound image segmentation.
Copyright © 2022 Elsevier Ltd. All rights reserved.
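The abstract's efficiency argument, that self-attention over a few extracted tokens is cheaper than over all visual tokens, can be illustrated with a minimal NumPy sketch. The abstract does not specify how ATEM works internally, so the attention-pooling tokenizer below is an assumption chosen for illustration, not the authors' module. The names `extract_tokens`, `w_select`, and `pairwise_score_count`, the feature-map size, and the token count are all hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def extract_tokens(features, w_select):
    """Pool N position features into L tokens (illustrative stand-in for ATEM).

    features: (N, C) flattened feature map, one row per spatial position.
    w_select: (C, L) projection giving one spatial attention map per token
              (learned in a real network; random here).
    Returns an (L, C) array of tokens.
    """
    scores = features @ w_select        # (N, L) score of each position per token
    weights = softmax(scores, axis=0)   # normalise over the N positions
    return weights.T @ features         # (L, C) weighted sums of position features


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (simplification)."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)  # (L, L)
    return attn @ tokens


def pairwise_score_count(n_tokens):
    """Number of pairwise attention scores for n_tokens tokens."""
    return n_tokens * n_tokens


rng = np.random.default_rng(0)
N, C, L = 32 * 32, 64, 16               # assumed sizes: 32x32 map, 64 channels, 16 tokens
features = rng.standard_normal((N, C))
w_select = rng.standard_normal((C, L)) * 0.1

tokens = extract_tokens(features, w_select)
refined = self_attention(tokens)

print(tokens.shape, refined.shape)      # (16, 64) (16, 64)
print(pairwise_score_count(N))          # 1048576 scores over all positions
print(pairwise_score_count(L))          # 256 scores over the extracted tokens
```

Under these assumed sizes, attending over 16 extracted tokens needs 256 pairwise scores instead of 1,048,576 for the full 32×32 map, at the extra cost of the N×L pooling step, which grows only linearly in N.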