Background: In clinical practices, doctors usually need to synthesize several single-modality medical images for diagnosis, which is a time-consuming and costly process. With this background, multimodal medical image fusion (MMIF) techniques have emerged to synthesize medical images of different modalities, providing a comprehensive and objective interpretation of the lesion.
Purpose: Although existing MMIF approaches have shown promising results, they often overlook the importance of multiscale feature diversity and attention interaction, which are essential for superior visual outcomes. This oversight can lead to diminished fusion performance. To bridge the gaps, we introduce a novel approach that emphasizes the integration of multiscale features through a structured decomposition and attention interaction.
Methods: Our method first decomposes the source images into three distinct groups of multiscale features by stacking different numbers of diverse branch blocks. Then, to extract global and local information separately for each group of features, we designed the convolutional and Transformer block attention branch. These two attention branches make full use of channel and spatial attention mechanisms and achieve attention interaction, enabling the corresponding feature channels to fully capture local and global information and achieve effective inter-block feature aggregation.
Results: For the MRI-PET fusion type, MACAN achieves average improvements of 24.48%, 27.65%, 19.24%, 27.32%, 18.51%, and 10.33% over the compared methods in terms of Qcb, AG, SSIM, SF, Qabf, and VIF metrics, respectively. Similarly, for the MRI-SPECT fusion type, MACAN outperforms the compared methods with average improvements of 29.13%, 26.43%, 18.20%, 27.71%, 16.79%, and 10.38% in the same metrics. In addition, our method demonstrates promising results in segmentation experiments. Specifically, for the T2-T1ce fusion, it achieves a Dice coefficient of 0.60 and a Hausdorff distance of 15.15. Comparable performance is observed for the Flair-T1ce fusion, with a Dice coefficient of 0.60 and a Hausdorff distance of 13.27.
Conclusion: The proposed multiple attention channels aggregated network (MACAN) can effectively retain the complementary information from source images. The evaluation of MACAN through medical image fusion and segmentation experiments on public datasets demonstrated its superiority over the state-of-the-art methods, both in terms of visual quality and objective metrics. Our code is available at https://github.com/JasonWong30/MACAN.
Keywords: attention interaction; multimodal medical image fusion; multiscale features.
© 2024 American Association of Physicists in Medicine.