Purpose: Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years because they are faster and more accurate than traditional algorithms. Most existing deep learning-based methods require neural networks to encode location information in their feature maps and then predict displacement or deformation fields from these high-dimensional feature maps through convolutional or fully connected layers. We present vector field attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences.
Approach: VFA uses neural networks to extract multi-resolution feature maps from the fixed and moving images and then retrieves pixel-level correspondences based on feature similarity. The retrieval is performed by a novel attention module that requires no learnable parameters. VFA is trained end-to-end in either a supervised or unsupervised manner.
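As a rough illustration of the retrieval step described above, the following NumPy sketch looks up a displacement for every pixel by comparing fixed-image features with moving-image features in a small neighborhood and averaging the candidate offsets with softmax weights. It is not the authors' implementation. The 3×3 search window, the dot-product similarity, the softmax weighting, the `temperature` parameter, and the function name `retrieve_displacements` are all assumptions made for illustration, and the sketch handles a single resolution level of precomputed 2D feature maps only.

```python
import numpy as np


def retrieve_displacements(fixed_feat, moving_feat, radius=1, temperature=1.0):
    """Parameter-free, attention-style lookup of per-pixel displacements.

    fixed_feat, moving_feat: arrays of shape (H, W, C), assumed to come from
    some feature extractor (not shown here).
    Returns an array of shape (H, W, 2) holding (dy, dx) for each pixel.
    """
    H, W, _ = fixed_feat.shape

    # Candidate offsets inside the search window (assumed: a square window).
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]

    # Pad the moving features so that every shifted view has shape (H, W, C).
    padded = np.pad(moving_feat,
                    ((radius, radius), (radius, radius), (0, 0)),
                    mode="edge")

    # Similarity between each fixed feature (query) and the moving feature
    # at each candidate offset (key): a plain dot product over channels.
    scores = np.stack([
        np.sum(fixed_feat
               * padded[radius + dy:radius + dy + H,
                        radius + dx:radius + dx + W],
               axis=-1)
        for dy, dx in offsets
    ], axis=-1)                                   # shape (H, W, K)

    # Numerically stable softmax over the K candidates.
    scores = scores / temperature
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # The "values" are the candidate offsets themselves, so the output is a
    # similarity-weighted average displacement. Nothing here is learned.
    values = np.array(offsets, dtype=float)       # shape (K, 2)
    return weights @ values                       # shape (H, W, 2)
```

Because the values are fixed offset vectors rather than learned projections, all trainable parameters in this sketch would live in the feature extractor, which is consistent with the description of a retrieval module that has no learnable parameters.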
Results: We evaluated VFA on intra- and inter-modality registration, in both unsupervised and semi-supervised settings, using public datasets and the Learn2Reg challenge. VFA achieved registration accuracy comparable or superior to that of several state-of-the-art methods.
Conclusions: VFA offers a novel approach to deformable image registration by directly retrieving spatial correspondences from feature maps, leading to improved performance in registration tasks. It holds potential for broader applications.
Keywords: attention; deformable image registration; non-rigid registration; transformer; unsupervised registration.
© 2024 Society of Photo-Optical Instrumentation Engineers (SPIE).