Efficient Swin Transformer for Remote Sensing Image Super-Resolution

Xudong Kang; Puhong Duan; Jier Li; Shutao Li

doi:10.1109/TIP.2024.3489228

IEEE Trans Image Process. 2024 Nov 6:PP. doi: 10.1109/TIP.2024.3489228. Online ahead of print.

PMID: 39504286

Abstract

Remote sensing super-resolution (SR) techniques, which aim to generate a high-resolution image with rich spatial details from its low-resolution counterpart, play a vital role in many applications. Recently, a growing number of studies have explored the application of Transformers in the remote sensing field. However, these methods suffer from a high computational burden and memory consumption when applied to remote sensing super-resolution. In this paper, we propose an efficient Swin Transformer (ESTNet) via channel attention for SR of remote sensing images, which is composed of three components. First, a three-layer convolutional operation is utilized to extract shallow features from the input low-resolution image. Then, a residual group-wise attention module, which contains an efficient channel attention block (ECAB) and a group-wise attention block (GAB), is proposed to extract the deep features. Finally, the extracted deep features are reconstructed to generate high-resolution remote sensing images. Extensive experimental results demonstrate that the proposed ESTNet obtains better super-resolution results with a low computational burden. Compared to a recently proposed Transformer-based remote sensing super-resolution method, the number of parameters is reduced by 82.68% and the computational cost is reduced by 87.84%. The code of the proposed ESTNet will be available at https://github.com/PuhongDuan/ESTNet for reproducibility.
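
To put the two percentage reductions quoted in the abstract in perspective, the short sketch below converts them into the fraction of the baseline that remains and the corresponding shrink factor. The helper names (remaining_fraction, shrink_factor) are illustrative and not taken from the paper or its repository; the absolute parameter counts and costs are not given in the abstract, so only ratios are computed.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline that is left after a percentage reduction."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 - reduction_percent / 100.0


def shrink_factor(reduction_percent: float) -> float:
    """How many times smaller the reduced quantity is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


# Reductions quoted in the abstract, relative to the Transformer-based baseline.
PARAM_REDUCTION = 82.68
COST_REDUCTION = 87.84

if __name__ == "__main__":
    for label, pct in (("parameters", PARAM_REDUCTION),
                       ("computational cost", COST_REDUCTION)):
        print(f"{label}: {remaining_fraction(pct):.2%} of baseline, "
              f"about {shrink_factor(pct):.1f}x smaller")
```

Read this way, an 82.68% reduction leaves 17.32% of the baseline parameters (roughly 5.8 times fewer), and an 87.84% reduction leaves 12.16% of the baseline computational cost (roughly 8.2 times lower).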