Language-Guided Semantic Clustering for Remote Sensing Change Detection

Shenglong Hu; Yiting Bian; Bin Chen; Huihui Song; Kaihua Zhang

doi:10.3390/s24247887

Sensors (Basel). 2024 Dec 10;24(24):7887. doi: 10.3390/s24247887.

Authors

Shenglong Hu¹, Yiting Bian¹, Bin Chen¹, Huihui Song¹, Kaihua Zhang¹

Affiliation

¹ B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing 210044, China.

Abstract

Existing learning-based remote sensing change detection (RSCD) methods commonly use semantic-agnostic binary masks as supervision, which hinders their ability to distinguish between different semantic types of change and results in noisy change mask predictions. To address this issue, this paper presents a language-guided semantic clustering framework, dubbed LSC-CD, that can effectively transfer the rich semantic information of the contrastive language-image pretraining (CLIP) model to RSCD. LSC-CD exploits the strong zero-shot generalization of CLIP, which makes it easy to transfer semantic knowledge from CLIP into the CD model under semantic-agnostic binary mask supervision. Specifically, LSC-CD first constructs a category text-prior memory bank based on the dataset statistics and then uses CLIP to transform the text in the memory bank into the corresponding semantic embeddings. Afterward, a CLIP adapter module (CAM) fine-tunes the semantic embeddings to align them with the change region embeddings extracted from the input bi-temporal images. Next, a semantic clustering module (SCM) clusters the change region embeddings around the semantic embeddings, yielding compact change embeddings that are robust to noisy backgrounds. Finally, a lightweight decoder decodes the compact change embeddings into an accurate change mask prediction. Experimental results on three public benchmarks, LEVIR-CD, WHU-CD, and SYSU-CD, demonstrate that the proposed LSC-CD achieves state-of-the-art performance in terms of all evaluated metrics.
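
The clustering idea described above can be illustrated with a minimal sketch. This is not the authors' SCM: it assumes a simple soft assignment by cosine similarity, and the function name `cluster_around_semantics` and the `temperature` and `blend` parameters are hypothetical choices made for illustration. Toy 2-D vectors stand in for the CLIP text embeddings and the change region embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_around_semantics(region_embs, semantic_embs, temperature=0.1, blend=0.5):
    """Pull each region embedding toward the semantic embeddings it resembles.

    Assumption (not from the paper): soft assignment via cosine similarity,
    followed by a linear blend between the region embedding and its
    similarity-weighted semantic center.
    """
    sem = [l2_normalize(s) for s in semantic_embs]
    compact, assignments = [], []
    for r in region_embs:
        r_n = l2_normalize(r)
        # Cosine similarity to every semantic (text) embedding.
        sims = [sum(a * b for a, b in zip(r_n, s)) for s in sem]
        weights = softmax([s / temperature for s in sims])
        # Similarity-weighted center in the semantic embedding space.
        center = [sum(w * s[d] for w, s in zip(weights, sem))
                  for d in range(len(r_n))]
        pulled = [(1 - blend) * a + blend * c for a, c in zip(r_n, center)]
        compact.append(l2_normalize(pulled))
        assignments.append(max(range(len(sem)), key=lambda k: weights[k]))
    return compact, assignments


# Toy stand-ins for text embeddings of two hypothetical categories.
semantic = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.2], [0.1, 0.8]]
compact, assignments = cluster_around_semantics(regions, semantic)
```

In this sketch each region embedding moves toward the semantic embedding it is most similar to, which is one simple way to obtain more compact clusters; the paper's module may differ in every detail.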

Keywords: clustering; contrastive language-image pretraining (CLIP); remote sensing change detection (RSCD); semantic information.

Grants and funding

This research received no external funding.