Infrared and visible Image Fusion with Language-driven Loss in CLIP Embedding Space

Wang, Yuhao; Miao, Lingjuan; Zhou, Zhiqiang; Zhang, Lei; Qiao, Yajun

doi:10.48550/arXiv.2402.16267

Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.16267v1 (cs)

[Submitted on 26 Feb 2024]

Abstract: Infrared-visible image fusion (IVIF) has attracted much attention owing to the highly complementary properties of the two image modalities. Due to the lack of ground-truth fused images, the fusion output of current deep-learning-based methods depends heavily on mathematically defined loss functions. Because it is hard to define the fused image mathematically without ground truth, the performance of existing fusion methods is limited. In this paper, we first propose to use natural language to express the objective of IVIF, which avoids the explicit mathematical modeling of the fusion output required by current losses and makes full use of the expressiveness of language to improve fusion performance. To this end, we present a comprehensive language-expressed fusion objective and encode the relevant texts into the multi-modal embedding space using CLIP. A language-driven fusion model is then constructed in the embedding space by establishing relationships among the embedded vectors that represent the fusion objective and the input image modalities. Finally, a language-driven loss is derived to align the actual IVIF output with the embedded language-driven fusion model via supervised training. Experiments show that our method obtains much better fusion results than existing techniques.
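The abstract does not state the loss formula. The sketch below is only a rough illustration of what "establishing relationships among embedded vectors" could look like. It assumes a directional constraint of the kind used in CLIP-guided image editing, which is a swapped-in technique and not necessarily the authors' formulation. Under that assumption, the image-space direction from each source image to the fused image should point the same way as the text-space direction from that source's description to the fusion-objective text. Plain Python lists stand in for CLIP image and text embeddings, and every function name here is hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the small epsilon keeps a zero vector from dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def diff(a: Sequence[float], b: Sequence[float]) -> list:
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def directional_loss(img_fused, img_src, txt_goal, txt_src) -> float:
    """Hypothetical term: 1 - cos between the image direction (source -> fused)
    and the text direction (source description -> fusion objective)."""
    return 1.0 - cosine(diff(img_fused, img_src), diff(txt_goal, txt_src))


def language_driven_loss(img_fused, img_ir, img_vis, txt_goal, txt_ir, txt_vis) -> float:
    """Hypothetical language-driven loss: average of the directional terms
    for the infrared and visible inputs. In a real pipeline the vectors would
    come from CLIP's image and text encoders (assumption)."""
    return 0.5 * (
        directional_loss(img_fused, img_ir, txt_goal, txt_ir)
        + directional_loss(img_fused, img_vis, txt_goal, txt_vis)
    )


# Toy 3-D stand-ins for CLIP embeddings (invented for illustration only).
txt_ir, txt_vis, txt_goal = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]
img_ir, img_vis = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

good = language_driven_loss([1.0, 1.0, 1.0], img_ir, img_vis, txt_goal, txt_ir, txt_vis)
bad = language_driven_loss(img_ir, img_ir, img_vis, txt_goal, txt_ir, txt_vis)
print(f"aligned fused image: {good:.3f}, fused = infrared copy: {bad:.3f}")
```

With these toy vectors, a fused embedding whose changes match the text directions gives a loss near 0, while simply copying the infrared input gives 0.75. Comparing directions rather than absolute similarities means content shared by a source image and its description largely cancels out; whether the paper uses this or a different relationship is not stated in the abstract.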

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.16267 [cs.CV]
	(or arXiv:2402.16267v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.16267

Submission history

From: Yuhao Wang
[v1] Mon, 26 Feb 2024 03:08:01 UTC (6,151 KB)