TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion

Cheng, Chunyang; Xu, Tianyang; Wu, Xiao-Jun; Li, Hui; Li, Xi; Tang, Zhangyong; Kittler, Josef

arXiv:2312.14209 (cs)

[Submitted on 21 Dec 2023 (v1), last revised 8 Feb 2024 (this version, v2)]

Abstract: Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which should enable us to generate fused images for different purposes, such as visualisation and downstream tasks, in a controllable way. This is achieved by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. With the guidance of the association maps, an affine fusion unit is embedded in the transformer network to fuse the text and vision modalities at the feature level. As another ingredient of this work, we propose the use of textual attention to adapt image quality assessment to the fusion task. To facilitate the implementation of the proposed text-guided fusion paradigm, and its adoption by the wider research community, we release a text-annotated image fusion dataset IVT. Extensive experiments demonstrate that our approach (TextFusion) consistently outperforms traditional appearance-based fusion methods. Our code and dataset will be publicly available at this https URL.
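
The abstract does not spell out how the affine fusion unit works, so the following is only an illustrative sketch of one common way to realise text-guided affine modulation, not the authors' implementation. It assumes a per-channel scale `gamma` and shift `beta` have already been predicted from a text embedding by some learned layer (not shown), and that `assoc` is a text-image association map with values in [0, 1]. The function name `affine_fuse` and all parameter names are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
def affine_fuse(features, gamma, beta, assoc):
    """Blend affine-modulated features with the originals, pixel by pixel.

    features: list of C channels, each an H x W list of floats
    gamma, beta: per-channel scale and shift (length C); assumed to come
        from a text embedding via a learned layer that is not shown here
    assoc: H x W association map in [0, 1]; 0 keeps the image feature,
        1 applies the full text-driven modulation (an assumption)
    """
    fused = []
    for channel, g, b in zip(features, gamma, beta):
        fused_channel = []
        for feat_row, assoc_row in zip(channel, assoc):
            fused_row = []
            for x, a in zip(feat_row, assoc_row):
                # Affine transform driven by the text-derived parameters.
                modulated = (1.0 + g) * x + b
                # The association map gates how much modulation is applied.
                fused_row.append(a * modulated + (1.0 - a) * x)
            fused_channel.append(fused_row)
        fused.append(fused_channel)
    return fused


# Toy input: one channel, 2 x 2 spatial grid.
features = [[[1.0, 2.0], [3.0, 4.0]]]
gamma = [0.5]
beta = [1.0]
assoc = [[0.0, 1.0], [0.5, 0.0]]
fused = affine_fuse(features, gamma, beta, assoc)
```

In this sketch, locations where the association map is zero pass the image feature through unchanged, which is one plausible way for text to control the fusion only in the regions it refers to.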

Comments:	v2 version, 13 pages, 16 figures, with the code repository link
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.4
Cite as:	arXiv:2312.14209 [cs.CV]
	(or arXiv:2312.14209v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.14209

Submission history

From: Chunyang Cheng
[v1] Thu, 21 Dec 2023 09:25:10 UTC (18,737 KB)
[v2] Thu, 8 Feb 2024 11:43:57 UTC (21,987 KB)
