TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion

Chunyang Cheng; Hui Li; Josef Kittler; Tianyang Xu; Xiao-Jun Wu; Xi Li; Zhangyong Tang

arxiv: 2312.14209 · v2 · pith:VE6WEF5Knew · submitted 2023-12-21 · 💻 cs.CV

Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Zhangyong Tang, Josef Kittler

classification 💻 cs.CV

keywords fusion · image · controllable · text · textfusion · association · conveyed · dataset

Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which should enable us to generate fused images for different purposes, such as visualisation and downstream tasks, in a controllable way. This is achieved by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. With the guidance of the association maps, an affine fusion unit is embedded in the transformer network to fuse the text and vision modalities at the feature level. As another ingredient of this work, we propose the use of textual attention to adapt image quality assessment to the fusion task. To facilitate the implementation of the proposed text-guided fusion paradigm, and its adoption by the wider research community, we release a text-annotated image fusion dataset IVT. Extensive experiments demonstrate that our approach (TextFusion) consistently outperforms traditional appearance-based fusion methods. Our code and dataset will be publicly available at https://github.com/AWCXV/TextFusion.
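The abstract mentions an affine fusion unit guided by text-image association maps but gives no formula. As a rough illustration only, the sketch below shows one generic way an association map could modulate image features through a per-pixel scale and shift. It is not the authors' implementation: the function name `affine_fuse`, the averaging of the two source features, and the `scale_weight` and `shift_weight` parameters are all assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def affine_fuse(vis: Matrix, ir: Matrix, assoc: Matrix,
                scale_weight: float = 0.5,
                shift_weight: float = 0.1) -> Matrix:
    """Hypothetical affine modulation of fused features by an association map.

    vis, ir : feature maps from the two source images (same shape).
    assoc   : text-image association map with values in [0, 1] (assumed).
    Returns gamma * base + beta per pixel, where base is the mean of the
    two source features, gamma = 1 + scale_weight * assoc, and
    beta = shift_weight * assoc. All of these choices are illustrative.
    """
    if not (len(vis) == len(ir) == len(assoc)):
        raise ValueError("inputs must share the same height")

    fused: Matrix = []
    for v_row, i_row, a_row in zip(vis, ir, assoc):
        if not (len(v_row) == len(i_row) == len(a_row)):
            raise ValueError("inputs must share the same width")
        out_row = []
        for v, i, a in zip(v_row, i_row, a_row):
            base = 0.5 * (v + i)             # plain average of the sources
            gamma = 1.0 + scale_weight * a   # scale grows with text relevance
            beta = shift_weight * a          # shift grows with text relevance
            out_row.append(gamma * base + beta)
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    vis = [[1.0, 2.0]]
    ir = [[3.0, 4.0]]
    assoc = [[0.0, 1.0]]  # second pixel is fully associated with the text
    print(affine_fuse(vis, ir, assoc))
```

Where the association map is zero, this sketch reduces to a plain average of the two sources, so the text only changes regions it is associated with. In the paper the modulation is described as operating on features inside a transformer network, which this toy example does not attempt to reproduce.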

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EvaNet: Towards More Efficient and Consistent Infrared and Visible Image Fusion Assessment
cs.CV · 2026-04 · unverdicted · novelty 6.0

EvaNet is a lightweight network that efficiently approximates image fusion metrics with improved consistency to human perception via decomposition, contrastive learning, and LLM input.