pith. machine review for the scientific record.

arxiv: 2605.01916 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: unknown

EAPFusion: Intrinsic Evolving Auxiliary Prior Guidance for Infrared and Visible Image Fusion

Authors on Pith no claims yet

Pith reviewed 2026-05-09 17:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords infrared visible image fusion · dynamic convolution · intrinsic priors · adaptive kernels · multi-modal fusion · channel mixing · semantic segmentation
0 comments

The pith

Self-evolving intrinsic priors generate adaptive kernels for state-of-the-art infrared-visible image fusion without external models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that infrared-visible fusion improves when static trained weights are replaced by a compact set of intrinsic priors that evolve across scales and condition dynamic convolutions to produce scene-specific kernels. This addresses the problem of methods failing to adapt to particular content at inference time while avoiding granularity issues from coarse external semantics. The approach also includes a channel-shuffling module to mix modalities locally. Experiments across datasets, including cross-dataset tests and downstream semantic segmentation, demonstrate superior fusion quality and consistent task gains. A sympathetic reader would care because better adaptive fusion directly aids perception in low-light or complex environments like driving and surveillance.

Core claim

EAPFusion maintains a compact set of intrinsic priors that progressively update across scales; these evolved priors then drive prior-conditioned dynamic convolution to generate instance-adaptive kernels on the fly, shifting away from fixed pre-trained filters, while a channel-level fusion module interleaves and mixes infrared and visible features to enhance complementarity.
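
To make the prior-evolution step concrete, here is a minimal PyTorch-style sketch of what a gated, cross-attention update of a compact prior set could look like, loosely following the Figure 2 caption. The class name, tensor shapes, head count, and gating form are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn


class AdaptivePriorGenerator(nn.Module):
    """Sketch of one prior-evolution step: prior tokens cross-attend to the
    current feature map, and a learned gate blends the attended summary with
    the previous priors. All dimensions and the gating form are assumptions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, priors: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # priors: (B, N, C) historical prior tokens; feats: (B, C, H, W).
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H*W, C) scene tokens
        summary, _ = self.attn(priors, tokens, tokens)       # priors query the scene
        g = self.gate(torch.cat([priors, summary], dim=-1))  # per-token gate in (0, 1)
        return g * summary + (1.0 - g) * priors              # gated evolution of priors
```

Calling AdaptivePriorGenerator(64) on priors of shape (2, 4, 64) and features of shape (2, 64, 32, 32) returns an updated (2, 4, 64) prior set for the next scale to consume.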

What carries the argument

Self-evolving intrinsic priors updated across scales, which condition dynamic convolution to produce instance-adaptive kernels, paired with channel shuffling and local mixing for cross-modal fusion.
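
A similarly hedged sketch of the prior-conditioned dynamic convolution follows: each prior token is mapped to a depthwise expert kernel (a stand-in for the WeightGen step named in the Figure 3 caption), and a lightweight router produces per-location mixing weights over the experts via dense routing. The Top-K variant is omitted, and every name and hyperparameter here is a placeholder rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorConditionedDynConv(nn.Module):
    """Sketch of prior-conditioned dynamic convolution: N prior tokens yield N
    depthwise expert kernels; a per-pixel router mixes the expert responses."""

    def __init__(self, channels: int, num_priors: int = 4, kernel_size: int = 3):
        super().__init__()
        self.c, self.n, self.k = channels, num_priors, kernel_size
        # WeightGen analogue: prior token -> one depthwise kernel per channel.
        self.weight_gen = nn.Linear(channels, channels * kernel_size * kernel_size)
        # Router: per-pixel logits over the N experts (dense routing).
        self.router = nn.Conv2d(channels, num_priors, kernel_size=1)

    def forward(self, x: torch.Tensor, priors: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) features; priors: (B, N, C) evolved prior tokens.
        b, c, h, w = x.shape
        kernels = self.weight_gen(priors).view(b, self.n, c, 1, self.k, self.k)
        route = torch.softmax(self.router(x), dim=1)            # (B, N, H, W)
        out = torch.zeros_like(x)
        for i in range(self.n):
            # Batched depthwise conv: fold the batch into the group dimension.
            w_i = kernels[:, i].reshape(b * c, 1, self.k, self.k)
            y = F.conv2d(x.reshape(1, b * c, h, w), w_i,
                         padding=self.k // 2, groups=b * c).view(b, c, h, w)
            out = out + route[:, i:i + 1] * y                   # location-dependent mix
        return out
```

The per-pixel softmax over expert kernels is what makes the effective filter scene- and location-specific even though the prior set itself stays compact.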

If this is right

  • Fused outputs will more effectively highlight thermal targets while retaining fine visible textures in varied scenes.
  • Downstream tasks such as semantic segmentation will receive measurable performance improvements from the higher-quality inputs.
  • Cross-dataset generalization will hold because the priors adapt at inference rather than relying on dataset-specific training.
  • Reliance on large external pre-trained models for guidance can be reduced without sacrificing fusion quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The compact prior design may support deployment on edge devices where external model calls are costly.
  • The same evolving-prior mechanism could apply to other multi-modal fusion problems such as RGB-depth or medical imaging pairs.
  • Further scale variations in the prior update process might yield additional robustness to extreme lighting changes.

Load-bearing premise

A compact set of self-evolving intrinsic priors can capture sufficient scene-specific detail to replace external auxiliary models and resolve granularity mismatch while still generating better kernels than static weights.

What would settle it

If, on a new held-out dataset, the method showed no gain over the strongest static baseline on either standard fusion metrics or downstream segmentation accuracy, the superiority of the evolving-prior approach would be refuted.

Figures

Figures reproduced from arXiv: 2605.01916 by Axi Niu, Haishen Wang, Luobin Zhang, Qingsen Yan, Zhenyu Sun.

Figure 1. Overall architecture of EAPFusion: a dual-branch encoder, APG, DDCB, SCFB, and a progressive decoder for infrared-visible fusion.
Figure 2. Adaptive Prior Generator (APG): historical priors are aligned, summarized with current features via cross-attention, and updated by a gated evolution rule.
Figure 3. Prior-Driven Dynamic Convolution Block (DDCB): each prior token generates an expert kernel via WeightGen, and dense/Top-K routing yields location-dependent mixing weights for dynamic convolution.
Figure 4. Shuffle Channel Fusion Block (SCFB): FiLM-based cross-modal gating, channel shuffling, local channel mixing, and multi-branch interaction are followed by a 1 × 1 projection.
Figure 5. Channel-wise mixing convolution (CWMC): a sliding-window 1D convolution along channels is folded and projected by 1×1 layers to obtain local channel interactions.
Figure 6. Qualitative fusion comparison on MSRS and M3FD: columns show VIS, IR, and fused results of CDDFuse, EMMA, MaeFuse, SPDFusion, SwinFusion, T2EA, TDFusion, WaveFusion, and Ours; boxed regions are enlarged.
Figure 7. Qualitative segmentation comparison in daytime (top) and nighttime (bottom) scenes: columns show source VIS/IR images and segmentation maps from different fusion methods; boxed regions highlight differences. VIS: visible; IR: infrared.
read the original abstract

Infrared-visible image fusion aims to create an information-rich fused image by integrating the complementary thermal saliency from infrared sensing and fine textures from visible imaging. Such accurate fusion is essential for real-world perception applications in complex scenes, including nighttime autonomous driving, search and rescue, and surveillance, and can further benefit downstream tasks such as semantic segmentation. However, most existing fusion methods rely upon static trained weights that cannot adapt to scene-specific content at inference time, and often suffer from a granularity mismatch when coarse auxiliary semantics are injected, which makes it difficult to simultaneously highlight targets and preserve details. In this work, we propose EAPFusion to address these issues by using self-evolving intrinsic priors instead of relying on external auxiliary models. Concretely, EAPFusion maintains a compact set of intrinsic priors and progressively updates them across scales. These evolved priors are utilized to dynamically generate convolutional kernels, shifting the paradigm from fixed, pre-trained filters to instance-adaptive parameters via prior-conditioned dynamic convolution. Furthermore, we design a channel-level fusion module that shuffles and interleaves infrared and visible channels, applying local channel mixing to boost cross-modal complementarity. Experiments on different datasets, including cross-dataset evaluation and semantic segmentation, show that the proposed method achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance. Code is coming soon.
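
For the channel-level fusion module, a minimal sketch of the shuffle-and-mix idea is given below, under the assumption that it resembles the SCFB and CWMC figure captions: same-scale IR and VIS features are interleaved along the channel axis, neighbouring channels are mixed with a sliding-window 1D convolution, and a 1×1 projection maps back to the original width. The FiLM gating and multi-branch interaction are omitted, and the window size is a guess.

```python
import torch
import torch.nn as nn


class ShuffleChannelFusion(nn.Module):
    """Sketch of channel-level fusion: interleave IR and VIS channels, mix
    neighbouring channels with a sliding-window 1D convolution (CWMC-style),
    and project back to the original channel width."""

    def __init__(self, channels: int, window: int = 3):
        super().__init__()
        # 1D conv that slides along the channel axis of each pixel.
        self.mix = nn.Conv1d(1, 1, kernel_size=window, padding=window // 2)
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # ir, vis: (B, C, H, W) same-scale features from the two branches.
        b, c, h, w = ir.shape
        x = torch.stack([ir, vis], dim=2).reshape(b, 2 * c, h, w)  # interleave channels
        flat = x.permute(0, 2, 3, 1).reshape(b * h * w, 1, 2 * c)  # channels as a sequence
        mixed = self.mix(flat).reshape(b, h, w, 2 * c).permute(0, 3, 1, 2)
        return self.proj(mixed)                                    # back to C channels
```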

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes EAPFusion for infrared-visible image fusion. It replaces static trained weights and external auxiliary models with a compact set of self-evolving intrinsic priors that are progressively updated across scales; these priors condition dynamic convolution to produce instance-adaptive kernels. A channel-level fusion module that shuffles and interleaves infrared and visible channels is introduced to enhance cross-modal complementarity. Experiments on multiple datasets (including cross-dataset evaluation) and downstream semantic segmentation are claimed to demonstrate state-of-the-art quantitative and qualitative fusion results together with consistent gains on the downstream task.

Significance. If the experimental claims are substantiated, the shift from fixed weights to prior-conditioned dynamic kernels could meaningfully advance adaptive multimodal fusion for real-world perception tasks. The emphasis on intrinsic, evolving priors without external models addresses a recognized granularity mismatch and is a clear conceptual contribution. Reproducibility is supported by the stated intent to release code.

major comments (2)
  1. [Abstract] The central claim that the method 'achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance' is unsupported by any numerical metrics, ablation tables, baseline comparisons, or error analysis in the provided text. This absence is load-bearing because the contribution is framed entirely around empirical superiority.
  2. [Method] The self-evolving intrinsic priors and their use to generate dynamic kernels are described at a high level without equations, update rules, or parameter counts. It is therefore impossible to verify whether the approach is truly parameter-free or avoids the circularity of fitting to the same data it claims to generalize over.
minor comments (2)
  1. The abstract states 'Code is coming soon' but provides no repository link or timeline; this should be updated with concrete availability information.
  2. Dataset names and the exact number of test sets used for cross-dataset evaluation should be stated explicitly in the abstract rather than left as 'different datasets'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's insightful comments on our manuscript. We appreciate the positive assessment of the conceptual contribution regarding intrinsic evolving priors. We provide point-by-point responses to the major comments below, indicating where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'achieves state-of-the-art quantitative and qualitative fusion results, and consistently boosts downstream performance' is unsupported by any numerical metrics, ablation tables, baseline comparisons, or error analysis in the provided text. This absence is load-bearing because the contribution is framed entirely around empirical superiority.

    Authors: The abstract is intended as a high-level summary of the paper's contributions and results. The full manuscript includes extensive experimental validation with quantitative metrics (e.g., PSNR, SSIM, LPIPS), ablation tables, comparisons against multiple baselines on several datasets, cross-dataset evaluations, and downstream semantic segmentation results with mIoU improvements. To strengthen the abstract and directly address this concern, we will revise it to include specific numerical highlights of the performance gains while maintaining conciseness. revision: yes

  2. Referee: [Method] The self-evolving intrinsic priors and their use to generate dynamic kernels are described at a high level without equations, update rules, or parameter counts. It is therefore impossible to verify whether the approach is truly parameter-free or avoids the circularity of fitting to the same data it claims to generalize over.

    Authors: We acknowledge that the method description in the current version may appear high-level. The manuscript details the self-evolving process with update rules for the intrinsic priors across multiple scales, the mathematical formulation of the prior-conditioned dynamic convolution (including equations for kernel generation and channel mixing), and the small fixed parameter count of the prior set. The evolution is performed in a self-supervised, inference-time manner based on the input image content without any retraining or access to labels, ensuring no circularity with the test data. We will expand the method section with additional equations, a clear parameter analysis, and pseudocode to make these aspects fully verifiable. revision: yes
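
Taken at face value, the rebuttal describes a feed-forward, label-free evolution of the prior set across encoder scales rather than any test-time retraining. The speculative sketch below chains the AdaptivePriorGenerator and PriorConditionedDynConv sketches from earlier on this page into such a loop; the scale count, channel widths, and learned initial priors are placeholders with no counterpart in the supplied text.

```python
import torch
import torch.nn as nn

# Relies on the AdaptivePriorGenerator and PriorConditionedDynConv sketches
# above being in scope; all sizes below are illustrative guesses.


class EvolvingPriorFusion(nn.Module):
    def __init__(self, channels=(32, 64, 128), num_priors: int = 4):
        super().__init__()
        self.init_priors = nn.Parameter(torch.randn(1, num_priors, channels[0]))
        self.apg = nn.ModuleList(AdaptivePriorGenerator(c) for c in channels)
        self.ddcb = nn.ModuleList(PriorConditionedDynConv(c) for c in channels)
        # Project prior tokens to the next scale's channel width.
        self.expand = nn.ModuleList(
            nn.Linear(channels[i], channels[i + 1]) for i in range(len(channels) - 1))

    def forward(self, feats_per_scale):
        # feats_per_scale: list of (B, C_s, H_s, W_s) encoder features, one per scale;
        # the same compact prior set is evolved at every scale, with no labels involved.
        priors = self.init_priors.expand(feats_per_scale[0].shape[0], -1, -1)
        outs = []
        for s, feats in enumerate(feats_per_scale):
            priors = self.apg[s](priors, feats)       # evolve priors from this scale
            outs.append(self.ddcb[s](feats, priors))  # scene-specific dynamic kernels
            if s + 1 < len(feats_per_scale):
                priors = self.expand[s](priors)       # match the next scale's width
        return outs
```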

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and method description introduce self-evolving intrinsic priors and dynamic kernel generation for image fusion, but present no equations, derivations, or load-bearing steps that reduce by construction to fitted parameters, self-definitions, or self-citation chains. Claims rest on experimental results across datasets rather than a closed logical loop. No specific reductions (e.g., a prediction equaling its input by definition) are identifiable from the supplied text, so the claims stand or fall against external benchmarks rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract introduces 'intrinsic priors' and 'prior-conditioned dynamic convolution' without specifying numerical hyperparameters or external axioms; evaluation is limited to the high-level description.

invented entities (1)
  • self-evolving intrinsic priors (no independent evidence)
    purpose: to provide scene-adaptive guidance for generating dynamic convolutional kernels without external auxiliary models
    Presented as the core replacement for static weights and coarse external semantics

pith-pipeline@v0.9.0 · 5548 in / 1098 out tokens · 28250 ms · 2026-05-09T17:22:33.703941+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 38 canonical work pages

  1. [1]

    K. Liu, M. Li, C. Chen, C. Rao, E. Zuo, Y. Wang, Z. Yan, B. Wang, C. Chen, X. Lv, DSFusion: infrared and visible image fusion method combining detail and scene information, Pattern Recognit. 154 (2024) 110633. https://doi.org/10.1016/j.patcog.2024.110633

  2. [2]

    X. Luo, J. Wang, Z. Zhang, X.-j. Wu, A full-scale hierarchical encoder-decoder network with cascading edge-prior for infrared and visible image fusion, Pattern Recognit. 148 (2024) 110192. https://doi.org/10.1016/j.patcog.2023.110192

  3. [3]

    Z. Zhao, S. Xu, C. Zhang, J. Liu, J. Zhang, P. Li, DIDFuse: deep image decomposition for infrared and visible image fusion, in: Proc. 29th Int. Joint Conf. Artif. Intell. (IJCAI), 2020, pp. 970–976. https://doi.org/10.24963/ijcai.2020/135

  4. [5]

    B. Zheng, R. Wang, X. Liu, J. Li, A multi-level detection guided and co-encoding network for infrared and visible image fusion, Pattern Recognit. 168 (2025) 111778. https://doi.org/10.1016/j.patcog.2025.111778

  5. [7]

    J. Zhang, K. He, D. Xu, H. Shi, CLIP-based natural language-guided low-redundancy fusion of infrared and visible images, IEEE Trans. Consum. Electron. 71 (2025) 931–944. https://doi.org/10.1109/TCE.2025.3526792

  6. [8]

    Z. Zhao, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, L. Van Gool, Image fusion via vision-language model, in: Proc. Int. Conf. Mach. Learn. (ICML), 2024, pp. 60749–60765. https://doi.org/10.5555/3692070.3694583

  7. [9]

    H. Li, X.-J. Wu, DenseFuse: a fusion approach to infrared and visible images, IEEE Trans. Image Process. 28 (5) (2019) 2614–2623. https://doi.org/10.1109/TIP.2018.2887342

  8. [10]

    H. Li, X.-J. Wu, J. Kittler, RFN-Nest: an end-to-end residual fusion network for infrared and visible images, Inf. Fusion 73 (2021) 72–86. https://doi.org/10.1016/j.inffus.2021.02.023

  9. [11]

    J. Ma, W. Yu, P. Liang, C. Li, J. Jiang, FusionGAN: a generative adversarial network for infrared and visible image fusion, Inf. Fusion 48 (2019) 11–26. https://doi.org/10.1016/j.inffus.2018.09.004

  10. [12]

    J. Li, H. Huo, C. Li, R. Wang, Q. Feng, AttentionFGAN: infrared and visible image fusion using attention-based generative adversarial networks, IEEE Trans. Multimed. 23 (2021) 1383–1396. https://doi.org/10.1109/TMM.2020.2997127

  11. [13]

    W. Tang, F. He, Y. Liu, ITFuse: an interactive transformer for infrared and visible image fusion, Pattern Recognit. 156 (2024) 110822. https://doi.org/10.1016/j.patcog.2024.110822

  12. [14]

    J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, Y. Ma, SwinFusion: cross-domain long-range learning for general image fusion via Swin Transformer, IEEE/CAA J. Autom. Sinica 9 (7) (2022) 1200–1217. https://doi.org/10.1109/JAS.2022.105686

  13. [15]

    Z. Zhao, H. Bai, Y. Zhu, J. Zhang, S. Xu, Y. Zhang, K. Zhang, D. Meng, R. Timofte, L. Van Gool, DDFM: denoising diffusion model for multi-modality image fusion, in: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2023, pp. 8048–8059. https://doi.org/10.1109/ICCV51070.2023.00742

  14. [16]

    H. Xu, R. Nie, J. Cao, M. Tan, Z. Ding, MADMFuse: a multi-attribute diffusion model to fuse infrared and visible images, Digit. Signal Process. 155 (2024) 104741. https://doi.org/10.1016/j.dsp.2024.104741

  15. [17]

    J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, Z. Luo, Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 5792–5801. https://doi.org/10.1109/CVPR52688.2022.00571

  16. [18]

    H. Bai, J. Zhang, Z. Zhao, Y. Wu, L. Deng, Y. Cui, T. Feng, S. Xu, Task-driven image fusion with learnable fusion loss, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 7457–7468. https://doi.org/10.1109/CVPR52734.2025.00699

  17. [19]

    J. Liu, B. Zhang, Q. Mei, X. Li, Y. Zou, Z. Jiang, L. Ma, R. Liu, X. Fan, DCEvo: discriminative cross-dimensional evolutionary learning for infrared and visible image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 2226–2235. https://doi.org/10.1109/CVPR52734.2025.00213

  18. [20]

    X. Li, Y. Zou, J. Liu, Z. Jiang, L. Ma, X. Fan, R. Liu, From text to pixels: a context-aware semantic synergy solution for infrared and visible image fusion, arXiv (Jan. 2024). arXiv:2401.00421, https://doi.org/10.48550/arXiv.2401.00421. URL https://arxiv.org/abs/2401.00421

  19. [21]

    X. Li, J. Wang, W. Chen, R. Chen, G. Zhang, L. Cheng, Infrared and visible image fusion based on text-image core-semantic alignment and interaction, Digit. Signal Process. 163 (2025) 105203. https://doi.org/10.1016/j.dsp.2025.105203

  20. [22]

    J. Liu, X. Li, Z. Wang, Z. Jiang, W. Zhong, W. Fan, B. Xu, PromptFusion: harmonized semantic prompt learning for infrared and visible image fusion, IEEE/CAA J. Autom. Sinica 12 (3) (2025) 502–515. https://doi.org/10.1109/JAS.2024.124878

  21. [23]

    X. Yi, H. Xu, H. Zhang, L. Tang, J. Ma, Text-IF: leveraging semantic text guidance for degradation-aware and interactive image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 27016–27025. https://doi.org/10.1109/CVPR52733.2024.02552

  22. [24]

    G. Wu, H. Liu, H. Fu, Y. Peng, J. Liu, X. Fan, R. Liu, Every SAM drop counts: embracing semantic priors for multi-modality image fusion and beyond, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 17882–17891. https://doi.org/10.1109/CVPR52734.2025.01666

  23. [25]

    C. Cheng, T. Xu, X.-J. Wu, MUFusion: a general unsupervised image fusion network based on memory unit, Inf. Fusion 92 (2023) 80–92. https://doi.org/10.1016/j.inffus.2022.11.010

  24. [26]

    J. He, X. Luo, Z. Zhang, X.-j. Wu, MemoryFusion: a novel architecture for infrared and visible image fusion based on memory unit, Pattern Recognit. 170 (2026) 112004. https://doi.org/10.1016/j.patcog.2025.112004

  25. [27]

    X. Yi, Y. Zhang, X. Xiang, Q. Yan, H. Xu, J. Ma, LUT-Fuse: towards extremely fast infrared and visible image fusion via distillation to learnable look-up tables, in: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2025, pp. 14559–14568

  26. [28]

    H. Li, Z. Yang, Y. Zhang, W. Jia, Z. Yu, Y. Liu, MulFS-CAP: multimodal fusion-supervised cross-modality alignment perception for unregistered infrared-visible image fusion, IEEE Trans. Pattern Anal. Mach. Intell. 47 (5) (2025) 3673–3690. https://doi.org/10.1109/TPAMI.2025.3535617

  27. [29]

    S. Huang, S. Su, J. Wei, L. Hu, Z. Cheng, Vector-quantized dual-branch fusion network for robust image fusion and anomaly suppression, Inf. Fusion 126 (2026) 103630. https://doi.org/10.1016/j.inffus.2025.103630

  28. [30]

    L. Tang, H. Zhang, H. Xu, J. Ma, Rethinking the necessity of image fusion in high-level vision tasks: a practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity, Inf. Fusion 99 (2023) 101870. https://doi.org/10.1016/j.inffus.2023.101870

  29. [31]

    M. Lou, Y. Yu, OverLoCK: an overview-first-look-closely-next ConvNet with context-mixing dynamic kernels, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 128–138. https://doi.org/10.1109/CVPR52734.2025.00021

  30. [32]

    Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612. https://doi.org/10.1109/TIP.2003.819861

  31. [33]

    L. Tang, J. Yuan, H. Zhang, X. Jiang, J. Ma, PIAFusion: a progressive infrared and visible image fusion network based on illumination aware, Inf. Fusion 83–84 (2022) 79–92. https://doi.org/10.1016/j.inffus.2022.03.007

  32. [34]

    A. Toet, M. A. Hogervorst, Progress in color night vision, Optical Engineering 51 (2012) 010901. https://doi.org/10.1117/1.OE.51.1.010901

  33. [35]

    Z. Zhao, H. Bai, J. Zhang, Y. Zhang, K. Zhang, S. Xu, D. Chen, R. Timofte, L. Van Gool, Equivariant multi-modality image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 25912–25921. https://doi.org/10.1109/CVPR52733.2024.02448

  34. [36]

    Z. Huang, C. Lin, B. Xu, M. Xia, Q. Li, Y. Li, N. Sang, T2EA: target-aware Taylor expansion approximation network for infrared and visible image fusion, IEEE Trans. Circuits Syst. Video Technol. 35 (5) (2025) 4831–. https://doi.org/10.1109/TCSVT.2024.3524794

  36. [38]

    Q. Wang, Z. Li, S. Zhang, N. Chi, Q. Dai, WaveFusion: a novel wavelet vision transformer with saliency-guided enhancement for multimodal image fusion, IEEE Trans. Circuits Syst. Video Technol. 35 (8) (2025) 7526–7542. https://doi.org/10.1109/TCSVT.2025.3549459

  37. [39]

    Z. Zhao, H. Bai, J. Zhang, Y. Zhang, S. Xu, Z. Lin, R. Timofte, L. Van Gool, CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion, in: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 5906–5916. https://doi.org/10.1109/CVPR52729.2023.00572

  38. [40]

    J. Li, J. Jiang, P. Liang, J. Ma, L. Nie, MaeFuse: transferring omni features with pretrained masked autoencoders for infrared and visible image fusion via guided training, IEEE Trans. Image Process. 34 (2025) 1340–1353. https://doi.org/10.1109/TIP.2025.3541562

  39. [41]

    Q. Xiao, H. Jin, H. Su, Y. Zhang, Z. Xiao, B. Wang, SPDFusion: a semantic prior knowledge-driven method for infrared and visible image fusion, IEEE Trans. Multimed. 27 (2025) 1691–1705. https://doi.org/10.1109/TMM.2024.3521848

  40. [42]

    L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Computer Vision – ECCV 2018, Vol. 11211, Springer, Cham, 2018, pp. 833–851. https://doi.org/10.1007/978-3-030-01234-2_49

  41. [43]

    M. Contributors, MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark, https://github.com/open-mmlab/mmsegmentation (2020)