pith. machine review for the scientific record.

arxiv: 2605.00461 · v1 · submitted 2026-05-01 · 📡 eess.IV · cs.CV

Recognition: unknown

Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

Ge Luo, Jun-Jie Huang, Ke Liang, Meng Wang, Qi Yu, Tianrui Liu, Wentao Zhao, Xinwang Liu, Yuming Xiang

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 18:54 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords multi-source image fusion · deep unfolding network · coupled dictionary learning · infrared-visible fusion · lightweight network · unsupervised training · frequency fidelity loss · joint feature update

The pith

CDNet translates the unique-common decomposition prior of coupled dictionary learning into a joint unfolding network for efficient multi-source image fusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CDNet to reduce the computational and memory costs of deep unfolding networks for fusing images from multiple sources. Existing unfolding methods update the features of each modality separately through alternating minimization, which adds overhead. CDNet instead maps the unique-common decomposition idea from coupled dictionary learning into one block-sparse structure that jointly updates shared and modality-specific representations. This design is paired with a high- and low-frequency fidelity loss that supports unsupervised training without ground-truth images. Experiments across infrared-visible, multi-exposure, and medical fusion tasks show the network matches or exceeds prior performance at lower computational and memory cost.
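The abstract does not spell out how the high- and low-frequency fidelity loss is built; Figure 3 indicates it uses adaptive references. As a rough illustration only, the sketch below substitutes generic stand-ins: a Gaussian base/detail split, the mean of the source bases as the low-frequency reference, and the larger-magnitude source detail as the high-frequency reference. A minimal NumPy sketch, not the paper's HLIF loss:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def frequency_fidelity_loss(fused, src_a, src_b, sigma=2.0, w_lo=1.0, w_hi=1.0):
        """Generic high/low-frequency fidelity loss for unsupervised fusion.
        Low band: the fused base should stay near a reference built from the
        source bases. High band: the fused detail should match, per pixel,
        whichever source detail has the larger magnitude."""
        lo_a, lo_b, lo_f = (gaussian_filter(x, sigma) for x in (src_a, src_b, fused))
        hi_a, hi_b, hi_f = src_a - lo_a, src_b - lo_b, fused - lo_f
        lo_ref = 0.5 * (lo_a + lo_b)                                  # stand-in low-frequency reference
        hi_ref = np.where(np.abs(hi_a) >= np.abs(hi_b), hi_a, hi_b)   # stand-in max-detail reference
        return w_lo * np.abs(lo_f - lo_ref).mean() + w_hi * np.abs(hi_f - hi_ref).mean()

Because both reference terms are built from the inputs themselves, a loss of this shape needs no ground-truth fused image, which is what makes the unsupervised training claim plausible.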

Core claim

CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency.

What carries the argument

The CDBlock, a block-sparse interaction structure derived from coupled dictionary learning that jointly updates common and modality-specific representations in a single unfolding step.
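To make the joint-update idea concrete, here is a minimal NumPy sketch of one proximal-gradient (ISTA-style) step on a coupled model x_i ≈ Dc zc + Di zi. The names Dc, D1, D2 and the specific step sizes are illustrative assumptions; the paper's learned CDBlock replaces such fixed operators with trainable layers.

    import numpy as np

    def soft(x, theta):
        """Elementwise soft-thresholding, the proximal map of the l1 norm."""
        return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

    def joint_ista_step(zc, z1, z2, Dc, D1, D2, x1, x2, eta=0.1, theta=0.01):
        """One joint proximal-gradient step under x_i ≈ Dc @ zc + D_i @ z_i.
        All three codes are refreshed in a single pass: the shared code zc
        sees both residuals, while each specific code sees only its own."""
        r1 = Dc @ zc + D1 @ z1 - x1                       # residual, modality 1
        r2 = Dc @ zc + D2 @ z2 - x2                       # residual, modality 2
        zc_new = soft(zc - eta * (Dc.T @ r1 + Dc.T @ r2), theta)
        z1_new = soft(z1 - eta * (D1.T @ r1), theta)      # no coupling to r2
        z2_new = soft(z2 - eta * (D2.T @ r2), theta)      # no coupling to r1
        return zc_new, z1_new, z2_new

The zero cross-terms (z1 never sees r2, and vice versa) are the block-sparse interaction pattern the pith refers to: one pass refreshes all three codes, instead of alternating full minimizations per modality.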

If this is right

  • CDNet matches or beats competing fusion methods on four of six metrics for TNO infrared-visible data and five of six for RoadScene data.
  • The network surpasses the second-best method by 1.23 dB PSNR on TNO and 1.59 dB on RoadScene (PSNR is defined after this list).
  • A single high- and low-frequency fidelity loss enables training on multiple fusion tasks without ground-truth images.
  • The lightweight joint-update design supports deployment on resource-limited edge devices for real-time multi-source fusion.
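For reference, the PSNR figures quoted above follow the standard definition

    $$\mathrm{PSNR} = 10\log_{10}\frac{L^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{HW}\sum_{i,j}\bigl(F_{ij}-R_{ij}\bigr)^2,$$

with L the peak intensity, F the fused image, and R the reference; a +1.23 dB gain corresponds to roughly a 25% reduction in MSE.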

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The joint update may reduce loss of complementary details between modalities, lowering visible artifacts in fused outputs.
  • The same block-sparse unfolding pattern could apply to other multi-modal inverse problems such as joint denoising or super-resolution.
  • Efficiency improvements open the door to video-rate fusion in applications like surveillance or medical imaging pipelines.

Load-bearing premise

The unique-common decomposition prior of coupled dictionary learning can be mapped directly into a joint unfolding network without losing representational power or creating new optimization problems.
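For orientation, one standard form of that prior (as used, for example, in coupled dictionary training for super-resolution [57] and coupled feature learning for fusion [60]) models each source x_i with a common dictionary D_c and a modality-specific dictionary D_i:

    $$\min_{z_c,\,z_1,\,z_2}\ \sum_{i=1}^{2} \tfrac{1}{2}\bigl\|x_i - D_c z_c - D_i z_i\bigr\|_2^2 + \lambda_c \|z_c\|_1 + \sum_{i=1}^{2} \lambda_i \|z_i\|_1,$$

where z_c carries the common content and z_1, z_2 the unique content. The premise is that unrolling a solver for an objective of this shape, with the dictionaries made learnable, keeps the decomposition's representational power.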

What would settle it

A side-by-side test on the TNO or RoadScene datasets against a comparable separate-update unfolding network: the claim would fall if CDNet required equal or greater computation and memory while failing to match the reported PSNR gains of 1.23 dB and 1.59 dB.
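A crude harness for the computation half of that test could look like the following; `joint_ista_step` is the sketch above, and any `separate_step` baseline is a hypothetical stand-in for a real alternating-update implementation.

    import time
    import numpy as np

    def time_and_state(step_fn, inputs, n_iter=100):
        """Average wall-clock time per update plus the byte size of the
        carried state; a rough proxy for the computation/memory comparison."""
        state = step_fn(*inputs)                      # warm-up call
        t0 = time.perf_counter()
        for _ in range(n_iter):
            state = step_fn(*inputs)
        elapsed = (time.perf_counter() - t0) / n_iter
        state_bytes = sum(a.nbytes for a in state if isinstance(a, np.ndarray))
        return elapsed, state_bytes

Called as, e.g., `time_and_state(joint_ista_step, (zc, z1, z2, Dc, D1, D2, x1, x2))`. The memory half of the claim would additionally need parameter counts and peak activation memory, which this proxy does not capture.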

Figures

Figures reproduced from arXiv: 2605.00461 by Ge Luo, Jun-Jie Huang, Ke Liang, Meng Wang, Qi Yu, Tianrui Liu, Wentao Zhao, Xinwang Liu, Yuming Xiang.

Figure 1: Comparison of leading image fusion methods on TNO and …
Figure 2: Workflow of CDNet. The Y-channel inputs are first expanded and concatenated as …
Figure 3: Detailed construction of the adaptive references in HLIF.
Figure 4: Visual comparison for “Zentrum” in MEFB dataset and “SICE-Dataset …
Figure 5: Visual comparison for “soldier behind smoke 1” in TNO dataset and “FLIR 01415” in RoadScene dataset. [Table VI, quantitative results on the MIF task (PET-MRI and SPECT-MRI datasets [64]; MSE, PSNR, SSIM, CC, Nabf, HyperIQA; best and second-best in bold and underline; compared methods include LRRNet [17]) spilled into this caption; its values are truncated in extraction.]
Figure 6: Visual comparison for “25026” in PET-MRI dataset and “3025” in SPECT-MRI dataset.
Original abstract

Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. It translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture (CDBlock) that performs block-sparse joint updates of common and modality-specific features, avoiding the separate updates of alternating minimization. A compact High- and Low-frequency Image Fidelity loss enables unsupervised training. Experiments on four tasks (multi-exposure, IR-visible, medical fusion, and segmentation) report competitive or superior performance, including PSNR gains of 1.23 dB and 1.59 dB over the second-best method on TNO and RoadScene datasets for IR-visible fusion, with emphasis on efficiency for edge deployment.

Significance. If the joint unfolding faithfully realizes the coupled dictionary prior without representational loss or new instabilities, the work offers a principled route to more efficient model-driven deep fusion networks. The reported metric improvements and unsupervised loss design would support practical advantages for resource-constrained multi-source fusion, provided the efficiency and performance claims are substantiated by ablations and equivalence analysis.

major comments (3)
  1. §3.2 (CDBlock architecture): The claim that the block-sparse interaction topology performs a model-derived joint update equivalent to the unique-common decomposition prior lacks a derivation showing preservation of the prior's decomposition power or equivalence to alternating minimization; without such a derivation (one possible shape is sketched after this list), the reported PSNR gains on TNO/RoadScene could stem from the fidelity loss or network capacity rather than the prior translation.
  2. §4 (Experiments): No ablation studies or stability analysis are provided to test whether the coupled gradients in the joint update introduce optimization instabilities or reduced expressivity compared to separate modality updates; this is load-bearing for the efficiency and performance claims.
  3. §3.3 (High- and Low-frequency fidelity loss): The unsupervised loss is presented as compact, but no analysis shows how its gradient-adaptive terms interact with the CDBlock updates or whether they compensate for any loss in the joint unfolding approximation.
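On major comment 1: the missing derivation plausibly takes the following shape. Stacking the coupled model (one standard form, not necessarily the paper's exact objective) gives

    $$x=\begin{bmatrix}x_1\\x_2\end{bmatrix},\qquad D=\begin{bmatrix}D_c & D_1 & 0\\ D_c & 0 & D_2\end{bmatrix},\qquad z=\begin{bmatrix}z_c\\z_1\\z_2\end{bmatrix},$$

and one ISTA step on $\tfrac12\|x-Dz\|_2^2+\lambda\|z\|_1$,

    $$z^{k+1}=\mathcal{S}_{\eta\lambda}\bigl(z^{k}-\eta\,D^{\top}(Dz^{k}-x)\bigr),$$

inherits the zero blocks of $D^{\top}D$ between $z_1$ and $z_2$: the block-sparse interaction topology falls out of the model rather than being imposed. Whether CDNet's learned operators actually preserve this structure is exactly what the referee asks the authors to show.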
minor comments (2)
  1. The abstract and introduction would benefit from explicit equation references when stating the joint update rule.
  2. Figure captions for network diagrams should clarify the block-sparse topology with labels matching the text description.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment by providing additional theoretical derivations, ablation studies, and interaction analyses in the revised version. These revisions strengthen the substantiation of our claims regarding the prior translation, efficiency, and unsupervised training.

read point-by-point responses
  1. Referee: §3.2 (CDBlock architecture): The claim that the block-sparse interaction topology performs a model-derived joint update equivalent to the unique-common decomposition prior lacks a derivation showing preservation of the prior's decomposition power or equivalence to alternating minimization; without this, the reported PSNR gains on TNO/RoadScene could stem from the fidelity loss or network capacity rather than the prior translation.

    Authors: We agree that an explicit derivation was not provided in the original submission. In the revised manuscript, we have added a detailed derivation in §3.2. This shows that the block-sparse joint update in CDBlock preserves the unique-common decomposition by enforcing modality-shared and modality-specific feature separation through the interaction topology, which is mathematically equivalent to the alternating minimization steps of coupled dictionary learning. We further include controls in the experiments isolating the prior's contribution from the fidelity loss and network capacity, confirming that the PSNR gains are attributable to the translated prior. revision: yes

  2. Referee: §4 (Experiments): No ablation studies or stability analysis are provided to test whether the coupled gradients in the joint update introduce optimization instabilities or reduced expressivity compared to separate modality updates; this is load-bearing for the efficiency and performance claims.

    Authors: We acknowledge that the original manuscript lacked these ablations. The revised §4 now includes new ablation studies comparing joint block-sparse updates against separate modality updates. These examine optimization stability via convergence curves, gradient norm statistics, and variance analysis, as well as expressivity through feature reconstruction quality and downstream segmentation performance. Results show no introduced instabilities from coupled gradients, with maintained or improved expressivity and the expected computational savings, directly supporting the efficiency and performance claims. revision: yes

  3. Referee: §3.3 (High- and Low-frequency fidelity loss): The unsupervised loss is presented as compact, but no analysis shows how its gradient-adaptive terms interact with the CDBlock updates or whether they compensate for any loss in the joint unfolding approximation.

    Authors: We have expanded §3.3 in the revision to include both theoretical and empirical analysis of the interaction. Gradient propagation analysis demonstrates that the adaptive high- and low-frequency terms dynamically balance the fidelity signals to offset any approximation effects from the joint unfolding. Empirical ablations varying the adaptive weights confirm compensation for unfolding losses, resulting in stable training and faithful multi-source reconstruction without added complexity. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural translation of prior is a design choice, not a self-referential derivation

full rationale

The paper's central step is a structural translation of the existing unique-common decomposition prior from coupled dictionary learning into a joint unfolding block (CDBlock) with block-sparse topology. This is presented as an engineering decision to reduce separate modality updates, not as a mathematical derivation whose outputs are forced by its own inputs. Performance claims (e.g., PSNR gains on TNO/RoadScene) are empirical results from unsupervised training with a high/low-frequency fidelity loss, not predictions obtained by fitting parameters to the target metrics or by self-citation chains. No equations reduce the claimed equivalence or efficiency to a tautology, and no load-bearing uniqueness theorem or ansatz is imported from the authors' prior work. The derivation chain remains self-contained as a novel network topology whose validity is tested externally on standard fusion benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the unique-common decomposition prior from coupled dictionary learning can be faithfully encoded as a block-sparse joint update rule; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The unique-common decomposition prior of coupled dictionary learning is a valid and transferable model for multi-source image fusion.
    Invoked in the abstract as the basis for translating the prior into the CDBlock architecture.

pith-pipeline@v0.9.0 · 5601 in / 1309 out tokens · 31200 ms · 2026-05-09T18:54:28.367810+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

77 extracted references · 1 canonical work page

  1. [1] T. Stathaki, Image Fusion: Algorithms and Applications. Elsevier, 2011.
  2. [2] X. Wu, Z.-H. Cao, T.-Z. Huang, L.-J. Deng, J. Chanussot, and G. Vivone, “Fully-Connected Transformer for Multi-Source Image Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 2071–2088, 2025.
  3. [3] T.-M. Tu, S.-C. Su, H.-C. Shyu, and P. S. Huang, “Efficient intensity-hue-saturation-based image fusion with saturation compensation,” Optical Engineering, vol. 40, no. 5, pp. 720–728, 2001.
  4. [4] M. Choi, “A new intensity-hue-saturation fusion approach to image fusion with a tradeoff parameter,” IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6, pp. 1672–1682, 2006.
  5. [5] P. J. Burt and E. H. Adelson, “A Multiresolution Spline With Application to Image Mosaics,” ACM Transactions on Graphics (ToG), vol. 2, no. 4, pp. 217–236, 1983.
  6. [6] J. Du, W. Li, B. Xiao, and Q. Nawaz, “Union Laplacian pyramid with multiple features for medical image fusion,” Neurocomputing, vol. 194, pp. 326–339, 2016.
  7. [7] Z. Wang, Z. Cui, and Y. Zhu, “Multi-modal medical image fusion by Laplacian pyramid and adaptive sparse representation,” Computers in Biology and Medicine, vol. 123, p. 103823, 2020.
  8. [8] J. Yao, Y. Zhao, Y. Bu, S. G. Kong, and J. C.-W. Chan, “Laplacian Pyramid Fusion Network With Hierarchical Guidance for Infrared and Visible Image Fusion,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 9, pp. 4630–4644, 2023.
  9. [9] J. Liu, Z. Liu, G. Wu, L. Ma, R. Liu, W. Zhong, Z. Luo, and X. Fan, “Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 8115–8124.
  10. [10] Y. Liu, X. Chen, H. Peng, and Z. Wang, “Multi-focus image fusion with a deep convolutional neural network,” Information Fusion, vol. 36, pp. 191–207, 2017.
  11. [11] H. Li and X.-J. Wu, “DenseFuse: A Fusion Approach to Infrared and Visible Images,” IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2614–2623, 2018.
  12. [12] Y. Liu, X. Chen, Z. Wang, Z. J. Wang, R. K. Ward, and X. Wang, “Deep learning for pixel-level image fusion: Recent advances and future prospects,” Information Fusion, vol. 42, pp. 158–173, 2018.
  13. [13] H. Wang, S. Li, L. Song, L. Cui, and P. Wang, “An Enhanced Intelligent Diagnosis Method Based on Multi-Sensor Image Fusion via Improved Deep Learning Network,” IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 6, pp. 2648–2657, 2019.
  14. [14] Y. Liu, L. Wang, J. Cheng, C. Li, and X. Chen, “Multi-focus image fusion: A Survey of the state of the art,” Information Fusion, vol. 64, pp. 71–91, 2020.
  15. [15] X. Deng and P. L. Dragotti, “Deep Convolutional Neural Network for Multi-Modal Image Restoration and Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3333–3348, 2020.
  16. [16] X. Deng, J. Xu, F. Gao, X. Sun, and M. Xu, “DeepM2CDL: Deep Multi-Scale Multi-Modal Convolutional Dictionary Learning Network,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2770–2787, 2024.
  17. [17] H. Li, T. Xu, X.-J. Wu, J. Lu, and J. Kittler, “LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 11040–11052, 2023.
  18. [18] J. Fang, J. Yang, A. Khader, and L. Xiao, “Deep Unfolding Network Enhanced by Transformer Priors for Unregistered Hyperspectral and Multispectral Image Fusion,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–16, 2024.
  19. [19] H. Bai, Z. Zhao, J. Zhang, B. Jiang, L. Deng, Y. Cui, S. Xu, and C. Zhang, “Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 4, pp. 3498–3511, 2024.
  20. [20] C. He, K. Li, G. Xu, Y. Zhang, R. Hu, Z. Guo, and X. Li, “Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, 2023, pp. 12611–12621.
  21. [21] F. Gao, X. Deng, M. Xu, J. Xu, and P. L. Dragotti, “Multi-Modal Convolutional Dictionary Learning,” IEEE Transactions on Image Processing, vol. 31, pp. 1325–1339, 2022.
  22. [22] W. Tang, F. He, and Y. Liu, “YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer,” IEEE Transactions on Multimedia, vol. 25, pp. 5413–5428, 2022.
  23. [23] C. Cheng, T. Xu, Z. Feng, X. Wu, Z. Tang, H. Li, Z. Zhang, S. Atito, M. Awais, and J. Kittler, “One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 28102–28112.
  24. [24] K. Zheng, J. Huang, H. Yu, and F. Zhao, “Efficient Multi-Exposure Image Fusion via Filter-Dominated Fusion and Gradient-Driven Unsupervised Learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2023, pp. 2805–2814.
  25. [25] L. Tang, C. Li, and J. Ma, “Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 1, pp. 591–608, 2026.
  26. [26] L. Tang, J. Yuan, and J. Ma, “Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network,” Information Fusion, vol. 82, pp. 28–42, 2022.
  27. [27] M. Amin-Naji, A. Aghagolzadeh, and M. Ezoji, “Ensemble of CNN for multi-focus image fusion,” Information Fusion, vol. 51, pp. 201–214, 2019.
  28. [28] Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, “IFCNN: A general image fusion framework based on convolutional neural network,” Information Fusion, vol. 54, pp. 99–118, 2020.
  29. [29] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, “U2Fusion: A Unified Unsupervised Image Fusion Network,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 502–518, 2020.
  30. [30] H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, “Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 12797–12804.
  31. [31] H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, “Image fusion meets deep learning: A survey and perspective,” Information Fusion, vol. 76, pp. 323–336, 2021.
  32. [32] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  33. [33] J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, “SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 7, pp. 1200–1217, 2022.
  34. [34] H. Wu, J.-J. Huang, H. Tan, W. Huang, Y. Tang, and X. Li, “Multi-resolution infrared-visible image fusion using multi-scale residual quantization,” in 2025 IEEE International Conference on Multimedia and Expo (ICME), 2025, pp. 1–6.
  35. [35] Z. Huang, S. Yang, J. Wu, L. Zhu, and J. Liu, “FusionDiff: A unified image fusion network based on diffusion probabilistic models,” Computer Vision and Image Understanding, vol. 244, p. 104011, 2024.
  36. [36] L. Tang, Y. Deng, X. Yi, Q. Yan, Y. Yuan, and J. Ma, “DRMF: Degradation-Robust Multi-Modal Image Fusion via Composable Diffusion Prior,” in Proceedings of the 32nd ACM International Conference on Multimedia. ACM, 2024, pp. 8546–8555.
  37. [37] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.
  38. [38] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on Machine Learning. Madison, WI, USA: Omnipress, 2010, pp. 399–406.
  39. [39] J. Zhang and B. Ghanem, “ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  40. [40] Q. Xie, M. Zhou, Q. Zhao, Z. Xu, and D. Meng, “MHF-Net: An Interpretable Deep Network for Multispectral and Hyperspectral Image Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1457–1473, 2020.
  41. [41] W. Pu, J.-J. Huang, B. Sober, N. Daly, C. Higgitt, I. Daubechies, P. L. Dragotti, and M. R. Rodrigues, “Mixed X-Ray Image Separation for Artworks With Concealed Designs,” IEEE Transactions on Image Processing, vol. 31, pp. 4458–4473, 2022.
  42. [42] I. Marivani, E. Tsiligianni, B. Cornelis, and N. Deligiannis, “Designing CNNs for Multimodal Image Restoration and Fusion via Unfolding the Method of Multipliers,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 9, pp. 5830–5845, 2022.
  43. [43] J.-J. Huang, T. Liu, Z. Chen, X. Liu, M. Wang, and P. L. Dragotti, “A lightweight deep exclusion unfolding network for single image reflection removal,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4957–4973, 2025.
  44. [44] M. Xiong, J.-J. Huang, Z. Chen, T. Liu, X. Li, L. Liu, W. Zhao, and Y. Tang, “Dfdun: Deep infrared and visible image fusion with diffusion prior unfolding network,” in 2025 IEEE International Conference on Multimedia and Expo (ICME), 2025, pp. 1–6.
  45. [45] Z. Zhao, J. Zhang, H. Bai, Y. Wang, Y. Cui, L. Deng, K. Sun, C. Zhang, J. Liu, and S. Xu, “Deep Convolutional Sparse Coding Networks for Interpretable Image Fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2369–2377.
  46. [46] K. Zheng, J. Cheng, and Y. Liu, “Unfolding coupled convolutional sparse representation for multi-focus image fusion,” Information Fusion, vol. 118, p. 102974, 2025.
  47. [47] G. Panda, S. Kundu, S. Bhattacharya, and A. Routray, “ℓ0-Regularized Sparse Coding-Based Interpretable Network for Multi-Modal Image Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 4, pp. 4081–4097, 2026.
  48. [48] H. Yin, “Sparse representation with learned multiscale dictionary for image fusion,” Neurocomputing, vol. 148, pp. 600–610, 2015.
  49. [49] A. Raza, H. Huo, and T. Fang, “PFAF-Net: Pyramid Feature Network for Multimodal Fusion,” IEEE Sensors Letters, vol. 4, no. 12, pp. 1–4, 2020.
  50. [50] L. Jian, X. Yang, Z. Liu, G. Jeon, M. Gao, and D. Chisholm, “SEDRFuse: A Symmetric Encoder–Decoder With Residual Block Network for Infrared and Visible Image Fusion,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–15, 2020.
  51. [51] J. Liu, X. Fan, J. Jiang, R. Liu, and Z. Luo, “Learning a Deep Multi-Scale Feature Ensemble and an Edge-Attention Guidance for Image Fusion,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 105–119, 2021.
  52. [52] Z. Wang, Y. Wu, J. Wang, J. Xu, and W. Shao, “Res2Fusion: Infrared and Visible Image Fusion Based on Dense Res2net and Double Nonlocal Attention Models,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
  53. [53] R. Liu, Z. Liu, J. Liu, X. Fan, and Z. Luo, “A Task-Guided, Implicitly-Searched and Meta-Initialized Deep Model for Image Fusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 10, pp. 6594–6609, 2024.
  54. [54] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, K. Zhang, S. Xu, D. Chen, R. Timofte, and L. Van Gool, “Equivariant Multi-Modality Image Fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2024, pp. 25912–25921.
  55. [55] C. Cheng, T. Xu, and X.-J. Wu, “MUFusion: A general unsupervised image fusion network based on memory unit,” Information Fusion, vol. 92, pp. 80–92, 2023.
  56. [56] B. Cao, Y. Sun, P. Zhu, and Q. Hu, “Multi-Modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 23555–23564.
  57. [57] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled Dictionary Training for Image Super-Resolution,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3467–3478, 2012.
  58. [58] M. Guo, H. Zhang, J. Li, L. Zhang, and H. Shen, “An Online Coupled Dictionary Learning Approach for Remote Sensing Image Fusion,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 4, pp. 1284–1294, 2014.
  59. [59] X. Deng and P. L. Dragotti, “Deep Coupled ISTA Network for Multi-Modal Image Super-Resolution,” IEEE Transactions on Image Processing, vol. 29, pp. 1683–1698, 2019.
  60. [60] F. G. Veshki, N. Ouzir, S. A. Vorobyov, and E. Ollila, “Coupled Feature Learning for Multimodal Medical Image Fusion,” arXiv preprint arXiv:2102.08641, 2021.
  61. [61] X. Zhang, “Benchmarking and comparing multi-exposure image fusion algorithms,” Information Fusion, vol. 74, pp. 111–131, 2021.
  62. [62] J. Cai, S. Gu, and L. Zhang, “Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images,” IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018.
  63. [63] A. Toet, “The TNO Multiband Image Data Collection,” Data in Brief, vol. 15, pp. 249–251, 2017.
  64. [64] K. A. Johnson and J. A. Becker, “Harvard University: Athens Digital Library of Human Anatomy,” http://www.med.harvard.edu/aanlib/, 2025.
  65. [65] J. Liu, Z. Liu, G. Wu, L. Ma, R. Liu, W. Zhong, Z. Luo, and X. Fan, “Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, 2023, pp. 8115–8124.
  66. [66] J. Liu, G. Wu, J. Luan, Z. Jiang, R. Liu, and X. Fan, “HoLoCo: Holistic and local contrastive learning network for multi-exposure image fusion,” Information Fusion, vol. 95, pp. 237–249, 2023.
  67. [67] H. Zhang and J. Ma, “IID-MEF: A multi-exposure fusion network based on intrinsic image decomposition,” Information Fusion, vol. 95, pp. 326–340, 2023.
  68. [68] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, S. Xu, Z. Lin, R. Timofte, and L. Van Gool, “CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2023, pp. 5906–5916.
  69. [69] J. Liu, R. Lin, G. Wu, R. Liu, Z. Luo, and X. Fan, “CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion,” International Journal of Computer Vision, vol. 132, no. 5, pp. 1748–1775, 2024.
  70. [70] H. Li, D. Su, Q. Cai, and Y. Zhang, “BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 4725–4733.
  71. [71] P. Jagalingam and A. V. Hegde, “A Review of Quality Metrics for Fused Image,” Aquatic Procedia, vol. 4, pp. 133–142, 2015.
  72. [72] B. Shreyamsha Kumar, “Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform,” Signal, Image and Video Processing, vol. 7, no. 6, pp. 1125–1143, 2013.
  73. [73] S. Su, Q. Yan, Y. Zhu, C. Zhang, X. Ge, J. Sun, and Y. Zhang, “Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  74. [74] J. Wang, H. Qu, Z. Zhang, and M. Xie, “New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model,” Information Fusion, vol. 105, p. 102230, 2024.
  75. [75] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers,” Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.
  76. [76] L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, “PIAFusion: A progressive infrared and visible image fusion network based on illumination aware,” Information Fusion, vol. 83, pp. 79–92, 2022.
  77. [77] X. Jia, C. Zhu, M. Li, W. Tang, and W. Zhou, “LLVIP: A Visible-Infrared Paired Dataset for Low-Light Vision,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3496–3504.