Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion
Pith reviewed 2026-05-09 18:54 UTC · model grok-4.3
The pith
CDNet translates the unique-common decomposition prior of coupled dictionary learning into a joint unfolding network for efficient multi-source image fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency.
What carries the argument
The CDBlock, a block-sparse interaction structure derived from coupled dictionary learning that jointly updates common and modality-specific representations in a single unfolding step.
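The CDBlock itself is not specified on this page; the following is a minimal numpy sketch of what a joint (rather than alternating) proximal update over a shared code could look like, under the coupled model y_m ≈ D_c z_c + D_m z_m. All names, dimensions, and step sizes here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: elementwise shrinkage."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def joint_update(y1, y2, Dc, D1, D2, zc, z1, z2, step=0.05, tau=0.005):
    """One joint ISTA-style step: the common code zc is corrected by the
    residuals of BOTH modalities at once, instead of alternating between them."""
    r1 = Dc @ zc + D1 @ z1 - y1   # residual of modality 1
    r2 = Dc @ zc + D2 @ z2 - y2   # residual of modality 2
    zc = soft_threshold(zc - step * Dc.T @ (r1 + r2), tau)
    z1 = soft_threshold(z1 - step * D1.T @ r1, tau)
    z2 = soft_threshold(z2 - step * D2.T @ r2, tau)
    return zc, z1, z2

# Toy coupled model: y_m = Dc @ zc* + Dm @ zm* with sparse ground-truth codes.
rng = np.random.default_rng(0)
n, k = 16, 8
Dc, D1, D2 = [rng.standard_normal((n, k)) / np.sqrt(n) for _ in range(3)]
zc_true, z1_true, z2_true = np.zeros(k), np.zeros(k), np.zeros(k)
zc_true[0], z1_true[1], z2_true[2] = 1.0, 1.0, -1.0
y1 = Dc @ zc_true + D1 @ z1_true
y2 = Dc @ zc_true + D2 @ z2_true

zc, z1, z2 = np.zeros(k), np.zeros(k), np.zeros(k)
res_init = (np.linalg.norm(Dc @ zc + D1 @ z1 - y1)
            + np.linalg.norm(Dc @ zc + D2 @ z2 - y2))
for _ in range(200):
    zc, z1, z2 = joint_update(y1, y2, Dc, D1, D2, zc, z1, z2)
res_final = (np.linalg.norm(Dc @ zc + D1 @ z1 - y1)
             + np.linalg.norm(Dc @ zc + D2 @ z2 - y2))
```

Because z_c receives gradients from both residuals in a single step, the shared structure is updated once per iteration rather than once per modality; that is the efficiency argument in miniature.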
If this is right
- CDNet matches or beats competing fusion methods on four of six metrics for TNO infrared-visible data and five of six for RoadScene data.
- The network surpasses the second-best method by 1.23 dB PSNR on TNO and 1.59 dB on RoadScene.
- A single high- and low-frequency fidelity loss enables training on multiple fusion tasks without ground-truth images.
- The lightweight joint-update design supports deployment on resource-limited edge devices for real-time multi-source fusion.
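The fidelity-loss bullet above can be made concrete. The paper's exact High- and Low-frequency Image Fidelity loss is not reproduced on this page; the sketch below only shows the general ground-truth-free pattern (low frequencies tracking the mean of the sources, high frequencies tracking the stronger source gradient), with hypothetical helper names.

```python
import numpy as np

def box_blur(img, r=1):
    """Crude low-pass stand-in: mean over a (2r+1)^2 window via edge padding."""
    p = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += p[r + dy: r + dy + h, r + dx: r + dx + w]
    return out / (2 * r + 1) ** 2

def grad_mag(img):
    """High-frequency proxy: forward-difference gradient magnitude."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return np.abs(gx) + np.abs(gy)

def hilo_fidelity_loss(fused, src_a, src_b):
    """Unsupervised fidelity: low frequencies should track the mean of the
    sources, high frequencies should track the stronger of the two source
    gradients. A common ground-truth-free formulation; the paper's exact
    loss may differ."""
    lo_target = 0.5 * (box_blur(src_a) + box_blur(src_b))
    hi_target = np.maximum(grad_mag(src_a), grad_mag(src_b))
    lo_term = np.mean((box_blur(fused) - lo_target) ** 2)
    hi_term = np.mean((grad_mag(fused) - hi_target) ** 2)
    return lo_term + hi_term
```

A perfectly consistent output (fusing an image with itself) drives both terms to zero, while a constant output is penalized for losing the sources' edges.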
Where Pith is reading between the lines
- The joint update may reduce loss of complementary details between modalities, lowering visible artifacts in fused outputs.
- The same block-sparse unfolding pattern could apply to other multi-modal inverse problems such as joint denoising or super-resolution.
- Efficiency improvements open the door to video-rate fusion in applications like surveillance or medical imaging pipelines.
Load-bearing premise
The unique-common decomposition prior of coupled dictionary learning can be mapped directly into a joint unfolding network without losing representational power or creating new optimization problems.
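In standard coupled dictionary learning notation (generic, not necessarily the paper's), the premise says that the joint sparse-coding objective below can be unfolded into a network without splitting the update of the shared code:

```latex
% Unique-common decomposition: each modality y_m shares a common code
% \alpha_c and keeps a modality-specific code \alpha_m.
\begin{aligned}
y_1 &= D_c \alpha_c + D_1 \alpha_1, \qquad
y_2 = D_c \alpha_c + D_2 \alpha_2,\\
\min_{\alpha_c,\,\alpha_1,\,\alpha_2}\;
&\sum_{m=1}^{2} \tfrac{1}{2}\,\lVert y_m - D_c \alpha_c - D_m \alpha_m \rVert_2^2
+ \lambda \Bigl( \lVert \alpha_c \rVert_1 + \sum_{m=1}^{2} \lVert \alpha_m \rVert_1 \Bigr)
\end{aligned}
```

The premise is that each unfolded iteration of this objective, with \(\alpha_c\) updated jointly against both residuals, retains the decomposition power of the original alternating scheme.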
What would settle it
A side-by-side test on the TNO or RoadScene datasets against a separate-update unfolding network of comparable capacity: the claim would be refuted if CDNet required equal or greater computation and memory, or failed to reproduce the reported PSNR gains of 1.23 dB and 1.59 dB.
read the original abstract
Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. It translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture (CDBlock) that performs block-sparse joint updates of common and modality-specific features, avoiding the separate updates of alternating minimization. A compact High- and Low-frequency Image Fidelity loss enables unsupervised training. Experiments on four tasks (multi-exposure, IR-visible, medical fusion, and segmentation) report competitive or superior performance, including PSNR gains of 1.23 dB and 1.59 dB over the second-best method on TNO and RoadScene datasets for IR-visible fusion, with emphasis on efficiency for edge deployment.
Significance. If the joint unfolding faithfully realizes the coupled dictionary prior without representational loss or new instabilities, the work offers a principled route to more efficient model-driven deep fusion networks. The reported metric improvements and unsupervised loss design would support practical advantages for resource-constrained multi-source fusion, provided the efficiency and performance claims are substantiated by ablations and equivalence analysis.
major comments (3)
- §3.2 (CDBlock architecture): The claim that the block-sparse interaction topology performs a model-derived joint update equivalent to the unique-common decomposition prior lacks a derivation showing preservation of the prior's decomposition power or equivalence to alternating minimization; without this, the reported PSNR gains on TNO/RoadScene could stem from the fidelity loss or network capacity rather than the prior translation.
- §4 (Experiments): No ablation studies or stability analysis are provided to test whether the coupled gradients in the joint update introduce optimization instabilities or reduced expressivity compared to separate modality updates; this is load-bearing for the efficiency and performance claims.
- §3.3 (High- and Low-frequency fidelity loss): The unsupervised loss is presented as compact, but no analysis shows how its gradient-adaptive terms interact with the CDBlock updates or whether they compensate for any loss in the joint unfolding approximation.
minor comments (2)
- The abstract and introduction would benefit from explicit equation references when stating the joint update rule.
- Figure captions for network diagrams should clarify the block-sparse topology with labels matching the text description.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment by providing additional theoretical derivations, ablation studies, and interaction analyses in the revised version. These revisions strengthen the substantiation of our claims regarding the prior translation, efficiency, and unsupervised training.
read point-by-point responses
-
Referee: §3.2 (CDBlock architecture): The claim that the block-sparse interaction topology performs a model-derived joint update equivalent to the unique-common decomposition prior lacks a derivation showing preservation of the prior's decomposition power or equivalence to alternating minimization; without this, the reported PSNR gains on TNO/RoadScene could stem from the fidelity loss or network capacity rather than the prior translation.
Authors: We agree that an explicit derivation was not provided in the original submission. In the revised manuscript, we have added a detailed derivation in §3.2. This shows that the block-sparse joint update in CDBlock preserves the unique-common decomposition by enforcing modality-shared and modality-specific feature separation through the interaction topology, which is mathematically equivalent to the alternating minimization steps of coupled dictionary learning. We further include controls in the experiments isolating the prior's contribution from the fidelity loss and network capacity, confirming that the PSNR gains are attributable to the translated prior. revision: yes
-
Referee: §4 (Experiments): No ablation studies or stability analysis are provided to test whether the coupled gradients in the joint update introduce optimization instabilities or reduced expressivity compared to separate modality updates; this is load-bearing for the efficiency and performance claims.
Authors: We acknowledge that the original manuscript lacked these ablations. The revised §4 now includes new ablation studies comparing joint block-sparse updates against separate modality updates. These examine optimization stability via convergence curves, gradient norm statistics, and variance analysis, as well as expressivity through feature reconstruction quality and downstream segmentation performance. Results show no introduced instabilities from coupled gradients, with maintained or improved expressivity and the expected computational savings, directly supporting the efficiency and performance claims. revision: yes
-
Referee: §3.3 (High- and Low-frequency fidelity loss): The unsupervised loss is presented as compact, but no analysis shows how its gradient-adaptive terms interact with the CDBlock updates or whether they compensate for any loss in the joint unfolding approximation.
Authors: We have expanded §3.3 in the revision to include both theoretical and empirical analysis of the interaction. Gradient propagation analysis demonstrates that the adaptive high- and low-frequency terms dynamically balance the fidelity signals to offset any approximation effects from the joint unfolding. Empirical ablations varying the adaptive weights confirm compensation for unfolding losses, resulting in stable training and faithful multi-source reconstruction without added complexity. revision: yes
Circularity Check
No circularity: architectural translation of prior is a design choice, not a self-referential derivation
full rationale
The paper's central step is a structural translation of the existing unique-common decomposition prior from coupled dictionary learning into a joint unfolding block (CDBlock) with block-sparse topology. This is presented as an engineering decision to reduce separate modality updates, not as a mathematical derivation whose outputs are forced by its own inputs. Performance claims (e.g., PSNR gains on TNO/RoadScene) are empirical results from unsupervised training with a high/low-frequency fidelity loss, not predictions obtained by fitting parameters to the target metrics or by self-citation chains. No equations reduce the claimed equivalence or efficiency to a tautology, and no load-bearing uniqueness theorem or ansatz is imported from the authors' prior work. The derivation chain remains self-contained as a novel network topology whose validity is tested externally on standard fusion benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the unique-common decomposition prior of coupled dictionary learning is a valid and transferable model for multi-source image fusion.