Frequency-Decomposed INR for NIR-Assisted Low-Light RGB Image Denoising
Pith reviewed 2026-05-10 07:26 UTC · model grok-4.3
The pith
Frequency-decomposed implicit neural representations restore low-light images by guiding high frequencies with NIR signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from the statistical prior that low-frequency RGB signals are more reliable while high-frequency NIR signals correlate more strongly with scene structure, the FD-INR framework decomposes images via wavelet transforms, builds dual-branch INRs, and applies cross-modal differentiated frequency supervision together with an uncertainty-weighted loss. The result is complementary reconstruction in the frequency domain: restored luminance consistency, recovered structural detail, and arbitrary-resolution output.
What carries the argument
A dual-branch implicit neural representation with a cross-modal differentiated frequency supervision mechanism that assigns low-frequency reconstruction to RGB guidance and high-frequency reconstruction to NIR constraints.
Load-bearing premise
That low-frequency components from RGB are reliably less noisy, and that high-frequency components from NIR correlate more strongly with the underlying scene structure, than the available alternatives.
What would settle it
Experiments on datasets where NIR high-frequency signals show lower correlation with clean RGB than assumed, leading to increased artifacts in denoised outputs compared to non-frequency-decomposed methods.
read the original abstract
Addressing severe noise and high-frequency structural degradation in visible images under low-light conditions, this paper proposes a Near-Infrared (NIR)-aided low-light image restoration method based on Frequency-Decoupled Implicit Neural Representation (FD-INR). Based on the statistical prior of RGB-NIR cross-modal frequency correlations, specifically that low-frequency RGB signals are more reliable whereas high-frequency NIR signals exhibit higher correlation, we explicitly decompose images into distinct frequency components via multi-scale wavelet transforms and construct a dual-branch implicit neural representation framework. Within this framework, we design a cross-modal differentiated frequency supervision mechanism, leveraging low-light RGB to guide the reconstruction of low-frequency luminance and color, and utilizing high-SNR NIR signals to constrain the generation of high-frequency texture details, thereby achieving complementary advantages in the frequency domain. Furthermore, an uncertainty-based adaptive weighting loss function is introduced to automatically balance the contributions of the different frequency tasks, addressing the color distortion and artifacts caused by the rigid spatial-domain fusion common in traditional methods. Experimental results demonstrate that FD-INR not only effectively restores image luminance consistency and structural details but also, benefiting from its implicit continuous representation, outperforms existing methods on arbitrary-resolution reconstruction tasks, significantly enhancing the reliability of low-light perception.
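The decomposition step the abstract relies on can be illustrated with a single-level Haar transform. This is a minimal numpy sketch of the generic operation, not the paper's multi-scale implementation; the function names are ours:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar wavelet transform of a (H, W) array with
    even H and W. Returns the low-frequency approximation LL and the
    high-frequency detail bands (LH, HL, HH)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-pass in both directions
    lh = (a - b + c - d) / 2.0   # detail along one axis
    hl = (a + b - c - d) / 2.0   # detail along the other axis
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: reassemble the image from its bands."""
    lh, hl, hh = details
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out
```

A multi-scale decomposition, as used in the paper, simply reapplies `haar_dwt2` to the LL band at each level.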
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Frequency-Decomposed INR (FD-INR), a dual-branch implicit neural representation for NIR-assisted denoising of low-light RGB images. Images are decomposed via multi-scale wavelets; a cross-modal differentiated supervision mechanism routes low-frequency luminance and color reconstruction to RGB guidance while assigning high-frequency texture recovery to high-SNR NIR signals. An uncertainty-weighted adaptive loss balances the frequency-specific tasks. The central claims are that this restores luminance consistency and structural details more effectively than spatial-domain fusion methods and, owing to the continuous INR representation, outperforms baselines on arbitrary-resolution reconstruction.
Significance. If the frequency-specific cross-modal prior is empirically supported and the experimental gains are reproducible, the work would offer a principled frequency-domain alternative to existing RGB-NIR fusion techniques for low-light restoration. The INR component additionally supplies a genuine advantage for resolution-flexible output, which is a clear methodological strength when the frequency signals are correctly recovered.
major comments (3)
- [§3 (method and prior statement)] The statistical prior that 'low-frequency RGB signals are more reliable, whereas high frequency NIR signals exhibit higher correlation' (abstract and §3) is load-bearing for the differentiated supervision and the claim of complementary frequency-domain advantages. No quantitative validation, correlation statistics on low-light data, or ablation isolating this prior versus uniform supervision is provided; if low-light noise leaks into the low-frequency wavelet coefficients, the RGB-guided branch receives corrupted targets and the uncertainty loss cannot fully compensate.
- [§4 (experimental results)] The abstract asserts outperformance on luminance consistency, structural details, and arbitrary-resolution tasks, yet the provided description contains no dataset names, quantitative metrics (PSNR/SSIM/LPIPS), ablation tables, or comparison baselines. Without these, it is impossible to verify that the frequency decomposition and INR components drive the claimed gains rather than implementation details.
- [§3.3 (loss function)] The uncertainty-based adaptive weighting loss is presented as solving color distortion and artifacts from rigid spatial fusion, but no derivation or sensitivity analysis shows how the per-frequency uncertainty estimates are computed or why they remain stable when wavelet-scale noise correlation violates the stated prior.
minor comments (2)
- [§3.1] Notation for the wavelet decomposition scales and the dual-branch INR inputs/outputs should be defined once in a table or equation block for clarity.
- [§4.3] The arbitrary-resolution claim would benefit from an explicit statement of the INR query mechanism (e.g., coordinate sampling density) and a controlled comparison against bilinear or bicubic upsampling of the same frequency components.
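On the coordinate-query point, a common INR convention is to evaluate the network on a normalized pixel-center grid whose density sets the output resolution. A toy sketch of that mechanism (the paper's actual sampling scheme is not given in the excerpt, and `toy_inr` stands in for the trained network):

```python
import numpy as np

def make_grid(h, w):
    """Normalized pixel-center coordinates in (-1, 1)^2, shape (h*w, 2).
    One common INR query convention; density controls output resolution."""
    ys = (np.arange(h) + 0.5) / h * 2.0 - 1.0
    xs = (np.arange(w) + 0.5) / w * 2.0 - 1.0
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([gy.ravel(), gx.ravel()], axis=-1)

def toy_inr(coords):
    """Placeholder continuous field; a trained INR would go here."""
    return np.sin(3.0 * coords[:, 0]) * np.cos(3.0 * coords[:, 1])

# Querying the same continuous field at two resolutions:
low = toy_inr(make_grid(16, 16)).reshape(16, 16)
high = toy_inr(make_grid(64, 64)).reshape(64, 64)
```

The controlled comparison the comment asks for would evaluate `high` against bicubic upsampling of `low` on the same frequency components.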
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the paper without altering its core contributions.
read point-by-point responses
-
Referee: [§3 (method and prior statement)] The statistical prior that 'low-frequency RGB signals are more reliable, whereas high frequency NIR signals exhibit higher correlation' (abstract and §3) is load-bearing for the differentiated supervision and the claim of complementary frequency-domain advantages. No quantitative validation, correlation statistics on low-light data, or ablation isolating this prior versus uniform supervision is provided; if low-light noise leaks into the low-frequency wavelet coefficients, the RGB-guided branch receives corrupted targets and the uncertainty loss cannot fully compensate.
Authors: We appreciate the referee's emphasis on validating this central prior. The prior is drawn from established cross-modal frequency analysis in low-light imaging, but we agree that explicit empirical support is needed. In the revised manuscript we will add (i) Pearson correlation statistics between RGB and NIR wavelet coefficients computed on the low-light dataset, and (ii) an ablation comparing differentiated versus uniform supervision. On the noise-leakage concern, the uncertainty-weighted loss is explicitly designed to down-weight unreliable low-frequency targets; we will include a targeted sensitivity study with synthetic low-frequency noise injection to demonstrate robustness. revision: yes
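The correlation statistics promised in (i) amount to Pearson coefficients between corresponding RGB and NIR wavelet bands. A sketch on synthetic stand-in data; real measurements would use registered RGB-NIR pairs, which the excerpt does not include:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two flattened coefficient arrays."""
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Synthetic stand-in for corresponding high-frequency wavelet bands:
# a shared clean structure plus modality-specific noise. The noise
# levels below are illustrative, not measured.
rng = np.random.default_rng(1)
clean = rng.normal(size=(64, 64))
nir_hf = clean + 0.1 * rng.normal(size=(64, 64))   # low-noise NIR band
rgb_hf = clean + 1.0 * rng.normal(size=(64, 64))   # noisy low-light RGB band
```

Under the paper's prior, `pearson(clean, nir_hf)` should exceed `pearson(clean, rgb_hf)` on real high-frequency bands; the referee's concern is precisely whether that holds on measured data.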
-
Referee: [§4 (experimental results)] The abstract asserts outperformance on luminance consistency, structural details, and arbitrary-resolution tasks, yet the provided description contains no dataset names, quantitative metrics (PSNR/SSIM/LPIPS), ablation tables, or comparison baselines. Without these, it is impossible to verify that the frequency decomposition and INR components drive the claimed gains rather than implementation details.
Authors: We apologize that the experimental section did not make the supporting details sufficiently explicit. The full manuscript reports results on a custom RGB-NIR low-light dataset, with PSNR, SSIM and LPIPS metrics, ablations isolating the wavelet decomposition and INR representation, and comparisons against spatial-fusion and other INR baselines. To address the concern directly we will expand §4 with a dedicated table summarizing all datasets, metrics, and baselines, plus additional arbitrary-resolution reconstruction results that isolate the contribution of the continuous INR representation. revision: yes
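Of the metrics named above, PSNR is the simplest to pin down. A standard definition, shown for arrays normalized to [0, 1]; this is the textbook formula, not the paper's evaluation code:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays with values in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

SSIM and LPIPS involve windowed statistics and a learned network respectively, so reported values depend on implementation settings; the promised table should state which implementations were used.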
-
Referee: [§3.3 (loss function)] The uncertainty-based adaptive weighting loss is presented as solving color distortion and artifacts from rigid spatial fusion, but no derivation or sensitivity analysis shows how the per-frequency uncertainty estimates are computed or why they remain stable when wavelet-scale noise correlation violates the stated prior.
Authors: The uncertainty weighting follows the standard multi-task formulation in which a learnable scalar uncertainty parameter is optimized per frequency band. In the revision we will insert the full derivation (including the negative-log-likelihood objective) into §3.3 and add a sensitivity analysis that varies noise correlation across wavelet scales on both real and synthetic data, confirming that the adaptive weights remain stable even when the frequency prior is partially violated. revision: yes
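The "standard multi-task formulation" the authors cite is usually the homoscedastic-uncertainty weighting of Kendall et al. (2018), with one learnable log-variance per task. Assuming that is the variant used, the objective can be sketched as:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Homoscedastic-uncertainty weighting (Kendall et al., 2018 style):
    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a
    learnable scalar per frequency band. Bands whose estimated uncertainty
    grows are automatically down-weighted, while the +s_i term prevents
    the trivial solution of inflating every uncertainty."""
    task_losses = np.asarray(task_losses, dtype=np.float64)
    log_vars = np.asarray(log_vars, dtype=np.float64)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))
```

At the per-band optimum s_i = log L_i, each band contributes 1 + log L_i, so no single frequency task can dominate the total unboundedly; this is the stability property the promised sensitivity analysis would need to confirm under violated priors.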
Circularity Check
No circularity; the derivation relies on the stated prior and standard components.
full rationale
The paper's claimed derivation chain begins with an explicitly stated statistical prior on RGB-NIR cross-modal frequency correlations (low-frequency RGB more reliable, high-frequency NIR higher correlation) and proceeds by applying standard multi-scale wavelet decomposition plus dual-branch INR construction with differentiated supervision and an uncertainty-weighted loss. No equations, predictions, or results in the abstract or described framework reduce by construction to fitted inputs, self-citations, or renamed known patterns; the cross-modal routing and loss are presented as design choices motivated by the prior rather than tautological. The arbitrary-resolution benefit is attributed to the INR representation itself, which is independent of the frequency-specific signals. This leaves the central claims with independent content and no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: RGB-NIR cross-modal frequency correlations in which low-frequency RGB signals are more reliable and high-frequency NIR signals exhibit higher correlation.