Frequency-Decomposed INR for NIR-Assisted Low-Light RGB Image Denoising
Pith reviewed 2026-05-10 07:26 UTC · model grok-4.3
The pith
Frequency-decomposed implicit neural representations restore low-light images by guiding high frequencies with NIR signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from the statistical prior that low-frequency RGB signals are more reliable while high-frequency NIR signals correlate more strongly with scene structure, the FD-INR framework decomposes images via wavelet transforms, builds dual-branch INRs, and applies cross-modal differentiated frequency supervision together with an uncertainty-weighted loss. The result is complementary reconstruction in the frequency domain: restored luminance consistency, recovered structural detail, and arbitrary-resolution output.
What carries the argument
A dual-branch implicit neural representation with a cross-modal differentiated frequency supervision mechanism that assigns low-frequency reconstruction to RGB guidance and high-frequency reconstruction to NIR constraints.
Load-bearing premise
That low-frequency components from RGB are reliably less noisy, and that high-frequency components from NIR correlate more strongly with the underlying scene structure, than the available alternatives.
What would settle it
Experiments on datasets where NIR high-frequency signals show lower correlation with clean RGB than assumed, leading to increased artifacts in denoised outputs compared to non-frequency-decomposed methods.
read the original abstract
Addressing severe noise and high-frequency structural degradation in visible images under low-light conditions, this paper proposes a Near-Infrared (NIR)-aided low-light image restoration method based on Frequency-Decoupled Implicit Neural Representation (FD-INR). Based on the statistical prior of RGB-NIR cross-modal frequency correlations, specifically that low-frequency RGB signals are more reliable whereas high-frequency NIR signals exhibit higher correlation, we explicitly decompose images into distinct frequency components via multi-scale wavelet transforms and construct a dual-branch implicit neural representation framework. Within this framework, we design a cross-modal differentiated frequency supervision mechanism, leveraging low-light RGB to guide the reconstruction of low-frequency luminance and color, and utilizing high-SNR NIR signals to constrain the generation of high-frequency texture details, thereby achieving complementary advantages in the frequency domain. Furthermore, an uncertainty-based adaptive weighting loss function is introduced to automatically balance the contributions of the different frequency tasks, addressing the color distortion and artifacts caused by the rigid spatial-domain fusion common in traditional methods. Experimental results demonstrate that FD-INR not only effectively restores image luminance consistency and structural details but also, benefiting from its implicit continuous representation, outperforms existing methods on arbitrary-resolution reconstruction tasks, significantly enhancing the reliability of low-light perception.
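The decomposition step the abstract relies on can be illustrated with a single-level Haar transform. This is a minimal numpy sketch of the generic operation, not the paper's multi-scale implementation; the function names are ours:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar wavelet transform of a (H, W) array with
    even H and W. Returns the low-frequency approximation LL and the
    high-frequency detail bands (LH, HL, HH)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-pass in both directions
    lh = (a - b + c - d) / 2.0   # detail along one axis
    hl = (a + b - c - d) / 2.0   # detail along the other axis
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: reassemble the image from its bands."""
    lh, hl, hh = details
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out
```

A multi-scale decomposition, as used in the paper, simply reapplies `haar_dwt2` to the LL band at each level.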
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Frequency-Decomposed INR (FD-INR), a dual-branch implicit neural representation for NIR-assisted denoising of low-light RGB images. Images are decomposed via multi-scale wavelets; a cross-modal differentiated supervision mechanism routes low-frequency luminance and color reconstruction to RGB guidance while assigning high-frequency texture recovery to high-SNR NIR signals. An uncertainty-weighted adaptive loss balances the frequency-specific tasks. The central claims are that this restores luminance consistency and structural details more effectively than spatial-domain fusion methods and, owing to the continuous INR representation, outperforms baselines on arbitrary-resolution reconstruction.
Significance. If the frequency-specific cross-modal prior is empirically supported and the experimental gains are reproducible, the work would offer a principled frequency-domain alternative to existing RGB-NIR fusion techniques for low-light restoration. The INR component additionally supplies a genuine advantage for resolution-flexible output, which is a clear methodological strength when the frequency signals are correctly recovered.
major comments (3)
- [§3 (method and prior statement)] The statistical prior that 'low-frequency RGB signals are more reliable, whereas high frequency NIR signals exhibit higher correlation' (abstract and §3) is load-bearing for the differentiated supervision and the claim of complementary frequency-domain advantages. No quantitative validation, correlation statistics on low-light data, or ablation isolating this prior versus uniform supervision is provided; if low-light noise leaks into the low-frequency wavelet coefficients, the RGB-guided branch receives corrupted targets and the uncertainty loss cannot fully compensate.
- [§4 (experimental results)] The abstract asserts outperformance on luminance consistency, structural details, and arbitrary-resolution tasks, yet the provided description contains no dataset names, quantitative metrics (PSNR/SSIM/LPIPS), ablation tables, or comparison baselines. Without these, it is impossible to verify that the frequency decomposition and INR components drive the claimed gains rather than implementation details.
- [§3.3 (loss function)] The uncertainty-based adaptive weighting loss is presented as solving color distortion and artifacts from rigid spatial fusion, but no derivation or sensitivity analysis shows how the per-frequency uncertainty estimates are computed or why they remain stable when wavelet-scale noise correlation violates the stated prior.
minor comments (2)
- [§3.1] Notation for the wavelet decomposition scales and the dual-branch INR inputs/outputs should be defined once in a table or equation block for clarity.
- [§4.3] The arbitrary-resolution claim would benefit from an explicit statement of the INR query mechanism (e.g., coordinate sampling density) and a controlled comparison against bilinear or bicubic upsampling of the same frequency components.
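On the coordinate-query point, a common INR convention is to evaluate the network on a normalized pixel-center grid whose density sets the output resolution. A toy sketch of that mechanism (the paper's actual sampling scheme is not given in the excerpt, and `toy_inr` stands in for the trained network):

```python
import numpy as np

def make_grid(h, w):
    """Normalized pixel-center coordinates in (-1, 1)^2, shape (h*w, 2).
    One common INR query convention; density controls output resolution."""
    ys = (np.arange(h) + 0.5) / h * 2.0 - 1.0
    xs = (np.arange(w) + 0.5) / w * 2.0 - 1.0
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([gy.ravel(), gx.ravel()], axis=-1)

def toy_inr(coords):
    """Placeholder continuous field; a trained INR would go here."""
    return np.sin(3.0 * coords[:, 0]) * np.cos(3.0 * coords[:, 1])

# Querying the same continuous field at two resolutions:
low = toy_inr(make_grid(16, 16)).reshape(16, 16)
high = toy_inr(make_grid(64, 64)).reshape(64, 64)
```

The controlled comparison the comment asks for would evaluate `high` against bicubic upsampling of `low` on the same frequency components.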
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the paper without altering its core contributions.
read point-by-point responses
-
Referee: [§3 (method and prior statement)] The statistical prior that 'low-frequency RGB signals are more reliable, whereas high frequency NIR signals exhibit higher correlation' (abstract and §3) is load-bearing for the differentiated supervision and the claim of complementary frequency-domain advantages. No quantitative validation, correlation statistics on low-light data, or ablation isolating this prior versus uniform supervision is provided; if low-light noise leaks into the low-frequency wavelet coefficients, the RGB-guided branch receives corrupted targets and the uncertainty loss cannot fully compensate.
Authors: We appreciate the referee's emphasis on validating this central prior. The prior is drawn from established cross-modal frequency analysis in low-light imaging, but we agree that explicit empirical support is needed. In the revised manuscript we will add (i) Pearson correlation statistics between RGB and NIR wavelet coefficients computed on the low-light dataset, and (ii) an ablation comparing differentiated versus uniform supervision. On the noise-leakage concern, the uncertainty-weighted loss is explicitly designed to down-weight unreliable low-frequency targets; we will include a targeted sensitivity study with synthetic low-frequency noise injection to demonstrate robustness. revision: yes
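The correlation statistics promised in (i) amount to Pearson coefficients between corresponding RGB and NIR wavelet bands. A sketch on synthetic stand-in data; real measurements would use registered RGB-NIR pairs, which the excerpt does not include:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two flattened coefficient arrays."""
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Synthetic stand-in for corresponding high-frequency wavelet bands:
# a shared clean structure plus modality-specific noise. The noise
# levels below are illustrative, not measured.
rng = np.random.default_rng(1)
clean = rng.normal(size=(64, 64))
nir_hf = clean + 0.1 * rng.normal(size=(64, 64))   # low-noise NIR band
rgb_hf = clean + 1.0 * rng.normal(size=(64, 64))   # noisy low-light RGB band
```

Under the paper's prior, `pearson(clean, nir_hf)` should exceed `pearson(clean, rgb_hf)` on real high-frequency bands; the referee's concern is precisely whether that holds on measured data.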
-
Referee: [§4 (experimental results)] The abstract asserts outperformance on luminance consistency, structural details, and arbitrary-resolution tasks, yet the provided description contains no dataset names, quantitative metrics (PSNR/SSIM/LPIPS), ablation tables, or comparison baselines. Without these, it is impossible to verify that the frequency decomposition and INR components drive the claimed gains rather than implementation details.
Authors: We apologize that the experimental section did not make the supporting details sufficiently explicit. The full manuscript reports results on a custom RGB-NIR low-light dataset, with PSNR, SSIM and LPIPS metrics, ablations isolating the wavelet decomposition and INR representation, and comparisons against spatial-fusion and other INR baselines. To address the concern directly we will expand §4 with a dedicated table summarizing all datasets, metrics, and baselines, plus additional arbitrary-resolution reconstruction results that isolate the contribution of the continuous INR representation. revision: yes
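Of the metrics named above, PSNR is the simplest to pin down. A standard definition, shown for arrays normalized to [0, 1]; this is the textbook formula, not the paper's evaluation code:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays with values in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

SSIM and LPIPS involve windowed statistics and a learned network respectively, so reported values depend on implementation settings; the promised table should state which implementations were used.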
-
Referee: [§3.3 (loss function)] The uncertainty-based adaptive weighting loss is presented as solving color distortion and artifacts from rigid spatial fusion, but no derivation or sensitivity analysis shows how the per-frequency uncertainty estimates are computed or why they remain stable when wavelet-scale noise correlation violates the stated prior.
Authors: The uncertainty weighting follows the standard multi-task formulation in which a learnable scalar uncertainty parameter is optimized per frequency band. In the revision we will insert the full derivation (including the negative-log-likelihood objective) into §3.3 and add a sensitivity analysis that varies noise correlation across wavelet scales on both real and synthetic data, confirming that the adaptive weights remain stable even when the frequency prior is partially violated. revision: yes
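The "standard multi-task formulation" the authors cite is usually the homoscedastic-uncertainty weighting of Kendall et al. (2018), with one learnable log-variance per task. Assuming that is the variant used, the objective can be sketched as:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Homoscedastic-uncertainty weighting (Kendall et al., 2018 style):
    total = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a
    learnable scalar per frequency band. Bands whose estimated uncertainty
    grows are automatically down-weighted, while the +s_i term prevents
    the trivial solution of inflating every uncertainty."""
    task_losses = np.asarray(task_losses, dtype=np.float64)
    log_vars = np.asarray(log_vars, dtype=np.float64)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))
```

At the per-band optimum s_i = log L_i, each band contributes 1 + log L_i, so no single frequency task can dominate the total unboundedly; this is the stability property the promised sensitivity analysis would need to confirm under violated priors.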
Circularity Check
No circularity; the derivation relies on the stated prior and standard components.
full rationale
The paper's claimed derivation chain begins with an explicitly stated statistical prior on RGB-NIR cross-modal frequency correlations (low-frequency RGB more reliable, high-frequency NIR higher correlation) and proceeds by applying standard multi-scale wavelet decomposition plus dual-branch INR construction with differentiated supervision and an uncertainty-weighted loss. No equations, predictions, or results in the abstract or described framework reduce by construction to fitted inputs, self-citations, or renamed known patterns; the cross-modal routing and loss are presented as design choices motivated by the prior rather than tautological. The arbitrary-resolution benefit is attributed to the INR representation itself, which is independent of the frequency-specific signals. This leaves the central claims with independent content and no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: RGB-NIR cross-modal frequency correlations in which low-frequency RGB signals are more reliable and high-frequency NIR signals exhibit higher correlation.