Beyond Pixel Fidelity: Minimizing Perceptual Distortion and Color Bias in Night Photography Rendering
Pith reviewed 2026-05-07 06:02 UTC · model grok-4.3
The pith
A RAW-to-RGB network in the HVI color space, combining RAW-domain processing, wavelet propagation, dynamic loss weighting, and feature-distribution matching, reduces color differences and perceptual distortion in night photographs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that pHVI-ISPNet achieves competitive fidelity while establishing new state-of-the-art results in both CIE2000 color difference and LPIPS on the NTIRE 2025 night photography rendering dataset, thereby validating a perceptually-driven design for high-quality nighttime imaging. The claimed gains rest on four refinements integrated into a robust HVI color space architecture: RAW-domain feature processing and wavelet-based feature propagation to mitigate high-frequency detail loss, sample-based dynamic loss coefficients to stabilize learning across varying exposure levels, and a loss term based on feature distributions to maintain rigorous color constancy.
What carries the argument
pHVI-ISPNet, a RAW-to-RGB network built on the HVI color space that combines RAW-domain processing, wavelet-based feature propagation, sample-based dynamic loss coefficients, and a feature-distribution loss to preserve detail and enforce color constancy under extreme contrast.
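The abstract does not spell out the wavelet mechanism; reference [8] in the graph below (multi-level wavelet CNN) suggests the common pattern of replacing down/upsampling with a discrete wavelet transform so that high-frequency subbands bypass the network bottleneck. A minimal PyTorch sketch of that pattern with a single-level Haar transform follows; the haar_dwt/haar_idwt helpers are illustrative, not the paper's code.

```python
import torch

def haar_dwt(x):
    # Single-level 2D Haar transform of a (N, C, H, W) tensor (H, W even).
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency band at half resolution
    lh = (a + b - c - d) / 2   # horizontal detail
    hl = (a - b + c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt(ll, high):
    # Exact inverse: recombines the four subbands into the full-resolution map.
    lh, hl, hh = high
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    n, ch, h, w = ll.shape
    out = ll.new_zeros(n, ch, 2 * h, 2 * w)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out

# The propagation pattern: only the low-frequency band passes through the
# deep layers; the high-frequency subbands skip them losslessly.
x = torch.randn(2, 16, 64, 64)   # stand-in for encoder features
ll, high = haar_dwt(x)           # ll: (2, 16, 32, 32)
# ... deep processing of ll would happen here ...
y = haar_idwt(ll, high)          # detail reinjected at upsampling
assert torch.allclose(x, y, atol=1e-5)
```

Because the transform is invertible, any detail loss inside the deep branch is confined to the low-frequency band; the three detail subbands reach the decoder untouched.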
If this is right
- Night photography can be rendered with lower color bias and reduced perceptual distortion while remaining competitive on conventional fidelity metrics.
- Dynamic loss coefficients that adapt to each sample's exposure level support stable training across the wide range of lighting conditions typical in nighttime scenes.
- Wavelet-based propagation limits high-frequency detail loss that commonly occurs when processing severely underexposed regions.
- A loss based on matching feature distributions across the image helps enforce consistent color rendering even when bright sources and dark backgrounds coexist in the same frame (see the sketch after this list).
- Perceptually optimized networks can narrow the gap between pixel accuracy and human visual preference in high-contrast low-light imaging.
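The abstract leaves the feature-distribution loss unspecified; references [9] and [10] in the graph below describe losses that match the statistics, or the full empirical distribution, of deep features. A minimal sketch under that assumption, matching channel-wise mean and standard deviation plus an EFDM-style sorted-value term; both functions are illustrative, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def stats_matching_loss(feat_pred, feat_target, eps=1e-5):
    # Match channel-wise mean and std of (N, C, H, W) feature activations,
    # e.g. from a frozen extractor; insensitive to where colors sit spatially.
    mu_p = feat_pred.mean(dim=(2, 3))
    mu_t = feat_target.mean(dim=(2, 3))
    sd_p = feat_pred.var(dim=(2, 3), unbiased=False).add(eps).sqrt()
    sd_t = feat_target.var(dim=(2, 3), unbiased=False).add(eps).sqrt()
    return F.l1_loss(mu_p, mu_t) + F.l1_loss(sd_p, sd_t)

def exact_distribution_loss(feat_pred, feat_target):
    # EFDM-style variant: compare sorted activations per channel, which
    # matches the full empirical distribution rather than two moments.
    n, c = feat_pred.shape[:2]
    p = feat_pred.reshape(n, c, -1).sort(dim=-1).values
    t = feat_target.reshape(n, c, -1).sort(dim=-1).values
    return F.l1_loss(p, t)
```

Either term penalizes global color statistics rather than pixel positions, which is why such losses tolerate the coexistence of bright sources and dark backgrounds.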
Where Pith is reading between the lines
- If the refinements remain effective on datasets collected with different sensors or under different nighttime conditions, the same design pattern could be applied to other high-contrast imaging tasks such as surveillance or automotive night vision.
- Wider adoption of the HVI color space might benefit any rendering pipeline where human color perception matters more than strict pixel matching.
- Controlled experiments that isolate the contribution of each refinement would clarify which components are responsible for the largest share of the observed perceptual improvement.
Load-bearing premise
The reported gains in perceptual metrics arise primarily from the four proposed refinements rather than from dataset-specific tuning or unstated implementation details.
What would settle it
An ablation experiment in which any one of the four refinements is removed and the resulting network no longer achieves the claimed state-of-the-art CIE2000 and LPIPS scores on the same evaluation set.
Original abstract
Night Photography Rendering (NPR) poses a significant challenge due to the extreme contrast between dark and illuminated areas in scenes, stemming from concurrent capture of severely dark regions alongside intense point light sources. Existing methods, which are mainly tailored for fidelity metrics, reveal considerable perceptual gaps and often detract from visual quality. We introduce pHVI-ISPNet, a novel RAW-to-RGB framework built on the robust HVI color space. Our network integrates four distinct key refinements: RAW-domain feature processing and wavelet-based feature propagation to mitigate high-frequency detail loss; sample-based dynamic loss coefficients to ensure stable learning across varying exposure levels; and a loss term based on feature distributions to maintain rigorous color constancy. Evaluations on the dataset introduced in the NTIRE 2025 challenge on NPR confirm our approach achieves competitive fidelity while establishing new state-of-the-art results in both CIE2000 color difference and LPIPS. This validates our perceptually-driven design for high-quality nighttime imaging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces pHVI-ISPNet, a RAW-to-RGB framework for Night Photography Rendering built on the HVI color space. It adds four refinements—RAW-domain feature processing and wavelet-based feature propagation to reduce high-frequency loss, sample-based dynamic loss coefficients for stable training across exposures, and a feature-distribution loss for color constancy. On the NTIRE 2025 NPR challenge dataset the method is reported to deliver competitive fidelity while setting new SOTA results on CIE2000 and LPIPS, validating the perceptually-driven design.
Significance. If the SOTA gains can be shown to arise from the four listed refinements rather than base architecture or tuning, the work would meaningfully advance perceptual rendering for high-contrast nighttime scenes. It would supply a concrete example of trading pixel-level fidelity for improved visual quality in ISP pipelines, with direct relevance to consumer photography applications.
major comments (2)
- [Abstract] The central claim that the four refinements (RAW-domain processing, wavelet propagation, sample-based dynamic loss coefficients, and feature-distribution loss) are responsible for the new SOTA on CIE2000 and LPIPS is unsupported by any ablation or removal experiments. Without controlled variants that disable individual components, the attribution of gains to the perceptually-driven design choices remains unverified.
- [Experimental evaluation] The reported CIE2000 and LPIPS numbers are single-point estimates with no error bars, confidence intervals, or statistical significance tests against competing methods. This weakens the reliability of the SOTA assertion and prevents assessment of whether the improvements exceed training variance. A minimal sketch of the requested analysis follows this list.
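To make the request concrete, here is a hedged sketch: given per-image LPIPS scores for two methods on the same test images (the arrays below are random stand-ins, not real results), report mean and standard deviation and run a paired Wilcoxon signed-rank test.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Stand-ins for per-image LPIPS (lower is better) of two methods evaluated
# on the same 50 test images; real values would come from the NTIRE 2025
# NPR test set, ideally aggregated over several training seeds.
ours = rng.normal(0.20, 0.03, size=50)
baseline = rng.normal(0.23, 0.03, size=50)

print(f"ours:     {ours.mean():.4f} ± {ours.std(ddof=1):.4f}")
print(f"baseline: {baseline.mean():.4f} ± {baseline.std(ddof=1):.4f}")

# Paired, non-parametric test on the per-image differences;
# alternative="less" asks whether our per-image scores are lower.
stat, p = wilcoxon(ours, baseline, alternative="less")
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p:.2e}")
```

A paired test is appropriate here because both methods are scored on identical images, so per-image differences cancel much of the scene-level variance.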
minor comments (1)
- [Methods] The precise formulation and sampling procedure for the 'sample-based dynamic loss coefficients' are not fully specified; adding the exact weighting formula or pseudocode would improve reproducibility (an illustrative sketch follows).
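The paper gives no formula, so the following is purely a hypothetical scheme (the function names, the square-root falloff, and the luminance bounds are invented here) showing the level of specification the comment asks for: a per-sample coefficient derived from mean raw luminance that upweights underexposed frames.

```python
import torch

def dynamic_loss_weight(raw, lo=0.02, hi=0.5):
    # Hypothetical: map each sample's mean luminance (raw in [0, 1], shape
    # (N, C, H, W)) to a loss coefficient; darker frames get larger weights.
    exposure = raw.mean(dim=(1, 2, 3)).clamp(lo, hi)   # (N,)
    weight = (hi / exposure).sqrt()                    # e.g. sqrt falloff
    return weight / weight.mean()                      # keep batch scale ~1

def weighted_l1(pred, target, raw):
    # Per-sample L1 scaled by the dynamic coefficient, then batch-averaged.
    per_sample = (pred - target).abs().mean(dim=(1, 2, 3))
    return (dynamic_loss_weight(raw) * per_sample).mean()
```

Pseudocode at roughly this granularity, with the actual falloff and normalization used, would settle the reproducibility concern.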
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate additional experiments that directly support the attribution of gains to our design choices and improve the statistical presentation of results.
Point-by-point responses
- Referee: [Abstract] The central claim that the four refinements (RAW-domain processing, wavelet propagation, sample-based dynamic loss coefficients, and feature-distribution loss) are responsible for the new SOTA on CIE2000 and LPIPS is unsupported by any ablation or removal experiments. Without controlled variants that disable individual components, the attribution of gains to the perceptually-driven design choices remains unverified.
Authors: We agree that the manuscript currently lacks ablation studies isolating the contribution of each of the four refinements. The full model is evaluated against baselines, but component-wise removal experiments are not reported. In the revised version we will add a dedicated ablation section with controlled variants (e.g., pHVI-ISPNet without RAW-domain processing, without wavelet propagation, without dynamic loss coefficients, and without the feature-distribution loss). We will report the resulting CIE2000 and LPIPS scores on the NTIRE 2025 NPR test set to quantify the incremental benefit of each component. Revision: yes.
- Referee: [Experimental evaluation] The reported CIE2000 and LPIPS numbers are single-point estimates with no error bars, confidence intervals, or statistical significance tests against competing methods. This weakens the reliability of the SOTA assertion and prevents assessment of whether the improvements exceed training variance.
Authors: We acknowledge that the current results are single-run point estimates. To address this, we will retrain the model and the strongest competing methods with at least three different random seeds and report mean and standard deviation for CIE2000 and LPIPS. We will also include paired t-tests or Wilcoxon signed-rank tests against the top competing entries to establish statistical significance. These statistics will be added to the experimental tables and discussed in the text. Revision: yes.
Circularity Check
No circularity: empirical SOTA claims rest on independent benchmark evaluation
Full rationale
The paper describes an architecture (pHVI-ISPNet) incorporating four explicit refinements and reports quantitative results on the external NTIRE 2025 NPR dataset. No equations, predictions, or first-principles derivations are presented that reduce the claimed improvements in CIE2000 or LPIPS to quantities defined or fitted inside the same model. The evaluation metrics are applied post-training on held-out challenge data, and the abstract contains no self-citations or load-bearing uniqueness theorems. The derivation chain is therefore self-contained; the headline numbers are ordinary empirical outcomes rather than tautological restatements of the training losses or architectural choices.
Axiom & Free-Parameter Ledger
free parameters (1)
- sample-based dynamic loss coefficients
Reference graph
Works this paper leans on
- [1] INTRODUCTION (excerpt): “The transformation of raw sensor-driven data from resource-limited devices to high-fidelity RGB-based imagery, in particular for Night Photography Rendering (NPR), represents an ongoing, pivotal challenge in computational photography. Although closely related to low-light image enhancement, NPR has its own very specific and more complica...” (arXiv, 2025)
- [2] RELATED WORKS (excerpt): “Night Photography Rendering (NPR) is a challenging domain related to low-light image enhancement (LLIE), but distinct due to the necessity to process raw sensor data challenged by extreme noise, high-dynamic range conditions, and multi-illuminant color casts. The foundational LLIE work was based on the Retinex theory [6], which distingui...” (2025)
- [3] METHODOLOGY (excerpt): “The proposed pHVI-ISPNet is a highly specialized architecture built on the Color and Intensity Decoupling Network (CIDNet) [2], which utilizes the HVI color space to address the extreme high-dynamic range and multi-illuminant challenges in Night Photography Rendering (NPR). Our methodology incorporates four key innovations into the archite...”
- [4] EXPERIMENTS (excerpt): “4.1. Experimental Setup We perform all training and evaluation on the dataset given by the NTIRE 2025 Challenge on Night Photography Rendering (NPR). This dataset consists of paired low-light RAW images captured by a mobile phone (Huawei) and corresponding high-quality RGB images (Sony) serving as ground truth. For training, we utilize ran...” (2025)
- [5] CONCLUSION (excerpt): “This paper addresses the significant perceptual gap in Night Photography Rendering (NPR) by structurally redesigning the RAW-to-RGB pipeline. We introduced pHVI-ISPNet, a unified framework leveraging the HVI color space to manage extreme high-dynamic range and multi-illuminant conditions. Our approach integrates novel architectural improvements ...” (2025)
- [6] Egor Ershov, et al., “NTIRE 2025 challenge on night photography rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2025, pp. 1514–1524.
- [7] Qingsen Yan, Yixu Feng, Cheng Zhang, Guansong Pang, Kangbiao Shi, Peng Wu, Wei Dong, Jinqiu Sun, and Yanning Zhang, “HVI: A new color space for low-light image enhancement,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5678–5687.
- [8] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo, “Multi-level wavelet-CNN for image restoration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 773–782.
- [9] Furkan Kınlı and Furkan Kıraç, “Feature distribution statistics as a loss objective for robust white balance correction,” Machine Vision and Applications, vol. 36, no. 3, pp. 1–20, 2025.
- [10] Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, and Lei Zhang, “Exact feature distribution matching for arbitrary style transfer and domain generalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8035–8045.
- [11] Edwin H. Land and John J. McCann, “Lightness and retinex theory,” J. Opt. Soc. Am., vol. 61, no. 1, pp. 1–11, Jan 1971.
- [12] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu, “Deep retinex decomposition for low-light enhancement,” arXiv preprint arXiv:1808.04560, 2018.
- [13] Xiaojie Guo and Qiming Hu, “Low-light image enhancement via breaking down the darkness,” International Journal of Computer Vision, vol. 131, no. 1, pp. 48–66, 2023.
- [14] Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, and Jiayi Ma, “Diff-Retinex: Rethinking low-light image enhancement with a generative diffusion model,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12302–12311.
- [15] Egor Ershov, et al., “NTIRE 2022 challenge on night photography rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2022, pp. 1287–1300.
- [16] Alina Shutova, et al., “NTIRE 2023 challenge on night photography rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2023, pp. 1982–1993.
- [17] Nikola Banić, et al., “NTIRE 2024 challenge on night photography rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2024.
- [18] Shuai Liu, Chaoyu Feng, Xiaotao Wang, Hao Wang, Ran Zhu, Yongqiang Li, and Lei Lei, “Deep-FlexISP: A three-stage framework for night photography rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1211–1220.
- [19] Xin Jin, Ling-Hao Han, Zhen Li, Chun-Le Guo, Zhi Chai, and Chongyi Li, “DNF: Decouple and feedback network for seeing in the dark,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18135–18144.
- [20] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
- [21] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021.
- [22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [23] Ilya Loshchilov and Frank Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
- [24] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018.
- [25] M. Ronnier Luo, Guihua Cui, and Bryan Rigg, “The development of the CIE 2000 colour-difference formula: CIEDE2000,” Color Research & Application, vol. 26, no. 5, pp. 340–350, 2001.