pith. machine review for the scientific record.

arxiv: 2605.07359 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

UniISP: A Unified ISP Framework for Both Human and Machine Vision

Bo Zhang, Hanxi Li, Li Zeng, Yao Cheng

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords image signal processing · unified ISP · human vision · machine vision · hybrid attention · feature adapter · raw sensor data · computer vision

The pith

UniISP creates a single ISP pipeline that produces images appealing to humans while preserving details for machine vision tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to solve the conflict where traditional image signal processing creates nice-looking photos but loses information needed by computer vision models, while raw sensor data helps machines but looks unappealing to people. It introduces UniISP as a framework that processes raw data into RGB images using a Hybrid Attention Module trained with supervision to prioritize visual quality. A Feature Adapter then passes key features forward to downstream networks without forcing a choice between the two goals. If this holds, camera systems could use one processing path for both photography and AI applications, especially in difficult conditions like low light.
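
As a rough sketch of what such a dual-output pipeline implies structurally, the PyTorch fragment below wires one encoder to two heads: an RGB renderer for human viewing and an adapter that re-projects the same features for a downstream network. The module names (`RawISP`), channel widths, and layer choices are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical wiring of a unified ISP with two outputs: an RGB image for
# humans and adapted features for a downstream vision network. Names and
# shapes are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class RawISP(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Packed Bayer raw (4 channels after space-to-depth) -> feature space.
        self.encode = nn.Sequential(
            nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Head that renders a display RGB image (supervised against sRGB).
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)
        # Adapter that re-projects the same features for a downstream network.
        self.adapter = nn.Conv2d(ch, 256, 1)  # 256 = assumed backbone width

    def forward(self, raw4):
        feats = self.encode(raw4)
        rgb = torch.sigmoid(self.to_rgb(feats))  # human-facing output
        task_feats = self.adapter(feats)         # machine-facing output
        return rgb, task_feats

isp = RawISP()
raw = torch.rand(1, 4, 128, 128)  # packed RGGB raw patch
rgb, task_feats = isp(raw)
print(rgb.shape, task_feats.shape)  # (1, 3, 128, 128), (1, 256, 128, 128)
```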

Core claim

UniISP is a unified ISP framework that incorporates a Hybrid Attention Module, trained with supervised learning to generate visually pleasing RGB images from raw sensor data, and a Feature Adapter module that propagates informative features to subsequent computer vision networks, achieving state-of-the-art performance across various scenarios and multiple datasets.

What carries the argument

The Hybrid Attention Module (HAM), which emphasizes features relevant to human visual quality, combined with the Feature Adapter, which transfers the preserved information to machine vision models.
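
The abstract does not say what the attention is "hybrid" over. One plausible reading, sketched below purely as an assumption, combines channel attention (squeeze-and-excitation style, reference [15]) with spatial attention; the paper's actual HAM may differ.

```python
# One plausible reading of a "hybrid" attention block: channel attention
# (squeeze-and-excitation style, cf. ref [15]) followed by spatial attention.
# This is an assumption about the design, not the paper's stated HAM.
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, ch, reduction=8):
        super().__init__()
        # Channel branch: global pooling -> bottleneck MLP -> per-channel gate.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        # Spatial branch: pool over channels -> conv -> per-pixel gate.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)  # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)  # reweight spatial locations

ham = HybridAttention(64)
y = ham(torch.rand(2, 64, 32, 32))  # same shape out: (2, 64, 32, 32)
```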

If this is right

  • Generated images satisfy human aesthetic standards while supporting high accuracy in computer vision tasks.
  • The framework performs well in low-light and other challenging capture conditions.
  • Performance holds across multiple public datasets without task-specific retraining.
  • A single pipeline removes the need for separate human and machine processing branches in camera systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Device makers could embed this processing to deliver better photos alongside stronger AI features without extra hardware modes.
  • End-to-end training of the ISP with specific vision tasks becomes feasible as a next step.
  • Real-time video versions could be tested for applications like mobile photography or vehicle cameras.

Load-bearing premise

That the attention module and feature adapter can jointly optimize for human visual appeal and machine information integrity without meaningful trade-offs in either.
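
Operationally, the premise amounts to claiming that a single weighted objective has a good joint optimum. A toy version of such an objective, with the weights and the specific loss terms (L1 for the human-facing side, cross-entropy as a stand-in task loss) as illustrative assumptions:

```python
# Toy joint objective under the premise: one weighted sum over a human-facing
# reconstruction loss and a machine-facing task loss. Weights and loss choices
# are assumptions for illustration, not the paper's training recipe.
import torch
import torch.nn.functional as F

def joint_loss(pred_rgb, target_rgb, task_logits, task_labels,
               w_human=1.0, w_machine=0.5):
    human = F.l1_loss(pred_rgb, target_rgb)              # visual fidelity term
    machine = F.cross_entropy(task_logits, task_labels)  # downstream task term
    return w_human * human + w_machine * machine

loss = joint_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                  torch.randn(2, 10), torch.randint(0, 10, (2,)))
```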

What would settle it

Compare UniISP outputs against traditional ISP and minimal-ISP baselines on a held-out low-light dataset using both human visual quality ratings and accuracy of a fixed downstream object detector; if either score is worse than the stronger baseline, the unified benefit fails.
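
A skeleton of that settling experiment follows. The pipeline and detector handles are hypothetical placeholders, PSNR stands in for the human quality rating (which in practice would be a perceptual score or user study), and the detection metric is a stub rather than a real AP computation.

```python
# Skeleton of the settling experiment: score each ISP variant on a held-out
# low-light set by (a) an image-quality proxy and (b) a frozen detector's
# accuracy. Pipeline/detector handles and the stub metric are hypothetical.
import torch

def psnr(pred, target, eps=1e-8):
    # Stand-in for a human visual quality rating.
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(mse + eps)

def detection_score(detector, rgb, boxes):
    # Placeholder: a real run would compute average precision of the frozen
    # detector's predictions against the ground-truth boxes.
    with torch.no_grad():
        preds = detector(rgb)
    return float(preds.numel() > 0)  # stub value, not a real AP

def evaluate(pipeline, detector, dataset):
    quality, accuracy = [], []
    for raw, ref_rgb, boxes in dataset:  # held-out low-light samples
        rgb = pipeline(raw)
        quality.append(psnr(rgb, ref_rgb).item())
        accuracy.append(detection_score(detector, rgb, boxes))
    return sum(quality) / len(quality), sum(accuracy) / len(accuracy)

# The unified claim fails if UniISP loses to the stronger baseline on either axis:
# q_u, a_u = evaluate(uniisp, frozen_detector, lowlight_test)
# q_t, a_t = evaluate(traditional_isp, frozen_detector, lowlight_test)
# q_m, a_m = evaluate(minimal_isp, frozen_detector, lowlight_test)
# unified_benefit_fails = q_u < max(q_t, q_m) or a_u < max(a_t, a_m)
```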

Figures

Figures reproduced from arXiv: 2605.07359 by Bo Zhang, Hanxi Li, Li Zeng, Yao Cheng.

Figure 1: Methods of using RAW data for object detection. (a) Traditional two-stage approaches …
Figure 2: Overall architecture of the UniISP model and its constituent modules. The paper first details the key modules designed to enhance human visual quality (section 3.1), then introduces the components incorporated to improve perceptual performance for downstream tasks (section 3.2), and finally elaborates on the adaptive training framework designed for the joint optimization of t…
Figure 2: (a) Overall framework of UniISP. Through supervised learning with RGB reference and …
Figure 3: Joint training with GCM. The well-aligned supervisory target sRGB image y_w is synthesized through Global Color Mapping (GCM) and an optical flow consistency mask m to enforce spatiotemporal alignment constraints during training. Since x and y are captured by different cameras, there is inevitably a spatial misalignment. Furthermore, the severe color discrepancies between x and y make the image alignment …
Figure 4: Visual results comparison of a typical scene in the ZRR dataset. Our method obtains better …
Figure 5: Visualization of object detection results on PASCAL RAW. Three rows represent dark, …
Figure 6: Visualization of semantic segmentation results on ADE20K RAW.
Figure 7: Visual comparison of RAW-to-RGB results on the ZRR dataset.
Figure 8: Cross-sensor generalization on the NOD-Nikon dataset. UniISP (Sony) trained on Sony data …
Figure 9: Visualization of object detection results on PASCAL RAW. Three rows represent dark, …
Figure 10: Experimental and visual comparisons under the real extremely dark dataset LOD.
Figure 11: Visualization of semantic segmentation results on ADE20K RAW. Three rows represent …
Original abstract

Compared to RGB images, raw sensor data provides a richer representation of information, which is crucial for accurate recognition, particularly under challenging conditions such as low-light environments. The traditional Image Signal Processing (ISP) pipeline generates visually pleasing RGB images for human perception through a series of steps, but some of these operations may adversely impact the information integrity by introducing compression and loss. Furthermore, in computer vision tasks that directly utilize raw camera data, most existing methods integrate minimal ISP processing with downstream networks, yet the resulting images are often difficult to visualize or do not align with human aesthetic preferences. This paper proposes UniISP, a novel ISP framework designed to simultaneously meet the requirements of both human visual perception and computer vision applications. By incorporating a carefully designed Hybrid Attention Module (HAM) and employing supervised learning, the proposed method ensures that the generated images are visually appealing. Additionally, a Feature Adapter module is introduced to effectively propagate informative features from the ISP stage to subsequent downstream networks. Extensive experiments demonstrate that our approach achieves state-of-the-art performance across various scenarios and multiple datasets, proving its generalizability and effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes UniISP, a unified ISP framework that processes raw sensor data into RGB images suitable for both human visual perception and downstream machine vision tasks. It introduces a Hybrid Attention Module (HAM) trained with supervised learning to ensure visual appeal, along with a Feature Adapter module to propagate informative features to subsequent networks. The central claim is that this approach achieves state-of-the-art performance across various scenarios and multiple datasets while avoiding the information loss typical of traditional ISP pipelines.

Significance. If the empirical results hold, the work could be significant for computer vision applications that rely on raw or minimally processed data, such as low-light recognition. By jointly optimizing for human aesthetics and machine-usable features via the HAM and Feature Adapter, it offers a practical alternative to either fully traditional ISP or minimal-ISP approaches that produce unappealing outputs. The multi-dataset evaluation, if substantiated, would support claims of generalizability.

major comments (2)
  1. Abstract: The claim that 'extensive experiments demonstrate that our approach achieves state-of-the-art performance across various scenarios and multiple datasets' is presented without any quantitative metrics, baseline comparisons, ablation results, or dataset specifications. This is load-bearing for the central empirical claim, as the soundness of the method and the absence of trade-offs between human visual quality and machine vision performance cannot be evaluated from the provided description alone.
  2. Method description (inferred from abstract): The assertion that the Feature Adapter 'effectively propagate[s] informative features' and that the overall framework avoids 'significant trade-offs' requires explicit experimental validation (e.g., downstream task accuracy with vs. without the adapter, or human vs. machine metrics on the same outputs). Without such controls, the weakest assumption—that simultaneous optimization is possible without degradation—remains untested in the visible text.
minor comments (1)
  1. Abstract: Consider adding one sentence specifying the downstream tasks (e.g., object detection, classification) and example datasets to make the SOTA claim more concrete for readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below, clarifying the content of the full manuscript while noting where revisions can strengthen the presentation.

Point-by-point responses
  1. Referee: Abstract: The claim that 'extensive experiments demonstrate that our approach achieves state-of-the-art performance across various scenarios and multiple datasets' is presented without any quantitative metrics, baseline comparisons, ablation results, or dataset specifications. This is load-bearing for the central empirical claim, as the soundness of the method and the absence of trade-offs between human visual quality and machine vision performance cannot be evaluated from the provided description alone.

    Authors: We agree that the abstract, as a concise summary, does not include specific numbers or dataset names. The full manuscript contains multiple tables and figures reporting quantitative SOTA comparisons, baseline results, ablation studies, and dataset details (including low-light and standard scenarios). To better support the central claim for readers who focus on the abstract, we will revise it to include one or two key quantitative highlights (e.g., accuracy gains and perceptual scores) while remaining within length limits. revision: yes

  2. Referee: Method description (inferred from abstract): The assertion that the Feature Adapter 'effectively propagate[s] informative features' and that the overall framework avoids 'significant trade-offs' requires explicit experimental validation (e.g., downstream task accuracy with vs. without the adapter, or human vs. machine metrics on the same outputs). Without such controls, the weakest assumption—that simultaneous optimization is possible without degradation—remains untested in the visible text.

    Authors: The full manuscript includes dedicated ablation experiments that directly compare downstream task performance (e.g., recognition accuracy) with and without the Feature Adapter, as well as joint reporting of human perceptual quality metrics and machine vision accuracy on identical outputs. These results demonstrate that the adapter improves feature propagation without introducing measurable degradation in either domain. The experiments section already contains the requested controls; we can add a dedicated paragraph or table footnote if the referee believes the connection needs to be made more explicit. revision: partial
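
For concreteness, the control the referee requests could look like the toy harness below, which scores a downstream head on adapter features versus on raw ISP features with the adapter bypassed. All interfaces and the stub metric are assumptions, not the paper's experimental code.

```python
# Toy ablation harness: run a downstream head on adapter features versus on
# raw ISP features (adapter bypassed). Interfaces and the metric are
# assumptions carried over from the earlier sketches, not the paper's code.
import torch
import torch.nn as nn

encode = nn.Conv2d(4, 64, 3, padding=1)   # stand-in ISP encoder
adapter = nn.Conv2d(64, 256, 1)           # stand-in Feature Adapter
head_with = nn.Conv2d(256, 10, 1)         # downstream head (with adapter)
head_without = nn.Conv2d(64, 10, 1)       # downstream head (adapter bypassed)

def score(head, feats, labels):
    # Placeholder metric: mean per-pixel classification accuracy.
    pred = head(feats).argmax(1)
    return (pred == labels).float().mean().item()

raw = torch.rand(2, 4, 32, 32)
labels = torch.randint(0, 10, (2, 32, 32))
feats = encode(raw)
acc_with = score(head_with, adapter(feats), labels)
acc_without = score(head_without, feats, labels)
# The claim survives only if acc_with >= acc_without on real data.
```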

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces UniISP as an architectural proposal combining a Hybrid Attention Module with supervised learning and a Feature Adapter module. Its claims rest on empirical results from training and evaluation on multiple datasets rather than any closed-form derivation, parameter fitting that is then relabeled as prediction, or load-bearing self-citation chains. No equations or definitions are shown that reduce the output to the input by construction, and the framework is presented as a new supervised pipeline whose performance is assessed externally via experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim depends on the effectiveness of newly introduced modules and the assumption that supervised learning can jointly optimize dual objectives; no specific free parameters are named in the abstract.

axioms (1)
  • domain assumption Supervised learning on paired data can balance human visual quality and machine-usable feature preservation in ISP pipelines
    The method relies on this to train the framework for both goals simultaneously.
invented entities (2)
  • Hybrid Attention Module (HAM) no independent evidence
    purpose: Ensure generated images are visually appealing while processing raw data
    New module introduced to handle attention for visual quality in the ISP stage.
  • Feature Adapter module no independent evidence
    purpose: Propagate informative features from the ISP stage to downstream computer vision networks
    New module for bridging ISP output to machine vision tasks.
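
One way such an adapter could bridge ISP-stage features into a detector that expects a multi-scale pyramid (following the FPN convention, reference [28]) is sketched below; the 1×1 projections and pooled scales are assumptions about the design, not the paper's module.

```python
# Hypothetical Feature Adapter: project single-scale ISP features into a
# multi-scale pyramid a detector backbone/FPN could consume (cf. ref [28]).
# The projection style and scale count are assumptions, not the paper's design.
import torch
import torch.nn as nn

class FeatureAdapter(nn.Module):
    def __init__(self, in_ch=64, out_ch=256, num_levels=3):
        super().__init__()
        self.proj = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1) for _ in range(num_levels)]
        )
        self.pool = nn.AvgPool2d(2)  # halve resolution between levels

    def forward(self, isp_feats):
        levels, x = [], isp_feats
        for proj in self.proj:
            levels.append(proj(x))  # channel-match each scale to the detector
            x = self.pool(x)
        return levels  # strides {1, 2, 4} relative to the input features

adapter = FeatureAdapter()
pyramid = adapter(torch.rand(1, 64, 64, 64))
print([p.shape[-1] for p in pyramid])  # [64, 32, 16]
```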

pith-pipeline@v0.9.0 · 5490 in / 1275 out tokens · 56891 ms · 2026-05-11T02:34:26.716741+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 2 internal anchors

  1. [1]

    Reconfiguring the imaging pipeline for computer vision

Mark Buckler, Suren Jayasuriya, and Adrian Sampson. Reconfiguring the imaging pipeline for computer vision. In Proceedings of the IEEE International Conference on Computer Vision, pages 975–984, 2017

  2. [2]

    Learning to see in the dark

Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3291–3300, 2018

  3. [3]

    MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019

  4. [4]

Frequency-aware feature fusion for dense image prediction

Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, and Gao Huang. Frequency-aware feature fusion for dense image prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  5. [5]

MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark

MMSegmentation Contributors. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark, 2020

  6. [6]

    Raw-adapter: Adapting pre-trained visual model to camera raw images

Ziteng Cui and Tatsuya Harada. Raw-adapter: Adapting pre-trained visual model to camera raw images. In European Conference on Computer Vision, pages 37–56. Springer, 2025

  7. [7]

Multitask AET with orthogonal tangent regularity for dark object detection

Ziteng Cui, Guo-Jun Qi, Lin Gu, Shaodi You, Zenghui Zhang, and Tatsuya Harada. Multitask AET with orthogonal tangent regularity for dark object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2553–2562, 2021

  8. [8]

AWNet: Attentive wavelet network for image ISP

Linhui Dai, Xiaohong Liu, Chengqi Li, and Jun Chen. AWNet: Attentive wavelet network for image ISP. In Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16, pages 185–201. Springer, 2020

  9. [9]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009

  10. [10]

Dirty pixels: Towards end-to-end image processing and perception

Steven Diamond, Vincent Sitzmann, Frank Julca-Aguilar, Stephen Boyd, Gordon Wetzstein, and Felix Heide. Dirty pixels: Towards end-to-end image processing and perception. ACM Transactions on Graphics (TOG), 40(3):1–15, 2021

  11. [11]

    Learning degradation-independent representations for camera isp pipelines

Yanhui Guo, Fangzhou Luo, and Xiaolin Wu. Learning degradation-independent representations for camera isp pipelines. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25774–25783, 2024

  12. [12]

    Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  13. [13]

    Enhancing raw-to-srgb with decoupled style structure in fourier domain

Xuanhua He, Tao Hu, Guoli Wang, Zejin Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chengjun Xie, et al. Enhancing raw-to-srgb with decoupled style structure in fourier domain. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2130–2138, 2024

  14. [14]

    Crafting object detection in very low light

Yang Hong, Kaixuan Wei, Linwei Chen, and Ying Fu. Crafting object detection in very low light. In BMVC, volume 1, page 3, 2021

  15. [15]

    Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141, 2018

  16. [16]

    Aim 2019 challenge on raw to rgb mapping: Methods and results

Andrey Ignatov, Radu Timofte, Sung-Jea Ko, Seung-Wook Kim, Kwang-Hyun Uhm, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, Kangfu Mei, Juncheng Li, et al. Aim 2019 challenge on raw to rgb mapping: Methods and results. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 3584–3590. IEEE, 2019

  17. [17]

    Aim 2020 challenge on learned image signal processing pipeline

Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, Wangmeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, et al. Aim 2020 challenge on learned image signal processing pipeline. In Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16, pages 152–170. Springer, 2020

  18. [18]

    Replacing mobile camera isp with a single deep learning model

Andrey Ignatov, Luc Van Gool, and Radu Timofte. Replacing mobile camera isp with a single deep learning model. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 536–537, 2020

  19. [19]

Fine-grained fashion representation learning by online deep clustering

Yang Jiao, Ning Xie, Yan Gao, Chien-Chih Wang, and Yi Sun. Fine-grained fashion representation learning by online deep clustering. In European conference on computer vision, pages 19–35. Springer, 2022

  20. [20]

Learning attribute and class-specific representation duet for fine-grained fashion analysis

Yang Jiao, Yan Gao, Jingjing Meng, Jin Shang, and Yi Sun. Learning attribute and class-specific representation duet for fine-grained fashion analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11050–11059, 2023

  21. [21]

DNF: Decouple and feedback network for seeing in the dark

Xin Jin, Ling-Hao Han, Zhen Li, Chun-Le Guo, Zhi Chai, and Chongyi Li. DNF: Decouple and feedback network for seeing in the dark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18135–18144, 2023

  22. [22]

    A software platform for manipulating the camera imaging pipeline

Hakki Can Karaimer and Michael S Brown. A software platform for manipulating the camera imaging pipeline. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 429–444. Springer, 2016

  23. [23]

ParamISP: Learned forward and inverse ISPs using camera parameters

Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, and Sunghyun Cho. ParamISP: Learned forward and inverse ISPs using camera parameters. arXiv preprint arXiv:2312.13313, 2023

  24. [24]

    Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  25. [25]

    Segment anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023

  26. [26]

    Polarized color image denoising

Zhuoxiao Li, Haiyang Jiang, Mingdeng Cao, and Yinqiang Zheng. Polarized color image denoising. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9873–9882. IEEE, 2023

  27. [27]

Focal loss for dense object detection

T. Lin. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2017

  28. [28]

    Feature pyramid networks for object detection

Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017

  29. [29]

Multi-level wavelet convolutional neural networks

Pengju Liu, Hongzhi Zhang, Wei Lian, and Wangmeng Zuo. Multi-level wavelet convolutional neural networks. IEEE Access, 7:74973–74985, 2019

  30. [30]

    Least squares generative adversarial networks

Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2794–2802, 2017

  31. [31]

    Dancing under the stars: video denoising in starlight

Kristina Monakhova, Stephan R Richter, Laura Waller, and Vladlen Koltun. Dancing under the stars: video denoising in starlight. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16241–16251, 2022

  32. [32]

GenISP: Neural ISP for low-light machine cognition

Igor Morawski, Yu-An Chen, Yu-Sheng Lin, Shusil Dangi, Kai He, and Winston H Hsu. GenISP: Neural ISP for low-light machine cognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 630–639, 2022

  33. [33]

    Hardware-in-the-loop end-to-end optimization of camera image processing pipelines

    Ali Mosleh, Avinash Sharma, Emmanuel Onzon, Fahim Mannan, Nicolas Robidoux, and Felix Heide. Hardware-in-the-loop end-to-end optimization of camera image processing pipelines. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7529–7538, 2020

  34. [34]

PASCALRAW: Raw image database for object detection

Alex Omid-Zohoor, David Ta, and Boris Murmann. PASCALRAW: Raw image database for object detection. Stanford Digital Repository, 2014

  35. [35]

    Attention-aware learning for hyperparameter prediction in image processing pipelines

    Haina Qin, Longfei Han, Juan Wang, Congxuan Zhang, Yanwei Li, Bing Li, and Weiming Hu. Attention-aware learning for hyperparameter prediction in image processing pipelines. In European Conference on Computer Vision, pages 271–287. Springer, 2022

  36. [36]

    Learning to exploit the sequence-specific prior knowledge for image processing pipelines optimization

Haina Qin, Longfei Han, Weihua Xiong, Juan Wang, Wentao Ma, Bing Li, and Weiming Hu. Learning to exploit the sequence-specific prior knowledge for image processing pipelines optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22314–22323, 2023

  37. [37]

Color image processing pipeline

Rajeev Ramanath, Wesley E Snyder, Youngjun Yoo, and Mark S Drew. Color image processing pipeline. IEEE Signal Processing Magazine, 22(1):34–43, 2005

  38. [38]

    YOLOv3: An Incremental Improvement

Joseph Redmon. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018

  39. [39]

    U-net: Convolutional networks for biomedical image segmentation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention – MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015

  40. [40]

An overview of gradient descent optimization algorithms

Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016

  41. [41]

    Transform your smartphone into a dslr camera: Learning the isp in the wild

Ardhendu Shekhar Tripathi, Martin Danelljan, Samarth Shukla, Radu Timofte, and Luc Van Gool. Transform your smartphone into a dslr camera: Learning the isp in the wild. In European Conference on Computer Vision, pages 625–641. Springer, 2022

  42. [42]

PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume

Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8934–8943, 2018

  43. [43]

Sparse R-CNN: End-to-end object detection with learnable proposals

Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, et al. Sparse R-CNN: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14454–14463, 2021

  44. [44]

AdaptiveISP: Learning an adaptive image signal processor for object detection

Yujin Wang, Tianyi Xu, Zhang Fan, Tianfan Xue, and Jinwei Gu. AdaptiveISP: Learning an adaptive image signal processor for object detection. Advances in Neural Information Processing Systems, 37:112598–112623, 2024

  45. [45]

Multiscale structural similarity for image quality assessment

Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. IEEE, 2003

  46. [46]

    A physics-based noise formation model for extreme low-light raw denoising

Kaixuan Wei, Ying Fu, Jiaolong Yang, and Hua Huang. A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2758–2767, 2020

  47. [47]

Physics-based noise modeling for extreme low-light photography

Kaixuan Wei, Ying Fu, Yinqiang Zheng, and Jiaolong Yang. Physics-based noise modeling for extreme low-light photography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):8520–8537, 2021

  48. [48]

VisionISP: Repurposing the image signal processor for computer vision applications

Chyuan-Tyng Wu, Leo F Isikdogan, Sushma Rao, Bhavin Nayak, Timo Gerasimow, Aleksandar Sutic, Liron Ain-Kedem, and Gilad Michael. VisionISP: Repurposing the image signal processor for computer vision applications. In 2019 IEEE International Conference on Image Processing (ICIP), pages 4624–4628. IEEE, 2019

  49. [49]

SegFormer: Simple and efficient design for semantic segmentation with transformers

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021

  50. [50]

    Invertible image signal processing

Yazhou Xing, Zian Qian, and Qifeng Chen. Invertible image signal processing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6287–6296, 2021

  51. [51]

DynamicISP: Dynamically controlled image signal processor for image recognition

Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, and Takeshi Ohashi. DynamicISP: Dynamically controlled image signal processor for image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12866–12876, 2023

  52. [52]

ReconfigISP: Reconfigurable camera image processing pipeline

Ke Yu, Zexian Li, Yue Peng, Chen Change Loy, and Jinwei Gu. ReconfigISP: Reconfigurable camera image processing pipeline. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4248–4257, 2021

  53. [53]

CycleISP: Real image restoration via improved data synthesis

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. CycleISP: Real image restoration via improved data synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2696–2705, 2020

  54. [54]

    Restormer: Efficient transformer for high-resolution image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

  55. [55]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

  56. [56]

    Learning raw-to-srgb mappings with inaccurately aligned supervision

Zhilu Zhang, Haolin Wang, Ming Liu, Ruohao Wang, Jiawei Zhang, and Wangmeng Zuo. Learning raw-to-srgb mappings with inaccurately aligned supervision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4348–4358, 2021

  57. [57]

    Scene parsing through ade20k dataset

Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 633–641, 2017

  58. [58]

    wall", "bed

Wei Zhou, Shengyu Gao, Ling Zhang, and Xin Lou. Histogram of oriented gradients feature extraction from raw Bayer pattern images. IEEE Transactions on Circuits and Systems II: Express Briefs, 67(5):946–950, 2020