Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs
Pith reviewed 2026-05-07 17:55 UTC · model grok-4.3
The pith
A 1.96M-parameter LiteDenoiseNet student model achieves 37.58 dB PSNR on full-resolution real image denoising benchmarks while running in 34-46 ms on mobile NPUs by leveraging NPU-compatible primitives and high-alpha knowledge distillation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The 1.96M-parameter student recovers 99.8% of the teacher's restoration quality via high-alpha knowledge distillation (alpha = 0.9), achieving a 21.2x parameter reduction while closing the PSNR gap from 1.63 dB to only 0.05 dB.
Load-bearing premise
That restricting the student to NPU-native primitives (3x3 convolutions, ReLU, nearest-neighbor upsampling) combined with progressive context expansion up to 1024x1024 crops will preserve generalization on real-world noisy images without significant quality loss outside the specific Mobile AI 2026 benchmarks.
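The high-alpha distillation objective referenced above can be sketched as a weighted sum of a teacher-matching term and a ground-truth term. The per-pixel MSE loss form below is an assumption for illustration; the abstract only specifies alpha = 0.9, not the loss function itself.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_out, teacher_out, ground_truth, alpha=0.9):
    """High-alpha knowledge distillation: with alpha = 0.9 the student is
    driven mostly toward the teacher's output, with a small residual term
    anchoring it to the clean target. The MSE choice is illustrative; the
    paper may use a different per-pixel loss."""
    return (alpha * mse(student_out, teacher_out)
            + (1 - alpha) * mse(student_out, ground_truth))

# Toy pixel values: student close to teacher, teacher close to ground truth.
student = [0.50, 0.52, 0.48]
teacher = [0.51, 0.53, 0.47]
truth   = [0.52, 0.54, 0.46]
loss = distillation_loss(student, teacher, truth)
```

With alpha = 0.9, nine tenths of the gradient signal comes from the teacher-matching term, which is what makes the teacher's choice of outputs, rather than the raw targets, the dominant supervision.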
Figures
Original abstract
While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. Our approach employs a high-capacity teacher to supervise a lightweight student network specifically designed to leverage the tiled-memory architectures of modern mobile SoCs. By prioritizing NPU-native primitives -- standard 3x3 convolutions, ReLU activations, and nearest-neighbor upsampling -- and employing a progressive context expansion strategy (up to 1024x1024 crops), the model achieves 37.66 dB PSNR / 0.9278 SSIM on the validation benchmark and 37.58 dB PSNR / 0.9098 SSIM on the held-out test benchmark at full resolution (2432x3200) in the Mobile AI 2026 challenge. Following the official challenge rules, the inference runtime is measured under a standardized Full HD (1088x1920) protocol, where it runs in 34.0 ms on the MediaTek Dimensity 9500 and 46.1 ms on the Qualcomm Snapdragon 8 Elite NPU. We further reveal an "Inference Inversion" effect, where strict adherence to NPU-compatible operations enables dedicated NPU execution up to 3.88x faster than the integrated mobile GPU. The 1.96M-parameter student recovers 99.8% of the teacher's restoration quality via high-alpha knowledge distillation (alpha = 0.9), achieving a 21.2x parameter reduction while closing the PSNR gap from 1.63 dB to only 0.05 dB. These results establish hardware-aware distillation as an effective strategy for unifying high-fidelity denoising with practical deployment across diverse mobile NPU architectures. The proposed lightweight student model (LiteDenoiseNet) and its training statistics are provided in the NN Dataset, available at https://github.com/ABrain-One/NN-Dataset.
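The abstract's operator constraint (standard 3x3 convolutions, ReLU, nearest-neighbor upsampling) makes the student's parameter budget easy to audit, since only the convolutions carry weights. A minimal sketch of that audit follows; the channel plan is a hypothetical placeholder, not the published LiteDenoiseNet configuration (which lives in the linked repository).

```python
def conv3x3_params(c_in, c_out):
    """A 3x3 convolution holds 3*3*c_in*c_out weights plus one bias
    per output channel."""
    return 9 * c_in * c_out + c_out

# Hypothetical encoder-decoder channel plan -- illustrative only,
# not the actual LiteDenoiseNet layout.
channels = [3, 32, 64, 128, 64, 32, 3]
total = sum(conv3x3_params(a, b) for a, b in zip(channels, channels[1:]))
# ReLU and nearest-neighbor upsampling contribute no parameters, so
# the model's budget is the convolution total alone.
```

Under this toy plan the stack holds 186,371 parameters; reaching the reported 1.96M requires wider or deeper stages, but the same one-line audit applies.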
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a hardware-algorithm co-design for real-world image denoising on mobile NPUs. A high-capacity teacher supervises a 1.96M-parameter student (LiteDenoiseNet) restricted to NPU-native primitives (3x3 convolutions, ReLU, nearest-neighbor upsampling) with progressive context expansion up to 1024x1024 crops. It reports 37.66 dB PSNR / 0.9278 SSIM on validation and 37.58 dB PSNR / 0.9098 SSIM on held-out test at full resolution (2432x3200) for the Mobile AI 2026 challenge, with runtimes of 34.0 ms (MediaTek Dimensity 9500) and 46.1 ms (Qualcomm Snapdragon 8 Elite) under standardized Full HD protocol. The student recovers 99.8% of teacher quality via alpha=0.9 knowledge distillation, closing the PSNR gap from 1.63 dB to 0.05 dB (21.2x parameter reduction), and demonstrates an 'Inference Inversion' effect with up to 3.88x NPU speedup over GPU. The model and training statistics are publicly released.
Significance. If the reported metrics hold under the stated protocols, the work is significant for practical deployment of high-fidelity denoising on diverse mobile NPUs. It provides concrete, falsifiable evidence via held-out test metrics, standardized runtime measurements, and public code/dataset release at https://github.com/ABrain-One/NN-Dataset, enabling direct verification of the 99.8% recovery claim and 21.2x reduction. The NPU-aware design turning operator constraints into a performance advantage (Inference Inversion) offers a reproducible template for hardware-aware distillation in computer vision.
major comments (2)
- §5 (Results): The central claim that the student recovers 99.8% of teacher quality and closes the PSNR gap from 1.63 dB to 0.05 dB requires the teacher's absolute PSNR/SSIM values to be stated explicitly alongside the student's, together with the precise formula used for the recovery percentage; without these, independent verification of the gap closure is not possible from the reported numbers alone.
- §4 (Experimental setup): The manuscript reports concrete PSNR/SSIM on validation and held-out test sets but does not specify the data splits, number of random seeds, or error bars; these details are load-bearing for assessing the statistical reliability of the 0.05 dB gap and the generalization of the progressive crop + NPU-primitive design beyond the Mobile AI 2026 benchmarks.
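To make the first objection concrete: one plausible reading of the recovery figure is recovery = student PSNR / teacher PSNR, with the teacher's test PSNR inferred as the student's value plus the remaining gap (37.58 + 0.05 dB). Both the formula and the inferred teacher value are assumptions, which is precisely why the report asks for them to be stated.

```python
student_psnr = 37.58                  # reported held-out test PSNR (dB)
gap_db = 0.05                         # reported remaining teacher-student gap
teacher_psnr = student_psnr + gap_db  # inferred; not stated in the abstract

recovery_pct = 100.0 * student_psnr / teacher_psnr
# This reading gives roughly 99.87%, close to but not exactly the quoted
# 99.8% -- evidence that the paper's exact formula needs to be explicit.
```

If the quoted 99.8% instead derives from validation PSNR, SSIM, or a truncation convention, the arithmetic changes, so the published number cannot be verified without the formula.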
minor comments (2)
- Abstract: The phrase 'following the official challenge rules' for runtime measurement should be expanded with a one-sentence reference to the exact benchmark dataset and resolution protocol used for the PSNR/SSIM figures.
- §3 (Method): The description of the progressive context expansion schedule would benefit from a small table listing the crop sizes and corresponding training epochs to improve reproducibility.
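The table the second minor comment requests could be as small as the sketch below. Only the 1024x1024 endpoint is stated in the abstract; the earlier crop sizes and all epoch counts here are hypothetical placeholders.

```python
# Hypothetical progressive-context-expansion schedule. Only the final
# 1024x1024 crop size appears in the abstract; earlier stages and epoch
# counts are illustrative placeholders.
schedule = [
    {"stage": 1, "crop": 256,  "epochs": None},   # epochs unspecified
    {"stage": 2, "crop": 512,  "epochs": None},
    {"stage": 3, "crop": 1024, "epochs": None},
]
crops = [s["crop"] for s in schedule]
assert all(b == 2 * a for a, b in zip(crops, crops[1:]))  # doubling, if that is the rule
```

Publishing even a three-row table of this shape, with the real crop sizes and epochs filled in, would settle the reproducibility concern.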
Axiom & Free-Parameter Ledger
free parameters (2)
- alpha
- crop_size_schedule
axioms (1)
- domain assumption: NPU-native primitives (3x3 conv, ReLU, nearest-neighbor upsampling) are sufficient to represent high-quality denoising mappings when supervised by a teacher.