GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Jixin Zhao; Lei Zhang; Lingchen Sun; Rongyuan Wu; Xiangtao Kong

arxiv: 2605.31039 · v2 · pith:UXY45WZQnew · submitted 2026-05-29 · 💻 cs.CV

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Xiangtao Kong , Jixin Zhao , Lingchen Sun , Rongyuan Wu , Lei Zhang This is my paper

Pith reviewed 2026-06-28 22:40 UTC · model grok-4.3

classification 💻 cs.CV

keywords image restorationpaired datasetreal-world degradationgenerative modelsmultimodal foundation modelsdata synthesisgeneralizationground truth generation

0 comments

The pith

Generative multimodal models can create reliable high-quality targets from real low-quality images to train better image restoration models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real-world image restoration lacks enough paired low-quality and high-quality training data, so models trained on synthetic pairs often fail to handle actual camera degradations. The paper demonstrates that certain multimodal foundation models can generate perceptually realistic and content-faithful high-quality versions of real low-quality inputs. A pipeline built around the strongest of these models produces GGT-100K, a dataset of 103,707 training pairs plus a 500-pair test set covering diverse scenes and degradations. Models trained or fine-tuned on this dataset show improved performance on real-world restoration benchmarks. The approach is especially helpful when adapting generative restoration models.

Core claim

The authors establish that Nano-Banana-2 with VLM-based adaptive prompting produces high-quality targets from real low-quality images that are sufficiently content-faithful and perceptually realistic to serve as ground truth. They build a multi-stage quality-control pipeline around this capability, construct the GGT-100K paired dataset, and show that training or fine-tuning a range of image restoration models on it yields consistent gains in real-world generalization, with the largest benefits appearing when fine-tuning generative models.

What carries the argument

The GGT synthesis pipeline that uses Nano-Banana-2 with VLM-based adaptive prompting followed by multi-stage quality control to turn real low-quality inputs into usable high-quality targets.

If this is right

A wide range of image restoration models achieve better generalization to real-world degradations after training on GGT-100K.
Finetuning generative models for image restoration receives particularly large gains from the new pairs.
Multimodal foundation models can function as practical tools for generating restoration-oriented training data.
The resulting dataset expands the set of usable training resources beyond expensive real paired captures or synthetic degradations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same generative-target approach could be applied to other low-level vision tasks that currently lack large paired real-world datasets.
Larger-scale versions of GGT-100K might further close the gap between synthetic and real training distributions.
The quality-control stages in the pipeline could be adapted to filter outputs from other generative models for similar data-generation tasks.

Load-bearing premise

The high-quality targets generated by Nano-Banana-2 with VLM-based adaptive prompting are sufficiently content-faithful and perceptually realistic to serve as effective ground truth for training image restoration models on real-world degradations.

What would settle it

If image restoration models trained on GGT-100K show no improvement or outright worse results on the paper's 500-pair real-world test set compared with models trained on prior synthetic or captured paired datasets, the central claim would be falsified.

read the original abstract

Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GGT-100K gives a workable route to large real-world IR pairs via MFM generation, but the faithfulness of those targets rests on indirect checks that leave room for generator bias.

read the letter

The paper's core move is to treat strong multimodal models as a source of clean targets for real degraded photos. They run nine MFMs through a head-to-head test on perceptual quality and content match, settle on Nano-Banana-2 with VLM-guided prompting, add staged filtering, and release 103k training pairs plus a 500-pair test set. That scale and the explicit comparison of generators are the parts that stand out.

The experiments then show consistent gains when several restoration models are trained or fine-tuned on the new pairs, with bigger lifts for generative IR approaches. The practical payoff is clear: real paired data is scarce, and this pipeline offers one way to expand it without new camera captures.

The weak link is exactly what the stress-test flags. No ground-truth clean images exist for the chosen real LQ inputs, so faithfulness is judged by VLM scores, proxy metrics on synthetic cases, and human review of obvious failures. That leaves open the chance that the generated targets carry model-specific priors or small semantic shifts. If test images are also produced the same way, reported generalization gains could partly reflect matching the generator rather than recovering the original scene. The paper would be stronger with more direct probes for this, such as downstream task consistency or cross-generator comparisons on held-out real data.

The work is aimed at people who train or fine-tune restoration models and need more diverse real-world pairs. It is coherent on its own terms and engages the literature on data scarcity, so it merits a full referee process even if the data-quality argument needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Generative Ground Truth (GGT) by leveraging multimodal foundation models (MFMs) to synthesize high-quality targets from real-world low-quality images. After systematically evaluating nine MFMs, it identifies Nano-Banana-2 with VLM-based adaptive prompting as superior for perceptual realism and content faithfulness, then applies a multi-stage quality-control pipeline to construct the GGT-100K dataset (103,707 training pairs plus a 500-pair test set) covering diverse scenes and real degradations. Experiments claim that training or finetuning a range of image restoration models on GGT-100K yields consistent gains in real-world generalization, with especially strong benefits for generative IR models.

Significance. If the generated targets are verifiably content-faithful, the work would address a core bottleneck in real-world image restoration by providing a scalable source of paired data without physical capture. The systematic MFM comparison and the reported empirical gains across multiple model families constitute a practical contribution; credit is due for the scale of the released dataset and the focus on finetuning generative restorers.

major comments (2)

[MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.
[Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.

minor comments (2)

[Abstract and Method] Model names such as 'Nano-Banana-2' and 'GPT-Image-2' appear non-standard; clarify whether they are pseudonyms, internal versions, or specific checkpoints to support reproducibility.
[Dataset Construction] The multi-stage quality-control procedure is described at a high level; additional quantitative thresholds or failure-mode statistics would help readers assess how subtle hallucinations are filtered.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The two major comments identify important assumptions underlying the GGT-100K construction and evaluation. We respond point-by-point below, indicating planned revisions where the concerns are valid.

read point-by-point responses

Referee: [MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.

Authors: We agree that content faithfulness for real-world LQ inputs can only be assessed indirectly, as no paired real HQ references exist. The systematic comparison on synthetic cases (where ground truth is available) and VLM proxy judgments provide supporting evidence for selecting Nano-Banana-2, but this does not constitute direct proof for real degradations. We will revise the manuscript to add an explicit limitations subsection discussing the indirect assessment and its implications for dataset validity. revision: yes
Referee: [Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.

Authors: This concern is valid: because the test set is produced by the same pipeline, observed gains could partly reflect adaptation to the specific MFM prior rather than broader restoration capability on unseen real degradations. We will revise the manuscript to explicitly acknowledge this limitation, reframe the generalization claims accordingly, and clarify that the reported results demonstrate performance on GGT-style pairs rather than fully independent real-world test distributions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset construction and evaluation study

full rationale

The paper is an empirical study that evaluates nine MFMs on perceptual and content metrics, selects Nano-Banana-2, applies multi-stage filtering to synthesize 103k LQ-HQ pairs, and reports downstream IR model improvements on real-world test data. No equations, predictions, or derivations are claimed; the central claim rests on experimental outcomes rather than any self-referential reduction or fitted parameter renamed as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear. The work is self-contained against external benchmarks (human/VLM judgments and IR model performance) with no reduction of results to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no access to any mathematical formulations, parameter fittings, or axiomatic assumptions used in the work.

pith-pipeline@v0.9.1-grok · 5853 in / 1163 out tokens · 34500 ms · 2026-06-28T22:40:17.857236+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

84 extracted references · 20 canonical work pages · 7 internal anchors

[1]

Foundir: Unleashing million-scale training data to advance foundation models for image restoration

Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, and Jinshan Pan. Foundir: Unleashing million-scale training data to advance foundation models for image restoration. InProceedings of the IEEE/CVF international conference on computer vision, pages 12626–12636, 2025

2025
[2]

Qwen-Image Technical Report

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng ming Yin, Shuai Bai, Xiao Xu, YileiChen,YuxiangChen,ZechengTang,ZekaiZhang,ZhengyiWang,AnYang,BowenYu,ChenCheng,Dayiheng Liu,DeqingLi,HangZhang,HaoMeng,HuWei,JingyuanNi,KaiChen,KuanCao,LiangPeng,LinQu,Minggang Wu, Peng Wang, Shuting Yu, Tingkun Wen, Wensen Feng, Xiaoxiao Xu, Yi ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

2017
[4]

Learning a deep convolutional network for image super-resolution

Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. InComputer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13, pages 184–199. Springer, 2014

2014
[5]

Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

2019
[6]

Rethinking coarse-to-fine approach in single image deblurring

Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. InProceedings of the IEEE/CVF international conference on computer vision, pages 4641–4650, 2021

2021
[7]

Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel

JianruiCai,HuiZeng,HongweiYong,ZishengCao,andLeiZhang. Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel. InProceedingsoftheIEEE/CVFInternationalConferenceon ComputerVision, pages 3086–3095, 2019

2019
[8]

Swinir: Image restoration using swin transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

2021
[9]

Multi-stage progressive image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021

2021
[10]

Simple baselines for image restoration

Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. InComputer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII, pages 17–33. Springer, 2022

2022
[11]

A comparative study of image restoration networks for general backbone network design

Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. InEuropean Conference on Computer Vision, pages 74–91. Springer, 2024

2024
[12]

All-In-OneImageRestorationforUnknown Corruption

BoyunLi,XiaoLiu,PengHu,ZhongqinWu,JianchengLv,andXiPeng. All-In-OneImageRestorationforUnknown Corruption. InIEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 2022

2022
[13]

Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

VaishnavPotlapalli,SyedWaqasZamir,SalmanKhan,andFahadShahbazKhan. Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

2023
[14]

Complexity experts are task-discriminative learners for any image restoration, 2024

Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, and Radu Timofte. Complexity experts are task-discriminative learners for any image restoration, 2024

2024
[15]

Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

ZiweiLuo,FredrikKGustafsson,ZhengZhao,JensSjölund,andThomasBSchön. Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

work page arXiv 2024
[16]

Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Ben Fei, Bo Dai, Wanli Ouyang, Yu Qiao, and Chao Dong. Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

work page arXiv 2023
[17]

Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

FanghuaYu, JinjinGu, ZheyuanLi, JinfanHu, XiangtaoKong, XintaoWang, JingwenHe, YuQiao, andChaoDong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25669–25680, 2024. Visual Computing Lab·The Hong Kong Polytechni...

2024
[18]

Seesr: Towards semantics- aware real-world image super-resolution

Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics- aware real-world image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25456–25467, 2024

2024
[19]

Designing a practical degradation model for deep blind image super-resolution

Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4791–4800, 2021

2021
[20]

Real-esrgan: Training real-world blind super-resolution with pure synthetic data

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. InProceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914, 2021

1905
[21]

Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

work page arXiv 2022
[22]

Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

2023
[23]

A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, and Chao Dong. A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

work page arXiv 2024
[24]

Component divide-and-conquer for real-world image super-resolution

Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pages 101–117. Springer, 2020

2020
[25]

Benchmarking denoising algorithms with real photographs

Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1586–1595, 2017

2017
[26]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

Gpt-image-1.5 model documentation

OpenAI. Gpt-image-1.5 model documentation. https://platform.openai.com/docs/models/gpt-image-1.5, 2025

2025
[28]

Gpt-image-2 model documentation

OpenAI. Gpt-image-2 model documentation. https://platform.openai.com/docs/models/gpt-image-2, 2025

2025
[29]

Black Forest Labs. Flux. https://blackforestlabs.ai/announcing-black-forest-labs/, 2024

2024
[30]

Restormer: Efficient transformer for high-resolution image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

2022
[31]

Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

JonathanHo,AjayJain,andPieterAbbeel. Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

2020
[32]

Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

Hongyang Wei, Shuaizheng Liu, Chun Yuan, and Lei Zhang. Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

work page arXiv 2025
[33]

Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

Weixiong Sun, Xiang Yin, and Chao Dong. Can nano banana 2 replace traditional image restoration models? an evaluation of its performance on image restoration tasks.arXiv preprint arXiv:2604.03061, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[34]

Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

Yufeng Yang, Xianfang Zeng, Zhangqi Jiang, Fukun Yin, Jianzhuang Liu, Wei Cheng, Shiyu Liu, Yuqi Peng, Gang YU, Shifeng Chen, et al. Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

work page arXiv 2026
[35]

Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, and Changsheng Li. Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

2021
[36]

Robustvideocontentalignmentandcompensation forrainremovalinacnnframework

JieChen, Cheen-HauTan, JunhuiHou,Lap-PuiChau, andHeLi. Robustvideocontentalignmentandcompensation forrainremovalinacnnframework. InIEEE/CVFConferenceonComputerVisionandPatternRecognition(CVPR), pages 6341–6349, 2018. doi: 10.1109/CVPR.2018.00658

work page doi:10.1109/cvpr.2018.00658 2018
[37]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InIEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009

2009
[38]

Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019

Yuen Peng Loh and Chee Seng Chan. Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019. doi: https://doi.org/10.1016/j.cviu.2018.10.010. Visual Computing Lab·The Hong Kong Polytechnic University 15 / 33

work page doi:10.1016/j.cviu.2018.10.010 2019
[39]

Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

Wenhan Yang, Ye Yuan, Wenqi Ren, Jiaying Liu, Walter J Scheirer, Zhangyang Wang, Taiheng Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, et al. Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

2020
[40]

ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding

Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. InProceedings of the IEEE/CVF International Conference on Computer Vision, October 2021

2021
[41]

Unsplash. Website. https://unsplash.com
[42]

Pexels. Website. https://www.pexels.com
[43]

Pixabay. Website. https://pixabay.com
[44]

Flickr. Website. https://www.flickr.com
[45]

Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

Super Intelligence Team, Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, et al. Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

work page arXiv 2026
[46]

FLUX.2: Frontier Visual Intelligence

Black Forest Labs. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2, 2025

2025
[47]

Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration

KlingTeamandMiraclePlus. Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration. https://klingai.com, 2025

2025
[48]

Seedream 4.0-5.0 tutorial

ByteDance. Seedream 4.0-5.0 tutorial. https://docs.byteplus.com/zh-CN/docs/ModelArk/1824121, 2025

work page arXiv 2025
[49]

ChatGPT GPT-5.4 Release Notes

OpenAI. ChatGPT GPT-5.4 Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-notes, 2026

work page arXiv 2026
[50]

Ntire 2017 challenge on single image super-resolution: Dataset and study

Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 126–135, 2017

2017
[51]

A high-quality denoising dataset for smartphone cameras

Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1692–1700, 2018

2018
[52]

FaithDiff: Unleashing diffusion priors for faithful image super-resolution

Junyang Chen, Jinshan Pan, and Jiangxin Dong. FaithDiff: Unleashing diffusion priors for faithful image super-resolution. InIEEE Conference on Computer Vision and Pattern Recognition, 2025

2025
[53]

Jarvisir: Elevating autonomous driving perception with intelligent image restoration

Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Wen Kairun, Yeying Jin, Wenbo Li, and Xinghao Ding. Jarvisir: Elevating autonomous driving perception with intelligent image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2025
[54]

Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

2004
[55]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

2018
[56]

Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

2020
[57]

A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

Lin Zhang, Lei Zhang, and Alan C Bovik. A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

2015
[58]

Musiq: Multi-scaleimagequalitytransformer

JunjieKe,QifeiWang,YilinWang, PeymanMilanfar, andFengYang. Musiq: Multi-scaleimagequalitytransformer. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021

2021
[59]

Maniqa: Multi-dimension attention network for no-reference image quality assessment

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022

2022
[60]

Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

2024
[61]

Toward generalized image quality assessment: Relaxing the perfect reference quality assumption

Du Chen, Tianhe Wu, Kede Ma, and Lei Zhang. Toward generalized image quality assessment: Relaxing the perfect reference quality assumption. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12742–12752, 2025. Visual Computing Lab·The Hong Kong Polytechnic University 16 / 33

2025
[62]

LoRA: Low-Rank Adaptation of Large Language Models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models.arXiv preprint arXiv:2106.09685, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[63]

Pytorch: An imperative style, high-performance deep learning library

AdamPaszke,SamGross,FranciscoMassa,AdamLerer,JamesBradbury,GregoryChanan,TrevorKilleen,Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

2019
[64]

Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults

Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults. InProceedingsoftheIEEEconferenceoncomputervisionandpattern recognition workshops, pages 114–125, 2017

2017
[65]

LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

XiangChen,HaoLi,JiangxinDong,JinshanPan,XinLi,XinHe,NaiweiChen,ShengyuanLi,FengningLiu,Haoyi Lv, et al. Lovif 2026 challenge on real-world all-in-one image restoration: Methods and results.arXiv preprint arXiv:2604.19445, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[66]

Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms

JaesungRim, HaeyunLee, JucheolWon, andSunghyunCho. Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms. InEuropean conference on computer vision, pages 184–201. Springer, 2020

2020
[67]

Real-world Noisy Image Denoising: A New Benchmark

J Xu, H Li, Z Liang, D Zhang, and L Zhang. Real-world noisy image denoising: A new benchmark. arxiv 2018. arXiv preprint arXiv:1804.02603

work page internal anchor Pith review Pith/arXiv arXiv 2018
[68]

Deep Retinex Decomposition for Low-Light Enhancement

Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[69]

Embedding fourier for ultra-high-definition low-light image enhancement

Chongyi Li, Chun-Le Guo, Man Zhou, Zhexin Liang, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Embedding fourier for ultra-high-definition low-light image enhancement. InICLR, 2023

2023
[70]

Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration

Qiyuan Guan, Qianfeng Yang, Xiang Chen, Tianyu Song, Guiyue Jin, and Jiyu Jin. Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration. InProceedings of the 33rd ACM international conference on multimedia, pages 12607–12613, 2025

2025
[71]

Deep joint rain detection andremovalfromasingleimage

Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection andremovalfromasingleimage. InProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition, pages 1357–1366, 2017

2017
[72]

Density-aware single image de-raining using a multi-stream dense network

He Zhang and Vishal M Patel. Density-aware single image de-raining using a multi-stream dense network. InCVPR, 2018

2018
[73]

Removing raindrops and rain streaks in one go

Ruijie Quan, Xin Yu, Yuanzhi Liang, and Yi Yang. Removing raindrops and rain streaks in one go. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9147–9156, 2021

2021
[74]

do not change the image content

Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, and Dacheng Tao. Toward real-world single image deraining: A new benchmark and beyond.arXiv preprint arXiv:2206.05514, 2022. Visual Computing Lab·The Hong Kong Polytechnic University 17 / 33 Appendix In this appendix, we provide the following materials: •A.More details of source image collection f...

work page arXiv 2022
[75]

Scene content and depth structure (foreground/background, sky)
[76]

Haze characteristics (global veil, local dense haze, low contrast, color shift, bright-airlight effect)
[77]

Then output one detailed English restoration prompt

Severity and where haze is strongest. Then output one detailed English restoration prompt. It must: - Set fidelity as the first priority: preserve scene content, geometry, and structure. - In all regions where content is identifiable (including mid/background elements that are still visible), aim for the best possible haze removal and visibility recovery ...
[78]

FIRST IMAGE = LQ (low-quality degraded input)
[79]

SECOND IMAGE = HQ (restored output) EVALUATE 5 DIMENSIONS IN FULL DETAIL:
[80]

RESTORATION QUALITY (0-100)

Showing first 80 references.

[1] [1]

Foundir: Unleashing million-scale training data to advance foundation models for image restoration

Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, and Jinshan Pan. Foundir: Unleashing million-scale training data to advance foundation models for image restoration. InProceedings of the IEEE/CVF international conference on computer vision, pages 12626–12636, 2025

2025

[2] [2]

Qwen-Image Technical Report

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng ming Yin, Shuai Bai, Xiao Xu, YileiChen,YuxiangChen,ZechengTang,ZekaiZhang,ZhengyiWang,AnYang,BowenYu,ChenCheng,Dayiheng Liu,DeqingLi,HangZhang,HaoMeng,HuWei,JingyuanNi,KaiChen,KuanCao,LiangPeng,LinQu,Minggang Wu, Peng Wang, Shuting Yu, Tingkun Wen, Wensen Feng, Xiaoxiao Xu, Yi ...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

2017

[4] [4]

Learning a deep convolutional network for image super-resolution

Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. InComputer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13, pages 184–199. Springer, 2014

2014

[5] [5]

Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

2019

[6] [6]

Rethinking coarse-to-fine approach in single image deblurring

Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. InProceedings of the IEEE/CVF international conference on computer vision, pages 4641–4650, 2021

2021

[7] [7]

Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel

JianruiCai,HuiZeng,HongweiYong,ZishengCao,andLeiZhang. Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel. InProceedingsoftheIEEE/CVFInternationalConferenceon ComputerVision, pages 3086–3095, 2019

2019

[8] [8]

Swinir: Image restoration using swin transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

2021

[9] [9]

Multi-stage progressive image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021

2021

[10] [10]

Simple baselines for image restoration

Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. InComputer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII, pages 17–33. Springer, 2022

2022

[11] [11]

A comparative study of image restoration networks for general backbone network design

Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. InEuropean Conference on Computer Vision, pages 74–91. Springer, 2024

2024

[12] [12]

All-In-OneImageRestorationforUnknown Corruption

BoyunLi,XiaoLiu,PengHu,ZhongqinWu,JianchengLv,andXiPeng. All-In-OneImageRestorationforUnknown Corruption. InIEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 2022

2022

[13] [13]

Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

VaishnavPotlapalli,SyedWaqasZamir,SalmanKhan,andFahadShahbazKhan. Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

2023

[14] [14]

Complexity experts are task-discriminative learners for any image restoration, 2024

Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, and Radu Timofte. Complexity experts are task-discriminative learners for any image restoration, 2024

2024

[15] [15]

Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

ZiweiLuo,FredrikKGustafsson,ZhengZhao,JensSjölund,andThomasBSchön. Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

work page arXiv 2024

[16] [16]

Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Ben Fei, Bo Dai, Wanli Ouyang, Yu Qiao, and Chao Dong. Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

work page arXiv 2023

[17] [17]

Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

FanghuaYu, JinjinGu, ZheyuanLi, JinfanHu, XiangtaoKong, XintaoWang, JingwenHe, YuQiao, andChaoDong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25669–25680, 2024. Visual Computing Lab·The Hong Kong Polytechni...

2024

[18] [18]

Seesr: Towards semantics- aware real-world image super-resolution

Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics- aware real-world image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25456–25467, 2024

2024

[19] [19]

Designing a practical degradation model for deep blind image super-resolution

Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4791–4800, 2021

2021

[20] [20]

Real-esrgan: Training real-world blind super-resolution with pure synthetic data

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. InProceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914, 2021

1905

[21] [21]

Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

work page arXiv 2022

[22] [22]

Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

2023

[23] [23]

A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, and Chao Dong. A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

work page arXiv 2024

[24] [24]

Component divide-and-conquer for real-world image super-resolution

Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pages 101–117. Springer, 2020

2020

[25] [25]

Benchmarking denoising algorithms with real photographs

Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1586–1595, 2017

2017

[26] [26]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[27] [27]

Gpt-image-1.5 model documentation

OpenAI. Gpt-image-1.5 model documentation. https://platform.openai.com/docs/models/gpt-image-1.5, 2025

2025

[28] [28]

Gpt-image-2 model documentation

OpenAI. Gpt-image-2 model documentation. https://platform.openai.com/docs/models/gpt-image-2, 2025

2025

[29] [29]

Black Forest Labs. Flux. https://blackforestlabs.ai/announcing-black-forest-labs/, 2024

2024

[30] [30]

Restormer: Efficient transformer for high-resolution image restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

2022

[31] [31]

Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

JonathanHo,AjayJain,andPieterAbbeel. Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

2020

[32] [32]

Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

Hongyang Wei, Shuaizheng Liu, Chun Yuan, and Lei Zhang. Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

work page arXiv 2025

[33] [33]

Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

Weixiong Sun, Xiang Yin, and Chao Dong. Can nano banana 2 replace traditional image restoration models? an evaluation of its performance on image restoration tasks.arXiv preprint arXiv:2604.03061, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[34] [34]

Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

Yufeng Yang, Xianfang Zeng, Zhangqi Jiang, Fukun Yin, Jianzhuang Liu, Wei Cheng, Shiyu Liu, Yuqi Peng, Gang YU, Shifeng Chen, et al. Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

work page arXiv 2026

[35] [35]

Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, and Changsheng Li. Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

2021

[36] [36]

Robustvideocontentalignmentandcompensation forrainremovalinacnnframework

JieChen, Cheen-HauTan, JunhuiHou,Lap-PuiChau, andHeLi. Robustvideocontentalignmentandcompensation forrainremovalinacnnframework. InIEEE/CVFConferenceonComputerVisionandPatternRecognition(CVPR), pages 6341–6349, 2018. doi: 10.1109/CVPR.2018.00658

work page doi:10.1109/cvpr.2018.00658 2018

[37] [37]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InIEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009

2009

[38] [38]

Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019

Yuen Peng Loh and Chee Seng Chan. Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019. doi: https://doi.org/10.1016/j.cviu.2018.10.010. Visual Computing Lab·The Hong Kong Polytechnic University 15 / 33

work page doi:10.1016/j.cviu.2018.10.010 2019

[39] [39]

Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

Wenhan Yang, Ye Yuan, Wenqi Ren, Jiaying Liu, Walter J Scheirer, Zhangyang Wang, Taiheng Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, et al. Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

2020

[40] [40]

ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding

Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. InProceedings of the IEEE/CVF International Conference on Computer Vision, October 2021

2021

[41] [41]

Unsplash. Website. https://unsplash.com

[42] [42]

Pexels. Website. https://www.pexels.com

[43] [43]

Pixabay. Website. https://pixabay.com

[44] [44]

Flickr. Website. https://www.flickr.com

[45] [45]

Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

Super Intelligence Team, Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, et al. Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

work page arXiv 2026

[46] [46]

FLUX.2: Frontier Visual Intelligence

Black Forest Labs. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2, 2025

2025

[47] [47]

Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration

KlingTeamandMiraclePlus. Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration. https://klingai.com, 2025

2025

[48] [48]

Seedream 4.0-5.0 tutorial

ByteDance. Seedream 4.0-5.0 tutorial. https://docs.byteplus.com/zh-CN/docs/ModelArk/1824121, 2025

work page arXiv 2025

[49] [49]

ChatGPT GPT-5.4 Release Notes

OpenAI. ChatGPT GPT-5.4 Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-notes, 2026

work page arXiv 2026

[50] [50]

Ntire 2017 challenge on single image super-resolution: Dataset and study

Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 126–135, 2017

2017

[51] [51]

A high-quality denoising dataset for smartphone cameras

Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1692–1700, 2018

2018

[52] [52]

FaithDiff: Unleashing diffusion priors for faithful image super-resolution

Junyang Chen, Jinshan Pan, and Jiangxin Dong. FaithDiff: Unleashing diffusion priors for faithful image super-resolution. InIEEE Conference on Computer Vision and Pattern Recognition, 2025

2025

[53] [53]

Jarvisir: Elevating autonomous driving perception with intelligent image restoration

Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Wen Kairun, Yeying Jin, Wenbo Li, and Xinghao Ding. Jarvisir: Elevating autonomous driving perception with intelligent image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2025

[54] [54]

Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

2004

[55] [55]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

2018

[56] [56]

Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

2020

[57] [57]

A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

Lin Zhang, Lei Zhang, and Alan C Bovik. A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

2015

[58] [58]

Musiq: Multi-scaleimagequalitytransformer

JunjieKe,QifeiWang,YilinWang, PeymanMilanfar, andFengYang. Musiq: Multi-scaleimagequalitytransformer. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021

2021

[59] [59]

Maniqa: Multi-dimension attention network for no-reference image quality assessment

Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022

2022

[60] [60]

Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

2024

[61] [61]

Toward generalized image quality assessment: Relaxing the perfect reference quality assumption

Du Chen, Tianhe Wu, Kede Ma, and Lei Zhang. Toward generalized image quality assessment: Relaxing the perfect reference quality assumption. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12742–12752, 2025. Visual Computing Lab·The Hong Kong Polytechnic University 16 / 33

2025

[62] [62]

LoRA: Low-Rank Adaptation of Large Language Models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models.arXiv preprint arXiv:2106.09685, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[63] [63]

Pytorch: An imperative style, high-performance deep learning library

AdamPaszke,SamGross,FranciscoMassa,AdamLerer,JamesBradbury,GregoryChanan,TrevorKilleen,Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

2019

[64] [64]

Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults

Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults. InProceedingsoftheIEEEconferenceoncomputervisionandpattern recognition workshops, pages 114–125, 2017

2017

[65] [65]

LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

XiangChen,HaoLi,JiangxinDong,JinshanPan,XinLi,XinHe,NaiweiChen,ShengyuanLi,FengningLiu,Haoyi Lv, et al. Lovif 2026 challenge on real-world all-in-one image restoration: Methods and results.arXiv preprint arXiv:2604.19445, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[66] [66]

Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms

JaesungRim, HaeyunLee, JucheolWon, andSunghyunCho. Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms. InEuropean conference on computer vision, pages 184–201. Springer, 2020

2020

[67] [67]

Real-world Noisy Image Denoising: A New Benchmark

J Xu, H Li, Z Liang, D Zhang, and L Zhang. Real-world noisy image denoising: A new benchmark. arxiv 2018. arXiv preprint arXiv:1804.02603

work page internal anchor Pith review Pith/arXiv arXiv 2018

[68] [68]

Deep Retinex Decomposition for Low-Light Enhancement

Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[69] [69]

Embedding fourier for ultra-high-definition low-light image enhancement

Chongyi Li, Chun-Le Guo, Man Zhou, Zhexin Liang, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Embedding fourier for ultra-high-definition low-light image enhancement. InICLR, 2023

2023

[70] [70]

Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration

Qiyuan Guan, Qianfeng Yang, Xiang Chen, Tianyu Song, Guiyue Jin, and Jiyu Jin. Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration. InProceedings of the 33rd ACM international conference on multimedia, pages 12607–12613, 2025

2025

[71] [71]

Deep joint rain detection andremovalfromasingleimage

Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection andremovalfromasingleimage. InProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition, pages 1357–1366, 2017

2017

[72] [72]

Density-aware single image de-raining using a multi-stream dense network

He Zhang and Vishal M Patel. Density-aware single image de-raining using a multi-stream dense network. InCVPR, 2018

2018

[73] [73]

Removing raindrops and rain streaks in one go

Ruijie Quan, Xin Yu, Yuanzhi Liang, and Yi Yang. Removing raindrops and rain streaks in one go. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9147–9156, 2021

2021

[74] [74]

do not change the image content

Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, and Dacheng Tao. Toward real-world single image deraining: A new benchmark and beyond.arXiv preprint arXiv:2206.05514, 2022. Visual Computing Lab·The Hong Kong Polytechnic University 17 / 33 Appendix In this appendix, we provide the following materials: •A.More details of source image collection f...

work page arXiv 2022

[75] [75]

Scene content and depth structure (foreground/background, sky)

[76] [76]

Haze characteristics (global veil, local dense haze, low contrast, color shift, bright-airlight effect)

[77] [77]

Then output one detailed English restoration prompt

Severity and where haze is strongest. Then output one detailed English restoration prompt. It must: - Set fidelity as the first priority: preserve scene content, geometry, and structure. - In all regions where content is identifiable (including mid/background elements that are still visible), aim for the best possible haze removal and visibility recovery ...

[78] [78]

FIRST IMAGE = LQ (low-quality degraded input)

[79] [79]

SECOND IMAGE = HQ (restored output) EVALUATE 5 DIMENSIONS IN FULL DETAIL:

[80] [80]

RESTORATION QUALITY (0-100)