arxiv: 2605.31039 · v2 · pith:UXY45WZQnew · submitted 2026-05-29 · 💻 cs.CV

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Pith reviewed 2026-06-28 22:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restoration · paired dataset · real-world degradation · generative models · multimodal foundation models · data synthesis · generalization · ground truth generation

The pith

Generative multimodal models can create reliable high-quality targets from real low-quality images to train better image restoration models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real-world image restoration lacks enough paired low-quality and high-quality training data, so models trained on synthetic pairs often fail to handle actual camera degradations. The paper demonstrates that certain multimodal foundation models can generate perceptually realistic and content-faithful high-quality versions of real low-quality inputs. A pipeline built around the strongest of these models produces GGT-100K, a dataset of 103,707 training pairs plus a 500-pair test set covering diverse scenes and degradations. Models trained or fine-tuned on this dataset show improved performance on real-world restoration benchmarks. The approach is especially helpful when adapting generative restoration models.

Core claim

The authors establish that Nano-Banana-2 with VLM-based adaptive prompting produces high-quality targets from real low-quality images that are sufficiently content-faithful and perceptually realistic to serve as ground truth. They build a multi-stage quality-control pipeline around this capability, construct the GGT-100K paired dataset, and show that training or fine-tuning a range of image restoration models on it yields consistent gains in real-world generalization, with the largest benefits appearing when fine-tuning generative models.

What carries the argument

The GGT synthesis pipeline that uses Nano-Banana-2 with VLM-based adaptive prompting followed by multi-stage quality control to turn real low-quality inputs into usable high-quality targets.
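
A minimal sketch of how such a pipeline could be organized, assuming three pluggable pieces: a VLM that writes a per-image prompt, a generator that produces the HQ target, and a list of quality checks applied in order. The names `synthesize_pairs`, `make_prompt`, `generate`, and `checks` are hypothetical and do not come from the paper; the abstract does not describe the actual stages or their criteria.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Pair:
    lq: str  # identifier or path of the real low-quality input
    hq: str  # identifier or path of the generated high-quality target


def synthesize_pairs(
    lq_images: Iterable[str],
    make_prompt: Callable[[str], str],          # hypothetical: VLM-based adaptive prompting
    generate: Callable[[str, str], str],        # hypothetical: MFM call (LQ image, prompt) -> HQ
    checks: List[Callable[[str, str], bool]],   # hypothetical: quality-control stages
) -> List[Pair]:
    """Keep only the LQ-HQ pairs that pass every quality-control stage."""
    kept: List[Pair] = []
    for lq in lq_images:
        prompt = make_prompt(lq)       # prompt adapted to this image's degradation
        hq = generate(lq, prompt)      # candidate high-quality target
        # all() short-circuits, so later stages are skipped after a failure.
        if all(check(lq, hq) for check in checks):
            kept.append(Pair(lq=lq, hq=hq))
    return kept
```

Passing the stages as plain callables keeps the sketch independent of any particular model API; a real pipeline would substitute model calls and concrete filters for them.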

If this is right

  • A wide range of image restoration models achieve better generalization to real-world degradations after training on GGT-100K.
  • Fine-tuning generative models for image restoration benefits most from the new pairs.
  • Multimodal foundation models can function as practical tools for generating restoration-oriented training data.
  • The resulting dataset expands the set of usable training resources beyond expensive real paired captures or synthetic degradations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same generative-target approach could be applied to other low-level vision tasks that currently lack large paired real-world datasets.
  • Larger-scale versions of GGT-100K might further close the gap between synthetic and real training distributions.
  • The quality-control stages in the pipeline could be adapted to filter outputs from other generative models for similar data-generation tasks.

Load-bearing premise

The high-quality targets generated by Nano-Banana-2 with VLM-based adaptive prompting are sufficiently content-faithful and perceptually realistic to serve as effective ground truth for training image restoration models on real-world degradations.

What would settle it

If image restoration models trained on GGT-100K show no improvement or outright worse results on the paper's 500-pair real-world test set compared with models trained on prior synthetic or captured paired datasets, the central claim would be falsified.
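
The comparison described above can be sketched as a per-image score difference between a model trained on GGT-100K and a baseline. This is only an illustration: PSNR is used as a stand-in full-reference metric because its definition is standard, while the metrics the paper actually reports are not listed in the abstract. The helper names `psnr` and `mean_gain` are hypothetical.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB over flattened pixel values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_gain(ggt_scores: Sequence[float], baseline_scores: Sequence[float]) -> float:
    """Average per-image score difference (GGT-trained minus baseline)."""
    if len(ggt_scores) != len(baseline_scores) or len(ggt_scores) == 0:
        raise ValueError("score lists must be non-empty and the same length")
    diffs = [g - b for g, b in zip(ggt_scores, baseline_scores)]
    return sum(diffs) / len(diffs)
```

Under this framing, a mean gain at or below zero across the test pairs would be the outcome that counts against the central claim.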

Original abstract

Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Generative Ground Truth (GGT) by leveraging multimodal foundation models (MFMs) to synthesize high-quality targets from real-world low-quality images. After systematically evaluating nine MFMs, it identifies Nano-Banana-2 with VLM-based adaptive prompting as superior for perceptual realism and content faithfulness, then applies a multi-stage quality-control pipeline to construct the GGT-100K dataset (103,707 training pairs plus a 500-pair test set) covering diverse scenes and real degradations. Experiments claim that training or finetuning a range of image restoration models on GGT-100K yields consistent gains in real-world generalization, with especially strong benefits for generative IR models.

Significance. If the generated targets are verifiably content-faithful, the work would address a core bottleneck in real-world image restoration by providing a scalable source of paired data without physical capture. The systematic MFM comparison and the reported empirical gains across multiple model families constitute a practical contribution; credit is due for the scale of the released dataset and the focus on finetuning generative restorers.

major comments (2)
  1. [MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.
  2. [Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.
minor comments (2)
  1. [Abstract and Method] Model names such as 'Nano-Banana-2' and 'GPT-Image-2' appear non-standard; clarify whether they are pseudonyms, internal versions, or specific checkpoints to support reproducibility.
  2. [Dataset Construction] The multi-stage quality-control procedure is described at a high level; additional quantitative thresholds or failure-mode statistics would help readers assess how subtle hallucinations are filtered.
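
One way to produce the failure-mode statistics requested in the second minor comment is a per-stage rejection tally. The sketch below assumes each candidate pair carries precomputed scores and that every stage is a named pass/fail test; the stage names, score keys, and thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# A stage is a (name, predicate) pair; the predicate returns True when the pair passes.
Stage = Tuple[str, Callable[[Dict[str, float]], bool]]


def rejection_stats(candidates: Iterable[Dict[str, float]], stages: List[Stage]) -> Counter:
    """Count how many candidates are rejected at each stage and how many survive."""
    counts: Counter = Counter()
    for cand in candidates:
        for name, passes in stages:
            if not passes(cand):
                counts[name] += 1  # attribute the rejection to the first failing stage
                break
        else:
            counts["kept"] += 1    # no stage rejected this candidate
    return counts


# Hypothetical stages and thresholds, for illustration only.
example_stages: List[Stage] = [
    ("faithfulness", lambda c: c["faith"] >= 0.8),
    ("realism", lambda c: c["real"] >= 0.5),
]
```

Attributing each rejection to the first failing stage keeps the counts disjoint, so they sum to the number of candidates.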

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The two major comments identify important assumptions underlying the GGT-100K construction and evaluation. We respond point-by-point below, indicating planned revisions where the concerns are valid.

Point-by-point responses
  1. Referee: [MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.

    Authors: We agree that content faithfulness for real-world LQ inputs can only be assessed indirectly, as no paired real HQ references exist. The systematic comparison on synthetic cases (where ground truth is available) and VLM proxy judgments provide supporting evidence for selecting Nano-Banana-2, but this does not constitute direct proof for real degradations. We will revise the manuscript to add an explicit limitations subsection discussing the indirect assessment and its implications for dataset validity. revision: yes

  2. Referee: [Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.

    Authors: This concern is valid: because the test set is produced by the same pipeline, observed gains could partly reflect adaptation to the specific MFM prior rather than broader restoration capability on unseen real degradations. We will revise the manuscript to explicitly acknowledge this limitation, reframe the generalization claims accordingly, and clarify that the reported results demonstrate performance on GGT-style pairs rather than fully independent real-world test distributions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset construction and evaluation study

Full rationale

The paper is an empirical study: it evaluates nine MFMs on perceptual and content metrics, selects Nano-Banana-2, applies multi-stage filtering to synthesize 103,707 LQ-HQ training pairs, and reports downstream IR model improvements on real-world test data. No equations, predictions, or derivations are claimed; the central claim rests on experimental outcomes rather than on any self-referential reduction or fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear. The work is self-contained against external benchmarks (human/VLM judgments and IR model performance), and no result reduces to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no access to any mathematical formulations, parameter fittings, or axiomatic assumptions used in the work.

pith-pipeline@v0.9.1-grok · 5853 in / 1163 out tokens · 34500 ms · 2026-06-28T22:40:17.857236+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 20 canonical work pages · 7 internal anchors

  1. [1]

    Foundir: Unleashing million-scale training data to advance foundation models for image restoration

    Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, and Jinshan Pan. Foundir: Unleashing million-scale training data to advance foundation models for image restoration. InProceedings of the IEEE/CVF international conference on computer vision, pages 12626–12636, 2025

  2. [2]

    Qwen-Image Technical Report

    Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng ming Yin, Shuai Bai, Xiao Xu, YileiChen,YuxiangChen,ZechengTang,ZekaiZhang,ZhengyiWang,AnYang,BowenYu,ChenCheng,Dayiheng Liu,DeqingLi,HangZhang,HaoMeng,HuWei,JingyuanNi,KaiChen,KuanCao,LiangPeng,LinQu,Minggang Wu, Peng Wang, Shuting Yu, Tingkun Wen, Wensen Feng, Xiaoxiao Xu, Yi ...

  3. [3]

    Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

    Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE transactions on image processing, 26(7):3142–3155, 2017

  4. [4]

    Learning a deep convolutional network for image super-resolution

    Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. InComputer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13, pages 184–199. Springer, 2014

  5. [5]

    Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

    Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond.IEEE Transactions on Image Processing, 28(1):492–505, 2019

  6. [6]

    Rethinking coarse-to-fine approach in single image deblurring

    Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. InProceedings of the IEEE/CVF international conference on computer vision, pages 4641–4650, 2021

  7. [7]

    Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel

    JianruiCai,HuiZeng,HongweiYong,ZishengCao,andLeiZhang. Towardreal-worldsingleimagesuper-resolution: Anewbenchmarkandanewmodel. InProceedingsoftheIEEE/CVFInternationalConferenceon ComputerVision, pages 3086–3095, 2019

  8. [8]

    Swinir: Image restoration using swin transformer

    Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

  9. [9]

    Multi-stage progressive image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021

  10. [10]

    Simple baselines for image restoration

    Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. InComputer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VII, pages 17–33. Springer, 2022

  11. [11]

    A comparative study of image restoration networks for general backbone network design

    Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. InEuropean Conference on Computer Vision, pages 74–91. Springer, 2024

  12. [12]

    All-In-OneImageRestorationforUnknown Corruption

    BoyunLi,XiaoLiu,PengHu,ZhongqinWu,JianchengLv,andXiPeng. All-In-OneImageRestorationforUnknown Corruption. InIEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 2022

  13. [13]

    Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

    VaishnavPotlapalli,SyedWaqasZamir,SalmanKhan,andFahadShahbazKhan. Promptir: Promptingforall-in-one blind image restoration.Advances in Neural Information Processing Systems (NeurIPS), 2023

  14. [14]

    Complexity experts are task-discriminative learners for any image restoration, 2024

    Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, and Radu Timofte. Complexity experts are task-discriminative learners for any image restoration, 2024

  15. [15]

    Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

    ZiweiLuo,FredrikKGustafsson,ZhengZhao,JensSjölund,andThomasBSchön. Photo-realisticimagerestoration in the wild with controlled vision-language models.arXiv preprint arXiv:2404.09732, 2024

  16. [16]

    Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

    Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Ben Fei, Bo Dai, Wanli Ouyang, Yu Qiao, and Chao Dong. Diffbir: Towards blind image restoration with generative diffusion prior.arXiv preprint arXiv:2308.15070, 2023

  17. [17]

    Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild

    FanghuaYu, JinjinGu, ZheyuanLi, JinfanHu, XiangtaoKong, XintaoWang, JingwenHe, YuQiao, andChaoDong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25669–25680, 2024. Visual Computing Lab·The Hong Kong Polytechni...

  18. [18]

    Seesr: Towards semantics- aware real-world image super-resolution

    Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. Seesr: Towards semantics- aware real-world image super-resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 25456–25467, 2024

  19. [19]

    Designing a practical degradation model for deep blind image super-resolution

    Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4791–4800, 2021

  20. [20]

    Real-esrgan: Training real-world blind super-resolution with pure synthetic data

    Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. InProceedings of the IEEE/CVF international conference on computer vision, pages 1905–1914, 2021

  21. [21]

    Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

    Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.arXiv preprint arXiv:2205.07019, 2022

  22. [22]

    Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

    Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong. Evaluating the generalization ability of super-resolution networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  23. [23]

    A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

    Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, and Chao Dong. A preliminary exploration towards general image restoration.arXiv preprint arXiv:2408.15143, 2024

  24. [24]

    Component divide-and-conquer for real-world image super-resolution

    Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pages 101–117. Springer, 2020

  25. [25]

    Benchmarking denoising algorithms with real photographs

    Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1586–1595, 2017

  26. [26]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

  27. [27]

    Gpt-image-1.5 model documentation

    OpenAI. Gpt-image-1.5 model documentation. https://platform.openai.com/docs/models/gpt-image-1.5, 2025

  28. [28]

    Gpt-image-2 model documentation

    OpenAI. Gpt-image-2 model documentation. https://platform.openai.com/docs/models/gpt-image-2, 2025

  29. [29]

    Black Forest Labs. Flux. https://blackforestlabs.ai/announcing-black-forest-labs/, 2024

  30. [30]

    Restormer: Efficient transformer for high-resolution image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022

  31. [31]

    Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

    JonathanHo,AjayJain,andPieterAbbeel. Denoisingdiffusionprobabilisticmodels.Advancesinneuralinformation processing systems, 33:6840–6851, 2020

  32. [32]

    Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

    Hongyang Wei, Shuaizheng Liu, Chun Yuan, and Lei Zhang. Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models.arXiv preprint arXiv:2503.11073, 2025

  33. [33]

    Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks

    Weixiong Sun, Xiang Yin, and Chao Dong. Can nano banana 2 replace traditional image restoration models? an evaluation of its performance on image restoration tasks.arXiv preprint arXiv:2604.03061, 2026

  34. [34]

    Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

    Yufeng Yang, Xianfang Zeng, Zhangqi Jiang, Fukun Yin, Jianzhuang Liu, Wei Cheng, Shiyu Liu, Yuqi Peng, Gang YU, Shifeng Chen, et al. Realrestorer: Towards generalizable real-world image restoration with large-scale image editing models.arXiv preprint arXiv:2603.25502, 2026

  35. [35]

    Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

    Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, and Changsheng Li. Deep dense multi-scale network for snow removal using semantic and geometric priors.IEEE Transactions on Image Processing, 2021

  36. [36]

    Robustvideocontentalignmentandcompensation forrainremovalinacnnframework

    JieChen, Cheen-HauTan, JunhuiHou,Lap-PuiChau, andHeLi. Robustvideocontentalignmentandcompensation forrainremovalinacnnframework. InIEEE/CVFConferenceonComputerVisionandPatternRecognition(CVPR), pages 6341–6349, 2018. doi: 10.1109/CVPR.2018.00658

  37. [37]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InIEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009

  38. [38]

    Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019

    Yuen Peng Loh and Chee Seng Chan. Getting to know low-light images with the exclusively dark dataset.Computer Vision and Image Understanding, 178:30–42, 2019. doi: https://doi.org/10.1016/j.cviu.2018.10.010. Visual Computing Lab·The Hong Kong Polytechnic University 15 / 33

  39. [39]

    Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

    Wenhan Yang, Ye Yuan, Wenqi Ren, Jiaying Liu, Walter J Scheirer, Zhangyang Wang, Taiheng Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, et al. Advancing image understanding in poor visibility environments: A collective benchmark study.IEEE Transactions on Image Processing, 29:5737–5752, 2020

  40. [40]

    ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding

    Christos Sakaridis, Dengxin Dai, and Luc Van Gool. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. InProceedings of the IEEE/CVF International Conference on Computer Vision, October 2021

  41. [41]

    Unsplash. Website. https://unsplash.com

  42. [42]

    Pexels. Website. https://www.pexels.com

  43. [43]

    Pixabay. Website. https://pixabay.com

  44. [44]

    Flickr. Website. https://www.flickr.com

  45. [45]

    Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

    Super Intelligence Team, Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, et al. Firered-image-edit-1.0 technical report.arXiv preprint arXiv:2602.13344, 2026

  46. [46]

    FLUX.2: Frontier Visual Intelligence

    Black Forest Labs. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2, 2025

  47. [47]

    Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration

    KlingTeamandMiraclePlus. Kling-image-o1: Technicalreportonhigh-fidelityvideogeneration. https://klingai.com, 2025

  48. [48]

    Seedream 4.0-5.0 tutorial

    ByteDance. Seedream 4.0-5.0 tutorial. https://docs.byteplus.com/zh-CN/docs/ModelArk/1824121, 2025

  49. [49]

    ChatGPT GPT-5.4 Release Notes

    OpenAI. ChatGPT GPT-5.4 Release Notes. https://help.openai.com/en/articles/6825453-chatgpt-release-notes, 2026

  50. [50]

    Ntire 2017 challenge on single image super-resolution: Dataset and study

    Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 126–135, 2017

  51. [51]

    A high-quality denoising dataset for smartphone cameras

    Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1692–1700, 2018

  52. [52]

    FaithDiff: Unleashing diffusion priors for faithful image super-resolution

    Junyang Chen, Jinshan Pan, and Jiangxin Dong. FaithDiff: Unleashing diffusion priors for faithful image super-resolution. InIEEE Conference on Computer Vision and Pattern Recognition, 2025

  53. [53]

    Jarvisir: Elevating autonomous driving perception with intelligent image restoration

    Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Wen Kairun, Yeying Jin, Wenbo Li, and Xinghao Ding. Jarvisir: Elevating autonomous driving perception with intelligent image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

  54. [54]

    Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

  55. [55]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

  56. [56]

    Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

    Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity.IEEE transactions on pattern analysis and machine intelligence, 44(5):2567–2581, 2020

  57. [57]

    A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

    Lin Zhang, Lei Zhang, and Alan C Bovik. A feature-enriched completely blind image quality evaluator.IEEE Transactions on Image Processing, 24(8):2579–2591, 2015

  58. [58]

    Musiq: Multi-scaleimagequalitytransformer

    JunjieKe,QifeiWang,YilinWang, PeymanMilanfar, andFengYang. Musiq: Multi-scaleimagequalitytransformer. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021

  59. [59]

    Maniqa: Multi-dimension attention network for no-reference image quality assessment

    Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022

  60. [60]

    Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

    Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. Topiq: A top-down approach from semantics to distortions for image quality assessment.IEEE Transactions on Image Processing, 33:2404–2418, 2024

  61. [61]

    Toward generalized image quality assessment: Relaxing the perfect reference quality assumption

    Du Chen, Tianhe Wu, Kede Ma, and Lei Zhang. Toward generalized image quality assessment: Relaxing the perfect reference quality assumption. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12742–12752, 2025. Visual Computing Lab·The Hong Kong Polytechnic University 16 / 33

  62. [62]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models.arXiv preprint arXiv:2106.09685, 2021

  63. [63]

    Pytorch: An imperative style, high-performance deep learning library

    AdamPaszke,SamGross,FranciscoMassa,AdamLerer,JamesBradbury,GregoryChanan,TrevorKilleen,Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

  64. [64]

    Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults

    Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single imagesuper-resolution: Methodsandresults. InProceedingsoftheIEEEconferenceoncomputervisionandpattern recognition workshops, pages 114–125, 2017

  65. [65]

    LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results

    XiangChen,HaoLi,JiangxinDong,JinshanPan,XinLi,XinHe,NaiweiChen,ShengyuanLi,FengningLiu,Haoyi Lv, et al. Lovif 2026 challenge on real-world all-in-one image restoration: Methods and results.arXiv preprint arXiv:2604.19445, 2026

  66. [66]

    Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms

    JaesungRim, HaeyunLee, JucheolWon, andSunghyunCho. Real-worldblurdatasetforlearningandbenchmarking deblurring algorithms. InEuropean conference on computer vision, pages 184–201. Springer, 2020

  67. [67]

    Real-world Noisy Image Denoising: A New Benchmark

    J Xu, H Li, Z Liang, D Zhang, and L Zhang. Real-world noisy image denoising: A new benchmark. arxiv 2018. arXiv preprint arXiv:1804.02603

  68. [68]

    Deep Retinex Decomposition for Low-Light Enhancement

    Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018

  69. [69]

    Embedding fourier for ultra-high-definition low-light image enhancement

    Chongyi Li, Chun-Le Guo, Man Zhou, Zhexin Liang, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Embedding fourier for ultra-high-definition low-light image enhancement. InICLR, 2023

  70. [70]

    Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration

    Qiyuan Guan, Qianfeng Yang, Xiang Chen, Tianyu Song, Guiyue Jin, and Jiyu Jin. Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration. InProceedings of the 33rd ACM international conference on multimedia, pages 12607–12613, 2025

  71. [71]

    Deep joint rain detection andremovalfromasingleimage

    Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. Deep joint rain detection andremovalfromasingleimage. InProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition, pages 1357–1366, 2017

  72. [72]

    Density-aware single image de-raining using a multi-stream dense network

    He Zhang and Vishal M Patel. Density-aware single image de-raining using a multi-stream dense network. InCVPR, 2018

  73. [73]

    Removing raindrops and rain streaks in one go

    Ruijie Quan, Xin Yu, Yuanzhi Liang, and Yi Yang. Removing raindrops and rain streaks in one go. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9147–9156, 2021

  74. [74]

    do not change the image content

    Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, and Dacheng Tao. Toward real-world single image deraining: A new benchmark and beyond.arXiv preprint arXiv:2206.05514, 2022. Visual Computing Lab·The Hong Kong Polytechnic University 17 / 33 Appendix In this appendix, we provide the following materials: •A.More details of source image collection f...

  75. [75]

    Scene content and depth structure (foreground/background, sky)

  76. [76]

    Haze characteristics (global veil, local dense haze, low contrast, color shift, bright-airlight effect)

  77. [77]

    Then output one detailed English restoration prompt

    Severity and where haze is strongest. Then output one detailed English restoration prompt. It must: - Set fidelity as the first priority: preserve scene content, geometry, and structure. - In all regions where content is identifiable (including mid/background elements that are still visible), aim for the best possible haze removal and visibility recovery ...

  78. [78]

    FIRST IMAGE = LQ (low-quality degraded input)

  79. [79]

    SECOND IMAGE = HQ (restored output) EVALUATE 5 DIMENSIONS IN FULL DETAIL:

  80. [80]

    RESTORATION QUALITY (0-100)

Showing first 80 references.