GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
Pith reviewed 2026-06-28 22:40 UTC · model grok-4.3
The pith
Generative multimodal models can create reliable high-quality targets from real low-quality images to train better image restoration models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that Nano-Banana-2 with VLM-based adaptive prompting produces high-quality targets from real low-quality images that are sufficiently content-faithful and perceptually realistic to serve as ground truth. They build a multi-stage quality-control pipeline around this capability, construct the GGT-100K paired dataset, and show that training or fine-tuning a range of image restoration models on it yields consistent gains in real-world generalization, with the largest benefits appearing when fine-tuning generative models.
What carries the argument
The GGT synthesis pipeline that uses Nano-Banana-2 with VLM-based adaptive prompting followed by multi-stage quality control to turn real low-quality inputs into usable high-quality targets.
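The control flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `synthesize_pairs`, `Pair`, and the toy stand-ins for the VLM prompt writer, the generative model, and the quality-control stages are all hypothetical names, and images are represented by plain string identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Pair:
    lq: str  # identifier of the real low-quality input
    hq: str  # identifier of the generated high-quality target


def synthesize_pairs(
    lq_images: Iterable[str],
    describe: Callable[[str], str],                # hypothetical VLM prompt writer
    restore: Callable[[str, str], str],            # hypothetical generative model call
    checks: Sequence[Callable[[str, str], bool]],  # hypothetical QC stages
) -> List[Pair]:
    """Generate one target per input and keep it only if every check passes."""
    kept = []
    for lq in lq_images:
        prompt = describe(lq)      # adaptive prompt written for this input
        hq = restore(lq, prompt)   # candidate high-quality target
        if all(check(lq, hq) for check in checks):
            kept.append(Pair(lq, hq))
    return kept


# Toy stand-ins so the sketch runs without any model access.
def toy_describe(lq: str) -> str:
    return f"restore {lq}; preserve scene content"


def toy_restore(lq: str, prompt: str) -> str:
    return lq.replace("lq", "hq")


def toy_changed(lq: str, hq: str) -> bool:
    return hq != lq  # reject outputs identical to the input


pairs = synthesize_pairs(
    ["lq_001", "lq_002", "raw_003"], toy_describe, toy_restore, [toy_changed]
)
print(len(pairs), "of 3 candidate pairs kept")
```

Passing the checks as a sequence keeps each quality-control stage independent, so a stage can be added or removed without touching the generation loop.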
If this is right
- A wide range of image restoration models achieve better generalization to real-world degradations after training on GGT-100K.
- Generative models fine-tuned for image restoration gain the most from the new pairs.
- Multimodal foundation models can function as practical tools for generating restoration-oriented training data.
- The resulting dataset expands the set of usable training resources beyond expensive real paired captures or synthetic degradations.
Where Pith is reading between the lines
- The same generative-target approach could be applied to other low-level vision tasks that currently lack large paired real-world datasets.
- Larger-scale versions of GGT-100K might further close the gap between synthetic and real training distributions.
- The quality-control stages in the pipeline could be adapted to filter outputs from other generative models for similar data-generation tasks.
Load-bearing premise
The high-quality targets generated by Nano-Banana-2 with VLM-based adaptive prompting are sufficiently content-faithful and perceptually realistic to serve as effective ground truth for training image restoration models on real-world degradations.
What would settle it
If image restoration models trained on GGT-100K show no improvement or outright worse results on the paper's 500-pair real-world test set compared with models trained on prior synthetic or captured paired datasets, the central claim would be falsified.
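One way to make this test concrete is a paired comparison over the same test images. The sketch below assumes per-image quality scores where higher is better; the function name `compare_paired`, the score scale, and the example numbers are illustrative assumptions and do not come from the paper.

```python
from statistics import mean
from typing import Dict, Sequence


def compare_paired(
    scores_ggt: Sequence[float], scores_baseline: Sequence[float]
) -> Dict[str, float]:
    """Compare two models scored on the same images (higher score is better)."""
    if len(scores_ggt) != len(scores_baseline) or not scores_ggt:
        raise ValueError("both score lists must be non-empty and the same length")
    diffs = [g - b for g, b in zip(scores_ggt, scores_baseline)]
    return {
        "mean_gain": mean(diffs),                            # average per-image gain
        "win_rate": sum(d > 0 for d in diffs) / len(diffs),  # share of images improved
    }


# Made-up scores for four test images, for illustration only.
result = compare_paired([80, 70, 90, 60], [70, 70, 80, 65])
print(result)
```

Under this reading, a mean gain at or below zero on the test set would count against the central claim, while a positive mean gain with a low win rate would suggest the improvement is concentrated in a few images.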
Original abstract
Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Generative Ground Truth (GGT) by leveraging multimodal foundation models (MFMs) to synthesize high-quality targets from real-world low-quality images. After systematically evaluating nine MFMs, it identifies Nano-Banana-2 with VLM-based adaptive prompting as superior for perceptual realism and content faithfulness, then applies a multi-stage quality-control pipeline to construct the GGT-100K dataset (103,707 training pairs plus a 500-pair test set) covering diverse scenes and real degradations. Experiments claim that training or finetuning a range of image restoration models on GGT-100K yields consistent gains in real-world generalization, with especially strong benefits for generative IR models.
Significance. If the generated targets are verifiably content-faithful, the work would address a core bottleneck in real-world image restoration by providing a scalable source of paired data without physical capture. The systematic MFM comparison and the reported empirical gains across multiple model families constitute a practical contribution; credit is due for the scale of the released dataset and the focus on finetuning generative restorers.
major comments (2)
- [MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.
- [Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.
minor comments (2)
- [Abstract and Method] Model names such as 'Nano-Banana-2' and 'GPT-Image-2' appear non-standard; clarify whether they are pseudonyms, internal versions, or specific checkpoints to support reproducibility.
- [Dataset Construction] The multi-stage quality-control procedure is described at a high level; additional quantitative thresholds or failure-mode statistics would help readers assess how subtle hallucinations are filtered.
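The kind of reporting requested in the last comment could look like the following sketch, which applies per-dimension minimum scores and counts how often each dimension causes a rejection. The dimension names, thresholds, and scores are hypothetical; the source does not state the pipeline's actual criteria.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_with_stats(
    scored_pairs: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
) -> Tuple[List[str], Counter]:
    """Keep pairs meeting every threshold; count failures per dimension."""
    kept = []
    failures = Counter()
    for pair_id, scores in scored_pairs.items():
        failed = [name for name, minimum in thresholds.items()
                  if scores[name] < minimum]
        if failed:
            failures.update(failed)  # one count per failed dimension
        else:
            kept.append(pair_id)
    return kept, failures


# Hypothetical scores on a 0-100 scale for three candidate pairs.
scored = {
    "a": {"fidelity": 90, "quality": 80},
    "b": {"fidelity": 50, "quality": 85},
    "c": {"fidelity": 40, "quality": 30},
}
kept, failures = filter_with_stats(scored, {"fidelity": 70, "quality": 60})
print(kept, dict(failures))
```

Publishing the thresholds together with the per-dimension failure counts would let readers judge how aggressively hallucinated content is filtered.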
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The two major comments identify important assumptions underlying the GGT-100K construction and evaluation. We respond point-by-point below, indicating planned revisions where the concerns are valid.
- Referee: [MFM Evaluation] MFM evaluation and selection: The claim that Nano-Banana-2 outputs serve as reliable ground truth rests on VLM judgments and proxy metrics evaluated on synthetic cases; because no real paired high-quality references exist for the chosen real-world LQ inputs, content faithfulness can only be assessed indirectly. This assumption is load-bearing for the dataset's validity and for the generalization results.
  Authors: We agree that content faithfulness for real-world LQ inputs can only be assessed indirectly, as no paired real HQ references exist. The systematic comparison on synthetic cases (where ground truth is available) and VLM proxy judgments provide supporting evidence for selecting Nano-Banana-2, but this does not constitute direct proof for real degradations. We will revise the manuscript to add an explicit limitations subsection discussing the indirect assessment and its implications for dataset validity.
  Revision: yes
- Referee: [Experiments] Test-set construction and generalization experiments: The 500-pair test set is generated by the identical MFM pipeline used for training data. Reported improvements may therefore reflect models learning the generator's prior rather than performing true restoration on unseen real degradations, weakening the central claim of improved real-world generalization.
  Authors: This concern is valid: because the test set is produced by the same pipeline, observed gains could partly reflect adaptation to the specific MFM prior rather than broader restoration capability on unseen real degradations. We will revise the manuscript to explicitly acknowledge this limitation, reframe the generalization claims accordingly, and clarify that the reported results demonstrate performance on GGT-style pairs rather than fully independent real-world test distributions.
  Revision: yes
Circularity Check
No circularity: empirical dataset construction and evaluation study
The paper is an empirical study that evaluates nine MFMs on perceptual and content metrics, selects Nano-Banana-2, applies multi-stage filtering to synthesize 103k LQ-HQ pairs, and reports downstream IR model improvements on real-world test data. No equations, predictions, or derivations are claimed; the central claim rests on experimental outcomes rather than any self-referential reduction or fitted parameter renamed as prediction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear. The work is self-contained against external benchmarks (human/VLM judgments and IR model performance) with no reduction of results to inputs by construction.