Recognition: 2 theorem links · Lean Theorem
Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro
Pith reviewed 2026-05-13 20:14 UTC · model grok-4.3
The pith
Iterative image editing over 100 steps creates severely degraded images that standard quality metrics fail to rate lower than clean originals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using models such as Nano Banana Pro for 100 iterative replication and editing steps produces images with visible noise and instruction-following failures. When the results are evaluated with 21 no-reference image quality assessment (NR-IQA) metrics, none of them consistently assigns lower quality scores to the degraded images than to their clean starting points, despite the clear visual deterioration.
What carries the argument
Banana100 dataset of images degraded through 100 iterative editing steps, used to test and reveal the limitations of NR-IQA metrics in detecting accumulated artifacts.
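To make the construction concrete, here is a minimal sketch of such a degradation loop. The edit_once function is a hypothetical stand-in for a Nano Banana Pro editing call, implemented here as a noise-injecting stub so the example runs on its own; it is not the paper's pipeline.

```python
import numpy as np

def edit_once(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for one image-editing call. The stub only injects mild noise,
    so artifact accumulation over many steps is visible without a real model."""
    noise = np.random.normal(0.0, 2.0, size=image.shape)
    return np.clip(image + noise, 0, 255)

def replicate(image: np.ndarray, steps: int = 100,
              instruction: str = "replicate this image") -> list[np.ndarray]:
    """Apply the same replication instruction repeatedly, keeping every step."""
    trajectory = [image]
    for _ in range(steps):
        trajectory.append(edit_once(trajectory[-1], instruction))
    return trajectory  # trajectory[0] is clean, trajectory[-1] is the 100-step image

clean = np.random.randint(0, 256, size=(256, 256, 3)).astype(np.float64)
images = replicate(clean)  # 101 images: the original plus 100 edited versions
```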
If this is right
- Multi-turn editing in agentic systems risks generating undetectable low-quality images.
- Quality filters based on current NR-IQA metrics may fail to prevent degraded data from entering training pipelines.
- The stability of future multi-modal models could be compromised by reliance on these flawed evaluators.
- Development of new robust quality assessment methods is necessary for safe deployment of iterative editing agents.
Where Pith is reading between the lines
- Similar issues may arise in other iterative generative processes beyond images, such as in video or 3D content creation.
- Agent designs could benefit from built-in quality checkpoints at fewer steps rather than relying on end-of-process metrics.
- Human preference studies might be needed to calibrate new metrics for detecting iterative degradation specifically.
Load-bearing premise
That the degradation from 100 iterative edits is a form of quality loss that NR-IQA metrics should be expected to detect and penalize.
What would settle it
Finding even one NR-IQA metric among the tested ones that assigns consistently lower scores to the 100-step degraded images than to the originals across the Banana100 dataset would contradict the claim.
Original abstract
The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing, which is the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to a severe accumulation of visible noise and a failure to follow simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps, including diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation. Among 21 popular no-reference image quality assessment (NR-IQA) metrics, none of them consistently assign lower scores to heavily degraded images than to clean ones. The dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems, if the low-quality synthetic data generated by multi-turn edits escape quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Banana100, a dataset of 28,000 images produced by applying 100 iterative editing steps to diverse source images using multi-modal agentic systems. It reports that none of 21 popular NR-IQA metrics consistently assign lower scores to the resulting heavily degraded images than to the clean originals, and argues that this reveals a dual weakness in generators (accumulating artifacts) and evaluators (failing to detect them), with potential risks for synthetic data in model training. The authors release code and data.
Significance. If the empirical results survive correction for metric polarity and are supported by appropriate statistical controls, the work would usefully document a practical limitation of current NR-IQA metrics when confronted with accumulated, instruction-driven degradations rather than conventional distortions. The public release of code and data is a clear strength that supports reproducibility and follow-on research on robust quality assessment for agentic image pipelines.
major comments (2)
- [Abstract] Abstract: the central claim that 'none of them consistently assign lower scores to heavily degraded images than to clean ones' presupposes a uniform polarity in which lower numerical output always indicates worse quality. Standard implementations differ (BRISQUE, NIQE, and PIQE output lower values for higher quality; many learned regressors output higher values for higher quality). Without explicit inversion or rescaling to a common convention before comparison, the reported failure cannot be interpreted as genuine insensitivity to the Banana100 degradations (a normalization sketch follows this list).
- [Methods] The manuscript provides no description of how 'consistently' was operationalized across the 28,000 images (e.g., fraction of images for which the degraded score is worse, any statistical test, or handling of ties). This detail is load-bearing for the claim that the metrics 'fail to detect the degradation.'
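As flagged in the first major comment, clean-versus-degraded comparisons are only meaningful after mapping every metric to a common polarity. A minimal sketch, assuming BRISQUE, NIQE, and PIQE are the lower-is-better metrics in play (the actual list should come from each implementation's documentation):

```python
# Metrics whose raw output decreases as quality improves (assumed list;
# verify against each implementation's documentation).
LOWER_IS_BETTER = {"brisque", "niqe", "piqe"}

def to_higher_is_better(metric_name: str, score: float) -> float:
    """Flip lower-is-better scores so that, after this call, a larger value
    always means better quality for every metric."""
    return -score if metric_name.lower() in LOWER_IS_BETTER else score

# A BRISQUE score of 45 (worse image) must end up below a score of 20 (better image).
assert to_higher_is_better("BRISQUE", 45.0) < to_higher_is_better("BRISQUE", 20.0)
```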
minor comments (1)
- A supplementary table listing the 21 metrics together with their native polarity (higher-better or lower-better) and the exact implementation version used would improve clarity and allow readers to verify the comparison protocol.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The comments correctly identify areas where the manuscript lacks sufficient clarity on metric handling and experimental definitions. We will make major revisions to address both points explicitly while preserving the core empirical findings.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that 'none of them consistently assign lower scores to heavily degraded images than to clean ones' presupposes a uniform polarity in which lower numerical output always indicates worse quality. Standard implementations differ (BRISQUE, NIQE, and PIQE output lower values for higher quality; many learned regressors output higher values for higher quality). Without explicit inversion or rescaling to a common convention before comparison, the reported failure cannot be interpreted as genuine insensitivity to the Banana100 degradations.
Authors: The referee is correct that the manuscript does not describe polarity normalization. In our analysis we inverted scores for BRISQUE, NIQE, and PIQE (and any other metrics with opposite polarity) so that, after normalization, lower values always indicate worse quality. This step was performed but omitted from the text. We will add a clear Methods subsection listing all inverted metrics and the normalization procedure.
revision: yes
- Referee: [Methods] The manuscript provides no description of how 'consistently' was operationalized across the 28,000 images (e.g., fraction of images for which the degraded score is worse, any statistical test, or handling of ties). This detail is load-bearing for the claim that the metrics 'fail to detect the degradation.'
Authors: We agree that the operational definition of 'consistently' must be stated explicitly. We defined it as the case in which the normalized score for the degraded image was not worse than the original in the majority of pairs (i.e., failure rate > 50%). Ties were counted as failures to detect degradation. We will revise the Methods section to state this threshold, describe tie handling, and report per-metric fractions together with a paired sign test for statistical significance.
revision: yes
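A minimal sketch of how this definition could be implemented, assuming the scores have already been polarity-normalized so that higher means better; the function and variable names are illustrative, not the paper's code:

```python
import numpy as np
from scipy.stats import binomtest

def detection_stats(clean: np.ndarray, degraded: np.ndarray):
    """Per-metric check over paired (clean, 100-step) scores, both higher-is-better.
    A pair counts as 'detected' only when the degraded score is strictly worse;
    ties count as detection failures, matching the rebuttal's convention."""
    detected = degraded < clean
    n = len(clean)
    failure_rate = 1.0 - detected.sum() / n
    # Paired sign test: are detections more frequent than chance (p = 0.5)?
    p_value = binomtest(int(detected.sum()), n, p=0.5, alternative="greater").pvalue
    consistent = failure_rate <= 0.5  # fails the 'consistently' bar only if failures form a majority
    return failure_rate, p_value, consistent
```

Running this once per metric over the 28,000 pairs and reporting the three numbers side by side would make the 'fail to detect' claim directly auditable.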
Circularity Check
No circularity: direct empirical benchmarking without derivation or fitted inputs
Full rationale
The paper is an empirical study that generates a dataset of iteratively degraded images and directly compares scores from 21 existing NR-IQA metrics on clean versus degraded versions. No mathematical derivation chain, parameter fitting, self-definitional equations, or load-bearing self-citations are present. The central claim rests on observed score comparisons rather than any reduction to prior inputs by construction. This is a standard experimental benchmarking setup, evaluated against pre-existing external metrics and data rather than quantities defined within the paper itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Iterative image editing accumulates visible noise and artifacts that standard NR-IQA metrics should detect as reduced quality.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel — unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "Among 21 popular no-reference image quality assessment (NR-IQA) metrics, none of them consistently assign lower scores to heavily degraded images than to clean ones."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative — unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We normalized NR-IQA scores (BRISQUE as an example) and calculated the difference across steps to quantify the score trend."
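The quoted passage describes a simple per-trajectory computation. A minimal sketch under assumed inputs, where scores_by_step holds one polarity-normalized (higher-is-better) BRISQUE-style score per editing step for a single image; the normalization choice is illustrative, not necessarily the paper's:

```python
import numpy as np

def score_trend(scores_by_step: np.ndarray) -> np.ndarray:
    """Z-normalize the score trajectory, then take step-to-step differences.
    Negative entries mean the metric reads the image as worse after that step."""
    z = (scores_by_step - scores_by_step.mean()) / (scores_by_step.std() + 1e-8)
    return np.diff(z)
```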
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.