Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback
Pith reviewed 2026-05-16 20:53 UTC · model grok-4.3
The pith
A reinforcement learning agent trained solely on multimodal LLM perceptual feedback learns efficient restoration sequences and matches state-of-the-art image quality without any labels or supervision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is a policy-optimization framework in which a sequential decision agent learns to output tool-calling sequences that maximize final image quality, with the only training signal supplied by multimodal LLM perceptual feedback in a completely label-free setting. This yields a deterministic restoration plan that runs faster than prior agent-based methods while matching supervised performance on full-reference metrics and improving on no-reference metrics.
What carries the argument
The policy optimization agent that selects the next restoration operation at each step to maximize the multimodal LLM's perceptual quality reward.
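The sequential decision process this claim rests on can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toolbox names, the `apply_tool` executor, and the scalar `mllm_reward` stub are hypothetical stand-ins, not the paper's actual interface.

```python
import random

random.seed(0)

# Hypothetical toolbox; the paper does not enumerate its tools here.
TOOLS = ["denoise", "deblur", "derain", "dejpeg", "stop"]

def apply_tool(tool, image):
    # Stand-in executor: a real system would run the named restoration network.
    return image + [tool]

def mllm_reward(image):
    # Stand-in for the MLLM perceptual evaluator, assumed to return a
    # scalar quality score in [0, 1] for the final restored image.
    return random.random()

def rollout(policy, image, max_steps=4):
    """One episode of the sequential decision process: the agent picks a
    tool at each step, and the only training signal is the final reward."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(image)
        if action == "stop":
            break
        trajectory.append(action)
        image = apply_tool(action, image)
    return trajectory, mllm_reward(image)

# A random policy, just to exercise the loop before any training.
traj, reward = rollout(lambda img: random.choice(TOOLS), [])
```

Because the reward arrives only at the end of the episode, any policy-gradient method (the paper's choice is policy optimization) can be plugged in around this loop.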
If this is right
- Restoration agents can be trained end-to-end in label-free environments for any combination of degradations.
- Inference speed improves because the trained policy produces a single deterministic sequence without reflection or rollback steps.
- No-reference quality metrics improve because the LLM feedback directly optimizes for human-aligned perceptual criteria.
- The same training loop can be applied to new tool sets or new degradation types by swapping only the LLM evaluator.
Where Pith is reading between the lines
- The approach could be extended to video restoration by treating frame sequences as the state space and using the same LLM reward on temporal consistency.
- If the LLM evaluator is kept fixed, the method offers a way to benchmark new restoration tools without collecting human annotations.
- The deterministic policy may serve as a fast initialization for fine-tuning on small labeled sets when higher precision is needed.
Load-bearing premise
Multimodal LLMs can reliably judge perceptual image quality in a way that aligns with human preferences and is stable enough to train an effective restoration policy without ground-truth labels.
What would settle it
An experiment in which the same agent is retrained with the multimodal LLM reward replaced by random scores or by scores from a non-perceptual metric and then evaluated on held-out multi-degradation images to check whether performance collapses.
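A toy version of that control experiment can be sketched as a tabular bandit: the same learner is trained once with an informative (MLLM-like) reward and once with random scores. All names, settings, and the stand-in rewards here are illustrative assumptions, not the paper's setup.

```python
import random

random.seed(0)

DEGRADATIONS = ["noise", "blur", "rain"]
TOOLS = ["denoise", "deblur", "derain"]
CORRECT = dict(zip(DEGRADATIONS, TOOLS))  # ground truth, hidden from the learner

def perceptual_reward(degradation, tool):
    # Stand-in for an informative (MLLM-like) evaluator.
    return 1.0 if CORRECT[degradation] == tool else 0.0

def random_reward(degradation, tool):
    # Control condition: scores carry no information about quality.
    return random.random()

def train(reward_fn, episodes=3000, lr=0.1):
    # Tabular REINFORCE-style preference updates against a mean baseline.
    prefs = {d: {t: 0.0 for t in TOOLS} for d in DEGRADATIONS}
    for _ in range(episodes):
        d = random.choice(DEGRADATIONS)
        t = random.choice(TOOLS)  # uniform exploration
        baseline = sum(prefs[d].values()) / len(TOOLS)
        prefs[d][t] += lr * (reward_fn(d, t) - baseline)
    # Fraction of degradations for which the greedy policy picks the right tool.
    hits = sum(max(prefs[d], key=prefs[d].get) == CORRECT[d] for d in DEGRADATIONS)
    return hits / len(DEGRADATIONS)

acc_informative = train(perceptual_reward)  # informative reward
acc_random = train(random_reward)           # uninformative control
```

If performance under the random-reward control does not collapse relative to the informative reward, the perceptual signal is not doing the work the paper claims.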
Original abstract
Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations such as blur, noise, rain, and compression artifacts. Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from significant efficiency bottlenecks due to reflection, rollback, and iterative tool searching. Moreover, their performance heavily depends on degradation recognition models that require extensive annotations for training, limiting their applicability in label-free environments. To address these limitations, we propose a policy optimization-based restoration framework that learns a lightweight agent to determine tool-calling sequences. The agent operates in a sequential decision process, selecting the most appropriate restoration operation at each step to maximize final image quality. To enable training within label-free environments, we introduce a novel reward mechanism driven by multimodal large language models, which act as human-aligned evaluators and provide perceptual feedback for policy improvement. Once trained, our agent executes a deterministic restoration plan without redundant tool invocations, significantly accelerating inference while maintaining high restoration quality. Extensive experiments show that despite using no supervision, our method matches SOTA performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Restore-R1, a reinforcement learning framework for training a lightweight policy that selects sequences of image restoration tools. Training occurs in a label-free setting by using multimodal LLMs as perceptual evaluators to supply the sole reward signal. The resulting deterministic agent is claimed to match SOTA performance on full-reference metrics (PSNR/SSIM) while surpassing prior methods on no-reference metrics across multiple degradation types, with substantially faster inference by eliminating iterative reflection and tool search.
Significance. If the empirical claims hold after proper validation, the work would demonstrate that MLLM-derived rewards can substitute for supervised signals in restoration policy learning, offering a route to label-free training and efficient inference for complex degradations.
major comments (3)
- [Abstract] Abstract: the central claim that the method 'matches SOTA performance on full-reference metrics' despite using no supervision is unsupported; no correlation, ablation, or human validation between MLLM perceptual scores and objective metrics (PSNR/SSIM) is reported, leaving open the possibility that the policy optimizes for LLM biases rather than pixel-level fidelity.
- [Method] Method / Reward section: the reward formulation relies on an external MLLM evaluator with no described ablations on prompt design, score aggregation, temperature, or alternative reward shapes; this is load-bearing for the label-free training claim and must be shown to be robust.
- [Experiments] Experiments: the abstract asserts 'extensive experiments' yet supplies no datasets, baselines, exact metric tables, statistical significance tests, or validation procedures, preventing assessment of whether the reported gains are reproducible or generalizable.
minor comments (2)
- Define all acronyms (MLLM, SOTA, etc.) on first use and ensure consistent notation for policy parameters versus reward components.
- Clarify the exact architecture of the 'lightweight agent' (parameter count, backbone) and provide direct runtime comparisons to prior reflection-based agents.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and have made revisions to strengthen the paper accordingly.
Point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the method 'matches SOTA performance on full-reference metrics' despite using no supervision is unsupported; no correlation, ablation, or human validation between MLLM perceptual scores and objective metrics (PSNR/SSIM) is reported, leaving open the possibility that the policy optimizes for LLM biases rather than pixel-level fidelity.
Authors: We appreciate this observation. To address the lack of explicit validation, we have added to the revised manuscript a dedicated analysis subsection that reports the correlation (using Spearman rank correlation) between the MLLM perceptual rewards and ground-truth PSNR/SSIM values on validation sets. Additionally, we conducted a human evaluation study with 20 participants rating restored images, showing alignment between MLLM scores and human preferences. These results support that the policy optimizes for perceptual quality that correlates with objective fidelity, reducing concerns about LLM-specific biases. revision: yes
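The correlation analysis the authors promise can be sketched with a minimal pure-stdlib Spearman implementation (Spearman's rho is the Pearson correlation of rank vectors). The evaluator scores and PSNR values below are made up for illustration.

```python
def ranks(xs):
    # Ranks starting at 1, averaging ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up MLLM scores and PSNR values for the same set of restored images.
mllm_scores = [0.2, 0.5, 0.9, 0.7]
psnr_values = [24.1, 27.3, 31.0, 29.2]
rho = spearman(mllm_scores, psnr_values)
```

Rank correlation is the right choice here because the MLLM scores and PSNR live on incomparable scales; only their orderings need to agree.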
-
Referee: [Method] Method / Reward section: the reward formulation relies on an external MLLM evaluator with no described ablations on prompt design, score aggregation, temperature, or alternative reward shapes; this is load-bearing for the label-free training claim and must be shown to be robust.
Authors: We concur that ablations are necessary to validate the reward design. In the revised Method section, we now include comprehensive ablations covering variations in prompt engineering for the MLLM evaluator, different methods for aggregating scores (mean, median, and ensemble), MLLM sampling temperatures from 0.1 to 1.0, and alternative reward shapes including linear, logarithmic, and binary threshold-based rewards. The policy learning curves and final performance metrics remain consistent, demonstrating robustness of the label-free training approach. revision: yes
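The reward-shape and aggregation variants named in this response can be written down concretely. The functions below are a hedged sketch of what such an ablation might compare; the threshold, epsilon, and aggregation modes are illustrative choices, not values from the paper.

```python
import math
import statistics

def linear_reward(score):
    # Raw MLLM score, assumed already normalized to [0, 1].
    return score

def log_reward(score, eps=1e-6):
    # Logarithmic shape: compresses the high end and penalizes
    # bad restorations more sharply near zero.
    return math.log(score + eps)

def binary_reward(score, threshold=0.7):
    # Binary threshold shape: sparse signal, only "good enough"
    # restorations are rewarded at all.
    return 1.0 if score >= threshold else 0.0

def aggregate(scores, how="mean"):
    # Aggregation over repeated MLLM queries of the same image
    # (mean, median, or an optimistic max-ensemble).
    if how == "mean":
        return statistics.mean(scores)
    if how == "median":
        return statistics.median(scores)
    return max(scores)
```

An ablation would hold the policy learner fixed and sweep these functions, checking whether learning curves and final metrics stay consistent, as the rebuttal claims.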
-
Referee: [Experiments] Experiments: the abstract asserts 'extensive experiments' yet supplies no datasets, baselines, exact metric tables, statistical significance tests, or validation procedures, preventing assessment of whether the reported gains are reproducible or generalizable.
Authors: We thank the referee for pointing this out. To improve clarity and completeness, we have revised the Experiments section to include explicit descriptions of the datasets (e.g., DIV2K for training and standard test sets like Set5, BSD100 for evaluation), full tables of quantitative results with all metrics, comparisons to relevant baselines, statistical significance tests (e.g., t-tests), and detailed validation procedures. These additions make the experimental claims fully assessable and reproducible. revision: yes
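The significance test mentioned here, applied per image, is a paired t-test; a minimal stdlib sketch follows. The per-image PSNR values are made up, and a real analysis would also report the p-value from the t distribution with n-1 degrees of freedom.

```python
import math
import statistics

def paired_t(a, b):
    """t statistic for paired samples, e.g. per-image PSNR of two
    methods evaluated on the same test images."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean diff
    return statistics.mean(diffs) / se

# Hypothetical per-image PSNR (dB) for two restoration pipelines.
ours = [30.1, 29.5, 31.0]
baseline = [29.8, 29.0, 30.2]
t_stat = paired_t(ours, baseline)
```

Pairing matters: per-image differences remove the large between-image variance that would otherwise swamp a small but consistent method gap.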
Circularity Check
No significant circularity; empirical claims rest on external MLLM reward
full rationale
The paper trains a lightweight RL policy for sequential tool selection using only MLLM perceptual feedback as the reward signal in a label-free setting. The reported matching of SOTA full-reference metrics (PSNR/SSIM) and superiority on no-reference metrics are presented as experimental outcomes from evaluation on held-out data, not as quantities derived by construction from the reward model or any fitted parameters. No equations, self-definitional loops, or load-bearing self-citations appear in the provided text that reduce the performance claims to the inputs. The MLLM evaluator is treated as an independent black-box source of human-aligned feedback.
Axiom & Free-Parameter Ledger
free parameters (1)
- RL policy parameters
axioms (1)
- domain assumption: Multimodal LLMs provide human-aligned perceptual feedback usable as a reward signal
Reference graph
Works this paper leans on
- [1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- [2] Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022.
- [3] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6228–6237, 2018.
- [4] Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, and Yunhe Wang. DSPO: Direct semantic preference optimization for real-world image super-resolution. arXiv preprint arXiv:2504.15176, 2025.
- [5] Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, and Lei Zhu. RestoreAgent: Autonomous image restoration agent via multimodal large language models. In Advances in Neural Information Processing Systems, pages 110643–110666. Curran Associates, Inc., 2024.
- [6] Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. In European Conference on Computer Vision, pages 74–91. Springer, 2024.
- [7] Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, and Ajay Divakaran. DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14239–14250, 2024.
- [8] Marcos V Conde, Gregor Geigle, and Radu Timofte. InstructIR: High-quality image restoration following human instructions. In European Conference on Computer Vision, pages 1–21. Springer, 2024.
- [9] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv e-prints, pages arXiv–2407, 2024.
- [10] Fengyi Fu, Lei Zhang, Mengqi Huang, and Zhendong Mao. FeedEdit: Text-based image editing with dynamic feedback regulation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2661–2670, 2025.
- [11] Ryosuke Furuta, Naoto Inoue, and Toshihiko Yamasaki. Fully convolutional network with multi-step reinforcement learning for image processing. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3598–3605, 2019.
- [12] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2341–2353, 2010.
- [13] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.
- [14] Jiaxi Jiang, Kai Zhang, and Radu Timofte. Towards flexible blind JPEG artifacts removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4997–5006, 2021.
- [15] Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, and Xianming Liu. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [16] Xu Jiang, Gehui Li, Bin Chen, and Jian Zhang. Multi-agent image restoration. arXiv preprint arXiv:2503.09403, 2025.
- [17] Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, and Jinwei Gu. AutoDIR: Automatic all-in-one image restoration with latent diffusion. In European Conference on Computer Vision, pages 340–359. Springer, 2024.
- [18] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021.
- [19] Xiangtao Kong, Chao Dong, and Lei Zhang. Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv preprint arXiv:2401.03379, 2024.
- [20] Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, and Chao Dong. A preliminary exploration towards general image restoration. arXiv preprint arXiv:2408.15143, 2024.
- [21] Junyong Lee, Hyeongseok Son, Jaesung Rim, Sunghyun Cho, and Seungyong Lee. Iterative filter adaptive network for single image defocus deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2034–2042, 2021.
- [22] Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing, 28(1):492–505, 2018.
- [23] Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17452–17462, 2022.
- [24] Bingchen Li, Xin Li, Yiting Lu, and Zhibo Chen. Hybrid agents for image restoration. arXiv preprint arXiv:2503.10120, 2025.
- [25] Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, and Jinshan Pan. FoundIR: Unleashing million-scale training data to advance foundation models for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12626–12636, 2025.
- [26] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021.
- [27] Jingbo Lin, Zhilu Zhang, Yuxiang Wei, Dongwei Ren, Dongsheng Jiang, Qi Tian, and Wangmeng Zuo. Improving image restoration through removing degradations in textual representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2866–2878, 2024.
- [28] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. In ICLR, 2024.
- [29] Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484, 2019.
- [30] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- [31] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- [32] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
- [33] Li Pang, Xiangyu Rui, Long Cui, Hongzhong Wang, Deyu Meng, and Xiangyong Cao. HIR-Diff: Unsupervised hyperspectral image restoration via improved diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3005–3014, 2024.
- [34] Dongwon Park, Byung Hyun Lee, and Se Young Chun. All-in-one image restoration for unknown degradations using adaptive discriminative filters for specific degradations. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5815–5824. IEEE, 2023.
- [35] Jongchan Park, Joon-Young Lee, Donggeun Yoo, and In So Kweon. Distort-and-recover: Color enhancement using deep reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5928–5936, 2018.
- [36] Yohan Poirier-Ginter and Jean-François Lalonde. Robust unsupervised StyleGAN image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22292–22301, 2023.
- [37] Vaishnav Potlapalli, Syed Waqas Zamir, Salman H Khan, and Fahad Shahbaz Khan. PromptIR: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems, 36:71275–71293, 2023.
- [38] Junbo Qiao, Miaomiao Cai, Wei Li, Yutong Liu, Xudong Huang, Gaoqi He, Jiao Xie, Jie Hu, Xinghao Chen, and Shaohui Lin. RealSR-R1: Reinforcement learning for real-world image super-resolution with vision-language chain-of-thought. arXiv preprint arXiv:2506.16796, 2025.
- [39] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [40] Lingyan Ruan, Bin Chen, Jizhou Li, and Miuling Lam. Learning to deblur using light field generated and real defocus images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16304–16313, 2022.
- [41] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897, 2015.
- [42] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
- [43] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- [44] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
- [45] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- [46] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
- [47] Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023.
- [48] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- [49] Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. MAXIM: Multi-axis MLP for image processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5769–5780, 2022.
- [50] Hailing Wang, Qiaoyu Tian, Liang Li, and Xiaojie Guo. Image demoiréing with a dual-domain distilling network. In 2021 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2021.
- [51] Hailing Wang, Wei Li, Yuanyuan Xi, Jie Hu, Hanting Chen, Longyu Li, and Yunhe Wang. IFT: Image fusion transformer for ghost-free high dynamic range imaging. arXiv preprint arXiv:2309.15019, 2023.
- [52] Hailing Wang, Jianglin Lu, Yitian Zhang, and Yun Fu. Outlier-aware post-training quantization for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16175–16184, 2025.
- [53] Hongru Wang, Cheng Qian, Wanjun Zhong, Xiusi Chen, Jiahao Qiu, Shijue Huang, Bowen Jin, Mengdi Wang, Kam-Fai Wong, and Heng Ji. OTC: Optimal tool calls via reinforcement learning. arXiv e-prints, pages arXiv–2504, 2025.
- [54] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring CLIP for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2555–2563, 2023.
- [55] Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024.
- [56] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [57] Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general U-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17683–17693, 2022.
- [58] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.
- [59] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256, 1992.
- [60] Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, et al. Towards open-ended visual quality comparison. In European Conference on Computer Vision, pages 360–377. Springer, 2024.
- [61] Rui-Qi Wu, Zheng-Peng Duan, Chun-Le Guo, Zhi Chai, and Chongyi Li. RIDCP: Revitalizing real image dehazing via high-quality codebook priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22282–22291, 2023.
- [62] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. MANIQA: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022.
- [63] Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Yi, Hui Zhang, Dan Zhao, Bingzheng Wei, and Yan Xu. All-in-one medical image restoration via task-adaptive routing. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 67–77. Springer, 2024.
- [64] Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, and Chao Dong. Depicting beyond scores: Advancing image quality assessment through multi-modal language models. In European Conference on Computer Vision, pages 259–276. Springer, 2024.
- [65] Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 14483–14494, 2025.
- [66] Ke Yu, Chao Dong, Liang Lin, and Chen Change Loy. Crafting a toolchain for image restoration by deep reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2443–2452, 2018.
- [67] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In European Conference on Computer Vision, pages 492–511. Springer, 2020.
- [68] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14821–14831, 2021.
- [69] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5728–5739, 2022.
- [70] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [71] Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, and Yun Fu. Residual non-local attention networks for image restoration. arXiv preprint arXiv:1903.10082, 2019.
- [72] Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. QAgent: Quality-driven chain-of-thought image restoration agent through robust multimodal large language model. arXiv preprint arXiv:2504.07148, 2025.
- [73] Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, and Chao Dong. An intelligent agentic system for complex image restoration problems. In The Thirteenth International Conference on Learning Representations, 2025.
- [74] Karel Zuiderveld. Contrast limited adaptive histogram equalization, pages 474–485. Academic Press Professional, Inc., USA, 1994.
Supplementary material (SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback)
- 6.1. Data: "In this section, we show how to synthesize degraded images following existing work [73]. For dark images, the V channel value of the images in the HSV color space will be randomly decreased by one of the following strategies: linear mapping, Gamma correction, and subtracting a constant. For defocus blur, the images will ..."
- Restoration tools (fragment): "... (quality factor 5), FBCNN [14] (blind to quality factor). Denoising: SwinIR [26] (noise level 15), SwinIR (noise level 50), MAXIM [49], MPRNet [68], Restormer [69], X-Restormer [6]. Deraining: MAXIM [49], MPRNet [68], Restormer [69], X-Restormer [6]. Motion deblurring: MAXIM [49], MPRNet [68], Restormer [69], X-Restormer [6]. Dehazing: MAXIM [49], X-Restormer [6]; RIDCP [61], DehazeFormer [47]."
- 6.3. Evaluation Metrics: "We assess model performance using thre..."
- 7.1. Supervised Extension: "Table 5 reports the results of our method when extended to the label-available setting. In this configuration, we use the clean reference images as supervision and define the re..." Table 4 (degradation data construction, fragment): "Settings, # of Degradations, Case Number, Combinations; I, 2, Case 1: dark+noise, Case 2: defocus blur+JPEG..."
- 7.3. Quantitative Comparison (fragment): "... under rain+haze and rain+dark+noise degradation cases. The results further demonstrate that our method effectively removes multiple co-occurring corruptions from degraded images and produces visual quality that is comparable to, or even exceeds, these supervised baselines. Tables 6, 7, 8 present the performance comparison b..."