DiTTo: Scalable Order-aware All-in-One Image Restoration Agent

Jihyong Oh; Seungho Choi

arxiv: 2605.30915 · v2 · pith:5GO43TOBnew · submitted 2026-05-29 · 💻 cs.CV

Pith reviewed 2026-06-28 23:14 UTC · model grok-4.3

keywords: image restoration, multi-degradation restoration, agent-based image restoration, order-aware restoration, restoration simulator, all-in-one image restoration, plug-and-play extensibility

The pith

DiTTo trains an order-aware image restoration agent with linear-cost simulator data and plug-and-play expert addition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real-world images often carry several degradations, and the order in which they are removed affects final quality. Building training data for agents that choose the right sequence has so far required a number of restoration-expert calls per image that grows quadratically with the number of degradation types. The paper proposes a simulator that combines single-step restoration simulation with per-action quality prediction and generates the needed Optimal Restoration-action Trajectory Dataset (ORTD) with a number of simulator calls that grows only linearly. An agent is then trained on this data through supervised fine-tuning (SFT) and a separate Order-aware Restoration Alignment (ORA) step that handles degradation identification, ordering, and output format on independent axes. The authors report state-of-the-art quality among prior agent-based methods on the MiO-100 evaluation set, which contains images with up to five concurrent degradations. A reader would care because the linear scaling and modular alignment remove the main obstacles to handling larger degradation sets and evolving pools of restoration models.

Core claim

DiTTo targets the efficiency and extensibility bottlenecks in agent-based image restoration with two components. The DiTTo Simulator reduces ORTD construction from O((N^D)^2) restoration-expert calls to O(N^D) simulator calls per image, where N^D denotes the number of degradation types in the universe D (the superscript is a label, not an exponent). It does so by combining ∪S-IR single-step restoration simulation with AiO-IQA per-action quality prediction. The DiTTo Agent is trained by SFT on the generated trajectories, followed by Order-aware Restoration Alignment (ORA), which aligns degradation identification, restoration-action ordering, and output format along independent axes. Because of this separation, adding a new restoration-expert requires updating only the lightweight ORA stage, which the paper calls plug-and-play scalable extensibility.
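The paper's appendix (Section F.4, as excerpted by the extraction pipeline) describes ORA as a DPO-style objective applied to the decomposed DP, OR, and Tool axes, with a policy model πθ, a reference model πref, and chosen/rejected response pairs. The exact loss is not reproduced on this page, so the following is only a minimal sketch of what a per-axis DPO-style loss could look like. It uses the standard Direct Preference Optimization form; the equal-weight sum over axes, the `beta` value, and all function and variable names are assumptions made for illustration, not the authors' formulation.

```python
import math
from typing import Dict, Tuple

# Axis names taken from the Figure 6 caption; everything else here is hypothetical.
AXES = ("DP", "OR", "Tool")

# (policy logp chosen, policy logp rejected, reference logp chosen, reference logp rejected)
LogProbs = Tuple[float, float, float, float]


def dpo_loss(lp_chosen: float, lp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one chosen/rejected pair, given sequence log-probabilities."""
    margin = beta * ((lp_chosen - ref_chosen) - (lp_rejected - ref_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def ora_style_loss(axis_logps: Dict[str, LogProbs], beta: float = 0.1) -> float:
    """Assumed aggregation: sum one DPO term per axis, each with its own preference pair."""
    return sum(dpo_loss(*axis_logps[axis], beta=beta) for axis in AXES)


if __name__ == "__main__":
    example = {
        "DP": (-0.5, -2.0, -1.0, -1.0),    # policy already prefers the chosen response
        "OR": (-1.0, -1.0, -1.0, -1.0),    # no preference yet
        "Tool": (-2.0, -0.5, -1.0, -1.0),  # policy prefers the rejected response
    }
    print(ora_style_loss(example))
```

Keeping one preference pair per axis is what would let a single axis be updated without touching the others, which is the property the extensibility claim relies on.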

What carries the argument

The DiTTo Simulator, which combines single-step restoration-action simulation (∪S-IR) and per-action quality prediction (AiO-IQA) to produce order-aware training trajectories at linear cost.
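To make the cost argument concrete, the sketch below counts calls under two hypothetical construction schemes for an image with j degradations. It assumes that the quadratic baseline runs every remaining real expert at each step and keeps the best one, and that the simulator needs one quality-prediction call (scoring all candidate actions at once) plus one single-step simulation call per step. Both readings are assumptions about how the stated complexities arise; the toy scores, the set-based "image" state, and all names are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Toy stand-in for an image: the set of degradations still present.
State = FrozenSet[str]

# Hypothetical per-action scores; a real quality predictor would infer these from the image.
TOY_PRIORITY: Dict[str, float] = {
    "haze": 0.9, "rain": 0.8, "noise": 0.7, "blur": 0.4, "jpeg": 0.2,
}


def predict_scores(state: State) -> Dict[str, float]:
    """One call scores every candidate action (assumed AiO-IQA-like behaviour)."""
    return {action: TOY_PRIORITY[action] for action in state}


def simulate_step(state: State, action: str) -> State:
    """One call simulates removing a single degradation (assumed single-step simulator)."""
    return state - {action}


def build_trajectory(
    present: Iterable[str],
    score_fn: Callable[[State], Dict[str, float]],
    step_fn: Callable[[State, str], State],
) -> Tuple[List[str], int]:
    """Greedily pick the best-scored action at each step and count simulator calls."""
    state = frozenset(present)
    order: List[str] = []
    calls = 0
    while state:
        scores = score_fn(state)
        calls += 1
        # sorted() makes tie-breaking deterministic.
        action = max(sorted(state), key=lambda a: scores[a])
        state = step_fn(state, action)
        calls += 1
        order.append(action)
    return order, calls


def exhaustive_greedy_calls(j: int) -> int:
    """Real-expert calls if all remaining experts are run at every step: j + (j-1) + ... + 1."""
    return j * (j + 1) // 2


if __name__ == "__main__":
    order, calls = build_trajectory(TOY_PRIORITY, predict_scores, simulate_step)
    print(order, calls, exhaustive_greedy_calls(len(TOY_PRIORITY)))
```

Under these assumptions the simulator path costs 2j calls while the exhaustive path costs j(j+1)/2, so the gap is small at five degradations and widens as the degradation universe grows.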

If this is right

Training data construction for the agent scales linearly rather than quadratically with the number of degradation types.
A new restoration expert can be added by updating only the lightweight ORA stage without retraining the full agent.
The resulting agent reaches state-of-the-art multi-degradation restoration quality on MiO-100 among prior agent-based methods.
Order-aware scheduling improves final quality when degradations interact.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of concerns in ORA could let the agent adapt to changing expert pools over time without repeated full training.
Similar linear-cost simulation of action sequences might reduce data-generation expense in other vision tasks that involve ordered operations.
Direct measurement of how well simulator-predicted quality ranks match actual quality rankings on held-out real images would test the core generalization premise.
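The rank-agreement check in the last point can be sketched with a Spearman correlation between simulator-predicted scores and measured quality for the same candidate orderings. The implementation below is a plain-Python version written for illustration; the example numbers are made up and do not come from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values for four candidate orderings of one image.
    simulator_scores = [0.81, 0.64, 0.72, 0.40]
    measured_psnr = [27.9, 25.1, 26.4, 24.9]
    print(spearman(simulator_scores, measured_psnr))
```

A value near 1 on held-out real images would support the premise that simulator rankings track real rankings; values near 0 would indicate that the agent is trained on misordered trajectories.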

Load-bearing premise

The simulator's single-step simulations and quality predictions generate trajectories accurate enough that an agent trained on them generalizes to real multi-degraded images.

What would settle it

If agents trained solely on simulator-generated trajectories produce lower restoration quality than agents trained on fully enumerated real trajectories when both are tested on the same set of real multi-degraded images, the claim that the reduced-cost data suffices would be refuted.

Figures

Figures reproduced from arXiv: 2605.30915 by Jihyong Oh, Seungho Choi.

**Figure 2.** [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Overview of the DiTTo Simulator, which constructs $\mathcal{D}^{\text{DiTTo}}_{\text{ORTD}}$ without exhaustive real restoration-expert calls. The simulator consists of two modules: ∪S-IR, instantiated as the single-degradation restoration simulator $S_\theta$, and AiO-IQA, instantiated as the IQA-based scoring model $f_\psi$. ∪S-IR is first trained to approximate the next restored image state induced by a candidate restoration-action identifier ρ ∈ …

**Figure 4.** Qualitative comparison on multi-degraded images shows that DiTTo Agent more effectively removes mixed degradations while preserving natural textures and semantic details. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]

**Figure 5.** Degradation process (left) and restoration process (right) for two instances sharing the type set $\mathbf{D}_\delta = \{D_1, D_2, D_{N^{\mathbf{D}}}\}$ but using different degradation-orderings δ, so they are distinct instances. The degradation side is indexed by $i^D \in \{0, \dots, j\}$ (counts degradation-actions applied); the restoration side is indexed by $i^R \in \{0, \dots, j\}$ (counts degradations still present), so the restoration-action-t…

**Figure 6.** An ORTD example with j=3. (a) The simulator-generated optimal restoration-action-trajectory $(\tilde{I}^{\delta,*}_{i^R})_{i^R=3}^{0}$ in $\mathcal{D}^{\text{DiTTo}}_{\text{ORTD}}$. (b) The corresponding agent response, decomposed into DP (Degradation Perception-Reasoning), OR (Order-aware Restoration), and Tool (JSON-based tool call) axes used in ORA. [PITH_FULL_IMAGE:figures/full_fig_p029_6.png]

**Figure 7.** Additional qualitative comparisons on multi-degraded inputs with j ∈ {2, 3} concurrent degradations. Columns: Input, 4KAgent, JarvisIR, DiTTo Agent, ⋆DiTTo Agent. [PITH_FULL_IMAGE:figures/full_fig_p030_7.png]

**Figure 8.** Additional qualitative comparisons on multi-degraded inputs with j ∈ {3, 4, 5} concurrent degradations. [PITH_FULL_IMAGE:figures/full_fig_p030_8.png]

Original abstract

Real-world images rarely suffer from a single degradation, and the order in which degradations are removed substantially affects the final restoration quality, motivating agent-based image restoration (IR), where a vision-language model schedules a pool of pre-built restoration-experts. However, existing training-based agents require $\mathcal{O}((N^{\mathbf{D}})^{2})$ restoration-expert calls per image to construct the Optimal Restoration-action Trajectory Dataset (ORTD), where $N^{\mathbf{D}}$ denotes the number of degradation types in the universe $\mathbf{D}$, and couple agent training to a fixed restoration-expert pool, preventing extension to newly introduced restoration-experts without full retraining. To overcome these efficiency and extensibility bottlenecks, we propose \textbf{DiTTo}, a novel order-aware image restoration agent framework consisting of the DiTTo Simulator and the DiTTo Agent. The DiTTo Simulator combines $\cup$S-IR for single-step restoration-action simulation and AiO-IQA for per-action quality prediction, reducing ORTD construction to $\mathcal{O}(N^{\mathbf{D}})$ simulator calls per image; the DiTTo Agent is trained by SFT on the simulator-generated ORTD, followed by \textbf{Order-aware Restoration Alignment (ORA)} that aligns degradation identification, restoration-action-ordering, and output format along independent axes. This enables \textbf{plug-and-play scalable extensibility}: adding a new restoration-expert requires updating only the lightweight ORA stage. On the MiO-100 evaluation set with up to five concurrent degradations, our DiTTo Agent achieves state-of-the-art multi-degradation restoration quality among previous agent-based IR methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DiTTo reduces ORTD construction to linear cost via a simulator and adds ORA so that new experts can be added cheaply, but the simulator's accuracy against real trajectories is unverified.

The main things to know are that the simulator drops the cost of making the optimal trajectory dataset from quadratic in the number of degradations to linear, and the ORA step lets new restoration experts be added by updating only a lightweight alignment module instead of retraining the agent.

The simulator combines single-step restoration simulation with a per-action quality predictor to generate the training data. The agent is then trained with supervised fine-tuning on that data followed by separate alignment on degradation identification, ordering, and output format. This setup directly targets the quadratic cost and fixed-pool problems in earlier agent-based restoration work.

It does handle a concrete issue: order matters when multiple degradations are present, and the method aims at practical scaling on sets like MiO-100 with up to five degradations. The plug-and-play extensibility is a clear engineering win if it holds.

The soft spot is the simulator. The abstract gives no numbers on how well its predicted rankings or quality scores match the results of actually running full ordered sequences on real images. If single-step approximations miss cumulative interactions between degradations, the agent trains on misaligned data. The concern is sharpest in the MiO-100 regime of up to five concurrent degradations, where simulated and real trajectories have the most steps over which to diverge, and the abstract mentions no correlation study or error analysis.

This is for researchers building agents or all-in-one restoration pipelines who care about training efficiency and extensibility. A reader focused on deployment would find the cost reduction and alignment approach useful to examine, provided the full paper supplies the missing validation.

It deserves peer review so the simulator accuracy and experimental comparisons can be checked in detail.

Referee Report

3 major / 0 minor

Summary. The paper introduces DiTTo, an agent-based framework for order-aware all-in-one image restoration. It consists of the DiTTo Simulator, which uses ∪S-IR single-step simulation combined with AiO-IQA per-action quality prediction to construct the Optimal Restoration-action Trajectory Dataset (ORTD) in O(N^D) calls per image (down from O((N^D)^2)), and the DiTTo Agent, trained via supervised fine-tuning on simulator-generated ORTD followed by Order-aware Restoration Alignment (ORA) for degradation identification, ordering, and output format. The framework claims plug-and-play extensibility to new restoration experts and state-of-the-art multi-degradation restoration quality on the MiO-100 set (up to five concurrent degradations) among prior agent-based IR methods.

Significance. If the simulator's trajectories prove representative, the O(N^D) reduction and ORA-based extensibility would meaningfully lower the barrier to training scalable agents for real-world multi-degradation restoration without retraining on every new expert; the explicit separation of simulation from agent training is a clear engineering contribution.

major comments (3)

[Abstract] The central SOTA claim on MiO-100 among agent-based methods rests on the DiTTo Simulator generating ORTD trajectories whose rankings align with real full-sequence restoration quality, yet no quantitative validation, error analysis, or correlation between simulator-predicted order rankings and ground-truth PSNR/SSIM after executing the full ordered chains is reported.

[Abstract / Method description] The reduction of ORTD construction to O(N^D) via single-step ∪S-IR simulation plus AiO-IQA implicitly assumes that (a) single-step restorations compose sufficiently linearly to rank multi-degradation orders and (b) per-action AiO-IQA scores predict final restored-image metrics after the complete sequence; no empirical test of these assumptions on held-out images with ≥3 concurrent degradations is described.

[Abstract] The claim of 'plug-and-play scalable extensibility' via lightweight ORA updates is presented without any ablation showing that adding a new restoration-expert actually preserves or improves performance on MiO-100 without full retraining of the SFT stage.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical support of the simulator assumptions and extensibility claims. We address each major comment below and will incorporate the requested validations and ablations in the revised manuscript.

Referee: [Abstract] The central SOTA claim on MiO-100 among agent-based methods rests on the DiTTo Simulator generating ORTD trajectories whose rankings align with real full-sequence restoration quality, yet no quantitative validation, error analysis, or correlation between simulator-predicted order rankings and ground-truth PSNR/SSIM after executing the full ordered chains is reported.

Authors: We agree that explicit quantitative validation of the alignment between simulator rankings and real full-sequence metrics would strengthen the SOTA claim. In the revised manuscript we will add a dedicated validation subsection reporting error analysis together with correlation coefficients (Pearson and Spearman) between simulator-predicted order rankings and ground-truth PSNR/SSIM obtained by executing the complete ordered trajectories on held-out images.

Revision: yes

Referee: [Abstract / Method description] The reduction of ORTD construction to O(N^D) via single-step ∪S-IR simulation plus AiO-IQA implicitly assumes that (a) single-step restorations compose sufficiently linearly to rank multi-degradation orders and (b) per-action AiO-IQA scores predict final restored-image metrics after the complete sequence; no empirical test of these assumptions on held-out images with ≥3 concurrent degradations is described.

Authors: We acknowledge that the composition assumptions require direct empirical testing, especially for images with three or more degradations. The revision will include new experiments on held-out images with ≥3 concurrent degradations that quantify ranking accuracy under the linearity assumption and the predictive correlation of per-action AiO-IQA scores with final full-sequence metrics.

Revision: yes

Referee: [Abstract] The claim of 'plug-and-play scalable extensibility' via lightweight ORA updates is presented without any ablation showing that adding a new restoration-expert actually preserves or improves performance on MiO-100 without full retraining of the SFT stage.

Authors: We agree that an ablation study is necessary to substantiate the plug-and-play claim. The revised manuscript will add an ablation that measures MiO-100 performance when a new restoration expert is introduced using only the lightweight ORA stage versus full SFT retraining, demonstrating that performance is preserved or improved without retraining the SFT component.

Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical method with independent evaluation

The paper proposes DiTTo Simulator (∪S-IR + AiO-IQA) to generate ORTD at reduced cost, then trains the DiTTo Agent via SFT + ORA and reports empirical SOTA on MiO-100. No equations, derivations, or self-citations reduce the central performance claim to a quantity defined by the method itself; the result is obtained by running the trained agent on held-out images rather than by construction from fitted inputs or prior self-work. The simulator approximation is an engineering choice whose validity is externally testable via correlation with real PSNR/SSIM, not a self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no concrete free parameters, axioms, or invented entities; ledger left empty pending full text.
