pith. machine review for the scientific record.

arxiv: 2604.02742 · v1 · submitted 2026-04-03 · 📡 eess.IV · cs.CV


Task-Guided Prompting for Unified Remote Sensing Image Restoration


Pith reviewed 2026-05-13 18:29 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords remote sensing image restoration · unified multi-task framework · task-guided prompting · image restoration · cloud removal · SAR despeckling · multi-modal restoration

The pith

A single network with task-specific prompts restores remote sensing images across five degradation types using one set of shared weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that learnable task embeddings can generate degradation-aware cues to modulate a decoder hierarchically, allowing one architecture to manage denoising, cloud removal, shadow removal, deblurring, and SAR despeckling across RGB, multispectral, SAR, and thermal infrared data. A sympathetic reader would care because real-world remote sensing observations routinely mix these degradations and sensor types, yet prior methods required separate specialized models for each case. By building a unified benchmark and showing gains on both joint training scenarios and unseen composite degradations, the work demonstrates that task-guided modulation can replace the need for multiple independent networks.

Core claim

TGPNet unifies five restoration tasks inside one architecture by inserting learnable task-specific embeddings that produce degradation-aware cues; these cues then hierarchically modulate features throughout the decoder while all weights remain shared, enabling precise adaptation to each pattern without separate models or retraining.

What carries the argument

Task-Guided Prompting (TGP), which creates learnable task-specific embeddings that generate degradation-aware cues for hierarchical modulation of decoder features.
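The modulation idea can be sketched concretely. The following is a hypothetical illustration in the spirit of FiLM-style conditioning (reference [46]), not the authors' exact TGP implementation: all names, dimensions, and the single-stage linear projection are assumptions made for clarity.

```python
import numpy as np

# Hypothetical sketch of task-guided feature modulation (FiLM-style), not
# the paper's exact TGP block. One learnable embedding per task is projected
# to per-channel scale/shift cues that modulate decoder features; all
# convolutional weights would stay shared across tasks.

rng = np.random.default_rng(0)
TASKS = ["denoise", "decloud", "deshadow", "deblur", "despeckle"]
EMB_DIM, CHANNELS = 32, 64

# One learnable embedding per task (random stand-ins here).
task_embeddings = {t: rng.standard_normal(EMB_DIM) for t in TASKS}
# Projection from task embedding to (gamma, beta) cues for one decoder stage.
W = rng.standard_normal((2 * CHANNELS, EMB_DIM)) * 0.01

def modulate(features: np.ndarray, task: str) -> np.ndarray:
    """Scale-and-shift decoder features with the task's degradation-aware cue."""
    cue = W @ task_embeddings[task]              # shape (2C,)
    gamma, beta = cue[:CHANNELS], cue[CHANNELS:]
    # Broadcast the per-channel cue over the spatial dimensions.
    return (1.0 + gamma)[:, None, None] * features + beta[:, None, None]

decoder_features = rng.standard_normal((CHANNELS, 16, 16))
out = modulate(decoder_features, "decloud")
print(out.shape)  # (64, 16, 16)
```

In the paper's hierarchical variant, a projection of this kind would be applied at every decoder stage, so each task embedding steers all resolution levels while the backbone weights remain shared.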

If this is right

  • The same weights handle unseen composite degradations without additional training.
  • Performance exceeds that of dedicated single-task models on individual problems such as cloud removal.
  • One architecture covers restoration needs for RGB, multispectral, SAR, and thermal infrared modalities.
  • Operational pipelines can replace multiple specialized models with a single adaptive system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The prompting approach may transfer to multi-task restoration problems outside remote sensing, such as medical or astronomical imaging.
  • Memory and deployment costs drop when a single model replaces an ensemble of task-specific networks.
  • Extending the benchmark to include additional sensor types or degradations would test whether the hierarchical modulation scales further.

Load-bearing premise

Task-specific embeddings can precisely tailor feature modulation in a shared-weight network for distinct degradations across modalities without causing interference or accuracy loss.

What would settle it

Jointly training the unified model on all five tasks and measuring whether its cloud removal performance falls below that of a model trained only on cloud removal.
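The decisive comparison reduces to scoring both models on the same cloud-removal test split with the same metric. A minimal PSNR scorer is sketched below; the "restored" arrays are random stand-ins, not outputs of either model, and the variable names are assumptions.

```python
import numpy as np

# Minimal PSNR harness for the decisive test: does the jointly trained
# unified model match a cloud-removal specialist on one shared test split?
# Images are assumed scaled to [0, 1]; the restorations below are stand-ins.

def psnr(reference: np.ndarray, restored: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference - restored) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
clean = rng.random((3, 64, 64))
unified_out = np.clip(clean + 0.01 * rng.standard_normal(clean.shape), 0, 1)
specialist_out = np.clip(clean + 0.02 * rng.standard_normal(clean.shape), 0, 1)

print(round(psnr(clean, unified_out), 1), round(psnr(clean, specialist_out), 1))
```

Averaging this score over the full cloud-removal test set for both models, under identical training protocols, is what the premise needs to survive.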

Figures

Figures reproduced from arXiv: 2604.02742 by Jinjun Wang, Wenli Huang, Xiaomeng Xin, Yang Wu, Ye Deng, Zhihong Liu.

Figure 1: Conceptual comparison of different RSIR paradigms.
Figure 2: Architecture of the proposed Task-Guided Prompting Network (TGPNet) for unified remote sensing image restoration.
Figure 3: Visual comparison of restored images for four degradation types on our URSIR benchmark. Key local details (in green boxes) …
Figure 4: Visual comparison of restored images for multispectral declouding on SEN12MS-CR and thermal deblurring on HIT-UAV …
Figure 5: Visual evaluation of TGPNet on unseen real-world imagery from the WHU-Shadow dataset [57], demonstrating …
Figure 6: Visual comparison of restoration results for out-of-distribution composite degradations: direct vs. sequential processing.
Figure 7: Visual comparison of restored images on composite degradation tasks under Gaussian noise (…)
Figure 8: Visual comparison of restored images for single-degradation declouding on RICE2. Key local details (in green boxes) …
Figure 9: Visualization of ablation study results comparing the …
Figure 11: t-SNE visualization of decoder stage 2 features before …
Original abstract

Remote sensing image restoration (RSIR) is essential for recovering high-fidelity imagery from degraded observations, enabling accurate downstream analysis. However, most existing methods focus on single degradation types within homogeneous data, restricting their practicality in real-world scenarios where multiple degradations often co-occur across diverse spectral bands or sensor modalities, creating a significant operational bottleneck. To address this fundamental gap, we propose TGPNet, a unified framework capable of handling denoising, cloud removal, shadow removal, deblurring, and SAR despeckling within a single, unified architecture. The core of our framework is a novel Task-Guided Prompting (TGP) strategy. TGP leverages learnable, task-specific embeddings to generate degradation-aware cues, which then hierarchically modulate features throughout the decoder. This task-adaptive mechanism allows the network to precisely tailor its restoration process for distinct degradation patterns while maintaining a single set of shared weights. To validate our framework, we construct a unified RSIR benchmark covering RGB, multispectral, SAR, and thermal infrared modalities for the five aforementioned restoration tasks. Experimental results demonstrate that TGPNet achieves state-of-the-art performance on both unified multi-task scenarios and unseen composite degradations, surpassing even specialized models in individual domains such as cloud removal. By successfully unifying heterogeneous degradation removal within a single adaptive framework, this work presents a significant advancement for multi-task RSIR, offering a practical and scalable solution for operational pipelines. The code and benchmark will be released at https://github.com/huangwenwenlili/TGPNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes TGPNet, a unified framework for remote sensing image restoration (RSIR) that handles five tasks—denoising, cloud removal, shadow removal, deblurring, and SAR despeckling—across RGB, multispectral, SAR, and thermal infrared modalities using a single shared-weight architecture. The core contribution is Task-Guided Prompting (TGP), which employs learnable task-specific embeddings to produce degradation-aware cues that hierarchically modulate features in the decoder. The authors introduce a new multi-modal RSIR benchmark and claim that TGPNet achieves state-of-the-art results on both unified multi-task settings and unseen composite degradations, outperforming even specialized single-task models in domains such as cloud removal.

Significance. If the superiority claims are substantiated with proper controls, the work would advance multi-task RSIR by demonstrating that a single adaptive network can address heterogeneous degradations and sensor modalities without task-specific retraining, offering a scalable alternative to maintaining separate models. The construction and planned release of the unified benchmark is a concrete contribution that would facilitate future research on composite degradations. The hierarchical prompting mechanism provides a reusable design pattern for task-conditioned feature modulation.

major comments (2)
  1. [Experiments] Experiments section: The headline claims that TGPNet surpasses specialized single-task models (e.g., on cloud removal) rest on comparisons whose validity depends on whether those baselines were retrained from scratch on the exact data splits, schedule, and optimization protocol of the new unified benchmark. The manuscript does not report this information, leaving open the possibility that observed gains arise from training-regime differences or implicit capacity expansion via the task embeddings rather than from the TGP modulation itself.
  2. [Method] Method section (TGP description): The central assumption that a single shared backbone modulated by task-specific embeddings can precisely tailor restoration for distinct degradation patterns across modalities without interference or negative transfer is load-bearing for the unified-framework claim, yet the paper provides no ablation or capacity-matched comparison isolating the effect of the hierarchical modulation from the benefits of joint training.
minor comments (2)
  1. [Abstract] Abstract: The statement of SOTA performance would be strengthened by naming the primary quantitative metrics (PSNR/SSIM) and briefly indicating the magnitude of improvement over the strongest baseline.
  2. [Results] Figure captions and tables: Ensure all reported results include standard deviations or confidence intervals when multiple runs are performed, and clearly label whether results are on the unified multi-task test set or on composite-degradation hold-out sets.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to incorporate the requested clarifications and additional analyses.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: The headline claims that TGPNet surpasses specialized single-task models (e.g., on cloud removal) rest on comparisons whose validity depends on whether those baselines were retrained from scratch on the exact data splits, schedule, and optimization protocol of the new unified benchmark. The manuscript does not report this information, leaving open the possibility that observed gains arise from training-regime differences or implicit capacity expansion via the task embeddings rather than from the TGP modulation itself.

    Authors: We agree that the manuscript should explicitly document the baseline training details to substantiate the comparisons. All single-task baselines were retrained from scratch on the identical data splits, using the same optimization schedule, learning rate policy, and batch size as TGPNet. We will revise the Experiments section to include a dedicated subsection detailing these protocols for every compared method, along with confirmation that no additional capacity or task-specific architectural changes were introduced beyond the original baseline designs. This will demonstrate that the reported gains arise from the TGP mechanism rather than training differences. revision: yes

  2. Referee: [Method] Method section (TGP description): The central assumption that a single shared backbone modulated by task-specific embeddings can precisely tailor restoration for distinct degradation patterns across modalities without interference or negative transfer is load-bearing for the unified-framework claim, yet the paper provides no ablation or capacity-matched comparison isolating the effect of the hierarchical modulation from the benefits of joint training.

    Authors: We acknowledge that an explicit isolation of the hierarchical modulation's contribution would strengthen the unified-framework claim. We will add two new experiments in the revised manuscript: (1) a capacity-matched ablation in which a single-task baseline is augmented with an equivalent number of parameters to the task embeddings and trained jointly, and (2) a component ablation that disables the hierarchical prompting while retaining joint training. These results will quantify the specific benefit of the TGP modulation versus joint-training effects and confirm the absence of negative transfer across modalities. revision: yes
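The rebuttal's experiment (1) hinges on careful parameter bookkeeping: the augmented baseline must add exactly as many parameters as the prompting path introduces. A sketch of that accounting, with illustrative (not the paper's) dimensions:

```python
# Sketch of the capacity-matching bookkeeping behind ablation (1): the
# augmented baseline should add exactly the parameters the prompting path
# contributes. Counts below are illustrative, not taken from the paper.

def prompt_param_count(n_tasks: int, emb_dim: int, stage_channels: list[int]) -> int:
    """Parameters added by task embeddings plus per-stage (gamma, beta) projections."""
    embeddings = n_tasks * emb_dim
    # One linear map per decoder stage: emb_dim -> 2 * channels (scale and shift).
    projections = sum(2 * c * emb_dim for c in stage_channels)
    return embeddings + projections

extra = prompt_param_count(n_tasks=5, emb_dim=32, stage_channels=[256, 128, 64])
print(extra)  # 5*32 + 2*32*(256+128+64) = 28832
```

Only if the baseline is widened by exactly this budget does a performance gap isolate the modulation mechanism rather than raw capacity.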

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper introduces TGPNet, a new neural architecture using learnable task-specific embeddings for hierarchical feature modulation in unified RSIR. All claims rest on experimental validation against a constructed multi-modal benchmark rather than any closed-form derivations, predictions, or self-referential definitions. No equations are presented that reduce performance metrics to fitted inputs by construction, and no load-bearing self-citations or uniqueness theorems are invoked. The framework is self-contained with independent empirical support.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach rests on standard assumptions of deep convolutional networks being adaptable via conditioning signals, plus the new invented prompting mechanism; no free parameters beyond the learnable embeddings are specified.

free parameters (1)
  • task-specific embeddings
    Learnable embeddings per task that are trained to produce degradation-aware cues modulating decoder features.
axioms (1)
  • domain assumption: Hierarchical feature modulation by task embeddings can adapt a shared network to multiple distinct degradation types without cross-task interference.
    Invoked in the design of the TGP strategy to enable unified processing.
invented entities (1)
  • Task-Guided Prompting (TGP) (no independent evidence)
    purpose: Generate degradation-aware cues from learnable task-specific embeddings to modulate features throughout the decoder.
    Core novel component introduced to unify the five restoration tasks.

pith-pipeline@v0.9.0 · 5587 in / 1332 out tokens · 31115 ms · 2026-05-13T18:29:55.649143+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 2 internal anchors

  1. [1]

    Landslide detection, monitoring and prediction with remote-sensing techniques,

    N. Casagli, E. Intrieri, V . Tofani, G. Gigli, and F. Raspini, “Landslide detection, monitoring and prediction with remote-sensing techniques,” Nature Reviews Earth & Environment, vol. 4, no. 1, pp. 51–64, 2023

  2. [2]

Satellite remote sensing for water resources management: Potential for supporting sustainable development in data-poor regions,

    J. Sheffield, E. F. Wood, M. Pan, H. Beck, G. Coccia, A. Serrat-Capdevila, and K. Verbist, “Satellite remote sensing for water resources management: Potential for supporting sustainable development in data-poor regions,” Water Resources Research, vol. 54, no. 12, pp. 9724–9758, 2018

  3. [3]

    Statistical machine learning methods and remote sensing for sustainable development goals: A review,

    J. Holloway and K. Mengersen, “Statistical machine learning methods and remote sensing for sustainable development goals: A review,” Remote Sensing, vol. 10, no. 9, p. 1365, 2018

  4. [4]

    Multiscale and direction target detecting in remote sensing images via modified yolo-v4,

    Z. Zakria, J. Deng, R. Kumar, M. S. Khokhar, J. Cai, and J. Kumar, “Multiscale and direction target detecting in remote sensing images via modified yolo-v4,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 1039–1048, 2022

  5. [5]

    Remote sensing image segmentation advances: A meta-analysis,

    I. Kotaridis and M. Lazaridou, “Remote sensing image segmentation advances: A meta-analysis,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 173, pp. 309–322, 2021

  6. [6]

    Rsid-cr: Remote sensing image denoising based on contrastive learning,

    Z. Wang, X. He, B. Xiao, L. Chen, and X. Bi, “Rsid-cr: Remote sensing image denoising based on contrastive learning,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024

  7. [7]

Cr-former: Single image cloud removal with focused taylor attention,

    Y. Wu, Y. Deng, S. Zhou, Y. Liu, W. Huang, and J. Wang, “Cr-former: Single image cloud removal with focused taylor attention,” IEEE Transactions on Geoscience and Remote Sensing, 2024

  8. [8]

    Cascaded memory network for optical remote sensing imagery cloud removal,

    J. Liu, B. Pan, and Z. Shi, “Cascaded memory network for optical remote sensing imagery cloud removal,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–11, 2024

  9. [9]

    Shadowformer: Global context helps shadow removal,

    L. Guo, S. Huang, D. Liu, H. Cheng, and B. Wen, “Shadowformer: Global context helps shadow removal,” inProceedings of the AAAI conference on artificial intelligence, vol. 37, no. 1, 2023, pp. 710–718

  10. [10]

    Homoformer: Homogenized transformer for image shadow removal,

J. Xiao, X. Fu, Y. Zhu, D. Li, J. Huang, K. Zhu, and Z.-J. Zha, “Homoformer: Homogenized transformer for image shadow removal,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 25617–25626

  11. [11]

    Sar image despeckling using continuous attention module,

    J. Ko and S. Lee, “Sar image despeckling using continuous attention module,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 3–19, 2021

  12. [12]

Contrastive learning for real sar image despeckling,

    Y. Fang, R. Liu, Y. Peng, J. Guan, D. Li, and X. Tian, “Contrastive learning for real sar image despeckling,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 218, pp. 376–391, 2024

  13. [13]

    All-in-one image restoration for unknown corruption,

B. Li, X. Liu, P. Hu, Z. Wu, J. Lv, and X. Peng, “All-in-one image restoration for unknown corruption,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 17452–17462

  14. [14]

    Promptir: Prompting for all-in-one blind image restoration,

V. Potlapalli, S. Zamir, S. Khan, and F. Khan, “Promptir: Prompting for all-in-one blind image restoration,” arXiv preprint arXiv:2306.13090

  15. [15]

    Adair: Adaptive all-in-one image restoration via frequency mining and modulation,

Y. Cui, S. W. Zamir, S. Khan, A. Knoll, M. Shah, and F. S. Khan, “Adair: Adaptive all-in-one image restoration via frequency mining and modulation,” in The Thirteenth International Conference on Learning Representations

  16. [16]

    Image restoration for remote sensing: Overview and toolbox,

    B. Rasti, Y . Chang, E. Dalsasso, L. Denis, and P. Ghamisi, “Image restoration for remote sensing: Overview and toolbox,”IEEE Geoscience and Remote Sensing Magazine, vol. 10, no. 2, pp. 201–230, 2021

  17. [17]

    Coupling model-and data-driven methods for remote sensing image restoration and fusion: Improving physical interpretability,

    H. Shen, M. Jiang, J. Li, C. Zhou, Q. Yuan, and L. Zhang, “Coupling model-and data-driven methods for remote sensing image restoration and fusion: Improving physical interpretability,”IEEE Geoscience and Remote Sensing Magazine, vol. 10, no. 2, pp. 231–249, 2022

  18. [18]

    Deep memory connected neural network for optical remote sensing image restoration,

    W. Xu, G. Xu, Y . Wang, X. Sun, D. Lin, and Y . Wu, “Deep memory connected neural network for optical remote sensing image restoration,” Remote Sensing, vol. 10, no. 12, p. 1893, 2018

  19. [19]

    Hybrid convolutional and attention network for hyperspectral image denoising,

    S. Hu, F. Gao, X. Zhou, J. Dong, and Q. Du, “Hybrid convolutional and attention network for hyperspectral image denoising,”IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1–5, 2024

  20. [20]

    Mb-taylorformer v2: improved multi-branch linear transformer expanded by taylor formula for image restoration,

    Z. Jin, Y . Qiu, K. Zhang, H. Li, and W. Luo, “Mb-taylorformer v2: improved multi-branch linear transformer expanded by taylor formula for image restoration,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  21. [21]

    Deep dense multi-scale network for snow removal using semantic and depth priors,

    K. Zhang, R. Li, Y . Yu, W. Luo, and C. Li, “Deep dense multi-scale network for snow removal using semantic and depth priors,”IEEE Transactions on Image Processing, vol. 30, pp. 7419–7431, 2021

  22. [22]

    Enhanced spatio- temporal interaction learning for video deraining: faster and better,

    K. Zhang, D. Li, W. Luo, W. Ren, and W. Liu, “Enhanced spatio- temporal interaction learning for video deraining: faster and better,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 1287–1293, 2022

  23. [23]

    Adversarial spatio-temporal learning for video deblurring,

    K. Zhang, W. Luo, Y . Zhong, L. Ma, W. Liu, and H. Li, “Adversarial spatio-temporal learning for video deblurring,”IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 291–301, 2018

  24. [24]

    Lldiffusion: Learning degradation representations in diffusion models for low-light image enhancement,

    T. Wang, K. Zhang, Y . Zhang, W. Luo, B. Stenger, T. Lu, T.-K. Kim, and W. Liu, “Lldiffusion: Learning degradation representations in diffusion models for low-light image enhancement,”Pattern Recognition, vol. 166, p. 111628, 2025

  25. [25]

    Despecknet: Generalizing deep learning-based sar image despeckling,

    A. G. Mullissa, D. Marcos, D. Tuia, M. Herold, and J. Reiche, “Despecknet: Generalizing deep learning-based sar image despeckling,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1– 15, 2020

  26. [26]

    Hir-diff: Unsupervised hyperspectral image restoration via improved diffusion models,

    L. Pang, X. Rui, L. Cui, H. Wang, D. Meng, and X. Cao, “Hir-diff: Unsupervised hyperspectral image restoration via improved diffusion models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 3005–3014

  27. [27]

    A progressive image restoration network for high-order degradation imaging in remote sensing,

    Y . Feng, Y . Yang, X. Fan, Z. Zhang, L. Bu, and J. Zhang, “A progressive image restoration network for high-order degradation imaging in remote sensing,”arXiv preprint arXiv:2412.07195, 2024

  28. [28]

    Prompthsi: Universal hyperspectral image restoration framework for composite degradation,

C.-M. Lee, C.-H. Cheng, Y.-F. Lin, Y.-C. Cheng, W.-T. Liao, C.-C. Hsu, F.-E. Yang, and Y.-C. F. Wang, “Prompthsi: Universal hyperspectral image restoration framework for composite degradation,” arXiv e-prints, 2024

  29. [29]

    A survey on all-in- one image restoration: Taxonomy, evaluation and future trends,

    J. Jiang, Z. Zuo, G. Wu, K. Jiang, and X. Liu, “A survey on all-in- one image restoration: Taxonomy, evaluation and future trends,”arXiv preprint arXiv:2410.15067, 2024

  30. [30]

    Pre-trained image processing transformer,

H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, S. Ma, C. Xu, C. Xu, and W. Gao, “Pre-trained image processing transformer,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12299–12310

  31. [31]

    Lora-ir: taming low-rank experts for efficient all-in-one image restoration,

    Y . Ai, H. Huang, and R. He, “Lora-ir: taming low-rank experts for efficient all-in-one image restoration,”arXiv preprint arXiv:2410.15385, 2024

  32. [32]

    Complexity experts are task-discriminative learners for any image restoration,

    E. Zamfir, Z. Wu, N. Mehta, Y . Tan, D. P. Paudel, Y . Zhang, and R. Timofte, “Complexity experts are task-discriminative learners for any image restoration,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12 753–12 763

  33. [33]

    Onerestore: A universal restoration framework for composite degradation,

    Y . Guo, Y . Gao, Y . Lu, H. Zhu, R. W. Liu, and S. He, “Onerestore: A universal restoration framework for composite degradation,” inEuropean conference on computer vision. Springer, 2024, pp. 255–272

  34. [34]

    Allrestorer: All-in-one transformer for image restoration under composite degradations,

    J. Mao, Y . Yang, X. Yin, L. Shao, and H. Tang, “Allrestorer: All-in-one transformer for image restoration under composite degradations,”arXiv preprint arXiv:2411.10708, 2024

  35. [35]

    Restoring vision in adverse weather conditions with patch-based denoising diffusion models,

O. Ozdenizci and R. Legenstein, “Restoring vision in adverse weather conditions with patch-based denoising diffusion models,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 45, no. 08, pp. 10346–10357, 2023

  36. [36]

    Multimodal prompt perceiver: Empower adaptiveness generalizability and fidelity for all-in- one image restoration,

    Y . Ai, H. Huang, X. Zhou, J. Wang, and R. He, “Multimodal prompt perceiver: Empower adaptiveness generalizability and fidelity for all-in- one image restoration,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 25 432–25 444

  37. [37]

    Autodir: Automatic all-in-one image restoration with latent diffusion,

    Y . Jiang, Z. Zhang, T. Xue, and J. Gu, “Autodir: Automatic all-in-one image restoration with latent diffusion,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 340–359

  38. [38]

    Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior,

I. Chen, W.-T. Chen, Y.-W. Liu, Y.-C. Chiang, S.-Y. Kuo, M.-H. Yang et al., “Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 17969–17979

  39. [39]

    Unicorn: Latent diffusion-based unified controllable image restoration network across multiple degradations,

    D. Mandal, S. Chattopadhyay, G. Tong, and P. Chakravarthula, “Unicorn: Latent diffusion-based unified controllable image restoration network across multiple degradations,”arXiv preprint arXiv:2503.15868, 2025

  40. [40]

    Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model

    Y . Zhou, J. Cao, Z. Zhang, F. Wen, Y . Jiang, J. Jia, X. Liu, X. Min, and G. Zhai, “Q-agent: Quality-driven chain-of-thought image restoration agent through robust multimodal large language model,”arXiv preprint arXiv:2504.07148, 2025

  41. [41]

    Vision-language gradient descent-driven all-in-one deep unfolding networks,

    H. Zeng, X. Wang, Y . Chen, J. Su, and J. Liu, “Vision-language gradient descent-driven all-in-one deep unfolding networks,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 7524–7533

  42. [42]

    Instructir: High-quality image restoration following human instructions,

    M. V . Conde, G. Geigle, and R. Timofte, “Instructir: High-quality image restoration following human instructions,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 1–21

  43. [43]

    Spire: Semantic prompt-driven image restoration,

C. Qi, Z. Tu, K. Ye, M. Delbracio, P. Milanfar, Q. Chen, and H. Talebi, “Spire: Semantic prompt-driven image restoration,” in European Conference on Computer Vision. Springer, 2024, pp. 446–464

  44. [44]

    Multi-axis prompt and multi-dimension fusion network for all-in-one weather-degraded image restoration,

    Y . Wen, T. Gao, J. Zhang, Z. Li, and T. Chen, “Multi-axis prompt and multi-dimension fusion network for all-in-one weather-degraded image restoration,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 8323–8331

  45. [45]

    Restormer: Efficient transformer for high-resolution image restoration,

    S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, and M.-H. Yang, “Restormer: Efficient transformer for high-resolution image restoration,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 5728–5739

  46. [46]

    Film: Visual reasoning with a general conditioning layer,

    E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

  47. [47]

    Loss functions for image restoration with neural networks,

    H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,”IEEE Transactions on computational imaging, vol. 3, no. 1, pp. 47–57, 2016

  48. [48]

    Bag-of-visual-words and spatial extensions for land-use classification,

Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, 2010, pp. 270–279

  49. [49]

    A remote sensing image dataset for cloud removal,

    D. Lin, G. Xu, X. Wang, Y . Wang, X. Sun, and K. Fu, “A remote sensing image dataset for cloud removal,”arXiv preprint arXiv:1901.00600, 2019

  50. [50]

    Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,

    A. Meraner, P. Ebel, X. X. Zhu, and M. Schmitt, “Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 333–346, 2020

  51. [51]

Deshadownet: A multi-context embedding deep network for shadow removal,

    L. Qu, J. Tian, S. He, Y. Tang, and R. W. Lau, “Deshadownet: A multi-context embedding deep network for shadow removal,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4067–4075

  52. [52]

    Robust sar image despeckling by deep learning from near-real datasets,

    J. Guan, R. Liu, X. Tian, X. Tang, and S. Li, “Robust sar image despeckling by deep learning from near-real datasets,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 2963–2979, 2023

  53. [53]

    Hit-uav: A high-altitude infrared thermal dataset for unmanned aerial vehicle-based object detection,

    J. Suo, T. Wang, X. Zhang, H. Chen, W. Zhou, and W. Shi, “Hit-uav: A high-altitude infrared thermal dataset for unmanned aerial vehicle-based object detection,”Scientific Data, vol. 10, no. 1, p. 227, 2023

  54. [54]

    Decoupled Weight Decay Regularization

    I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

  55. [55]

    Attentive contextual attention for cloud removal,

    W. Huang, Y . Deng, Y . Wu, and J. Wang, “Attentive contextual attention for cloud removal,”IEEE Transactions on Geoscience and Remote Sensing, 2024

  56. [56]

    Harmony in diversity: Improving all-in-one image restoration via multi-task collaboration,

    G. Wu, J. Jiang, K. Jiang, and X. Liu, “Harmony in diversity: Improving all-in-one image restoration via multi-task collaboration,” inProceedings of the 32nd ACM international conference on multimedia, 2024, pp. 6015–6023

  57. [57]

    Deeply supervised convolutional neural network for shadow detection based on a novel aerial shadow imagery dataset,

    S. Luo, H. Li, and H. Shen, “Deeply supervised convolutional neural network for shadow detection based on a novel aerial shadow imagery dataset,”ISPRS Journal of Photogrammetry and remote sensing, vol. 167, pp. 443–457, 2020

  58. [58]

    Cloud removal for remote sensing imagery via spatial attention generative adversarial network,

    H. Pan, “Cloud removal for remote sensing imagery via spatial attention generative adversarial network,”arXiv preprint arXiv:2009.13015, 2020

  59. [59]

    Uncertainty-based thin cloud removal network via conditional variational autoencoders,

    H. Ding, Y . Zi, and F. Xie, “Uncertainty-based thin cloud removal network via conditional variational autoencoders,” inProceedings of the Asian Conference on Computer Vision, 2022, pp. 469–485

  60. [60]

    Recovering realistic texture in image super-resolution by deep spatial feature transform,

X. Wang, K. Yu, C. Dong, and C. C. Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 606–615
