pith. machine review for the scientific record.

arxiv: 2601.02018 · v2 · submitted 2026-01-05 · 💻 cs.CV

Recognition: no theorem link

Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 18:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords image segmentation · Segment Anything Model · low-quality images · generative enhancement · degradation awareness · latent space · robustness · SAM2

The pith

GleSAM++ adapts SAM for any image quality by generating enhanced latent features and predicting degradation levels before reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GleSAM++ as a way to make Segment Anything Models robust to low-quality and degraded images while preserving their zero-shot performance on clean ones. It starts with generative enhancement of latent features from a diffusion model, adds alignment steps for compatibility, and introduces a two-stage Degradation-aware Adaptive Enhancement that first estimates how degraded an image is and then applies targeted reconstruction. This setup uses only minimal new parameters on top of pre-trained SAM or SAM2. A reader would care because real images in applications like surveillance or medical analysis are frequently imperfect, so reliable segmentation across quality levels would broaden practical use without retraining from scratch.

Core claim

GleSAM++ improves segmentation on degraded images by first aligning and expanding features from a pre-trained diffusion model via Feature Distribution Alignment and Channel Replication and Expansion, then applying Degradation-aware Adaptive Enhancement that decouples the process into explicit degradation-level prediction followed by degradation-aware reconstruction, enabling better handling of complex and unseen degradations.
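
Neither alignment step is spelled out in this summary, so the following is a minimal sketch of one plausible reading: FDA as per-feature-map moment matching between the diffusion latent and the frozen SAM encoder feature, and CRE as channel tiling up to the width the segmentation decoder expects. All function names, shapes, and the matching scheme are illustrative assumptions, not the paper's implementation.

```python
import torch

def feature_distribution_align(diff_feat, sam_feat, eps=1e-6):
    """Hypothetical FDA: match the mean/std of the diffusion-derived feature
    map to those of the frozen SAM encoder feature (moment matching)."""
    d_mu = diff_feat.mean(dim=(1, 2, 3), keepdim=True)
    d_std = diff_feat.std(dim=(1, 2, 3), keepdim=True) + eps
    s_mu = sam_feat.mean(dim=(1, 2, 3), keepdim=True)
    s_std = sam_feat.std(dim=(1, 2, 3), keepdim=True) + eps
    return (diff_feat - d_mu) / d_std * s_std + s_mu

def channel_replicate_expand(feat, target_channels):
    """Hypothetical CRE: tile channels until the feature map reaches the
    decoder's expected width, then truncate to the exact channel count."""
    b, c, h, w = feat.shape
    reps = -(-target_channels // c)  # ceil division
    return feat.repeat(1, reps, 1, 1)[:, :target_channels]

# Illustrative shapes: a 4-channel latent-diffusion feature aligned and
# expanded to the 256-channel width of a SAM image-encoder feature.
diff_feat = torch.randn(2, 4, 64, 64)
sam_feat = torch.randn(2, 256, 64, 64)
out = channel_replicate_expand(feature_distribution_align(diff_feat, sam_feat), 256)
assert out.shape == sam_feat.shape
```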

What carries the argument

Degradation-aware Adaptive Enhancement (DAE), which splits arbitrary-quality feature reconstruction into a degradation-level prediction stage and a subsequent degradation-aware reconstruction stage.
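
Taken at face value, the two-stage split could be realized along the lines of the sketch below: a lightweight head regresses a scalar degradation level from the degraded latent, and that level then modulates a residual reconstruction block. The FiLM-style conditioning, module sizes, and names are assumptions made for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class DegradationAwareEnhancement(nn.Module):
    """Minimal sketch of a prediction-then-reconstruct DAE (illustrative only)."""

    def __init__(self, channels=256):
        super().__init__()
        # Stage 1: predict a scalar degradation level in [0, 1] from the latent.
        self.level_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )
        # Stage 2: reconstruct, modulated by the predicted level (FiLM-style).
        self.to_scale = nn.Linear(1, channels)
        self.to_shift = nn.Linear(1, channels)
        self.recon = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, degraded_feat):
        level = self.level_head(degraded_feat)               # (B, 1)
        scale = self.to_scale(level)[:, :, None, None]       # (B, C, 1, 1)
        shift = self.to_shift(level)[:, :, None, None]
        modulated = degraded_feat * (1 + scale) + shift
        enhanced = degraded_feat + self.recon(modulated)     # residual refinement
        return enhanced, level

# usage with illustrative shapes
dae = DegradationAwareEnhancement(channels=256)
enhanced, level = dae(torch.randn(2, 256, 64, 64))
```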

If this is right

  • Pre-trained SAM and SAM2 models gain robustness to low-quality inputs with only small added parameter counts.
  • Segmentation accuracy rises on complex degradations without a corresponding drop on clear images.
  • The same enhancement pipeline works on degradation types absent from training data.
  • Minimal-parameter adaptation becomes feasible for deploying foundation segmentation models in variable real-world conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The two-stage prediction-then-reconstruct pattern in DAE could transfer to other generative tasks that need to handle input variability.
  • Combining this latent enhancement with downstream tasks like detection or captioning might improve end-to-end pipelines on imperfect inputs.
  • Further tests on video sequences could reveal whether the per-frame degradation prediction remains stable across time.
  • The approach suggests a general template for making other large vision models quality-agnostic without full retraining.

Load-bearing premise

Degradation levels can be predicted accurately enough from data alone, without explicit labels, and the generated features remain compatible with the segmentation model after alignment.

What would settle it

An experiment where GleSAM++ shows no accuracy gain over baseline SAM on a held-out set of images with previously unseen degradation types such as heavy motion blur or extreme compression.
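
As a concrete reading of that test, the comparison reduces to measuring mIoU for baseline SAM and for GleSAM++ on the same held-out unseen-degradation set and checking whether the gain vanishes. The sketch below, with an assumed significance margin, illustrates the check; it is not an evaluation script from the paper.

```python
import numpy as np

def miou(preds, gts, eps=1e-6):
    """Mean IoU over paired binary masks (NumPy arrays of 0/1)."""
    ious = [np.logical_and(p, g).sum() / (np.logical_or(p, g).sum() + eps)
            for p, g in zip(preds, gts)]
    return float(np.mean(ious))

def refutes_claim(baseline_preds, glesam_preds, gts, margin=0.005):
    """True if GleSAM++ shows no meaningful mIoU gain over baseline SAM on
    held-out unseen degradations (the outcome that would settle it negatively).
    The 0.5-point margin is an assumed threshold, not taken from the paper."""
    return miou(glesam_preds, gts) - miou(baseline_preds, gts) <= margin
```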

read the original abstract

Segment Anything Models (SAMs), known for their exceptional zero-shot segmentation performance, have garnered significant attention in the research community. Nevertheless, their performance drops significantly on severely degraded, low-quality images, limiting their effectiveness in real-world scenarios. To address this, we propose GleSAM++, which utilizes Generative Latent space Enhancement to boost robustness on low-quality images, thus enabling generalization across various image qualities. Additionally, to improve compatibility between the pre-trained diffusion model and the segmentation framework, we introduce two techniques, i.e., Feature Distribution Alignment (FDA) and Channel Replication and Expansion (CRE). However, the above components lack explicit guidance regarding the degree of degradation. The model is forced to implicitly fit a complex noise distribution that spans conditions from mild noise to severe artifacts, which substantially increases the learning burden and leads to suboptimal reconstructions. To address this issue, we further introduce a Degradation-aware Adaptive Enhancement (DAE) mechanism. The key principle of DAE is to decouple the reconstruction process for arbitrary-quality features into two stages: degradation-level prediction and degradation-aware reconstruction. Our method can be applied to pre-trained SAM and SAM2 with only minimal additional learnable parameters, allowing for efficient optimization. Extensive experiments demonstrate that GleSAM++ significantly improves segmentation robustness on complex degradations while maintaining generalization to clear images. Furthermore, GleSAM++ also performs well on unseen degradations, underscoring the versatility of our approach and dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes GleSAM++, which augments pre-trained SAM and SAM2 models with generative latent-space enhancement to improve zero-shot segmentation on low-quality and degraded images. It introduces Feature Distribution Alignment (FDA) and Channel Replication and Expansion (CRE) for compatibility between diffusion features and the segmentation backbone, plus a Degradation-aware Adaptive Enhancement (DAE) module that decouples reconstruction into unsupervised degradation-level prediction followed by degradation-aware reconstruction. The method adds only minimal learnable parameters and claims, via extensive experiments, substantially higher robustness on complex degradations, preserved performance on clean images, and strong generalization to unseen degradations.

Significance. If the quantitative gains and generalization claims are substantiated, the work would meaningfully extend the practical utility of SAM-family models to real-world low-quality imagery (e.g., surveillance, medical, or autonomous-driving scenes). The design choice of freezing the backbone and diffusion model while adding lightweight adapters is practically attractive and could serve as a template for other foundation-model adaptations.

major comments (3)
  1. [DAE mechanism] DAE section (around the description of degradation-level prediction): the central claim that DAE reliably decouples reconstruction without explicit degradation labels rests on the unsupervised prediction head accurately separating mild-to-severe conditions from unlabeled data. The abstract itself acknowledges the increased learning burden of fitting a complex noise distribution; no quantitative diagnostics (prediction accuracy, calibration plots, or ablation removing the prediction stage) are referenced to confirm this step succeeds, which directly affects whether the subsequent generative features remain compatible with SAM.
  2. [Experiments] Experiments and ablations: the reported robustness gains on complex degradations are load-bearing for the paper's contribution, yet the description does not isolate the incremental benefit of DAE versus FDA+CRE alone. Without such controlled ablations (or tables showing performance when the prediction head is replaced by a constant or oracle), it is impossible to verify that the adaptive component is responsible for the claimed improvements rather than the generative enhancement in general.
  3. [Evaluation on unseen degradations] Generalization claims to unseen degradations: the abstract states strong performance on unseen degradations, but the evaluation protocol (types of degradations, severity ranges, and whether they overlap with training distributions) is not detailed enough to assess whether the result reflects true out-of-distribution robustness or merely interpolation within the training degradation family.
minor comments (2)
  1. [Method] Notation for the two-stage DAE process could be clarified with explicit equations showing how the predicted degradation level modulates the generative reconstruction.
  2. [Figures] Figure captions for qualitative results should explicitly state the degradation type and severity level shown in each column to aid reader interpretation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional diagnostics, ablations, and protocol clarifications that directly respond to the concerns raised.

read point-by-point responses
  1. Referee: [DAE mechanism] DAE section (around the description of degradation-level prediction): the central claim that DAE reliably decouples reconstruction without explicit degradation labels rests on the unsupervised prediction head accurately separating mild-to-severe conditions from unlabeled data. The abstract itself acknowledges the increased learning burden of fitting a complex noise distribution; no quantitative diagnostics (prediction accuracy, calibration plots, or ablation removing the prediction stage) are referenced to confirm this step succeeds, which directly affects whether the subsequent generative features remain compatible with SAM.

    Authors: We agree that explicit validation of the unsupervised prediction head is necessary to support the decoupling claim. In the revised manuscript we add: (i) prediction accuracy of the degradation-level head on a held-out labeled validation set, (ii) calibration plots comparing predicted versus ground-truth severity distributions, and (iii) an ablation that replaces the learned prediction head with a constant (average) degradation level. These additions confirm that the head learns a meaningful separation and that the subsequent generative features remain compatible with SAM. revision: yes

  2. Referee: [Experiments] Experiments and ablations: the reported robustness gains on complex degradations are load-bearing for the paper's contribution, yet the description does not isolate the incremental benefit of DAE versus FDA+CRE alone. Without such controlled ablations (or tables showing performance when the prediction head is replaced by a constant or oracle), it is impossible to verify that the adaptive component is responsible for the claimed improvements rather than the generative enhancement in general.

    Authors: We acknowledge the need for controlled isolation of DAE. The revised version includes a new ablation table (main paper and supplementary) that reports: (a) FDA+CRE only, (b) full GleSAM++ with learned DAE, and (c) an oracle-DAE variant that receives ground-truth degradation levels. The results show consistent additional gains from the adaptive component on severe degradations, while the constant-head variant underperforms the learned DAE, confirming that the adaptive mechanism drives part of the reported robustness. revision: yes

  3. Referee: [Evaluation on unseen degradations] Generalization claims to unseen degradations: the abstract states strong performance on unseen degradations, but the evaluation protocol (types of degradations, severity ranges, and whether they overlap with training distributions) is not detailed enough to assess whether the result reflects true out-of-distribution robustness or merely interpolation within the training degradation family.

    Authors: We have expanded the experimental section and supplementary material to fully specify the protocol. Training degradations comprise Gaussian noise (std 5-50), Gaussian blur (kernel 3-15), and JPEG compression (quality 10-90). Unseen test degradations are rain streaks, snow, and low-light conditions with severity parameters deliberately outside the training ranges and with no overlap in degradation type. We report per-degradation mIoU tables demonstrating that performance remains high on these OOD cases, supporting true generalization rather than interpolation. revision: yes
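
The quoted training ranges map directly onto a simple degradation-synthesis routine. The sketch below, using Pillow and NumPy, is one way to realize that protocol; it approximates the stated blur kernels with an equivalent Gaussian radius and is an illustration of the described setup, not the authors' data-generation code.

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    """Apply one randomly chosen training degradation from the stated ranges:
    Gaussian noise (std 5-50), Gaussian blur (kernel 3-15), JPEG (quality 10-90).
    Assumes an RGB input image."""
    choice = random.choice(["noise", "blur", "jpeg"])
    if choice == "noise":
        std = random.uniform(5, 50)
        arr = np.asarray(img).astype(np.float32)
        arr += np.random.normal(0.0, std, arr.shape)
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    if choice == "blur":
        kernel = random.choice(range(3, 16, 2))  # odd kernel sizes 3-15
        # Pillow's GaussianBlur takes a radius; kernel/2 is a rough equivalent.
        return img.filter(ImageFilter.GaussianBlur(radius=kernel / 2))
    quality = random.randint(10, 90)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```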

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims rest on introducing new modules (generative latent enhancement, FDA, CRE, and DAE) atop frozen pre-trained SAM/SAM2 backbones, with performance evaluated via standard segmentation metrics on degraded and clear images. No equation or claim reduces by construction to a fitted parameter renamed as prediction, nor does any load-bearing step rely on a self-citation whose content is itself unverified or defined circularly within the work. The degradation-level prediction in DAE is presented as learned implicitly from unlabeled data, but this is an empirical modeling choice whose success is tested externally rather than guaranteed by definition. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that pre-trained diffusion models can be aligned to SAM feature spaces and that degradation prediction can be decoupled without additional supervision. No new physical entities or free parameters are introduced beyond standard training hyperparameters.

axioms (2)
  • domain assumption Pre-trained diffusion models and SAM can be made compatible via feature distribution alignment and channel replication
    Invoked when describing FDA and CRE as solutions to compatibility issues.
  • domain assumption Decoupling degradation-level prediction from reconstruction reduces learning burden for arbitrary-quality inputs
    Stated as the key principle of the DAE mechanism.

pith-pipeline@v0.9.0 · 5580 in / 1282 out tokens · 46637 ms · 2026-05-16T18:07:29.935723+00:00 · methodology

discussion (0)

