PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation
Pith reviewed 2026-05-10 15:14 UTC · model grok-4.3
The pith
Gradient flow from the mask decoder refines prompts to improve in-context segmentation without training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that gradients derived from the mask decoder can be used to refine prompts at test time, correcting for mismatches between support and query images in in-context segmentation. This refinement is theoretically grounded in the SAM architecture and practically stabilized by selecting the top-1 refined prompt, leading to better segmentation quality on various benchmarks without any additional training or architectural changes.
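To make the mechanism concrete, the following is a minimal sketch of test-time prompt refinement through a frozen mask decoder, assuming a PyTorch-style setup. The toy decoder, the BCE consistency loss against a pseudo-target, and the final-loss top-1 criterion are all illustrative assumptions, not the paper's actual SAM components or objective.

```python
# Minimal sketch: refine candidate prompt embeddings by back-propagating a
# consistency loss through a frozen mask decoder, then keep the top-1 result.
# All components here are illustrative stand-ins for SAM's real modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMaskDecoder(nn.Module):
    """Stand-in for SAM's mask decoder: image features + prompt -> mask logits."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, img_feats, prompt):
        # img_feats: (B, dim, H, W); prompt: (B, dim)
        q = self.proj(prompt)
        return torch.einsum("bd,bdhw->bhw", q, img_feats)  # (B, H, W) logits

def refine_prompts(decoder, img_feats, init_prompts, pseudo_target,
                   steps=10, lr=1e-2):
    """Gradient-flow refinement of each candidate prompt; returns the top-1
    candidate by final loss (the selection criterion is an assumption)."""
    for param in decoder.parameters():
        param.requires_grad_(False)  # the decoder stays frozen throughout
    candidates, losses = [], []
    for p0 in init_prompts:
        p = p0.clone().requires_grad_(True)
        opt = torch.optim.SGD([p], lr=lr)
        for _ in range(steps):
            loss = F.binary_cross_entropy_with_logits(
                decoder(img_feats, p), pseudo_target)
            opt.zero_grad()
            loss.backward()  # gradient flows through the frozen decoder to p
            opt.step()
        with torch.no_grad():
            final = F.binary_cross_entropy_with_logits(
                decoder(img_feats, p), pseudo_target)
        candidates.append(p.detach())
        losses.append(final.item())
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best]

# Toy usage with random features and a hypothetical pseudo-mask target.
decoder = ToyMaskDecoder()
feats = torch.randn(1, 256, 64, 64)
target = (torch.rand(1, 64, 64) > 0.5).float()
inits = [torch.randn(1, 256) for _ in range(4)]
best_prompt = refine_prompts(decoder, feats, inits, target)
```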
What carries the argument
The mask decoder gradient flow, which back-propagates signals from the predicted mask to adjust prompt embeddings and mitigate visual inconsistencies between support and query images.
Load-bearing premise
Gradient signals from the mask decoder reliably indicate and correct visual inconsistencies between support and query images in a way that improves final masks without creating new instabilities.
What would settle it
Apply the refinement on a benchmark with known large visual differences between support and query images and compare refined-prompt IoU against the baseline in-context method; scores at or below the baseline would falsify the claim.
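A minimal sketch of that decisive measurement; iou() and the mask arrays are generic stand-ins, and the pass condition (refined mean must exceed baseline mean) is exactly the claim under test:

```python
# Compare refined-prompt IoU against the baseline on hard support/query pairs.
# If the refined mean falls to or below the baseline, the core claim fails.
import numpy as np

def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def refinement_helps(baseline_masks, refined_masks, gt_masks):
    base = np.mean([iou(p, g) for p, g in zip(baseline_masks, gt_masks)])
    ref = np.mean([iou(p, g) for p, g in zip(refined_masks, gt_masks)])
    return ref > base, base, ref
```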
Original abstract
Visual Foundation Models (VFMs) such as the Segment Anything Model (SAM) have significantly advanced broad use of image segmentation. However, SAM and its variants necessitate substantial manual effort for prompt generation and additional training for specific applications. Recent approaches address these limitations by integrating SAM into in-context (one/few shot) segmentation, enabling auto-prompting through semantic alignment between query and support images. Despite these efforts, they still generate sub-optimal prompts that degrade segmentation quality due to visual inconsistencies between support and query images. To tackle this limitation, we introduce PR-MaGIC (Prompt Refinement via Mask Decoder Gradient Flow for In-Context Segmentation), a training-free test-time framework that refines prompts via gradient flow derived from SAM's mask decoder. PR-MaGIC seamlessly integrates into in-context segmentation frameworks, being theoretically grounded yet practically stabilized through a simple top-1 selection strategy that ensures robust performance across samples. Extensive evaluations demonstrate that PR-MaGIC consistently improves segmentation quality across various benchmarks, effectively mitigating inadequate prompts without requiring additional training or architectural modifications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PR-MaGIC, a training-free test-time framework for in-context segmentation that refines initial prompts by back-propagating gradients through SAM's frozen mask decoder. It integrates with existing auto-prompting methods to address visual inconsistencies between support and query images, employs a top-1 selection heuristic for stability, and claims consistent benchmark improvements without additional training or architectural changes.
Significance. If validated, the approach would offer a practical, plug-and-play enhancement to prompt-based in-context segmentation in vision foundation models. Its training-free nature and use of existing decoder gradients could reduce manual effort in few-shot settings and inspire similar test-time refinements, provided the gradient signals reliably correct rather than amplify prompt errors.
major comments (3)
- [§3 (Method)] The central mechanism assumes that gradients from the mask decoder will produce prompt updates that reduce support-query visual mismatch. No derivation, loss formulation, or analysis of the non-convex optimization landscape is provided to show why this corrects rather than reinforces errors when initial prompts are severely misaligned (e.g., large scale or appearance differences).
- [§4 (Experiments)] The claims of 'consistent improvements across various benchmarks' and 'extensive evaluations' are not accompanied by any reported metrics, ablation tables, error bars, or controls isolating the gradient refinement from the top-1 selection strategy. This makes it impossible to assess whether gains are robust or due to post-hoc choices.
- [Abstract and §1] The top-1 selection strategy is presented as practical stabilization, yet no analysis or experiments demonstrate that at least one sampled trajectory reaches a useful basin or that the method avoids sensitivity to initialization in challenging cases.
minor comments (2)
- [Abstract] The abstract sentence 'being theoretically grounded yet practically stabilized through a simple top-1 selection strategy' is somewhat vague; a brief clarification of what 'theoretically grounded' refers to would improve readability.
- [§3 (Method)] Notation for the prompt update rule and gradient flow could be formalized with an equation in §3 to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.
Point-by-point responses
- Referee [§3 (Method)]: The central mechanism assumes that gradients from the mask decoder will produce prompt updates that reduce support-query visual mismatch. No derivation, loss formulation, or analysis of the non-convex optimization landscape is provided to show why this corrects rather than reinforces errors when initial prompts are severely misaligned (e.g., large scale or appearance differences).
Authors: We thank the referee for highlighting this important aspect. The loss formulation minimizes a consistency objective between the mask decoder output on the refined prompt and a pseudo-target derived from support-query feature alignment. However, we acknowledge that a formal derivation and analysis of the non-convex optimization landscape were not provided. In the revised version, we will include an explicit loss function definition, a step-by-step derivation of the gradient flow, and an empirical analysis (including loss-landscape visualizations on misaligned cases) showing that updates tend to correct rather than reinforce errors when combined with top-1 selection. Revision: yes.
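One plausible formalization of the objective and update rule described in this response, with notation that is ours rather than the paper's: let $D_\theta$ be the frozen mask decoder, $F_q$ the query image features, $\hat{m}$ the pseudo-target from support-query feature alignment, and $\mathbf{p}$ the prompt embedding.

```latex
\mathcal{L}(\mathbf{p}) = \mathrm{BCE}\big(D_\theta(F_q, \mathbf{p}),\, \hat{m}\big),
\qquad
\mathbf{p}_{t+1} = \mathbf{p}_t - \eta\, \nabla_{\mathbf{p}} \mathcal{L}(\mathbf{p}_t)
```

Top-1 selection would then keep, across refined trajectories, the candidate minimizing the final loss; that criterion is an assumption, since the paper's selection rule is not specified here.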
- Referee [§4 (Experiments)]: The claims of 'consistent improvements across various benchmarks' and 'extensive evaluations' are not accompanied by any reported metrics, ablation tables, error bars, or controls isolating the gradient refinement from the top-1 selection strategy. This makes it impossible to assess whether gains are robust or due to post-hoc choices.
Authors: We thank the referee for pointing this out. Upon review, the submitted version claimed improvements but did not present the metrics, ablations, or error bars with sufficient clarity or controls. In the revised manuscript, we will add comprehensive tables with mIoU and Dice scores across benchmarks (COCO, PASCAL, etc.), explicit ablations isolating gradient refinement from top-1 selection, and error bars from multiple random seeds to demonstrate robustness. Revision: yes.
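A sketch of the promised reporting, assuming per-seed lists of per-image scores; dice() and the data layout are illustrative, not the paper's evaluation code:

```python
# Aggregate per-seed Dice (or mIoU) scores into a mean with seed-level error bars.
import numpy as np

def dice(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def report(per_seed_scores):
    # per_seed_scores: {seed: [score for each image]}
    seed_means = [np.mean(s) for s in per_seed_scores.values()]
    return float(np.mean(seed_means)), float(np.std(seed_means))  # mean, std
```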
- Referee [Abstract and §1]: The top-1 selection strategy is presented as practical stabilization, yet no analysis or experiments demonstrate that at least one sampled trajectory reaches a useful basin or that the method avoids sensitivity to initialization in challenging cases.
Authors: We agree that additional validation is warranted. The top-1 heuristic is based on the empirical observation that multiple trajectories frequently include at least one improved prompt. In the revision, we will add experiments showing performance distributions across trajectories on challenging cases (large scale/appearance differences), the fraction of cases where at least one trajectory reaches a useful basin, and a sensitivity analysis with respect to initialization, supported by new figures and discussion. Revision: yes.
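A sketch of the promised trajectory analysis, where run_trajectory and baseline_iou are hypothetical hooks returning per-sample IoU:

```python
# Fraction of samples where at least one of k refinement trajectories ends
# above the unrefined baseline, i.e. reaches a "useful basin".
def basin_hit_rate(samples, run_trajectory, baseline_iou, k=8):
    hits = 0
    for s in samples:
        finals = [run_trajectory(s, seed=i) for i in range(k)]
        hits += int(max(finals) > baseline_iou(s))
    return hits / len(samples)
```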
Circularity Check
No circularity: new test-time gradient refinement is independent of fitted inputs or self-citation chains
Full rationale
The paper introduces PR-MaGIC as a training-free test-time optimization that refines prompts by back-propagating through SAM's frozen mask decoder and applies a top-1 selection heuristic. This procedure is presented as an additive mechanism on top of existing in-context segmentation pipelines rather than a quantity derived from or fitted to the paper's own data or prior self-citations. No equation or claim reduces the output performance gain to a re-expression of the input prompt quality or to a self-referential theorem; the central improvement is therefore self-contained and externally falsifiable on benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Gradient information from SAM's mask decoder can be used to improve prompt quality for visually inconsistent support-query pairs.