Modality-Agnostic Prompt Learning for Multi-Modal Camouflaged Object Detection
Pith reviewed 2026-05-10 16:25 UTC · model grok-4.3
The pith
Modality-agnostic prompts let the Segment Anything Model adapt to any auxiliary sensor for camouflaged object detection without custom fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that multi-modal learning for camouflaged object detection can be reduced to generating unified modality-agnostic prompts. Interactions between a data-driven content domain and a knowledge-driven prompt domain distill complementary cues into prompts that SAM can decode directly, and a subsequent Mask Refine Module incorporates fine-grained prompt information to correct coarse segmentations and improve boundary accuracy.
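To make the mechanism concrete, the sketch below shows one way such an interaction could be realized: learnable prompt tokens (the knowledge-driven prompt domain) cross-attend to fused features from the RGB and auxiliary streams (the data-driven content domain). Everything here, including the module name, dimensions, and the cross-attention realization, is an assumption for illustration; the paper's actual design may differ.

```python
# Minimal sketch, assuming a cross-attention realization of the content-prompt
# interaction. All names (PromptDomainInteraction, n_prompts, ...) are hypothetical.
import torch
import torch.nn as nn

class PromptDomainInteraction(nn.Module):
    """Distill cues from fused multi-modal content features into a fixed set of
    learnable, modality-agnostic prompt tokens."""

    def __init__(self, dim: int = 256, n_prompts: int = 8, n_heads: int = 8):
        super().__init__()
        # Knowledge-driven prompt domain: tokens shared across all modality pairs.
        self.prompt_tokens = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, rgb_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # Data-driven content domain: token sequences from both modalities. The same
        # code path serves depth, thermal, or polarization auxiliaries unchanged.
        content = torch.cat([rgb_feat, aux_feat], dim=1)          # (B, N_rgb+N_aux, C)
        prompts = self.prompt_tokens.expand(content.size(0), -1, -1)
        attended, _ = self.cross_attn(prompts, content, content)  # prompts query content
        prompts = self.norm1(prompts + attended)
        prompts = self.norm2(prompts + self.mlp(prompts))
        return prompts  # (B, n_prompts, C): SAM-style sparse prompt embeddings
```

A SAM-style mask decoder would then consume these tokens in place of point or box prompt embeddings, which is what would make the pipeline agnostic to the auxiliary sensor.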
What carries the argument
Modality-agnostic multi-modal prompts produced by domain interactions between data-driven content and knowledge-driven prompts, then fed to SAM for decoding.
If this is right
- Performance improves on RGB-Depth, RGB-Thermal, and RGB-Polarization COD benchmarks relative to prior multi-modal methods.
- Parameter-efficient adaptation becomes possible for new auxiliary modalities without retraining the entire model.
- Coarse SAM outputs receive boundary corrections from the Mask Refine Module, which reuses the same unified prompts (sketched below, after this list).
- Customized fusion modules or modality-specific encoders are no longer required for each sensor combination.
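The following sketch illustrates how a lightweight refinement stage of this kind could consume the same unified prompts. The residual-correction design and all names are assumptions, since the abstract gives no architectural detail.

```python
# Hypothetical Mask Refine Module: calibrate coarse mask logits with prompt cues.
# The residual design and all names are illustrative assumptions.
import torch
import torch.nn as nn

class MaskRefineModule(nn.Module):
    """Predict a residual correction for coarse mask logits, conditioning image
    features on pooled prompt embeddings."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.prompt_proj = nn.Linear(dim, dim)
        self.refine = nn.Sequential(
            nn.Conv2d(dim + 1, dim, kernel_size=3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 1, kernel_size=3, padding=1),  # residual boundary logits
        )

    def forward(self, coarse_logits, image_feat, prompts):
        # coarse_logits: (B, 1, H, W); image_feat: (B, C, H, W); prompts: (B, P, C)
        cue = self.prompt_proj(prompts.mean(dim=1))    # pool fine-grained prompt cues
        cond = image_feat + cue[:, :, None, None]      # broadcast cue spatially
        residual = self.refine(torch.cat([cond, coarse_logits], dim=1))
        return coarse_logits + residual                # calibrated mask logits
```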
Where Pith is reading between the lines
- The same prompt-generation pattern could be applied to other foundation segmentation models or to tasks outside camouflaged detection such as medical or remote-sensing imagery.
- Real-world systems that switch between sensor suites could keep one prompt set and one decoder rather than maintaining separate pipelines.
- Extending the content-prompt interaction to three or more simultaneous modalities would test whether the unification step remains stable.
- If the prompts transfer across datasets collected under different lighting or weather, deployment cost for multi-sensor field systems would drop further.
Load-bearing premise
A single fixed set of prompts can extract and exploit complementary signals from any added visual modality, without designs tailored to that modality.
What would settle it
Training the prompts on RGB-Depth and RGB-Thermal data, then testing on a held-out RGB-Polarization set or a fresh modality such as RGB-LiDAR, and measuring whether accuracy falls below a modality-specific baseline would directly test the claim.
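A minimal protocol sketch follows, assuming placeholder data loaders and an S-measure evaluator (a standard COD metric); none of these handles come from the paper.

```python
# Sketch of the held-out-modality test proposed above. load_pairs, PromptModel,
# and s_measure are placeholders, not the authors' code.
def held_out_modality_test():
    train_sets = [load_pairs("rgb_depth"), load_pairs("rgb_thermal")]
    held_out = load_pairs("rgb_polarization")    # or rgb_lidar: never trained on

    agnostic = PromptModel(freeze_sam=True)      # only prompts + refiner learn
    agnostic.fit(train_sets)

    specific = PromptModel(freeze_sam=True)      # modality-specific reference,
    specific.fit([held_out])                     # trained in-domain

    gap = s_measure(specific, held_out) - s_measure(agnostic, held_out)
    # The modality-agnostic claim survives only if this gap stays near zero.
    return gap
```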
Original abstract
Camouflaged Object Detection (COD) aims to segment objects that blend seamlessly into complex backgrounds, with growing interest in exploiting additional visual modalities to enhance robustness through complementary information. However, most existing approaches generally rely on modality-specific architectures or customized fusion strategies, which limit scalability and cross-modal generalization. To address this, we propose a novel framework that generates modality-agnostic multi-modal prompts for the Segment Anything Model (SAM), enabling parameter-efficient adaptation to arbitrary auxiliary modalities and significantly improving overall performance on COD tasks. Specifically, we model multi-modal learning through interactions between a data-driven content domain and a knowledge-driven prompt domain, distilling task-relevant cues into unified prompts for SAM decoding. We further introduce a lightweight Mask Refine Module to calibrate coarse predictions by incorporating fine-grained prompt cues, leading to more accurate camouflaged object boundaries. Extensive experiments on RGB-Depth, RGB-Thermal, and RGB-Polarization benchmarks validate the effectiveness and generalization of our modality-agnostic framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a framework for multi-modal camouflaged object detection that generates modality-agnostic prompts for the Segment Anything Model (SAM). It models interactions between a data-driven content domain and a knowledge-driven prompt domain to distill task-relevant cues into unified prompts, introduces a lightweight Mask Refine Module for calibrating coarse predictions with fine-grained cues, and claims parameter-efficient adaptation to arbitrary auxiliary modalities with improved performance on COD tasks, validated on RGB-Depth, RGB-Thermal, and RGB-Polarization benchmarks.
Significance. If the central claims hold, the work offers a scalable alternative to modality-specific architectures in multi-modal segmentation by leveraging prompt-based adaptation of foundation models like SAM. This could enable more flexible incorporation of auxiliary modalities in challenging vision tasks such as camouflaged object detection without requiring custom fusion modules.
Major comments (1)
- Abstract: The claim of enabling 'parameter-efficient adaptation to arbitrary auxiliary modalities' and a 'modality-agnostic' framework is not supported by the reported validation. Experiments are confined to three specific modality pairs (RGB-Depth, RGB-Thermal, RGB-Polarization), with no tests on unseen modalities, no evidence of a single shared encoder for truly arbitrary inputs, and no ablation confirming the absence of implicit modality-specific branches in the content-domain ingestion or prompt generator. This directly undermines the load-bearing assertion that the content-prompt domain interactions distill complementary cues without customized strategies.
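One way the ablation this comment requests could be run is to trace which submodules execute for each modality pair and check that the sets coincide. A sketch, under the assumption of a PyTorch model callable as model(rgb, aux):

```python
# Hypothetical check for implicit modality-specific branches: record which
# submodules fire for each modality pair and compare the sets.
import torch

def modules_used(model, rgb, aux):
    fired, hooks = set(), []
    for name, module in model.named_modules():
        # Default-arg trick binds each module's name at hook-creation time.
        hooks.append(module.register_forward_hook(
            lambda mod, inp, out, n=name: fired.add(n)))
    with torch.no_grad():
        model(rgb, aux)
    for h in hooks:
        h.remove()
    return fired

# No hidden branches if every auxiliary exercises the identical module set:
# assert modules_used(model, rgb, depth) == modules_used(model, rgb, thermal)
```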
Minor comments (2)
- Abstract: The claim of 'significantly improving overall performance' is made without any quantitative metrics, baseline comparisons, or specific gains, which reduces the ability to assess the strength of the empirical claims at a glance.
- Abstract: The Mask Refine Module is introduced as 'lightweight' but without details on its parameter count, architecture, or integration point with SAM decoding, which would aid reproducibility.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments. We address the major concern about the abstract claims as follows.
Point-by-point responses
Referee: Abstract: The claim of enabling 'parameter-efficient adaptation to arbitrary auxiliary modalities' and a 'modality-agnostic' framework is not supported by the reported validation. Experiments are confined to three specific modality pairs (RGB-Depth, RGB-Thermal, RGB-Polarization), with no tests on unseen modalities, no evidence of a single shared encoder for truly arbitrary inputs, and no ablation confirming the absence of implicit modality-specific branches in the content-domain ingestion or prompt generator. This directly undermines the load-bearing assertion that the content-prompt domain interactions distill complementary cues without customized strategies.
Authors: We acknowledge that our experimental validation is limited to three modality pairs and does not include tests on entirely unseen modalities, which would provide stronger evidence for 'arbitrary' adaptation. However, the framework is modality-agnostic by design: the same prompt learning and interaction modules are used across all tested modalities without any customized fusion strategies or modality-specific components, as detailed in the method section. This is what enables parameter-efficient adaptation, where only a small number of parameters (prompts and the mask refine module) are learned for each new modality. We will revise the abstract to replace 'arbitrary' with 'diverse' or 'additional' auxiliary modalities to more accurately reflect the scope of our experiments. Additionally, we will add an ablation study demonstrating that the content-domain ingestion and prompt generator do not contain implicit modality-specific branches, by showing equivalent performance when using a unified encoder. We believe this addresses the concern while preserving the validity of the core claims.
Revision: partial
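As a sanity check on the parameter-efficiency claim, freezing SAM and collecting only prompt-side parameters is straightforward. The sketch below reuses the hypothetical modules from earlier in this review and is not the authors' code.

```python
# Minimal sketch of the parameter-efficient setup described in the rebuttal:
# the SAM backbone stays frozen and only prompt-side modules train.
import torch

def prompt_side_parameters(sam, prompt_interaction, mask_refine):
    for p in sam.parameters():
        p.requires_grad = False                   # foundation model stays frozen
    trainable = [p for module in (prompt_interaction, mask_refine)
                 for p in module.parameters()]
    print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")
    return trainable                              # feed to torch.optim.AdamW(...)
```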
Circularity Check
No circularity: architectural proposal with independent empirical validation
Full rationale
The paper proposes a framework that models multi-modal learning via interactions between a data-driven content domain and knowledge-driven prompt domain to distill cues into unified SAM prompts, plus a lightweight Mask Refine Module. This is a design choice and architectural contribution, not a mathematical derivation or parameter fit that reduces to its own inputs by construction. No equations, self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. Validation occurs on external RGB-Depth, RGB-Thermal, and RGB-Polarization benchmarks, which are independent of the internal claims. The modality-agnostic assertion is a generalization claim tested on three pairs rather than a tautology.
Axiom & Free-Parameter Ledger
Invented entities (2)
- Modality-agnostic multi-modal prompts: no independent evidence
- Mask Refine Module: no independent evidence
Reference graph
Works this paper leans on
- [1] D.-P. Fan, G.-P. Ji, G. Sun, M.-M. Cheng, J. Shen, and L. Shao, "Camouflaged object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2777–2787.
- [2] X. Wang, J. Ding, Z. Zhang, J. Xu, and J. Gao, "IPNet: Polarization-based camouflaged object detection via dual-flow network," Engineering Applications of Artificial Intelligence, vol. 127, p. 107303, 2024.
- [3] C. Liu, Z. Wang, X. Yan, M. Sun, and Q. Hu, "Visible-infrared camouflaged object detection," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2025.
- [4] Q. Wang, J. Yang, X. Yu, F. Wang, P. Chen, and F. Zheng, "Depth-aided camouflaged object detection," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 3297–3306.
- [5] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., "Segment anything," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- [6] X. Zhang, Z. Yu, L. Zhao, D.-P. Fan, and G. Xiao, "COMPrompter: Reconceptualized segment anything model with multiprompt network for camouflaged object detection," Science China Information Sciences, vol. 68, no. 1, p. 112104, 2025.
- [7] T. Chen, L. Zhu, C. Deng, R. Cao, Y. Wang, S. Zhang, Z. Li, L. Sun, Y. Zang, and P. Mao, "SAM-Adapter: Adapting segment anything in underperformed scenes," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3367–3375.
- [8] Z. Yu, X. Zhang, L. Zhao, Y. Bin, and G. Xiao, "Exploring deeper! Segment anything model with depth perception for camouflaged object detection," in Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 4322–4330.
- [9] J. Liu, L. Kong, and G. Chen, "Improving SAM for camouflaged object detection via dual stream adapters," arXiv preprint arXiv:2503.06042, 2025.
- [10] H. Mei, G.-P. Ji, Z. Wei, X. Yang, X. Wei, and D.-P. Fan, "Camouflaged object segmentation with distraction mining," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8772–8781.
- [11] Y. Sun, S. Wang, C. Chen, and T.-Z. Xiang, "Boundary-guided camouflaged object detection," in IJCAI, 2022, pp. 1335–1341.
- [12] X. Hu, S. Wang, X. Qin, H. Dai, W. Ren, Y. Tai, C. Wang, and L. Shao, "High-resolution iterative feedback network for camouflaged object detection," 2023. [Online]. Available: https://arxiv.org/abs/2203.11624
- [13] Y. Sun, C. Xu, J. Yang, H. Xuan, and L. Luo, "Frequency-spatial entanglement learning for camouflaged object detection," in European Conference on Computer Vision. Springer, 2024, pp. 343–360.
- [14] J. Yang, B. Zhong, Q. Liang, Z. Mo, S. Zhang, and S. Song, "Uncertainty-guided diffusion model for camouflaged object detection," IEEE Transactions on Multimedia, vol. 27, pp. 4656–4669, 2025.
- [15] L. Wang, J. Yang, Y. Zhang, F. Wang, and F. Zheng, "Depth-aware concealed crop detection in dense agricultural scenes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 17201–17211.
- [16] G. Ren, H. Liu, M. Lazarou, and T. Stathaki, "Multi-modal segment anything model for camouflaged scene segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025, pp. 19882–19892.
- [17] J. Li, D. Li, C. Xiong, and S. Hoi, "BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation," in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.
- [18] Z. You, L. Kong, L. Meng, and Z. Wu, "Focus: Towards universal foreground segmentation," 2025. [Online]. Available: https://arxiv.org/abs/2501.05238
- [19] W. Liu, X. Shen, C.-M. Pun, and X. Cun, "Explicit visual prompting for low-level structure segmentations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19434–19445.
- [20] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, "Emerging properties in self-supervised vision transformers," 2021. [Online]. Available: https://arxiv.org/abs/2104.14294
- [21] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning transferable visual models from natural language supervision," 2021. [Online]. Available: https://arxiv.org/abs/2103.00020
- [22] G. Guo, P. Chen, Y. Guo, H. Chen, B. Zhang, and S. Gao, "Boosting segment anything model to generalize visually non-salient scenarios," IEEE Transactions on Image Processing, 2026.
- [23] W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, "Pyramid vision transformer: A versatile backbone for dense prediction without convolutions," 2021. [Online]. Available: https://arxiv.org/abs/2102.12122
- [24] Z. Huang, H. Dai, T.-Z. Xiang, S. Wang, H.-X. Chen, J. Qin, and H. Xiong, "Feature shrinkage pyramid for camouflaged object detection with transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5557–5566.
- [25] W. Zhai, Y. Cao, J. Zhang, and Z.-J. Zha, "Exploring figure-ground assignment mechanism in perceptual organization," Advances in Neural Information Processing Systems, vol. 35, pp. 17030–17042, 2022.
- [26] C. He, K. Li, Y. Zhang, L. Tang, Y. Zhang, Z. Guo, and X. Li, "Camouflaged object detection with feature decomposition and edge reconstruction," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22046–22055.
- [27] L. Ke, M. Ye, M. Danelljan, Y.-W. Tai, C.-K. Tang, F. Yu et al., "Segment anything in high quality," Advances in Neural Information Processing Systems, vol. 36, pp. 29914–29934, 2023.
- [28] J. Zhao, X. Li, F. Yang, Q. Zhai, A. Luo, Z. Jiao, and H. Cheng, "FocusDiffuser: Perceiving local disparities for camouflaged object detection," in European Conference on Computer Vision. Springer, 2024, pp. 181–198.
- [29] Y. Pang, X. Zhao, T.-Z. Xiang, L. Zhang, and H. Lu, "ZoomNeXt: A unified collaborative pyramid network for camouflaged object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
- [30] T. Chen, A. Lu, L. Zhu, C. Ding, C. Yu, D. Ji, Z. Li, L. Sun, P. Mao, and Y. Zang, "SAM2-Adapter: Evaluating & adapting segment anything 2 in downstream tasks: Camouflage, shadow, medical image segmentation, and more," arXiv preprint arXiv:2408.04579, 2024.
- [31] J. Hu, J. Lin, S. Gong, and W. Cai, "Relax image-specific prompt requirement in SAM: A single generic prompt for segmenting camouflaged objects," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 11, 2024, pp. 12511–12518.
- [32] J. Ma, Y. He, F. Li, L. Han, C. You, and B. Wang, "Segment anything in medical images," Nature Communications, vol. 15, no. 1, p. 654, 2024.
- [33] F. Sun, J. Han, W. Wu, J. Sun, M. Wang, and H. Li, "A UNet-like transformer network for camouflaged object detection," IEEE Transactions on Multimedia, pp. 1–15, 2025.
- [34] J. Yang, Q. Wang, F. Zheng, P. Chen, A. Leonardis, and D.-P. Fan, "PlantCamo: Plant camouflage detection," arXiv preprint arXiv:2410.17598, 2024.
- [35] S. Ye, X. Chen, Y. Zhang, X. Lin, and L. Cao, "ESCNet: Edge-semantic collaborative network for camouflaged object detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 20053–20063.
- [36] Z. Yu, L. Zhao, G. Xiao, and X. Zhang, "SAM-TTT: Segment anything model via reverse parameter configuration and test-time training for camouflaged object detection," in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 4030–4038.
- [37] X. Hu, F. Sun, J. Liu, F. Xu, and X. Zhang, "ST-SAM: SAM-driven self-training framework for semi-supervised camouflaged object detection," in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 8194–8203.
- [38] T. Zhou, H. Fu, G. Chen, Y. Zhou, D.-P. Fan, and L. Shao, "Specificity-preserving RGB-D saliency detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4681–4691.
- [39] M. Lee, C. Park, S. Cho, and S. Lee, "SPSN: Superpixel prototype sampling network for RGB-D salient object detection (supplementary material)."
- [40] Z. Wu, D. P. Paudel, D.-P. Fan, J. Wang, S. Wang, C. Demonceaux, R. Timofte, and L. Van Gool, "Source-free depth for object pop-out," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 1032–1042.
- [41] D.-P. Fan, G.-P. Ji, M.-M. Cheng, and L. Shao, "Concealed object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6024–6042, 2021.
- [42] Y. Pang, X. Zhao, T.-Z. Xiang, L. Zhang, and H. Lu, "Zoom in and out: A mixed-scale triplet network for camouflaged object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2160–2170.
- [43] H. Zhu, P. Li, H. Xie, X. Yan, D. Liang, D. Chen, M. Wei, and J. Qin, "I can find you! Boundary-guided separated attention network for camouflaged object detection," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, 2022, pp. 3608–3616.
- [44] R. Cong, M. Sun, S. Zhang, X. Zhou, W. Zhang, and Y. Zhao, "Frequency perception network for camouflaged object detection," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 1179–1189.
- [45] X. Hu, X. Zhang, F. Wang, J. Sun, and F. Sun, "Efficient camouflaged object detection network based on global localization perception and local guidance refinement," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 7, pp. 5452–5465, 2024.
- [46] S. Yao, H. Sun, T.-Z. Xiang, X. Wang, and X. Cao, "Hierarchical graph interaction transformer with dynamic token clustering for camouflaged object detection," IEEE Transactions on Image Processing, 2024.
- [47] X. Xiong, Z. Wu, S. Tan, W. Li, F. Tang, Y. Chen, S. Li, J. Ma, and G. Li, "SAM2-UNet: Segment anything 2 makes strong encoder for natural and medical image segmentation," Visual Intelligence, vol. 4, no. 1, p. 2, 2026.
- [48] H. Mei, B. Dong, W. Dong, J. Yang, S.-H. Baek, F. Heide, P. Peers, X. Wei, and X. Yang, "Glass segmentation using intensity and spectral polarization cues," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12622–12631.
- [49] X. Wang, J. Xu, and J. Ding, "Polarization-based camouflaged object detection with high-resolution adaptive fusion network," Engineering Applications of Artificial Intelligence, vol. 146, p. 110245, 2025.
- [50] B. Tang, Z. Liu, Y. Tan, and Q. He, "HRTransNet: HRFormer-driven two-modality salient object detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 2, pp. 728–742, 2022.
- [51] W. Zhou, F. Sun, Q. Jiang, R. Cong, and J.-N. Hwang, "WaveNet: Wavelet network with knowledge distillation for RGB-T salient object detection," IEEE Transactions on Image Processing, vol. 32, pp. 3027–3039, 2023.
- [52] K. Wang, D. Lin, C. Li, Z. Tu, and B. Luo, "Alignment-free RGBT salient object detection: Semantics-guided asymmetric correlation network and a unified benchmark," IEEE Transactions on Multimedia, vol. 26, pp. 10692–10707, 2024.
- [53] Z. Luo, N. Liu, W. Zhao, X. Yang, D. Zhang, D.-P. Fan, F. Khan, and J. Han, "VSCode: General visual salient and camouflaged object detection with 2D prompt learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 17169–17180.
- [54] K. Wang, Z. Tu, C. Li, Z. Liu, and B. Luo, "Unified-modal salient object detection via adaptive prompt learning," IEEE Transactions on Circuits and Systems for Video Technology, 2025.
- [55] K. Wang, K. Chen, C. Li, Z. Tu, and B. Luo, "Alignment-free RGB-T salient object detection: A large-scale dataset and progressive correlation network," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 7780–7788.