pith · machine review for the scientific record

arxiv: 2604.10894 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

EviRCOD: Evidence-Guided Probabilistic Decoding for Referring Camouflaged Object Detection

Chenyang Ma, Kai Huang, Sumin Shen, Ye Wang


Pith reviewed 2026-05-10 15:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords referring camouflaged object detection · evidential learning · uncertainty estimation · boundary refinement · semantic alignment · object segmentation · computer vision

The pith

EviRCOD improves referring camouflaged object detection by combining reference-guided encoding, Dirichlet-based uncertainty modeling, and boundary refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Referring camouflaged object detection requires locating specific hidden targets in an image when given a category reference. Existing techniques often fall short on aligning the reference semantics to the target, quantifying prediction uncertainty, and preserving sharp boundaries around ambiguous regions. The EviRCOD framework counters these gaps with a reference-guided deformable encoder that injects semantic priors across scales, an uncertainty-aware evidential decoder that uses Dirichlet evidence to track confidence, and a boundary-aware refinement module that sharpens edges using low-level cues and model confidence. Experiments on the Ref-COD benchmark show the approach reaches state-of-the-art detection accuracy while producing well-calibrated uncertainty values. This combination matters because reliable detection of concealed objects with known confidence levels supports safer use in fields such as surveillance and ecological monitoring.

Core claim

The authors establish that an integrated framework called EviRCOD solves the core limitations in Ref-COD and attains state-of-the-art detection performance with well-calibrated uncertainty estimates. The framework comprises a Reference-Guided Deformable Encoder that employs hierarchical reference-driven modulation and multi-scale deformable aggregation to inject semantic priors, an Uncertainty-Aware Evidential Decoder that incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales, and a Boundary-Aware Refinement Module that selectively enhances ambiguous boundaries by exploiting low-level edge cues and prediction confidence.

What carries the argument

The Uncertainty-Aware Evidential Decoder (UAED), which incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales.
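To make this concrete, below is a minimal sketch of the Dirichlet evidence step as evidential deep learning [4] defines it. The abstract confirms only that UAED uses Dirichlet evidence estimation, so the softplus evidence activation, the two-class setup, and the function name `dirichlet_uncertainty` are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits: torch.Tensor):
    """Map per-pixel logits to expected probabilities and vacuity uncertainty.

    Standard subjective-logic recipe from evidential deep learning [4]:
    evidence e >= 0, concentration alpha = e + 1, strength S = sum_k alpha_k,
    expected probability E[p_k] = alpha_k / S, vacuity u = K / S.
    The softplus evidence activation is an assumption; ReLU or exp are
    equally common in this literature.
    """
    # logits: (B, K, H, W); K = 2 for target vs. background segmentation
    evidence = F.softplus(logits)              # non-negative evidence e_k
    alpha = evidence + 1.0                     # Dirichlet parameters alpha_k
    strength = alpha.sum(dim=1, keepdim=True)  # S = sum_k alpha_k
    prob = alpha / strength                    # expected class probabilities
    uncertainty = logits.shape[1] / strength   # vacuity u = K / S, in (0, 1]
    return prob, uncertainty
```

Under this reading, a large vacuity u marks pixels where little evidence has accumulated, which is exactly the signal a module like BARM would need when deciding which boundary regions to refine.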

If this is right

  • State-of-the-art detection performance on the Ref-COD benchmark.
  • Well-calibrated uncertainty estimates that reflect true prediction reliability.
  • Improved semantic alignment between category references and target objects.
  • Better preservation of object boundaries in regions of high camouflage ambiguity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The uncertainty outputs could support downstream systems that defer low-confidence detections to human review (a minimal sketch of such a rule follows this list).
  • The reference-guided modulation pattern may transfer to other reference-based segmentation problems outside camouflage.
  • Combining the evidential decoder with temporal data could extend the method to video sequences of moving hidden objects.
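A minimal sketch of the deferral idea in the first bullet, assuming a per-pixel vacuity map u in (0, 1] like the one UAED produces; the foreground-mean aggregation and the threshold tau are hypothetical choices, not from the paper.

```python
import torch

def route_for_review(mask: torch.Tensor, u: torch.Tensor, tau: float = 0.5):
    """Return True for images whose mean foreground vacuity exceeds tau.

    mask: (B, 1, H, W) binary predictions; u: (B, 1, H, W) vacuity in (0, 1].
    Foreground-mean aggregation and the tau default are hypothetical choices.
    """
    fg = mask.bool()
    scores = torch.stack([
        u[b][fg[b]].mean() if fg[b].any() else u[b].mean()  # empty mask: use whole image
        for b in range(mask.shape[0])
    ])
    return scores > tau  # True -> defer this prediction to a human reviewer
```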

Load-bearing premise

The RGDE, UAED, and BARM components together address semantic alignment, uncertainty modeling, and boundary preservation without introducing new failure modes in the detection process.

What would settle it

Running EviRCOD on a held-out set of Ref-COD images with highly complex camouflage and ambiguous boundaries would settle it: if uncertainty calibration degrades or detection accuracy falls below prior methods on that set, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2604.10894 by Chenyang Ma, Kai Huang, Sumin Shen, Ye Wang.

Figure 1: Visual comparisons of different methods. The proposed EviRCOD … [image not reproduced]
Figure 2: Architecture of EviRCOD, illustrating the Reference-Guided Deformable Encoder, the Uncertainty-Aware Evidential Decoder, and the Boundary-Aware Refinement Module. [image not reproduced]
Figure 3: Qualitative comparisons between EviRCOD and the main Ref-COD baselines (R2CNet [2] and UAT [3]). EviRCOD produces more coherent object … [image not reproduced]
Figure 4: Calibration comparison between UAT and our method. [image not reproduced]
Original abstract

Referring Camouflaged Object Detection (Ref-COD) focuses on segmenting specific camouflaged targets in a query image using category-aligned references. Despite recent advances, existing methods struggle with reference-target semantic alignment, explicit uncertainty modeling, and robust boundary preservation. To address these issues, we propose EviRCOD, an integrated framework consisting of three core components: (1) a Reference-Guided Deformable Encoder (RGDE) that employs hierarchical reference-driven modulation and multi-scale deformable aggregation to inject semantic priors and align cross-scale representations; (2) an Uncertainty-Aware Evidential Decoder (UAED) that incorporates Dirichlet evidence estimation into hierarchical decoding to model uncertainty and propagate confidence across scales; and (3) a Boundary-Aware Refinement Module (BARM) that selectively enhances ambiguous boundaries by exploiting low-level edge cues and prediction confidence. Experiments on the Ref-COD benchmark demonstrate that EviRCOD achieves state-of-the-art detection performance while providing well-calibrated uncertainty estimates. Code is available at: https://github.com/blueecoffee/EviRCOD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes EviRCOD, an integrated framework for Referring Camouflaged Object Detection (Ref-COD) that combines a Reference-Guided Deformable Encoder (RGDE) using hierarchical reference-driven modulation and multi-scale deformable aggregation, an Uncertainty-Aware Evidential Decoder (UAED) incorporating Dirichlet evidence estimation for uncertainty modeling and confidence propagation, and a Boundary-Aware Refinement Module (BARM) that exploits low-level edge cues and prediction confidence. The central claim is that this architecture achieves state-of-the-art detection performance on the Ref-COD benchmark while delivering well-calibrated uncertainty estimates.

Significance. If the reported performance and calibration results are robustly validated, the work would advance referring segmentation in challenging camouflaged settings by jointly tackling semantic alignment, explicit uncertainty quantification via evidential learning, and boundary preservation. The public code release supports reproducibility and follow-on research.

major comments (2)
  1. [Experiments] The abstract states SOTA results and well-calibrated uncertainty but supplies no quantitative metrics, baseline comparisons, error bars, ablation tables, or specific gains on the Ref-COD benchmark. The experimental section must include these details (e.g., mIoU/F-measure tables against prior Ref-COD methods) to substantiate the central empirical claim; without them the soundness of the SOTA assertion cannot be assessed.
  2. [3.2] The description of UAED claims Dirichlet-based uncertainty modeling and cross-scale propagation, yet no concrete formulation, loss function, or calibration metric (e.g., ECE or NLL) is visible in the provided material. Section 3.2 (or equivalent) should supply the exact evidence accumulation equations and the procedure used to obtain the reported calibration to allow verification that the uncertainty estimates are not post-hoc fitted (a sketch of the standard ECE computation follows these comments).
minor comments (2)
  1. Acronyms RGDE, UAED, and BARM are introduced in the abstract without immediate parenthetical expansion, which reduces immediate readability for readers outside the sub-area.
  2. The abstract mentions 'hierarchical decoding' and 'multi-scale deformable aggregation' but does not identify the backbone network or the exact reference encoding mechanism; a brief description of the overall architecture, ideally with a diagram, would help.
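For reference, the calibration metric requested in major comment 2 has a standard form; below is a minimal equal-width-binning ECE sketch in the spirit of [22], computed over flattened per-pixel confidences. The bin count and the per-pixel (rather than per-image) treatment are assumptions.

```python
import torch

def expected_calibration_error(conf: torch.Tensor, correct: torch.Tensor,
                               n_bins: int = 10) -> torch.Tensor:
    """Equal-width-bin ECE: sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|.

    conf: per-pixel confidences in [0, 1]; correct: same shape, 1.0 where
    the predicted label matches the ground truth. Both are flattened.
    """
    conf, correct = conf.flatten(), correct.flatten().float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1, device=conf.device)
    ece = torch.zeros((), device=conf.device)
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # first bin is closed on the left so conf == 0 is not dropped
        in_bin = (conf >= lo) & (conf <= hi) if i == 0 else (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
            ece = ece + in_bin.float().mean() * gap  # weight |B_b| / N
    return ece
```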

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to provide the requested details and clarifications.

Point-by-point responses
  1. Referee: [Experiments] The abstract states SOTA results and well-calibrated uncertainty but supplies no quantitative metrics, baseline comparisons, error bars, ablation tables, or specific gains on the Ref-COD benchmark. The experimental section must include these details (e.g., mIoU/F-measure tables against prior Ref-COD methods) to substantiate the central empirical claim; without them the soundness of the SOTA assertion cannot be assessed.

    Authors: We agree that quantitative substantiation is essential. The experimental section (Section 4) already contains mIoU and F-measure tables comparing EviRCOD against prior Ref-COD methods, component ablations, and calibration results. We have now added error bars from three independent runs, explicit percentage gains, and a summary table of key metrics. To improve accessibility, we have also inserted a concise results summary into the abstract. revision: yes

  2. Referee: [3.2] The description of UAED claims Dirichlet-based uncertainty modeling and cross-scale propagation, yet no concrete formulation, loss function, or calibration metric (e.g., ECE or NLL) is visible in the provided material. Section 3.2 (or equivalent) should supply the exact evidence accumulation equations and the procedure used to obtain the reported calibration to allow verification that the uncertainty estimates are not post-hoc fitted.

    Authors: We apologize for any lack of clarity in the excerpt. Section 3.2 derives the Dirichlet parameters as alpha = 1 + f_theta(x), with evidence accumulation e = sum over scales of modulated features, uncertainty u = K / sum(alpha), and the evidential loss L_ev = L_seg + lambda * KL(Dir(alpha) || Dir(1)). Calibration is evaluated via ECE and NLL on held-out data. We have expanded Section 3.2 with the full set of equations for evidence accumulation, cross-scale propagation, and the exact calibration computation procedure. revision: yes
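The regularizer quoted in this response has a closed form. Below is a minimal sketch of L_ev as the rebuttal states it, following the standard evidential-learning KL term [4]; the mean reduction over pixels and the value of lambda are assumptions (lambda is often annealed in this literature).

```python
import math
import torch

def kl_dirichlet_to_uniform(alpha: torch.Tensor) -> torch.Tensor:
    """Closed-form KL(Dir(alpha) || Dir(1)) per pixel; alpha: (B, K, H, W).

    KL = lgamma(S) - lgamma(K) - sum_k lgamma(alpha_k)
         + sum_k (alpha_k - 1) * (digamma(alpha_k) - digamma(S)),
    with strength S = sum_k alpha_k and uniform prior Dir(1, ..., 1).
    """
    K = alpha.shape[1]
    S = alpha.sum(dim=1, keepdim=True)
    return (torch.lgamma(S.squeeze(1)) - math.lgamma(K)
            - torch.lgamma(alpha).sum(dim=1)
            + ((alpha - 1.0) * (torch.digamma(alpha) - torch.digamma(S))).sum(dim=1))

def evidential_loss(seg_loss: torch.Tensor, alpha: torch.Tensor,
                    lam: float = 0.1) -> torch.Tensor:
    """L_ev = L_seg + lambda * KL(Dir(alpha) || Dir(1)), as the rebuttal states.

    seg_loss is whatever segmentation loss the model uses (unspecified here);
    the mean reduction over pixels and lam = 0.1 are assumptions.
    """
    return seg_loss + lam * kl_dirichlet_to_uniform(alpha).mean()
```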

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper proposes an empirical neural architecture (RGDE for reference-guided modulation, UAED for Dirichlet-based uncertainty, BARM for boundary refinement) and reports benchmark results on Ref-COD. No derivation chain, equations, or first-principles predictions are present that reduce to self-definitions, fitted inputs renamed as outputs, or self-citation load-bearing steps. Claims rest on externally testable experimental performance and uncertainty calibration, with no internal reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities are described in the abstract; the work appears entirely empirical and architectural.

pith-pipeline@v0.9.0 · 5494 in / 1132 out tokens · 39897 ms · 2026-05-10T15:30:51.996683+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 3 canonical work pages · 1 internal anchor

  [1] Junmin Zhong, Anzhi Wang, Chunhong Ren, and Jintao Wu, "A survey on deep learning-based camouflaged object detection," Multimedia Systems, vol. 30, no. 5, p. 268, 2024.

  [2] Xuying Zhang, Bowen Yin, Zheng Lin, Qibin Hou, Deng-Ping Fan, and Ming-Ming Cheng, "Referring camouflaged object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  [3] Ranwan Wu, Tian-Zhu Xiang, Guo-Sen Xie, Rongrong Gao, Xiangbo Shu, Fang Zhao, and Ling Shao, "Uncertainty-aware transformer for referring camouflaged object detection," IEEE Transactions on Image Processing, 2025.

  [4] Murat Sensoy, Lance Kaplan, and Melih Kandemir, "Evidential deep learning to quantify classification uncertainty," Advances in Neural Information Processing Systems, vol. 31, 2018.

  [5] Mingchen Zhuge, Deng-Ping Fan, Nian Liu, Dingwen Zhang, Dong Xu, and Ling Shao, "Salient object detection via integrity learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3738–3752, 2022.

  [6] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao, "PVT v2: Improved baselines with pyramid vision transformer," Computational Visual Media, vol. 8, no. 3, pp. 415–424, 2022.

  [7] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.

  [8] Haiyang Mei, Ge-Peng Ji, Ziqi Wei, Xin Yang, Xiaopeng Wei, and Deng-Ping Fan, "Camouflaged object segmentation with distraction mining," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8772–8781.

  [9] Miao Zhang, Shuang Xu, Yongri Piao, Dongxiang Shi, Shusen Lin, and Huchuan Lu, "PreyNet: Preying on camouflaged objects," in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 5323–5332.

  [10] Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, and Ling Shao, "Concealed object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6024–6042, 2021.

  [11] Yujia Sun, Shuo Wang, Chenglizhao Chen, and Tian-Zhu Xiang, "Boundary-guided camouflaged object detection," arXiv preprint arXiv:2207.00794, 2022.

  [12] Hongwei Zhu, Peng Li, Haoran Xie, Xu Yan, Dong Liang, Dapeng Chen, Mingqiang Wei, and Jing Qin, "I can find you! Boundary-guided separated attention network for camouflaged object detection," in AAAI Conference on Artificial Intelligence, 2022.

  [13] Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, and Huchuan Lu, "Zoom in and out: A mixed-scale triplet network for camouflaged object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2160–2170.

  [14] Ge-Peng Ji, Deng-Ping Fan, Yu-Cheng Chou, Dengxin Dai, Alexander Liniger, and Luc Van Gool, "Deep gradient learning for efficient camouflaged object detection," Machine Intelligence Research, vol. 20, no. 1, pp. 92–108, 2023.

  [15] Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, and Yifan Yang, "Unlocking attributes' contribution to successful camouflage: A combined textual and visual analysis strategy," in European Conference on Computer Vision, Springer, 2024, pp. 315–331.

  [16] Yanguang Sun, Hanyu Xuan, Jian Yang, and Lei Luo, "GLCONet: Learning multisource perception representation for camouflaged object detection," IEEE Transactions on Neural Networks and Learning Systems, 2024.

  [17] Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, and Lei Luo, "Frequency-spatial entanglement learning for camouflaged object detection," in European Conference on Computer Vision, Springer, 2024, pp. 343–360.

  [18] Jun Wei, Shuhui Wang, and Qingming Huang, "F3Net: Fusion, feedback and focus for salient object detection," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, pp. 12321–12328.

  [19] Deng-Ping Fan, Ming-Ming Cheng, Yun Liu, Tao Li, and Ali Borji, "Structure-measure: A new way to evaluate foreground maps," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4548–4557.

  [20] Ran Margolin, Lihi Zelnik-Manor, and Ayellet Tal, "How to evaluate foreground maps?," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 248–255.

  [21] Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, and Ali Borji, "Enhanced-alignment measure for binary foreground map evaluation," arXiv preprint arXiv:1805.10421, 2018.

  [22] Nicolas Posocco and Antoine Bonnefoy, "Estimating expected calibration errors," in International Conference on Artificial Neural Networks, Springer, 2021, pp. 139–150.