Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

Caifeng Shan; Haonan Qin; Jiong Zhang; Xiao Song; Yuqi Fang; Zhaoxu Zhang

arxiv: 2606.28520 · v1 · pith:KIC65EOKnew · submitted 2026-06-26 · 💻 cs.CV · cs.CL


Xiao Song, Haonan Qin, Zhaoxu Zhang, Jiong Zhang, Yuqi Fang, Caifeng Shan

Pith reviewed 2026-06-30 01:33 UTC · model grok-4.3

classification 💻 cs.CV cs.CL

keywords: hallucination detection, vision-language models, clinical imaging, visual grounding, counterfactual perturbation, uncertainty estimation, medical AI


The pith

A framework audits arbitrary responses from clinical vision-language models by grounding extracted entities and scoring uncertainty through counterfactual image perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method to detect hallucinations (textual claims not supported by the image) in large vision-language models used for medical image interpretation. It extracts verifiable entities from any model response, localizes them on the input image with a domain-adapted grounding verifier, then perturbs those entities to create counterfactual versions and measures how much the localization confidence shifts. The resulting uncertainty score combines factual confidence, counterfactual confidence, and spatial overlap to decide whether an entity is hallucinated. This matters because LVLMs are increasingly used in clinical settings yet can generate unsupported findings, and the approach needs neither internal access to nor fine-tuning of the target model. Experiments across several imaging modalities and model backbones report gains over prior detection baselines, along with localization maps and transfer across models.
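As a rough illustration of the first step, entity extraction, here is a minimal Python sketch. The summary does not say how entities are extracted, so the vocabulary `FINDING_LEXICON`, the laterality prefixes, and the crude "no " negation check are all hypothetical stand-ins; a real system would more likely use a clinical NER model or an LLM prompt.

```python
import re

# Hypothetical vocabulary of visually verifiable findings; the paper's
# actual extractor and vocabulary are not described in this summary.
FINDING_LEXICON = [
    "pleural effusion",
    "cardiomegaly",
    "pneumothorax",
    "consolidation",
    "nodule",
]

# Optional laterality prefix, kept with the entity so it can be perturbed later.
LATERALITY = r"(?:left|right|bilateral)\s+"


def extract_entities(response: str) -> list[str]:
    """Return lexicon findings asserted in an LVLM response, skipping any
    mention directly preceded by "no " (a crude negation check)."""
    text = response.lower()
    entities = []
    for finding in FINDING_LEXICON:
        pattern = rf"\b(?:{LATERALITY})?{re.escape(finding)}\b"
        for match in re.finditer(pattern, text):
            if text[: match.start()].endswith("no "):
                continue  # negated mention: there is nothing to localize
            entities.append(match.group(0))
    return entities


print(extract_entities("Small left pleural effusion. No pneumothorax. Mild cardiomegaly."))
# ['left pleural effusion', 'cardiomegaly']
```

Entities come back in lexicon order rather than response order, which is harmless here because each one is grounded independently.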

Core claim

The central claim is that entity-level hallucination decisions can be made by computing a visual evidence uncertainty score that contrasts factual grounding results against those obtained after counterfactual entity perturbation; the score is formed from positive confidence, counterfactual confidence, and their grounding overlap, and this procedure yields improved detection performance, interpretable localization, and cross-model transfer without requiring changes to the target LVLM.
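The claim names the three ingredients of the score but not how they are combined. The sketch below assumes one plausible, hypothetical combination. Uncertainty rises when the factual confidence `s_pos` is low. It also rises when the counterfactual entity is grounded confidently (`s_neg` high) in roughly the same place, measured as high IoU between the two boxes. That pattern suggests the verifier is responding to anatomy rather than to the claimed finding.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evidence_uncertainty(s_pos: float, s_neg: float, box_pos: Box, box_neg: Box) -> float:
    """Hypothetical uncertainty in [0, 1] built from the three named ingredients:
    factual confidence s_pos, counterfactual confidence s_neg, and box overlap."""
    weak_factual = 1.0 - s_pos                    # verifier unsure the entity is there
    spurious = s_neg * box_iou(box_pos, box_neg)  # counterfactual grounded in the same spot
    return 0.5 * (weak_factual + spurious)        # 0.5 keeps the result in [0, 1]
```

For example, a confidently grounded entity whose counterfactual lands elsewhere scores near 0. An entity whose counterfactual is grounded just as well in the same box scores around 0.5 or higher. The equal weighting is an arbitrary choice.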

What carries the argument

Counterfactual visual grounding uncertainty: the mechanism that extracts entities, localizes them factually and after perturbation, then derives an uncertainty score for binary hallucination classification.
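Turning the score into a binary decision requires a cut-off, and the summary does not say how the paper sets one. One common option, assumed here purely for illustration, is to pick the threshold that maximizes Youden's J (true-positive rate minus false-positive rate) on a small labeled calibration set.

```python
def choose_threshold(scores: list[float], labels: list[int]) -> float:
    """Return the score cut-off that maximizes Youden's J = TPR - FPR on a
    labeled calibration set (label 1 = hallucinated, 0 = supported)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("calibration set needs both classes")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def is_hallucinated(score: float, threshold: float) -> bool:
    """Binary decision: flag the entity when its uncertainty reaches the cut-off."""
    return score >= threshold
```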

If this is right

The method improves hallucination detection performance over recent baselines on multiple medical imaging modalities and LVLM backbones.
It supplies interpretable localization evidence for each detected hallucination.
It exhibits strong cross-model transferability without retraining the target LVLM.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same grounding-plus-counterfactual pattern could be tested on non-clinical image domains where entity localization verifiers already exist.
If the uncertainty scores prove stable under different verifiers, the framework might support auditing pipelines that swap in new grounding models as they improve.
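One way to check that stability, assuming per-entity scores are available from two different grounding verifiers on the same responses, is a rank correlation between the two score lists. The sketch computes Spearman's rho from scratch: tied values get average ranks, then Pearson correlation is taken on the ranks. Values near 1 would mean the two verifiers order entities in much the same way.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho between two verifiers' scores for the same entities."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / math.sqrt(var_a * var_b)
```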

Load-bearing premise

The domain-adapted grounding verifier accurately localizes entities taken from arbitrary LVLM responses on clinical images.

What would settle it

A held-out test set of clinical images with known hallucinated versus supported entities where the uncertainty scores fail to separate the two groups at rates better than chance.
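The "better than chance" criterion maps naturally onto AUROC, where 0.5 means chance-level separation. This sketch computes it directly as the Mann-Whitney probability that a hallucinated entity outscores a supported one. The pairwise loop is O(n·m) and is chosen for clarity, not speed.

```python
def auroc(hallucinated: list[float], supported: list[float]) -> float:
    """Probability that a randomly chosen hallucinated entity has a higher
    uncertainty score than a randomly chosen supported one (ties count 1/2).
    0.5 is chance-level separation; 1.0 is perfect separation."""
    if not hallucinated or not supported:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for h in hallucinated:
        for s in supported:
            if h > s:
                wins += 1.0
            elif h == s:
                wins += 0.5
    return wins / (len(hallucinated) * len(supported))
```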

Figures

Figures reproduced from arXiv: 2606.28520 by Caifeng Shan, Haonan Qin, Jiong Zhang, Xiao Song, Yuqi Fang, Zhaoxu Zhang.

**Figure 1.** Comparison of hallucination detection paradigms. (a) Hidden state-based methods: access LVLM's internal hidden states. (b) External verifier-based methods: rely on external expert models. (c) Ours: identifies hallucinations by aligning responses to visual regions, improving interpretability via visual evidence.

**Figure 2.** Pipeline of the proposed Counterfactual-driven Visual Grounding Uncertainty Estimation method. ① Given a response $R$ from an arbitrary LVLM, we extract entities $E$ and construct counterfactual entities $\tilde{E}$ using radiological knowledge. ② Constructing the factual and counterfactual queries. ③ A trained grounding verifier predicts bounding boxes $(b_e^{+}, b_e^{-})$ and confidence scores $(s_e^{+}, s_e^{-})$. ④ Uncertain…

**Figure 3.** Ablation studies on hallucination detection.

**Figure 4.** Visualization of hallucination detection. (a) Non-hallucination case where factual branch detects grounded visual evidence and makes correct decision. (b) Hallucination case missed by factual branch but successfully detected by our algorithm. This enables robust uncertainty estimation, effectively distinguishing hallucinations by suppressing spurious visual alignments from a single factual branch.
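Step ① of the Figure 2 pipeline builds counterfactual entities from radiological knowledge, and step ② turns both versions into grounding queries. The captions do not spell out the rules, so the swap tables and prompt template below are hypothetical. Laterality is flipped when present. Otherwise the finding is swapped for a mutually exclusive alternative, and entities with no applicable rule are skipped.

```python
from typing import Optional

# Hypothetical perturbation rules standing in for "radiological knowledge".
LATERALITY_SWAP = {"left": "right", "right": "left"}
FINDING_SWAP = {
    "pleural effusion": "pneumothorax",
    "pneumothorax": "pleural effusion",
    "cardiomegaly": "normal heart size",
}

# Hypothetical grounding prompt; the verifier's real prompt format is not given.
QUERY_TEMPLATE = "Ground the phrase '{}' in this image."


def make_counterfactual(entity: str) -> Optional[str]:
    """Flip laterality if the entity has one; otherwise swap the finding.
    Returns None when no rule applies."""
    words = entity.split()
    if words and words[0] in LATERALITY_SWAP:
        return " ".join([LATERALITY_SWAP[words[0]], *words[1:]])
    return FINDING_SWAP.get(entity)


def build_queries(entity: str) -> Optional[tuple[str, str]]:
    """Return (factual query, counterfactual query), or None if unperturbable."""
    counterfactual = make_counterfactual(entity)
    if counterfactual is None:
        return None
    return QUERY_TEMPLATE.format(entity), QUERY_TEMPLATE.format(counterfactual)
```

Flipping laterality first keeps the finding fixed, so a confident counterfactual box then points to a location error rather than a wrong finding.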

Original abstract

Large vision-language models (LVLMs) are increasingly used for clinical image understanding, yet they remain vulnerable to hallucinations (producing textual findings or attributes not supported by the image). We present a vision-traceable hallucination detection framework that audits arbitrary LVLM responses via visual evidence grounding, requiring neither modification nor internal access to the hidden states of LVLMs. Given an LVLM response, we extract visually verifiable entities and use a medical-domain-adapted Qwen-VL grounding verifier to localize each entity on the input image. To enhance the robustness of our detection method, we introduce a counterfactual entity perturbation method and estimate visual evidence uncertainty by contrasting factual and counterfactual grounding results. Specifically, we compute an entity-level uncertainty score from the positive confidence, counterfactual confidence, and their grounding overlap for binary hallucination decision-making. Experiments on multiple medical imaging modalities and LVLM backbones demonstrate that our method consistently improves hallucination detection performance over recent baselines, while providing interpretable localization evidence and strong cross-model transferability. Code and dataset are available at https://github.com/Agentic-CliniAI/CounterVHD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a no-access hallucination detector for clinical LVLMs via entity grounding plus counterfactual perturbation, but the verifier step lacks independent checks.


The core idea here is a post-hoc detector that pulls entities from any LVLM output, runs them through a medical Qwen-VL grounding verifier, and contrasts factual versus perturbed localizations to produce an uncertainty score for binary hallucination calls.

What is new is the explicit use of counterfactual entity perturbation to turn factual confidence, counterfactual confidence, and their grounding overlap into that uncertainty metric. The no-internal-access requirement and the claim of cross-model transfer are practical for clinical settings where you cannot touch the deployed model.

The abstract states that the method improves over recent baselines on multiple imaging modalities and backbones while also supplying localization evidence. Code and dataset release is noted, which helps.

The main soft spot is the missing validation of the grounding verifier itself. The method depends on that Qwen-VL component accurately localizing whatever entities the target LVLMs happen to emit, yet the abstract supplies no IoU, pointing accuracy, or other benchmark on clinical phrases. If the verifier errs on composite or rare terms, both the uncertainty scores and the transferability results rest on an untested piece. The abstract also gives no numbers, dataset sizes, or statistical details, so the performance claims cannot be weighed yet.

This is for groups working on safe deployment of medical vision-language models who need auditing tools that do not require model internals. A reader focused on practical hallucination mitigation would find the framework worth examining once the numbers and verifier checks are in place.

I would send it to peer review. The approach is concrete and the limitation is fixable with added experiments.

Referee Report

1 major / 1 minor

Summary. The paper presents a black-box hallucination detection framework for LVLMs on clinical images. Given an LVLM response, entities are extracted and localized on the input image by a medical-domain-adapted Qwen-VL grounding verifier. A counterfactual entity perturbation is applied to produce factual and counterfactual grounding maps; an entity-level uncertainty score is then computed from positive confidence, counterfactual confidence, and grounding overlap to yield binary hallucination decisions. The authors claim consistent gains over recent baselines across multiple medical imaging modalities and LVLM backbones, plus interpretable localization evidence and strong cross-model transferability. Code and dataset are released.

Significance. If the central claims hold, the work supplies a practical, model-agnostic auditing tool that does not require hidden-state access—an important capability for safe clinical deployment of LVLMs. The counterfactual contrast and grounding-based uncertainty metric constitute a distinct technical contribution relative to prior logit- or embedding-based detectors. Releasing code and data supports reproducibility.

major comments (1)

[Abstract / Method] Abstract and method description: the binary hallucination decisions and the claimed cross-model transferability rest on the accuracy of the medical-adapted Qwen-VL grounding verifier when applied to entities extracted from arbitrary target LVLM responses. No independent localization benchmark (IoU, pointing accuracy, or similar) is reported for this verifier on the precise distribution of clinical entities that appear in the evaluated responses.
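A minimal version of the requested verifier check is a pointing-style accuracy, assuming boxes in (x1, y1, x2, y2) form and one manually annotated box per entity. A hit is counted when the centre of the verifier's predicted box falls inside the annotated box. This adapts the pointing game, which normally uses the peak of a saliency map, to box outputs. It would complement, not replace, an IoU report.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pointing_accuracy(predicted: list[Box], annotated: list[Box]) -> float:
    """Fraction of entities whose predicted-box centre lies inside the
    manually annotated box for the same entity."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("need one annotated box per prediction")
    hits = 0
    for (px1, py1, px2, py2), (gx1, gy1, gx2, gy2) in zip(predicted, annotated):
        cx, cy = (px1 + px2) / 2, (py1 + py2) / 2
        if gx1 <= cx <= gx2 and gy1 <= cy <= gy2:
            hits += 1
    return hits / len(predicted)
```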

minor comments (1)

[Abstract] The abstract supplies no quantitative metrics, dataset sizes, statistical tests, or ablation results, which prevents readers from assessing the magnitude or reliability of the reported improvements.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The single major comment raises a valid point about the grounding verifier's accuracy, which we address below with a commitment to strengthen the paper.


Referee: [Abstract / Method] Abstract and method description: the binary hallucination decisions and the claimed cross-model transferability rest on the accuracy of the medical-adapted Qwen-VL grounding verifier when applied to entities extracted from arbitrary target LVLM responses. No independent localization benchmark (IoU, pointing accuracy, or similar) is reported for this verifier on the precise distribution of clinical entities that appear in the evaluated responses.

Authors: We agree that the absence of a direct, independent localization benchmark for the medical-adapted Qwen-VL verifier on entities drawn from the target LVLM responses is a limitation. While the verifier was domain-adapted and the end-to-end hallucination detection gains (plus cross-model transfer) provide indirect support for its utility, a standalone evaluation (e.g., IoU or pointing accuracy on annotated clinical entities) would more rigorously substantiate the claims. In the revised manuscript we will add such a benchmark: we will manually annotate a representative subset of entities extracted from the evaluated responses across modalities and report localization metrics for the verifier. This addition will also clarify the basis for the reported cross-model transferability. Revision: yes.

Circularity Check

0 steps flagged

No circularity; method uses external verifier and contrastive perturbation


The derivation extracts entities, applies an external medical-adapted Qwen-VL grounding verifier, generates counterfactual perturbations, and computes an uncertainty score from positive/counterfactual confidence plus overlap. None of these steps reduce by definition or self-citation to the hallucination labels being evaluated; the chain remains independent of the target data and does not invoke load-bearing self-citations or fitted-input predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim depends on the accuracy of an external medical grounding model and the validity of counterfactual perturbation for uncertainty; no explicit free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: a medical-domain-adapted Qwen-VL model can reliably localize extracted entities in clinical images.
The entire detection pipeline, as the abstract describes it, rests on this verifier's performance.

pith-pipeline@v0.9.1-grok · 5745 in / 1286 out tokens · 35428 ms · 2026-06-30T01:33:28.793812+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 6 canonical work pages · 5 internal anchors

[1] Bai, S., Cai, Y., Chen, R., et al.: Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025)

[2] Chen, J., Yang, D., Wu, T., Jiang, Y., Hou, X., Li, M., Wang, S., Xiao, D., Li, K., Zhang, L.: Detecting and evaluating medical hallucinations in large vision language models. arXiv preprint arXiv:2406.10185 (2024)

[3] Chen, X., Wang, C., Xue, Y., Zhang, N., Yang, X., Li, Q., Shen, Y., Liang, L., Gu, J., Chen, H.: Unified hallucination detection for multimodal large language models. In: ACL. pp. 3235–3252 (2024)

[4] Cheng, J., Fu, B., Ye, J., Wang, G., Li, T., Wang, H., Li, R., Yao, H., Cheng, J., Li, J., et al.: Interactive medical image segmentation: A benchmark dataset and baseline. In: CVPR. pp. 20841–20851 (2025)

[5] Hardy, R., Kim, S.E., Rajpurkar, P., et al.: Rextrust: A model for fine-grained hallucination detection in AI-generated radiology reports. In: AAAI Bridge Program on AI for Medicine and Healthcare. pp. 173–182 (2025)

[6] Jing, L., Li, R., Chen, Y., Du, X.: Faithscore: Fine-grained evaluations of hallucinations in large vision-language models. In: Findings of EMNLP. pp. 5042–5063 (2024)

[7] Khanal, B., Pokhrel, S., Bhandari, S., Rana, R., Shrestha, N., Gurung, R.B., Linte, C., Watson, A., Shrestha, Y.R., Bhattarai, B.: Hallucination-aware multimodal benchmark for gastrointestinal image analysis with large vision-language models. In: MICCAI. pp. 235–245. Springer (2025)

[8] Li, Q., Geng, J., Lyu, C., Zhu, D., Panov, M., Karray, F.: Reference-free hallucination detection for large vision-language models. In: Findings of EMNLP. pp. 4542–4551 (2024)

[9] Liao, Z., Hu, S., Zou, K., Fu, H., Zhen, L., Xia, Y.: Vision-amplified semantic entropy for hallucination detection in medical visual question answering. In: MICCAI. pp. 669–679. Springer (2025)

[10] Liao, Z., Hu, S., Zou, K., Jin, M., Zhang, Y., Fu, H., Zhen, L., Xia, Y.: Univrse: Unified vision-conditioned response semantic entropy for hallucination detection in medical vision-language models. arXiv preprint arXiv:2503.20504 (2025)

[11] OpenAI: Introducing GPT-5 (Aug 2025), https://openai.com/zh-Hans-CN/index/introducing-gpt-5

[12] Sellergren, A., Kazemzadeh, S., Jaroensri, T., Kiraly, A., Traverse, M., Kohlberger, T., Xu, S., Jamil, F., Hughes, C., Lau, C., et al.: Medgemma technical report. arXiv preprint arXiv:2507.05201 (2025)

[13] Song, X., Liu, J., Liu, Y., Li, Y., Lei, W., Wang, R.: Rethinking radiology report generation via causal inspired counterfactual augmentation. In: ACM BCB. pp. 1–10 (2024)

[14] Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

[15] Wang, J., Wang, Y., Xu, G., Zhang, J., Gu, Y., Jia, H., Wang, J., Xu, H., Yan, M., Zhang, J., et al.: Amber: An llm-free multi-dimensional benchmark for mllms hallucination evaluation. arXiv preprint arXiv:2311.07397 (2023)

[16] xAI: Grok 4 (Jul 2025), https://x.ai/news/grok-4

[17] Xiao, W., Huang, Z., Gan, L., He, W., Li, H., Yu, Z., Shu, F., Jiang, H., Zhu, L.: Detecting and mitigating hallucination in large vision language models via fine-grained ai feedback. In: AAAI. vol. 39, pp. 25543–25551 (2025)

[18] Zou, K., Bai, Y., Liu, B., Chen, Y., Chen, Z., Zhou, Y., Yuan, X., Wang, M., Shen, X., Cao, X., et al.: Uncertainty-aware medical diagnostic phrase identification and grounding. IEEE Trans. Pattern Anal. Mach. Intell. (2025)
