Cai: Caption-sensitive attention intervention for mitigating object hallucination in large vision-language models

Li, Q · 2025 · arXiv 2506.23590

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

cs.CV · 2026-04-28 · conditional · novelty 7.0

Prefill-Time Intervention (PTI) reduces hallucinations in large vision-language models by applying a one-time modality-aware steering correction to the initial KV cache at the prefill stage rather than during autoregressive decoding.

CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

CAST reduces object hallucination in LVLMs by 6.03% on average across five models and five benchmarks by identifying caption-sensitive attention heads and applying optimized steering directions to their outputs, with negligible added inference cost.

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

GEASS adaptively gates and weights self-generated captions in VLMs using confidence, entropy reduction, and pathway disagreement to reduce hallucination and improve benchmark scores.

citing papers explorer

Showing 3 of 3 citing papers.

Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

cs.CV · 2026-04-28 · conditional · none · ref 20

CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

cs.CV · 2026-05-06 · unverdicted · none · ref 88

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

cs.CV · 2026-05-03 · unverdicted · none · ref 5
