The hidden life of tokens: Reducing hallucination of large vision-language models via visual information steering. arXiv preprint arXiv:2502.03628, 2025.

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1

citation-polarity summary

background: 1 · baseline: 1

representative citing papers

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0 · 3 refs

GEASS is a logit-level gating module that selectively trusts generated captions in VLMs per query by combining clean-path confidence, entropy reduction, and pathway disagreement, improving results on POPE and HallusionBench across four models.

Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models

cs.CV · 2026-04-28 · conditional · novelty 7.0

Prefill-Time Intervention (PTI) reduces hallucinations in large vision-language models by applying a one-time modality-aware steering correction to the initial KV cache at the prefill stage rather than during autoregressive decoding.

Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

TLVS mitigates hallucinations in LVLMs via token-level extraction and visual-sensitivity-adaptive steering applied only at critical decoding steps.

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

cs.CV · 2026-05-07 · unverdicted · novelty 6.0 · 4 refs

Decoder-based VLMs over-align visual embeddings to the text manifold, causing linguistic bias in the top principal components (PCs) of a universal text subspace; projecting out this subspace reduces hallucinations on POPE, CHAIR, and AMBER and improves CLAIR.

Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

RUDDER creates a persistent visual anchor by extracting CARD from prefill residuals and modulating its injection via an adaptive Beta Gate, cutting CHAIR_S by 24.4% and CHAIR_I by 23.6% on average across LLaVA, Idefics2, InstructBLIP, and Qwen2.5-VL with >96% throughput.

FADE: Mitigating Hallucinations by Reducing Language-Prior Dominance in Large Vision-Language Models

cs.AI · 2026-06-28 · unverdicted · novelty 5.0 · 2 refs

FADE attenuates FFN outputs at critical layers in LVLMs to curb language-prior dominance and cut hallucinations, shown effective on POPE, CHAIR, and MME across three models.

MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

MultiToP mitigates hallucinations in video multimodal models by training a Visual Token Patcher with information-guided rank calibration to selectively replace unreliable tokens, yielding a 50.60% F1 gain on Vript-HAL and an 18.58% accuracy gain on ActivityNet-QA.

Not Blind but Silenced: Rebalancing Vision and Language via Adversarial Counter-Commonsense Equilibrium

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

ACE uses adversarial counter-commonsense perturbations on image tokens during decoding to suppress hallucinated linguistic priors while preserving stable visual signals in MLLMs.

Hallucination of Multimodal Large Language Models: A Survey

cs.CV · 2024-04-29 · accept · novelty 5.0

The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

cs.AI · 2026-05-24 · unverdicted · novelty 4.0

A training-free region-aware attention recalibration strategy reduces object hallucinations in LVLMs on CHAIR, POPE, and MME benchmarks while preserving fluency.

From Weights to Activations: Is Steering the Next Frontier of Adaptation?

cs.CL · 2026-04-15 · unverdicted · novelty 4.0

Steering is positioned as a distinct adaptation paradigm that uses targeted activation interventions for local, reversible behavioral changes without parameter updates.

citing papers explorer

Showing 9 of 9 citing papers after filters.

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models
cs.CV · 2026-05-03 · unverdicted · none · ref 5 · 3 links

Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models
cs.CV · 2026-04-28 · conditional · none · ref 23

Steer Where It Matters: Token-Level Visual-Sensitivity Steering for LVLMs Hallucination Mitigation
cs.CV · 2026-06-02 · unverdicted · none · ref 10

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
cs.CV · 2026-05-07 · unverdicted · none · ref 10 · 4 links

FADE: Mitigating Hallucinations by Reducing Language-Prior Dominance in Large Vision-Language Models
cs.AI · 2026-06-28 · unverdicted · none · ref 30 · 2 links

MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models
cs.CV · 2026-06-10 · unverdicted · none · ref 36

Not Blind but Silenced: Rebalancing Vision and Language via Adversarial Counter-Commonsense Equilibrium
cs.CV · 2026-05-11 · unverdicted · none · ref 19

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration
cs.AI · 2026-05-24 · unverdicted · none · ref 19

From Weights to Activations: Is Steering the Next Frontier of Adaptation?
cs.CL · 2026-04-15 · unverdicted · none · ref 11
