YARD is a training-free method using Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 5
years
2026 5
verdicts
UNVERDICTED 5
citing papers explorer
- YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models
  YARD is a training-free method using Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.
- ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs
  ADAPT reduces MLLM hallucinations 40-60% by aligning cross-attention dynamics via visual anchors, supervised inference, and preference tuning while preserving general capabilities.
- MLLMs Get It Right, Then Get It Wrong: Tracing and Correcting Late-Layer Textual Bias
  MLLMs show late-layer textual override of correct visual predictions, with a directional signature enabling a simple inference-time recovery method that improves conflict benchmarks by up to 9.4%.
- Reliability-Prioritized Fine-Grained Generation in Multimodal Large
  Proposes GranFact benchmark with coarse-to-fine annotations and a DPO variant that penalizes unreliable fine-grained claims to improve reliable specificity in MLLM outputs.
- Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding
  Fox detects risky attention heads in LVLMs using visual attention entropy and severs hallucination shortcuts via numerical logit saturation and conflict-gated decoding, outperforming prior methods by 29.1%.