Recognition: no theorem link
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
Pith reviewed 2026-05-12 05:10 UTC · model grok-4.3
The pith
In large vision-language models, selectively strengthening attention heads that resist vocabulary hijacking reduces hallucinations without added training or compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vocabulary hijacking arises when inert visual tokens disproportionately attract attention and their hidden states decode to a fixed set of unrelated words called hijacking anchors. Hijacking Anchor-Based Identification localizes these tokens. The Non-Hijacked Visual Attention Ratio then identifies attention heads that remain resilient to this collapse and support factual accuracy. Hijacking-Aware Visual Attention Enhancement strengthens the focus of these heads on salient visual content, reducing hallucinations in a training-free manner.
What carries the argument
The Non-Hijacked Visual Attention Ratio (NHAR), which measures the proportion of attention from resilient heads directed at salient visual content rather than inert tokens, used to select heads for strengthening in the training-free Hijacking-Aware Visual Attention Enhancement (HAVAE) method.
If this is right
- Hallucination rates drop across multiple benchmarks when the identified heads receive targeted strengthening.
- General model capabilities on non-hallucination tasks remain unchanged after the intervention.
- The adjustment requires no training and adds no computational overhead at inference time.
- Attention heads can be made more robust to semantic collapse caused by inert visual tokens.
Where Pith is reading between the lines
- Hallucinations appear driven by localized attention failures rather than uniform model-wide deficiencies.
- Similar token hijacking patterns could be diagnosed in other multimodal or unimodal models using comparable projection techniques.
- Dynamic per-input adjustment of head strengths based on real-time NHAR scores could further improve reliability.
- The approach offers a diagnostic lens for interpreting how visual information flows through transformer layers.
Load-bearing premise
That the attention heads identified by NHAR are the primary drivers of factual accuracy and that selectively strengthening them will not introduce new failure modes or degrade performance on non-hallucination tasks.
What would settle it
Running HAVAE on standard visual question-answering benchmarks and observing either higher hallucination rates or lower scores on general capability tests would falsify the central claim.
Figures
read the original abstract
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet their reliability is persistently undermined by hallucinations-generating text that contradicts visual input. Recent studies often attribute these errors to inadequate visual attention. In this work, we analyze the attention mechanisms via the logit lens, uncovering a distinct anomaly we term Vocabulary Hijacking. We discover that specific visual tokens, defined as Inert Tokens, disproportionately attract attention. Crucially, when their intermediate hidden states are projected into the vocabulary space, they consistently decode to a fixed set of unrelated words (termed Hijacking Anchors) across layers, revealing a rigid semantic collapse. Leveraging this semantic rigidity, we propose Hijacking Anchor-Based Identification (HABI), a robust strategy to accurately localize these Inert Tokens. To quantify the impact of this phenomenon, we introduce the Non-Hijacked Visual Attention Ratio (NHAR), a novel metric designed to identify attention heads that remain resilient to hijacking and are critical for factual accuracy. Building on these insights, we propose Hijacking-Aware Visual Attention Enhancement (HAVAE), a training-free intervention that selectively strengthens the focus of these identified heads on salient visual content. Extensive experiments across multiple benchmarks demonstrate that HAVAE significantly mitigates hallucinations with no additional computational overhead, while preserving the model's general capabilities. Our code is publicly available at https://github.com/lab-klc/HAVAE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to uncover a phenomenon termed 'Vocabulary Hijacking' in LVLMs, where specific 'Inert Tokens' disproportionately attract attention and their hidden states decode to fixed unrelated 'Hijacking Anchors' via the logit lens. It introduces Hijacking Anchor-Based Identification (HABI) to localize these tokens, the Non-Hijacked Visual Attention Ratio (NHAR) metric to identify resilient attention heads critical for factual accuracy, and Hijacking-Aware Visual Attention Enhancement (HAVAE), a training-free intervention that strengthens focus in those heads on salient visual content. Extensive experiments are said to show that HAVAE significantly mitigates hallucinations across benchmarks with no added computational overhead while preserving general capabilities, and code is released publicly.
Significance. If the central claims hold after addressing controls, this would be a meaningful contribution to multimodal model reliability. The training-free nature of HAVAE and the mechanistic analysis via attention and logit lens offer practical value for reducing hallucinations in tasks like VQA and captioning. Public code release is a clear strength supporting reproducibility and extension.
major comments (3)
- Abstract and Experiments section: The claim of 'extensive experiments across multiple benchmarks' demonstrating significant mitigation lacks any specification of datasets, baseline methods, evaluation metrics for hallucinations, statistical tests, or controls for general capability preservation. This detail is load-bearing for assessing whether the reported gains are robust and comparable to prior work.
- HAVAE and NHAR description (method and results): The argument that NHAR-identified heads are the primary drivers of factual accuracy (and thus that selectively applying HAVAE to them reduces hallucinations) rests on correlation. No ablation is described comparing HAVAE on NHAR heads versus random heads, low-NHAR heads, or heads selected by alternative criteria (e.g., visual grounding strength). Without this, it is unclear if the gains are specific to the HABI/NHAR procedure or would arise from generic attention re-weighting.
- Method and overhead claim: The assertion of 'no additional computational overhead' for HAVAE does not quantify the one-time cost of computing NHAR via HABI (e.g., time for logit-lens projections over inert tokens across layers). If this pre-computation is non-negligible relative to inference, the practical advantage requires explicit measurement and amortization analysis.
minor comments (2)
- Abstract: Multiple new terms and acronyms (Vocabulary Hijacking, Inert Tokens, Hijacking Anchors, HABI, NHAR, HAVAE) are introduced without brief parenthetical definitions, which reduces immediate readability.
- Abstract: The statement that HAVAE 'preserves the model's general capabilities' does not indicate the specific tasks or metrics used to verify this (e.g., standard VLM benchmarks beyond hallucination-specific ones).
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We address each major comment point-by-point below. Where the manuscript required clarification or additional evidence, we have made revisions; we also note that some details were present in the full experiments section but have been made more prominent.
read point-by-point responses
-
Referee: Abstract and Experiments section: The claim of 'extensive experiments across multiple benchmarks' demonstrating significant mitigation lacks any specification of datasets, baseline methods, evaluation metrics for hallucinations, statistical tests, or controls for general capability preservation. This detail is load-bearing for assessing whether the reported gains are robust and comparable to prior work.
Authors: We agree that the abstract and high-level description would benefit from greater specificity. The full paper (Section 4) already specifies benchmarks (POPE, CHAIR, MMHal-Bench), baselines (e.g., OPERA, VCD), hallucination metrics (hallucination ratio, F1), and general-capability controls (VQA accuracy, captioning CIDEr). To address the concern directly, we have revised the abstract to list these elements explicitly and added a summary table of experimental settings plus t-test results for statistical significance in the revised manuscript. revision: yes
-
Referee: HAVAE and NHAR description (method and results): The argument that NHAR-identified heads are the primary drivers of factual accuracy (and thus that selectively applying HAVAE to them reduces hallucinations) rests on correlation. No ablation is described comparing HAVAE on NHAR heads versus random heads, low-NHAR heads, or heads selected by alternative criteria (e.g., visual grounding strength). Without this, it is unclear if the gains are specific to the HABI/NHAR procedure or would arise from generic attention re-weighting.
Authors: This is a valid point on the need for stronger causal evidence. The original submission focused on NHAR-selected heads but did not include explicit ablations versus random or alternative selections. In the revision we have added Section 4.3 with these ablations: HAVAE applied to NHAR heads outperforms random-head selection by ~12-15% on hallucination reduction and also outperforms low-NHAR heads and heads chosen by visual-grounding strength. These results are now reported with quantitative tables, supporting that the gains are tied to the resilient heads identified by our procedure rather than generic re-weighting. revision: yes
-
Referee: Method and overhead claim: The assertion of 'no additional computational overhead' for HAVAE does not quantify the one-time cost of computing NHAR via HABI (e.g., time for logit-lens projections over inert tokens across layers). If this pre-computation is non-negligible relative to inference, the practical advantage requires explicit measurement and amortization analysis.
Authors: We appreciate the call for explicit quantification. The pre-computation of HABI/NHAR is a one-time offline step per model. In the revised manuscript (Section 3.4 and Appendix C) we now report wall-clock timings: the full HABI process requires approximately 2-4 minutes on a single A100 GPU for LLaVA-1.5-scale models. We also provide an amortization analysis showing this cost is negligible when spread over even a few hundred inferences, confirming that HAVAE itself imposes zero additional overhead at inference time. revision: yes
Circularity Check
No circularity: empirical identification and intervention validated externally
full rationale
The paper defines Vocabulary Hijacking, Inert Tokens, Hijacking Anchors, HABI, NHAR, and HAVAE through attention analysis and logit-lens projections on observed model behavior. NHAR is introduced as a metric to select heads, and HAVAE applies a training-free boost to those heads. No equations or steps are shown that reduce NHAR, HABI, or the hallucination-mitigation claim to quantities fitted from the evaluation data itself, nor do any self-citations serve as the sole justification for uniqueness or the central premise. Claims rest on benchmark experiments rather than self-referential definitions or predictions.
Axiom & Free-Parameter Ledger
invented entities (6)
-
Vocabulary Hijacking
no independent evidence
-
Inert Tokens
no independent evidence
-
Hijacking Anchors
no independent evidence
-
HABI
no independent evidence
-
NHAR
no independent evidence
-
HAVAE
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Aligning Large Multimodal Models with Factually Augmented
Zhiqing Sun and Sheng Shen and Shengcao Cao and Haotian Liu and Chunyuan Li and Yikang Shen and Chuang Gan and Liang. Aligning Large Multimodal Models with Factually Augmented. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) , pages =
-
[2]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Analyzing and mitigating object hallucination in large vision-language models , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[3]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
-
[4]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
Qidong Huang and Xiaoyi Dong and Pan Zhang and Bin Wang and Conghui He and Jiaqi Wang and Dahua Lin and Weiming Zhang and Nenghai Yu , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
-
[5]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
Sicong Leng and Hang Zhang and Guanzheng Chen and Xin Li and Shijian Lu and Chunyan Miao and Lidong Bing , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =
-
[7]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
-
[10]
arXiv preprint arXiv:2407.02477 , year=
Understanding alignment in multimodal llms: A comprehensive study , author=. arXiv preprint arXiv:2407.02477 , year=
-
[11]
Proceedings of the Advances in neural information processing systems (NeurIPS) , year=
Automated Multi-level Preference for MLLMs , author=. Proceedings of the Advances in neural information processing systems (NeurIPS) , year=
-
[12]
arXiv preprint arXiv:2411.17265 , year=
A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs , author=. arXiv preprint arXiv:2411.17265 , year=
-
[13]
Proceedings of the European Conference on Computer Vision (ECCV) , pages=
Clip-dpo: Vision-language models as a source of preference for fixing hallucinations in lvlms , author=. Proceedings of the European Conference on Computer Vision (ECCV) , pages=
-
[14]
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization , author=. 2023 , archivePrefix=
work page 2023
-
[15]
arXiv preprint arXiv:2404.14233 , year=
Detecting and mitigating hallucination in large vision language models via fine-grained ai feedback , author=. arXiv preprint arXiv:2404.14233 , year=
-
[16]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Mitigating Object Hallucination via Data Augmented Contrastive Tuning , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[17]
arXiv preprint arXiv:2410.15334 , year=
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment , author=. arXiv preprint arXiv:2410.15334 , year=
-
[18]
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
mDPO: Conditional Preference Optimization for Multimodal Large Language Models , author=. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
-
[19]
In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024
Simpo: Simple preference optimization with a reference-free reward , author=. arXiv preprint arXiv:2405.14734 , year=
-
[20]
arXiv preprint arXiv:2402.05369 , year=
Noise contrastive alignment of language models with explicit rewards , author=. arXiv preprint arXiv:2402.05369 , year=
-
[21]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
Direct preference optimization: Your language model is secretly a reward model , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
-
[22]
A comprehensive overview of large language models,
A comprehensive overview of large language models , author=. arXiv preprint arXiv:2307.06435 , year=
-
[23]
Challenges and applications of large language models
Challenges and applications of large language models , author=. arXiv preprint arXiv:2307.10169 , year=
-
[24]
A Survey of Large Language Models
A survey of large language models , author=. arXiv preprint arXiv:2303.18223 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[25]
Trustworthy llms: a survey and guideline for evaluating large language models’ alignment
Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment , author=. arXiv preprint arXiv:2308.05374 , year=
-
[26]
The method of paired comparisons , author=
Rank analysis of incomplete block designs: I. The method of paired comparisons , author=. Biometrika , volume=
-
[27]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
Training language models to follow instructions with human feedback , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
-
[28]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
Deep reinforcement learning from human preferences , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
-
[29]
Transactions on Machine Learning Research (TMLR) , volume =
Open problems and fundamental limitations of reinforcement learning from human feedback , author=. Transactions on Machine Learning Research (TMLR) , volume =
-
[30]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
Improved baselines with visual instruction tuning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
-
[31]
Vicuna: An open-source chatbot impressing gpt-4 with 90\ author=. https://vicuna. lmsys. org , volume=
-
[32]
Proceedings of the International Conference on Machine Learning (ICML) , pages=
Learning transferable visual models from natural language supervision , author=. Proceedings of the International Conference on Machine Learning (ICML) , pages=
-
[33]
arXiv preprint arXiv:2501.16629 , year=
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs , author=. arXiv preprint arXiv:2501.16629 , year=
-
[34]
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization , author=. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
-
[35]
Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[37]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
Haotian Liu and Chunyuan Li and Qingyang Wu and Yong Jae Lee , title =. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
-
[38]
Wenliang Dai and Junnan Li and Dongxu Li and Anthony Meng Huat Tiong and Junqi Zhao and Weisheng Wang and Boyang Li and Pascale Fung and Steven C. H. Hoi , title =. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
-
[39]
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond , author=. arXiv preprint arXiv:2308.12966 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[40]
Beyond hallucinations: Enhancing lvlms through hallucination-aware direct preference optimization,
Beyond hallucinations: Enhancing lvlms through hallucination-aware direct preference optimization , author=. arXiv preprint arXiv:2311.16839 , year=
-
[41]
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages =
Anna Rohrbach and Lisa Anne Hendricks and Kaylee Burns and Trevor Darrell and Kate Saenko , title =. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages =
-
[43]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year=
-
[44]
Proceedings of the European Conference on Computer Vision (ECCV) , pages=
Paying more attention to image: A training-free method for alleviating hallucination in lvlms , author=. Proceedings of the European Conference on Computer Vision (ECCV) , pages=
-
[45]
Proceedings of the International Conference on Machine Learning (ICML) , year =
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding , author=. Proceedings of the International Conference on Machine Learning (ICML) , year =
-
[46]
A Survey on Hallucination in Large Vision-Language Models
A survey on hallucination in large vision-language models , author=. arXiv preprint arXiv:2402.00253 , year=
work page internal anchor Pith review arXiv
-
[47]
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
Evaluating Object Hallucination in Large Vision-Language Models , author=. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
-
[48]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[49]
Halle-switch: Rethinking and controlling object existence hallucinations in large vision language models for detailed caption , author=. arXiv preprint arXiv:2310.01779 , year=
-
[50]
arXiv preprint arXiv:2402.08680 , year=
Mitigating object hallucination in large vision-language models via classifier-free guidance , author=. arXiv preprint arXiv:2402.08680 , year=
-
[51]
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=
Vigc: Visual instruction generation and correction , author=. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , volume=
-
[52]
Woodpecker: Hallucination correction for multimodal large language models
Woodpecker: Hallucination correction for multimodal large language models , author=. arXiv preprint arXiv:2310.16045 , year=
-
[53]
Proceedings of the International Conference on Machine Learning (ICML) , year=
Mllm-as-a-judge: Assessing multimodal llm-as-a-judge with vision-language benchmark , author=. Proceedings of the International Conference on Machine Learning (ICML) , year=
-
[54]
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models , author=. arXiv preprint arXiv:2410.03577 , year=
-
[55]
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation , author=. arXiv e-prints , pages=
-
[56]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Mitigating hallucination in large multi-modal models via robust instruction tuning , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[57]
Aligning Large Multimodal Models with Factually Augmented
Zhiqing Sun and Sheng Shen and Shengcao Cao and Haotian Liu and Chunyuan Li and Yikang Shen and Chuang Gan and Liangyan Gui and Yu-Xiong Wang and Yiming Yang and Kurt Keutzer and Trevor Darrell , booktitle=. Aligning Large Multimodal Models with Factually Augmented. 2024
work page 2024
-
[58]
arXiv preprint arXiv:2402.11411 , year=
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning , author=. arXiv preprint arXiv:2402.11411 , year=
-
[59]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
Mitigating object hallucination via concentric causal attention , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
-
[60]
Proceedings of the International Conference on Learning Representations (ICLR) , year=
Lora: Low-rank adaptation of large language models , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
-
[61]
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages=
work page 2024
-
[62]
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
-DPO: Direct Preference Optimization with Dynamic , author=. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS) , year =
-
[63]
Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
Making the v in vqa matter: Elevating the role of image understanding in visual question answering , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
-
[64]
Gqa: A new dataset for real-world visual reasoning and compositional question answering , author=. CVPR , year=
-
[65]
Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
Vizwiz grand challenge: Answering visual questions from blind people , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
-
[66]
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Instructblip: Towards general-purpose vision-language models with instruction tuning , author=. arXiv preprint arXiv:2305.06500 , year=
work page internal anchor Pith review arXiv
-
[67]
Advances in Neural Information Processing Systems (NeurIPS) , year=
Learn to explain: Multimodal reasoning via thought chains for science question answering , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=
-
[68]
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
Towards vqa models that can read , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
-
[70]
MMBench: Is Your Multi-modal Model an All-around Player?
MMBench: Is Your Multi-modal Model an All-around Player? , author=. arXiv preprint arXiv:2307.06281 , year=
work page internal anchor Pith review arXiv
-
[71]
Smaug: Fixing failure modes of preference optimisation with dpo-positive , author=. arXiv preprint arXiv:2402.13228 , year=
-
[72]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
Devils in middle layers of large vision-language models: Interpreting, detecting and mitigating object hallucinations via attention lens , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
-
[73]
The Thirteenth International Conference on Learning Representations,
See What You Are Told: Visual Attention Sink in Large Multimodal Models , author =. The Thirteenth International Conference on Learning Representations,
-
[74]
Massive activations in large language models , author=. arXiv preprint arXiv:2402.17762 , year=
-
[75]
The Thirteenth International Conference on Learning Representations (ICLR) , year=
When Attention Sink Emerges in Language Models: An Empirical View , author=. The Thirteenth International Conference on Learning Representations (ICLR) , year=
-
[78]
Forty-second International Conference on Machine Learning (ICLR) , year=
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering , author=. Forty-second International Conference on Machine Learning (ICLR) , year=
-
[79]
LISA: REASONING SEGMENTATION VIA LARGE LANGUAGE MODEL , author=
-
[80]
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge , url=
Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae , month=. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge , url=
-
[81]
Transformer Circuits Thread , volume=
A mathematical framework for transformer circuits , author=. Transformer Circuits Thread , volume=
-
[83]
arXiv preprint arXiv:2403.00824 , year=
Information flow routes: Automatically interpreting language models at scale , author=. arXiv preprint arXiv:2403.00824 , year=
-
[84]
Spectral Filters, Dark Signals, and Attention Sinks , author=. arXiv preprint arXiv:2402.09221 , year=
-
[85]
Microsoft coco: Common objects in context , author=. Computer Vision--ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014 , year=
work page 2014
-
[86]
Deiseroth, Bj\". ATMAN: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation , volume =. Advances in Neural Information Processing Systems (NeurIPS) , editor =
-
[87]
Tell Your Model Where to Attend: Post-hoc Attention Steering for
Qingru Zhang and Chandan Singh and Liyuan Liu and Xiaodong Liu and Bin Yu and Jianfeng Gao and Tuo Zhao , booktitle=. Tell Your Model Where to Attend: Post-hoc Attention Steering for
-
[88]
Model Tells You What to Discard: Adaptive
Suyu Ge and Yunan Zhang and Liyuan Liu and Minjia Zhang and Jiawei Han and Jianfeng Gao , booktitle=. Model Tells You What to Discard: Adaptive
-
[90]
Interpreting GPT: The logit lens , author =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.