arxiv: 2603.16284 · v2 · pith:Q2M2425Mnew · submitted 2026-03-17 · 💻 cs.CV · cs.LG

Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation

Pith reviewed 2026-05-21 10:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords hallucination mitigation · feature steering · large vision-language models · causal attribution · layer-wise sparsity · visual question answering

The pith

By scoring each layer's role in hallucinations and steering only the relevant ones, vision-language models reduce errors while keeping general performance intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large vision-language models often generate visual hallucinations that limit their reliability. Existing feature-steering fixes apply the same correction strength to every layer, which can disturb layers that have little to do with the errors and hurt performance on normal tasks. The paper introduces a locate-then-sparsify approach that first builds a dataset of token- and sentence-level hallucination examples, then uses causal interventions to measure how much each layer contributes to those errors. Those layer scores are turned into different steering intensities so that only the most relevant layers receive strong correction. Experiments on several models and benchmarks show this targeted method lowers hallucination rates without the usual drop in overall accuracy.

Core claim

The Locate-Then-Sparsify for Feature Steering framework first constructs token-level and sentence-level hallucination datasets, applies causal-intervention attribution to produce per-layer relevance scores, and then converts those scores into individualized steering intensities that apply stronger corrections only to hallucination-relevant layers while leaving other layers largely untouched.
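
A minimal sketch of layerwise causal-intervention attribution, under stated assumptions: the toy residual stack, the `hallucination_score` readout, and the skip-one-layer intervention are illustrative stand-ins, not the paper's procedure (this summary does not specify the exact intervention). Each layer's score is the drop in the readout when that layer's residual update is removed.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # maps the hidden state to a residual update


def run_stack(x: Sequence[float], layers: Sequence[Layer],
              skip: Optional[int] = None) -> Vector:
    """Run a toy residual stack, optionally dropping one layer's update."""
    h = list(x)
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # intervention: this layer contributes nothing
        update = layer(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def layer_attribution(x: Sequence[float], layers: Sequence[Layer],
                      score: Callable[[Vector], float]) -> List[float]:
    """Score each layer by how much the readout falls when it is skipped."""
    base = score(run_stack(x, layers))
    return [base - score(run_stack(x, layers, skip=i))
            for i in range(len(layers))]


# Hypothetical toy model: coordinate 1 stands in for a "hallucination" logit.
toy_layers: List[Layer] = [
    lambda h: [0.1 * h[0], 0.0],   # layer 0: only touches coordinate 0
    lambda h: [0.0, 2.0],          # layer 1: pushes the logit directly
    lambda h: [0.0, 0.5 * h[0]],   # layer 2: reads coordinate 0, writes the logit
]


def hallucination_score(h: Vector) -> float:
    return h[1]


scores = layer_attribution([1.0, 0.0], toy_layers, hallucination_score)
print(scores)
```

In this toy, layer 1 receives the largest score. Layer 0 also receives a small nonzero score even though it never writes to the readout coordinate, because its update propagates through layer 2; this is the kind of downstream effect an attribution method has to account for.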

What carries the argument

Layerwise attribution scores derived from causal interventions on a constructed hallucination dataset, which are then mapped to per-layer feature-steering intensities.
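
One way the score-to-intensity mapping could look, as a hedged sketch: min-max normalize the layer scores, zero out layers below a mask threshold, and scale the rest. The function name, the normalization, and `max_strength` are assumptions; only the value 0.5 echoes the mask threshold quoted from the paper's implementation details, and its exact role there may differ.

```python
from typing import List, Sequence


def scores_to_intensities(scores: Sequence[float],
                          mask_threshold: float = 0.5,
                          max_strength: float = 1.0) -> List[float]:
    """Map per-layer attribution scores to sparse steering intensities."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # No layer stands out, so fall back to uniform steering.
        return [max_strength] * len(scores)
    normalized = [(s - lo) / span for s in scores]
    # Layers below the threshold are masked out (intensity 0).
    return [max_strength * n if n >= mask_threshold else 0.0
            for n in normalized]


# Hypothetical attribution scores for a four-layer model.
intensities = scores_to_intensities([0.05, 2.0, 0.55, 1.2])
print(intensities)
```

With these example scores only the second and fourth layers keep a nonzero intensity; the others are left unsteered.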

If this is right

  • Uniform steering across all layers can be replaced by sparse, score-driven intensities that preserve capability on non-hallucination tasks.
  • The same attribution pipeline can be applied to new LVLMs without retraining the base model.
  • Both token-level and sentence-level hallucination cases are handled by the single layerwise intensity map.
  • Inference cost stays the same because only the steering strengths change, not the number of operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be extended to other error modes such as factual inconsistency by building analogous attribution datasets.
  • Dynamic, input-dependent attribution might further improve results if computed on the fly for each query.
  • The approach suggests that many mitigation techniques now applied globally could benefit from first locating the responsible components inside the model.

Load-bearing premise

The causal-intervention method accurately measures how much each layer contributes to hallucinations, and the token- and sentence-level dataset captures the main patterns that need correction.

What would settle it

A controlled comparison in which the same steering strengths are reassigned to random layers instead of the attributed high-relevance layers, with hallucination metrics then checked to see whether mitigation drops sharply.
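
The random-reassignment control described above could be set up roughly as follows. This is a sketch with hypothetical names: it permutes a given list of per-layer strengths, so the set of intensities is preserved and only their layer assignment changes. Running the hallucination benchmark on each permuted assignment is left out.

```python
import random
from typing import List, Sequence


def shuffled_assignments(intensities: Sequence[float], n_trials: int,
                         seed: int = 0) -> List[List[float]]:
    """Return n_trials random permutations of the per-layer strengths."""
    rng = random.Random(seed)  # seeded so the control is reproducible
    trials = []
    for _ in range(n_trials):
        permuted = list(intensities)
        rng.shuffle(permuted)
        trials.append(permuted)
    return trials


# Hypothetical attribution-guided strengths for a six-layer model.
attributed = [0.0, 1.0, 0.0, 0.6, 0.0, 0.0]
controls = shuffled_assignments(attributed, n_trials=5, seed=0)
for c in controls:
    print(c)
```

A permutation can coincide with the attributed assignment by chance, so with few layers it is worth discarding such draws or averaging over many trials.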

Figures

Figures reproduced from arXiv: 2603.16284 by Chao Bi, Jinzhe Liu, Qingming Huang, Shufan Shen, Shuhui Wang, Tiantian Dang.

Figure 1. Current methods (e.g., Nullu [42]) mitigate hallucinations by uniformly steering features across layers, which (a) alters feature distributions and (b) leads to degraded performance on general tasks like MMMU. In contrast, we propose a layerwise steering framework, LTS-FS, which mitigates hallucinations more effectively (e.g., on CHAIR) while minimally perturbing the feature distributions, thus preservi…
Figure 2. Hallucination examples at token level and sentence level.
Figure 3. Overview of our LTS-FS framework. First, we build a bi-granularity dataset with token level and sentence level hallucinations.
Figure 4. Results of MME evaluation.
  4.3. Results on POPE. We conduct evaluations on the POPE benchmark under the Popular, Random, and Adversarial settings. Here, we mainly provide the average results of Accuracy and F1-score, respectively shown in Tab. 2 and Tab. 3. The comprehensive results can be found in the Supplementary Materials. Since we use Qwen-VL for evaluation, some methods (e.g., Nullu) did not report cor…
Figure 5. Demonstration of our framework for hallucination mitigation on two examples of LLaVA-Bench using LLaVA-v1.5-7B.
Figure 6. A sample generation based on CHAIR benchmark.
Figure 7. More examples on LLaVA-Bench.
  … and applied to a different target set for evaluation (e.g., POPE-GQA or Antidote). Concretely, CHAIR relies on MSCOCO; POPE uses MSCOCO and GQA; Antidote uses its own corpus. We therefore test cross-dataset pairs such as MSCOCO→GQA to verify transfer. The results are shown in Tab. 12. MSCOCO→GQA denotes calibrating attribution on MSCOCO and evaluating on the POPE–GQA subset,…
Figure 8. Prompt of GPT-4V Evaluation.
Original abstract

Despite the significant advancements in Large Vision-Language Models (LVLMs), their tendency to generate hallucinations undermines reliability and restricts broader practical deployment. Among the hallucination mitigation methods, feature steering emerges as a promising approach that reduces erroneous outputs in LVLMs without increasing inference costs. However, current methods apply uniform feature steering across all layers. This heuristic strategy ignores inter-layer differences, potentially disrupting layers unrelated to hallucinations and ultimately leading to performance degradation on general tasks. In this paper, we propose Locate-Then-Sparsify for Feature Steering (LTS-FS), a plug-and-play framework which controls the steering intensity according to the hallucination relevance of each layer. We first construct a dataset comprising token-level and sentence-level hallucination cases. Based on this dataset, we introduce an attribution method based on causal interventions to quantify the hallucination relevance of each layer. With the attribution scores across layers, we propose a layerwise strategy that converts these scores into feature steering intensities for individual layers, enabling more precise adjustments specifically on hallucination-relevant layers. Extensive experiments across multiple LVLMs and benchmarks demonstrate that LTS-FS effectively mitigates hallucination while preserving strong performance. Codes are available at https://github.com/huttersadan/LTS-FS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Locate-Then-Sparsify for Feature Steering (LTS-FS), a plug-and-play framework for mitigating visual hallucinations in LVLMs. It constructs a dataset of token-level and sentence-level hallucination cases, applies causal-intervention attribution to quantify hallucination relevance per layer, converts the resulting scores into layer-specific steering intensities via a sparsification rule, and reports that this targeted approach reduces hallucinations more effectively than uniform steering while preserving performance across multiple LVLMs and benchmarks. Code is released at a public GitHub repository.

Significance. If the attribution method can be shown to isolate hallucination-driving layers rather than generic sensitivity, the work would offer a principled improvement over uniform feature steering by enabling sparse, less disruptive interventions. The public code release supports reproducibility and is a clear strength.

major comments (2)
  1. [Attribution method] The central claim that LTS-FS enables precise adjustments on hallucination-relevant layers rests on the causal-intervention attribution scores accurately isolating hallucination relevance. However, the manuscript provides no controls comparing attribution-guided layer selection against (a) random subsets of equal cardinality or (b) layers ranked by impact on non-hallucination metrics. In transformer LVLMs, interventions propagate through residual streams, so measured changes may reflect correlated downstream or task-general effects rather than specific hallucination drivers. This directly threatens the justification for the layerwise sparsification strategy (Abstract and attribution-method description).
  2. [Layerwise strategy] The conversion of attribution scores into per-layer steering intensities is described only at a high level. Without explicit details on the sparsification rule, any free parameters in the score-to-intensity mapping, or ablation of the conversion choices, it is unclear whether the reported gains are robust or depend on dataset-specific tuning. This is load-bearing for the claim of improved performance preservation (Abstract).
minor comments (2)
  1. The abstract states that experiments demonstrate effectiveness but omits reporting of statistical significance, exact implementation of the attribution (e.g., activation patching vs. scaling), and any ablation on the constructed hallucination dataset.
  2. Figure and table captions should explicitly state the number of runs, random seeds, and whether error bars reflect standard deviation or standard error.
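
For the second minor comment, the distinction matters because the two kinds of error bar differ by a factor of the square root of the number of runs. A small sketch with made-up run values (not results from the paper):

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_runs(values: Sequence[float]) -> Dict[str, float]:
    """Report mean, sample standard deviation, and standard error."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "se": se}


# Hypothetical metric values from four seeds.
summary = summarize_runs([50.0, 52.0, 54.0, 52.0])
print(summary)
```

With four runs the standard error is half the standard deviation, so a caption that does not say which one is plotted leaves the reader off by a factor of two.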

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation and justification of our method.

Point-by-point responses
  1. Referee: [Attribution method] The central claim that LTS-FS enables precise adjustments on hallucination-relevant layers rests on the causal-intervention attribution scores accurately isolating hallucination relevance. However, the manuscript provides no controls comparing attribution-guided layer selection against (a) random subsets of equal cardinality or (b) layers ranked by impact on non-hallucination metrics. In transformer LVLMs, interventions propagate through residual streams, so measured changes may reflect correlated downstream or task-general effects rather than specific hallucination drivers. This directly threatens the justification for the layerwise sparsification strategy (Abstract and attribution-method description).

    Authors: We thank the referee for this important observation. We agree that additional controls are needed to demonstrate that the attribution scores capture hallucination-specific relevance rather than generic layer sensitivity. In the revised manuscript we add experiments comparing attribution-guided layer selection against (a) random subsets of equal cardinality and (b) layers ranked by impact on non-hallucination metrics such as standard VQA accuracy. These controls show that our method yields a superior hallucination-mitigation versus performance-preservation trade-off. We also expand the discussion of the causal-intervention procedure to clarify how layer-specific interventions, with other layers held fixed, measure the direct causal contribution to output hallucination rate and thereby mitigate concerns about residual-stream propagation. revision: yes

  2. Referee: [Layerwise strategy] The conversion of attribution scores into per-layer steering intensities is described only at a high level. Without explicit details on the sparsification rule, any free parameters in the score-to-intensity mapping, or ablation of the conversion choices, it is unclear whether the reported gains are robust or depend on dataset-specific tuning. This is load-bearing for the claim of improved performance preservation (Abstract).

    Authors: We agree that the layerwise strategy requires a more explicit description. In the revised manuscript we provide the precise sparsification rule, the normalization procedure, the free parameters (threshold and scaling factor), and how they are selected on a held-out validation split. We also add an ablation study that varies the sparsification threshold and the functional form of the score-to-intensity mapping. The results confirm that the reported gains remain stable across reasonable parameter choices and are not artifacts of dataset-specific tuning, thereby supporting the claim of improved performance preservation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's chain proceeds by constructing a hallucination dataset, applying causal interventions to derive per-layer attribution scores, and then using a deterministic layerwise conversion rule to set steering intensities; none of these steps reduce by construction to the final benchmark performance numbers or to a self-referential definition of the target quantity. The attribution scores are computed via interventions on the constructed cases rather than by optimizing against the reported mitigation metrics, and the subsequent sparsification rule is presented as a fixed function of those scores. Evaluation on separate benchmarks across multiple LVLMs supplies an independent test, with no load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of causal attribution for layer scoring and the representativeness of the hallucination dataset; no explicit free parameters or invented entities are named in the abstract, though the score-to-intensity conversion likely involves implicit scaling choices.

free parameters (1)
  • score-to-intensity conversion parameters
    Attribution scores are converted into per-layer steering intensities; specific thresholds or scaling factors are not detailed but are required for the layerwise strategy.
axioms (1)
  • domain assumption: Causal interventions on the constructed dataset reveal true per-layer hallucination relevance
    Invoked when introducing the attribution method to quantify relevance.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 9 internal anchors

  [1] Jean-Baptiste Alayrac et al. Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
  [2] Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, Qianying Wang, Ping Chen, Xiaoqin Zhang, and Shijian Lu. Mitigating object hallucinations in large vision-language models with assembly of global and local attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 29915–29926, 2025.
  [3] Yongqi An, Xu Zhao, Tao Yu, Ming Tang, and Jinqiao Wang. Fluctuation-based adaptive structured pruning for large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  [4] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.
  [5] Chao Bi, Tiantian Dang, Shuhui Wang, Feng Cao, and Qingming Huang. Asking questions to alleviate object hallucination in large vision-language models. IEEE Transactions on Circuits and Systems for Video Technology, 2025.
  [6] David M. Chan, Suzanne Petryk, et al. Clair: Evaluating image captions with large language models. In EMNLP 2023,
  [7] Harry Dong, Beidi Chen, and Yuejie Chi. Prompt-prompted adaptive structured pruning for efficient llm generation. arXiv preprint arXiv:2404.01365, 2024.
  [8] Xin Dong, Shangyu Chen, and Sinno Jialin Pan. Learning to prune deep neural networks via layer-wise optimal brain surgeon. In NeurIPS, 2017.
  [9] Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Mingliang Xu, and Yi Yang. Cascaded revision network for novel object captioning. IEEE Transactions on Circuits and Systems for Video Technology, 30(10):3413–3421, 2020.
  [10] Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, et al. Mme: A comprehensive evaluation benchmark for multimodal large language models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025.
  [11] Hengyuan Hu, Rui Peng, Yu-Wing Tai, and Chi-Keung Tang. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. ArXiv preprint, abs/1607.03250, 2016.
  [12] Jing Huang et al. A survey on evaluation of multimodal large language models. arXiv preprint arXiv:2408.15769, 2024.
  [13] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.
  [14] Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, and Nenghai Yu. Opera: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13418–13427, 2024.
  [15] Drew A Hudson and Christopher D Manning. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6700–6709, 2019.
  [16] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38, 2023.
  [17] Ziwei Ji, Nayeon Lee, Rita Frieske, et al. Survey of hallucination in natural language generation. ACM Computing Surveys, 2023.
  [18] Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. Model sparsity can simplify machine unlearning, 2024.
  [19] Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, and Lidong Bing. Mitigating object hallucinations in large vision-language models through visual contrastive decoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13872–13882, 2024.
  [20] Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems, 36:41451–41530,
  [21] Y. Li et al. Evaluating object hallucination in large vision-language models. arXiv preprint arXiv:2305.10355,
  [22] Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 292–305, 2023.
  [23] Jessy Lin, Luke Zettlemoyer, Gargi Ghosh, Wen-Tau Yih, Aram Markosyan, Vincent-Pierre Berges, and Barlas Oğuz. Continual learning via sparse memory finetuning, 2025.
  [24] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
  [25] Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, and Lijuan Wang. Mitigating hallucination in large multi-modal models via robust instruction tuning. In The Twelfth International Conference on Learning Representations, pages 1–12, 2023.
  [26] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. arXiv preprint arXiv:2304.08485,
  [27] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 26296–26306, 2024.
  [28] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in neural information processing systems, 36, 2024.
  [29] Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, and Wei Peng. A survey on hallucination in large vision-language models. arXiv preprint arXiv:2402.00253, 2024.
  [30] Jinzhe Liu, Junshu Sun, Shufan Shen, Chenxue Yang, and Shuhui Wang. Edit less, achieve more: Dynamic sparse neuron masking for lifelong knowledge editing in llms, 2025.
  [31] Sheng Liu, Haotian Ye, and James Zou. Reducing hallucinations in large vision-language models via latent space steering. In The Thirteenth International Conference on Learning Representations, 2025.
  [32] Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, and Tsung-Yi Ho. The devil is in the neurons: Interpreting and mitigating social biases in pre-trained language models, 2024.
  [33] Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, and Kate Saenko. Object hallucination in image captioning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4035–4045,
  [34] Prasanta Sahoo et al. A comprehensive survey of hallucination mitigation techniques in large language models. Findings of EMNLP, 2024.
  [35] Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, and Roozbeh Mottaghi. A-okvqa: A benchmark for visual question answering using world knowledge. In European conference on computer vision, pages 146–162. Springer, 2022.
  [36] Shufan Shen, Junshu Sun, Xiangyang Ji, Qingming Huang, and Shuhui Wang. Expanding sparse tuning for low memory usage. In NeurIPS, 2024.
  [37] Suraj Srinivas and R. Venkatesh Babu. Data-free parameter pruning for deep neural networks. In BMVC, 2015.
  [38] Boyao Wang and Volodymyr Kindratenko. Rl-pruner: Structured pruning using reinforcement learning for cnn compression and acceleration. arXiv preprint arXiv:2411.06463,
  [39] Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, et al. Vigc: Visual instruction generation and correction. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5309–5317, 2024.
  [40] Peng Wang, Shuai Bai, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024.
  [41] Yuanchen Wu, Lu Zhang, Hang Yao, Junlong Du, Ke Yan, Shouhong Ding, Yunsheng Wu, and Xiaoqiang Li. Antidote: A unified framework for mitigating lvlm hallucinations in counterfactual presupposition and object perception. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 14646–14656, 2025.
  [42] Le Yang, Ziwei Zheng, Boxu Chen, Zhengyu Zhao, Chenhao Lin, and Chao Shen. Nullu: Mitigating object hallucinations in large vision-language models via halluspace projection. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 14635–14645, 2025.
  [43] Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. In CVPR, 2017.
  [44] Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. A survey on multimodal large language models. National Science Review, 2024. Earlier arXiv:2306.13549.
  [45] Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, and Enhong Chen. Woodpecker: Hallucination correction for multimodal large language models. Science China Information Sciences, 67(12):220105, 2024.
  [46] Zeping Yu and Sophia Ananiadou. Neuron-level knowledge attribution in large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3267–3280, 2024.
  [47] Zeping Yu and Sophia Ananiadou. Understanding multimodal llms: the mechanistic interpretability of llava in visual question answering. arXiv preprint arXiv:2411.10950, 2024.
  [48] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et al. Siren's song in the ai ocean: a survey on hallucination in large language models. arXiv preprint arXiv:2309.01219, 2023.
  [49] Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, and Huaxiu Yao. Analyzing and mitigating object hallucination in large vision-language models. In The Twelfth International Conference on Learning Representations, 2024.
  50. [50]

    At first, to preserve generalization, the data used for dataset construction and the data used for experiments are strictly disjoint

    Details of the construction of the dataset In this section, we introduce the details of how to construct the Bi-granularity Dataset. At first, to preserve generalization, the data used for dataset construction and the data used for experiments are strictly disjoint. Particularly, for data selected based on CHAIR and POPE, we use data from train spilt of M...

  51. [51]

    The mask threshold $r_s$ is set to 0.5, as shown in Tab. 5 of the main text

    Implementation details of LTS-FS. Hyper-parameters. The strength-control parameters of $s^{l}_{tok}$, namely $\lambda_{cue}$, $\lambda_{pos}$, and $\lambda_{hall}$, are all set to 1. The mask threshold $r_s$ is set to 0.5, as shown in Tab. 5 of the main text. Environment. All experiments are conducted on a single A100 80G GPU. For 7B models, two RTX 3090 24G GPUs can replace the A100. For detailed Python requirements, please r...
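To make the role of a mask threshold concrete, here is a minimal sketch of one way a threshold of 0.5 could turn per-layer attribution scores into sparse steering strengths. Everything in it is an assumption for illustration: the function name `sparsify_layer_strengths`, the min-max normalization, and the rule "keep layers whose normalized score is at least the threshold" are not taken from the paper, whose exact masking rule is not given in this excerpt.

```python
def sparsify_layer_strengths(scores, r_s=0.5, base_strength=1.0):
    """Map per-layer attribution scores to steering strengths.

    Assumed scheme (not confirmed by the paper): min-max normalize the
    scores to [0, 1], then zero out every layer whose normalized score
    falls below the mask threshold r_s.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # All layers look equally relevant; keep them all at full strength.
        normalized = [1.0 for _ in scores]
    else:
        normalized = [(s - lo) / span for s in scores]
    return [base_strength * n if n >= r_s else 0.0 for n in normalized]


# Hypothetical attribution scores for a 4-layer toy model.
toy_scores = [0.0, 0.25, 1.0, 0.5]
toy_strengths = sparsify_layer_strengths(toy_scores)
print(toy_strengths)
```

In this toy run, only the two layers at or above the threshold receive non-zero steering, which is the kind of layer-wise sparsity the framework's name suggests.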

  52. [52]

    Compared methods. We employ the default parameters and settings as reported in the original papers

    Implementation Settings of CHAIR Results. Generation Setting. We set the generation config as follows: Max New Tokens=128, num beams=1, and sampling=False. Compared methods. We employ the default parameters and settings as reported in the original papers.
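The CHAIR generation setting above amounts to plain greedy decoding capped at 128 new tokens. A minimal sketch of how it might be written down follows. The key names follow the keyword arguments of Hugging Face `generate` (`max_new_tokens`, `num_beams`, `do_sample`), which is an assumption about the toolkit; the constant `CHAIR_GENERATION_KWARGS` and the helper `is_greedy` are hypothetical names introduced only for this illustration.

```python
# Hypothetical container for the reported CHAIR generation setting.
# Key names assume a Hugging Face-style `generate(**kwargs)` interface.
CHAIR_GENERATION_KWARGS = {
    "max_new_tokens": 128,  # Max New Tokens = 128
    "num_beams": 1,         # num beams = 1, i.e. no beam search
    "do_sample": False,     # sampling = False
}


def is_greedy(kwargs):
    """Return True when the settings reduce to plain greedy decoding."""
    return kwargs.get("num_beams", 1) == 1 and not kwargs.get("do_sample", False)


print(is_greedy(CHAIR_GENERATION_KWARGS))
```

Because sampling is disabled and a single beam is used, decoding is deterministic for a fixed model and input, so reruns on the same hardware and software stack should produce the same outputs.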

  53. [53]

    To evaluate general capability more comprehensively, we perform an evaluation using a broader benchmark called CLAIR [6]

    Generation Capability. To evaluate general capability more comprehensively, we perform an evaluation using a broader benchmark called CLAIR [6]. The result in Tab. 7 shows that LTS-FS achieves a better trade-off between hallucination mitigation and general capability preservation. Table 7. Trade-off between hallucination mitigation and general capability...

  54. [54]

    Compared methods. We employ the default parameters and settings as reported in the original papers

    More details of POPE results. Generation Setting. We set the generation config as follows: Max New Tokens=16, num beams=1, and sampling=False. Compared methods. We employ the default parameters and settings as reported in the original papers. Total Results. The full results are shown in Tab. 13. Across all settings, our LTS-FS framework achieves the best accu...

  55. [55]

    More details of MME results. We report the MME numerical results in Tab. 8. The numerical results demonstrate that LTS-FS can substantially increase the mitigation ability of feature steering methods. Specifically, on the subsets most related to hallucination, Count and Position, LTS-FS achieves large improvements, highlighting its effectiveness i...

  56. [56]

    The time to apply a method is the time needed to set up a hallucination mitigation method for a specific LVLM

    Time Analysis. We analyze two time costs: the time to apply a method and the time for inference. The time to apply a method is the time needed to set up a hallucination mitigation method for a specific LVLM. As an example, in order to apply VTI to LVLMs, the direction vector needs to be computed and the Table 8. Results on all MME perception-related tasks....

  57. [57]

    The result is shown in Tab. 11

    Ablation Study about Indicators. In this section, we discuss the effect of the three indicators in sentence-level hallucination attribution. The result is shown in Tab. 11. We investigate the effect of removing each indicator in turn and find that w/o cue indicator and w/o position indicator yield only small changes, whereas w/o hallucination causes a much la...

  58. [58]

    Discussion about Generalization. To assess generalization beyond the construction sources, we evaluate on datasets whose distributions differ from those used to build our bi-granularity labels. Although the construction leverages CHAIR, POPE, and Antidote, we additionally report results on MME and LLaVA-Bench, which serve as out-of-distribution datasets of...

  59. [59]

    Fig. 7, which demonstrates the effectiveness of our framework in hallucination mitigation

    More cases in LLaVA-Bench. More case studies on LLaVA-Bench are presented in Fig. 7, which demonstrates the effectiveness of our framework in hallucination mitigation. In particular, color and count attributes are given greater emphasis, thereby avoiding hallucinations in these aspects.

  60. [60]

    GPT4v-Evaluation prompt. Following VCD, the prompt for GPT4v-aided evaluation is shown in Fig. 8. GPT4v receives three types of LVLM responses and generates an output. We then collect the outputs from GPT4v and report the average accuracy and detailedness.

  61. [61]

    Since existing feature steering techniques have not been evaluated on larger 70B-scale models, extending our method to 70B models remains a challenge

    Limitation and future work. Although our approach can be effectively ported to feature-steering methods and achieves strong hallucination mitigation, there is still room for development. Since existing feature steering techniques have not been evaluated on larger 70B-scale models, extending our method to 70B models remains a challenge. We aim to ext...