pith. machine review for the scientific record. sign in

arxiv: 2604.06714 · v1 · submitted 2026-04-08 · 💻 cs.AI · cs.CL· cs.CV· cs.LG

Recognition: no theorem link

Steering the Verifiability of Multimodal AI Hallucinations

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:16 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.CVcs.LG
keywords multimodal AIhallucinationsverifiabilityactivation spaceintervention probesobvious hallucinationselusive hallucinationsMLLMs
0
0 comments X

The pith

Separate probes in activation space let multimodal models steer the verifiability of their hallucinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hallucinations in multimodal AI models vary in how readily humans can detect them, with some obvious and others elusive. The paper collects thousands of human responses to label these differences and builds a dataset of obvious versus elusive cases. It then develops an intervention technique that learns distinct probes inside the model's activation space for each category. These probes allow targeted adjustments that regulate how verifiable the model's outputs become. If the method works, AI applications could be tuned for stricter or more lenient error checking based on the use case, from high-stakes tasks to everyday queries.

Core claim

The authors construct a dataset from 4,470 human responses that categorizes AI-generated hallucinations into obvious and elusive types according to human verifiability. They propose an activation-space intervention method that learns separate probes for the two types. Experiments show that obvious and elusive hallucinations trigger different probes, targeted interventions outperform general ones at regulating the matching verifiability, and simply mixing the probes produces flexible control suited to different security and usability demands.

What carries the argument

Activation-space intervention method that learns separate probes for obvious and elusive hallucinations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The probes could be extended to modulate other output properties such as confidence levels or level of detail.
  • Dynamic mixing during generation might allow real-time adjustment to match user-specified risk tolerance.
  • The approach may transfer to text-only models if similar human-labeled categories can be collected.

Load-bearing premise

Human responses provide a reliable and generalizable way to categorize hallucinations as obvious or elusive based on verifiability.

What would settle it

New human evaluations on outputs after probe application that show no measurable change in detection effort or accuracy for obvious versus elusive hallucinations relative to the unadjusted model.

read the original abstract

AI applications driven by multimodal large language models (MLLMs) are prone to hallucinations and pose considerable risks to human users. Crucially, such hallucinations are not equally problematic: some hallucination contents could be detected by human users(i.e., obvious hallucinations), while others are often missed or require more verification effort(i.e., elusive hallucinations). This indicates that multimodal AI hallucinations vary significantly in their verifiability. Yet, little research has explored how to control this property for AI applications with diverse security and usability demands. To address this gap, we construct a dataset from 4,470 human responses to AI-generated hallucinations and categorize these hallucinations into obvious and elusive types based on their verifiability by human users. Further, we propose an activation-space intervention method that learns separate probes for obvious and elusive hallucinations. We reveal that obvious and elusive hallucinations elicit different intervention probes, allowing for fine-grained control over the model's verifiability. Empirical results demonstrate the efficacy of this approach and show that targeted interventions yield superior performance in regulating corresponding verifiability. Moreover, simply mixing these interventions enables flexible control over the verifiability required for different scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that hallucinations in multimodal LLMs vary in verifiability (obvious vs. elusive to humans), constructs a dataset of 4,470 human responses to label them accordingly, and proposes learning separate activation-space probes for each type. Targeted interventions using these probes are shown to regulate the corresponding verifiability more effectively than alternatives, while linear mixing of the probes enables flexible control over verifiability levels for different application scenarios.

Significance. If the human labels prove stable and the probes demonstrably isolate verifiability directions without collateral effects on model capability, the approach would provide a practical, tunable mechanism for steering MLLM outputs in contexts with differing security or usability requirements, extending activation-engineering techniques to a new controllable property.

major comments (2)
  1. [Dataset Construction] Dataset construction (human labeling of 4,470 responses): no inter-annotator agreement statistics, annotation guidelines, or cross-context consistency checks are reported. Because the probes are learned directly from these labels, high label noise would cause the probes to fit annotator-specific artifacts rather than reproducible verifiability features, directly undermining both the superiority claim for targeted interventions and the controllability of mixtures.
  2. [Experiments / Results] Empirical results section: the abstract asserts superior performance for targeted probes and flexible control via mixing, yet the manuscript provides no ablation isolating probe specificity (e.g., effect on non-hallucinated outputs), no statistical significance tests, and no comparison against strong baselines such as random or single-probe interventions. These omissions leave the central empirical support for load-bearing claims unverified.
minor comments (2)
  1. [Method] Clarify the precise linear-algebraic definition of the mixing operation and the loss used to train the probes; the current description leaves the intervention formula ambiguous.
  2. [Figures] Figure captions and axis labels in the results figures should explicitly state the metric (e.g., human verifiability score or detection rate) and the number of trials per condition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the empirical and methodological rigor of the work.

read point-by-point responses
  1. Referee: [Dataset Construction] Dataset construction (human labeling of 4,470 responses): no inter-annotator agreement statistics, annotation guidelines, or cross-context consistency checks are reported. Because the probes are learned directly from these labels, high label noise would cause the probes to fit annotator-specific artifacts rather than reproducible verifiability features, directly undermining both the superiority claim for targeted interventions and the controllability of mixtures.

    Authors: We agree that inter-annotator agreement statistics are essential for validating label quality. In the revised manuscript we will report Fleiss' kappa (or equivalent) computed over the multiple annotators who labeled the 4,470 responses. We will also append the complete annotation guidelines, which explicitly define obvious hallucinations as those detectable by visual inspection of the image alone and elusive hallucinations as those requiring external knowledge or verification effort. Annotations were performed under a standardized protocol with training examples and quality checks across image-question contexts; we will add a brief consistency analysis (e.g., agreement stratified by image category) to address potential context-specific artifacts. These additions directly respond to the concern that label noise could undermine the learned probes. revision: yes

  2. Referee: [Experiments / Results] Empirical results section: the abstract asserts superior performance for targeted probes and flexible control via mixing, yet the manuscript provides no ablation isolating probe specificity (e.g., effect on non-hallucinated outputs), no statistical significance tests, and no comparison against strong baselines such as random or single-probe interventions. These omissions leave the central empirical support for load-bearing claims unverified.

    Authors: We accept that the current experimental section lacks several standard controls. In the revision we will add (1) an ablation measuring probe effects on non-hallucinated outputs to demonstrate specificity, (2) statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) for all reported performance differences, and (3) explicit comparisons against random-direction interventions and single-probe baselines. These new results will be presented in an expanded results section and will directly support the claims of targeted superiority and flexible mixing control. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical probe learning from external human labels

full rationale

The paper's core chain begins with an external dataset of 4,470 human responses used to label hallucinations as obvious or elusive, followed by learning separate activation-space probes and testing targeted interventions on verifiability. No step reduces by construction to its own inputs: the probes are fitted to human-provided labels rather than self-defined quantities, the claimed superiority of targeted vs. mixed interventions is evaluated empirically (not forced by the fitting procedure itself), and no self-citations or uniqueness theorems are invoked as load-bearing premises. The derivation remains self-contained against the external human-label benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based on abstract only; the central approach assumes human labels define meaningful verifiability categories and that activation interventions can selectively affect them. No numerical free parameters or external benchmarks are specified.

axioms (1)
  • domain assumption Human responses to AI hallucinations provide a consistent and generalizable basis for distinguishing obvious from elusive types.
    The dataset construction and subsequent probe learning rest directly on these human judgments.
invented entities (1)
  • Type-specific activation-space probes for obvious and elusive hallucinations no independent evidence
    purpose: To detect and intervene on distinct verifiability patterns in model activations
    Probes are learned from the new dataset; no independent falsifiable evidence outside the paper is mentioned.

pith-pipeline@v0.9.0 · 5523 in / 1378 out tokens · 78181 ms · 2026-05-10T18:16:06.795689+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

    cs.CR 2026-03 unverdicted novelty 7.0

    UMID infers membership in contrastive pre-training data using only text queries by performing latent inversion and comparing similarity and variability signals to synthetic gibberish references via unsupervised anomal...

Reference graph

Works this paper leans on

44 extracted references · 10 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

    XiangAn,YinXie,KaichengYang,WenkangZhang,XiuweiZhao,ZhengCheng,YiruiWang,SongcenXu,Changrui Chen, ChunshengWu,HuajieTan,ChunyuanLi,JingYang,JieYu,XiyaoWang, BinQin, YumengWang,ZizhenYan, Ziyong Feng, Ziwei Liu, Bo Li, and Jiankang Deng. Llava-onevision-1.5: Fully open framework for democratized multimodal training, 2025. URLhttps://arxiv.org/abs/2509.23661

  2. [2]

    Refusal in language models is mediated by a single direction.Advances in Neural Information Processing Systems, 37: 136037–136083, 2024

    Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, and Neel Nanda. Refusal in language models is mediated by a single direction.Advances in Neural Information Processing Systems, 37: 136037–136083, 2024

  3. [3]

    The internal state of an llm knows when it’s lying

    Amos Azaria and Tom Mitchell. The internal state of an llm knows when it’s lying. InFindingsofthe Association forComputationalLinguistics: EMNLP 2023, pages 967–976, 2023

  4. [4]

    Hallucination of Multimodal Large Language Models: A Survey

    Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou. Hallucination of multimodal large language models: A survey.arXivpreprintarXiv:2404.18930, 2024

  5. [5]

    Leace: Perfect linear concept erasure in closed form.AdvancesinNeuralInformationProcessingSystems, 36:66044–66063, 2023

    Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, and Stella Biderman. Leace: Perfect linear concept erasure in closed form.AdvancesinNeuralInformationProcessingSystems, 36:66044–66063, 2023

  6. [6]

    Dola: Decoding by contrasting layers improves factuality in large language models

    Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James R Glass, and Pengcheng He. Dola: Decoding by contrasting layers improves factuality in large language models. InThe TwelfthInternational Conference on Learning Representations

  7. [7]

    I don’t know: Explicit modeling of uncertainty with an [idk] token.AdvancesinNeuralInformationProcessingSystems, 37:10935–10958, 2024

    Roi Cohen, Konstantin Dobler, Eden Biran, and Gerard de Melo. I don’t know: Explicit modeling of uncertainty with an [idk] token.AdvancesinNeuralInformationProcessingSystems, 37:10935–10958, 2024

  8. [8]

    Vlmevalkit: An open-source toolkit for evaluating large multi-modality models

    Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, et al. Vlmevalkit: An open-source toolkit for evaluating large multi-modality models. In Proceedings ofthe 32nd ACMInternational Conferenceon Multimedia, pages 11198–11201, 2024

  9. [9]

    Detecting hallucinations in large language models using semantic entropy.Nature, 630(8017):625–630, 2024

    Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy.Nature, 630(8017):625–630, 2024

  10. [10]

    Don’thallucinate, abstain: Identifying llm knowledge gaps via multi-llm collaboration

    ShangbinFeng,WeijiaShi,YikeWang,WenxuanDing,VidhishaBalachandran,andYuliaTsvetkov. Don’thallucinate, abstain: Identifying llm knowledge gaps via multi-llm collaboration. InProceedingsofthe 62ndAnnualMeetingof the AssociationforComputationalLinguistics(Volume1: LongPapers), pages 14664–14690, 2024

  11. [11]

    Mme: A comprehensive evaluation benchmark for multimodal large language models

    Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, et al. Mme: A comprehensive evaluation benchmark for multimodal large language models. InThe Thirty-ninthAnnualConferenceon NeuralInformationProcessingSystemsDatasetsand BenchmarksTrack

  12. [12]

    Xuefeng Du, Chaowei Xiao, and Yixuan Li

    ChengGao, HuiminChen, ChaojunXiao, ZhiyiChen, ZhiyuanLiu, andMaosongSun. H-neurons: Ontheexistence, impact, and origin of hallucination-associated neurons in llms.arXiv preprintarXiv:2512.01797, 2025

  13. [13]

    Enabling large language models to generate text with citations

    Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. Enabling large language models to generate text with citations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6465–6488, 2023

  14. [14]

    Anah-v2: Scalinganalyticalhallucination annotation of large language models.AdvancesinNeuralInformationProcessingSystems, 37:60012–60039, 2024

    YuzheGu,ZiweiJi,WenweiZhang,ChengqiLyu,DahuaLin,andKaiChen. Anah-v2: Scalinganalyticalhallucination annotation of large language models.AdvancesinNeuralInformationProcessingSystems, 37:60012–60039, 2024

  15. [15]

    Hallusionbench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models

    Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, et al. Hallusionbench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, page...

  16. [16]

    A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.ACMTransactionson InformationSystems, 43(2):1–55, 2025

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.ACMTransactionson InformationSystems, 43(2):1–55, 2025. 13

  17. [17]

    Survey of hallucination in natural language generation.ACMcomputingsurveys, 55(12):1–38, 2023

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACMcomputingsurveys, 55(12):1–38, 2023

  18. [18]

    Calibrating verbal uncertainty as a linear feature to reduce hallucinations

    Ziwei Ji, Lei Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, and Nicola Cancedda. Calibrating verbal uncertainty as a linear feature to reduce hallucinations. InProceedings of the 2025Conferenceon EmpiricalMethods inNaturalLanguageProcessing, pages 3769–3793, 2025

  19. [19]

    Language Models (Mostly) Know What They Know

    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, et al. Language models (mostly) know what they know.arXiv preprint arXiv:2207.05221, 2022

  20. [20]

    Evaluating object hallucination in large vision-language models

    Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models. InProceedingsofthe 2023conferenceonempiricalmethods innaturallanguage processing, pages 292–305, 2023

  21. [21]

    A Survey on Hallucination in Large Vision-Language Models

    Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, and Wei Peng. A survey on hallucination in large vision-language models.arXivpreprintarXiv:2402.00253, 2024

  22. [22]

    Visual instruction tuning.Advances in neural informationprocessingsystems, 36:34892–34916, 2023

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.Advances in neural informationprocessingsystems, 36:34892–34916, 2023

  23. [23]

    Mmbench: Is your multi-modal model an all-around player? InEuropean conference on computervision, pages 216–233

    Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, et al. Mmbench: Is your multi-modal model an all-around player? InEuropean conference on computervision, pages 216–233. Springer, 2024

  24. [24]

    Fine-grained hallucination detection and editing for language models.arXivpreprintarXiv:2401.06855, 2024

    Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, and Hannaneh Hajishirzi. Fine-grained hallucination detection and editing for language models.arXivpreprintarXiv:2401.06855, 2024

  25. [25]

    Controlling hallucinations at word level in data-to-text generation.DataMiningandKnowledgeDiscovery, 36(1): 318–354, 2022

    Clément Rebuffel, Marco Roberti, Laure Soulier, Geoffrey Scoutheeten, Rossella Cancelliere, and Patrick Gallinari. Controlling hallucinations at word level in data-to-text generation.DataMiningandKnowledgeDiscovery, 36(1): 318–354, 2022

  26. [26]

    Towards vqa models that can read

    Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. Towards vqa models that can read. InProceedingsofthe IEEE/CVFconferenceoncomputervisionand pattern recognition, pages 8317–8326, 2019

  27. [27]

    On early detection of hallucinations in factual question answering

    Ben Snyder, Marius Moisescu, and Muhammad Bilal Zafar. On early detection of hallucinations in factual question answering. InProceedingsofthe30thACMSIGKDDConferenceonKnowledgeDiscoveryandDataMining,pages 2721–2732, 2024

  28. [28]

    Whatlargelanguagemodelsknowandwhatpeoplethinktheyknow

    Mark Steyvers, Heliodoro Tejeda, Aakriti Kumar, Catarina Belem, Sheer Karny, Xinyue Hu, Lukas W Mayer, and PadhraicSmyth. Whatlargelanguagemodelsknowandwhatpeoplethinktheyknow. NatureMachineIntelligence, 7(2):221–231, 2025

  29. [29]

    Activation steering decoding: Mitigating hallucination in large vision-language models through bidirectional hidden state intervention

    Jingran Su, Jingfan Chen, Hongxin Li, Yuntao Chen, Li Qing, and Zhaoxiang Zhang. Activation steering decoding: Mitigating hallucination in large vision-language models through bidirectional hidden state intervention. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12964–12974, 2025

  30. [30]

    Towards verifiable text generation with evolving memory and self-reflection

    Hao Sun, Hengyi Cai, Bo Wang, Yingyan Hou, Xiaochi Wei, Shuaiqiang Wang, Yan Zhang, and Dawei Yin. Towards verifiable text generation with evolving memory and self-reflection. InProceedings of the 2024 Conference on EmpiricalMethods in NaturalLanguageProcessing, pages 8211–8227, 2024

  31. [31]

    Cross-layer attention probing for fine-grained hallucination detection.arXivpreprintarXiv:2509.09700, 2025

    Malavika Suresh, Rahaf Aljundi, Ikechukwu Nkisi-Orji, and Nirmalie Wiratunga. Cross-layer attention probing for fine-grained hallucination detection.arXivpreprintarXiv:2509.09700, 2025

  32. [32]

    Qwen2.5-vl, January 2025

    Qwen Team. Qwen2.5-vl, January 2025. URLhttps://qwenlm.github.io/blog/qwen2.5-vl/

  33. [33]

    Uncertainty-based abstention in llms improves safety and reduces hallucinations.arXiv preprint arXiv:2404.10960, 2024

    Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, and Mark Ibrahim. Uncertainty-based abstention in llms improves safety and reduces hallucinations.arXiv preprintarXiv:2404.10960, 2024. 14

  34. [34]

    A stitch in time saves nine: Detecting and mitigating hallucinations of LLMs by validating low-confidence generation,

    Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu. A stitch in time saves nine: Detecting and mitigating hallucinations of llms by validating low-confidence generation.arXiv preprint arXiv:2307.03987, 2023

  35. [35]

    Amber: An llm-free multi-dimensional benchmark for mllms hallucination evaluation.arXiv preprint arXiv:2311.07397, 2023

    Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, JiZhang,etal. Amber: Anllm-freemulti-dimensionalbenchmarkformllmshallucinationevaluation. arXivpreprint arXiv:2311.07397, 2023

  36. [36]

    Reft: Representationfinetuningforlanguagemodels

    Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D Manning, and ChristopherPotts. Reft: Representationfinetuningforlanguagemodels. AdvancesinNeuralInformationProcessing Systems, 37:63908–63962, 2024

  37. [37]

    On hallucination and predictive uncertainty in conditional language generation

    Yijun Xiao and William Yang Wang. On hallucination and predictive uncertainty in conditional language generation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2734–2744, 2021

  38. [38]

    Region-based cluster discrimination for visual representation learning

    Yin Xie, Kaicheng Yang, Xiang An, Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Roy Miles, et al. Region-based cluster discrimination for visual representation learning. InProceedings of the IEEE/CVF International Conferenceon ComputerVision, pages 1793–1803, 2025

  39. [39]

    A new benchmark and reverse validation method for passage- level hallucination detection

    Shiping Yang, Renliang Sun, and Xiaojun Wan. A new benchmark and reverse validation method for passage- level hallucination detection. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 3898–3908, 2023

  40. [40]

    Woodpecker: Hallucination correction for multimodal large language models.Science China Information Sciences, 67(12):220105, 2024

    ShukangYin,ChaoyouFu,SiruiZhao,TongXu,HaoWang,DianboSui,YunhangShen,KeLi,XingSun,andEnhong Chen. Woodpecker: Hallucination correction for multimodal large language models.Science China Information Sciences, 67(12):220105, 2024

  41. [41]

    Mmmu-pro: Amorerobustmulti-disciplinemultimodalunderstandingbenchmark

    XiangYue,TianyuZheng,YuanshengNi,YuboWang,KaiZhang,ShengbangTong,YuxuanSun,BotaoYu,GeZhang, HuanSun,etal. Mmmu-pro: Amorerobustmulti-disciplinemultimodalunderstandingbenchmark. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15134–15186, 2025

  42. [42]

    Enhancing uncertainty-based hallucination detection with stronger focus

    Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, and Luoyi Fu. Enhancing uncertainty-based hallucination detection with stronger focus. InProceedings of the 2023 Conferenceon EmpiricalMethods inNaturalLanguageProcessing, pages 915–932, 2023

  43. [43]

    unclear” or “I don’t know

    Kaitlyn Zhou, Jena Hwang, Xiang Ren, and Maarten Sap. Relying on the unreliable: The impact of language models’ reluctancetoexpressuncertainty. In Proceedingsofthe62ndAnnualMeetingoftheAssociationforComputational Linguistics(Volume1: Long Papers), pages 3623–3643, 2024. 15 Appendix A Prompts A.1 Prompt Construction for Description Detailed prompt template...

  44. [44]

    Table 6Selected intervention coefficients for OHI and EHI

    Inotherwords,wetreattheacceptableinterventionstrengthasastablerangeratherthanauniquebestpoint. Table 6Selected intervention coefficients for OHI and EHI. Model𝛼 oh 𝛼eh Qwen2.5-VL-3B 0.90 0.90 Qwen2.5-VL-7B 0.80 0.70 LLaVA-OneVision-1.5-8B 0.80 0.80 For Qwen2.5-VL-3B, the validation curves in Fig- ure9becomerelativelyflatinthehigh-performing region, and we...