Hallucinations Undermine Trust; Metacognition is a Way Forward
Pith reviewed 2026-05-09 14:27 UTC · model grok-4.3
The pith
Models can build trust by expressing uncertainty instead of delivering confident errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Hallucinations are confident errors; most factuality improvements have expanded the knowledge boundary rather than sharpened awareness of its limits. Models may inherently lack the discriminative power to perfectly separate truths from errors, so eliminating hallucinations trades off against utility. Faithful uncertainty, in which linguistic expressions of doubt match intrinsic uncertainty, dissolves this tradeoff; it is one facet of metacognition, which governs honest communication and controls when to seek external information.
What carries the argument
Faithful uncertainty: the alignment of linguistic expressions of doubt with the model's intrinsic uncertainty, acting as a control layer within metacognition for honest communication and for deciding when to seek external help.
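To make this concrete, a minimal sketch of what faithful uncertainty could look like in practice, assuming intrinsic uncertainty is estimated by self-consistency across sampled answers; the function names, thresholds, and phrasings are illustrative assumptions, not the paper's method.

```python
# Minimal sketch: map an estimated intrinsic confidence to a hedged answer.
# Assumptions (not from the paper): intrinsic uncertainty is approximated by
# agreement across sampled answers; thresholds and phrasings are illustrative.
from collections import Counter

def intrinsic_confidence(sampled_answers: list[str]) -> tuple[str, float]:
    """Return the majority answer and its agreement rate across samples."""
    counts = Counter(a.strip().lower() for a in sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

def faithful_response(sampled_answers: list[str]) -> str:
    """Phrase the answer so the hedge matches the estimated confidence."""
    answer, conf = intrinsic_confidence(sampled_answers)
    if conf >= 0.9:
        return f"{answer}."
    if conf >= 0.6:
        return f"Probably {answer}, though I am not fully certain."
    return f"I am not sure; my best guess is {answer}."

# Example: 7 of 10 samples agree, so the reply is hedged rather than confident.
print(faithful_response(["Paris"] * 7 + ["Lyon"] * 3))
```

The point is only that the hedge in the wording tracks the estimated confidence, rather than every answer being delivered with the same assertiveness.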
If this is right
- Models maintain usefulness by qualifying answers rather than always answering or staying silent.
- In agentic systems, metacognition determines when to use search tools and which results to trust (a minimal gating sketch follows this list).
- Metacognition becomes required for reliable performance on complex or nuanced tasks.
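A minimal sketch of that gating role: the same confidence estimate decides whether an agent answers directly, searches first, or flags residual doubt. The helper names (`answer_with_confidence`, `search`) and the threshold are hypothetical stand-ins, not from the paper.

```python
# Sketch of metacognition as a control layer in an agent loop.
# `answer_with_confidence` and `search` are hypothetical callables standing in
# for a model call and a retrieval tool; the threshold is illustrative.
def answer_query(query: str, answer_with_confidence, search,
                 threshold: float = 0.75) -> str:
    answer, confidence = answer_with_confidence(query)
    if confidence >= threshold:
        return answer                           # confident enough to answer directly
    evidence = search(query)                    # low confidence: consult the tool
    answer, confidence = answer_with_confidence(query, context=evidence)
    if confidence >= threshold:
        return answer
    return f"I could not verify this; my best guess is {answer}."
```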
Where Pith is reading between the lines
- Training objectives could shift from rewarding only correct answers toward rewarding well-calibrated expressions of doubt (a toy reward sketch follows this list).
- This view implies that perfect factuality without uncertainty signals may remain out of reach, redirecting effort toward self-monitoring abilities.
- The same metacognitive layer could reduce over-reliance on external verification in deployed systems.
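To make the first of these concrete, a toy sketch of a calibration-aware scoring rule that values well-placed hedges over confident errors; the specific values are illustrative assumptions, not the paper's proposal.

```python
# Toy reward that penalizes confident errors but not flagged uncertainty.
# The specific scores are illustrative assumptions, not the paper's proposal.
def reward(correct: bool, hedged: bool) -> float:
    if correct and not hedged:
        return 1.0        # confident and right: full credit
    if correct and hedged:
        return 0.7        # right but hedged: most of the credit
    if not correct and hedged:
        return 0.0        # wrong but flagged as uncertain: no penalty
    return -1.0           # confident error (hallucination): penalized
```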
Load-bearing premise
That models lack the discriminative power to perfectly separate known facts from errors, so that eliminating confident mistakes entirely must reduce how many questions they can answer.
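A toy numerical illustration of why imperfect discrimination forces this tradeoff, echoing the Beta-distribution confidence profiles described in the paper's appendix (correct answers skewed toward high confidence, errors toward low, a 25% base error rate); the rest of the setup is an assumption for illustration only.

```python
# Toy illustration: when correct and incorrect answers have overlapping
# confidence profiles, any threshold high enough to eliminate confident
# errors also discards many correct answers (lost coverage).
# Beta parameters echo those described in the paper's appendix; everything
# else is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
is_error = rng.random(n) < 0.25                   # 25% base hallucination rate
conf = np.where(is_error,
                rng.beta(1.0, 1.3, n),            # errors skew to low confidence
                rng.beta(1.8, 1.0, n))            # correct answers skew high

for threshold in (0.5, 0.7, 0.9, 0.99):
    answered = conf >= threshold
    confident_errors = np.mean(answered & is_error)
    coverage = np.mean(answered & ~is_error)      # correct answers still given
    print(f"threshold={threshold:.2f}  confident-error rate={confident_errors:.3f}  "
          f"correct coverage={coverage:.3f}")
```

Raising the threshold drives the confident-error rate toward zero, but only by also shrinking the fraction of correct answers the model still delivers.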
What would settle it
A model that achieves zero confident errors on factoid question-answering benchmarks while still attempting the same fraction of questions, and answering them as accurately, as current frontier systems.
read the original abstract
Despite significant strides in factual reliability, errors -- often termed hallucinations -- remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting -- factoid question-answering with clear ground truth -- frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors -- incorrect information delivered without appropriate qualification -- a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition -- the ability to be aware of one's own uncertainty and to act on it. For direct interaction, acting on uncertainty means communicating it honestly; for agentic systems, it becomes the control layer governing when to search and what to trust. Metacognition is thus essential for LLMs to be both trustworthy and capable; we conclude by highlighting open problems for progress towards this objective.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a position paper arguing that factuality improvements in LLMs have primarily resulted from expanding knowledge boundaries (encoding more facts) rather than enhancing boundary awareness (distinguishing known from unknown). It conjectures that the latter is inherently limited by insufficient discriminative power, creating an unavoidable tradeoff between hallucination elimination and utility preservation. The paper reframes hallucinations as confident errors and proposes 'faithful uncertainty'—aligning expressed linguistic uncertainty with intrinsic model uncertainty—as a metacognitive approach that enables honest communication in direct interactions and control in agentic systems, while identifying open problems for future progress.
Significance. If the central conjecture holds, the work offers a useful conceptual reframing that could redirect research on LLM trustworthiness toward metacognition and uncertainty expression rather than solely scaling knowledge. The explicit distinction between boundary expansion and awareness, combined with the identification of open problems, provides a clear agenda that may help organize subsequent empirical and theoretical efforts in the field.
major comments (1)
- Abstract: the assertion that 'most factuality gains in this domain have come from expanding the model's knowledge boundary rather than improving awareness of that boundary' is load-bearing for the subsequent conjecture and tradeoff claim, yet it is presented without reference to specific studies, quantitative comparisons, or examples that would ground the distinction between the two mechanisms.
minor comments (2)
- The introduction of 'faithful uncertainty' as a new term would benefit from an explicit operational definition or contrast with existing concepts such as calibration, verbalized confidence, or abstention mechanisms to clarify its novelty (a minimal calibration sketch follows these comments).
- The conclusion lists open problems but does not elaborate on them; a short dedicated subsection enumerating concrete research questions (e.g., metrics for faithfulness of uncertainty or training objectives) would increase actionability.
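For the first minor comment, the closest established concept is calibration; a minimal expected calibration error (ECE) computation is sketched below for contrast, assuming per-answer confidence scores in [0, 1] and binary correctness labels. Faithful uncertainty, by comparison, asks whether the wording of the answer matches that confidence, not merely whether the scores are statistically calibrated.

```python
# Minimal expected calibration error (ECE) over equal-width confidence bins,
# shown only for contrast with faithful uncertainty. Inputs are assumed
# per-answer confidence scores in [0, 1] and binary correctness labels.
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray,
                               bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf <= hi) if hi == 1.0 else (conf >= lo) & (conf < hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap              # bin weight times calibration gap
    return ece
```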
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting this point about grounding our central claim. We address the comment below and will incorporate revisions to strengthen the manuscript.
read point-by-point responses
- Referee: Abstract: the assertion that 'most factuality gains in this domain have come from expanding the model's knowledge boundary rather than improving awareness of that boundary' is load-bearing for the subsequent conjecture and tradeoff claim, yet it is presented without reference to specific studies, quantitative comparisons, or examples that would ground the distinction between the two mechanisms.
Authors: We agree that the assertion would benefit from additional grounding to make the distinction more concrete for readers. Although the paper is a position piece focused on conceptual reframing rather than a comprehensive empirical survey, we will revise the abstract to include a concise reference to observed trends in the literature (e.g., scaling-driven gains on factuality benchmarks such as MMLU or TruthfulQA alongside persistent hallucination rates in frontier models). We will also expand the introduction with brief illustrative examples distinguishing knowledge expansion (e.g., larger models encoding more factual associations) from boundary awareness (e.g., lack of corresponding improvement in uncertainty calibration). These changes will support the subsequent conjecture without altering the paper's core argument or requiring new experiments.
Revision: yes
Circularity Check
No significant circularity
full rationale
The manuscript is a position paper that advances conceptual distinctions between knowledge-boundary expansion and boundary-awareness, followed by an explicitly labeled conjecture about inherent limits on the latter. No equations, derivations, fitted parameters, predictions, or empirical measurements are present that could reduce to self-defined quantities or self-citation chains. The central claims rest on argumentation rather than any internal reduction, making the derivation chain self-contained with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Most factuality improvements come from knowledge expansion rather than boundary awareness.
- Domain assumption: Models lack perfect discriminative power between known and unknown information.
invented entities (1)
- faithful uncertainty (no independent evidence)