Limitations on Accurate, Trusted, Human-level Reasoning

Rina Panigrahy; Vatsal Sharan

arXiv: 2509.21654 · v2 · submitted 2025-09-25 · cs.LG · cs.AI · cs.CC

Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CC

keywords: AI accuracy and trust · human-level reasoning · self-referential tasks · Gödel incompleteness · Turing halting problem · epistemic assumptions in AI · limitations of trusted AI

The pith

An accurate and trusted AI system cannot achieve human-level reasoning on all tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that accuracy, trust, and human-level reasoning are incompatible under strict formal definitions. Accuracy requires that the system never outputs a false claim and may abstain from answering. Trust is the assumption that the system meets this accuracy standard. Human-level reasoning means the system always matches or exceeds what a human can do on any given task. If a system is both accurate and trusted, the definitions force the existence of task instances that humans can solve but the system cannot handle without either abstaining or risking an error. This result follows from a self-referential construction that parallels Gödel's incompleteness theorems and Turing's halting problem proof. Readers care because it identifies a structural limit on building AI that people can rely on for arbitrary reasoning problems.
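
The three definitions above can be made concrete on a finite toy task set. The sketch below is illustrative only: the names `is_accurate`, `matches_human`, `TRUTH`, `human`, and `cautious_system` are hypothetical and do not come from the paper, and abstention is modeled as `None`. It shows a system that stays accurate by abstaining and, for that reason, fails the human-level check.

```python
from typing import Callable, Dict, Optional

Answer = Optional[bool]           # None models abstention
Solver = Callable[[str], Answer]

def is_accurate(system: Solver, truth: Dict[str, bool]) -> bool:
    """True iff the system never asserts something false (abstaining is allowed)."""
    return all(system(t) is None or system(t) == ans for t, ans in truth.items())

def matches_human(system: Solver, human: Solver, truth: Dict[str, bool]) -> bool:
    """True iff the system is correct on every task the human gets right."""
    return all(system(t) == ans for t, ans in truth.items() if human(t) == ans)

# Hypothetical toy data: three yes/no tasks with known answers.
TRUTH = {"2 + 2 = 4": True, "7 is even": False, "tricky task": True}

def human(task: str) -> Answer:
    return TRUTH[task]            # this toy human solves every task

def cautious_system(task: str) -> Answer:
    # Answers the easy tasks correctly and abstains on the remaining one.
    return None if task == "tricky task" else TRUTH[task]

print(is_accurate(cautious_system, TRUTH))           # True
print(matches_human(cautious_system, human, TRUTH))  # False
```

The toy data only illustrates what the two properties mean; it does not show that abstention is forced, which is the part the paper's self-referential construction is needed for.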

Core claim

We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem.

What carries the argument

The separation of intrinsic accuracy from the epistemic status of trust, which enables a self-referential diagonal argument to construct unsolvable task instances for the system.

If this is right

A trusted accurate system must abstain from answering on certain self-referential tasks that humans solve directly.
Full human-level reasoning capability requires the system to risk false claims or lose the trust assumption.
Trust in an AI system implies acceptance that it will not solve every problem a human can solve.
Achieving all three properties simultaneously is impossible under the given definitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical AI development may require relaxing at least one of the three properties to reach broad reasoning performance.
Similar diagonal arguments could limit other combinations of reliability and capability properties in AI.
The result suggests that verification of AI outputs will remain necessary even for highly trusted systems.
Extensions might examine whether weaker notions of trust allow human-level performance on most but not all tasks.

Load-bearing premise

The chosen formal definitions of accuracy as abstention-enabled truthfulness, trust as the assumption of accuracy, and human-level reasoning as universal superiority or equality to human performance are the correct ones for the concepts.

What would settle it

Construct an explicit self-referential task that refers to the system's own accuracy behavior on that task and verify whether a human can solve it while the system either abstains or produces an incorrect output.
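
One way to picture such a task is the classical halting-style diagonalization that the paper draws on. The sketch below is a simulation under stated assumptions, not the paper's construction: the program name `"D"`, the three candidate systems, and the helper names are all hypothetical, the systems are deterministic, and "looping" is represented symbolically as a string instead of an actual non-terminating run.

```python
from typing import Callable, Dict, Optional

Answer = Optional[str]            # "halts", "loops", or None (abstain)
System = Callable[[str], Answer]

def diagonal_behaviour(system: System) -> str:
    """What the toy program D does: it asks `system` about D itself and
    loops if the verdict is "halts"; on "loops" or abstention it halts."""
    return "loops" if system("D") == "halts" else "halts"

def makes_false_claim(system: System) -> bool:
    """True iff the system asserts something about D that D contradicts."""
    verdict = system("D")
    return verdict is not None and verdict != diagonal_behaviour(system)

def says_halts(task: str) -> Answer:
    return "halts"

def says_loops(task: str) -> Answer:
    return "loops"

def abstains(task: str) -> Answer:
    return None

candidates: Dict[str, System] = {
    "says_halts": says_halts,
    "says_loops": says_loops,
    "abstains": abstains,
}

# Only the abstaining system avoids a false claim about D.
accurate = [name for name, s in candidates.items() if not makes_false_claim(s)]
print(accurate)                      # ['abstains']

# A human who trusts the system to be accurate can conclude that D halts.
HUMAN_ANSWER = "halts"
print(diagonal_behaviour(abstains) == HUMAN_ANSWER)  # True
```

In this toy setting either committed answer is falsified by D's behaviour, so the only accurate candidate is the one that abstains, while the answer "halts" follows from the trust assumption alone.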

Figures

Figures reproduced from arXiv: 2509.21654 by Rina Panigrahy, Vatsal Sharan.

Original abstract

We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gödel's and Turing's results. Key to our proof is the formalization of the notion of trust, which allows us to separate the intrinsic property of a system (being accurate) from its epistemic status (being trusted).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper cleanly separates accuracy from trust to derive a Gödel-style limit on trusted human-level AI reasoning, but the result's reach depends on strong assumptions about task representation.

The main point is that under their definitions an accurate trusted AI cannot reach human-level reasoning: there exist tasks humans can solve easily and provably that force the system into a contradiction or abstention. They define accuracy as never outputting false claims when abstention is allowed, trust as the assumption that the system meets that standard, and human-level reasoning as matching or beating human performance on every task.

The separation between the system's intrinsic accuracy and its epistemic status as trusted is the useful new move here, and it lets them treat the incompatibility as an interpretation of Gödel and Turing rather than a direct copy of those theorems. The argument stays grounded in external classical results without introducing fitted parameters or self-referential equations that beg the question. That keeps the circularity burden low and the structure consistent at the level of the abstract.

The soft spot is the representational requirement for the diagonalization. The construction needs a task T that the system can recognize as referring to its own accuracy on T itself. For systems whose inputs are natural language or unstructured data without explicit syntax for quoting their own decision procedure, it is not clear the self-reference goes through while still letting a human solve it from the definitions alone. The paper would be stronger if it spelled out how typical AI input languages or internal models satisfy this condition.

This work is aimed at researchers thinking about formal limits on reliable AI and safety-critical systems. Readers who care about foundations will find the definitions and the incompatibility result worth discussing. It deserves a serious referee to check the full proof steps and the scope of the representational assumptions.

Referee Report

2 major / 2 minor

Summary. The manuscript defines accuracy as never making false claims (with an abstention option), trust as the epistemic assumption that the system is accurate, and human-level reasoning as always matching or exceeding human capability on tasks. It claims that, under these definitions, no accurate and trusted AI system can be a human-level reasoner, because there exist task instances that are easily and provably solvable by a human but not by the system. The argument is presented as an interpretation of Gödel incompleteness and Turing undecidability, with the formalization of trust used to separate intrinsic accuracy from its epistemic status.

Significance. If the central claim holds under the stated formalizations, the work supplies a mathematical argument that accuracy, trust, and human-level reasoning are mutually incompatible for AI systems. It offers a direct parallel between classical computability limits and AI capability claims, which could inform expectations for systems that must abstain on uncertain inputs while remaining trusted.

major comments (2)

[Proof of main theorem (diagonalization step)] The diagonalization construction of task T (described in the proof of the main result) requires that the system's input representation supports encoding and recognizing self-referential statements about its own accuracy and abstention behavior on T. The definitions of accuracy and trust alone do not establish that typical AI input languages (e.g., natural language without formal quoting mechanisms) possess this representational capacity; without an explicit lemma showing how arbitrary self-referential claims about the system's decision procedure are encodable, the contradiction does not necessarily follow for the systems the paper targets.
[Definition of human-level reasoning and the human-solvability clause] The claim that the tasks are 'easily and provably solvable by a human' rests on humans being able to meta-reason about the forced abstention from the definitions alone. This step needs a precise statement of the human's reasoning procedure and why it evades the same representational limitation that blocks the system; the current argument treats human solvability as immediate once the system abstains, but does not supply the formal bridge between the trust assumption and human meta-reasoning.

minor comments (2)

[Abstract and introduction] The abstract states that the proofs 'can be regarded as interpretations' of Gödel and Turing; a short paragraph in the introduction or conclusion clarifying exactly which classical lemmas are being re-used and which new definitions are required would improve readability.
[Section introducing formal definitions] Notation for the abstention option and the trust assumption could be introduced with a single displayed equation or definition box rather than inline prose, to make the separation between intrinsic accuracy and epistemic trust easier to track.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important points about the formal assumptions in the diagonalization argument and the characterization of human solvability. We address each major comment below and indicate the revisions we will make to clarify and strengthen the presentation.

Referee: [Proof of main theorem (diagonalization step)] The diagonalization construction of task T (described in the proof of the main result) requires that the system's input representation supports encoding and recognizing self-referential statements about its own accuracy and abstention behavior on T. The definitions of accuracy and trust alone do not establish that typical AI input languages (e.g., natural language without formal quoting mechanisms) possess this representational capacity; without an explicit lemma showing how arbitrary self-referential claims about the system's decision procedure are encodable, the contradiction does not necessarily follow for the systems the paper targets.

Authors: We agree that the diagonalization step presupposes sufficient representational capacity in the input language to encode self-referential statements about the system's own accuracy and abstention. The manuscript draws an explicit parallel to Gödel's incompleteness theorems, where self-reference is achieved via arithmetization or quotation; we intend the same abstraction here for any system whose inputs can express statements about its decision procedure. To make this assumption explicit and to address applicability to typical AI input formats, we will add a short lemma in the revised version that states the minimal encoding conditions required for the construction to go through. This lemma will clarify that the argument applies to systems whose input languages support the necessary self-reference, consistent with the trust and accuracy definitions. revision: yes
Referee: [Definition of human-level reasoning and the human-solvability clause] The claim that the tasks are 'easily and provably solvable by a human' rests on humans being able to meta-reason about the forced abstention from the definitions alone. This step needs a precise statement of the human's reasoning procedure and why it evades the same representational limitation that blocks the system; the current argument treats human solvability as immediate once the system abstains, but does not supply the formal bridge between the trust assumption and human meta-reasoning.

Authors: We thank the referee for noting the need for greater precision here. The human meta-reasoning step relies on the fact that a human reasoner is external to the system's decision procedure and can therefore apply the definitions of accuracy and trust directly to deduce that the system must abstain on the constructed task instance, while the human can still determine the correct answer by logical inspection of the definitions. To strengthen this part of the argument, we will expand the relevant section with an explicit outline of the human reasoning steps, showing how the trust assumption plus the system's forced abstention yields the solution without requiring the human to operate under the same input-representation constraints that apply to the AI system. revision: yes

Circularity Check

0 steps flagged

No significant circularity; relies on external theorems and novel definitions

The paper defines accuracy (never false claims, with abstention), trust (epistemic assumption of accuracy), and human-level reasoning (matching or exceeding human capability) as new formal notions. It then constructs a diagonalization argument paralleling Gödel incompleteness and Turing undecidability to exhibit tasks solvable by humans but not the system. These steps invoke classical external results rather than reducing to self-referential equations, fitted parameters, or load-bearing self-citations within the paper. The derivation is self-contained against independent mathematical benchmarks and does not collapse by construction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The paper introduces three domain-specific formal definitions as the foundation for the proof and relies on standard background results from mathematical logic.

axioms (3)

domain assumption: Accuracy is defined as the property that the system never makes false claims when it has the ability to abstain from making a prediction on any input.
This definition is introduced to formalize the notion of accuracy in the AI context.
domain assumption: Trust is the assumption that the system is accurate.
This separates the intrinsic property from its epistemic status as stated in the abstract.
domain assumption: Human-level reasoning is the property that the AI system always matches or exceeds human capability.
This definition is used to state the target property that leads to the incompatibility.
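
One possible way to display the three assumptions side by side is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $X$ is a set of task instances, $Y$ a set of answers, $f$ the ground truth, $A$ the AI system, $H$ the human, and $\bot$ denotes abstention. The last line restates the claimed incompatibility in the same symbols.

```latex
\begin{align*}
A &: X \to Y \cup \{\bot\} && \text{(system; $\bot$ denotes abstention)} \\
\mathrm{Accurate}(A) &\iff \forall x \in X:\; A(x) \neq \bot \Rightarrow A(x) = f(x) \\
\mathrm{Trusted}(A) &\iff \text{$\mathrm{Accurate}(A)$ is taken as an assumption by the reasoner} \\
\mathrm{HumanLevel}(A) &\iff \forall x \in X:\; H(x) = f(x) \Rightarrow A(x) = f(x) \\
\text{Claim:}\quad & \mathrm{Accurate}(A) \wedge \mathrm{Trusted}(A) \;\Rightarrow\; \exists x \in X:\; H(x) = f(x) \wedge A(x) \neq f(x)
\end{align*}
```

Written this way, accuracy is a property of $A$ alone, while trust is a statement about what the reasoner is allowed to assume, which is the separation the ledger entries describe.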

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "If an AI system is safe and trusted, then it cannot be an AGI system... proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and recovery
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Definition 1.2 (Safety). We define a system to be safe if it does not make any false claims"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
cs.AI 2026-04 unverdicted novelty 7.0

The Accountability Incompleteness Theorem demonstrates that human-AI collectives above the Accountability Horizon with feedback cycles cannot simultaneously meet attributability, foreseeability, non-vacuity, and compl...

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Shen Wang, Tianlong Xu, Hang Li, Chaoli Zhang, Joleen Liang, Jiliang Tang, Philip S Yu, and Qingsong Wen. Large language models for education: A survey and outlook. arXiv preprint arXiv:2403.18105.
[2] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565. [internal anchor]
[3] Alon Jacovi, Ana Marasović, Tim Miller, and Yoav Goldberg. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 624–635.
[4] Max Tegmark and Steve Omohundro. Provably safe systems: the only path to controllable AGI. arXiv preprint arXiv:2309.01933.
[5] Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, and Satinder Singh. Diversifying AI: Towards creative chess with AlphaZero. arXiv preprint arXiv:2308.09175.
[6] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. [internal anchor]
[7] arXiv preprint arXiv:2504.07139.
[8] Alan M. Turing. Intelligent machinery, a heretical theory. https://uberty.org/wp-content/uploads/2015/02/intelligent-machinery-a-heretical-theory.pdf
[9] Norbert Wiener. The Human Use of Human Beings: Cybernetics and Society. Houghton Mifflin, Boston.
[10] https://futureoflife.org/wp-content/uploads/2024/12/AI-Safety-Index-2024-Full-Report-27-May-25.pdf
[11] The Center for AI Safety. Statement on AI risk. https://aistatement.com/
[12] Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, et al. Managing extreme AI risks amid rapid progress. Science, 384(6698):842–845.
[13] Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, and Lina Yao. AI safety in generative AI large language models: A survey. arXiv preprint arXiv:2407.18369.
[14] Chen Chen, Xueluan Gong, Ziyao Liu, Weifeng Jiang, Si Qi Goh, and Kwok-Yan Lam. Trustworthy, responsible, and safe AI: A comprehensive architectural framework for AI safety with challenges and mitigations. arXiv preprint arXiv:2408.12935. Work page title: AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions. [internal anchor]
[15] Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, et al. International AI safety report. arXiv preprint arXiv:2501.17805.
[16] Wikipedia. https://en.wikipedia.org/wiki/Penrose%E2%80%93Lucas_argument. Accessed: 2025-07-21. David J Chalmers. Minds, machines, and mathematics. Psyche, 2(9):117–18.
[17] Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli. Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817. [internal anchor]
[18] Excerpt from the paper body, extracted as a reference: Throughout the proof we assume A is well-behaved, i.e. it always terminates. Our construction involves a program which does not take any input, i.e. I = ϕ. The program involves identifying whether the probability p of A outputting 'halts' when given Gödelprogram random as input is greater than 0.5 or not. We use a simple best arm identification proced...
[19] Excerpt from the paper body, extracted as a reference: While we can use any suitable multi-armed bandit algorithm in our construction, here we use Karnin et al. [2013], which has the guarantee that if it is provided with two arms with a gap of ϵ, then it finds the better arm with probability 1 − δ using O((1/ϵ²) log((1/δ) log(1/ϵ))) arm pulls. This bound is known to be optimal [Jamieson et al., 2014], though in our c...
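
The excerpt in entry [19] refers to best arm identification between two arms separated by a gap of ϵ. As a rough illustration of that task only, the sketch below uses plain uniform sampling with a Hoeffding-based sample size, which needs on the order of (1/ϵ²) log(1/δ) pulls per arm. This is a simpler stand-in and not the Karnin et al. procedure the excerpt cites; the function names, arm probabilities, and seed are hypothetical.

```python
import math
import random
from typing import Callable

def pulls_needed(eps: float, delta: float) -> int:
    """Pulls per arm so that, by Hoeffding's inequality, each empirical mean is
    within eps/2 of its true mean with probability at least 1 - delta/2."""
    return math.ceil((2.0 / eps ** 2) * math.log(4.0 / delta))

def better_arm(pull_a: Callable[[], int], pull_b: Callable[[], int],
               eps: float, delta: float) -> str:
    """Return "a" or "b", whichever arm has the higher empirical mean after
    uniform sampling. Correct with probability >= 1 - delta if the gap >= eps."""
    n = pulls_needed(eps, delta)
    mean_a = sum(pull_a() for _ in range(n)) / n
    mean_b = sum(pull_b() for _ in range(n)) / n
    return "a" if mean_a >= mean_b else "b"

# Hypothetical Bernoulli arms with success probabilities 0.8 and 0.2.
rng = random.Random(0)

def arm_a() -> int:
    return int(rng.random() < 0.8)

def arm_b() -> int:
    return int(rng.random() < 0.2)

print(pulls_needed(0.5, 0.01))                         # 48
print(better_arm(arm_a, arm_b, eps=0.5, delta=0.01))
```

Uniform sampling is easy to reason about but pays a slightly different logarithmic factor than the bound quoted in the excerpt, which is why the paper's choice of algorithm should be read from the paper itself.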
