Limitations on Accurate, Trusted, Human-level Reasoning
Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3
The pith
An accurate and trusted AI system cannot achieve human-level reasoning on all tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem.
What carries the argument
The separation of intrinsic accuracy from the epistemic status of trust, which enables a self-referential diagonal argument to construct unsolvable task instances for the system.
If this is right
- A trusted accurate system must abstain from answering on certain self-referential tasks that humans solve directly.
- Full human-level reasoning capability requires the system to risk false claims or lose the trust assumption.
- Trust in an AI system implies acceptance that it will not solve every problem a human can solve.
- Achieving all three properties simultaneously is impossible under the given definitions.
Where Pith is reading between the lines
- Practical AI development may require relaxing at least one of the three properties to reach broad reasoning performance.
- Similar diagonal arguments could limit other combinations of reliability and capability properties in AI.
- The result suggests that verification of AI outputs will remain necessary even for highly trusted systems.
- Extensions might examine whether weaker notions of trust allow human-level performance on most but not all tasks.
Load-bearing premise
The formal definitions must faithfully capture the informal concepts: accuracy as truthfulness with the option to abstain, trust as the assumption of accuracy, and human-level reasoning as matching or exceeding human performance on every task.
What would settle it
Construct an explicit self-referential task that refers to the system's own accuracy behavior on that task and verify whether a human can solve it while the system either abstains or produces an incorrect output.
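As a hedged illustration of the kind of task described above, the Python sketch below builds a toy self-referential statement and checks each response a system could give. It is not the paper's construction: the statement S, the three-valued answer type, and the names truth_of_task, classify, and human_answer are assumptions introduced here for illustration only.

```python
from typing import Callable, Optional

# Assumption: a system either makes a claim (True / False) or abstains (None).
Answer = Optional[bool]

# Hypothetical toy task, not taken from the paper.
TASK = "S: the system does not answer True on this task"


def truth_of_task(answer: Answer) -> bool:
    """Truth value of S, which depends only on how the system responded."""
    return answer is not True


def classify(system: Callable[[str], Answer]) -> str:
    """Label the system's behavior on the toy task."""
    answer = system(TASK)
    if answer is None:
        return "abstains"
    # A claim is correct only if it equals the actual truth value of S.
    # Answering True makes S false; answering False leaves S true.
    # Either way the claim is wrong, so this branch never yields "correct claim".
    return "correct claim" if answer == truth_of_task(answer) else "false claim"


def human_answer() -> bool:
    """A human who trusts the system to be accurate rules out both claims,
    concludes the system abstains, and therefore judges S to be true."""
    return truth_of_task(None)
```

Under these assumptions, a system that answers True or False is classified as making a false claim, so abstention is the only response consistent with accuracy, while the human's answer is True.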
Original abstract
We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that -- for our formal definitions of these notions -- an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gödel's and Turing's results. Key to our proof is the formalization of the notion of trust, which allows us to separate the intrinsic property of a system (being accurate) from its epistemic status (being trusted).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript defines accuracy as never making false claims (with an abstention option), trust as the epistemic assumption that the system is accurate, and human-level reasoning as always matching or exceeding human capability on tasks. It claims that, under these definitions, no accurate and trusted AI system can be a human-level reasoner, because there exist task instances that are easily and provably solvable by a human but not by the system. The argument is presented as an interpretation of Gödel incompleteness and Turing undecidability, with the formalization of trust used to separate intrinsic accuracy from its epistemic status.
Significance. If the central claim holds under the stated formalizations, the work supplies a mathematical argument that accuracy, trust, and human-level reasoning are mutually incompatible for AI systems. It offers a direct parallel between classical computability limits and AI capability claims, which could inform expectations for systems that must abstain on uncertain inputs while remaining trusted.
major comments (2)
- [Proof of main theorem (diagonalization step)] The diagonalization construction of task T (described in the proof of the main result) requires that the system's input representation supports encoding and recognizing self-referential statements about its own accuracy and abstention behavior on T. The definitions of accuracy and trust alone do not establish that typical AI input languages (e.g., natural language without formal quoting mechanisms) possess this representational capacity; without an explicit lemma showing how arbitrary self-referential claims about the system's decision procedure are encodable, the contradiction does not necessarily follow for the systems the paper targets.
- [Definition of human-level reasoning and the human-solvability clause] The claim that the tasks are 'easily and provably solvable by a human' rests on humans being able to meta-reason about the forced abstention from the definitions alone. This step needs a precise statement of the human's reasoning procedure and why it evades the same representational limitation that blocks the system; the current argument treats human solvability as immediate once the system abstains, but does not supply the formal bridge between the trust assumption and human meta-reasoning.
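The encoding concern in the first major comment can be made concrete with a small sketch of how quotation yields self-reference in a plain string language. This is a generic diagonalization trick in the style of Quine, offered only as an illustration under stated assumptions; the placeholder "{X}", the function names diagonalize and referent, and the example sentence are hypothetical and do not come from the paper.

```python
import ast

# Assumption: templates mark the slot for a quoted template with "{X}".
PLACEHOLDER = "{X}"


def diagonalize(template: str) -> str:
    """Replace the placeholder in `template` with the quotation of `template`."""
    return template.replace(PLACEHOLDER, repr(template))


def referent(sentence: str) -> str:
    """Evaluate the 'diagonalize(<quoted template>)' term that occurs in
    `sentence` and return the sentence that term denotes."""
    start = sentence.index("diagonalize(") + len("diagonalize(")
    quoted = sentence[start:sentence.rindex(")")]
    # literal_eval turns the quoted template back into a plain string.
    return diagonalize(ast.literal_eval(quoted))


template = "the system abstains on diagonalize({X})"
sentence = diagonalize(template)

# The term inside the sentence denotes the sentence itself, so the sentence
# talks about the system's behavior on that very sentence.
assert referent(sentence) == sentence
```

The sketch only shows that a language with quotation and substitution suffices for this toy form of self-reference; whether a given AI input format meets that condition is exactly the point the referee asks the authors to state as a lemma.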
minor comments (2)
- [Abstract and introduction] The abstract states that the proofs 'can be regarded as interpretations' of Gödel and Turing; a short paragraph in the introduction or conclusion clarifying exactly which classical lemmas are being re-used and which new definitions are required would improve readability.
- [Section introducing formal definitions] Notation for the abstention option and the trust assumption could be introduced with a single displayed equation or definition box rather than inline prose, to make the separation between intrinsic accuracy and epistemic trust easier to track.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important points about the formal assumptions in the diagonalization argument and the characterization of human solvability. We address each major comment below and indicate the revisions we will make to clarify and strengthen the presentation.
Point-by-point responses
- Referee: [Proof of main theorem (diagonalization step)] The diagonalization construction of task T (described in the proof of the main result) requires that the system's input representation supports encoding and recognizing self-referential statements about its own accuracy and abstention behavior on T. The definitions of accuracy and trust alone do not establish that typical AI input languages (e.g., natural language without formal quoting mechanisms) possess this representational capacity; without an explicit lemma showing how arbitrary self-referential claims about the system's decision procedure are encodable, the contradiction does not necessarily follow for the systems the paper targets.
  Authors: We agree that the diagonalization step presupposes sufficient representational capacity in the input language to encode self-referential statements about the system's own accuracy and abstention. The manuscript draws an explicit parallel to Gödel's incompleteness theorems, where self-reference is achieved via arithmetization or quotation; we intend the same abstraction here for any system whose inputs can express statements about its decision procedure. To make this assumption explicit and to address applicability to typical AI input formats, we will add a short lemma in the revised version that states the minimal encoding conditions required for the construction to go through. This lemma will clarify that the argument applies to systems whose input languages support the necessary self-reference, consistent with the trust and accuracy definitions.
  Revision: yes
- Referee: [Definition of human-level reasoning and the human-solvability clause] The claim that the tasks are 'easily and provably solvable by a human' rests on humans being able to meta-reason about the forced abstention from the definitions alone. This step needs a precise statement of the human's reasoning procedure and why it evades the same representational limitation that blocks the system; the current argument treats human solvability as immediate once the system abstains, but does not supply the formal bridge between the trust assumption and human meta-reasoning.
  Authors: We thank the referee for noting the need for greater precision here. The human meta-reasoning step relies on the fact that a human reasoner is external to the system's decision procedure and can therefore apply the definitions of accuracy and trust directly to deduce that the system must abstain on the constructed task instance, while the human can still determine the correct answer by logical inspection of the definitions. To strengthen this part of the argument, we will expand the relevant section with an explicit outline of the human reasoning steps, showing how the trust assumption plus the system's forced abstention yields the solution without requiring the human to operate under the same input-representation constraints that apply to the AI system.
  Revision: yes
Circularity Check
No significant circularity; relies on external theorems and novel definitions
The paper defines accuracy (never false claims, with abstention), trust (epistemic assumption of accuracy), and human-level reasoning (matching or exceeding human capability) as new formal notions. It then constructs a diagonalization argument paralleling Gödel incompleteness and Turing undecidability to exhibit tasks solvable by humans but not the system. These steps invoke classical external results rather than reducing to self-referential equations, fitted parameters, or load-bearing self-citations within the paper. The derivation is self-contained against independent mathematical benchmarks and does not collapse by construction to its inputs.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: Accuracy is defined as the property that the system never makes false claims when it has the ability to abstain from making a prediction on any input.
- domain assumption: Trust is the assumption that the system is accurate.
- domain assumption: Human-level reasoning is the property that the AI system always matches or exceeds human capability.
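One way to write these three assumptions as displayed definitions is sketched below. The notation is hypothetical and is not taken from the paper: $\mathcal{X}$ is assumed to be the set of task instances, $y^*(x)$ the correct answer on $x$, $\bot$ the abstention symbol, $A$ the AI system, and $H(x)$ the human's answer on $x$.

```latex
% Hypothetical notation introduced for illustration; not the paper's own.
\begin{align*}
A &: \mathcal{X} \to \mathcal{Y} \cup \{\bot\}
  && \text{(system with an abstention option)} \\
\mathrm{Acc}(A) &\iff \forall x \in \mathcal{X}:\ A(x) \in \{y^*(x), \bot\}
  && \text{(accuracy: no false claims)} \\
\mathrm{Trust}(A) &:\ \mathrm{Acc}(A) \text{ may be used as an assumption by any reasoner}
  && \text{(trust)} \\
\mathrm{HL}(A) &\iff \forall x \in \mathcal{X}:\ H(x) = y^*(x) \Rightarrow A(x) = y^*(x)
  && \text{(human-level reasoning)}
\end{align*}
% Claimed incompatibility in this notation:
%   Acc(A) and Trust(A) together imply there is an x with
%   H(x) = y^*(x) and A(x) = \bot, hence not HL(A).
```

In this reading, accuracy is a property of the system's outputs, whereas trust is a statement about what reasoners are allowed to assume, which is the separation the abstract describes as key to the proof.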
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "If an AI system is safe and trusted, then it cannot be an AGI system... proofs draw parallels to Gödel’s incompleteness theorems and Turing’s proof of the undecidability of the halting problem"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and recovery
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Definition 1.2 (Safety). We define a system to be safe if it does not make any false claims"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
The Accountability Incompleteness Theorem demonstrates that human-AI collectives above the Accountability Horizon with feedback cycles cannot simultaneously meet attributability, foreseeability, non-vacuity, and compl...