pith.

arxiv: 2509.21654 · v2 · submitted 2025-09-25 · 💻 cs.LG · cs.AI · cs.CC

Limitations on Accurate, Trusted, Human-level Reasoning

Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CC
keywords: AI accuracy and trust · human-level reasoning · self-referential tasks · Gödel incompleteness · Turing halting problem · epistemic assumptions in AI · limitations of trusted AI

The pith

An accurate and trusted AI system cannot achieve human-level reasoning on all tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that accuracy, trust, and human-level reasoning are incompatible under strict formal definitions. Accuracy requires that the system never outputs a false claim, though it may abstain from answering. Trust is the assumption that the system meets this accuracy standard. Human-level reasoning means the system always matches or exceeds what a human can do on any given task. If a system is both accurate and trusted, the definitions force the existence of task instances that humans can solve but the system cannot: on them the system can only abstain, because any committed answer would be an error. This result follows from a self-referential construction that parallels Gödel's incompleteness theorems and Turing's proof that the halting problem is undecidable. It matters because it identifies a structural limit on building AI that people can rely on for arbitrary reasoning problems.

Core claim

We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem.

What carries the argument

The separation of intrinsic accuracy from the epistemic status of trust, which enables a self-referential diagonal argument to construct unsolvable task instances for the system.

If this is right

  • A trusted accurate system must abstain from answering on certain self-referential tasks that humans solve directly.
  • Full human-level reasoning capability requires the system to risk false claims or lose the trust assumption.
  • Trust in an AI system implies acceptance that it will not solve every problem a human can solve.
  • Achieving all three properties simultaneously is impossible under the given definitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical AI development may require relaxing at least one of the three properties to reach broad reasoning performance.
  • Similar diagonal arguments could limit other combinations of reliability and capability properties in AI.
  • The result suggests that verification of AI outputs will remain necessary even for highly trusted systems.
  • Extensions might examine whether weaker notions of trust allow human-level performance on most but not all tasks.

Load-bearing premise

The chosen formal definitions (accuracy as truthfulness with the option to abstain, trust as the assumption of accuracy, and human-level reasoning as matching or exceeding human performance on every task) correctly capture the concepts they name.

What would settle it

Construct an explicit self-referential task that refers to the system's own accuracy behavior on that task and verify whether a human can solve it while the system either abstains or produces an incorrect output.
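The test described above can be sketched for a toy, deterministic system. This is a minimal Python sketch under stated assumptions, not the paper's construction: the question G, the three-valued response type (None means abstain), and the helper names are hypothetical. G is true exactly when S does not answer True on G. Enumerating the three possible responses shows that abstaining is the only one compatible with accuracy, so a human who trusts that S is accurate can conclude that G is true.

```python
from typing import Callable, List, Optional

# Hypothetical model: a system maps a question to True, False, or None (None = abstain).
System = Callable[[str], Optional[bool]]

# Hypothetical Gödel-style question whose truth depends on S's own answer to it.
G = "Does S fail to answer True to question G?"


def truth_of_g(response: Optional[bool]) -> bool:
    """G is true exactly when S's response on G is not True."""
    return response is not True


def accurate_on_g(response: Optional[bool]) -> bool:
    """Accuracy: abstaining is always allowed; a committed answer must match the truth."""
    return response is None or response == truth_of_g(response)


def evaluate(system: System) -> str:
    """Run a candidate system on G and classify the outcome."""
    response = system(G)
    if response is None:
        return "abstains"
    return "correct" if response == truth_of_g(response) else "error"


# Case analysis over every possible response: only abstention is accurate.
accurate_responses: List[Optional[bool]] = [
    r for r in (True, False, None) if accurate_on_g(r)
]

# A human who trusts S's accuracy knows S's response lies in accurate_responses,
# which pins down the truth of G.
human_answer = {truth_of_g(r) for r in accurate_responses}

print(accurate_responses)        # [None]
print(human_answer)              # {True}
print(evaluate(lambda q: True))  # error
print(evaluate(lambda q: False)) # error
print(evaluate(lambda q: None))  # abstains
```

Answering False is not a safe fallback here: by the definition of G, a False answer makes G true, so it is also an error.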

Figures

Figures reproduced from arXiv: 2509.21654 by Rina Panigrahy, Vatsal Sharan.

Figure 1: Sketch of the basic argument for program verification, for the case when the AI system …
Original abstract

We identify a fundamental incompatibility between the goals of accuracy, trust, and human-level reasoning in artificial intelligence (AI) systems, for strict mathematical definitions of these notions. We define accuracy of a system as the property that it never makes any false claims when it has the ability to abstain from making a prediction on any input, and trust as the assumption that the system is accurate. We define human-level reasoning as the property of an AI system always matching or exceeding human capability. Our core finding is that, for our formal definitions of these notions, an accurate and trusted AI system cannot be a human-level reasoning system: for such an accurate, trusted system there are task instances which are easily and provably solvable by a human but not by the system. Our proofs draw parallels to Gödel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gödel's and Turing's results. Key to our proof is the formalization of the notion of trust, which allows us to separate the intrinsic property of a system (being accurate) from its epistemic status (being trusted).
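The Turing parallel mentioned in the abstract can be illustrated with a program-verification variant in which the decider may abstain. This is a hedged sketch, not the paper's formalism: programs are modeled as Python generator functions (one yield per step), a decider returns "halts", "loops", or None, and make_diagonal, halts_within, and probe are hypothetical names. A bounded run can confirm halting but can only fail to observe it, so the looping case is known by construction rather than by running forever.

```python
from typing import Callable, Iterator, Optional, Tuple

# A "program" is a zero-argument generator function; each yield is one step of work.
Program = Callable[[], Iterator[None]]
# A decider predicts "halts" or "loops", or returns None to abstain.
Decider = Callable[[Program], Optional[str]]


def make_diagonal(decider: Decider) -> Program:
    """Build the program that asks the decider about itself and does the opposite."""

    def diagonal() -> Iterator[None]:
        if decider(diagonal) == "halts":
            while True:  # predicted to halt, so loop forever
                yield
        # predicted to loop, or the decider abstained: halt immediately

    return diagonal


def halts_within(program: Program, budget: int) -> bool:
    """Run at most `budget` steps; True means it finished, False only that it had not yet."""
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return False


def probe(decider: Decider, budget: int = 1000) -> Tuple[Optional[str], bool]:
    """Return the decider's verdict on its own diagonal program and the observed behavior."""
    diagonal = make_diagonal(decider)
    return decider(diagonal), halts_within(diagonal, budget)


print(probe(lambda p: "halts"))  # ('halts', False): wrong, the program never halts
print(probe(lambda p: "loops"))  # ('loops', True): wrong, the program halts
print(probe(lambda p: None))     # (None, True): abstains, and the program halts
```

Read through the accuracy lens: a decider that never gives a wrong verdict must abstain on its own diagonal program, and a user who trusts that property can infer from the construction that the program halts, which is exactly the answer the decider withheld.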

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript defines accuracy as never making false claims (with an abstention option), trust as the epistemic assumption that the system is accurate, and human-level reasoning as always matching or exceeding human capability on tasks. It claims that, under these definitions, no accurate and trusted AI system can be a human-level reasoner, because there exist task instances that are easily and provably solvable by a human but not by the system. The argument is presented as an interpretation of Gödel incompleteness and Turing undecidability, with the formalization of trust used to separate intrinsic accuracy from its epistemic status.

Significance. If the central claim holds under the stated formalizations, the work supplies a mathematical argument that accuracy, trust, and human-level reasoning are mutually incompatible for AI systems. It offers a direct parallel between classical computability limits and AI capability claims, which could inform expectations for systems that must abstain on uncertain inputs while remaining trusted.

major comments (2)
  1. [Proof of main theorem (diagonalization step)] The diagonalization construction of task T (described in the proof of the main result) requires that the system's input representation supports encoding and recognizing self-referential statements about its own accuracy and abstention behavior on T. The definitions of accuracy and trust alone do not establish that typical AI input languages (e.g., natural language without formal quoting mechanisms) possess this representational capacity; without an explicit lemma showing how arbitrary self-referential claims about the system's decision procedure are encodable, the contradiction does not necessarily follow for the systems the paper targets.
  2. [Definition of human-level reasoning and the human-solvability clause] The claim that the tasks are 'easily and provably solvable by a human' rests on humans being able to meta-reason about the forced abstention from the definitions alone. This step needs a precise statement of the human's reasoning procedure and why it evades the same representational limitation that blocks the system; the current argument treats human solvability as immediate once the system abstains, but does not supply the formal bridge between the trust assumption and human meta-reasoning.
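The first major comment asks whether the input language can encode statements that refer to themselves. One textbook route, assumed here rather than taken from the paper, is Quine-style diagonalization: a language with string quotation and substitution can build a question that describes itself without containing itself verbatim. TEMPLATE, substitute, diagonalize, and referent are hypothetical names, and Python's repr stands in for a formal quoting mechanism.

```python
import ast

# Hypothetical question template; "{x}" is the slot for a quoted string.
# It must contain no apostrophes, because referent() locates the quotation by searching for "'".
TEMPLATE = (
    "Does S fail to answer True to the question obtained by "
    "substituting {x} into itself?"
)


def substitute(template: str, arg: str) -> str:
    """Fill the slot with a quoted copy of arg (repr plays the role of quotation marks)."""
    return template.replace("{x}", repr(arg))


def diagonalize(template: str) -> str:
    """Substitute a template's own quotation into its slot."""
    return substitute(template, template)


def referent(question: str) -> str:
    """Decode the quoted template inside a question and rebuild the question it describes."""
    start, end = question.index("'"), question.rindex("'")
    quoted = ast.literal_eval(question[start:end + 1])
    return substitute(quoted, quoted)


G = diagonalize(TEMPLATE)
print(G)
print(referent(G) == G)  # True: the question G describes is G itself
```

In G, the phrase "the question obtained by substituting ... into itself" denotes G, so G asks about S's answer to G, which is the shape the diagonal step needs. Whether natural-language inputs offer an equally reliable quoting mechanism is the open point the referee raises.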
minor comments (2)
  1. [Abstract and introduction] The abstract states that the proofs 'can be regarded as interpretations' of Gödel and Turing; a short paragraph in the introduction or conclusion clarifying exactly which classical lemmas are being re-used and which new definitions are required would improve readability.
  2. [Section introducing formal definitions] Notation for the abstention option and the trust assumption could be introduced with a single displayed equation or definition box rather than inline prose, to make the separation between intrinsic accuracy and epistemic trust easier to track.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important points about the formal assumptions in the diagonalization argument and the characterization of human solvability. We address each major comment below and indicate the revisions we will make to clarify and strengthen the presentation.

Point-by-point responses
  1. Referee: [Proof of main theorem (diagonalization step)], as stated in major comment 1 above.

    Authors: We agree that the diagonalization step presupposes sufficient representational capacity in the input language to encode self-referential statements about the system's own accuracy and abstention. The manuscript draws an explicit parallel to Gödel's incompleteness theorems, where self-reference is achieved via arithmetization or quotation; we intend the same abstraction here for any system whose inputs can express statements about its decision procedure. To make this assumption explicit and to address applicability to typical AI input formats, we will add a short lemma in the revised version that states the minimal encoding conditions required for the construction to go through. This lemma will clarify that the argument applies to systems whose input languages support the necessary self-reference, consistent with the trust and accuracy definitions. revision: yes

  2. Referee: [Definition of human-level reasoning and the human-solvability clause], as stated in major comment 2 above.

    Authors: We thank the referee for noting the need for greater precision here. The human meta-reasoning step relies on the fact that a human reasoner is external to the system's decision procedure and can therefore apply the definitions of accuracy and trust directly to deduce that the system must abstain on the constructed task instance, while the human can still determine the correct answer by logical inspection of the definitions. To strengthen this part of the argument, we will expand the relevant section with an explicit outline of the human reasoning steps, showing how the trust assumption plus the system's forced abstention yields the solution without requiring the human to operate under the same input-representation constraints that apply to the AI system. revision: yes

Circularity Check

0 steps flagged

No significant circularity; relies on external theorems and novel definitions


The paper defines accuracy (never false claims, with abstention), trust (epistemic assumption of accuracy), and human-level reasoning (matching or exceeding human capability) as new formal notions. It then constructs a diagonalization argument paralleling Gödel incompleteness and Turing undecidability to exhibit tasks solvable by humans but not the system. These steps invoke classical external results rather than reducing to self-referential equations, fitted parameters, or load-bearing self-citations within the paper. The derivation rests on independent mathematical results and does not collapse by construction to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The paper introduces three domain-specific formal definitions as the foundation for the proof and relies on standard background results from mathematical logic.

axioms (3)
  • domain assumption: Accuracy is defined as the property that the system never makes false claims when it has the ability to abstain from making a prediction on any input.
    This definition is introduced to formalize the notion of accuracy in the AI context.
  • domain assumption: Trust is the assumption that the system is accurate.
    This separates the intrinsic property from its epistemic status as stated in the abstract.
  • domain assumption: Human-level reasoning is the property that the AI system always matches or exceeds human capability.
    This definition is used to state the target property that leads to the incompatibility.
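In the spirit of the second minor comment, the three definitions can be collected into one display. This is one possible formalization, with notation assumed here rather than taken from the paper: S maps inputs X to answers or ⊥ (abstain), τ is the ground truth, and H is a human reasoner. The last two lines sketch a Gödel-style witness G and the resulting failure of human-level reasoning.

```latex
\begin{align*}
  &S : X \to \{\mathsf{T}, \mathsf{F}, \bot\}, \qquad \tau : X \to \{\mathsf{T}, \mathsf{F}\}
     && (\bot = \text{abstain},\ \tau = \text{ground truth})\\
  &\mathrm{Acc}(S) \;:\Longleftrightarrow\; \forall x \in X:\ S(x) \neq \bot \Rightarrow S(x) = \tau(x)\\
  &\mathrm{Trust}_H(S) \;:\Longleftrightarrow\; H \text{ reasons with } \mathrm{Acc}(S) \text{ as a premise}\\
  &\mathrm{HL}(S) \;:\Longleftrightarrow\; \forall x \in X:\ H \text{ solves } x \Rightarrow S(x) = \tau(x)\\[4pt]
  &\text{Witness: } \tau(G) = \mathsf{T} \iff S(G) \neq \mathsf{T},
     \text{ so } S(G) = \mathsf{T} \Rightarrow \tau(G) = \mathsf{F}
     \text{ and } S(G) = \mathsf{F} \Rightarrow \tau(G) = \mathsf{T}.\\
  &\mathrm{Acc}(S) \Rightarrow S(G) = \bot \Rightarrow \tau(G) = \mathsf{T};
     \text{ with } \mathrm{Trust}_H(S),\ H \text{ solves } G,
     \text{ yet } S(G) \neq \tau(G), \text{ so } \neg\,\mathrm{HL}(S).
\end{align*}
```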

pith-pipeline@v0.9.0 · 5732 in / 1513 out tokens · 23985 ms · 2026-05-18T13:37:17.008067+00:00



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives

    cs.AI 2026-04 unverdicted novelty 7.0

    The Accountability Incompleteness Theorem demonstrates that human-AI collectives above the Accountability Horizon with feedback cycles cannot simultaneously meet attributability, foreseeability, non-vacuity, and compl...
