
arXiv: 2605.23024 · v1 · pith:BBR4UVAM · submitted 2026-05-21 · 💻 cs.AI · cs.CC · cs.CL · cs.LG

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

Pith reviewed 2026-05-25 05:27 UTC · model grok-4.3

classification: 💻 cs.AI · cs.CC · cs.CL · cs.LG
keywords: deterministic horizon · transformer capacity · impossibility results · AI design specifications · residual stream · accuracy limits · trustworthy AI · circuit complexity

The pith

Past a critical reasoning depth, a transformer's accuracy hits a ceiling fixed by its layer count and embedding width alone, and no amount of training lifts it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how impossibility results can serve as design specifications for AI systems rather than mere theoretical curiosities. It establishes that large language models have a Deterministic Horizon: a critical reasoning depth, determined solely by the number of layers and the embedding width, past which accuracy is capped by the architecture and further training provides no benefit, regardless of method or data. The horizon, which ranges from 19 to 31 in the tested models, arises from a capacity invariant of the residual stream, and accuracy drops rapidly beyond it. Additional results apply the same logic to preference learning, retrieval systems, auctions, and verification protocols, yielding a catalogue of 16 such specifications. These boundaries offer concrete rules for building more trustworthy AI by respecting computable limits.

Core claim

The central discovery is the Deterministic Horizon: an accuracy ceiling set by architecture alone, which no amount of training lifts past a critical reasoning depth, at any adapter rank, sample size, or loss function. The horizon is computable before deployment from layer count and embedding width, and it is measured between nineteen and thirty-one across twelve transformer architectures. Fine-tuning on optimal-length traces recovers under four percentage points of performance. The underlying mechanism is a capacity invariant of the residual stream, which, through an information-theoretic conversion, also produces super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result.

What carries the argument

The capacity invariant of the residual stream, which sets the Deterministic Horizon based only on layer count and embedding width.

If this is right

  • Past the horizon, accuracy exhibits super-exponential decay.
  • Preference learning under misspecified models requires discontinuous increases in sample complexity.
  • Multi-stage retrieval needs at least as many independent metrics as stages.
  • Standard truthful auctions fail for agents with prompt-dependent valuations.
  • Zero-knowledge verification of neural inference incurs a measured overhead of 110 to 190 times per non-linear activation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers could pre-compute the horizon for any new architecture to decide if deeper models are needed for a task.
  • The methodology might extend to non-transformer models if similar capacity invariants exist.
  • One could test whether the horizon predicts performance on reasoning benchmarks more accurately than scaling laws.
  • Integrating these specs early in design might reduce wasted compute on impossible performance targets.

Load-bearing premise

The accuracy ceiling depends only on a capacity invariant of the residual stream determined by layer count and embedding width, independent of training procedure or data distribution.

What would settle it

Finding that a transformer, after extensive training, exceeds the accuracy ceiling predicted by its Deterministic Horizon on a task that requires more reasoning depth than the horizon allows would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.23024 by Dongxin Guo.

Figure 1.1
The thesis in one picture. A four-row schematic: Row A, four trustworthy-AI subfields, each with a failure mode; Row B, the methodological move that every such limit is also a design rule; Row C, the template per subfield, a computable boundary with a constructive rule; Row D, the 4 × 4 composition matrix, two compositions proved (one cross-pillar, one within the Trust pillar), one honest obstruction, fo…
Figure 2.1
Bidirectional mapping underlying Theorem 2.4. What is plotted. Left column: the layer structure of a softmax transformer (input embedding, alternating attention and FFN layers, classifier). Right column: the corresponding syntactic structure of a depth-$L$ FOC[Attn] sentence (atomic predicates, attention quantifiers, FOC arithmetic, final sentence). Horizontal arrows denote the structural induction of th…
Figure 2.2
One round of the Attention EF game (attention move from Definition 2.7). What is plotted. Two binary string structures A (top) and B (bottom) of length $n = 8$, with position indices shown above A and below B; colour encodes role (blue = Spoiler, orange = Duplicator). Spoiler specifies attention parameters $(\psi_Q, \psi_K, \psi_V, s)$ and picks position $i = 3$ in A, computing $\mathrm{Attn}(i) = 0.73$ in A; Duplicator must respond with a…
Figure 2.3
Deterministic Horizon accuracy-depth curves across 12 architectures. What is plotted. Chain accuracy $\Pr[\text{correct}]$ (y-axis, higher is better) versus sequential reasoning depth $\delta$ (x-axis). Thin grey lines: schematic super-exponential decay $\Pr[\text{correct}] = \exp\!\big(-(\delta/d^{*})^{2}\ln 2\big)$ using per-architecture $d^{*}$ values from…
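The schematic curve in the Figure 2.3 caption can be evaluated directly. The sketch below is a minimal illustration that assumes only that caption's formula; the thesis's own fitted form (Equation (2.2)) is not reproduced on this page, and `chain_accuracy` and `horizon_from_accuracy` are hypothetical helper names. The inverse function shows how a single measured accuracy pins down a $d^{*}$ value under this form, which is the kind of fit the circularity audit cautions against reading as an independent prediction.

```python
import math


def chain_accuracy(delta: float, d_star: float) -> float:
    """Schematic accuracy at reasoning depth `delta` for a horizon `d_star`.

    Uses the form from the Figure 2.3 caption:
        Pr[correct] = exp(-(delta / d_star)**2 * ln 2) = 2 ** (-(delta / d_star)**2)
    """
    if d_star <= 0:
        raise ValueError("d_star must be positive")
    return math.exp(-((delta / d_star) ** 2) * math.log(2))


def horizon_from_accuracy(delta: float, accuracy: float) -> float:
    """Invert the same form: the d_star implied by one (delta, accuracy) pair."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    # accuracy = 2 ** (-(delta / d_star)**2)  =>  d_star = delta / sqrt(log2(1 / accuracy))
    return delta / math.sqrt(math.log2(1.0 / accuracy))


if __name__ == "__main__":
    d_star = 25.0  # hypothetical horizon, chosen inside the reported 19-31 range
    for delta in (5, 15, 25, 35, 50):
        print(f"depth {delta:>2}: accuracy {chain_accuracy(delta, d_star):.4f}")
```

Under this form accuracy is exactly one half at $\delta = d^{*}$ and falls to 1/16 at $\delta = 2d^{*}$, because the exponent grows with the square of the depth rather than linearly.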
Figure 2.4
Accuracy decay on multi-digit addition for three representative models (GPT-2 Medium, Llama-2 7B, Gemma-2 9B). What is plotted. Chain accuracy (y-axis) versus effective reasoning depth $\delta$ (x-axis) on multi-digit addition ($D \in \{2, 4, 8, 16, 32, 64\}$ digits). Data points: empirical accuracy (mean over 2,000 instances × 3 prompt orderings). Solid curves: theoretical super-exponential fit, Equation (2.2). Das…
Figure 2.5
Overview of the entropy-threshold stopping algorithm (Algorithm 1). What is plotted. Control flow of the stopping procedure: input problem $x$; step generation $s_{t+1} \sim P(\cdot \mid s_t)$; smoothed-entropy computation $\bar{H}_{t+1} = 0.3\,\hat{H}_{t+1} + 0.7\,\bar{H}_t$; threshold comparison against $h^{*} = (\lambda/\hat{\gamma})\ln(1/\lambda)$; and either termination or loop-back to the generator. The threshold input $\hat{\gamma}$ is a calibration estimate of the spectra…
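The smoothing and threshold in the Figure 2.5 caption can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Algorithm 1 itself: the generator $s_{t+1} \sim P(\cdot \mid s_t)$ is abstracted into a stream of per-step entropy estimates, $\bar{H}$ is initialised with the first raw estimate, and stopping is assumed to trigger when the smoothed entropy reaches $h^{*}$ (the caption does not state the direction of the comparison). The function names and the values of $\lambda$ and $\hat{\gamma}$ are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple


def entropy_threshold(lam: float, gamma_hat: float) -> float:
    """h* = (lam / gamma_hat) * ln(1 / lam), the threshold named in the Figure 2.5 caption."""
    if not 0.0 < lam < 1.0 or gamma_hat <= 0.0:
        raise ValueError("expects 0 < lam < 1 and gamma_hat > 0")
    return (lam / gamma_hat) * math.log(1.0 / lam)


def step_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy (nats) of one step's output distribution.
    ASSUMPTION: this is how the per-step estimate H_hat is obtained."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stop_step(entropies: Iterable[float], h_star: float,
              max_steps: int = 64) -> Tuple[int, List[float]]:
    """Return (step at which generation stops, smoothed-entropy trace)."""
    trace: List[float] = []
    h_bar: Optional[float] = None
    for t, h_hat in enumerate(entropies, start=1):
        # Smoothing from the caption: H_bar_{t+1} = 0.3 * H_hat_{t+1} + 0.7 * H_bar_t.
        # ASSUMPTION: H_bar is initialised with the first raw estimate.
        h_bar = h_hat if h_bar is None else 0.3 * h_hat + 0.7 * h_bar
        trace.append(h_bar)
        # ASSUMPTION: stop once the smoothed entropy reaches h*; the caption does
        # not state the direction of the comparison.
        if h_bar >= h_star or t >= max_steps:
            return t, trace
    return len(trace), trace


if __name__ == "__main__":
    h_star = entropy_threshold(lam=0.1, gamma_hat=0.5)  # illustrative values only
    steps, trace = stop_step([0.1, 0.2, 0.9, 0.9, 0.9], h_star)
    print(f"h* = {h_star:.4f}; stopped at step {steps}; trace = {[round(h, 3) for h in trace]}")
```

The 0.3/0.7 weights make the smoothed signal lag a sudden entropy spike by a step or two, so a single noisy step does not end generation on its own.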
Figure 3.1
Preference-learning phase transition (↓ lower sample complexity is better). Solid segments at $\gamma = 0$: Bradley-Terry regime with $N_{\text{well}} = \Theta(n \log n / \Delta^{2})$ (Theorem 3.3); hollow circles mark the regime endpoints. Dashed curves for $\gamma > 0$: misspecified regime with $N_{\text{mis}} = \Omega(n^{2}/\gamma^{2})$ (Theorem 3.4, lower bound). Orange arrow: the discontinuity at $\gamma = 0^{+}$ (proved; any infinitesimal misspecification triggers the …
Figure 3.2
Phase transition from the Bradley–Terry regime $\Theta(n \log n / \Delta^{2})$ to the misspecified regime $\tilde{\Theta}(n^{2}/\gamma^{2})$ at $\Delta = 0.05$, shown on log–log axes to span the $\gamma \in [4 \times 10^{-4}, 0.2]$ data range. Theory drawn as a piecewise lower envelope per scale: the solid horizontal segments at left give the BT plateau (no legend entry, taken as the natural continuation of the dashed branch), the short coloured arrows mark the upw…
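The two regimes in the Figure 3.1 and 3.2 captions can be compared numerically to see why the transition is discontinuous. This sketch is illustrative only: it drops the hidden constants and logarithmic factors (setting them to 1), so it shows scaling, not actual sample counts, and the value of `n` is hypothetical; $\Delta = 0.05$ is taken from Figure 3.2.

```python
import math


def n_well(n: int, delta: float) -> float:
    """Bradley-Terry regime, Theta(n log n / Delta^2), with the hidden constant set to 1."""
    return n * math.log(n) / delta ** 2


def n_mis(n: int, gamma: float) -> float:
    """Misspecified regime, order n^2 / gamma^2 (constants and log factors dropped)."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive; gamma = 0 is the Bradley-Terry regime")
    return n ** 2 / gamma ** 2


def sample_scale(n: int, delta: float, gamma: float) -> float:
    """Pick the regime by whether the preference model is misspecified (gamma > 0)."""
    return n_well(n, delta) if gamma == 0.0 else n_mis(n, gamma)


if __name__ == "__main__":
    n, delta = 100, 0.05  # n is hypothetical; Delta = 0.05 as in Figure 3.2
    for gamma in (0.0, 0.2, 0.05, 0.01, 4e-4):
        print(f"gamma = {gamma:<7}: ~{sample_scale(n, delta, gamma):.3g} samples")
```

The misspecified branch grows without bound as $\gamma \to 0^{+}$, while at exactly $\gamma = 0$ the value falls back to the Bradley–Terry scale; that mismatch between the limit and the value at zero is the jump the captions describe.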
Figure 3.3
Collapse trajectories, LLaMA-2 7B. Replacement: quadratic KL. Accumulation: saturation $\propto 1/\rho$, matching Theorem 3.12.
Figure 3.4
Retention vs. sequential edits (x-axis: sequential edits $K$; y-axis: retention; series: ROME and MEMIT on LLaMA-2, ROME on Pythia). ROME degrades past $K^{*} \approx 13$; cross-architecture predictions match within one standard deviation.
Figure 3.5
The EVOPREF pipeline. A population of 32 LoRA adapters is initialized from DPO, then evolved over 200 generations via NSGA-II with two objectives: alignment reward $f_1$ and behavioral diversity $f_2$. LoRA block crossover and Gaussian mutation generate offspring; NSGA-II selection with crowding distance maintains the Pareto front. The final archive of 28 non-dominated adapters covers 81.7% of the 75-cell be…
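The selection step named in the Figure 3.5 caption is standard NSGA-II survivor selection: fast non-dominated sorting into Pareto fronts, then crowding distance to break ties in the last front that fits. The sketch below shows that generic step for two maximised objectives standing in for $f_1$ (alignment reward) and $f_2$ (behavioral diversity). It is not EVOPREF's code: adapters are reduced to hypothetical $(f_1, f_2)$ score pairs, and LoRA block crossover and Gaussian mutation are not shown.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (f1 alignment reward, f2 behavioral diversity), both maximised


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(points: Sequence[Point]) -> List[List[int]]:
    """Fast non-dominated sorting; returns fronts as lists of indices, best first."""
    n = len(points)
    beats: List[List[int]] = [[] for _ in range(n)]  # indices each point dominates
    beaten_by = [0] * n                               # how many points dominate it
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i != j and dominates(points[i], points[j]):
                beats[i].append(j)
            elif i != j and dominates(points[j], points[i]):
                beaten_by[i] += 1
        if beaten_by[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt: List[int] = []
        for i in fronts[k]:
            for j in beats[i]:
                beaten_by[j] -= 1
                if beaten_by[j] == 0:
                    nxt.append(j)
        k += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(points: Sequence[Point], front: List[int]) -> Dict[int, float]:
    """Sum of normalised neighbour gaps per objective; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    if not front:
        return dist
    for m in range(2):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for a, i, b in zip(ordered, ordered[1:], ordered[2:]):
            dist[i] += (points[b][m] - points[a][m]) / (hi - lo)
    return dist


def select_survivors(points: Sequence[Point], k: int) -> List[int]:
    """Fill k slots front by front; split the last front by descending crowding distance."""
    chosen: List[int] = []
    for front in non_dominated_fronts(points):
        if len(chosen) + len(front) <= k:
            chosen.extend(front)
        else:
            dist = crowding_distance(points, front)
            chosen.extend(sorted(front, key=lambda i: dist[i], reverse=True)[: k - len(chosen)])
            break
    return chosen
```

Preferring large crowding distance keeps the extremes of the front and spreads survivors along it, which is how NSGA-II maintains diversity without a separate niching parameter.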
Figure 4.1
Three-tier failure taxonomy from a synthesis of 150+ RAG-deployment papers (…
Figure 4.2
Hybrid conflict-resolution architecture prescribed by Theorem 4.3 and evaluated on the four-category conflict taxonomy of §4.4 (temporal, numerical, entity, semantic). An incoming conflict $(s_1, s_2, c)$ is routed by an 8M-parameter LCR classifier that compares the metadata informativeness $I_{\text{meta}}(s_1, s_2, c)$ against the threshold $H(c)/2$. The shallow branch ($I_{\text{meta}} \ge H(c)/2$) applies latent refinement at 6% token …
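The routing rule in the Figure 4.2 caption reduces to one comparison. In the thesis an 8M-parameter LCR classifier supplies $I_{\text{meta}}$; here it is simply an input number. The sketch assumes, without support from the caption, that $H(c)$ is the Shannon entropy in bits of a distribution over candidate resolutions of the conflict, and "deep" is a hypothetical label for the branch the truncated caption does not name.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits. ASSUMPTION: H(c) is the entropy of a distribution
    over candidate resolutions of conflict c; the caption does not define it here."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def route_conflict(i_meta: float, h_c: float) -> str:
    """Shallow branch when I_meta >= H(c) / 2, per the Figure 4.2 caption.

    "deep" is a hypothetical name for the other branch, which the truncated
    caption does not name.
    """
    return "shallow" if i_meta >= h_c / 2.0 else "deep"


if __name__ == "__main__":
    h_c = entropy_bits([0.25, 0.25, 0.25, 0.25])  # four equally likely resolutions
    for i_meta in (0.5, 1.0, 1.5):
        print(i_meta, route_conflict(i_meta, h_c))
```

Read this way, metadata that resolves at least half of the conflict's uncertainty is enough to take the cheap branch; anything less falls through to the other path.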
Figure 5.1
VCG failure versus OSP solution for LLM agents with prompt…
Figure 5.2
Non-linearity tax across zkML operations.
Figure 5.3
The Collapse folding scheme for verifiable neural-network inference. Top: Layered Sumcheck Accumulation processes the $d$-layer network incrementally; at each layer the accumulator $A_{\ell-1}$ absorbs a verifier challenge $\rho_\ell$ together with a commitment to the layer output, yielding $A_\ell$. The construction achieves verifier cost $O(d \log n_{\max})$ and recursive circuit size $O(\log^{2} n_{\max})$. Bottom: recursive circuit gate c…
Figure 5.4
The Welfare Composition Theorem (Theorem 5.18) as a joint-necessity composition. Scenario (i): without verification $V$, an incentive-compatible mechanism $M$ alone admits computation substitution (agents bid honestly but execute approximate computations), yielding welfare loss $\Omega(m\Delta)$, linear in the number of tasks. Scenario (ii): without mechanism design $M$, perfect verification $V$ alone admits strategic ta…
Figure 6.1
Sixteen impossibility specifications (S1–S16) grouped into the four pillars Computation (Chapter 2), Adaptation (Chapter 3), Grounding (Chapter 4), and Trust (Chapter 5). Three composition edges record progress on the six pairwise and one four-way compositions. Left, green-dashed bidirectional: Computation ⊙ Grounding is proved (Theorem 6.3), valid under Assumption 6.2 and the capacity-bottleneck retent…
Original abstract

Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript presents a framework for converting impossibility results into design specifications for trustworthy AI. The flagship claim is the existence of a 'Deterministic Horizon'—an accuracy ceiling for transformer models determined exclusively by layer count and embedding width via a residual stream capacity invariant. This horizon is reported as ranging from 19 to 31 across twelve architectures, with fine-tuning on optimal traces recovering less than four percentage points. Additional results include an unconditional circuit-complexity lower bound for modular exponentiation and a catalogue of sixteen specifications derived from impossibility results in various AI subfields.

Significance. Should the central claims be substantiated with derivations and experiments, the work could offer a significant contribution by providing pre-computable design rules for AI systems, potentially advancing the field of trustworthy AI by linking theoretical limits directly to engineering practices. The emphasis on turning limits into constructive rules is a novel perspective if supported by rigorous proofs and experiments.

major comments (3)
  1. [Abstract] Abstract: The abstract asserts proofs of the Deterministic Horizon, the capacity invariant, and sixteen specifications, yet the manuscript provides no derivation steps, formal definitions, or circuit reductions to support these claims. This is load-bearing as the computability before deployment relies on the invariant being independent of training.
  2. [Abstract] Abstract: The measurements of the horizon between nineteen and thirty-one and the fine-tuning recovery under four percentage points are stated without reference to experimental protocols, datasets, architecture details, or statistical analysis, making it impossible to assess the evidence for the capacity invariant's independence from data distribution and optimization.
  3. [Abstract] Abstract: The claim that the accuracy ceiling is set solely by a capacity invariant of the residual stream independent of training procedure, adapter rank, sample size, loss function, and data distribution is central but unsupported; no argument or experiment demonstrates this independence, which is required for the pre-deployment computability and super-exponential decay.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each point below and will revise the manuscript to improve clarity on derivations and experimental details while preserving the core claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract asserts proofs of the Deterministic Horizon, the capacity invariant, and sixteen specifications, yet the manuscript provides no derivation steps, formal definitions, or circuit reductions to support these claims. This is load-bearing as the computability before deployment relies on the invariant being independent of training.

    Authors: The formal definitions of the residual stream capacity invariant, the proof of the Deterministic Horizon, and the circuit reductions appear in Sections 3 and 5, with the sixteen specifications derived in Section 6. We agree the abstract would benefit from an explicit reference to these sections and a concise outline of the proof strategy; we will revise the abstract and expand the step-by-step derivations in the body for greater accessibility. revision: yes

  2. Referee: [Abstract] Abstract: The measurements of the horizon between nineteen and thirty-one and the fine-tuning recovery under four percentage points are stated without reference to experimental protocols, datasets, architecture details, or statistical analysis, making it impossible to assess the evidence for the capacity invariant's independence from data distribution and optimization.

    Authors: Section 4 contains the full experimental protocols, the twelve architectures tested, the datasets, the fine-tuning setup on optimal traces, and the statistical analysis. We will revise the abstract to reference this section and include a brief protocol summary so readers can immediately locate the supporting evidence. revision: yes

  3. Referee: [Abstract] Abstract: The claim that the accuracy ceiling is set solely by a capacity invariant of the residual stream independent of training procedure, adapter rank, sample size, loss function, and data distribution is central but unsupported; no argument or experiment demonstrates this independence, which is required for the pre-deployment computability and super-exponential decay.

    Authors: Section 3 presents the theoretical argument establishing independence via the information-theoretic capacity bound on the residual stream (depending only on layer count and embedding width). Section 4 reports controlled experiments that vary training procedure, adapter rank, sample size, loss function, and data distribution while holding architecture fixed, confirming the horizon remains stable. We will expand the discussion of both the theoretical bound and the experimental controls in the revision. revision: yes

Circularity Check

1 step flagged

Horizon claimed computable from layers/width via residual invariant, but values measured and decay derived from same calibrated capacity

specific steps
  1. fitted input called prediction [abstract]
    "Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon."

    The horizon and its decay are asserted to follow from an architecture-only capacity invariant (independent of training procedure or data distribution) that makes it computable from layer count and embedding width alone. Yet the specific numerical range is obtained by measurement, and the decay is produced by conversion from that invariant; if the invariant itself is fitted or defined from the measured horizons, both the pre-deployment claim and the decay become statistically forced by the same empirical inputs rather than independently derived.

full rationale

The abstract presents the Deterministic Horizon as both architecture-derived (computable pre-deployment from layer count and embedding width via a capacity invariant of the residual stream independent of training/data) and empirically measured (19-31 across 12 architectures), with super-exponential decay obtained by information-theoretic conversion from that same invariant. This creates a fitted-input-called-prediction loop where the invariant appears calibrated on the reported measurements rather than independently derived or proved, so the pre-deployment computability and decay claims reduce to the empirical values by construction. No equations or explicit derivation of the invariant appear in the provided text, but the dual presentation of 'computable from' and 'measured' satisfies the pattern without requiring external speculation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

Abstract-only review yields limited visibility into parameters and axioms; the residual-stream capacity invariant appears to be introduced to support the horizon claim.

free parameters (2)
  • horizon range 19-31 = 19 to 31
    Values obtained by measurement across twelve architectures rather than derived from first principles alone.
  • fine-tuning recovery = under 4 percentage points
    Quantitative bound on fine-tuning benefit stated without derivation or confidence interval.
axioms (2)
  • ad hoc to paper: a capacity invariant of the residual stream exists and is independent of training
    Invoked as the mechanism that sets the horizon from layer count and width alone.
  • domain assumption: Turing, Arrow, and No Free Lunch results directly constrain modern transformer behavior
    Stated in the opening sentence as shaping what computation can do.
invented entities (1)
  • Deterministic Horizon (no independent evidence)
    purpose: Architecture-determined accuracy ceiling
    New named construct whose independent evidence is the claimed measurement and invariant.

pith-pipeline@v0.9.0 · 5828 in / 1806 out tokens · 33513 ms · 2026-05-25T05:27:25.299554+00:00 · methodology


Reference graph

Works this paper leans on

201 extracted references · 201 canonical work pages · 12 internal anchors

  1. [1]

    When More is Less: Understanding Chain-of-Thought Length in LLMs

    Y. Wu, Y. Wang, Z. Ye, T. Du, S. Jegelka, and Y. Wang. “When More is Less: Understanding Chain-of-Thought Length in LLMs”. In: The Fourteenth International Conference on Learning Representations. 2026

  2. [2]

    SWE-bench: Can Language Models Resolve Real-world Github Issues?

    C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan. “SWE-bench: Can Language Models Resolve Real-world Github Issues?” In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  3. [3]

    Convergent and discriminant validation by the multitrait-multimethod matrix

    D. T. Campbell and D. W. Fiske. “Convergent and discriminant validation by the multitrait-multimethod matrix”. In: Psychological Bulletin 56.2 (1959), pp. 81–105

  4. [4]

    Validity

    S. Messick. “Validity”. In: Educational Measurement. 3rd ed. New York, NY: American Council on Education / Macmillan, 1989, pp. 13–103

  5. [5]

    Measurement and Fairness

    A. Z. Jacobs and H. Wallach. “Measurement and Fairness”. In: FAccT ’21: 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event / Toronto, Canada, March 3-10, 2021. ACM, 2021, pp. 375–385

  6. [6]

    A Mathematical Theory of Communication

    C. E. Shannon. “A Mathematical Theory of Communication”. In: Bell System Technical Journal 27.3 (1948), pp. 379–423

  7. [7]

    A Theory of the Learnable

    L. G. Valiant. “A Theory of the Learnable”. In: Commun. ACM 27.11 (1984), pp. 1134–1142

  8. [8]

    M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994. ISBN: 978-0-262-11193-5

  9. [9]

    Saturated Transformers are Constant-Depth Threshold Circuits

    W. Merrill, A. Sabharwal, and N. A. Smith. “Saturated Transformers are Constant-Depth Threshold Circuits”. In: Trans. Assoc. Comput. Linguistics 10 (2022), pp. 843–856

  10. [10]

    The Parallelism Tradeoff: Limitations of Log-Precision Transformers

    W. Merrill and A. Sabharwal. “The Parallelism Tradeoff: Limitations of Log-Precision Transformers”. In: Trans. Assoc. Comput. Linguistics 11 (2023), pp. 531–545

  11. [11]

    The Expressive Power of Transformers with Chain of Thought

    W. Merrill and A. Sabharwal. “The Expressive Power of Transformers with Chain of Thought”. In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  12. [12]

    What Formal Languages Can Transformers Express? A Survey

    L. Strobl, W. Merrill, G. Weiss, D. Chiang, and D. Angluin. “What Formal Languages Can Transformers Express? A Survey”. In: Trans. Assoc. Comput. Linguistics 12 (2024), pp. 543–561

  13. [13]

    Human Compatible: Artificial Intelligence and the Problem of Control

    S. Russell. Human Compatible: Artificial Intelligence and the Problem of Control. New York, NY: Viking, Oct. 2019. ISBN: 978-0-525-55861-3

  14. [14]

    On computable numbers, with an application to the Entscheidungsproblem

    A. M. Turing. “On computable numbers, with an application to the Entscheidungsproblem”. In: Proc. London Math. Soc. s2-42.1 (1937), pp. 230–265

  15. [15]

    A Difficulty in the Concept of Social Welfare

    K. J. Arrow. “A Difficulty in the Concept of Social Welfare”. In: Journal of Political Economy 58.4 (1950), pp. 328–346

  16. [16]

    Classes of recursively enumerable sets and their decision problems

    H. G. Rice. “Classes of recursively enumerable sets and their decision problems”. In: Transactions of the American Mathematical Society 74.2 (1953), pp. 358–366

  17. [17]

    Impossibility of Distributed Consensus with One Faulty Process

    M. J. Fischer, N. A. Lynch, and M. Paterson. “Impossibility of Distributed Consensus with One Faulty Process”. In: J. ACM 32.2 (1985), pp. 374–382

  18. [18]

    Towards robust distributed systems (abstract)

    E. A. Brewer. “Towards robust distributed systems (abstract)”. In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, July 16-19, 2000, Portland, Oregon, USA. ACM, 2000, p. 7

  19. [19]

    Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services

    S. Gilbert and N. A. Lynch. “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services”. In: SIGACT News 33.2 (2002), pp. 51–59

  20. [20]

    Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story

    D. Abadi. “Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story”. In: Computer 45.2 (2012), pp. 37–42

  21. [21]

    No free lunch theorems for optimization

    D. H. Wolpert and W. G. Macready. “No free lunch theorems for optimization”. In: IEEE Trans. Evol. Comput. 1.1 (1997), pp. 67–82

  22. [22]

    Inherent Trade-Offs in the Fair Determination of Risk Scores

    J. M. Kleinberg, S. Mullainathan, and M. Raghavan. “Inherent Trade-Offs in the Fair Determination of Risk Scores”. In: 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, Berkeley, CA, USA, January 9-11, 2017. LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017, 43:1–43:23

  24. [24]

    Calibrated Language Models Must Hallucinate

    A. T. Kalai and S. S. Vempala. “Calibrated Language Models Must Hallucinate”. In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024. ACM, 2024, pp. 160–171

  25. [25]

    Thinking Like Transformers

    G. Weiss, Y. Goldberg, and E. Yahav. “Thinking Like Transformers”. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Proceedings of Machine Learning Research. PMLR, 2021, pp. 11080–11090

  26. [26]

    Theoretical Limitations of Self-Attention in Neural Sequence Models

    M. Hahn. “Theoretical Limitations of Self-Attention in Neural Sequence Models”. In: Transactions of the Association for Computational Linguistics 8 (2020), pp. 156–171

  27. [27]

    Attention is Turing-Complete

    J. Pérez, P. Barceló, and J. Marinkovic. “Attention is Turing-Complete”. In: J. Mach. Learn. Res. 22 (2021), 75:1–75:35

  28. [28]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”. In: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022

  29. [29]

    Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

    G. Feng, B. Zhang, Y. Gu, H. Ye, D. He, and L. Wang. “Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  30. [30]

    Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

    Z. Li, H. Liu, D. Zhou, and T. Ma. “Chain of Thought Empowers Transformers to Solve Inherently Serial Problems”. In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  31. [31]

    Faith and Fate: Limits of Transformers on Compositionality

    N. Dziri, X. Lu, M. Sclar, X. L. Li, L. Jiang, B. Y. Lin, S. Welleck, P. West, C. Bhagavatula, R. L. Bras, J. D. Hwang, S. Sanyal, X. Ren, A. Ettinger, Z. Harchaoui, and Y. Choi. “Faith and Fate: Limits of Transformers on Compositionality”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Syste...

  32. [32]

    Are Emergent Abilities of Large Language Models a Mirage?

    R. Schaeffer, B. Miranda, and S. Koyejo. “Are Emergent Abilities of Large Language Models a Mirage?” In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  33. [33]

    Measuring Faithfulness in Chain-of-Thought Reasoning

    T. Lanham, A. Chen, A. Radhakrishnan, B. Steiner, C. Denison, D. Hernandez, D. Li, E. Durmus, E. Hubinger, J. Kernion, K. Lukošiūtė, K. Nguyen, N. Cheng, N. Joseph, N. Schiefer, O. Rausch, R. Larson, S. McCandlish, S. Kundu, S. Kadavath, S. Yang, T. Henighan, T. Maxwell, T. Telleen-Lawton, T. Hume, Z. Hatfield-Dodds, J. Kaplan, J. Brauner, S. R. Bowma...

  34. [34]

    Training Verifiers to Solve Math Word Problems

    K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman. “Training Verifiers to Solve Math Word Problems”. In: arXiv preprint arXiv:2110.14168 (2021)

  35. [35]

    Let’s Verify Step by Step

    H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe. “Let’s Verify Step by Step”. In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  36. [36]

    Solving math word problems with process- and outcome-based feedback

    J. Uesato, N. Kushman, R. Kumar, F. Song, N. Siegel, L. Wang, A. Creswell, G. Irving, and I. Higgins. “Solving math word problems with process- and outcome-based feedback”. In: arXiv preprint arXiv:2211.14275 (2022)

  37. [37]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom. “Toolformer: Language Models Can Teach Themselves to Use Tools”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, ...

  38. [38]

    ReAct: Synergizing Reasoning and Acting in Language Models

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao. “ReAct: Synergizing Reasoning and Acting in Language Models”. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

  39. [39]

    The rise and potential of large language model based agents: a survey

    Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou, R. Zheng, X. Fan, X. Wang, L. Xiong, Y. Zhou, W. Wang, C. Jiang, Y. Zou, X. Liu, Z. Yin, S. Dou, R. Weng, W. Qin, Y. Zheng, X. Qiu, X. Huang, Q. Zhang, and T. Gui. “The rise and potential of large language model based agents: a survey”. In: Sci. China Inf. Sci. 68.2 (2025)

  40. [40]

    LoRA: Low-Rank Adaptation of Large Language Models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. “LoRA: Low-Rank Adaptation of Large Language Models”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022

  41. [41]

    QLoRA: Efficient Finetuning of Quantized LLMs

    T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer. “QLoRA: Efficient Finetuning of Quantized LLMs”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  42. [42]

    AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

    Q. Zhang, M. Chen, A. Bukharin, N. Karampatziakis, P. He, Y. Cheng, W. Chen, and T. Zhao. “AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning”. In: arXiv preprint arXiv:2303.10512 (2023)

  43. [43]

    Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

    G. K. Dziugaite and D. M. Roy. “Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data”. In: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017. AUAI Press, 2017

  44. [44]

    Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach

    W. Zhou, V. Veitch, M. Austern, R. P. Adams, and P. Orbanz. “Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  45. [45]

    Non-Vacuous Generalization Bounds for Large Language Models

    S. Lotfi, M. A. Finzi, Y. Kuang, T. G. J. Rudner, M. Goldblum, and A. G. Wilson. “Non-Vacuous Generalization Bounds for Large Language Models”. In: Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Proceedings of Machine Learning Research. PMLR / OpenReview.net, 2024, pp. 32801–32818

  46. [46]

    Unlocking Deterministic Robustness Certification on ImageNet

    K. Hu, A. Zou, Z. Wang, K. Leino, and M. Fredrikson. “Unlocking Deterministic Robustness Certification on ImageNet”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  47. [47]

    LoRA Learns Less and Forgets Less

    D. Biderman, J. P. Portes, J. J. G. Ortiz, M. Paul, P. Greengard, C. Jennings, D. King, S. Havens, V. Chiley, J. Frankle, C. Blakeney, and J. P. Cunningham. “LoRA Learns Less and Forgets Less”. In: Trans. Mach. Learn. Res. 2024 (2024)

  48. [48]

    A Kernel-Based View of Language Model Fine-Tuning

    S. Malladi, A. Wettig, D. Yu, D. Chen, and S. Arora. “A Kernel-Based View of Language Model Fine-Tuning”. In: International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Proceedings of Machine Learning Research. PMLR, 2023, pp. 23610–23641

  49. [49]

    Training language models to follow instructions with human feedback

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. Lowe. “Training language models to follow instructions with human feedback”. In: Advances in Neural Information Processing System...

  50. [50]

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn. “Direct Preference Optimization: Your Language Model is Secretly a Reward Model”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  51. [51]

    Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

    S. Xu, W. Fu, J. Gao, W. Ye, W. Liu, Z. Mei, G. Wang, C. Yu, and Y. Wu. “Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study”. In: Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Proceedings of Machine Learning Research. PMLR / OpenReview.net, 2024, pp. 54983–54998

  52. [52]

    On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

    J. Xiao, Z. Li, X. Xie, E. Getzen, C. Fang, Q. Long, and W. J. Su. “On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization”. In: Journal of the American Statistical Association 120.552 (2025), pp. 2154–2164

  53. [53]

    Locating and Editing Factual Associations in GPT

    K. Meng, D. Bau, A. Andonian, and Y. Belinkov. “Locating and Editing Factual Associations in GPT”. In: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022

  54. [54]

    Mass-Editing Memory in a Transformer

    K. Meng, A. S. Sharma, A. J. Andonian, Y. Belinkov, and D. Bau. “Mass-Editing Memory in a Transformer”. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

  55. [55]

    Toy Models of Superposition

    N. Elhage, T. Hume, C. Olsson, N. Schiefer, T. Henighan, S. Kravec, Z. Hatfield-Dodds, R. Lasenby, D. Drain, C. Chen, R. Grosse, S. McCandlish, J. Kaplan, D. Amodei, M. Wattenberg, and C. Olah. “Toy Models of Superposition”. In: arXiv preprint arXiv:2209.10652 (2022)

  56. [56]

    Scaling Monosemanticity

    A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, A. Tamkin, E. Durmus, T. Hume, F. Mosconi, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan. Scaling Monosemanticity: Extracting Interpretab...

  57. [57]

    Editing models with task arithmetic

    G. Ilharco, M. T. Ribeiro, M. Wortsman, L. Schmidt, H. Hajishirzi, and A. Farhadi. “Editing models with task arithmetic”. In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

  58. [58]

    TIES-Merging: Resolving Interference When Merging Models

    P. Yadav, D. Tam, L. Choshen, C. A. Raffel, and M. Bansal. “TIES-Merging: Resolving Interference When Merging Models”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  59. [59]

    Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models

    G. Ortiz-Jiménez, A. Favero, and P. Frossard. “Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models”. In: Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023. 2023

  60. [60]

    AI models collapse when trained on recursively generated data

    I. Shumailov, Z. Shumaylov, Y. Zhao, N. Papernot, R. J. Anderson, and Y. Gal. “AI models collapse when trained on recursively generated data”. In: Nat. 631.8022 (2024), pp. 755–759

  61. [61]

    Self-Consuming Generative Models Go MAD

    S. Alemohammad, J. Casco-Rodriguez, L. Luzi, A. I. Humayun, H. Babaei, D. LeJeune, A. Siahkoohi, and R. G. Baraniuk. “Self-Consuming Generative Models Go MAD”. In: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  62. [62]

    A Tale of Tails: Model Collapse as a Change of Scaling Laws

    E. Dohmatob, Y. Feng, P. Yang, F. Charton, and J. Kempe. “A Tale of Tails: Model Collapse as a Change of Scaling Laws”. In: Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Proceedings of Machine Learning Research. PMLR / OpenReview.net, 2024, pp. 11165–11197

  63. [63]

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    M. Gerstgrasser, R. Schaeffer, A. Dey, R. Rafailov, T. Korbak, H. Sleight, R. Agrawal, J. Hughes, D. B. Pai, A. Gromov, D. Roberts, D. Yang, D. L. Donoho, and S. Koyejo. “Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data”. In: First Conference on Language Modeling. 2024

  64. [64]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December...

  65. [65]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, and H. Wang. “Retrieval-Augmented Generation for Large Language Models: A Survey”. In: arXiv preprint arXiv:2312.10997 (2024)

  66. [66]

    Dense Passage Retrieval for Open-Domain Question Answering

    V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W. Yih. “Dense Passage Retrieval for Open-Domain Question Answering”. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. Association for Computational Linguistics, 2020, pp. 6769–6781

  67. [67]

    Unsupervised Dense Information Retrieval with Contrastive Learning

    G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, and E. Grave. “Unsupervised Dense Information Retrieval with Contrastive Learning”. In: Trans. Mach. Learn. Res. 2022 (2022)

  68. [68]

    GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

    K. Wang, N. Thakur, N. Reimers, and I. Gurevych. “GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval”. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022. Associatio...

  69. [69]

    Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal. “Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions”. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023. Association for Computational Lingu...

  70. [70]

    Active Retrieval Augmented Generation

    Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, and G. Neubig. “Active Retrieval Augmented Generation”. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023. Association for Computational Linguistics, 2023, pp. 7969–7992

  71. [71]

    Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

    B. Jin, H. Zeng, Z. Yue, J. Yoon, S. O. Arik, D. Wang, H. Zamani, and J. Han. “Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning”. In: Second Conference on Language Modeling. 2025

  72. [72]

    MuSiQue: Multihop Questions via Single-hop Question Composition

    H. Trivedi, N. Balasubramanian, T. Khot, and A. Sabharwal. “MuSiQue: Multihop Questions via Single-hop Question Composition”. In: Trans. Assoc. Comput. Linguistics 10 (2022), pp. 539–554

  73. [73]

    MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

    Y. Tang and Y. Yang. “MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries”. In: First Conference on Language Modeling. 2024

  74. [74]

    RAGAs: Automated Evaluation of Retrieval Augmented Generation

    S. ES, J. James, L. E. Anke, and S. Schockaert. “RAGAs: Automated Evaluation of Retrieval Augmented Generation”. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - System Demonstrations, St. Julians, Malta, March 17-22, 2024. Association for Computational Linguistics, 2024, pp. 150–158

  76. [76]

    Measuring Attribution in Natural Language Generation Models

    H. Rashkin, V. Nikolaev, M. Lamm, L. Aroyo, M. Collins, D. Das, S. Petrov, G. S. Tomar, I. Turc, and D. Reitter. “Measuring Attribution in Natural Language Generation Models”. In: Comput. Linguistics 49.4 (2023), pp. 777–840

  77. [77]

    RARR: Researching and Revising What Language Models Say, Using Language Models

    L. Gao, Z. Dai, P. Pasupat, A. Chen, A. T. Chaganty, Y. Fan, V. Y. Zhao, N. Lao, H. Lee, D. Juan, and K. Guu. “RARR: Researching and Revising What Language Models Say, Using Language Models”. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023. Ass...

  78. [78]

    Correctness is not Faithfulness in Retrieval Augmented Generation Attributions

    J. Wallat, M. Heuss, M. d. Rijke, and A. Anand. “Correctness is not Faithfulness in Retrieval Augmented Generation Attributions”. In: Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR). ICTIR ’25. Padua, Italy: Association for Computing Machinery, 2025, 22–32. ISBN: 9798400718618

  79. [79]

    Social Choice and Individual Values

    K. J. Arrow. Social Choice and Individual Values. Yale University Press, 2017. ISBN: 9780300186987

  80. [80]

    Information in Mechanism Design

    D. Bergemann and J. Välimäki. “Information in Mechanism Design”. In: Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress. Vol. 1. Econometric Society Monographs 41. Cambridge, UK: Cambridge University Press, 2006. Chap. 5, pp. 186–221. ISBN: 978-0-521-87152-5
