The Two Boundaries: Why Behavioral AI Governance Fails Structurally
Pith reviewed 2026-05-07 09:36 UTC · model grok-4.3
The pith
AI systems governing effects must make their capability boundary identical to the governance boundary or else risk and theater are inevitable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that behavioral governance of effects in Turing-complete AI systems is undecidable in general by Rice's theorem: no algorithm can determine whether an arbitrary program satisfies a non-trivial semantic property such as policy compliance. Coterminous governance is therefore required: the expressiveness boundary must equal the governance boundary. This equality is achieved only by an architectural separation of computation from effect, after which governance checks become part of the execution pipeline and subsume any separate governance infrastructure. The testable criterion follows directly: if the two boundaries are not provably identical, then ungoverned risk and governance theater are structurally inevitable.
What carries the argument
Coterminous governance, the requirement that an AI system's expressiveness boundary (what effects it can produce) exactly equals its governance boundary, enforced by separating computation from effects so that policy checks are structural rather than behavioral.
If this is right
- Any behavioral governance layer added after the fact on unrestricted programs will leave either ungoverned capabilities or policies that cover nothing.
- Governance checks must be moved inside the execution pipeline rather than run as a parallel system.
- Structural governance under separated computation and effect renders separate governance infrastructure redundant.
- The undecidability result applies to any attempt to decide non-trivial properties of effects in Turing-complete systems.
- Coterminous boundaries become the single measurable test for whether a governance approach avoids structural failure.
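The two-boundary decomposition reduces to set arithmetic over declared effect identifiers. A minimal sketch (the effect names here are invented for illustration, not taken from the paper):

```python
# Minimal sketch of the two-boundary test. An effect identifier can be
# expressible (the system can produce it), governed (policy covers it),
# both, or neither. Effect names are invented for illustration.
expressiveness = {"api.call", "db.write", "fs.delete"}   # what the system can do
governance     = {"api.call", "db.write", "email.send"}  # what policy covers

governed = expressiveness & governance   # the only useful region
risk     = expressiveness - governance   # ungoverned capabilities
theater  = governance - expressiveness   # policies over impossible actions

coterminous = (expressiveness == governance)

print(sorted(risk))     # ['fs.delete']  -> ungoverned risk
print(sorted(theater))  # ['email.send'] -> governance theater
print(coterminous)      # False          -> fails the criterion
```

Coterminous governance is exactly the condition that `risk` and `theater` are both empty, which holds iff the two sets are equal.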
Where Pith is reading between the lines
- Restricting the effect-generating component to a non-Turing-complete language would remove the undecidability barrier and allow effective behavioral governance.
- System designers could verify coterminous boundaries by enumerating every possible effect and confirming that each is explicitly covered and that no policy addresses an impossible action.
- The same boundary-coincidence requirement may apply to other domains where programs produce external effects, such as operating-system access control or robotic action planning.
- In practice this would favor agent architectures whose action sets are declared and finite rather than generated on the fly by general computation.
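A hedged sketch of what such an architecture might look like: computation only proposes effects as inert data, and a structural gate over a declared, finite action table is the sole path to execution. The table, handlers, and policy flags below are invented for illustration:

```python
from dataclasses import dataclass

# Declared, finite action table: the expressiveness boundary is its key set,
# and the governance boundary is the same key set, so the two boundaries
# coincide by construction. Handlers and policies are invented examples.
ACTIONS = {
    "db.write": {"allowed": True,  "handler": lambda args: f"wrote {args}"},
    "api.call": {"allowed": False, "handler": lambda args: f"called {args}"},
}

@dataclass(frozen=True)
class Effect:
    """An effect is inert data until the gate executes it."""
    name: str
    args: str

def execute(effect: Effect) -> str:
    """The only path from computation to the world: a structural policy gate."""
    entry = ACTIONS.get(effect.name)
    if entry is None:
        # No ungoverned region: undeclared effects cannot execute at all.
        raise ValueError(f"undeclared effect: {effect.name}")
    if not entry["allowed"]:
        raise PermissionError(f"denied by policy: {effect.name}")
    return entry["handler"](effect.args)

print(execute(Effect("db.write", "row=1")))  # permitted, executes
```

Because compliance is checked against a finite declared table rather than against arbitrary program behavior, no Rice-style decision about program semantics is ever needed.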
Load-bearing premise
The claim depends on modeling deployed AI effect-governance systems as arbitrary Turing-complete programs whose semantic compliance properties cannot be decided algorithmically after the fact.
What would settle it
A working deployed system that governs effects behaviorally on a Turing-complete architecture yet produces neither ungoverned risky effects nor policies that address impossible actions would falsify the claim.
Original abstract
Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that every AI system performing effects has two independently defined boundaries (expressiveness and governance), creating three regions of which two are structural failure modes (ungoverned risk and governance theater). It invokes Rice's theorem to prove that deciding non-trivial semantic properties such as policy-compliant effects is undecidable for any Turing-complete architecture attempting behavioral governance, defines coterminous governance as the property that the two boundaries coincide, shows this requires an architectural separation of computation from effect rather than a post-hoc layer, and mechanizes the framework in Coq (454 theorems, 36 modules, 0 admits).
Significance. If the reduction from deployed AI effect mechanisms to arbitrary Turing-complete programs holds, the result supplies a formal, testable criterion that subsumes many existing post-hoc governance proposals and explains why behavioral approaches are prone to either residual risk or ineffective theater. The Coq mechanization of 454 theorems with zero admits is a clear strength, providing machine-checked support for the undecidability argument and the derived architectural requirements.
Major comments (2)
- [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
- [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.
Minor comments (2)
- [Abstract] Abstract: the three-region diagram is described in text but not referenced by figure number; adding '(see Figure 1)' would improve readability.
- [§5.3] §5.3: the statement that 'structural governance subsumes separate infrastructure' uses the term 'subsumes' without a precise set-theoretic or simulation relation; a short clarifying sentence would remove ambiguity.
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the recommendation for minor revision. The suggestions to formalize the reduction and the non-coterminous property of post-hoc layers will improve the clarity and rigor of the paper. We outline our responses below and confirm that revisions will be made accordingly.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
Authors: We concur that the mapping in §3.2 relies on an informal argument. In the revised version, we will provide a concrete example illustrating the simulation of an arbitrary Turing-complete program using an LLM with tool-calling capabilities, assuming tools that support persistent state and control flow. Furthermore, we will add a lemma in the Coq formalization that captures this simulation, building on the existing 454 theorems to make the inheritance of undecidability explicit and machine-checked. revision: yes
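The reduction the authors promise to formalize can be illustrated concretely: a tool loop with persistent state suffices to step an arbitrary Turing machine, so it inherits Rice-style undecidability. In this hedged sketch, a deterministic transition table stands in for the model's decision procedure, and the tools expose only read/write/move over persistent state; the specific machine (a unary incrementer) is invented for illustration:

```python
# Hedged sketch: an agent loop whose tools carry persistent state can
# simulate Turing machine steps, which is the substance of the reduction.
# The transition table stands in for the model; the machine is a toy
# unary incrementer invented for this example.
state = {"tape": {}, "head": 0}          # persistent tool-side state

def tool_read():                          # observe current cell (blank = 0)
    return state["tape"].get(state["head"], 0)

def tool_write(sym):                      # write to current cell
    state["tape"][state["head"]] = sym

def tool_move(delta):                     # move the head
    state["head"] += delta

# DELTA[(control_state, symbol)] = (write, move, next_control_state)
DELTA = {
    ("scan", 1): (1, +1, "scan"),   # skip over existing 1s
    ("scan", 0): (1, +1, "halt"),   # append one more 1, then halt
}

def run(control="scan"):
    """The agent loop: observe via a tool, act via tools, until halt."""
    while control != "halt":
        write, move, control = DELTA[(control, tool_read())]
        tool_write(write)
        tool_move(move)

state["tape"] = {0: 1, 1: 1}              # unary input: 2
run()
print(sum(state["tape"].values()))        # unary output: 3
```

Since the loop can run any transition table over unbounded persistent state, deciding a non-trivial property of its eventual effects is as hard as deciding it for arbitrary programs.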
-
Referee: [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.
Authors: The referee correctly identifies that the derivation of coterminous governance from undecidability would benefit from an explicit lemma. We will add a new lemma stating that for any Turing-complete system, a post-hoc governance layer (operating externally on effects) cannot be coterminous with the expressiveness boundary, because it would necessitate an algorithm to decide non-trivial semantic properties of programs, contradicting Rice's theorem. This lemma will be mechanized in Coq and integrated into the definition of coterminous governance in the revised manuscript. revision: yes
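The promised lemma could plausibly take the following shape; this is our reconstruction from the rebuttal's description, not the paper's wording, and the notation ($E$, $G$, $\Pi$) is introduced here for illustration:

```latex
% Sketch of the promised lemma (a reconstruction, not the paper's statement).
% L is a Turing-complete language, E(p) the effect set of program p,
% Pi a governance policy, and G a post-hoc behavioral governance layer.
\begin{lemma}[Post-hoc layers are non-coterminous]
Let $\mathcal{L}$ be Turing-complete and let
$C(p) \equiv \text{``}E(p)\text{ complies with }\Pi\text{''}$
be non-trivial (some $p \in \mathcal{L}$ satisfies it and some does not).
If a governance layer $G$ decides $C(p)$ for arbitrary $p \in \mathcal{L}$,
then $G$ decides a non-trivial semantic property of programs, contradicting
Rice's theorem. Hence any computable $G$ must either under-approximate
(admitting ungoverned risk) or over-approximate (producing theater), so its
governance boundary cannot equal the expressiveness boundary.
\end{lemma}
```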
Circularity Check
No significant circularity; derivation relies on external Rice's theorem and Coq mechanization
full rationale
The paper grounds its core claim in Rice's theorem (1953), an independent external result on undecidability of non-trivial semantic properties for arbitrary programs, and mechanizes the mapping to behavioral AI governance effects in Coq (454 theorems, 36 modules, 0 admitted). Coterminous governance is defined directly from the two-boundary distinction and shown to require separation of computation from effect as a logical consequence of the undecidability result rather than by redefinition or fitting. No load-bearing step reduces to self-citation, ansatz smuggling, renaming of known results, or any input-output equivalence by construction within the paper itself. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Rice's theorem: non-trivial semantic properties of programs are undecidable for Turing-complete systems
invented entities (2)
-
coterminous governance
no independent evidence
-
three regions (governed capabilities, ungoverned capabilities, theater)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022
-
[2]
Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems
David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624, 2024
-
[3]
Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3): 143--155, 1966. doi:10.1145/365230.365252
-
[4]
The method of levels of abstraction
Luciano Floridi. The method of levels of abstraction. Minds and Machines, 18(3): 303--329, 2008
-
[5]
David K. Gifford and John M. Lucassen. Integrating functional and imperative programming. In ACM Conference on LISP and Functional Programming, pages 28--38, 1986. doi:10.1145/319838.319848
-
[6]
Guardrails: Adding guardrails to large language models
Guardrails AI. Guardrails: Adding guardrails to large language models. https://github.com/guardrails-ai/guardrails, 2024
-
[7]
Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and J. F. Bastien. Bringing the web up to speed with WebAssembly. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 185--200, 2017. doi:10.1145/3062341.3062363
-
[8]
Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thesing, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37: 100270, 2020. doi:10.1016/j.cosrev.2020.100270
-
[9]
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Mober, et al. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714, 2023
-
[10]
Type directed compilation of row-typed algebraic effects
Daan Leijen. Type directed compilation of row-typed algebraic effects. Proceedings of the ACM on Programming Languages, 1(POPL): 1--28, 2017. doi:10.1145/3009837.3009872
-
[11]
Sam Lindley, Conor McBride, and Craig McLaughlin. Do be do be do. In Proceedings of the ACM on Programming Languages (POPL), pages 1--26, 2017. doi:10.1145/3009837.3009897
-
[12]
Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10): 1039--1065, 2006. doi:10.1002/cpe.994
-
[13]
Alan L. McCann. Algebraic semantics of governed execution: Monoidal categories, effect algebras, and coterminous boundaries, 2026a. arXiv preprint (to appear)
-
[14]
Alan L. McCann. Effect-transparent governance for AI workflow architectures: Semantic preservation, expressive minimality, and decidability boundaries, 2026b. arXiv preprint (to appear)
-
[15]
Alan L. McCann. Mechanized foundations of structural governance: Machine-checked proofs for governed intelligence, 2026c. arXiv preprint (to appear)
-
[16]
Alan L. McCann. Cryptographic registry provenance: Structural defense against dependency confusion in AI package ecosystems, 2026d. arXiv preprint (to appear)
-
[17]
Alan L. McCann. Certified purity for cognitive workflow executors: From static analysis to cryptographic attestation, 2026e. arXiv preprint (to appear)
-
[18]
Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University, 2006
-
[19]
Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1): 55--92, 1991. doi:10.1016/0890-5401(91)90052-4
-
[20]
Andrew C. Myers. JFlow: Practical mostly-static information flow control. In ACM Symposium on Principles of Programming Languages (POPL), pages 228--241, 1999. doi:10.1145/292540.292561
-
[21]
Andrew C. Myers and Barbara Liskov. A decentralized model for information flow control. In ACM Symposium on Operating Systems Principles (SOSP), pages 129--142, 1997. doi:10.1145/268998.266669
-
[22]
OPA: Open Policy Agent
Open Policy Agent. OPA: Open Policy Agent. https://www.openpolicyagent.org/, 2024
-
[23]
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35: 27730--27744, 2022
-
[24]
Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell
Simon Peyton Jones. Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Engineering Theories of Software Construction, pages 47--96. IOS Press, 2001
-
[25]
Gordon Plotkin and Matija Pretnar. Handlers of algebraic effects. In European Symposium on Programming (ESOP), pages 80--94, 2009. doi:10.1007/978-3-642-00590-9_7
-
[26]
NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails
Traian Rebedea, Razvan Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, and Jonathan Cohen. NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails. In Conference on Empirical Methods in Natural Language Processing (EMNLP), System Demonstrations, 2023
-
[27]
Classes of recursively enumerable sets and their decision problems
Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2): 358--366, 1953
-
[28]
Toward verified artificial intelligence
Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry. Toward verified artificial intelligence. Communications of the ACM, 65(7): 46--55, 2022. doi:10.1145/3503914
-
[29]
Flexible dynamic information flow control in Haskell
Deian Stefan, Alejandro Russo, John C. Mitchell, and David Mazières. Flexible dynamic information flow control in Haskell. In ACM SIGPLAN Haskell Symposium, pages 95--106, 2011. doi:10.1145/2034675.2034688
-
[30]
Monads for functional programming
Philip Wadler. Monads for functional programming. In Advanced Functional Programming, volume 925 of LNCS, pages 24--52. Springer, 1995. doi:10.1007/3-540-59451-5_2
-
[31]
Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient software-based fault isolation. In ACM Symposium on Operating Systems Principles (SOSP), pages 203--216, 1993. doi:10.1145/168619.168635
-
[32]
Jailbroken: How does LLM safety training fail?
Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? In Advances in Neural Information Processing Systems, volume 36, 2023
-
[33]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023