The Two Boundaries: Why Behavioral AI Governance Fails Structurally
Pith reviewed 2026-05-07 09:36 UTC · model grok-4.3
The pith
AI systems governing effects must make their capability boundary identical to the governance boundary or else risk and theater are inevitable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that behavioral governance of effects in Turing-complete AI systems is undecidable in general by Rice's theorem: no algorithm can determine whether an arbitrary program satisfies a non-trivial semantic property such as policy compliance. Coterminous governance is therefore required: the expressiveness boundary must equal the governance boundary. This equality is achieved only by an architectural separation of computation from effect, after which governance checks become part of the execution pipeline and subsume any separate governance infrastructure. The testable criterion follows directly: if the two boundaries are not provably identical, then ungoverned risk and governance theater are structurally inevitable.
What carries the argument
Coterminous governance, the requirement that an AI system's expressiveness boundary (what effects it can produce) exactly equals its governance boundary, enforced by separating computation from effects so that policy checks are structural rather than behavioral.
If this is right
- Any behavioral governance layer added after the fact on unrestricted programs will leave either ungoverned capabilities or policies that cover nothing.
- Governance checks must be moved inside the execution pipeline rather than run as a parallel system.
- Structural governance under separated computation and effect renders separate governance infrastructure redundant.
- The undecidability result applies to any attempt to decide non-trivial properties of effects in Turing-complete systems.
- Coterminous boundaries become the single measurable test for whether a governance approach avoids structural failure.
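The two-boundary decomposition reduces to set arithmetic over declared effect identifiers. A minimal sketch (the effect names here are invented for illustration, not taken from the paper):

```python
# Minimal sketch of the two-boundary test. An effect identifier can be
# expressible (the system can produce it), governed (policy covers it),
# both, or neither. Effect names are invented for illustration.
expressiveness = {"api.call", "db.write", "fs.delete"}   # what the system can do
governance     = {"api.call", "db.write", "email.send"}  # what policy covers

governed = expressiveness & governance   # the only useful region
risk     = expressiveness - governance   # ungoverned capabilities
theater  = governance - expressiveness   # policies over impossible actions

coterminous = (expressiveness == governance)

print(sorted(risk))     # ['fs.delete']  -> ungoverned risk
print(sorted(theater))  # ['email.send'] -> governance theater
print(coterminous)      # False          -> fails the criterion
```

Coterminous governance is exactly the condition that `risk` and `theater` are both empty, which holds iff the two sets are equal.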
Where Pith is reading between the lines
- Restricting the effect-generating component to a non-Turing-complete language would remove the undecidability barrier and allow effective behavioral governance.
- System designers could verify coterminous boundaries by enumerating every possible effect and confirming that each is explicitly covered and that no policy addresses an impossible action.
- The same boundary-coincidence requirement may apply to other domains where programs produce external effects, such as operating-system access control or robotic action planning.
- In practice this would favor agent architectures whose action sets are declared and finite rather than generated on the fly by general computation.
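A hedged sketch of what such an architecture might look like: computation only proposes effects as inert data, and a structural gate over a declared, finite action table is the sole path to execution. The table, handlers, and policy flags below are invented for illustration:

```python
from dataclasses import dataclass

# Declared, finite action table: the expressiveness boundary is its key set,
# and the governance boundary is the same key set, so the two boundaries
# coincide by construction. Handlers and policies are invented examples.
ACTIONS = {
    "db.write": {"allowed": True,  "handler": lambda args: f"wrote {args}"},
    "api.call": {"allowed": False, "handler": lambda args: f"called {args}"},
}

@dataclass(frozen=True)
class Effect:
    """An effect is inert data until the gate executes it."""
    name: str
    args: str

def execute(effect: Effect) -> str:
    """The only path from computation to the world: a structural policy gate."""
    entry = ACTIONS.get(effect.name)
    if entry is None:
        # No ungoverned region: undeclared effects cannot execute at all.
        raise ValueError(f"undeclared effect: {effect.name}")
    if not entry["allowed"]:
        raise PermissionError(f"denied by policy: {effect.name}")
    return entry["handler"](effect.args)

print(execute(Effect("db.write", "row=1")))  # permitted, executes
```

Because compliance is checked against a finite declared table rather than against arbitrary program behavior, no Rice-style decision about program semantics is ever needed.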
Load-bearing premise
The claim depends on modeling deployed AI effect-governance systems as arbitrary Turing-complete programs whose semantic compliance properties cannot be decided algorithmically after the fact.
What would settle it
A working deployed system that governs effects behaviorally on a Turing-complete architecture yet produces neither ungoverned risky effects nor policies that address impossible actions would falsify the claim.
Original abstract
Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that every AI system performing effects has two independently defined boundaries (expressiveness and governance), creating three regions of which two are structural failure modes (ungoverned risk and governance theater). It invokes Rice's theorem to prove that deciding non-trivial semantic properties such as policy-compliant effects is undecidable for any Turing-complete architecture attempting behavioral governance, defines coterminous governance as the property that the two boundaries coincide, shows this requires an architectural separation of computation from effect rather than a post-hoc layer, and mechanizes the framework in Coq (454 theorems, 36 modules, 0 admits).
Significance. If the reduction from deployed AI effect mechanisms to arbitrary Turing-complete programs holds, the result supplies a formal, testable criterion that subsumes many existing post-hoc governance proposals and explains why behavioral approaches are prone to either residual risk or ineffective theater. The Coq mechanization of 454 theorems with zero admits is a clear strength, providing machine-checked support for the undecidability argument and the derived architectural requirements.
Major comments (2)
- [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
- [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.
Minor comments (2)
- [Abstract] Abstract: the three-region diagram is described in text but not referenced by figure number; adding '(see Figure 1)' would improve readability.
- [§5.3] §5.3: the statement that 'structural governance subsumes separate infrastructure' uses the term 'subsumes' without a precise set-theoretic or simulation relation; a short clarifying sentence would remove ambiguity.
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the recommendation for minor revision. The suggestions to formalize the reduction and the non-coterminous property of post-hoc layers will improve the clarity and rigor of the paper. We outline our responses below and confirm that revisions will be made accordingly.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Mapping to AI architectures): the claim that current tool-calling and API-effect mechanisms in deployed systems are sufficiently expressive to inherit the full undecidability of Rice's theorem is asserted via informal reduction; a concrete lemma or example showing how an arbitrary program is simulated by an LLM-plus-tool loop would make the application load-bearing rather than illustrative.
Authors: We concur that the mapping in §3.2 relies on an informal argument. In the revised version, we will provide a concrete example illustrating the simulation of an arbitrary Turing-complete program using an LLM with tool-calling capabilities, assuming tools that support persistent state and control flow. Furthermore, we will add a lemma in the Coq formalization that captures this simulation, building on the existing 454 theorems to make the inheritance of undecidability explicit and machine-checked. revision: yes
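The reduction the authors promise to formalize can be illustrated concretely: a tool loop with persistent state suffices to step an arbitrary Turing machine, so it inherits Rice-style undecidability. In this hedged sketch, a deterministic transition table stands in for the model's decision procedure, and the tools expose only read/write/move over persistent state; the specific machine (a unary incrementer) is invented for illustration:

```python
# Hedged sketch: an agent loop whose tools carry persistent state can
# simulate Turing machine steps, which is the substance of the reduction.
# The transition table stands in for the model; the machine is a toy
# unary incrementer invented for this example.
state = {"tape": {}, "head": 0}          # persistent tool-side state

def tool_read():                          # observe current cell (blank = 0)
    return state["tape"].get(state["head"], 0)

def tool_write(sym):                      # write to current cell
    state["tape"][state["head"]] = sym

def tool_move(delta):                     # move the head
    state["head"] += delta

# DELTA[(control_state, symbol)] = (write, move, next_control_state)
DELTA = {
    ("scan", 1): (1, +1, "scan"),   # skip over existing 1s
    ("scan", 0): (1, +1, "halt"),   # append one more 1, then halt
}

def run(control="scan"):
    """The agent loop: observe via a tool, act via tools, until halt."""
    while control != "halt":
        write, move, control = DELTA[(control, tool_read())]
        tool_write(write)
        tool_move(move)

state["tape"] = {0: 1, 1: 1}              # unary input: 2
run()
print(sum(state["tape"].values()))        # unary output: 3
```

Since the loop can run any transition table over unbounded persistent state, deciding a non-trivial property of its eventual effects is as hard as deciding it for arbitrary programs.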
-
Referee: [Definition 4.1] Definition 4.1 (coterminous governance): the requirement that governance checks become part of the execution pipeline is derived from the undecidability result, yet the paper does not exhibit a formal statement showing that any post-hoc governance layer is necessarily non-coterminous; adding such a lemma would tighten the subsumption claim.
Authors: The referee correctly identifies that the derivation of coterminous governance from undecidability would benefit from an explicit lemma. We will add a new lemma stating that for any Turing-complete system, a post-hoc governance layer (operating externally on effects) cannot be coterminous with the expressiveness boundary, because it would necessitate an algorithm to decide non-trivial semantic properties of programs, contradicting Rice's theorem. This lemma will be mechanized in Coq and integrated into the definition of coterminous governance in the revised manuscript. revision: yes
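The promised lemma could plausibly take the following shape; this is our reconstruction from the rebuttal's description, not the paper's wording, and the notation ($E$, $G$, $\Pi$) is introduced here for illustration:

```latex
% Sketch of the promised lemma (a reconstruction, not the paper's statement).
% L is a Turing-complete language, E(p) the effect set of program p,
% Pi a governance policy, and G a post-hoc behavioral governance layer.
\begin{lemma}[Post-hoc layers are non-coterminous]
Let $\mathcal{L}$ be Turing-complete and let
$C(p) \equiv \text{``}E(p)\text{ complies with }\Pi\text{''}$
be non-trivial (some $p \in \mathcal{L}$ satisfies it and some does not).
If a governance layer $G$ decides $C(p)$ for arbitrary $p \in \mathcal{L}$,
then $G$ decides a non-trivial semantic property of programs, contradicting
Rice's theorem. Hence any computable $G$ must either under-approximate
(admitting ungoverned risk) or over-approximate (producing theater), so its
governance boundary cannot equal the expressiveness boundary.
\end{lemma}
```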
Circularity Check
No significant circularity; derivation relies on external Rice's theorem and Coq mechanization
full rationale
The paper grounds its core claim in Rice's theorem (1953), an independent external result on undecidability of non-trivial semantic properties for arbitrary programs, and mechanizes the mapping to behavioral AI governance effects in Coq (454 theorems, 36 modules, 0 admitted). Coterminous governance is defined directly from the two-boundary distinction and shown to require separation of computation from effect as a logical consequence of the undecidability result rather than by redefinition or fitting. No load-bearing step reduces to self-citation, ansatz smuggling, renaming of known results, or any input-output equivalence by construction within the paper itself. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Rice's theorem: non-trivial semantic properties of programs are undecidable for Turing-complete systems
invented entities (2)
-
coterminous governance
no independent evidence
-
three regions (governed capabilities, ungoverned capabilities, theater)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073, 2022
-
[2]
Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems
David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624, 2024
-
[3]
Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3): 143--155, 1966. doi:10.1145/365230.365252
-
[4]
The method of levels of abstraction
Luciano Floridi. The method of levels of abstraction. Minds and Machines, 18(3): 303--329, 2008
-
[5]
David K. Gifford and John M. Lucassen. Integrating functional and imperative programming. In ACM Conference on LISP and Functional Programming, pages 28--38, 1986. doi:10.1145/319838.319848
-
[6]
Guardrails: Adding guardrails to large language models
Guardrails AI. Guardrails: Adding guardrails to large language models. https://github.com/guardrails-ai/guardrails, 2024
-
[7]
Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and J. F. Bastien. Bringing the web up to speed with WebAssembly. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 185--200, 2017. doi:10.1145/3062341.3062363
-
[8]
Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thesing, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37: 100270, 2020. doi:10.1016/j.cosrev.2020.100270
-
[9]
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Mober, et al. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714, 2023
-
[10]
Type directed compilation of row-typed algebraic effects
Daan Leijen. Type directed compilation of row-typed algebraic effects. Proceedings of the ACM on Programming Languages, 1(POPL): 1--28, 2017. doi:10.1145/3009837.3009872
-
[11]
Sam Lindley, Conor McBride, and Craig McLaughlin. Do be do be do. In Proceedings of the ACM on Programming Languages (POPL), pages 1--26, 2017. doi:10.1145/3009837.3009897
-
[12]
Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones, Edward A. Lee, Jing Tao, and Yang Zhao. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10): 1039--1065, 2006. doi:10.1002/cpe.994
-
[13]
Alan L. McCann. Algebraic semantics of governed execution: Monoidal categories, effect algebras, and coterminous boundaries, 2026a. arXiv preprint (to appear)
-
[14]
Alan L. McCann. Effect-transparent governance for AI workflow architectures: Semantic preservation, expressive minimality, and decidability boundaries, 2026b. arXiv preprint (to appear)
-
[15]
Alan L. McCann. Mechanized foundations of structural governance: Machine-checked proofs for governed intelligence, 2026c. arXiv preprint (to appear)
-
[16]
Alan L. McCann. Cryptographic registry provenance: Structural defense against dependency confusion in AI package ecosystems, 2026d. arXiv preprint (to appear)
-
[17]
Alan L. McCann. Certified purity for cognitive workflow executors: From static analysis to cryptographic attestation, 2026e. arXiv preprint (to appear)
-
[18]
Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University, 2006
-
[19]
Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1): 55--92, 1991. doi:10.1016/0890-5401(91)90052-4
-
[20]
Andrew C. Myers. JFlow: Practical mostly-static information flow control. In ACM Symposium on Principles of Programming Languages (POPL), pages 228--241, 1999. doi:10.1145/292540.292561
-
[21]
Andrew C. Myers and Barbara Liskov. A decentralized model for information flow control. In ACM Symposium on Operating Systems Principles (SOSP), pages 129--142, 1997. doi:10.1145/268998.266669
-
[22]
OPA: Open Policy Agent
Open Policy Agent. OPA: Open Policy Agent. https://www.openpolicyagent.org/, 2024
-
[23]
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35: 27730--27744, 2022
-
[24]
Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell
Simon Peyton Jones. Tackling the awkward squad: Monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell. In Engineering Theories of Software Construction, pages 47--96. IOS Press, 2001
-
[25]
Gordon Plotkin and Matija Pretnar. Handlers of algebraic effects. In European Symposium on Programming (ESOP), pages 80--94, 2009. doi:10.1007/978-3-642-00590-9_7
-
[26]
NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails
Traian Rebedea, Razvan Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, and Jonathan Cohen. NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails. In Conference on Empirical Methods in Natural Language Processing (EMNLP), System Demonstrations, 2023
-
[27]
Classes of recursively enumerable sets and their decision problems
Henry Gordon Rice. Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2): 358--366, 1953
-
[28]
Toward verified artificial intelligence
Sanjit A. Seshia, Dorsa Sadigh, and S. Shankar Sastry. Toward verified artificial intelligence. Communications of the ACM, 65(7): 46--55, 2022. doi:10.1145/3503914
-
[29]
Flexible dynamic information flow control in Haskell
Deian Stefan, Alejandro Russo, John C. Mitchell, and David Mazières. Flexible dynamic information flow control in Haskell. In ACM SIGPLAN Haskell Symposium, pages 95--106, 2011. doi:10.1145/2034675.2034688
-
[30]
Monads for functional programming
Philip Wadler. Monads for functional programming. In Advanced Functional Programming, volume 925 of LNCS, pages 24--52. Springer, 1995. doi:10.1007/3-540-59451-5_2
-
[31]
Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient software-based fault isolation. In ACM Symposium on Operating Systems Principles (SOSP), pages 203--216, 1993. doi:10.1145/168619.168635
-
[32]
Jailbroken: How does LLM safety training fail?
Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? In Advances in Neural Information Processing Systems, volume 36, 2023
-
[33]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023