arxiv: 2605.05248 · v2 · submitted 2026-05-05 · 💻 cs.PL · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Governed Metaprogramming for Intelligent Systems: Reclassifying Eval as a Governed Effect

Alan L. McCann

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:08 UTC · model grok-4.3

classification 💻 cs.PL cs.AI

keywords metaprogrammingevalgoverned effectsAI workflowsruntime code generationprogram formscapability analysis

0 comments

The pith

In AI systems that generate executable code at runtime, eval must be reclassified from an unrestricted primitive into a governed effect subject to inspection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that when intelligent systems synthesize programs on the fly, the step from code representation to actual execution amplifies authority and therefore requires mediation. It proposes governed metaprogramming, in which program forms remain first-class values that can be manipulated purely, while the transition to executable form becomes a single governed operation. That operation is required to perform structural checks on capability needs, policy compliance, and resource use before any execution is allowed. The design is formalized with two distinct judgments and proven to preserve purity of manipulation, prevent bypass, and maintain clear boundaries between pure and effectful code. An implementation in a DSL for AI workflows demonstrates the approach and integrates it with a large body of existing machine-checked theorems.

Core claim

The central claim is that eval is not a language primitive but an authority-amplifying transition from symbolic form to executable machine; this transition must therefore be expressed as a governed effect whose materialization judgment inspects the proposed program before permitting execution, while form manipulation stays pure computation.

What carries the argument

Governed materialization, the judgment that converts a first-class program form into an executable directive only after analyzing its capability requirements, policy compliance, and resource estimates.

If this is right

Runtime code synthesis in agents and self-improving systems becomes subject to the same policy and resource controls applied to ordinary effects.
Pure form manipulation libraries can be written and reused without side effects, while only the final materialization step carries authority.
Existing machine-checked proofs about program forms remain valid because materialization is isolated as a single governed step.
Language runtimes for AI workflows can enforce a strict separation between symbolic computation and executable authority.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same governance boundary could be applied to other dynamic code-execution points such as template expansion or macro expansion in AI-generated scripts.
If the inspection is made compositional, it may allow modular policies that combine checks across multiple generated components without re-inspecting the whole program.
The approach suggests that other authority-amplifying operations, such as dynamic loading of foreign functions, could be re-expressed as analogous governed effects.

Load-bearing premise

A governance mechanism can inspect arbitrary generated programs for capabilities, compliance, and resource use without missing bypasses or becoming impractical.

What would settle it

A concrete program that is accepted by the governance checks yet executes with authority or resources outside the declared bounds, or a generated form that evades inspection while still reaching the machine.

read the original abstract

AI systems increasingly synthesize executable structure at runtime: LLMs generate programs, agents construct workflows,self-improving systems modify their own behavior. In classical homoiconic and staged languages, the transition from coderepresentation to execution is unrestricted. eval is a language primitive, not a governed operation. We argue that ingovernedintelligent systems, this transition is an authority amplification: it converts symbolic structure into executableauthority andmust be mediated like any other effect. We present governed metaprogramming, a language design where programrepresentations(machine forms) are first-class values, form manipulation is pure computation, and materialization (the transition fromform toexecutable machine) is a governed effect subject to structural inspection. The governance system analyzes the proposedprogram'scapability requirements, policy compliance, and resource estimates before permitting execution. We formalize twojudgments: pureform evaluation (which emits no directives) and governed materialization (which emits exactly one governed directive). Weprovethree properties: purity of form manipulation, the no-bypass theorem, and boundary preservation. We implement the designinMashinTalk, a DSL for AI workflows compiling to BEAM bytecode, and report on integration with 454 existingmachine-checked Rocqtheorems. The central contribution is reclassifying eval from a language primitive into a governed effect.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reclassifies eval as a governed effect with formal judgments and a no-bypass theorem in Rocq, but the analyzer's soundness for arbitrary code remains the open question.

read the letter

The core move is treating the shift from code representation to execution as an authority-amplifying effect that needs explicit governance rather than a free language primitive. They split it into pure form evaluation and governed materialization, prove purity, no-bypass, and boundary preservation, and tie the whole thing to an implementation in MashinTalk that reuses 454 existing Rocq theorems. That formal backbone and the machine-checked count are the parts that stand out as actual work done, not just framing.

Referee Report

1 major / 2 minor

Summary. The paper claims that in AI systems generating executable code at runtime, eval should be reclassified from an unrestricted language primitive to a governed effect to mediate authority amplification. It introduces governed metaprogramming in which machine forms are first-class values, form manipulation is pure computation, and materialization (form to executable) is a governed effect that performs structural inspection of capability requirements, policy compliance, and resource estimates. The design is formalized via two judgments—pure form evaluation (emitting no directives) and governed materialization (emitting exactly one governed directive)—with proofs of purity of form manipulation, the no-bypass theorem, and boundary preservation. An implementation is described in the MashinTalk DSL for AI workflows (compiling to BEAM bytecode) together with integration against 454 existing machine-checked Rocq theorems.

Significance. If the formal results hold without gaps, the work supplies a principled, verifiable mechanism for controlling runtime code synthesis in intelligent systems, converting an authority-amplifying primitive into a mediated effect. The explicit use of 454 machine-checked Rocq theorems is a clear strength, supplying reproducible and mechanically verified guarantees rather than informal arguments. The approach could influence language designs for agents and self-modifying systems provided the governance analyzer can be realized soundly.

major comments (1)

[Abstract] Abstract (no-bypass theorem): The no-bypass theorem is load-bearing for the central claim that materialization cannot emit an ungoverned directive. The formalization defines the governance analyzer as an oracle that correctly inspects capability requirements, policy compliance, and resource estimates for any program form. No argument or additional theorem is supplied showing that this analyzer is sound (or even decidable) for arbitrary Turing-complete or dynamically generated forms; the manuscript therefore leaves open the possibility that the no-bypass property fails to hold in the MashinTalk implementation once the idealized oracle is replaced by a concrete static analyzer.

minor comments (2)

[Abstract] Abstract: several typographical errors and missing spaces appear (e.g., “ingovernedintelligent”, “designinMashinTalk”, “programrepresentations(machine forms)”). These should be corrected for readability.
[Abstract] Abstract: the statement that three properties are proven is given without any proof sketch, key lemma, or indication of how the formal judgments support the claims; readers must reach the full formal sections to evaluate soundness.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review, particularly for noting the potential impact of the approach and the strength of the machine-checked Rocq integration. We address the single major comment below and will incorporate clarifications in a revised manuscript.

read point-by-point responses

Referee: [Abstract] Abstract (no-bypass theorem): The no-bypass theorem is load-bearing for the central claim that materialization cannot emit an ungoverned directive. The formalization defines the governance analyzer as an oracle that correctly inspects capability requirements, policy compliance, and resource estimates for any program form. No argument or additional theorem is supplied showing that this analyzer is sound (or even decidable) for arbitrary Turing-complete or dynamically generated forms; the manuscript therefore leaves open the possibility that the no-bypass property fails to hold in the MashinTalk implementation once the idealized oracle is replaced by a concrete static analyzer.

Authors: We agree that the no-bypass theorem is conditional on the governance analyzer being sound, and that this assumption is modeled via an oracle in the formalization. The theorem establishes that, given a sound analyzer, materialization always emits exactly one governed directive and cannot bypass the governance boundary; this modularizes the metaprogramming design from the (orthogonal) problem of analyzer soundness. The 454 Rocq theorems mechanically verify the two judgments, the purity property, the no-bypass theorem, and boundary preservation under the oracle model. We did not supply a separate soundness proof for the analyzer because, for Turing-complete languages, deciding arbitrary capability and policy properties is undecidable in general. In the MashinTalk implementation the analyzer is realized via conservative, sound static checks for the restricted forms arising in AI workflows, integrated with the existing Rocq theorems. We will revise the abstract and add an explicit discussion subsection clarifying the oracle assumption, the scope of the no-bypass result, and the practical limitations of the concrete analyzer. This revision will make the conditional nature of the result transparent without overclaiming. revision: partial

Circularity Check

0 steps flagged

No significant circularity; formal judgments and proofs defined independently

full rationale

The paper defines two judgments (pure form evaluation emitting no directives, governed materialization emitting one governed directive) and proves three properties (purity, no-bypass, boundary preservation) from the language design. These are presented as first-principles formalizations rather than reductions to fitted parameters, self-definitions, or self-citation chains. Integration with 454 existing machine-checked Rocq theorems indicates external verification rather than circular self-reference. The no-bypass theorem relies on an oracle-like governance analyzer, but this is an explicit modeling assumption, not a hidden reduction of the claim to its own inputs by construction. No self-definitional, fitted-prediction, or ansatz-smuggling patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The design rests on domain assumptions about first-class forms and pure manipulation, plus an ad-hoc definition of governed materialization; no free parameters or invented entities with independent evidence are introduced.

axioms (2)

domain assumption Program representations (machine forms) are first-class values and form manipulation is pure computation.
Core of the governed metaprogramming design stated in the abstract.
ad hoc to paper Materialization (transition from form to executable machine) emits exactly one governed directive.
Defined as part of the formalization of governed materialization.

invented entities (1)

Governed materialization no independent evidence
purpose: Mediates form-to-execution transition with structural inspection and policy checks.
New operation introduced to replace unrestricted eval.

pith-pipeline@v0.9.0 · 5535 in / 1190 out tokens · 69217 ms · 2026-05-12T03:08:33.884893+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Form manipulation is pure computation. Materialization... is classified as an effect and requires governance mediation.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59 0 (1): 0 65--98, 2017

work page 2017
[2]

Terra: A multi-stage language for high-performance computing

Zachary DeVito, James Hegarty, Alex Aiken, Pat Hanrahan, and Jan Vitek. Terra: A multi-stage language for high-performance computing. In ACM SIGPLAN Notices, volume 48, pages 105--116, 2013

work page 2013
[3]

Creating languages in Racket

Matthew Flatt. Creating languages in Racket . Communications of the ACM, 55 0 (1): 0 48--56, 2012

work page 2012
[4]

The Clojure programming language

Rich Hickey. The Clojure programming language. In Proceedings of the 2008 Symposium on Dynamic Languages, pages 1--1, 2008

work page 2008
[5]

The power of parameterization in coinductive proof

Chung-Kil Hur, Georg Neis, Derek Dreyer, and Viktor Vafeiadis. The power of parameterization in coinductive proof. ACM SIGPLAN Notices, 48 0 (1): 0 193--206, 2013

work page 2013
[6]

The Reactive Engine

Alan Kay. The Reactive Engine. PhD thesis, University of Utah, 1969

work page 1969
[7]

The design and implementation of BER MetaOCaml

Oleg Kiselyov. The design and implementation of BER MetaOCaml . In International Symposium on Functional and Logic Programming, pages 86--102. Springer, 2014

work page 2014
[8]

Alan L. McCann. Mechanized foundations of structural governance: Machine-checked proofs for governed intelligence, 2026 a

work page 2026
[9]

Alan L. McCann. mashin: A governed domain-specific language for intelligent systems, 2026 b . arXiv preprint (to appear)

work page 2026
[10]

Recursive functions of symbolic expressions and their computation by machine, part I

John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I . Communications of the ACM, 3 0 (4): 0 184--195, 1960

work page 1960
[11]

Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University, 2006

work page 2006
[12]

Wyvern: A simple, typed, and pure object-oriented language

Ligia Nistor, Darya Kurilova, Stephanie Balzer, Benjamin Chung, Alex Potanin, and Jonathan Aldrich. Wyvern: A simple, typed, and pure object-oriented language. In MASPEGHI, 2013

work page 2013
[13]

Malik Sallam

Sylvestre-Alvise Rebuffi et al. NeMo guardrails: A toolkit for controllable and safe LLM applications with programmable rails. arXiv preprint arXiv:2310.10501, 2023

work page arXiv 2023
[14]

Gleckler

Alex Shinn, John Cowan, and Arthur A. Gleckler. Revised 7 report on the algorithmic language Scheme . Technical report, 2013

work page 2013
[15]

Reflection and semantics in Lisp

Brian Cantwell Smith. Reflection and semantics in Lisp . In Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 23--35, 1984

work page 1984
[16]

Elixir: A dynamic, functional language for building scalable and maintainable applications

Jos\' e Valim. Elixir: A dynamic, functional language for building scalable and maintainable applications. https://elixir-lang.org, 2014

work page 2014
[17]

Pierce, and Steve Zdancewic

Li-yao Xia, Yannick Zakowski, Paul He, Chung-Kil Hur, Gregory Malecha, Benjamin C. Pierce, and Steve Zdancewic. Interaction trees: Representing recursive and impure programs in Coq . Proceedings of the ACM on Programming Languages, 4 0 (POPL): 0 1--32, 2020

work page 2020