From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers
Pith reviewed 2026-05-10 04:48 UTC · model grok-4.3
The pith
Arbiter-K enforces security as a microarchitectural property in agentic AI by using a Semantic ISA to reify model outputs into taint-tracked discrete instructions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Arbiter-K reconceptualizes the underlying model as a Probabilistic Processing Unit encapsulated by a deterministic, neuro-symbolic kernel. It implements a Semantic Instruction Set Architecture to reify probabilistic messages into discrete instructions. This allows the kernel to maintain a Security Context Registry and construct an Instruction Dependency Graph at runtime, enabling active taint propagation based on the data-flow pedigree of each reasoning node. The kernel then precisely interdicts unsafe trajectories at deterministic sinks and performs autonomous execution correction and architectural rollback when policies are triggered.
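The taint-propagation mechanism described above can be sketched as follows. The `Node` schema, field names, and example pipeline are illustrative assumptions for this review, not the paper's implementation: the key point is only that taint flows along data-flow edges from sources to deterministic sinks.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One reasoning node in the Instruction Dependency Graph (hypothetical schema)."""
    name: str
    tainted: bool = False
    deps: list = field(default_factory=list)  # upstream nodes whose outputs this node reads

def is_tainted(node: Node) -> bool:
    """Taint propagates along data-flow edges: a node is tainted if it was
    marked directly or if any node in its pedigree is tainted."""
    return node.tainted or any(is_tainted(d) for d in node.deps)

# Content fetched from the open web is the taint source; the email sink
# inherits its pedigree through parse -> draft -> send.
web = Node("fetch_url", tainted=True)
parse = Node("parse_html", deps=[web])
draft = Node("draft_email", deps=[parse])
send = Node("send_email", deps=[draft])  # deterministic sink

print(is_tainted(send))  # True -> the kernel would interdict before the sink runs
```

Because the pedigree is tracked per node rather than per session, an untainted branch of the same plan (say, a purely local file read) would still be allowed through.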
What carries the argument
The Semantic Instruction Set Architecture that converts probabilistic model outputs into discrete instructions, together with the runtime Instruction Dependency Graph that supports taint propagation from data-flow pedigrees.
If this is right
- Unsafe trajectories are interdicted at precise deterministic sinks such as high-risk tool calls or unauthorized network egress.
- The kernel can trigger autonomous execution correction and architectural rollback when a security policy is activated.
- Security becomes enforceable as a microarchitectural property rather than an external heuristic layer.
- Evaluations on OpenClaw and NanoBot show interception rates between 76% and 95%, producing a 92.79% absolute gain over native policies.
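The first two bullets reduce to a policy check at the sink boundary. A minimal sketch, assuming a hypothetical sink list and verdict shape (the paper does not publish its actual policy set):

```python
# Minimal sketch of interdiction at deterministic sinks. The sink names and
# the BLOCK/ALLOW verdict shape are illustrative assumptions.
HIGH_RISK_SINKS = {"net_egress", "shell_exec", "send_email"}

def check_sink(sink: str, tainted: bool) -> str:
    """Return the kernel's verdict for an instruction arriving at a sink."""
    if sink in HIGH_RISK_SINKS and tainted:
        return "BLOCK"  # would trigger execution correction / rollback
    return "ALLOW"

print(check_sink("net_egress", tainted=True))  # BLOCK
print(check_sink("read_file", tainted=True))   # ALLOW: not a deterministic sink
```

The design choice worth noting is that the check is cheap and deterministic precisely because all the probabilistic work happened earlier, at reification time.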
Where Pith is reading between the lines
- The deterministic kernel layer could be combined with conventional operating-system isolation primitives to create defense-in-depth for agentic workloads.
- Custom policy definitions at the Semantic ISA level would let domain experts express governance rules without modifying the underlying model.
- Similar reification and dependency-graph techniques might apply to other settings where probabilistic components must interface with deterministic safety constraints.
Load-bearing premise
Probabilistic outputs from the underlying model can be reliably and losslessly reified into discrete instructions via the Semantic ISA without introducing new failure modes or missing critical context.
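To make the premise concrete, here is one hypothetical shape a reifier could take; the `<call>` markup, field names, and fail-closed behavior are our assumptions, not the paper's method. The point is that anything unparseable must be rejected rather than silently dropped, since missing context here undermines all downstream taint tracking.

```python
import json
import re

def reify(raw: str) -> dict:
    """Hypothetical reifier: extract one discrete instruction from free-form
    model output, failing closed (raising) when the output cannot be parsed."""
    m = re.search(r"<call>(.*?)</call>", raw, re.S)
    if m is None:
        raise ValueError("output could not be reified into an instruction")
    instr = json.loads(m.group(1))
    if not {"op", "args"} <= instr.keys():
        raise ValueError("reified instruction is missing required fields")
    return instr

out = reify('I will fetch it. <call>{"op": "http_get", "args": {"url": "https://example.com"}}</call>')
print(out["op"])  # http_get
```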
What would settle it
An evaluation run in which an unsafe action (such as an unauthorized network call) completes without interception because the Semantic ISA reification omitted or distorted key context from the model's probabilistic output.
Figures
Original abstract
The transition of agentic AI from brittle prototypes to production systems is stalled by a pervasive crisis of craft. We suggest that the prevailing orchestration paradigm, which delegates the system control loop to large language models and merely patches it with heuristic guardrails, is the root cause of this fragility. Instead, we propose Arbiter-K, a Governance-First execution architecture that reconceptualizes the underlying model as a Probabilistic Processing Unit encapsulated by a deterministic, neuro-symbolic kernel. Arbiter-K implements a Semantic Instruction Set Architecture (ISA) to reify probabilistic messages into discrete instructions. This allows the kernel to maintain a Security Context Registry and construct an Instruction Dependency Graph at runtime, enabling active taint propagation based on the data-flow pedigree of each reasoning node. By leveraging this mechanism, Arbiter-K precisely interdicts unsafe trajectories at deterministic sinks (e.g., high-risk tool calls or unauthorized network egress) and enables autonomous execution correction and architectural rollback when security policies are triggered. Evaluations on OpenClaw and NanoBot demonstrate that Arbiter-K enforces security as a microarchitectural property, achieving 76% to 95% unsafe interception for a 92.79% absolute gain over native policies. The code is publicly available at https://github.com/cure-lab/ArbiterOS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Arbiter-K, a governance-first execution architecture that treats the underlying LLM as a Probabilistic Processing Unit encapsulated by a deterministic neuro-symbolic kernel. It introduces a Semantic Instruction Set Architecture (ISA) to reify probabilistic outputs into discrete instructions, populating a Security Context Registry and constructing an Instruction Dependency Graph for runtime taint propagation. This enables deterministic interdiction of unsafe trajectories at sinks such as high-risk tool calls. Evaluations on OpenClaw and NanoBot report 76%–95% unsafe interception rates, for a 92.79% absolute gain over native policies; the code is released publicly.
Significance. If the central claims hold, the work would represent a meaningful shift from heuristic guardrails to microarchitectural security enforcement in agentic systems, potentially improving reliability for production autonomous agents. Public code availability supports reproducibility and is a clear strength.
major comments (3)
- [Evaluation] Evaluation section (and abstract): the reported 76%–95% interception rates and 92.79% gain lack any description of experimental setup, test-case count, baseline implementations, or controls for confounds. Without these, the empirical support for the central security claim cannot be assessed.
- [Architecture] Semantic ISA reification step (abstract and § on architecture): the lossless conversion of probabilistic LLM outputs into discrete instructions that populate the Security Context Registry and Instruction Dependency Graph is load-bearing for all taint-based interdiction claims, yet no fidelity metrics, error rates, coverage statistics, or stress tests on context loss or misclassification are provided.
- [Architecture] Instruction Dependency Graph construction (architecture description): the paper does not address how the graph is built or maintained when reification is incomplete or ambiguous, which directly affects the reliability of data-flow pedigree tracking and deterministic sink interdiction.
minor comments (2)
- Notation for the Semantic ISA and its mapping to the kernel is introduced without a formal definition or example instruction format, making the reification process hard to follow.
- The abstract states the code is publicly available but does not include the repository URL in the main text body; ensure consistency.
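In the spirit of the first minor comment, here is one hypothetical shape an example instruction format could take. The field names and opcodes below are our invention for illustration; the paper defines no formal encoding.

```python
from dataclasses import dataclass

# Purely illustrative: one way a Semantic ISA instruction could be encoded.
@dataclass(frozen=True)
class SemInstr:
    opcode: str      # e.g. "TOOL_CALL", "MEM_WRITE", "NET_EGRESS" (hypothetical)
    operands: tuple  # discrete arguments extracted from the model output
    provenance: str  # "user" | "model" | "untrusted_external"
    node_id: int     # index of this instruction in the Instruction Dependency Graph

instr = SemInstr("NET_EGRESS", ("https://example.com",), "untrusted_external", 7)
print(instr.opcode)  # NET_EGRESS
```

Even a sketch at this level would make the reification process, and its failure modes, much easier to audit.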
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments identify important gaps in the empirical and architectural descriptions that we will address through targeted revisions. Below we respond point by point to each major comment.
Point-by-point responses
- Referee: [Evaluation] Evaluation section (and abstract): the reported 76%–95% interception rates and 92.79% gain lack any description of experimental setup, test-case count, baseline implementations, or controls for confounds. Without these, the empirical support for the central security claim cannot be assessed.
  Authors: We agree that the current evaluation section provides insufficient detail to allow independent assessment of the reported interception rates and performance gains. In the revised manuscript we will expand the evaluation section (and update the abstract accordingly) to include: the total number of test cases and scenarios drawn from the OpenClaw and NanoBot benchmarks; explicit descriptions of the baseline implementations (native LLM-driven agent policies without the Arbiter-K kernel); and the controls used for potential confounds such as prompt phrasing, temperature settings, and environmental variability. These additions will be supported by the publicly released code. Revision: yes.
- Referee: [Architecture] Semantic ISA reification step (abstract and § on architecture): the lossless conversion of probabilistic LLM outputs into discrete instructions that populate the Security Context Registry and Instruction Dependency Graph is load-bearing for all taint-based interdiction claims, yet no fidelity metrics, error rates, coverage statistics, or stress tests on context loss or misclassification are provided.
  Authors: The referee is correct that quantitative validation of the reification step is necessary to support the downstream security claims. While the architecture section describes the Semantic ISA conceptually, we did not report supporting metrics. We will add a new subsection that presents fidelity metrics, reification error rates, coverage statistics, and stress-test results on context loss and misclassification, using both the existing evaluation traces and additional analysis performed on the released codebase. Revision: yes.
- Referee: [Architecture] Instruction Dependency Graph construction (architecture description): the paper does not address how the graph is built or maintained when reification is incomplete or ambiguous, which directly affects the reliability of data-flow pedigree tracking and deterministic sink interdiction.
  Authors: We acknowledge that the manuscript does not explicitly describe graph construction and maintenance under incomplete or ambiguous reification. This omission affects the claimed reliability of taint propagation. In the revised architecture section we will add a description of the fallback mechanisms employed, including conservative default tainting rules, ambiguity-resolution heuristics, and how these choices preserve deterministic sink interdiction even when reification is imperfect. Revision: yes.
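The conservative default the authors describe can be sketched in a few lines. This is our reading of the proposal, not released code; the `provenance` field and fail-closed rule are assumptions.

```python
def fallback_taint(node_meta: dict) -> bool:
    """Sketch of a conservative default tainting rule: when reification
    leaves a node's provenance ambiguous, fail closed and treat it as tainted."""
    provenance = node_meta.get("provenance")  # None when reification was ambiguous
    if provenance is None:
        return True  # ambiguous -> assume untrusted
    return provenance == "untrusted"

print(fallback_taint({}))                      # True  (ambiguous, fail closed)
print(fallback_taint({"provenance": "user"}))  # False (trusted source)
```

Failing closed preserves sink interdiction under imperfect reification at the cost of more false positives, which is presumably where the ambiguity-resolution heuristics come in.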
Circularity Check
No circularity; architecture proposal and empirical results are self-contained
Full rationale
The paper advances a governance-first architecture (Arbiter-K) with a Semantic ISA for reifying LLM outputs into discrete instructions, a Security Context Registry, and an Instruction Dependency Graph for taint-based interdiction. Central claims rest on this design plus reported evaluation metrics (76%–95% interception, 92.79% gain) on OpenClaw and NanoBot. No equations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or description. The derivation chain does not reduce to its inputs by construction; the reification step is an explicit design choice whose fidelity is left to empirical validation rather than assumed by definition.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM outputs can be accurately reified into discrete instructions by the Semantic ISA without significant information loss or new vulnerabilities.
invented entities (2)
- Semantic Instruction Set Architecture (ISA): no independent evidence
- Instruction Dependency Graph: no independent evidence
Reference graph
Works this paper leans on
- [1] Amazon. 2025. Amazon Bedrock AgentCore Policy: Control Agent-to-Tool Interactions. https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy.html
- [2] Anthropic. 2025. Equipping agents for the real world with Agent Skills. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- [3] Xiaohe Bo, Zeyu Zhang, Quanyu Dai, Xueyang Feng, Lei Wang, Rui Li, Xu Chen, and Ji-Rong Wen. 2024. Reflective Multi-Agent Collaboration based on Large Language Models. In Proceedings of NeurIPS
- [4] Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ArXiv abs/2406.13352 (2024). https://api.semanticscholar.org/CorpusID:270619628
- [6] Invariant Labs. 2025. Invariant Guardrails. https://github.com/invariantlabs-ai/invariant
- [7] IronClaw Contributors. 2026. IronClaw: Your secure personal AI assistant, always on your side. https://github.com/nearai/ironclaw
- [8] Yang JingYi, Shuai Shao, Dongrui Liu, and Jing Shao. 2025. RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents. In NeurIPS
- [11] Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. 2025. OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents. In NeurIPS
- [13] Songyang Liu, Chaozhuo Li, Chenxu Wang, Jinyu Hou, Zejian Chen, Litian Zhang, Zheng Liu, Qiwei Ye, Yiming Hei, Xi Zhang, and Zhongyuan Wang. 2026. ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers. ArXiv abs/2603.24414 (2026)
- [14] Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, and Shenghua Liu. 2025. A Survey of Context Engineering for Large Language Models. ArXiv abs/2507.13334 (2025)
- [15] NanoBot Contributors. 2026. NanoBot: Ultra-Lightweight Personal AI Agent. https://github.com/HKUDS/nanobot
- [16] OpenClaw Contributors. 2026. OpenClaw: Open-Source AI Agent Runtime. https://github.com/openclaw/openclaw
- [17] Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Sohel Mondal, and Aman Chadha. 2024. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. ArXiv abs/2402.07927 (2024)
- [18] Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Minh Pham, Gerson C. Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inn... arXiv:2406.06608. https://arxiv.org/abs/2406.06608
- [19] Ava Spataru, Eric Hambro, Elena Voita, and Nicola Cancedda. 2024. Know When To Stop: A Study of Semantic Drift in Text Generation. In Proceedings of NAACL. 3656–3671
- [20] Charles L. Wang, Trisha Singhal, Ameya Kelkar, and Jason Tuo. 2025. MI9 – Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems. ArXiv abs/2508.03858 (2025)
- [23] Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. 2025. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents. In ICLR
- [24] Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, and Minlie Huang. 2024. Agent-SafetyBench: Evaluating the Safety of LLM Agents. ArXiv abs/2412.14470 (2024). https://api.semanticscholar.org/CorpusID:274859514
- [25] Wei Zhao, Zhe Li, Peixin Zhang, and Jun Sun. 2026. ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection. ArXiv abs/2604.11790v1 (2026)