Recognition: 1 theorem link · Lean Theorem
ClawLess: A Security Model of AI Agents
Pith reviewed 2026-05-10 20:10 UTC · model grok-4.3
The pith
ClawLess enforces formally verified, runtime-adaptive security policies on potentially adversarial AI agents through BPF-based syscall interception.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design under a worst-case threat model where the agent itself may be adversarial.
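To make that pipeline concrete, here is a minimal C sketch of a model over entities, trust scopes, and permissions, flattened into a concrete rule table with a default-deny check. Every type, field, and the path-prefix matching scheme is an illustrative assumption, not the paper's actual definitions.

```c
/* Hypothetical sketch of the entity/scope/permission model; all names and
 * the prefix-matching scheme are assumptions, not the paper's definitions. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { ENT_FILE, ENT_PROCESS, ENT_NETWORK } entity_kind;
typedef enum { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 } permission;

/* A trust scope grants a permission mask over entities matching a pattern. */
typedef struct {
    entity_kind kind;
    const char *pattern;   /* here: a path prefix */
    unsigned    granted;   /* bitmask of permission values */
} trust_scope;

/* Translation flattens the abstract policy into concrete, checkable rules. */
typedef struct { const char *pattern; unsigned allow_mask; } rule;

static size_t translate(const trust_scope *scopes, size_t n, rule *out) {
    for (size_t i = 0; i < n; i++) {
        out[i].pattern    = scopes[i].pattern;
        out[i].allow_mask = scopes[i].granted;
    }
    return n;
}

/* Default deny: an action is allowed only if some rule's pattern matches
 * and that rule grants the requested permission bit. */
static bool permitted(const rule *rules, size_t n,
                      const char *path, permission p) {
    for (size_t i = 0; i < n; i++)
        if (strncmp(path, rules[i].pattern, strlen(rules[i].pattern)) == 0)
            return (rules[i].allow_mask & p) != 0;
    return false;
}

int main(void) {
    trust_scope scopes[] = {
        { ENT_FILE, "/workspace/", PERM_READ | PERM_WRITE },
        { ENT_FILE, "/etc/",       PERM_READ },
    };
    rule table[2];
    size_t n = translate(scopes, 2, table);
    printf("write /workspace/out.txt -> %d\n",
           permitted(table, n, "/workspace/out.txt", PERM_WRITE)); /* 1 */
    printf("write /etc/passwd       -> %d\n",
           permitted(table, n, "/etc/passwd", PERM_WRITE));        /* 0 */
    return 0;
}
```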
What carries the argument
The fine-grained security model over system entities, trust scopes, and permissions, which supports the creation of runtime-adaptive policies that translate into enforceable rules.
If this is right
- Dynamic policies can adjust in real time to match the agent's observed actions (see the sketch after this list).
- Enforcement succeeds without depending on the agent's internal design or training.
- Abstract policies produce concrete rules that a user-space mechanism can apply directly.
- Security remains intact even when the agent is constructed to act against the user's interests.
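A hedged sketch of the first bullet, reusing the hypothetical scope shape from the model sketch above: the policy engine revokes a permission bit when it observes a suspicious action, after which the concrete rule table would be regenerated before the next syscall is mediated.

```c
/* Illustrative only: how a runtime observation could narrow a trust scope.
 * The types mirror the hypothetical model sketch above. */
#include <stdio.h>

enum { PERM_READ = 1, PERM_WRITE = 2 };
typedef struct { const char *pattern; unsigned granted; } trust_scope;

/* Revoke the permission bit implicated by the observed action; the caller
 * is then expected to re-run policy translation to refresh the rule table. */
static void on_observed_action(trust_scope *s, unsigned suspicious_perm) {
    s->granted &= ~suspicious_perm;
}

int main(void) {
    trust_scope ws = { "/workspace/", PERM_READ | PERM_WRITE };
    on_observed_action(&ws, PERM_WRITE);   /* e.g., attempted exfiltration */
    printf("%s mask: %u\n", ws.pattern, ws.granted);   /* prints 1 (read) */
    return 0;
}
```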
Where Pith is reading between the lines
- The same structure of entities, scopes, and permissions could apply to other autonomous software systems that act on behalf of users.
- Combining the approach with hardware isolation features might strengthen protection against kernel-level bypass attempts.
- Experiments on multi-step agent workflows could show how well the model handles chained permission decisions.
Load-bearing premise
That the formal policies can be correctly translated into enforceable rules and that the user-space BPF interception mechanism can prevent an adversarial agent from bypassing the controls or escalating privileges outside the defined trust scopes.
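The premise can be made concrete. One standard Linux mechanism that fits the paper's description of "BPF-based syscall interception" is a seccomp filter that routes selected syscalls to a user-space supervisor via SECCOMP_RET_USER_NOTIF; whether ClawLess uses this exact mechanism is our assumption.

```c
/* Hedged sketch: a seccomp-BPF filter that forwards openat to a user-space
 * supervisor for a policy decision. Whether ClawLess uses seccomp user
 * notification specifically is an assumption; the paper only says
 * "BPF-based syscall interception". */
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int install_filter(void) {
    struct sock_filter filter[] = {
        /* load the syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* route openat to the user-space policy engine for a decision */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
        /* everything else passes through */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len    = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    /* required so an unprivileged process may install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    /* SECCOMP_RET_USER_NOTIF needs the seccomp(2) syscall, not prctl;
     * on success this returns the supervisor's notification fd */
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```

Because the filter matches syscall numbers at kernel dispatch, raw syscall instructions do not evade it, and installed filters are inherited across fork and execve and cannot be removed, which is precisely the property the non-bypassability premise leans on.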
What would settle it
Placing an adversarial AI agent inside a ClawLess-protected environment and checking whether it can still read protected files, run restricted commands, or escalate privileges despite the active policy rules.
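A hypothetical probe in that spirit, meant to run inside the protected environment; the protected path and the restricted command below are placeholders for whatever the active policy actually forbids.

```c
/* Hypothetical adversarial probe: every target below is a placeholder for
 * whatever the active ClawLess policy marks as forbidden. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* 1. attempt to read a policy-protected file */
    int fd = open("/protected/secret.txt", O_RDONLY);
    printf("read protected file: %s\n",
           fd < 0 ? strerror(errno) : "ALLOWED (policy failure)");
    if (fd >= 0) close(fd);

    /* 2. attempt to execute a restricted command */
    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", "id", (char *)NULL);
        _exit(errno);           /* exec blocked: surface the errno */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    printf("exec restricted command: child exit %d\n",
           WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    return 0;
}
```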
read the original abstract
Autonomous AI agents powered by Large Language Models can reason, plan, and execute complex tasks, but their ability to autonomously retrieve information and run code introduces significant security risks. Existing approaches attempt to regulate agent behavior through training or prompting, which does not offer fundamental security guarantees. We present ClawLess, a security framework that enforces formally verified policies on AI agents under a worst-case threat model where the agent itself may be adversarial. ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ClawLess, a security framework for LLM-powered autonomous AI agents. It claims to formalize a fine-grained security model over system entities, trust scopes, and permissions that supports dynamic policies adapting to runtime agent behavior; these policies are translated into concrete rules and enforced by a user-space kernel augmented with BPF-based syscall interception, providing security guarantees under a worst-case threat model in which the agent itself may be fully adversarial.
Significance. If the formal model, policy translation, and non-bypassable enforcement were rigorously established, the work would offer a principled alternative to training- or prompt-based controls for AI agents. The explicit attempt to connect a dynamic, entity-level security model with practical syscall interception is a constructive direction for the field.
major comments (2)
- [Abstract] The manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.
- [Abstract] The central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.
minor comments (1)
- The phrase 'user-space kernel' is used without clarification; a brief note distinguishing it from a conventional kernel or explaining its privilege boundary would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating planned revisions to improve the rigor of the presentation.
read point-by-point responses
- Referee: [Abstract] The manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.
Authors: We agree that the abstract makes overly strong claims relative to the formality provided in the body of the work. The manuscript introduces definitions for system entities, trust scopes, and permissions in Section 3 and describes the translation of dynamic policies into enforcement rules in Section 4, but it does not supply explicit axioms, formal semantics, or a proof that the translation preserves invariants. In the revised manuscript we will expand Section 3 with formal semantics and axioms for the model, add a proof sketch for invariant preservation under policy translation, and revise the abstract to state that the model is formalized with support for runtime-adaptive policies rather than asserting full formal verification. revision: yes
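For concreteness, one plausible shape for the promised invariant-preservation obligation, stated in our own notation rather than the paper's: with $\mathcal{P}$ the set of abstract policies and $T$ the translation to rule tables,

```latex
% Illustrative proof obligation (our notation, not the paper's):
% the translated rule table is sound with respect to the abstract policy.
\forall p \in \mathcal{P},\ \forall a \in \mathit{Actions}:\quad
\mathit{allow}\bigl(T(p),\, a\bigr) \;\Longrightarrow\; \mathit{permits}(p,\, a)
```

That is, the concrete rules never admit an action the abstract policy forbids (soundness of translation); the converse implication would state completeness, and the promised proof sketch would need at least the soundness direction.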
- Referee: [Abstract] The central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.
Authors: The referee is correct that the non-bypassability of the enforcement mechanism is central to the worst-case threat model and that the current text does not explicitly analyze potential evasion vectors. The design positions the user-space kernel with BPF interception to mediate all syscalls at the interface between the agent and the system. In the revised version we will add a dedicated subsection to the enforcement section that discusses these vectors and explains the mitigation strategy, including interception of process-creation syscalls and comprehensive hooking to cover direct invocation paths within the agent's privilege domain. revision: yes
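A sketch of what the promised comprehensive hooking could look like at the filter level, installed the same way as the filter sketched under the load-bearing premise above. The syscall selection mirrors the referee's list and is our assumption; shared-memory channels would additionally need mediation at mapping time (e.g., trapping mmap and shmat) rather than per access.

```c
/* Hedged sketch: route the referee's evasion-relevant syscalls to the
 * user-space supervisor. Which syscalls to trap is our assumption. */
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* If the loaded syscall number equals nr, return to the supervisor;
 * otherwise skip one instruction and test the next number. */
#define TRAP(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF)

static struct sock_filter evasion_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    TRAP(__NR_ptrace),      /* tracing / code injection into peers */
    TRAP(__NR_clone),       /* process and thread creation */
    TRAP(__NR_execve),      /* program replacement */
    TRAP(__NR_execveat),
    TRAP(__NR_mmap),        /* shared-memory channel setup */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

Direct syscalls are covered by any filter of this form, since seccomp matches the number at kernel dispatch rather than at the libc wrapper.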
Circularity Check
No circularity: formal model and enforcement claims are self-contained design choices
full rationale
The paper defines a security model over entities, trust scopes, and permissions, then describes translation to rules enforced by user-space BPF syscall interception. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central claims (formalization, translation, and enforcement under adversarial agent) are presented as a proposed framework rather than derived from prior self-referential results or by construction. The derivation chain does not reduce any prediction or result to its own inputs; it remains an independent modeling and implementation proposal.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: An AI agent may be fully adversarial and its internal reasoning untrusted.
- domain assumption: Dynamic policies expressed over entities and permissions can be translated into concrete enforceable rules without loss of security properties.
invented entities (1)
- ClawLess security model (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation: a TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.