Recognition: 1 theorem link · Lean Theorem
ClawLess: A Security Model of AI Agents
Pith reviewed 2026-05-10 20:10 UTC · model grok-4.3
The pith
ClawLess enforces formally verified, runtime-adaptive security policies on potentially adversarial AI agents through BPF-based syscall interception.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design under a worst-case threat model where the agent itself may be adversarial.
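To make that pipeline concrete, here is a minimal C sketch of a model over entities, trust scopes, and permissions, flattened into a concrete rule table with a default-deny check. Every type, field, and the path-prefix matching scheme is an illustrative assumption, not the paper's actual definitions.

```c
/* Hypothetical sketch of the entity/scope/permission model; all names and
 * the prefix-matching scheme are assumptions, not the paper's definitions. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { ENT_FILE, ENT_PROCESS, ENT_NETWORK } entity_kind;
typedef enum { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 } permission;

/* A trust scope grants a permission mask over entities matching a pattern. */
typedef struct {
    entity_kind kind;
    const char *pattern;   /* here: a path prefix */
    unsigned    granted;   /* bitmask of permission values */
} trust_scope;

/* Translation flattens the abstract policy into concrete, checkable rules. */
typedef struct { const char *pattern; unsigned allow_mask; } rule;

static size_t translate(const trust_scope *scopes, size_t n, rule *out) {
    for (size_t i = 0; i < n; i++) {
        out[i].pattern    = scopes[i].pattern;
        out[i].allow_mask = scopes[i].granted;
    }
    return n;
}

/* Default deny: an action is allowed only if some rule's pattern matches
 * and that rule grants the requested permission bit. */
static bool permitted(const rule *rules, size_t n,
                      const char *path, permission p) {
    for (size_t i = 0; i < n; i++)
        if (strncmp(path, rules[i].pattern, strlen(rules[i].pattern)) == 0)
            return (rules[i].allow_mask & p) != 0;
    return false;
}

int main(void) {
    trust_scope scopes[] = {
        { ENT_FILE, "/workspace/", PERM_READ | PERM_WRITE },
        { ENT_FILE, "/etc/",       PERM_READ },
    };
    rule table[2];
    size_t n = translate(scopes, 2, table);
    printf("write /workspace/out.txt -> %d\n",
           permitted(table, n, "/workspace/out.txt", PERM_WRITE)); /* 1 */
    printf("write /etc/passwd       -> %d\n",
           permitted(table, n, "/etc/passwd", PERM_WRITE));        /* 0 */
    return 0;
}
```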
What carries the argument
The fine-grained security model over system entities, trust scopes, and permissions, which supports the creation of runtime-adaptive policies that translate into enforceable rules.
If this is right
- Dynamic policies can adjust in real time to match the agent's observed actions (see the sketch after this list).
- Enforcement succeeds without depending on the agent's internal design or training.
- Abstract policies produce concrete rules that a user-space mechanism can apply directly.
- Security remains intact even when the agent is constructed to act against the user's interests.
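A hedged sketch of the first bullet, reusing the hypothetical scope shape from the model sketch above: the policy engine revokes a permission bit when it observes a suspicious action, after which the concrete rule table would be regenerated before the next syscall is mediated.

```c
/* Illustrative only: how a runtime observation could narrow a trust scope.
 * The types mirror the hypothetical model sketch above. */
#include <stdio.h>

enum { PERM_READ = 1, PERM_WRITE = 2 };
typedef struct { const char *pattern; unsigned granted; } trust_scope;

/* Revoke the permission bit implicated by the observed action; the caller
 * is then expected to re-run policy translation to refresh the rule table. */
static void on_observed_action(trust_scope *s, unsigned suspicious_perm) {
    s->granted &= ~suspicious_perm;
}

int main(void) {
    trust_scope ws = { "/workspace/", PERM_READ | PERM_WRITE };
    on_observed_action(&ws, PERM_WRITE);   /* e.g., attempted exfiltration */
    printf("%s mask: %u\n", ws.pattern, ws.granted);   /* prints 1 (read) */
    return 0;
}
```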
Where Pith is reading between the lines
- The same structure of entities, scopes, and permissions could apply to other autonomous software systems that act on behalf of users.
- Combining the approach with hardware isolation features might strengthen protection against kernel-level bypass attempts.
- Experiments on multi-step agent workflows could show how well the model handles chained permission decisions.
Load-bearing premise
That the formal policies can be correctly translated into enforceable rules and that the user-space BPF interception mechanism can prevent an adversarial agent from bypassing the controls or escalating privileges outside the defined trust scopes.
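The premise can be made concrete. One standard Linux mechanism that fits the paper's description of "BPF-based syscall interception" is a seccomp filter that routes selected syscalls to a user-space supervisor via SECCOMP_RET_USER_NOTIF; whether ClawLess uses this exact mechanism is our assumption.

```c
/* Hedged sketch: a seccomp-BPF filter that forwards openat to a user-space
 * supervisor for a policy decision. Whether ClawLess uses seccomp user
 * notification specifically is an assumption; the paper only says
 * "BPF-based syscall interception". */
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int install_filter(void) {
    struct sock_filter filter[] = {
        /* load the syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* route openat to the user-space policy engine for a decision */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
        /* everything else passes through */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len    = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    /* required so an unprivileged process may install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    /* SECCOMP_RET_USER_NOTIF needs the seccomp(2) syscall, not prctl;
     * on success this returns the supervisor's notification fd */
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                        SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```

Because the filter matches syscall numbers at kernel dispatch, raw syscall instructions do not evade it, and installed filters are inherited across fork and execve and cannot be removed, which is precisely the property the non-bypassability premise leans on.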
What would settle it
Placing an adversarial AI agent inside a ClawLess-protected environment and checking whether it can still read protected files, run restricted commands, or escalate privileges despite the active policy rules.
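A hypothetical probe in that spirit, meant to run inside the protected environment; the protected path and the restricted command below are placeholders for whatever the active policy actually forbids.

```c
/* Hypothetical adversarial probe: every target below is a placeholder for
 * whatever the active ClawLess policy marks as forbidden. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* 1. attempt to read a policy-protected file */
    int fd = open("/protected/secret.txt", O_RDONLY);
    printf("read protected file: %s\n",
           fd < 0 ? strerror(errno) : "ALLOWED (policy failure)");
    if (fd >= 0) close(fd);

    /* 2. attempt to execute a restricted command */
    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", "id", (char *)NULL);
        _exit(errno);           /* exec blocked: surface the errno */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    printf("exec restricted command: child exit %d\n",
           WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    return 0;
}
```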
read the original abstract
Autonomous AI agents powered by Large Language Models can reason, plan, and execute complex tasks, but their ability to autonomously retrieve information and run code introduces significant security risks. Existing approaches attempt to regulate agent behavior through training or prompting, which does not offer fundamental security guarantees. We present ClawLess, a security framework that enforces formally verified policies on AI agents under a worst-case threat model where the agent itself may be adversarial. ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ClawLess, a security framework for LLM-powered autonomous AI agents. It claims to formalize a fine-grained security model over system entities, trust scopes, and permissions that supports dynamic policies adapting to runtime agent behavior; these policies are translated into concrete rules and enforced by a user-space kernel augmented with BPF-based syscall interception, providing security guarantees under a worst-case threat model in which the agent itself may be fully adversarial.
Significance. If the formal model, policy translation, and non-bypassable enforcement were rigorously established, the work would offer a principled alternative to training- or prompt-based controls for AI agents. The explicit attempt to connect a dynamic, entity-level security model with practical syscall interception is a constructive direction for the field.
major comments (2)
- [Abstract] The manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.
- [Abstract] The central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.
minor comments (1)
- The phrase 'user-space kernel' is used without clarification; a brief note distinguishing it from a conventional kernel or explaining its privilege boundary would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating planned revisions to improve the rigor of the presentation.
read point-by-point responses
- Referee: [Abstract] The manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.
Authors: We agree that the abstract makes overly strong claims relative to the formality provided in the body of the work. The manuscript introduces definitions for system entities, trust scopes, and permissions in Section 3 and describes the translation of dynamic policies into enforcement rules in Section 4, but it does not supply explicit axioms, formal semantics, or a proof that the translation preserves invariants. In the revised manuscript we will expand Section 3 with formal semantics and axioms for the model, add a proof sketch for invariant preservation under policy translation, and revise the abstract to state that the model is formalized with support for runtime-adaptive policies rather than asserting full formal verification. revision: yes
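For concreteness, one plausible shape for the promised invariant-preservation obligation, stated in our own notation rather than the paper's: with $\mathcal{P}$ the set of abstract policies and $T$ the translation to rule tables,

```latex
% Illustrative proof obligation (our notation, not the paper's):
% the translated rule table is sound with respect to the abstract policy.
\forall p \in \mathcal{P},\ \forall a \in \mathit{Actions}:\quad
\mathit{allow}\bigl(T(p),\, a\bigr) \;\Longrightarrow\; \mathit{permits}(p,\, a)
```

That is, the concrete rules never admit an action the abstract policy forbids (soundness of translation); the converse implication would state completeness, and the promised proof sketch would need at least the soundness direction.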
- Referee: [Abstract] The central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.
Authors: The referee is correct that the non-bypassability of the enforcement mechanism is central to the worst-case threat model and that the current text does not explicitly analyze potential evasion vectors. The design positions the user-space kernel with BPF interception to mediate all syscalls at the interface between the agent and the system. In the revised version we will add a dedicated subsection to the enforcement section that discusses these vectors and explains the mitigation strategy, including interception of process-creation syscalls and comprehensive hooking to cover direct invocation paths within the agent's privilege domain. revision: yes
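A sketch of what the promised comprehensive hooking could look like at the filter level, installed the same way as the filter sketched under the load-bearing premise above. The syscall selection mirrors the referee's list and is our assumption; shared-memory channels would additionally need mediation at mapping time (e.g., trapping mmap and shmat) rather than per access.

```c
/* Hedged sketch: route the referee's evasion-relevant syscalls to the
 * user-space supervisor. Which syscalls to trap is our assumption. */
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* If the loaded syscall number equals nr, return to the supervisor;
 * otherwise skip one instruction and test the next number. */
#define TRAP(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF)

static struct sock_filter evasion_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    TRAP(__NR_ptrace),      /* tracing / code injection into peers */
    TRAP(__NR_clone),       /* process and thread creation */
    TRAP(__NR_execve),      /* program replacement */
    TRAP(__NR_execveat),
    TRAP(__NR_mmap),        /* shared-memory channel setup */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

Direct syscalls are covered by any filter of this form, since seccomp matches the number at kernel dispatch rather than at the libc wrapper.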
Circularity Check
No circularity: formal model and enforcement claims are self-contained design choices
full rationale
The paper defines a security model over entities, trust scopes, and permissions, then describes translation to rules enforced by user-space BPF syscall interception. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central claims (formalization, translation, and enforcement under adversarial agent) are presented as a proposed framework rather than derived from prior self-referential results or by construction. The derivation chain does not reduce any prediction or result to its own inputs; it remains an independent modeling and implementation proposal.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: An AI agent may be fully adversarial and its internal reasoning untrusted.
- domain assumption: Dynamic policies expressed over entities and permissions can be translated into concrete enforceable rules without loss of security properties.
invented entities (1)
- ClawLess security model (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation: a TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.