pith. machine review for the scientific record.

arxiv: 2604.06284 · v1 · submitted 2026-04-07 · 💻 cs.CR · cs.AI

Recognition: 1 theorem link · Lean Theorem

ClawLess: A Security Model of AI Agents

Fengwei Zhang, Hongyi Lu, Nian Liu, Shuai Wang

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 20:10 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords AI agents · security framework · formal security model · BPF syscall interception · dynamic policies · adversarial threat model · user-space enforcement

The pith

ClawLess enforces formally verified policies on potentially adversarial AI agents using dynamic security rules and BPF interception.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous AI agents using large language models can perform complex tasks but introduce security risks when they retrieve information and run code on their own. The paper presents ClawLess as a framework that formalizes security over system entities, trust scopes, and permissions to create policies that adjust based on what the agent does at runtime. These policies convert into concrete rules that a user-space kernel enforces by intercepting system calls with BPF technology. The result supplies security guarantees under a worst-case assumption that the agent itself is adversarial, unlike methods that rely only on training or prompting.

Core claim

ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design under a worst-case threat model where the agent itself may be adversarial.

What carries the argument

The fine-grained security model over system entities, trust scopes, and permissions, which supports the creation of runtime-adaptive policies that translate into enforceable rules.
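To make the entity/scope/permission structure concrete, here is a minimal illustrative sketch of how abstract grants could compile into enforceable allow rules. All names (`Entity`, `TrustScope`, `Perm`, `compile_rules`) are our own stand-ins, not ClawLess's actual API; the deny-by-default choice is an assumption consistent with the paper's worst-case threat model.

```python
# Toy reading of a fine-grained policy model: entities, trust scopes,
# permissions -> concrete (path, permission) allow rules.
# Illustrative only; not the paper's implementation.
from dataclasses import dataclass
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()
    EXEC = auto()

@dataclass(frozen=True)
class Entity:
    path: str  # a file-system entity, e.g. "/tmp/work"

@dataclass
class TrustScope:
    name: str
    grants: dict  # Entity -> Perm

def compile_rules(scope: TrustScope) -> set:
    """Translate an abstract scope into concrete (path, perm) allow rules."""
    return {(e.path, p)
            for e, perm in scope.grants.items()
            for p in Perm if p in perm}

def allowed(rules: set, path: str, perm: Perm) -> bool:
    # Deny by default: any request not covered by a compiled rule is refused.
    return (path, perm) in rules

scope = TrustScope("agent-sandbox",
                   {Entity("/tmp/work"): Perm.READ | Perm.WRITE})
rules = compile_rules(scope)
print(allowed(rules, "/tmp/work", Perm.READ))    # True
print(allowed(rules, "/etc/shadow", Perm.READ))  # False
```

A runtime-adaptive policy in this picture is simply a scope whose `grants` are recomputed as the agent acts, with the compiled rule set swapped in atomically.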

If this is right

  • Dynamic policies can adjust in real time to match the agent's observed actions.
  • Enforcement succeeds without depending on the agent's internal design or training.
  • Abstract policies produce concrete rules that a user-space mechanism can apply directly.
  • Security remains intact even when the agent is constructed to act against the user's interests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same structure of entities, scopes, and permissions could apply to other autonomous software systems that act on behalf of users.
  • Combining the approach with hardware isolation features might strengthen protection against kernel-level bypass attempts.
  • Experiments on multi-step agent workflows could show how well the model handles chained permission decisions.

Load-bearing premise

That the formal policies can be correctly translated into enforceable rules and that the user-space BPF interception mechanism can prevent an adversarial agent from bypassing the controls or escalating privileges outside the defined trust scopes.

What would settle it

Placing an adversarial AI agent inside a ClawLess-protected environment and checking whether it can still read protected files, run restricted commands, or escalate privileges despite the active policy rules.
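The shape of such a red-team check can be sketched in a few lines: drive adversarial probes against a deny-by-default policy and assert that nothing outside the allowed set succeeds. The policy contents and probe list here are hypothetical examples of ours, not the paper's evaluation.

```python
# Hypothetical red-team harness: every probe outside the allowed set
# must be refused. Illustrative stand-in, not the paper's test suite.
ALLOWED = {("open", "/tmp/work/data.txt"), ("exec", "/usr/bin/python3")}

def enforce(syscall: str, target: str) -> bool:
    """Return True if the (syscall, target) pair is permitted."""
    return (syscall, target) in ALLOWED

adversarial_probes = [
    ("open", "/etc/shadow"),  # read a protected file
    ("exec", "/bin/sh"),      # run a restricted command
    ("setuid", "0"),          # escalate privileges
]

breaches = [p for p in adversarial_probes if enforce(*p)]
print(breaches)  # [] -- no probe slipped past the policy
```

The interesting empirical question is whether the real interception layer mediates every path an agent can take to these operations, not just the well-behaved ones.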

Figures

Figures reproduced from arXiv: 2604.06284 by Fengwei Zhang, Hongyi Lu, Nian Liu, Shuai Wang.

Figure 1
Figure 1: Architectures of secure containers. kernel, on the other hand, achieves a balance between usability and security; it protects the host kernel by delegating most kernel-user interactions to the user-space kernel (only one CVE in the past ten years) while maintaining usability by acting as a user-space process on the host. Virtualization and confidential containers achieve high security assurance by using hardw…
Figure 2
Figure 2: BPF workflow. BPF offers several key advantages for security monitoring: it compiles to native code, executes with minimal overhead, and supports dynamic policy updates without kernel recompilation. In sum, BPF provides fine-grained visibility into system events (e.g., syscalls, network operations, file accesses).
Figure 3
Figure 3: Overview of ClawLess. Motivation. The design of ClawLess is guided by a fundamental insight: AI agents fundamentally break the threat models underlying traditional security mechanisms. Conventional software security assumes static trust boundaries: applications retrieve data from predetermined endpoints, execute fixed sets of instruction sequences, and operate within analyzable privilege scopes. AI age…
Figure 4
Figure 4: Syscall interception. the system call number (line 2), and then yields control to the designated handler via bpf_tail_call (line 3). Since BPF allows dynamically loading and updating the program in prog_arr, ClawLess can easily update the policies by updating the corresponding handler without interrupting the system.
Figure 5
Figure 5: Handler for syscall read. to the file descriptor (line 5). With all the necessary information, the handler can then check if this syscall invocation complies with the policies defined previously. External Sandbox. AI agents often need to execute external scripts for tasks like data processing and code generation. These scripts, however, are often potentially malicious and can lead to various security cons…
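The dispatch pattern the figure captions describe — look up the syscall number in an updatable handler table and tail-call into the matching handler — can be mimicked in ordinary Python as a sketch. The syscall numbers are real x86-64 values, but `prog_arr`, `intercept`, and the handlers are illustrative stand-ins for the paper's BPF code, not its actual implementation.

```python
# Sketch of bpf_tail_call-style dispatch: route each syscall number
# through an updatable handler table. Illustrative stand-in only.
SYS_READ, SYS_OPENAT = 0, 257  # x86-64 syscall numbers

def deny(args):
    return "DENY"

def allow(args):
    return "ALLOW"

# Analogous to the BPF prog_arr: syscall number -> handler program.
prog_arr = {SYS_READ: allow, SYS_OPENAT: deny}

def intercept(nr: int, args=()):
    handler = prog_arr.get(nr, deny)  # unknown syscalls denied by default
    return handler(args)              # analogous to bpf_tail_call

print(intercept(SYS_READ))   # ALLOW
# Hot policy update: swap one handler without interrupting dispatch.
prog_arr[SYS_READ] = deny
print(intercept(SYS_READ))   # DENY
```

The key property being imitated is that a policy change replaces a single table entry, so enforcement continues uninterrupted while rules evolve.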
Original abstract

Autonomous AI agents powered by Large Language Models can reason, plan, and execute complex tasks, but their ability to autonomously retrieve information and run code introduces significant security risks. Existing approaches attempt to regulate agent behavior through training or prompting, which does not offer fundamental security guarantees. We present ClawLess, a security framework that enforces formally verified policies on AI agents under a worst-case threat model where the agent itself may be adversarial. ClawLess formalizes a fine-grained security model over system entities, trust scopes, and permissions to express dynamic policies that adapt to agents' runtime behavior. These policies are translated into concrete security rules and enforced through a user-space kernel augmented with BPF-based syscall interception. This approach bridges the formal security model with practical enforcement, ensuring security regardless of the agent's internal design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents ClawLess, a security framework for LLM-powered autonomous AI agents. It claims to formalize a fine-grained security model over system entities, trust scopes, and permissions that supports dynamic policies adapting to runtime agent behavior; these policies are translated into concrete rules and enforced by a user-space kernel augmented with BPF-based syscall interception, providing security guarantees under a worst-case threat model in which the agent itself may be fully adversarial.

Significance. If the formal model, policy translation, and non-bypassable enforcement were rigorously established, the work would offer a principled alternative to training- or prompt-based controls for AI agents. The explicit attempt to connect a dynamic, entity-level security model with practical syscall interception is a constructive direction for the field.

major comments (2)
  1. Abstract: the manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.
  2. Abstract: the central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.
minor comments (1)
  1. The phrase 'user-space kernel' is used without clarification; a brief note distinguishing it from a conventional kernel or explaining its privilege boundary would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications and indicating planned revisions to improve the rigor of the presentation.

Point-by-point responses
  1. Referee: Abstract: the manuscript asserts that policies are 'formally verified' and that the BPF-based enforcement 'ensures security regardless of the agent's internal design,' yet supplies no formal definitions of the security model, no axioms or semantics for trust scopes and permissions, and no proof or reduction showing that the translation from model to rules preserves the intended invariants.

    Authors: We agree that the abstract makes overly strong claims relative to the formality provided in the body of the work. The manuscript introduces definitions for system entities, trust scopes, and permissions in Section 3 and describes the translation of dynamic policies into enforcement rules in Section 4, but it does not supply explicit axioms, formal semantics, or a proof that the translation preserves invariants. In the revised manuscript we will expand Section 3 with formal semantics and axioms for the model, add a proof sketch for invariant preservation under policy translation, and revise the abstract to state that the model is formalized with support for runtime-adaptive policies rather than asserting full formal verification. revision: yes

  2. Referee: Abstract: the central claim that user-space BPF syscall interception prevents bypass by an adversarial agent is load-bearing for the worst-case threat model, but the text provides no argument addressing standard evasion vectors (direct syscalls, ptrace, process creation, or shared-memory channels) that could allow an agent in the same privilege domain to circumvent the hooks.

    Authors: The referee is correct that the non-bypassability of the enforcement mechanism is central to the worst-case threat model and that the current text does not explicitly analyze potential evasion vectors. The design positions the user-space kernel with BPF interception to mediate all syscalls at the interface between the agent and the system. In the revised version we will add a dedicated subsection to the enforcement section that discusses these vectors and explains the mitigation strategy, including interception of process-creation syscalls and comprehensive hooking to cover direct invocation paths within the agent's privilege domain. revision: yes

Circularity Check

0 steps flagged

No circularity: formal model and enforcement claims are self-contained design choices

Full rationale

The paper defines a security model over entities, trust scopes, and permissions, then describes translation to rules enforced by user-space BPF syscall interception. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central claims (formalization, translation, and enforcement under adversarial agent) are presented as a proposed framework rather than derived from prior self-referential results or by construction. The derivation chain does not reduce any prediction or result to its own inputs; it remains an independent modeling and implementation proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on standard security assumptions about external enforcement and the ability to intercept syscalls. No free parameters are introduced; the only new construct is the ClawLess security model itself.

axioms (2)
  • domain assumption An AI agent may be fully adversarial and its internal reasoning untrusted.
    Explicit worst-case threat model stated in the abstract.
  • domain assumption Dynamic policies expressed over entities and permissions can be translated into concrete enforceable rules without loss of security properties.
    The translation step is presented as reliable.
invented entities (1)
  • ClawLess security model no independent evidence
    purpose: Fine-grained expression of runtime-adaptive policies for AI agents
    New modeling construct introduced by the paper.

pith-pipeline@v0.9.0 · 5428 in / 1318 out tokens · 58583 ms · 2026-05-10T20:10:27.457412+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

    cs.CR · 2026-05 · unverdicted · novelty 5.0

    A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

Reference graph

Works this paper leans on

22 extracted references · 8 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Docker CVEs

    2016. Docker CVEs. https://www.cvedetails.com/product/28125/Docker-Docker.html?vendor_id=13534

  2. [2]

    OpenClaw’s CVEs

    2025. OpenClaw’s CVEs. https://github.com/jgamblin/OpenClawCVEs/

  3. [3]

    Confidential Containers

    2026. Confidential Containers. https://github.com/confidential-containers

  4. [4]

    IronClaw

    2026. IronClaw. https://github.com/nearai/ironclaw

  5. [5]

    System Calls - Linux Kernel Documents

    2026. System Calls - Linux Kernel Documents. https://linux-kernel-labs.github.io/refs/heads/master/lectures/syscalls.html

  6. [6]

    Anthropics. 2025. Claude Code. https://github.com/anthropics/claude-code

  7. [7]

    Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, and Jungmin Son

  8. [8]

    ExpGuard: LLM Content Moderation in Specialized Domains. arXiv:2603.02588 [cs.CL] https://arxiv.org/abs/2603.02588

  9. [9]

    Saswat Das and Ferdinando Fioretto. 2026. NeuroFilter: Privacy Guardrails for Conversational LLM Agents. ArXiv abs/2601.14660 (2026). https://api.semanticscholar.org/CorpusID:284917641

  10. [10]

    Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (Budapest, Hungary) (TACAS'08/ETAPS'08). Springer-Verlag, Berlin, Heidelberg, 337–340

  11. [11]

    GitHub. 2025. OpenClaw. https://github.com/openclaw/openclaw

  12. [12]

    Google. 2025. gVisor. https://github.com/google/gvisor

  13. [13]

    Yanan Guo, Zhenkai Zhang, and Jun Yang. 2024. GPU Memory Exploitation for Fun and Profit. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 4033–4050. https://www.usenix.org/conference/usenixsecurity24/presentation/guo-yanan

  14. [14]

    Evan Li, Tushin Mallick, Evan Rose, William Robertson, Alina Oprea, and Cristina Nita-Rotaru. 2025. ACE: A Security Architecture for LLM-Integrated App Systems. arXiv:2504.20984 [cs.CR] https://arxiv.org/abs/2504.20984

  15. [15]

    Dirk Merkel. 2014. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014, 239, Article 2 (March 2014)

  16. [16]

    OpenAI. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL] https://arxiv.org/abs/2303.08774

  17. [17]

    Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng, Yuekang Li, Leo Yu Zhang, Ying Zhang, and Lei Ma. 2026. Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems. arXiv:2604.03081 [cs.CR] https://arxiv.org/abs/2604.03081

  18. [18]

    Debdeep Sanyal, Manodeep Ray, and Murari Mandal. 2026. AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs. Proceedings of the AAAI Conference on Artificial Intelligence 40, 39 (Mar. 2026), 32893–32901. doi:10.1609/aaai.v40i39.40570

  19. [19]

    Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. 2025. IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems. arXiv:2403.04960 [cs.CR] https://arxiv.org/abs/2403.04960

  20. [20]

    Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and defending against indirect prompt injection attacks on large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1. 1809–1820

  21. [21]

    Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, et al. 2025. AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 24570–24588

  22. [22]

    Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, and Junfeng Yang. 2026. Proactive defense against LLM Jailbreak. arXiv:2510.05052 [cs.CR] https://arxiv.org/abs/2510.05052