pith. machine review for the scientific record.

arxiv: 2604.27464 · v1 · submitted 2026-04-30 · 💻 cs.CR · cs.AI

Recognition: unknown

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 09:39 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI
keywords autonomous agents · LLM security · attack surfaces · defense strategies · layered analysis · threat propagation · agent frameworks

The pith

Autonomous agent frameworks carry security risks that propagate across four layers from input manipulation to ecosystem-wide effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper organizes security analysis of LLM-based autonomous agents into four layers to show how individual vulnerabilities can chain together. It reviews risks and defenses at each layer using OpenClaw as a concrete running example. A sympathetic reader would care because these systems are moving from simple chatbots toward continuously running tool users that control external resources. The central move is to treat threat propagation as the key phenomenon rather than isolated attacks at any single layer.

Core claim

By mapping security issues onto the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer, the analysis shows that threats can move from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact.

What carries the argument

The four-layer model that separates functional roles, representative risks, and defense strategies while tracing how attacks cross layer boundaries.
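
As an editorial aid (this sketch is ours, not the paper's), the four-layer model reduces to a small data structure: an attack is an ordered trace of events, and propagation is any trace that touches more than one layer. All names below are invented for illustration; a minimal sketch in Python:

    from dataclasses import dataclass, field
    from enum import IntEnum

    class Layer(IntEnum):
        # The four security-relevant layers, ordered from input to ecosystem.
        CONTEXT_INSTRUCTION = 1
        TOOL_ACTION = 2
        STATE_PERSISTENCE = 3
        ECOSYSTEM_AUTOMATION = 4

    @dataclass
    class AttackStep:
        layer: Layer
        event: str  # e.g. "indirect prompt injection", "unsafe tool call"

    @dataclass
    class AttackTrace:
        steps: list[AttackStep] = field(default_factory=list)

        def propagates(self) -> bool:
            # True if the trace crosses at least one layer boundary.
            return len({step.layer for step in self.steps}) > 1

    # The chain named in the core claim, written out as a trace:
    trace = AttackTrace([
        AttackStep(Layer.CONTEXT_INSTRUCTION, "manipulated input enters context"),
        AttackStep(Layer.TOOL_ACTION, "unsafe tool call issued"),
        AttackStep(Layer.STATE_PERSISTENCE, "contaminated result persisted to memory"),
        AttackStep(Layer.ECOSYSTEM_AUTOMATION, "automation spreads effect to other services"),
    ])
    assert trace.propagates()

The referee report below makes the complementary point: a layer-by-layer catalog of risks never exercises propagates(); only chained traces like the one above can.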

If this is right

  • Defenses must address cross-layer propagation rather than single-layer fixes.
  • Long-horizon evaluations become necessary to detect persistent-state and ecosystem effects (a sketch of such an evaluation follows this list).
  • Ecosystem trust models need strengthening because automation layers connect multiple agents and services.
  • Research attention should be rebalanced toward under-studied layers such as state persistence and ecosystem automation.
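
To make the long-horizon point concrete: an evaluation would seed one malicious input, let the agent run autonomously for many episodes, and record the deepest layer the injection demonstrably reaches. A hedged sketch; the agent interface here (ingest, run_episode, is_unsafe, and the two marker probes) is hypothetical, not an API from the paper or from OpenClaw:

    def measure_propagation_depth(agent, seeded_input: str, horizon: int = 50) -> int:
        # Returns the deepest layer (1-4) reached by a single seeded injection
        # over a long horizon. All agent methods are hypothetical probes.
        deepest = 1                            # the seed lives at layer 1 (context/instruction)
        agent.ingest(seeded_input)             # manipulated input enters the context
        for _ in range(horizon):
            actions = agent.run_episode()      # one autonomous step (tool calls, etc.)
            if any(a.is_unsafe for a in actions):
                deepest = max(deepest, 2)      # layer 2: unsafe tool/action
            if agent.state_contains_marker():
                deepest = max(deepest, 3)      # layer 3: persistent state contamination
            if agent.ecosystem_contains_marker():
                deepest = max(deepest, 4)      # layer 4: ecosystem-level effect
        return deepest

A single-turn benchmark collapses this loop to horizon = 1, which is exactly why persistent-state and ecosystem effects currently go unmeasured.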

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same layering could be tested on other open agent frameworks to check whether threat propagation patterns hold beyond the OpenClaw case.
  • Developers might add explicit layer-boundary checks inside agent runtimes to interrupt propagation chains before they reach external tools (see the sketch after this list).
  • Standardized benchmarks could measure how far an initial prompt injection travels through the four layers in controlled environments.
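
A sketch of the layer-boundary check mentioned above, under stated assumptions: provenance tags travel with data out of the context/instruction layer, and a guard at the tool boundary refuses side-effecting calls whose inputs trace to untrusted sources. The tags and tool names are invented for illustration; none of this is an OpenClaw feature:

    class BoundaryViolation(Exception):
        pass

    UNTRUSTED_SOURCES = {"web_page", "retrieved_doc", "email"}   # illustrative tags
    SIDE_EFFECTING_TOOLS = {"shell", "http_post", "file_write"}  # illustrative tools

    def guard_tool_call(tool_name: str, provenance: set[str]) -> None:
        # Hypothetical check at the context/instruction -> tool/action boundary:
        # block side-effecting calls whose arguments derive from untrusted sources.
        tainted = provenance & UNTRUSTED_SOURCES
        if tool_name in SIDE_EFFECTING_TOOLS and tainted:
            raise BoundaryViolation(
                f"refusing {tool_name}: arguments trace to untrusted sources {sorted(tainted)}"
            )

    # Usage: a call assembled from retrieved web content is stopped at the boundary,
    # interrupting the propagation chain before it reaches the tool layer.
    try:
        guard_tool_call("http_post", {"web_page", "user"})
    except BoundaryViolation as err:
        print(err)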

Load-bearing premise

The four layers capture all security-relevant aspects of these frameworks, and OpenClaw is representative enough to illustrate the general risks.

What would settle it

A documented attack sequence in an autonomous agent system that remains confined to one layer without producing downstream effects on actions, state, or external ecosystems.

Figures

Figures reproduced from arXiv: 2604.27464 by Luyao Xu, Xiang Chen.

Figure 1. Layered Architecture of Autonomous Agent Frameworks: An Illustration Using OpenClaw as a Case Study.
Figure 2. Cross-layer attack propagation in autonomous agent frameworks.
Original abstract

Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems, introducing security risks beyond traditional prompt-level vulnerabilities. As this paradigm is still at an early stage of development, a timely and systematic understanding of its security implications is increasingly important. Although a growing body of work has examined different attack surfaces and defense problems in agent systems, existing studies remain scattered across individual aspects of agent security, and there is still a lack of a layered review on this topic. To address this gap, this survey presents a layered review of security risks and defense strategies in autonomous agent frameworks, with OpenClaw as a case study. We organize the analysis into four security-relevant layers: the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer. For each layer, we summarize its functional role, representative security risks, and corresponding defense strategies. Based on this layered analysis, we further identify that threats in autonomous agent frameworks may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact. Finally, we highlight potential key challenges, including research imbalance across layers, the lack of long-horizon evaluation, and weak ecosystem trust models, and outline future directions toward more systematic and integrated defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a layered survey of security risks and defense strategies for LLM-based autonomous agent frameworks. It organizes the domain into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation), summarizes representative attacks and defenses per layer, uses OpenClaw as a case study, identifies cross-layer threat propagation (from manipulated inputs through unsafe actions and state contamination to ecosystem impact), and outlines challenges such as research imbalance, lack of long-horizon evaluation, and weak trust models along with future directions.

Significance. If the four-layer organization is comprehensive and the cross-layer propagation claim is substantiated, the work would provide a useful organizing framework for an emerging area, helping to move agent security research from isolated per-component studies toward integrated analyses. The survey nature means its value lies in synthesis rather than new derivations or proofs.

major comments (2)
  1. [Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.
  2. [Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.
minor comments (2)
  1. [OpenClaw case study] Clarify the selection criteria for OpenClaw as the case study and explicitly state how its architecture maps onto the four layers to strengthen the claim of representativeness.
  2. [Throughout the layered analysis sections] Ensure consistent terminology across layers (e.g., 'context' vs. 'instruction' layer) and add a summary table comparing risks and defenses across all four layers for improved readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and outline the revisions we plan to make.

Point-by-point responses
  1. Referee: [Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.

    Authors: We acknowledge the referee's point that the OpenClaw case study primarily presents risks and defenses organized by layer rather than providing explicit multi-step attack traces. The propagation claim in the abstract is intended as a synthesis derived from the layered analysis, highlighting how vulnerabilities in one layer can logically lead to impacts in others based on the functional dependencies described. However, to strengthen the evidence, we will revise the manuscript to include concrete illustrative examples of cross-layer propagation paths, drawing from both the OpenClaw components and related literature where possible. This will clarify the basis for the observation and address the concern that it remains a hypothesis. revision: yes

  2. Referee: [Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.

    Authors: We agree that an explicit justification for the four-layer decomposition would enhance the manuscript's rigor. In the revised version, we will expand the introduction to include a discussion of the layer selection rationale. Specifically, we will explain that the four layers are chosen to span the primary security boundaries from external inputs to system-wide impacts, with memory and retrieval subsumed under the state and persistence layer due to their role in maintaining agent state, and multi-agent coordination considered as part of the ecosystem and automation layer. We will also briefly contrast this with alternative organizations to demonstrate why the chosen structure best supports the analysis of threat propagation. revision: yes

Circularity Check

0 steps flagged

No circularity in this survey paper's layered organization or propagation claim.

Full rationale

This is a survey paper that organizes existing literature on security risks and defenses in autonomous agent frameworks into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) and uses OpenClaw as a case study to illustrate them. The claim that threats 'may propagate across layers' is presented as an identification arising from the layered review of prior work, without any mathematical derivations, equations, fitted parameters, predictions from data subsets, or self-referential constructions that reduce the result to its own inputs by definition. No load-bearing self-citations, author-supplied uniqueness theorems, or ansatzes smuggled in via citation appear in the provided abstract or structure. The analysis draws on external literature for each layer's risks and defenses, so its claims are grounded in outside work rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the assumption that security issues divide cleanly into the four named layers and that prior work can be mapped onto them without significant overlap or omission.

axioms (1)
  • domain assumption Security risks in autonomous agent frameworks can be partitioned into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) that capture functional roles and allow analysis of cross-layer propagation.
    This partitioning is the core organizing principle stated in the abstract and is not derived from prior equations or data.

pith-pipeline@v0.9.0 · 5541 in / 1259 out tokens · 44952 ms · 2026-05-07T09:39:33.187273+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 17 canonical work pages · 5 internal anchors

  1. [1]

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, Y. Cao, ReAct: Synergizing reasoning and acting in language models, in: The Eleventh International Conference on Learning Representations, 2022

  2. [2]

    T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach themselves to use tools, Advances in Neural Information Processing Systems 36 (2023) 68539–68551

  3. [3]

    J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: Interactive simulacra of human behavior, in: Proceedings of the 36th annual acm symposium on user interface software and technology, 2023, pp. 1–22

  4. [4]

    X. Chen, A. Zeng, et al., A survey on large language model based autonomous agents, in: CCL 2024 – 23rd Chinese Natl Conf Comput Linguist, Vol. 2, 2024, pp. 141–150

  5. [5]

    OpenClaw, System prompt - openclaw, https://docs.openclaw.ai/concepts/system-prompt, official documentation (2026)

  6. [6]

    OpenClaw, Context - openclaw, https://docs.openclaw.ai/concepts/context, official documentation (2026)

  7. [7]

    OpenClaw, Tools, skills, and plugins - openclaw, https://docs.openclaw.ai/tools, official documentation (2026)

  8. [8]

    OpenClaw, Skills - openclaw, https://docs.openclaw.ai/tools/skills, official documentation (2026)

  9. [9]

    OpenClaw, Agent workspace - openclaw, https://docs.openclaw.ai/concepts/agent-workspace, official documentation (2026)

  10. [10]

    OpenClaw, Heartbeat (gateway) - openclaw, https://docs.openclaw.ai/gateway/heartbeat, official documentation (2026)

  11. [11]

    OpenClaw, Cron jobs (gateway scheduler) - openclaw, https://docs.openclaw.ai/automation/cron-jobs, official documentation (2026)

  12. [12]

    OpenClaw, Security - openclaw, https://docs.openclaw.ai/gateway/security, official documentation (2026)

  13. [13]

    E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, F. Tramèr, Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents, Advances in Neural Information Processing Systems 37 (2024) 82895–82920

  14. [14]

    H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, Y. Zhang, Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents, in: The Thirteenth International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=V4y0CpX4hK

  15. [15]

    Y. Wang, F. Xu, Z. Lin, G. He, Y. Huang, H. Gao, Z. Niu, S. Lian, Z. Liu, From assistant to double agent: Formalizing and benchmarking attacks on openclaw for personalized local ai agent, arXiv preprint arXiv:2602.08412 (2026)

  16. [16]

    B. Dong, H. Feng, Q. Wang, Clawdrain: Exploiting tool-calling chains for stealthy token exhaustion in openclaw agents, arXiv preprint arXiv:2603.00902 (2026)

  17. [17]

    F. Liu, Z. Chen, T. Lan, H. Tan, Z. Xu, X. Li, G. Chen, Y. Meng, H. Zhu, Trojan's whisper: Stealthy manipulation of openclaw through injected bootstrapped guidance, arXiv preprint arXiv:2603.19974 (2026)

  18. [18]

    OpenClaw Contributors, Rfc: Skill security framework — permission manifests, capability verification, and instruction boundary enforcement, https://github.com/openclaw/openclaw/issues/10890, GitHub issue/RFC discussion (2026)

  19. [19]

    Y. Dong, R. Mu, Y. Zhang, S. Sun, T. Zhang, C. Wu, G. Jin, Y. Qi, J. Hu, J. Meng, et al., Safeguarding large language models: A survey, Artificial Intelligence Review 58 (12) (2025) 382

  20. [20]

    T. Geng, Z. Xu, Y. Qu, W. E. Wong, Prompt injection attacks on large language models: A survey of attack methods, root causes, and defense strategies, Computers, Materials & Continua 87 (1) (2026)

  21. [21]

    M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, M. Debbah, From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows, ICT Express (2025)

  22. [22]

    Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, T. Hashimoto, Identifying the risks of lm agents with an lm-emulated sandbox, arXiv preprint arXiv:2309.15817 (2023)

  23. [23]

    W. Hua, X. Yang, M. Jin, Z. Li, W. Cheng, R. Tang, Y. Zhang, Trustagent: Towards safe and trustworthy llm-based agents through agent constitution, in: Trustworthy Multi-modal Foundation Models and AI Agents (TiFA), 2024

  24. [24]

    OpenClaw, Threat model (mitre atlas) - openclaw, https://docs.openclaw.ai/security/THREAT-MODEL-ATLAS, official documentation (2026)

  25. [25]

    Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng, et al., Prompt injection attack against llm-integrated applications, arXiv preprint arXiv:2306.05499 (2023)

  26. [26]

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, M. Fritz, Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, in: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 79–90

  27. [27]

    Q. Zhan, Z. Liang, Z. Ying, D. Kang, Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, in: Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10471–10506

  28. [28]

    H. Chang, Y. Jun, H. Lee, Chatinject: Abusing chat templates for prompt injection in llm agents, arXiv preprint arXiv:2509.22830 (2025)

  29. [29]

    C. Yin, R. Geng, Y. Wang, J. Jia, Pismith: Reinforcement learning-based red teaming for prompt injection defenses, arXiv preprint arXiv:2603.13026 (2026)

  30. [30]

    T. Shi, K. Zhu, Z. Wang, Y. Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchi, et al., Promptarmor: Simple yet effective prompt injection defenses, arXiv preprint arXiv:2507.15219 (2025)

  31. [31]

    Y. Wang, S. Chen, R. Alkhudair, B. Alomair, D. Wagner, Defending against prompt injection with datafilter, arXiv preprint arXiv:2510.19207 (2025)

  32. [32]

    R. Bhagwatkar, K. Kasa, A. Puri, G. Huang, I. Rish, G. W. Taylor, K. D. Dvijotham, A. Lacoste, Indirect prompt injections: Are firewalls all you need, or stronger benchmarks?, arXiv preprint arXiv:2510.05244 (2025)

  33. [33]

    S. G. Patil, T. Zhang, X. Wang, J. E. Gonzalez, Gorilla: Large language model connected with massive apis, Advances in Neural Information Processing Systems 37 (2024) 126544–126565

  34. [34]

    J. Shi, Z. Yuan, G. Tie, P. Zhou, N. Z. Gong, L. Sun, Prompt injection attack to tool selection in llm agents, arXiv preprint arXiv:2504.19793 (2025)

  35. [35]

    K. Faghih, W. Wang, Y. Cheng, S. Bharti, G. Sriramanan, S. Balasubramanian, P. Hosseini, S. Feizi, Tool preferences in agentic llms are unreliable, in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 20965–20980

  36. [36]

    Q. Lin, M. Wen, Q. Peng, G. Nie, J. Liao, J. Wang, X. Mo, J. Zhou, C. Cheng, Y. Zhao, et al., Hammer: Robust function-calling for on-device language models via function masking, arXiv preprint arXiv:2410.04587 (2024)

  37. [37]

    T. Yuan, Z. He, L. Dong, Y. Wang, R. Zhao, T. Xia, L. Xu, B. Zhou, F. Li, Z. Zhang, et al., R-judge: Benchmarking safety risk awareness for llm agents, in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 1467–1490

  38. [38]

    C. Packer, V. Fang, S. Patil, K. Lin, S. Wooders, J. Gonzalez, Memgpt: Towards llms as operating systems (2023)

  39. [39]

    N. Carlini, M. Jagielski, C. A. Choquette-Choo, D. Paleka, W. Pearce, H. Anderson, A. Terzis, K. Thomas, F. Tramèr, Poisoning web-scale training datasets is practical, in: 2024 IEEE Symposium on Security and Privacy (SP), IEEE, 2024, pp. 407–425

  40. [40]

    Z. Lin, C. Li, K. Chen, A survey on the security of long-term memory in llm agents: Toward mnemonic sovereignty, arXiv preprint arXiv:2604.16548 (2026)

  41. [41]

    B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, S. Mishra, Memory poisoning attack and defense on memory based llm-agents, arXiv preprint arXiv:2601.05504 (2026)

  42. [42]

    H. Zhou, K.-H. Lee, Z. Zhan, Y. Chen, Z. Li, Z. Wang, H. Haddadi, E. Yilmaz, Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation, arXiv preprint arXiv:2501.00879 (2025)

  43. [43]

    W. Zhao, V. Khazanchi, H. Xing, X. He, Q. Xu, N. D. Lane, Attacks on third-party apis of large language models, arXiv preprint arXiv:2404.16891 (2024)

  44. [44]

    Y. Qu, Y. Liu, T. Geng, G. Deng, Y. Li, L. Y. Zhang, Y. Zhang, L. Ma, Supply-chain poisoning attacks against llm coding agent skill ecosystems, arXiv preprint arXiv:2604.03081 (2026)

  45. [45]

    P. He, Y. Lin, S. Dong, H. Xu, Y. Xing, H. Liu, Red-teaming llm multi-agent systems via communication attacks, in: Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 6726–6747

  46. [46]

    W. Luo, S. Dai, X. Liu, S. Banerjee, H. Sun, M. Chen, C. Xiao, Agrail: A lifelong agent guardrail with effective and adaptive safety detection, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 8104–8139

  47. [47]

    Q. Zhan, R. Fang, H. S. Panchal, D. Kang, Adaptive attacks break defenses against indirect prompt injection attacks on llm agents, in: Findings of the Association for Computational Linguistics: NAACL 2025, 2025, pp. 7101–7117

  48. [48]

    J. Ye, S. Li, G. Li, C. Huang, S. Gao, Y. Wu, Q. Zhang, T. Gui, X.-J. Huang, Toolsword: Unveiling safety issues of large language models in tool learning across three stages, in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 2181–2211