Recognition: unknown
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study
Pith reviewed 2026-05-07 09:39 UTC · model grok-4.3
The pith
Autonomous agent frameworks carry security risks that propagate across four layers from input manipulation to ecosystem-wide effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By mapping security issues onto the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer, the analysis shows that threats can move from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact.
What carries the argument
The four-layer model that separates functional roles, representative risks, and defense strategies while tracing how attacks cross layer boundaries.
If this is right
- Defenses must address cross-layer propagation rather than single-layer fixes.
- Long-horizon evaluations become necessary to detect persistent-state and ecosystem effects.
- Ecosystem trust models need strengthening because automation layers connect multiple agents and services.
- Research attention should be rebalanced toward under-studied layers such as state persistence and ecosystem automation.
Where Pith is reading between the lines
- The same layering could be tested on other open agent frameworks to check whether threat propagation patterns hold beyond the OpenClaw case.
- Developers might add explicit layer-boundary checks inside agent runtimes to interrupt propagation chains before they reach external tools.
- Standardized benchmarks could measure how far an initial prompt injection travels through the four layers in controlled environments.
Load-bearing premise
The four layers capture all security-relevant aspects of these frameworks and OpenClaw is representative enough to illustrate the general risks.
What would settle it
A documented attack sequence in an autonomous agent system that remains confined to one layer without producing downstream effects on actions, state, or external ecosystems.
Figures
read the original abstract
Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems, introducing security risks beyond traditional prompt-level vulnerabilities. As this paradigm is still at an early stage of development, a timely and systematic understanding of its security implications is increasingly important. Although a growing body of work has examined different attack surfaces and defense problems in agent systems, existing studies remain scattered across individual aspects of agent security, and there is still a lack of a layered review on this topic. To address this gap, this survey presents a layered review of security risks and defense strategies in autonomous agent frameworks, with OpenClaw as a case study. We organize the analysis into four security-relevant layers: the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer. For each layer, we summarize its functional role, representative security risks, and corresponding defense strategies. Based on this layered analysis, we further identify that threats in autonomous agent frameworks may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact. Finally, we highlight potential key challenges, including research imbalance across layers, the lack of long-horizon evaluation, and weak ecosystem trust models, and outline future directions toward more systematic and integrated defenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a layered survey of security risks and defense strategies for LLM-based autonomous agent frameworks. It organizes the domain into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation), summarizes representative attacks and defenses per layer, uses OpenClaw as a case study, identifies cross-layer threat propagation (from manipulated inputs through unsafe actions and state contamination to ecosystem impact), and outlines challenges such as research imbalance, lack of long-horizon evaluation, and weak trust models along with future directions.
Significance. If the four-layer organization is comprehensive and the cross-layer propagation claim is substantiated, the work would provide a useful organizing framework for an emerging area, helping to move agent security research from isolated per-component studies toward integrated analyses. The survey nature means its value lies in synthesis rather than new derivations or proofs.
major comments (2)
- [Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.
- [Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.
minor comments (2)
- [OpenClaw case study] Clarify the selection criteria for OpenClaw as the case study and explicitly state how its architecture maps onto the four layers to strengthen the claim of representativeness.
- [Throughout the layered analysis sections] Ensure consistent terminology across layers (e.g., 'context' vs. 'instruction' layer) and add a summary table comparing risks and defenses across all four layers for improved readability.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below and outline the revisions we plan to make.
read point-by-point responses
-
Referee: [Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.
Authors: We acknowledge the referee's point that the OpenClaw case study primarily presents risks and defenses organized by layer rather than providing explicit multi-step attack traces. The propagation claim in the abstract is intended as a synthesis derived from the layered analysis, highlighting how vulnerabilities in one layer can logically lead to impacts in others based on the functional dependencies described. However, to strengthen the evidence, we will revise the manuscript to include concrete illustrative examples of cross-layer propagation paths, drawing from both the OpenClaw components and related literature where possible. This will clarify the basis for the observation and address the concern that it remains a hypothesis. revision: yes
-
Referee: [Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.
Authors: We agree that an explicit justification for the four-layer decomposition would enhance the manuscript's rigor. In the revised version, we will expand the introduction to include a discussion of the layer selection rationale. Specifically, we will explain that the four layers are chosen to span the primary security boundaries from external inputs to system-wide impacts, with memory and retrieval subsumed under the state and persistence layer due to their role in maintaining agent state, and multi-agent coordination considered as part of the ecosystem and automation layer. We will also briefly contrast this with alternative organizations to demonstrate why the chosen structure best supports the analysis of threat propagation. revision: yes
Circularity Check
No circularity in this survey paper's layered organization or propagation claim.
full rationale
This is a survey paper that organizes existing literature on security risks and defenses in autonomous agent frameworks into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) and uses OpenClaw as a case study to illustrate them. The claim that threats 'may propagate across layers' is presented as an identification arising from the layered review of prior work, without any mathematical derivations, equations, fitted parameters, predictions from data subsets, or self-referential constructions that reduce the result to its own inputs by definition. No self-citation load-bearing steps, uniqueness theorems from the authors, or ansatzes smuggled via citation appear in the provided abstract or structure. The analysis draws on external literature for each layer's risks and defenses, remaining self-contained against external benchmarks rather than internally forced.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Security risks in autonomous agent frameworks can be partitioned into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) that capture functional roles and allow analysis of cross-layer propagation.
Reference graph
Works this paper leans on
-
[1]
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, Y . Cao, React: Synergizing reasoning and acting in language models, in: The eleventh international conference on learning representations, 2022
2022
-
[2]
Schick, J
T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach them- selves to use tools, Advances in neural information pro- cessing systems 36 (2023) 68539–68551
2023
-
[3]
J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: Interactive simulacra of human behavior, in: Proceedings of the 36th annual acm symposium on user interface software and technology, 2023, pp. 1–22
2023
-
[4]
X. Chen, A. Zeng, et al., A survey on large language model based autonomous agents, in: CCL 2024–23rd Chinese Natl Conf Comput Linguist, V ol. 2, 2024, pp. 141–150
2024
-
[5]
openclaw.ai/concepts/system-prompt, official doc- umentation (2026)
OpenClaw, System prompt - openclaw, https://docs. openclaw.ai/concepts/system-prompt, official doc- umentation (2026)
2026
-
[6]
openclaw.ai/concepts/context, official documenta- tion (2026)
OpenClaw, Context - openclaw, https://docs. openclaw.ai/concepts/context, official documenta- tion (2026)
2026
-
[7]
OpenClaw, Tools, skills, and plugins - openclaw,https: //docs.openclaw.ai/tools, official documentation (2026)
2026
-
[8]
ai/tools/skills, official documentation (2026)
OpenClaw, Skills - openclaw, https://docs.openclaw. ai/tools/skills, official documentation (2026)
2026
-
[9]
OpenClaw, Agent workspace - openclaw, https:// docs.openclaw.ai/concepts/agent-workspace, of- ficial documentation (2026)
2026
-
[10]
OpenClaw, Heartbeat (gateway) - openclaw, https:// docs.openclaw.ai/gateway/heartbeat, official doc- umentation (2026)
2026
-
[11]
OpenClaw, Cron jobs (gateway scheduler) - open- claw, https://docs.openclaw.ai/automation/ cron-jobs, official documentation (2026)
2026
-
[12]
openclaw.ai/gateway/security, official documenta- tion (2026)
OpenClaw, Security - openclaw, https://docs. openclaw.ai/gateway/security, official documenta- tion (2026). 12
2026
-
[13]
Debenedetti, J
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, F. Tramèr, Agentdojo: A dynamic environ- ment to evaluate prompt injection attacks and defenses for llm agents, Advances in Neural Information Processing Systems 37 (2024) 82895–82920
2024
-
[14]
Zhang, J
H. Zhang, J. Huang, K. Mei, Y . Yao, Z. Wang, C. Zhan, H. Wang, Y . Zhang, Agent security bench (asb): Formaliz- ing and benchmarking attacks and defenses in llm-based agents, in: The Thirteenth International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id= V4y0CpX4hK
2025
- [15]
- [16]
- [17]
-
[18]
OpenClaw Contributors, Rfc: Skill security framework — permission manifests, capability verification, and in- struction boundary enforcement, https://github.com/ openclaw/openclaw/issues/10890, gitHub issue/ RFC discussion (2026)
2026
-
[19]
Y . Dong, R. Mu, Y . Zhang, S. Sun, T. Zhang, C. Wu, G. Jin, Y . Qi, J. Hu, J. Meng, et al., Safeguarding large language models: A survey, Artificial intelligence review 58 (12) (2025) 382
2025
-
[20]
T. Geng, Z. Xu, Y . Qu, W. E. Wong, Prompt injection attacks on large language models: A survey of attack meth- ods, root causes, and defense strategies, Computers, Mate- rials, & Continua 87 (1) (2026)
2026
-
[21]
M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, M. Debbah, From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows, ICT Express (2025)
2025
-
[22]
Y . Ruan, H. Dong, A. Wang, S. Pitis, Y . Zhou, J. Ba, Y . Dubois, C. J. Maddison, T. Hashimoto, Identifying the risks of lm agents with an lm-emulated sandbox, arXiv preprint arXiv:2309.15817 (2023)
work page internal anchor Pith review arXiv 2023
-
[23]
W. Hua, X. Yang, M. Jin, Z. Li, W. Cheng, R. Tang, Y . Zhang, Trustagent: Towards safe and trustworthy llm- based agents through agent constitution, in: Trustworthy Multi-modal Foundation Models and AI Agents (TiFA), 2024
2024
-
[24]
OpenClaw, Threat model (mitre atlas) - open- claw, https://docs.openclaw.ai/security/ THREAT-MODEL-ATLAS, official documentation (2026)
2026
-
[25]
Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zheng, et al., Prompt in- jection attack against llm-integrated applications, arXiv preprint arXiv:2306.05499 (2023)
work page internal anchor Pith review arXiv 2023
-
[26]
Greshake, S
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, M. Fritz, Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, in: Proceedings of the 16th ACM workshop on artificial intelligence and security, 2023, pp. 79–90
2023
-
[27]
Q. Zhan, Z. Liang, Z. Ying, D. Kang, Injecagent: Bench- marking indirect prompt injections in tool-integrated large language model agents, in: Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10471– 10506
2024
-
[28]
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
H. Chang, Y . Jun, H. Lee, Chatinject: Abusing chat tem- plates for prompt injection in llm agents, arXiv preprint arXiv:2509.22830 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [29]
- [30]
- [31]
-
[32]
Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste
R. Bhagwatkar, K. Kasa, A. Puri, G. Huang, I. Rish, G. W. Taylor, K. D. Dvijotham, A. Lacoste, Indirect prompt injec- tions: Are firewalls all you need, or stronger benchmarks?, arXiv preprint arXiv:2510.05244 (2025)
-
[33]
S. G. Patil, T. Zhang, X. Wang, J. E. Gonzalez, Go- rilla: Large language model connected with massive apis, Advances in Neural Information Processing Systems 37 (2024) 126544–126565
2024
- [34]
-
[35]
Faghih, W
K. Faghih, W. Wang, Y . Cheng, S. Bharti, G. Sriramanan, S. Balasubramanian, P. Hosseini, S. Feizi, Tool preferences in agentic llms are unreliable, in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 20965–20980
2025
- [36]
-
[37]
T. Yuan, Z. He, L. Dong, Y . Wang, R. Zhao, T. Xia, L. Xu, B. Zhou, F. Li, Z. Zhang, et al., R-judge: Benchmarking safety risk awareness for llm agents, in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 1467–1490
2024
-
[38]
Packer, V
C. Packer, V . Fang, S. Patil, K. Lin, S. Wooders, J. Gonza- lez, Memgpt: towards llms as operating systems. (2023)
2023
-
[39]
Carlini, M
N. Carlini, M. Jagielski, C. A. Choquette-Choo, D. Paleka, W. Pearce, H. Anderson, A. Terzis, K. Thomas, F. Tramèr, Poisoning web-scale training datasets is practical, in: 2024 IEEE Symposium on Security and Privacy (SP), IEEE, 2024, pp. 407–425
2024
-
[40]
Z. Lin, C. Li, K. Chen, A survey on the security of long- term memory in llm agents: Toward mnemonic sovereignty, arXiv preprint arXiv:2604.16548 (2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [41]
-
[42]
TrustRAG: Enhancing robustness and trustworthiness in retrieval-augmented generation,
H. Zhou, K.-H. Lee, Z. Zhan, Y . Chen, Z. Li, Z. Wang, H. Haddadi, E. Yilmaz, Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation, arXiv preprint arXiv:2501.00879 (2025)
- [43]
-
[44]
Y . Qu, Y . Liu, T. Geng, G. Deng, Y . Li, L. Y . Zhang, Y . Zhang, L. Ma, Supply-chain poisoning attacks against llm coding agent skill ecosystems, arXiv preprint arXiv:2604.03081 (2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[45]
P. He, Y . Lin, S. Dong, H. Xu, Y . Xing, H. Liu, Red- teaming llm multi-agent systems via communication at- tacks, in: Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 6726–6747
2025
-
[46]
W. Luo, S. Dai, X. Liu, S. Banerjee, H. Sun, M. Chen, C. Xiao, Agrail: A lifelong agent guardrail with effective and adaptive safety detection, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Lin- guistics (V olume 1: Long Papers), 2025, pp. 8104–8139
2025
-
[47]
Q. Zhan, R. Fang, H. S. Panchal, D. Kang, Adaptive attacks break defenses against indirect prompt injection attacks on llm agents, in: Findings of the Association for Computa- tional Linguistics: NAACL 2025, 2025, pp. 7101–7117
2025
-
[48]
J. Ye, S. Li, G. Li, C. Huang, S. Gao, Y . Wu, Q. Zhang, T. Gui, X.-J. Huang, Toolsword: Unveiling safety issues of large language models in tool learning across three stages, in: Proceedings of the 62nd Annual Meeting of the Asso- ciation for Computational Linguistics (V olume 1: Long Papers), 2024, pp. 2181–2211. Luyao Xuis currently pursuing his Master...
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.