arxiv: 2604.27464 · v1 · submitted 2026-04-30 · 💻 cs.CR · cs.AI

Recognition: unknown

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Luyao Xu , Xiang Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-07 09:39 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords autonomous agentsLLM securityattack surfacesdefense strategieslayered analysisthreat propagationagent frameworks

0 comments

The pith

Autonomous agent frameworks carry security risks that propagate across four layers from input manipulation to ecosystem-wide effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper organizes security analysis of LLM-based autonomous agents into four layers to show how individual vulnerabilities can chain together. It reviews risks and defenses at each layer using OpenClaw as a concrete running example. A sympathetic reader would care because these systems are moving from simple chatbots toward continuously running tool users that control external resources. The central move is to treat threat propagation as the key phenomenon rather than isolated attacks at any single layer.

Core claim

By mapping security issues onto the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer, the analysis shows that threats can move from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact.

What carries the argument

The four-layer model that separates functional roles, representative risks, and defense strategies while tracing how attacks cross layer boundaries.

If this is right

Defenses must address cross-layer propagation rather than single-layer fixes.
Long-horizon evaluations become necessary to detect persistent-state and ecosystem effects.
Ecosystem trust models need strengthening because automation layers connect multiple agents and services.
Research attention should be rebalanced toward under-studied layers such as state persistence and ecosystem automation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same layering could be tested on other open agent frameworks to check whether threat propagation patterns hold beyond the OpenClaw case.
Developers might add explicit layer-boundary checks inside agent runtimes to interrupt propagation chains before they reach external tools.
Standardized benchmarks could measure how far an initial prompt injection travels through the four layers in controlled environments.

Load-bearing premise

The four layers capture all security-relevant aspects of these frameworks and OpenClaw is representative enough to illustrate the general risks.

What would settle it

A documented attack sequence in an autonomous agent system that remains confined to one layer without producing downstream effects on actions, state, or external ecosystems.

Figures

Figures reproduced from arXiv: 2604.27464 by Luyao Xu, Xiang Chen.

**Figure 1.** Figure 1: Layered Architecture of Autonomous Agent Frameworks: An Illustration Using OpenClaw as a Case Study. view at source ↗

**Figure 2.** Figure 2: Cross-layer attack propagation in autonomous agent frameworks. view at source ↗

read the original abstract

Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems, introducing security risks beyond traditional prompt-level vulnerabilities. As this paradigm is still at an early stage of development, a timely and systematic understanding of its security implications is increasingly important. Although a growing body of work has examined different attack surfaces and defense problems in agent systems, existing studies remain scattered across individual aspects of agent security, and there is still a lack of a layered review on this topic. To address this gap, this survey presents a layered review of security risks and defense strategies in autonomous agent frameworks, with OpenClaw as a case study. We organize the analysis into four security-relevant layers: the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer. For each layer, we summarize its functional role, representative security risks, and corresponding defense strategies. Based on this layered analysis, we further identify that threats in autonomous agent frameworks may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact. Finally, we highlight potential key challenges, including research imbalance across layers, the lack of long-horizon evaluation, and weak ecosystem trust models, and outline future directions toward more systematic and integrated defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward organizing survey on LLM agent security that maps four layers and notes cross-layer risks, but the OpenClaw case study does not appear to deliver concrete multi-step attack traces to back the propagation claim.

read the letter

The paper's main contribution is a four-layer breakdown of security issues in autonomous agent frameworks: context and instruction, tool and action, state and persistence, and ecosystem and automation. It collects scattered prior work on risks and defenses for each layer and applies the structure to OpenClaw as a running example. That organization is new enough to be useful for people entering the area, since most existing papers focus on one slice like prompt injection or tool misuse without tying them together. The abstract shows they cover functional roles, representative attacks, and defenses per layer, which gives a practical map rather than just another list of vulnerabilities. The cross-layer propagation point follows directly from the structure, and the challenges section on research imbalance and weak trust models is a fair observation. The soft spot is evidentiary. The strongest claim is that threats propagate from input manipulation through actions to state contamination and ecosystem effects. If the OpenClaw case study catalogs issues layer by layer without walking through at least one explicit chain where an attack in layer one produces measurable effects in layers three or four, then the propagation claim stays plausible but not demonstrated. As a survey there are no new derivations or experiments, so the value rests on coverage and clarity. The literature summary looks reasonable from the abstract, though a referee would need to check whether key recent papers on agent persistence or ecosystem attacks are included. This paper is for researchers and engineers who need a starting overview of agent security surfaces before diving into specific defenses. A reader who wants a framework to organize their own work or spot gaps will get something concrete from it. It deserves peer review because the topic is emerging and the layered view could shape how others approach integrated defenses, even if the case study needs tightening on the propagation examples.

Referee Report

2 major / 2 minor

Summary. The paper presents a layered survey of security risks and defense strategies for LLM-based autonomous agent frameworks. It organizes the domain into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation), summarizes representative attacks and defenses per layer, uses OpenClaw as a case study, identifies cross-layer threat propagation (from manipulated inputs through unsafe actions and state contamination to ecosystem impact), and outlines challenges such as research imbalance, lack of long-horizon evaluation, and weak trust models along with future directions.

Significance. If the four-layer organization is comprehensive and the cross-layer propagation claim is substantiated, the work would provide a useful organizing framework for an emerging area, helping to move agent security research from isolated per-component studies toward integrated analyses. The survey nature means its value lies in synthesis rather than new derivations or proofs.

major comments (2)

[Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.
[Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.

minor comments (2)

[OpenClaw case study] Clarify the selection criteria for OpenClaw as the case study and explicitly state how its architecture maps onto the four layers to strengthen the claim of representativeness.
[Throughout the layered analysis sections] Ensure consistent terminology across layers (e.g., 'context' vs. 'instruction' layer) and add a summary table comparing risks and defenses across all four layers for improved readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and outline the revisions we plan to make.

read point-by-point responses

Referee: [Abstract; OpenClaw case study section] Abstract and the section identifying cross-layer propagation: the claim that threats 'may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact' is presented as following from the layered analysis and OpenClaw case study. However, the case study appears to catalog risks and defenses within each layer in isolation rather than tracing concrete, multi-step attack paths (e.g., a specific input manipulation that produces state contamination and then ecosystem-level effects). Without such chained examples, the propagation observation remains a plausible hypothesis rather than a directly evidenced finding from the case study.

Authors: We acknowledge the referee's point that the OpenClaw case study primarily presents risks and defenses organized by layer rather than providing explicit multi-step attack traces. The propagation claim in the abstract is intended as a synthesis derived from the layered analysis, highlighting how vulnerabilities in one layer can logically lead to impacts in others based on the functional dependencies described. However, to strengthen the evidence, we will revise the manuscript to include concrete illustrative examples of cross-layer propagation paths, drawing from both the OpenClaw components and related literature where possible. This will clarify the basis for the observation and address the concern that it remains a hypothesis. revision: yes
Referee: [Layered review organization (introduction to the four layers)] The assumption that the four proposed layers comprehensively capture security-relevant aspects of autonomous agent frameworks is load-bearing for the survey structure but is not explicitly justified against alternative decompositions (e.g., including memory/retrieval or multi-agent coordination as distinct layers). This affects the completeness of the propagation analysis.

Authors: We agree that an explicit justification for the four-layer decomposition would enhance the manuscript's rigor. In the revised version, we will expand the introduction to include a discussion of the layer selection rationale. Specifically, we will explain that the four layers are chosen to span the primary security boundaries from external inputs to system-wide impacts, with memory and retrieval subsumed under the state and persistence layer due to their role in maintaining agent state, and multi-agent coordination considered as part of the ecosystem and automation layer. We will also briefly contrast this with alternative organizations to demonstrate why the chosen structure best supports the analysis of threat propagation. revision: yes

Circularity Check

0 steps flagged

No circularity in this survey paper's layered organization or propagation claim.

full rationale

This is a survey paper that organizes existing literature on security risks and defenses in autonomous agent frameworks into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) and uses OpenClaw as a case study to illustrate them. The claim that threats 'may propagate across layers' is presented as an identification arising from the layered review of prior work, without any mathematical derivations, equations, fitted parameters, predictions from data subsets, or self-referential constructions that reduce the result to its own inputs by definition. No self-citation load-bearing steps, uniqueness theorems from the authors, or ansatzes smuggled via citation appear in the provided abstract or structure. The analysis draws on external literature for each layer's risks and defenses, remaining self-contained against external benchmarks rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on the assumption that security issues divide cleanly into the four named layers and that prior work can be mapped onto them without significant overlap or omission.

axioms (1)

domain assumption Security risks in autonomous agent frameworks can be partitioned into four layers (context/instruction, tool/action, state/persistence, ecosystem/automation) that capture functional roles and allow analysis of cross-layer propagation.
This partitioning is the core organizing principle stated in the abstract and is not derived from prior equations or data.

pith-pipeline@v0.9.0 · 5541 in / 1259 out tokens · 44952 ms · 2026-05-07T09:39:33.187273+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 17 canonical work pages · 5 internal anchors

[1]

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, Y . Cao, React: Synergizing reasoning and acting in language models, in: The eleventh international conference on learning representations, 2022

2022
[2]

Schick, J

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach them- selves to use tools, Advances in neural information pro- cessing systems 36 (2023) 68539–68551

2023
[3]

J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: Interactive simulacra of human behavior, in: Proceedings of the 36th annual acm symposium on user interface software and technology, 2023, pp. 1–22

2023
[4]

X. Chen, A. Zeng, et al., A survey on large language model based autonomous agents, in: CCL 2024–23rd Chinese Natl Conf Comput Linguist, V ol. 2, 2024, pp. 141–150

2024
[5]

openclaw.ai/concepts/system-prompt, official doc- umentation (2026)

OpenClaw, System prompt - openclaw, https://docs. openclaw.ai/concepts/system-prompt, official doc- umentation (2026)

2026
[6]

openclaw.ai/concepts/context, official documenta- tion (2026)

OpenClaw, Context - openclaw, https://docs. openclaw.ai/concepts/context, official documenta- tion (2026)

2026
[7]

OpenClaw, Tools, skills, and plugins - openclaw,https: //docs.openclaw.ai/tools, official documentation (2026)

2026
[8]

ai/tools/skills, official documentation (2026)

OpenClaw, Skills - openclaw, https://docs.openclaw. ai/tools/skills, official documentation (2026)

2026
[9]

OpenClaw, Agent workspace - openclaw, https:// docs.openclaw.ai/concepts/agent-workspace, of- ficial documentation (2026)

2026
[10]

OpenClaw, Heartbeat (gateway) - openclaw, https:// docs.openclaw.ai/gateway/heartbeat, official doc- umentation (2026)

2026
[11]

OpenClaw, Cron jobs (gateway scheduler) - open- claw, https://docs.openclaw.ai/automation/ cron-jobs, official documentation (2026)

2026
[12]

openclaw.ai/gateway/security, official documenta- tion (2026)

OpenClaw, Security - openclaw, https://docs. openclaw.ai/gateway/security, official documenta- tion (2026). 12

2026
[13]

Debenedetti, J

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, F. Tramèr, Agentdojo: A dynamic environ- ment to evaluate prompt injection attacks and defenses for llm agents, Advances in Neural Information Processing Systems 37 (2024) 82895–82920

2024
[14]

Zhang, J

H. Zhang, J. Huang, K. Mei, Y . Yao, Z. Wang, C. Zhan, H. Wang, Y . Zhang, Agent security bench (asb): Formaliz- ing and benchmarking attacks and defenses in llm-based agents, in: The Thirteenth International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id= V4y0CpX4hK

2025
[15]

Y . Wang, F. Xu, Z. Lin, G. He, Y . Huang, H. Gao, Z. Niu, S. Lian, Z. Liu, From assistant to double agent: Formaliz- ing and benchmarking attacks on openclaw for person- alized local ai agent, arXiv preprint arXiv:2602.08412 (2026)

work page arXiv 2026
[16]

B. Dong, H. Feng, Q. Wang, Clawdrain: Exploiting tool- calling chains for stealthy token exhaustion in openclaw agents, arXiv preprint arXiv:2603.00902 (2026)

work page arXiv 2026
[17]

F. Liu, Z. Chen, T. Lan, H. Tan, Z. Xu, X. Li, G. Chen, Y . Meng, H. Zhu, Trojan’s whisper: Stealthy manipulation of openclaw through injected bootstrapped guidance, arXiv preprint arXiv:2603.19974 (2026)

work page arXiv 2026
[18]

OpenClaw Contributors, Rfc: Skill security framework — permission manifests, capability verification, and in- struction boundary enforcement, https://github.com/ openclaw/openclaw/issues/10890, gitHub issue/ RFC discussion (2026)

2026
[19]

Y . Dong, R. Mu, Y . Zhang, S. Sun, T. Zhang, C. Wu, G. Jin, Y . Qi, J. Hu, J. Meng, et al., Safeguarding large language models: A survey, Artificial intelligence review 58 (12) (2025) 382

2025
[20]

T. Geng, Z. Xu, Y . Qu, W. E. Wong, Prompt injection attacks on large language models: A survey of attack meth- ods, root causes, and defense strategies, Computers, Mate- rials, & Continua 87 (1) (2026)

2026
[21]

M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, M. Debbah, From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows, ICT Express (2025)

2025
[22]

Y . Ruan, H. Dong, A. Wang, S. Pitis, Y . Zhou, J. Ba, Y . Dubois, C. J. Maddison, T. Hashimoto, Identifying the risks of lm agents with an lm-emulated sandbox, arXiv preprint arXiv:2309.15817 (2023)

work page internal anchor Pith review arXiv 2023
[23]

W. Hua, X. Yang, M. Jin, Z. Li, W. Cheng, R. Tang, Y . Zhang, Trustagent: Towards safe and trustworthy llm- based agents through agent constitution, in: Trustworthy Multi-modal Foundation Models and AI Agents (TiFA), 2024

2024
[24]

OpenClaw, Threat model (mitre atlas) - open- claw, https://docs.openclaw.ai/security/ THREAT-MODEL-ATLAS, official documentation (2026)

2026
[25]

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zheng, et al., Prompt in- jection attack against llm-integrated applications, arXiv preprint arXiv:2306.05499 (2023)

work page internal anchor Pith review arXiv 2023
[26]

Greshake, S

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, M. Fritz, Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, in: Proceedings of the 16th ACM workshop on artificial intelligence and security, 2023, pp. 79–90

2023
[27]

Q. Zhan, Z. Liang, Z. Ying, D. Kang, Injecagent: Bench- marking indirect prompt injections in tool-integrated large language model agents, in: Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10471– 10506

2024
[28]

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

H. Chang, Y . Jun, H. Lee, Chatinject: Abusing chat tem- plates for prompt injection in llm agents, arXiv preprint arXiv:2509.22830 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

C. Yin, R. Geng, Y . Wang, J. Jia, Pismith: Reinforcement learning-based red teaming for prompt injection defenses, arXiv preprint arXiv:2603.13026 (2026)

work page arXiv 2026
[30]

T. Shi, K. Zhu, Z. Wang, Y . Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchi, et al., Promptar- mor: Simple yet effective prompt injection defenses, arXiv preprint arXiv:2507.15219 (2025)

work page arXiv 2025
[31]

Y . Wang, S. Chen, R. Alkhudair, B. Alomair, D. Wagner, Defending against prompt injection with datafilter, arXiv preprint arXiv:2510.19207 (2025)

work page arXiv 2025
[32]

Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste

R. Bhagwatkar, K. Kasa, A. Puri, G. Huang, I. Rish, G. W. Taylor, K. D. Dvijotham, A. Lacoste, Indirect prompt injec- tions: Are firewalls all you need, or stronger benchmarks?, arXiv preprint arXiv:2510.05244 (2025)

work page arXiv 2025
[33]

S. G. Patil, T. Zhang, X. Wang, J. E. Gonzalez, Go- rilla: Large language model connected with massive apis, Advances in Neural Information Processing Systems 37 (2024) 126544–126565

2024
[34]

J. Shi, Z. Yuan, G. Tie, P. Zhou, N. Z. Gong, L. Sun, Prompt injection attack to tool selection in llm agents, arXiv preprint arXiv:2504.19793 (2025)

work page arXiv 2025
[35]

Faghih, W

K. Faghih, W. Wang, Y . Cheng, S. Bharti, G. Sriramanan, S. Balasubramanian, P. Hosseini, S. Feizi, Tool preferences in agentic llms are unreliable, in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 20965–20980

2025
[36]

Q. Lin, M. Wen, Q. Peng, G. Nie, J. Liao, J. Wang, X. Mo, J. Zhou, C. Cheng, Y . Zhao, et al., Hammer: Robust function-calling for on-device language models via func- tion masking, arXiv preprint arXiv:2410.04587 (2024). 13

work page arXiv 2024
[37]

T. Yuan, Z. He, L. Dong, Y . Wang, R. Zhao, T. Xia, L. Xu, B. Zhou, F. Li, Z. Zhang, et al., R-judge: Benchmarking safety risk awareness for llm agents, in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 1467–1490

2024
[38]

Packer, V

C. Packer, V . Fang, S. Patil, K. Lin, S. Wooders, J. Gonza- lez, Memgpt: towards llms as operating systems. (2023)

2023
[39]

Carlini, M

N. Carlini, M. Jagielski, C. A. Choquette-Choo, D. Paleka, W. Pearce, H. Anderson, A. Terzis, K. Thomas, F. Tramèr, Poisoning web-scale training datasets is practical, in: 2024 IEEE Symposium on Security and Privacy (SP), IEEE, 2024, pp. 407–425

2024
[40]

Z. Lin, C. Li, K. Chen, A survey on the security of long- term memory in llm agents: Toward mnemonic sovereignty, arXiv preprint arXiv:2604.16548 (2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[41]

B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, S. Mishra, Memory poisoning attack and defense on mem- ory based llm-agents, arXiv preprint arXiv:2601.05504 (2026)

work page arXiv 2026
[42]

TrustRAG: Enhancing robustness and trustworthiness in retrieval-augmented generation,

H. Zhou, K.-H. Lee, Z. Zhan, Y . Chen, Z. Li, Z. Wang, H. Haddadi, E. Yilmaz, Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation, arXiv preprint arXiv:2501.00879 (2025)

work page arXiv 2025
[43]

W. Zhao, V . Khazanchi, H. Xing, X. He, Q. Xu, N. D. Lane, Attacks on third-party apis of large language models, arXiv preprint arXiv:2404.16891 (2024)

work page arXiv 2024
[44]

Y . Qu, Y . Liu, T. Geng, G. Deng, Y . Li, L. Y . Zhang, Y . Zhang, L. Ma, Supply-chain poisoning attacks against llm coding agent skill ecosystems, arXiv preprint arXiv:2604.03081 (2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[45]

P. He, Y . Lin, S. Dong, H. Xu, Y . Xing, H. Liu, Red- teaming llm multi-agent systems via communication at- tacks, in: Findings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 6726–6747

2025
[46]

W. Luo, S. Dai, X. Liu, S. Banerjee, H. Sun, M. Chen, C. Xiao, Agrail: A lifelong agent guardrail with effective and adaptive safety detection, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Lin- guistics (V olume 1: Long Papers), 2025, pp. 8104–8139

2025
[47]

Q. Zhan, R. Fang, H. S. Panchal, D. Kang, Adaptive attacks break defenses against indirect prompt injection attacks on llm agents, in: Findings of the Association for Computa- tional Linguistics: NAACL 2025, 2025, pp. 7101–7117

2025
[48]

J. Ye, S. Li, G. Li, C. Huang, S. Gao, Y . Wu, Q. Zhang, T. Gui, X.-J. Huang, Toolsword: Unveiling safety issues of large language models in tool learning across three stages, in: Proceedings of the 62nd Annual Meeting of the Asso- ciation for Computational Linguistics (V olume 1: Long Papers), 2024, pp. 2181–2211. Luyao Xuis currently pursuing his Master...

2024