arxiv: 2605.08460 · v1 · submitted 2026-05-08 · 💻 cs.CR · cs.AI

Recognition: no theorem link

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Ziwen Cai , Yihe Zhang , Xiali Hei

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:17 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords multi-agent systemsLLM agentssubagent spawninheritancesecuritytrust boundariesagent frameworks

0 comments

The pith

Subagent inheritance allows compromised LLM agents to spread malicious instructions across multi-agent networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how multi-agent systems using large language models create new agents through spawning, and how these child agents inherit memory, resources, and state from their parents. It establishes that this inheritance mechanism can transfer security compromises, such as malicious prompts or bad states, from one agent to others. The analysis of existing frameworks reveals specific violations including insecure memory passing, poor resource limits, outdated states after creation, and faulty termination controls. If accurate, this means that securing individual agents is insufficient; the network as a whole needs protection at the inheritance layer. Readers should care because agentic AI systems are becoming more interconnected, turning local vulnerabilities into systemic risks.

Core claim

In multi-agent LLM networks, subagent spawn operates as an inheritance channel that can breach trust boundaries. Current implementations allow malicious content in a parent's memory to be passed to children, weak controls on resources, persistence of stale data post-spawn, and improper authority over termination. The paper demonstrates these issues in practical frameworks and argues for introducing explicit security invariants to govern the spawn process.

What carries the argument

The subagent inheritance model, which treats spawn as the transfer of memory, resources, state, and termination authority from parent to child agents.

Load-bearing premise

The specific inheritance behaviors seen in the studied agent frameworks are typical of current multi-agent networks, and adding security invariants will fix the problems without creating fresh vulnerabilities.

What would settle it

A test where a parent agent is injected with a specific malicious instruction and then spawns a child, checking if the child exhibits the injected behavior without re-prompting.

Figures

Figures reproduced from arXiv: 2605.08460 by Xiali Hei, Yihe Zhang, Ziwen Cai.

**Figure 2.** Figure 2: Threat model of the multi-agent system. The adversary compromises one agent via prompt injection or jailbreaking and propagates malicious influence [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: PoC illustrating inconsistent shared state between agents (left: Main [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: PoC illustrating unrestricted memory inheritance between agents (left: [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

read the original abstract

Since the official release of ChatGPT in 2022, large language models (LLMs) have rapidly evolved from chatbot-style interfaces into agentic systems that can delegate work through tools and newly spawned subagents. While these capabilities improve automation and scalability, they also pose new security risks in multi-agent networks. Existing research has studied how individual LLM-based agents can be compromised through prompt injection, jailbreaking, poisoned retrieval data, or malicious extensions. Less is known about what happens after one agent is compromised inside a multi-agent network. In particular, inherited memory from parent agents can carry malicious instructions, outdated states, or unintended behavioral rules into newly created subagents, allowing a local compromise to spread across agent boundaries. In this paper, we model contemporary multi-agent networks through the lens of subagent inheritance. Our analysis shows that current frameworks can violate trust boundaries through insecure memory inheritance, weak resource control, stale post-spawn state, and improper termination authority. We demonstrate these risks in real agent frameworks and propose defenses based on explicit security invariants. Our findings show that inheritance is not merely an implementation detail, but a central component influencing the security of multi-agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags inheritance as a real propagation risk in multi-agent LLM networks and shows concrete examples from existing frameworks, but the security invariants are presented without enough validation that they fix the issues cleanly.

read the letter

The paper's main takeaway is that subagent spawning lets compromises spread through inherited memory, resources, state, and termination rules, and the authors model four specific violation patterns that current frameworks exhibit. They check this against real agent systems, which moves the discussion past single-agent attacks to network-level inheritance effects. That framing is a clear step forward from prior work on prompt injection or jailbreaks in isolated agents. The demonstrations in actual frameworks give the risks some practical weight rather than staying purely theoretical. The modeling approach itself is straightforward and ties the inheritance behaviors directly to trust boundary breaks. On the fixes, the paper proposes explicit security invariants to address the four issues. This is a logical next step, but the writeup does not include systematic re-testing of the original attack vectors under the invariants or checks for side effects such as broken delegation flows or new timing channels. The transition from observed problems to solved-by-invariants therefore rests on the assumption that the proposed rules are both sufficient and neutral, without broader evidence across frameworks. This paper is aimed at people working on security for agentic and multi-agent LLM systems. A reader in that space would get value from the risk breakdown and the concrete framework examples, even if they want more on the invariants. The core observation holds up enough that the work deserves a serious referee rather than a desk rejection, mainly to pressure the evaluation of the proposed defenses.

Referee Report

2 major / 1 minor

Summary. The paper models subagent spawn and inheritance mechanisms in multi-agent LLM networks. It identifies four trust-boundary violations—insecure memory inheritance, weak resource control, stale post-spawn state, and improper termination authority—demonstrates them in real frameworks, and proposes explicit security invariants as mitigations, arguing that inheritance is a central security factor rather than an implementation detail.

Significance. If the modeling of inheritance behaviors is accurate and the invariants can be shown to block the described attacks without side effects, the work would highlight an important propagation risk in agentic systems that has received less attention than single-agent prompt injection. Concrete demonstrations in existing frameworks add practical value, and the focus on invariants could inform more principled designs for multi-agent security.

major comments (2)

The central claim that the proposed security invariants address the four identified risks without introducing new vulnerabilities (e.g., overly restrictive controls breaking legitimate delegation or new timing channels) is load-bearing but unsupported. The manuscript transitions from observed violations to proposed defenses without formal verification, completeness arguments, or re-testing of the original attack vectors under the invariants.
Demonstrations of the four violation types in real frameworks are described at a high level in the abstract and analysis sections, but lack sufficient detail on the specific frameworks examined, the exact inheritance APIs or memory models exploited, and quantitative outcomes. This weakens the generality claim that current frameworks systematically violate trust boundaries.

minor comments (1)

The abstract and introduction could more explicitly name the frameworks used for demonstrations and the precise security invariants (e.g., by listing them or referencing a table/definition).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas where the manuscript can be strengthened, particularly around supporting the effectiveness of the proposed invariants and providing more concrete details on the demonstrations. We address each major comment below and describe the revisions we will make.

read point-by-point responses

Referee: The central claim that the proposed security invariants address the four identified risks without introducing new vulnerabilities (e.g., overly restrictive controls breaking legitimate delegation or new timing channels) is load-bearing but unsupported. The manuscript transitions from observed violations to proposed defenses without formal verification, completeness arguments, or re-testing of the original attack vectors under the invariants.

Authors: We agree that the current presentation of the invariants would benefit from stronger supporting arguments. In the revised manuscript we will add a dedicated subsection that provides informal completeness arguments for each invariant, mapping them explicitly to the four violation types and explaining the mechanisms by which they prevent propagation. We will also include a short discussion of potential side effects (e.g., restrictions on delegation patterns or introduction of new timing channels) and argue, based on the threat model, that these can be avoided with careful implementation. In addition, we will re-execute the attack vectors from at least one of the evaluated frameworks after applying the invariants and report the outcomes. While we do not add a full formal verification (which would require a different methodological scope), these additions will make the load-bearing claim substantially better supported. revision: partial
Referee: Demonstrations of the four violation types in real frameworks are described at a high level in the abstract and analysis sections, but lack sufficient detail on the specific frameworks examined, the exact inheritance APIs or memory models exploited, and quantitative outcomes. This weakens the generality claim that current frameworks systematically violate trust boundaries.

Authors: We accept that the current level of detail limits the strength of the generality claim. In the revision we will expand the evaluation section with a new table and accompanying text that names the concrete frameworks examined, describes the precise subagent-spawn and memory-inheritance APIs used, outlines the memory models involved, and reports quantitative results (attack success rates, state-propagation latency, and resource-consumption metrics before and after the proposed mitigations). These additions will make the demonstrations reproducible and will directly support the claim that the violations are systematic rather than anecdotal. revision: yes

Circularity Check

0 steps flagged

No circularity; modeling rests on external frameworks and observations

full rationale

The paper models subagent inheritance risks by examining real agent frameworks, identifies specific violations such as insecure memory inheritance, and proposes security invariants as defenses. No equations, fitted parameters, or derivations are present that reduce by construction to the paper's own inputs. Claims rely on external demonstrations rather than self-definitional steps or load-bearing self-citations. The argument is self-contained against benchmarks of existing multi-agent systems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on domain assumptions about how contemporary multi-agent frameworks implement spawning and memory inheritance, without introducing fitted parameters or new postulated entities.

axioms (1)

domain assumption Contemporary multi-agent LLM frameworks pass memory, state, and behavioral rules from parent agents to spawned subagents.
Invoked as the basis for modeling trust boundary violations in the abstract.

pith-pipeline@v0.9.0 · 5508 in / 1118 out tokens · 62800 ms · 2026-05-12T01:17:18.576665+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 9 internal anchors

[1]

A survey of agentic ai and cybersecurity: Challenges, opportunities and use-case prototypes,

S. J. Lazer, K. Aryal, M. Gupta, and E. Bertino, “A survey of agentic ai and cybersecurity: Challenges, opportunities and use-case prototypes,”

work page
[2]

Available: https://arxiv.org/abs/2601.05293

[Online]. Available: https://arxiv.org/abs/2601.05293

work page arXiv
[3]

The path ahead for agentic ai: Challenges and opportunities,

N. Sibai, Y . Ahmed, S. Sibaee, S. AlHalawani, A. Ammar, and W. Boulila, “The path ahead for agentic ai: Challenges and opportunities,” 2026. [Online]. Available: https://arxiv.org/abs/2601. 02749

work page 2026
[4]

Agents of Chaos

N. Shapira, C. Wendler, A. Yen, G. Sarti, K. Pal, O. Floody, A. Belfki, A. Loftus, A. R. Jannali, N. Prakash, J. Cui, G. Rogers, J. Brinkmann, C. Rager, A. Zur, M. Ripa, A. Sankaranarayanan, D. Atkinson, R. Gandikota, J. Fiotto-Kaufman, E. Hwang, H. Orgad, P. S. Sahil, N. Taglicht, T. Shabtay, A. Ambus, N. Alon, S. Oron, A. Gordon-Tapiero, Y . Kaplan, V ....

work page internal anchor Pith review arXiv 2026
[5]

Openclaw cve & security advisory tracker,

J. Gamblin, “Openclaw cve & security advisory tracker,” 2026. [Online]. Available: https://github.com/jgamblin/OpenClawCVEs/

work page 2026
[6]

Openclaw,

OpenClaw, “Openclaw,” 2026, accessed: 2026-03-23. [Online]. Available: https://openclaw.ai/

work page 2026
[7]

A safety and security framework for real-world agentic systems,

S. Ghosh, B. Simkin, K. Shiarlis, S. Nandi, D. Zhao, M. Fiedler, J. Bazinska, N. Pope, R. Prabhu, D. Rohreret al., “A safety and security framework for real-world agentic systems,” 2025. [Online]. Available: https://arxiv.org/abs/2511.21990

work page arXiv 2025
[8]

Agentic misalignment: How llms could be insider threats,

A. Lynch, B. Wright, C. Larson, S. J. Ritchie, S. Mindermann, E. Hubinger, E. Perez, and K. Troy, “Agentic misalignment: How llms could be insider threats,” 2025. [Online]. Available: https://arxiv.org/abs/2510.05179

work page arXiv 2025
[9]

Model context protocol (mcp): Landscape, security threats, and future research directions,

X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,”

work page
[10]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

[Online]. Available: https://arxiv.org/abs/2503.23278

work page internal anchor Pith review arXiv
[11]

Beyond the protocol: Unveiling attack vectors in the model context protocol (MCP) ecosystem.arXiv preprint arXiv:2506.02040, 2025

H. Song, Y . Shen, W. Luo, L. Guo, T. Chen, J. Wang, B. Li, X. Zhang, and J. Chen, “Beyond the protocol: Unveiling attack vectors in the model context protocol (mcp) ecosystem,” 2025. [Online]. Available: https://arxiv.org/abs/2506.02040

work page arXiv 2025
[12]

Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,

Y . Louck, A. Stulman, and A. Dvir, “Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2505.12490

work page arXiv 2025
[13]

Security and privacy challenges of large language models: A survey,

B. C. Das, M. H. Amini, and Y . Wu, “Security and privacy challenges of large language models: A survey,”ACM Computing Surveys, vol. 57, no. 6, pp. 1–39, 2025

work page 2025
[14]

PoisonedRAG: Knowledge poisoning attacks to retrieval-augmented generation of large language models.arXiv preprint arXiv:2402.07867, 2024

W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.07867

work page arXiv 2024
[15]

Skill-inject: Measuring agent vulnerability to skill file attacks.arXiv preprint arXiv:2602.20156, 2026

D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks,” 2026. [Online]. Available: https://arxiv.org/abs/2602.20156

work page arXiv 2026
[16]

How we built our multi-agent research system,

Anthropic, “How we built our multi-agent research system,” Jun 2025, accessed: 2026-03-16. [Online]. Available: https://www.anthropic.com/ engineering/multi-agent-research-system

work page 2025
[17]

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems, 2025

S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2506.04133

work page arXiv 2025
[18]

Openclaw documentation,

OpenClaw, “Openclaw documentation,” 2026, accessed: 2026-03-23. [Online]. Available: https://docs.openclaw.ai

work page 2026
[19]

Owasp top 10 for large language model applications,

OW ASP Foundation, “Owasp top 10 for large language model applications,” 2025. [Online]. Available: https://owasp.org/www- project-top-10-for-large-language-model-applications/

work page 2025
[20]

Security and privacy in llms: A comprehensive survey of threats and mitigation strategies,

A. D. E. Berini, N. Jamil, A.-E. Benrazek, A. Lakas, L. Ismail, M. A. Ferrag, and K.-Y . Lam, “Security and privacy in llms: A comprehensive survey of threats and mitigation strategies,” Information Fusion, vol. 132, p. 104241, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S156625352600120X

work page 2026
[21]

A survey on large language model based autonomous agents,

L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y . Linet al., “A survey on large language model based autonomous agents,”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

work page 2024
[22]

The rise and potential of large language model based agents: A survey,

Z. Xi, W. Chen, X. Guo, W. He, Y . Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou, R. Zheng, X. Fan, X. Wang, L. Xiong, Y . Zhou, W. Wang, C. Jiang, Y . Zou, X. Liu, Z. Yin, S. Dou, R. Weng, W. Cheng, Q. Zhang, W. Qin, Y . Zheng, X. Qiu, X. Huang, and T. Gui, “The rise and potential of large language model based agents: A survey,” 2023

work page 2023
[23]

ReAct: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations (ICLR), 2023

work page 2023
[24]

Re- flexion: Language agents with verbal reinforcement learning,

N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Re- flexion: Language agents with verbal reinforcement learning,”Advances in neural information processing systems, vol. 36, pp. 8634–8652, 2023

work page 2023
[25]

Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face,

Y . Shen, K. Song, X. Tan, D. Li, W. Lu, and Y . Zhuang, “Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face,”Advances in Neural Information Processing Systems, vol. 36, pp. 38 154–38 180, 2023

work page 2023
[26]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,”Advances in neural informa- tion processing systems, vol. 36, pp. 68 539–68 551, 2023

work page 2023
[27]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inProceedings of the 36th annual acm symposium on user interface software and technology. Association for Computing Machinery, 2023, pp. 1–22

work page 2023
[28]

How openclaw works: Understanding ai agents through a real architecture,

B. Poudel, “How openclaw works: Understanding ai agents through a real architecture,” Feb 2026. [Online]. Available: https://bibek-poudel. medium.com/how-openclaw-works-understanding-ai-agents-through-a- real-architecture-5d59cc7a4764

work page 2026
[29]

Openclaw architecture, explained: How it works,

P. Perazzo, “Openclaw architecture, explained: How it works,” Feb

work page
[30]

Available: https://ppaolo.substack.com/p/openclaw- system-architecture-overview#

[Online]. Available: https://ppaolo.substack.com/p/openclaw- system-architecture-overview#

work page
[31]

Agent zero: A personal, organic agentic framework that grows and learns with you,

agent0ai, “Agent zero: A personal, organic agentic framework that grows and learns with you,” https://github.com/agent0ai/agent- zero, 2026, gitHub repository

work page 2026
[32]

Hermes agent: The agent that grows with you,

Nous Research, “Hermes agent: The agent that grows with you,” https: //github.com/NousResearch/hermes-agent, 2026, gitHub repository

work page 2026
[33]

Available: https://arxiv.org/abs/2504.03111

Z. Li, J. Cui, X. Liao, and L. Xing, “Les dissonances: Cross-tool harvesting and polluting in pool-of-tools empowered llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2504.03111

work page arXiv 2025
[34]

V oltagent,

V oltAgent, “V oltagent,” 2026, accessed: 2026-03-18. [Online]. Available: https://voltagent.dev/

work page 2026
[35]

Vertex ai agent builder,

G. Cloud, “Vertex ai agent builder,” 2026, accessed: 2026-03-18. [Online]. Available: https://cloud.google.com/products/agent-builder

work page 2026
[36]

Memory os of ai agent

J. Kang, M. Ji, Z. Zhao, and T. Bai, “Memory os of ai agent,” 2025. [Online]. Available: https://arxiv.org/abs/2506.06326

work page arXiv 2025
[37]

Guide to attribute based access control (abac) definition and considerations (draft),

V . C. Hu, D. Ferraiolo, R. Kuhn, A. R. Friedman, A. J. Lang, M. M. Cogdell, A. Schnitzer, K. Sandlin, R. Miller, K. Scarfoneet al., “Guide to attribute based access control (abac) definition and considerations (draft),”NIST special publication, vol. 800, no. 162, pp. 1–54, 2013

work page 2013
[38]

Zhang, Z

K. Zhang, Z. Su, P.-Y . Chen, E. Bertino, X. Zhang, and N. Li, “Llm agents should employ security principles,” 2025. [Online]. Available: https://arxiv.org/abs/2505.24019

work page arXiv 2025
[39]

Sok: Evaluating jailbreak guardrails for large language models.arXiv preprint arXiv:2506.10597, 2025a

X. Wang, Z. Ji, W. Wang, Z. Li, D. Wu, and S. Wang, “Sok: Evaluating jailbreak guardrails for large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2506.10597 14

work page arXiv 2025
[40]

A new era in llm security: Exploring security concerns in real-world llm-based systems,

F. Wu, N. Zhang, S. Jha, P. McDaniel, and C. Xiao, “A new era in llm security: Exploring security concerns in real-world llm-based systems,” arXiv preprint arXiv:2402.18649, 2024

work page arXiv 2024
[41]

The protection of information in computer systems,

J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,”Proceedings of the IEEE, vol. 63, no. 9, pp. 1278– 1308, 1975

work page 1975
[42]

Prompt Injection attack against LLM-integrated Applications

Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zhenget al., “Prompt injection attack against llm-integrated applications,”arXiv preprint arXiv:2306.05499, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,” inProceedings of the 16th ACM workshop on artificial intelligence and security, 2023, pp. 79–90

work page 2023
[44]

Benchmarking and defending against indirect prompt injection attacks on large language models,

J. Yi, Y . Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and defending against indirect prompt injection attacks on large language models,” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, 2025, pp. 1809–1820

work page 2025
[45]

Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,

Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10 471–10 506

work page 2024
[46]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tram`er, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,”Advances in Neural Information Processing Systems, vol. 37, pp. 82 895–82 920, 2024

work page 2024
[47]

Evil Geniuses : Delving into the Safety of LLM -based Agents , February 2024

Y . Tian, X. Yang, J. Zhang, Y . Dong, and H. Su, “Evil geniuses: Delving into the safety of llm-based agents,”arXiv preprint arXiv:2311.11855, 2023

work page arXiv 2023
[48]

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems,”arXiv preprint arXiv:2410.07283, 2024

work page internal anchor Pith review arXiv 2024
[49]

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

M. Lupinacci, F. A. Pironti, F. Blefari, F. Romeo, L. Arena, and A. Furfaro, “The dark side of llms: Agent-based attacks for complete computer takeover,”arXiv preprint arXiv:2507.06850, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[50]

Multi-agent systems execute arbitrary malicious code

H. Triedman, R. Jha, and V . Shmatikov, “Multi-agent systems execute arbitrary malicious code,”arXiv preprint arXiv:2503.12188, 2025

work page arXiv 2025
[51]

The confused deputy: (or why capabilities might have been invented),

N. Hardy, “The confused deputy: (or why capabilities might have been invented),”ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988

work page 1988
[52]

Netsafe: Exploring the topological safety of multi-agent networks,

M. Yu, S. Wang, G. Zhang, J. Mao, C. Yin, Q. Liu, Q. Wen, K. Wang, and Y . Wang, “Netsafe: Exploring the topological safety of multi-agent networks,”arXiv preprint arXiv:2410.15686, 2024

work page arXiv 2024
[53]

Red-teaming llm multi-agent systems via communication attacks,

P. He, Y . Lin, S. Dong, H. Xu, Y . Xing, and H. Liu, “Red-teaming llm multi-agent systems via communication attacks,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 6726– 6747

work page 2025
[54]

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The instruction hierarchy: Training llms to prioritize privileged instruc- tions,”arXiv preprint arXiv:2404.13208, 2024

work page internal anchor Pith review arXiv 2024
[55]

Defeating Prompt Injections by Design

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,”arXiv preprint arXiv:2503.18813, 2025

work page internal anchor Pith review arXiv 2025
[56]

On optimistic methods for concurrency control,

H.-T. Kung and J. T. Robinson, “On optimistic methods for concurrency control,”ACM Transactions on Database Systems (TODS), vol. 6, no. 2, pp. 213–226, 1981

work page 1981
[57]

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

H. Zhang, J. Huang, K. Mei, Y . Yao, Z. Wang, C. Zhan, H. Wang, and Y . Zhang, “Agent security bench (asb): Formalizing and bench- marking attacks and defenses in llm-based agents,”arXiv preprint arXiv:2410.02644, 2024

work page internal anchor Pith review arXiv 2024
[58]

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

M. Andriushchenko, A. Souly, M. Dziemian, D. Duenas, M. Lin, J. Wang, D. Hendrycks, A. Zou, Z. Kolter, M. Fredriksonet al., “Agentharm: A benchmark for measuring harmfulness of llm agents,” arXiv preprint arXiv:2410.09024, 2024. 15

work page internal anchor Pith review arXiv 2024