pith. machine review for the scientific record. sign in

arxiv: 2605.08460 · v1 · submitted 2026-05-08 · 💻 cs.CR · cs.AI

Recognition: no theorem link

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:17 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords multi-agent systemsLLM agentssubagent spawninheritancesecuritytrust boundariesagent frameworks
0
0 comments X

The pith

Subagent inheritance allows compromised LLM agents to spread malicious instructions across multi-agent networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how multi-agent systems using large language models create new agents through spawning, and how these child agents inherit memory, resources, and state from their parents. It establishes that this inheritance mechanism can transfer security compromises, such as malicious prompts or bad states, from one agent to others. The analysis of existing frameworks reveals specific violations including insecure memory passing, poor resource limits, outdated states after creation, and faulty termination controls. If accurate, this means that securing individual agents is insufficient; the network as a whole needs protection at the inheritance layer. Readers should care because agentic AI systems are becoming more interconnected, turning local vulnerabilities into systemic risks.

Core claim

In multi-agent LLM networks, subagent spawn operates as an inheritance channel that can breach trust boundaries. Current implementations allow malicious content in a parent's memory to be passed to children, weak controls on resources, persistence of stale data post-spawn, and improper authority over termination. The paper demonstrates these issues in practical frameworks and argues for introducing explicit security invariants to govern the spawn process.

What carries the argument

The subagent inheritance model, which treats spawn as the transfer of memory, resources, state, and termination authority from parent to child agents.

Load-bearing premise

The specific inheritance behaviors seen in the studied agent frameworks are typical of current multi-agent networks, and adding security invariants will fix the problems without creating fresh vulnerabilities.

What would settle it

A test where a parent agent is injected with a specific malicious instruction and then spawns a child, checking if the child exhibits the injected behavior without re-prompting.

Figures

Figures reproduced from arXiv: 2605.08460 by Xiali Hei, Yihe Zhang, Ziwen Cai.

Figure 1
Figure 1. Figure 1: Architecture of an LLM-based agentic system operating in a [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Threat model of the multi-agent system. The adversary compromises one agent via prompt injection or jailbreaking and propagates malicious influence [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: PoC illustrating inconsistent shared state between agents (left: Main [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: PoC illustrating unrestricted memory inheritance between agents (left: [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
read the original abstract

Since the official release of ChatGPT in 2022, large language models (LLMs) have rapidly evolved from chatbot-style interfaces into agentic systems that can delegate work through tools and newly spawned subagents. While these capabilities improve automation and scalability, they also pose new security risks in multi-agent networks. Existing research has studied how individual LLM-based agents can be compromised through prompt injection, jailbreaking, poisoned retrieval data, or malicious extensions. Less is known about what happens after one agent is compromised inside a multi-agent network. In particular, inherited memory from parent agents can carry malicious instructions, outdated states, or unintended behavioral rules into newly created subagents, allowing a local compromise to spread across agent boundaries. In this paper, we model contemporary multi-agent networks through the lens of subagent inheritance. Our analysis shows that current frameworks can violate trust boundaries through insecure memory inheritance, weak resource control, stale post-spawn state, and improper termination authority. We demonstrate these risks in real agent frameworks and propose defenses based on explicit security invariants. Our findings show that inheritance is not merely an implementation detail, but a central component influencing the security of multi-agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper models subagent spawn and inheritance mechanisms in multi-agent LLM networks. It identifies four trust-boundary violations—insecure memory inheritance, weak resource control, stale post-spawn state, and improper termination authority—demonstrates them in real frameworks, and proposes explicit security invariants as mitigations, arguing that inheritance is a central security factor rather than an implementation detail.

Significance. If the modeling of inheritance behaviors is accurate and the invariants can be shown to block the described attacks without side effects, the work would highlight an important propagation risk in agentic systems that has received less attention than single-agent prompt injection. Concrete demonstrations in existing frameworks add practical value, and the focus on invariants could inform more principled designs for multi-agent security.

major comments (2)
  1. The central claim that the proposed security invariants address the four identified risks without introducing new vulnerabilities (e.g., overly restrictive controls breaking legitimate delegation or new timing channels) is load-bearing but unsupported. The manuscript transitions from observed violations to proposed defenses without formal verification, completeness arguments, or re-testing of the original attack vectors under the invariants.
  2. Demonstrations of the four violation types in real frameworks are described at a high level in the abstract and analysis sections, but lack sufficient detail on the specific frameworks examined, the exact inheritance APIs or memory models exploited, and quantitative outcomes. This weakens the generality claim that current frameworks systematically violate trust boundaries.
minor comments (1)
  1. The abstract and introduction could more explicitly name the frameworks used for demonstrations and the precise security invariants (e.g., by listing them or referencing a table/definition).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas where the manuscript can be strengthened, particularly around supporting the effectiveness of the proposed invariants and providing more concrete details on the demonstrations. We address each major comment below and describe the revisions we will make.

read point-by-point responses
  1. Referee: The central claim that the proposed security invariants address the four identified risks without introducing new vulnerabilities (e.g., overly restrictive controls breaking legitimate delegation or new timing channels) is load-bearing but unsupported. The manuscript transitions from observed violations to proposed defenses without formal verification, completeness arguments, or re-testing of the original attack vectors under the invariants.

    Authors: We agree that the current presentation of the invariants would benefit from stronger supporting arguments. In the revised manuscript we will add a dedicated subsection that provides informal completeness arguments for each invariant, mapping them explicitly to the four violation types and explaining the mechanisms by which they prevent propagation. We will also include a short discussion of potential side effects (e.g., restrictions on delegation patterns or introduction of new timing channels) and argue, based on the threat model, that these can be avoided with careful implementation. In addition, we will re-execute the attack vectors from at least one of the evaluated frameworks after applying the invariants and report the outcomes. While we do not add a full formal verification (which would require a different methodological scope), these additions will make the load-bearing claim substantially better supported. revision: partial

  2. Referee: Demonstrations of the four violation types in real frameworks are described at a high level in the abstract and analysis sections, but lack sufficient detail on the specific frameworks examined, the exact inheritance APIs or memory models exploited, and quantitative outcomes. This weakens the generality claim that current frameworks systematically violate trust boundaries.

    Authors: We accept that the current level of detail limits the strength of the generality claim. In the revision we will expand the evaluation section with a new table and accompanying text that names the concrete frameworks examined, describes the precise subagent-spawn and memory-inheritance APIs used, outlines the memory models involved, and reports quantitative results (attack success rates, state-propagation latency, and resource-consumption metrics before and after the proposed mitigations). These additions will make the demonstrations reproducible and will directly support the claim that the violations are systematic rather than anecdotal. revision: yes

Circularity Check

0 steps flagged

No circularity; modeling rests on external frameworks and observations

full rationale

The paper models subagent inheritance risks by examining real agent frameworks, identifies specific violations such as insecure memory inheritance, and proposes security invariants as defenses. No equations, fitted parameters, or derivations are present that reduce by construction to the paper's own inputs. Claims rely on external demonstrations rather than self-definitional steps or load-bearing self-citations. The argument is self-contained against benchmarks of existing multi-agent systems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on domain assumptions about how contemporary multi-agent frameworks implement spawning and memory inheritance, without introducing fitted parameters or new postulated entities.

axioms (1)
  • domain assumption Contemporary multi-agent LLM frameworks pass memory, state, and behavioral rules from parent agents to spawned subagents.
    Invoked as the basis for modeling trust boundary violations in the abstract.

pith-pipeline@v0.9.0 · 5508 in / 1118 out tokens · 62800 ms · 2026-05-12T01:17:18.576665+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 9 internal anchors

  1. [1]

    A survey of agentic ai and cybersecurity: Challenges, opportunities and use-case prototypes,

    S. J. Lazer, K. Aryal, M. Gupta, and E. Bertino, “A survey of agentic ai and cybersecurity: Challenges, opportunities and use-case prototypes,”

  2. [2]

    Available: https://arxiv.org/abs/2601.05293

    [Online]. Available: https://arxiv.org/abs/2601.05293

  3. [3]

    The path ahead for agentic ai: Challenges and opportunities,

    N. Sibai, Y . Ahmed, S. Sibaee, S. AlHalawani, A. Ammar, and W. Boulila, “The path ahead for agentic ai: Challenges and opportunities,” 2026. [Online]. Available: https://arxiv.org/abs/2601. 02749

  4. [4]

    Agents of Chaos

    N. Shapira, C. Wendler, A. Yen, G. Sarti, K. Pal, O. Floody, A. Belfki, A. Loftus, A. R. Jannali, N. Prakash, J. Cui, G. Rogers, J. Brinkmann, C. Rager, A. Zur, M. Ripa, A. Sankaranarayanan, D. Atkinson, R. Gandikota, J. Fiotto-Kaufman, E. Hwang, H. Orgad, P. S. Sahil, N. Taglicht, T. Shabtay, A. Ambus, N. Alon, S. Oron, A. Gordon-Tapiero, Y . Kaplan, V ....

  5. [5]

    Openclaw cve & security advisory tracker,

    J. Gamblin, “Openclaw cve & security advisory tracker,” 2026. [Online]. Available: https://github.com/jgamblin/OpenClawCVEs/

  6. [6]

    Openclaw,

    OpenClaw, “Openclaw,” 2026, accessed: 2026-03-23. [Online]. Available: https://openclaw.ai/

  7. [7]

    A safety and security framework for real-world agentic systems,

    S. Ghosh, B. Simkin, K. Shiarlis, S. Nandi, D. Zhao, M. Fiedler, J. Bazinska, N. Pope, R. Prabhu, D. Rohreret al., “A safety and security framework for real-world agentic systems,” 2025. [Online]. Available: https://arxiv.org/abs/2511.21990

  8. [8]

    Agentic misalignment: How llms could be insider threats,

    A. Lynch, B. Wright, C. Larson, S. J. Ritchie, S. Mindermann, E. Hubinger, E. Perez, and K. Troy, “Agentic misalignment: How llms could be insider threats,” 2025. [Online]. Available: https://arxiv.org/abs/2510.05179

  9. [9]

    Model context protocol (mcp): Landscape, security threats, and future research directions,

    X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions,”

  10. [10]
  11. [11]

    Beyond the protocol: Unveiling attack vectors in the model context protocol (MCP) ecosystem.arXiv preprint arXiv:2506.02040, 2025

    H. Song, Y . Shen, W. Luo, L. Guo, T. Chen, J. Wang, B. Li, X. Zhang, and J. Chen, “Beyond the protocol: Unveiling attack vectors in the model context protocol (mcp) ecosystem,” 2025. [Online]. Available: https://arxiv.org/abs/2506.02040

  12. [12]

    Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,

    Y . Louck, A. Stulman, and A. Dvir, “Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2505.12490

  13. [13]

    Security and privacy challenges of large language models: A survey,

    B. C. Das, M. H. Amini, and Y . Wu, “Security and privacy challenges of large language models: A survey,”ACM Computing Surveys, vol. 57, no. 6, pp. 1–39, 2025

  14. [14]

    PoisonedRAG: Knowledge poisoning attacks to retrieval-augmented generation of large language models.arXiv preprint arXiv:2402.07867, 2024

    W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.07867

  15. [15]

    Skill-inject: Measuring agent vulnerability to skill file attacks.arXiv preprint arXiv:2602.20156, 2026

    D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks,” 2026. [Online]. Available: https://arxiv.org/abs/2602.20156

  16. [16]

    How we built our multi-agent research system,

    Anthropic, “How we built our multi-agent research system,” Jun 2025, accessed: 2026-03-16. [Online]. Available: https://www.anthropic.com/ engineering/multi-agent-research-system

  17. [17]

    TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems, 2025

    S. Raza, R. Sapkota, M. Karkee, and C. Emmanouilidis, “Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems,” 2025. [Online]. Available: https://arxiv.org/abs/2506.04133

  18. [18]

    Openclaw documentation,

    OpenClaw, “Openclaw documentation,” 2026, accessed: 2026-03-23. [Online]. Available: https://docs.openclaw.ai

  19. [19]

    Owasp top 10 for large language model applications,

    OW ASP Foundation, “Owasp top 10 for large language model applications,” 2025. [Online]. Available: https://owasp.org/www- project-top-10-for-large-language-model-applications/

  20. [20]

    Security and privacy in llms: A comprehensive survey of threats and mitigation strategies,

    A. D. E. Berini, N. Jamil, A.-E. Benrazek, A. Lakas, L. Ismail, M. A. Ferrag, and K.-Y . Lam, “Security and privacy in llms: A comprehensive survey of threats and mitigation strategies,” Information Fusion, vol. 132, p. 104241, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S156625352600120X

  21. [21]

    A survey on large language model based autonomous agents,

    L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y . Linet al., “A survey on large language model based autonomous agents,”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

  22. [22]

    The rise and potential of large language model based agents: A survey,

    Z. Xi, W. Chen, X. Guo, W. He, Y . Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou, R. Zheng, X. Fan, X. Wang, L. Xiong, Y . Zhou, W. Wang, C. Jiang, Y . Zou, X. Liu, Z. Yin, S. Dou, R. Weng, W. Cheng, Q. Zhang, W. Qin, Y . Zheng, X. Qiu, X. Huang, and T. Gui, “The rise and potential of large language model based agents: A survey,” 2023

  23. [23]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations (ICLR), 2023

  24. [24]

    Re- flexion: Language agents with verbal reinforcement learning,

    N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Re- flexion: Language agents with verbal reinforcement learning,”Advances in neural information processing systems, vol. 36, pp. 8634–8652, 2023

  25. [25]

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face,

    Y . Shen, K. Song, X. Tan, D. Li, W. Lu, and Y . Zhuang, “Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face,”Advances in Neural Information Processing Systems, vol. 36, pp. 38 154–38 180, 2023

  26. [26]

    Toolformer: Language models can teach themselves to use tools,

    T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,”Advances in neural informa- tion processing systems, vol. 36, pp. 68 539–68 551, 2023

  27. [27]

    Generative agents: Interactive simulacra of human behavior,

    J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inProceedings of the 36th annual acm symposium on user interface software and technology. Association for Computing Machinery, 2023, pp. 1–22

  28. [28]

    How openclaw works: Understanding ai agents through a real architecture,

    B. Poudel, “How openclaw works: Understanding ai agents through a real architecture,” Feb 2026. [Online]. Available: https://bibek-poudel. medium.com/how-openclaw-works-understanding-ai-agents-through-a- real-architecture-5d59cc7a4764

  29. [29]

    Openclaw architecture, explained: How it works,

    P. Perazzo, “Openclaw architecture, explained: How it works,” Feb

  30. [30]

    Available: https://ppaolo.substack.com/p/openclaw- system-architecture-overview#

    [Online]. Available: https://ppaolo.substack.com/p/openclaw- system-architecture-overview#

  31. [31]

    Agent zero: A personal, organic agentic framework that grows and learns with you,

    agent0ai, “Agent zero: A personal, organic agentic framework that grows and learns with you,” https://github.com/agent0ai/agent- zero, 2026, gitHub repository

  32. [32]

    Hermes agent: The agent that grows with you,

    Nous Research, “Hermes agent: The agent that grows with you,” https: //github.com/NousResearch/hermes-agent, 2026, gitHub repository

  33. [33]

    Available: https://arxiv.org/abs/2504.03111

    Z. Li, J. Cui, X. Liao, and L. Xing, “Les dissonances: Cross-tool harvesting and polluting in pool-of-tools empowered llm agents,” 2025. [Online]. Available: https://arxiv.org/abs/2504.03111

  34. [34]

    V oltagent,

    V oltAgent, “V oltagent,” 2026, accessed: 2026-03-18. [Online]. Available: https://voltagent.dev/

  35. [35]

    Vertex ai agent builder,

    G. Cloud, “Vertex ai agent builder,” 2026, accessed: 2026-03-18. [Online]. Available: https://cloud.google.com/products/agent-builder

  36. [36]

    Memory os of ai agent

    J. Kang, M. Ji, Z. Zhao, and T. Bai, “Memory os of ai agent,” 2025. [Online]. Available: https://arxiv.org/abs/2506.06326

  37. [37]

    Guide to attribute based access control (abac) definition and considerations (draft),

    V . C. Hu, D. Ferraiolo, R. Kuhn, A. R. Friedman, A. J. Lang, M. M. Cogdell, A. Schnitzer, K. Sandlin, R. Miller, K. Scarfoneet al., “Guide to attribute based access control (abac) definition and considerations (draft),”NIST special publication, vol. 800, no. 162, pp. 1–54, 2013

  38. [38]

    Zhang, Z

    K. Zhang, Z. Su, P.-Y . Chen, E. Bertino, X. Zhang, and N. Li, “Llm agents should employ security principles,” 2025. [Online]. Available: https://arxiv.org/abs/2505.24019

  39. [39]

    Sok: Evaluating jailbreak guardrails for large language models.arXiv preprint arXiv:2506.10597, 2025a

    X. Wang, Z. Ji, W. Wang, Z. Li, D. Wu, and S. Wang, “Sok: Evaluating jailbreak guardrails for large language models,” 2025. [Online]. Available: https://arxiv.org/abs/2506.10597 14

  40. [40]

    A new era in llm security: Exploring security concerns in real-world llm-based systems,

    F. Wu, N. Zhang, S. Jha, P. McDaniel, and C. Xiao, “A new era in llm security: Exploring security concerns in real-world llm-based systems,” arXiv preprint arXiv:2402.18649, 2024

  41. [41]

    The protection of information in computer systems,

    J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,”Proceedings of the IEEE, vol. 63, no. 9, pp. 1278– 1308, 1975

  42. [42]

    Prompt Injection attack against LLM-integrated Applications

    Y . Liu, G. Deng, Y . Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y . Liu, H. Wang, Y . Zhenget al., “Prompt injection attack against llm-integrated applications,”arXiv preprint arXiv:2306.05499, 2025

  43. [43]

    Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,” inProceedings of the 16th ACM workshop on artificial intelligence and security, 2023, pp. 79–90

  44. [44]

    Benchmarking and defending against indirect prompt injection attacks on large language models,

    J. Yi, Y . Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and defending against indirect prompt injection attacks on large language models,” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, 2025, pp. 1809–1820

  45. [45]

    Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,

    Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10 471–10 506

  46. [46]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

    E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tram`er, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,”Advances in Neural Information Processing Systems, vol. 37, pp. 82 895–82 920, 2024

  47. [47]

    Evil Geniuses : Delving into the Safety of LLM -based Agents , February 2024

    Y . Tian, X. Yang, J. Zhang, Y . Dong, and H. Su, “Evil geniuses: Delving into the safety of llm-based agents,”arXiv preprint arXiv:2311.11855, 2023

  48. [48]

    Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

    D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems,”arXiv preprint arXiv:2410.07283, 2024

  49. [49]

    The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

    M. Lupinacci, F. A. Pironti, F. Blefari, F. Romeo, L. Arena, and A. Furfaro, “The dark side of llms: Agent-based attacks for complete computer takeover,”arXiv preprint arXiv:2507.06850, 2025

  50. [50]

    Multi-agent systems execute arbitrary malicious code

    H. Triedman, R. Jha, and V . Shmatikov, “Multi-agent systems execute arbitrary malicious code,”arXiv preprint arXiv:2503.12188, 2025

  51. [51]

    The confused deputy: (or why capabilities might have been invented),

    N. Hardy, “The confused deputy: (or why capabilities might have been invented),”ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988

  52. [52]

    Netsafe: Exploring the topological safety of multi-agent networks,

    M. Yu, S. Wang, G. Zhang, J. Mao, C. Yin, Q. Liu, Q. Wen, K. Wang, and Y . Wang, “Netsafe: Exploring the topological safety of multi-agent networks,”arXiv preprint arXiv:2410.15686, 2024

  53. [53]

    Red-teaming llm multi-agent systems via communication attacks,

    P. He, Y . Lin, S. Dong, H. Xu, Y . Xing, and H. Liu, “Red-teaming llm multi-agent systems via communication attacks,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 6726– 6747

  54. [54]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The instruction hierarchy: Training llms to prioritize privileged instruc- tions,”arXiv preprint arXiv:2404.13208, 2024

  55. [55]

    Defeating Prompt Injections by Design

    E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,”arXiv preprint arXiv:2503.18813, 2025

  56. [56]

    On optimistic methods for concurrency control,

    H.-T. Kung and J. T. Robinson, “On optimistic methods for concurrency control,”ACM Transactions on Database Systems (TODS), vol. 6, no. 2, pp. 213–226, 1981

  57. [57]

    Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

    H. Zhang, J. Huang, K. Mei, Y . Yao, Z. Wang, C. Zhan, H. Wang, and Y . Zhang, “Agent security bench (asb): Formalizing and bench- marking attacks and defenses in llm-based agents,”arXiv preprint arXiv:2410.02644, 2024

  58. [58]

    AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

    M. Andriushchenko, A. Souly, M. Dziemian, D. Duenas, M. Lin, J. Wang, D. Hendrycks, A. Zou, Z. Kolter, M. Fredriksonet al., “Agentharm: A benchmark for measuring harmfulness of llm agents,” arXiv preprint arXiv:2410.09024, 2024. 15