Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents

Andis Draguns; Annie Gray; Ayush Chopra; Ben Bucknall; Ben Hagag; Chandler Smith; Christian Schroeder de Witt; Doron Cohen; Igor Krawczuk; Jaron Mink

arxiv: 2505.02077 · v2 · submitted 2025-05-04 · cs.CR · cs.AI · cs.MA

Christian Schroeder de Witt, Klaudia Krawiecka, Igor Krawczuk, Ben Hagag, William L. Anderson, Peter Belcak, Ben Bucknall, Xiaohong Cai, Ayush Chopra, Doron Cohen, Ron F. Del Rosario, Andis Draguns, Annie Gray, Keren Katz, Vasilios Mavroudis, Jaron Mink, Sumeet Ramesh Motwani, Jonathan Petit, Leif-Sebastian Rembeck, Chandler Smith, John Sotiropoulos, Steven Young, Sarah Scheffler, Mary Llewellyn

Pith reviewed 2026-05-22 17:27 UTC · model grok-4.3

classification cs.CR · cs.AI · cs.MA

keywords multi-agent security · AI agent interactions · secret collusion · swarm attacks · network effects · AI safety · cybersecurity · threat taxonomy

The pith

Networks of AI agents create security threats through their interactions that existing frameworks do not cover.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces multi-agent security as a new field to protect networks of AI agents from risks that arise or grow through direct and indirect interactions with each other, humans, and institutions. Free-form protocols, required for agents to generalize across tasks, also open pathways for threats such as secret collusion and coordinated swarm attacks. These threats can propagate quickly through network effects, while agents can use dispersion and stealth to evade oversight, producing persistent systemic risks. The work maps the emerging threat landscape, connects work from scattered subfields, and outlines a research agenda for secure agent systems and interaction environments. Clarifying these issues supports safer large-scale agent use by reducing risks to public trust and critical infrastructure.
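To make the network-effects point concrete, here is a minimal sketch of how quickly a single compromised agent could reach the rest of a network under two topologies. It is not taken from the paper: the function names (`rounds_to_reach`, `chain_network`, `hub_network`), the two topologies, and the assumption that every compromised agent forwards the payload to all of its neighbours each round are illustrative choices.

```python
from collections import deque


def rounds_to_reach(adjacency, seed):
    """Breadth-first spread from `seed`.

    Assumption (illustrative): in each round, every compromised agent
    passes the payload to all of its direct neighbours.
    Returns a dict mapping each reachable agent to the round it is reached.
    """
    rounds = {seed: 0}
    queue = deque([seed])
    while queue:
        agent = queue.popleft()
        for neighbour in adjacency[agent]:
            if neighbour not in rounds:
                rounds[neighbour] = rounds[agent] + 1
                queue.append(neighbour)
    return rounds


def chain_network(n):
    """Agents 0..n-1 in a line; each talks only to adjacent agents."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def hub_network(n):
    """Agent 0 is a shared hub (e.g. an orchestrator); all others talk only to it."""
    adjacency = {0: list(range(1, n))}
    for i in range(1, n):
        adjacency[i] = [0]
    return adjacency


if __name__ == "__main__":
    for name, net in (("chain", chain_network(8)), ("hub", hub_network(8))):
        worst = max(rounds_to_reach(net, seed=0).values())
        print(f"{name}: all 8 agents reached after {worst} round(s)")
```

In this toy model the same eight agents are fully reached in seven rounds when arranged as a chain and in one round when the seed sits at a shared hub, which is the kind of topology dependence the paragraph above attributes to network effects. Real agent networks would need probabilistic transmission and defenses modeled as well.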

Core claim

Multi-agent security is required as a distinct field because interactions among AI agents generate threats like secret collusion, swarm attacks, and rapid spread of privacy breaches or disinformation that amplify beyond the reach of single-agent AI security, standard cybersecurity, or multi-agent learning alone, with fundamental trade-offs between security and utility as well as among security properties in both distributed and decentralized settings.

What carries the argument

A taxonomy of threats arising from agent interactions that identifies how free-form protocols enable novel risks and how network effects and stealth behaviors create systemic vulnerabilities.

If this is right

Design of agent systems must explicitly address both direct interactions and effects through shared environments.
Insights from AI security, game theory, distributed systems, and complex systems need integration to cover interaction-based risks.
A unified research agenda can reduce systemic vulnerabilities before large-scale agent deployments occur in infrastructure and defense.
Balancing security-utility trade-offs becomes necessary for maintaining both performance and safety in decentralized agent networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standards for AI deployment may need to include testing for interaction effects rather than evaluating agents in isolation.
Simulation environments that model shared spaces and indirect influences could reveal hidden collusion patterns not visible in single-agent tests.
Connections to institutional governance suggest that security policies should track how agent networks interact with human and organizational actors.
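The first two extensions can be illustrated with a toy check, sketched here under stated assumptions rather than drawn from the paper. A hypothetical sensitive string (`SECRET`) is split between two agents: a per-message filter, standing in for isolation-style evaluation, passes both messages, while a check over the combined transcript catches the leak. All names and the substring-matching filter are invented for illustration.

```python
# Hypothetical sensitive value; not from the paper.
SECRET = "4417-9920"


def passes_per_message_check(message):
    """Isolation-style check: a message is allowed unless it contains the full secret."""
    return SECRET not in message


def passes_transcript_check(messages):
    """Interaction-level check: inspect the concatenated shared transcript."""
    return SECRET not in "".join(messages)


# Each agent sends only a fragment, so neither message trips the per-message filter.
agent_a_message = SECRET[:4]
agent_b_message = SECRET[4:]

if __name__ == "__main__":
    print("agent A passes alone:", passes_per_message_check(agent_a_message))
    print("agent B passes alone:", passes_per_message_check(agent_b_message))
    print("joint transcript passes:",
          passes_transcript_check([agent_a_message, agent_b_message]))
```

Substring matching is far weaker than anything a real adversary would face, and real collusion could use encodings that no simple transcript scan detects. The sketch only shows why a property that holds for each agent in isolation need not hold for their interaction.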

Load-bearing premise

That threats emerging specifically from multi-agent interactions cannot be fully addressed by extending or combining current AI security and cybersecurity methods.

What would settle it

A demonstration that all documented multi-agent threats, including secret collusion and coordinated swarm attacks, can be prevented using only existing single-agent security and cybersecurity techniques without additional multi-agent-specific measures.

Original abstract

AI agents are beginning to interact with each other directly and across internet platforms and physical environments, creating security challenges beyond traditional cybersecurity and AI safety frameworks. Free-form protocols are essential for AI's task generalization but enable new threats like secret collusion and coordinated swarm attacks. Network effects can rapidly spread privacy breaches, disinformation, jailbreaks, and data poisoning, while multi-agent dispersion and stealth optimization help adversaries evade oversight - creating novel persistent threats at a systemic level. Despite their critical importance, these security challenges remain understudied, with research fragmented across disparate fields including AI security, multi-agent learning, complex systems, cybersecurity, game theory, distributed systems, and technical AI governance. We introduce multi-agent security, a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions - whether direct or indirect via shared environments - with each other, humans, and institutions, and characterise fundamental security-utility and security-security trade-offs across both distributed and decentralised settings. Our preliminary work (1) taxonomizes the threat landscape arising from interacting AI agents, (2) offers applications to multi-agent security for work across diffuse subfields, and (3) proposes a unified research agenda addressing open challenges in designing secure agent systems and interaction environments. By identifying these gaps, we aim to guide research in this critical area to unlock the socioeconomic potential of large-scale agent deployment, foster public trust, and mitigate national security risks in critical infrastructure and defense contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a position paper that names multi-agent security as a new field but does not yet map its claimed threats against existing work to show clear gaps.

The main thing to know is that this paper proposes multi-agent security as a distinct area focused on risks that arise when AI agents interact directly or through shared environments. It argues that free-form protocols, while useful for generalization, open the door to issues like secret collusion, coordinated swarm attacks, and rapid spread of jailbreaks or data poisoning that single-agent or traditional security work might miss. The authors sketch a taxonomy and a research agenda to pull together ideas from AI security, multi-agent learning, game theory, and distributed systems. That synthesis is the clearest contribution here, and it could help organize thinking about security in large agent deployments. The discussion of security-utility and security-security trade-offs in distributed versus decentralized settings also gives a reasonable high-level frame for future work. The paper does a fair job flagging why these problems matter for infrastructure and defense contexts.

The soft spots are in the novelty argument. The central premise is that existing frameworks do not adequately handle interaction-amplified threats, yet there is no detailed comparison showing specific mechanisms or failure modes that prior results on repeated games, botnet coordination, or MARL robustness have left open. Without that mapping or even a few concrete examples, the case for a brand-new field stays conceptual rather than demonstrated. The support is argumentative throughout, with no experiments, derivations, or data to ground the scale of the risks.

This paper is mainly for researchers already working on AI agent systems who want an overview of potential security issues and some pointers for what to study next. Readers expecting methods, proofs, or empirical results will find little to use directly. It shows honest engagement with the literature and clear thinking about an emerging topic, so it deserves peer review to test whether the proposed separation from related fields adds enough substance.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces multi-agent security as a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions—direct or indirect via shared environments—with each other, humans, and institutions. It taxonomizes the threat landscape (including secret collusion, coordinated swarm attacks, network-amplified jailbreaks and data poisoning, and stealth optimization), characterizes fundamental security-utility and security-security trade-offs across distributed and decentralized settings, and proposes a unified research agenda to address open challenges in designing secure agent systems and interaction environments.

Significance. If the claimed novelty of interaction-emergent threats and the associated trade-offs hold after substantiation, the paper could help consolidate fragmented research across AI security, multi-agent learning, game theory, cybersecurity, and distributed systems. It has potential value in guiding secure large-scale agent deployment for socioeconomic benefits while mitigating risks in critical infrastructure and defense, though as a primarily conceptual contribution its impact would depend on inspiring targeted follow-up technical work rather than providing immediate solutions or empirical results.

major comments (2)

[Abstract] The central claim that free-form protocols enable novel threats (secret collusion, coordinated swarm attacks, network-amplified jailbreaks/data poisoning) not adequately addressed by existing AI security, cybersecurity, multi-agent learning, game theory, or distributed systems frameworks requires an explicit comparative mapping or citation analysis to prior work on collusion in repeated games, botnet coordination, or MARL robustness; without this, the justification for introducing multi-agent security as a distinct field remains unsubstantiated and is load-bearing for the paper's core contribution.
[Abstract] The characterization of fundamental security-utility and security-security trade-offs is asserted without reference to specific mechanisms, examples, or derivations from the preliminary taxonomy, making it difficult to evaluate how these trade-offs arise uniquely from multi-agent interactions versus established models in related fields.

minor comments (2)

[Abstract] The abstract's reference to 'preliminary work (1) taxonomizes...' would benefit from clearer cross-referencing to specific sections or subsections once the full manuscript structure is presented.
Consider providing brief definitions or examples for terms such as 'multi-agent dispersion' and 'stealth optimization' early in the manuscript to improve accessibility for readers from adjacent fields.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which identify key areas where the manuscript's claims can be more rigorously substantiated. We address each major comment point by point below, agreeing where revisions are warranted to strengthen the justification for multi-agent security as a distinct field.

Referee: [Abstract] The central claim that free-form protocols enable novel threats (secret collusion, coordinated swarm attacks, network-amplified jailbreaks/data poisoning) not adequately addressed by existing AI security, cybersecurity, multi-agent learning, game theory, or distributed systems frameworks requires an explicit comparative mapping or citation analysis to prior work on collusion in repeated games, botnet coordination, or MARL robustness; without this, the justification for introducing multi-agent security as a distinct field remains unsubstantiated and is load-bearing for the paper's core contribution.

Authors: We agree that an explicit comparative mapping would better substantiate the novelty claim and strengthen the case for a dedicated field. The manuscript positions multi-agent security as addressing the synthesis of threats arising specifically from autonomous AI agents using free-form protocols and network effects, which differ from scripted coordination in botnets or human-mediated collusion in repeated games. However, the initial version did not include a dedicated mapping. In the revised manuscript, we will add a new subsection (likely in the introduction or a dedicated related work discussion) containing a comparative table. This table will map each identified threat category to relevant prior work in game theory (e.g., collusion in repeated games), cybersecurity (e.g., botnet coordination and detection), MARL robustness, and AI security, while highlighting the unique dimensions introduced by AI agents such as stealth optimization, natural language-based secret collusion, and rapid network amplification. We will cite additional relevant papers to support this analysis. This revision directly addresses the load-bearing concern without altering the paper's conceptual scope or conclusions.
revision: yes

Referee: [Abstract] The characterization of fundamental security-utility and security-security trade-offs is asserted without reference to specific mechanisms, examples, or derivations from the preliminary taxonomy, making it difficult to evaluate how these trade-offs arise uniquely from multi-agent interactions versus established models in related fields.

Authors: We acknowledge that the abstract presents the trade-offs at a high level without explicit linkages or examples. In the full manuscript, these trade-offs are characterized based on the preliminary taxonomy (for instance, how network-amplified data poisoning creates security-utility trade-offs by necessitating constrained interaction protocols that reduce agent task performance, or how dispersion in decentralized settings yields security-security trade-offs between individual agent privacy and collective oversight). To improve evaluability, we will revise the abstract to include brief references to specific mechanisms and examples drawn from the taxonomy. We will also add clarifying text or short illustrative scenarios in the main body (in the section discussing trade-offs across distributed and decentralized settings) to derive these trade-offs explicitly and contrast them with established models in related fields. This will make the unique contributions of multi-agent interactions clearer.
revision: yes

Circularity Check

0 steps flagged

No circularity: field introduction rests on external literature survey

The paper defines multi-agent security and its threat taxonomy by reference to fragmentation and gaps across independent prior fields (AI security, multi-agent learning, cybersecurity, game theory, distributed systems). No equations, fitted parameters, or self-referential derivations appear; the novelty claim is asserted via the state of external literature rather than reducing to the paper's own definitions or citations. This is the standard non-circular pattern for position papers that survey open challenges.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central proposal rests on domain assumptions about the novelty of threats from agent interactions and the insufficiency of existing frameworks.

axioms (2)

domain assumption: Free-form protocols are essential for AI's task generalization but enable new threats like secret collusion and coordinated swarm attacks.
This is stated in the abstract as a basis for the new threats.
domain assumption: Research on these security challenges is fragmented across disparate fields.
Used to justify the need for a unified field.

invented entities (1)

multi-agent security (no independent evidence)
purpose: A dedicated field for securing interacting AI agent networks
This is newly introduced in the paper without prior references.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation between the paper passage and the cited Recognition theorem: unclear

We introduce multi-agent security, a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring
cs.MA · 2026-05 · unverdicted · novelty 7.0
CASPIAN introduces unified cross-channel causal monitoring via late-interaction conditional transfer entropy to detect cascade onset and attribute origin, bridge, and amplifier agents in LLM multi-agent systems.

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play
cs.AI · 2026-05 · unverdicted · novelty 7.0
Anchored Bipolicy Self-Play trains role-specific LoRA adapters on a frozen base model to break self-consistency collapse in self-play red-teaming, yielding up to 100x parameter efficiency and stronger safety on Qwen2....

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation
cs.AI · 2026-04 · unverdicted · novelty 7.0
ClawNet digitizes human collaborative relationships into a network of identity-governed AI agents that collaborate on behalf of their owners through a central orchestrator enforcing binding and verification.

AI Agents Under EU Law
cs.CY · 2026-04 · unverdicted · novelty 7.0
AI agent providers face an exhaustive inventory requirement for actions and data flows, as high-risk systems with untraceable behavioral drift cannot meet the AI Act's essential requirements.

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
cs.CR · 2026-03 · unverdicted · novelty 6.0
The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

Security Considerations for Multi-agent Systems
cs.CR · 2026-03 · unverdicted · novelty 6.0
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
cs.AI · 2025-10 · unverdicted · novelty 6.0
Introduces host agent and task lifecycle models plus 30 temporal logic properties to enable formal verification of liveness, safety, completeness, and fairness in agentic AI systems.

Scheming Ability in LLM-to-LLM Strategic Interactions
cs.CL · 2025-10 · conditional · novelty 6.0
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.

From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents
cs.CR · 2026-05 · unverdicted · novelty 5.0
MolTrust deploys a W3C VC+DID trust infrastructure for AI agents with kernel-layer authorization, cross-protocol interoperability, and layered Sybil resistance, operational since March 2026 across eight verticals.

AgentReputation: A Decentralized Agentic AI Reputation Framework
cs.AI · 2026-04 · unverdicted · novelty 5.0
AgentReputation proposes separating AI agent task execution, reputation management, and secure record-keeping into distinct layers, with context-specific reputation cards and a risk-based policy engine to handle verif...

SoK: Security of Autonomous LLM Agents in Agentic Commerce
cs.CR · 2026-04 · unverdicted · novelty 5.0
The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
cs.AI · 2025-10 · unverdicted · novelty 4.0
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.