pith. machine review for the scientific record.

Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents

7 Pith papers cite this work. Polarity classification is still indexing.
abstract

AI agents are beginning to interact with each other directly, across internet platforms and physical environments, creating security challenges beyond traditional cybersecurity and AI safety frameworks. Free-form protocols are essential for AI's task generalization but enable new threats such as secret collusion and coordinated swarm attacks. Network effects can rapidly spread privacy breaches, disinformation, jailbreaks, and data poisoning, while multi-agent dispersion and stealth optimization help adversaries evade oversight, creating novel persistent threats at a systemic level. Despite their critical importance, these security challenges remain understudied, with research fragmented across disparate fields including AI security, multi-agent learning, complex systems, cybersecurity, game theory, distributed systems, and technical AI governance. We introduce multi-agent security, a new field dedicated to securing networks of AI agents against threats that emerge or amplify through their interactions, whether direct or indirect via shared environments, with each other, humans, and institutions, and characterise fundamental security-utility and security-security trade-offs across both distributed and decentralised settings. Our preliminary work (1) taxonomises the threat landscape arising from interacting AI agents, (2) offers applications of multi-agent security to work across diffuse subfields, and (3) proposes a unified research agenda addressing open challenges in designing secure agent systems and interaction environments. By identifying these gaps, we aim to guide research in this critical area to unlock the socioeconomic potential of large-scale agent deployment, foster public trust, and mitigate national security risks in critical infrastructure and defense contexts.

years: 2026 (7)

verdicts: unverdicted (7)


citing papers explorer

Showing 7 of 7 citing papers.

  • The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play cs.AI · 2026-05-08 · unverdicted · ref 10

    Anchored Bipolicy Self-Play trains role-specific LoRA adapters on a frozen base model to break self-consistency collapse in self-play red-teaming, yielding up to 100x parameter efficiency and stronger safety on Qwen2.5 models.

  • ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation cs.AI · 2026-04-21 · unverdicted · ref 21

    ClawNet digitizes human collaborative relationships into a network of identity-governed AI agents that collaborate on behalf of their owners through a central orchestrator enforcing binding and verification.

  • AI Agents Under EU Law cs.CY · 2026-04-06 · unverdicted · ref 102

    AI agent providers face an exhaustive inventory requirement for actions and data flows, as high-risk systems with untraceable behavioral drift cannot meet the AI Act's essential requirements.

  • Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses cs.CR · 2026-03-28 · unverdicted · ref 74

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  • From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents cs.CR · 2026-05-07 · unverdicted · ref 14

    MolTrust deploys a W3C VC+DID trust infrastructure for AI agents with kernel-layer authorization, cross-protocol interoperability, and layered Sybil resistance, operational since March 2026 across eight verticals.

  • AgentReputation: A Decentralized Agentic AI Reputation Framework cs.AI · 2026-04-30 · unverdicted · ref 3

    AgentReputation proposes separating AI agent task execution, reputation management, and secure record-keeping into distinct layers, with context-specific reputation cards and a risk-based policy engine to handle verification in decentralized settings.

  • SoK: Security of Autonomous LLM Agents in Agentic Commerce cs.CR · 2026-04-15 · unverdicted · ref 62

    The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.