Persona Attack uses step-by-step memory injections to achieve up to 95% success in making LLMs ignore safety alignments, with effectiveness depending on model memory and instruction combinations.
Local” exposes the agent’s own past reasoning; “Shared
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Multi-agent AI systems are more vulnerable to attacks than single agents in most tested designs, with attack success rates varying up to 3.8 times depending on how roles, communication, and memory are structured.
citing papers explorer
-
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
Persona Attack uses step-by-step memory injections to achieve up to 95% success in making LLMs ignore safety alignments, with effectiveness depending on model memory and instruction combinations.
-
Architecture Matters for Multi-Agent Security
Multi-agent AI systems are more vulnerable to attacks than single agents in most tested designs, with attack success rates varying up to 3.8 times depending on how roles, communication, and memory are structured.