A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.
Not What You've Signed Up For: Compromising Real-World
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.AI 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Multi-agent AI creates an authorization propagation problem not solved by prompt injection defenses or classical access control, requiring identity governance as continuously enforced infrastructure.
citing papers explorer
-
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.
-
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure
Multi-agent AI creates an authorization propagation problem not solved by prompt injection defenses or classical access control, requiring identity governance as continuously enforced infrastructure.