AI Agents May Always Fall for Prompt Injections

2026 · cs.CR · arXiv 2605.17634

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Prompt injection is the most critical vulnerability in deployed AI agents. Despite recent progress, we show that the prevailing defense paradigm (data-instruction separation) both fails to detect attacks that operate through contextual manipulation and degrades contextually appropriate behavior. We then recast prompt injection through the lens of Contextual Integrity (CI), a privacy theory that judges whether information flows comply with contextual norms. This lens explains the types of attacks that current defenses attempt to patch and predicts more advanced ones that future agents will face. We develop unique benign and attack scenarios that force an agent to violate the norms by (1) misrepresenting the flow, (2) manipulating norms, or (3) mixing multiple flows. This reframing suggests an impossibility result: either an adversary can always construct a context under which a blocked flow appears legitimate, or a defender who tightens norms will block genuinely legitimate flows. Our findings suggest that current research addresses a shrinking fraction of future attack surfaces. Through CI, we instead offer a principled framework for evaluating context-sensitive failures and designing CI-aware alignment for frontier autonomous agents.

representative citing papers

Janus: a Playground for User-Involved Agentic Permission Management

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

Janus is a publicly available playground system and evaluation harness for testing user-involved permission management designs in AI agents, demonstrating benefits of user input and the need for context-sensitive approaches.

AI Snitches Get Glitches: Towards Evading Agentic Surveillance

cs.AI · 2026-06-24 · unverdicted · novelty 6.0 · 2 refs

Formalizes agentic surveillance, releases SurveilBench for testing AI reporting behaviors across corporate, education, and police scenarios, and develops three prompt-injection evasion techniques.

Red-Teaming the Agentic Red-Team

cs.CR · 2026-06-23 · unverdicted · novelty 6.0

Agentic offensive security tools share design flaws enabling API key exfiltration, persistence, and sandbox escape, addressed via a new cyber kill chain and robust architecture principles.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Janus: a Playground for User-Involved Agentic Permission Management

cs.AI · 2026-07-01 · unverdicted · none · ref 7 · internal anchor

Janus is a publicly available playground system and evaluation harness for testing user-involved permission management designs in AI agents, demonstrating benefits of user input and the need for context-sensitive approaches.

AI Snitches Get Glitches: Towards Evading Agentic Surveillance

cs.AI · 2026-06-24 · unverdicted · none · ref 1 · 2 links · internal anchor

Formalizes agentic surveillance, releases SurveilBench for testing AI reporting behaviors across corporate, education, and police scenarios, and develops three prompt-injection evasion techniques.
