PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

· 2026 · cs.CR · arXiv 2604.10134

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Large Language Model (LLM) agents are increasingly integrated into critical systems, leveraging external tools to interact with the real world. However, this capability exposes them to Indirect Prompt Injection (IPI), where attackers embed malicious instructions into retrieved content to manipulate the agent into executing unauthorized or unintended actions. Existing defenses predominantly focus on the pre-processing stage, neglecting the monitoring of the model's actual behavior. In this paper, we propose PlanGuard, a training-free defense framework based on the principle of Context Isolation. Unlike prior methods, PlanGuard introduces an isolated Planner that generates a reference set of valid actions derived solely from user instructions. In addition, we design a Hierarchical Verification Mechanism that first enforces strict hard constraints to block unauthorized tool invocations, and subsequently employs an Intent Verifier to validate whether parameter deviations are benign formatting variances or malicious hijacking. Experiments on the InjecAgent benchmark demonstrate that PlanGuard effectively neutralizes these attacks, reducing the Attack Success Rate (ASR) from 72.8% to 0%, while maintaining an acceptable False Positive Rate of 1.49%. Furthermore, our method is model-agnostic and highly compatible.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

citing papers explorer

Showing 2 of 2 citing papers.

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents cs.CR · 2026-05-04 · unverdicted · none · ref 102 · internal anchor
A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 53 · internal anchor
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer