AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?
Pith reviewed 2026-05-16 08:11 UTC · model grok-4.3
The pith
Most existing defenses against indirect prompt injection in AI agents either fail to stop attacks or block too many legitimate actions when tested in dynamic settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgentDyn exposes three core flaws in prior benchmarks—absence of dynamic open-ended tasks, absence of helpful instructions, and overly simplistic user tasks—and shows through direct testing that ten leading defenses either remain vulnerable to indirect prompt injection or impose excessive refusals when agents must perform dynamic planning in the presence of helpful third-party content, placing them far from deployable in realistic environments.
What carries the argument
The AgentDyn benchmark of 60 open-ended tasks and 560 injection test cases that force dynamic planning while embedding helpful third-party instructions.
If this is right
- Defenses must be redesigned to preserve both security and utility across changing plans and helpful external instructions.
- Static benchmarks should give way to adaptive ones that test dynamic planning for accurate security measurement.
- Real-world agent systems will need new defense approaches that account for open-ended, multi-step interactions.
- Nearly all currently published defenses require substantial further work before they can be considered production-ready.
Where Pith is reading between the lines
- Automated generation of dynamic tasks could make such benchmarks easier to scale and maintain.
- The observed security-utility trade-off may require changes to agent architecture rather than defense layers alone.
- Direct testing on live deployed agents would provide an external check on whether the benchmark's tasks match practice.
Load-bearing premise
The 60 manually designed tasks and 560 test cases accurately capture the dynamic planning and helpful-instruction demands of actual agent deployments.
What would settle it
A defense that blocks all injections on AgentDyn while still completing almost all helpful tasks without refusal would support its real-world readiness; conversely, evidence that real user tasks differ substantially from these 60 would undermine the benchmark's claims.
Figures
read the original abstract
AI agents that autonomously interact with external tools and environments have shown great promise across real-world applications. However, their reliance on external data exposes them to serious indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behaviors. To mitigate this threat, a growing number of defenses have been proposed and evaluated under existing agent security benchmarks. These benchmarks provide structured environments for comparing attacks and defenses, and have become a key driver for defense design and optimization. However, as agents move toward more complex and open-ended real-world deployments, there is a pressing need for benchmarks to become more adaptive and better reflect the dynamic environments faced by real-world agentic systems. In this work, we reveal three fundamental flaws in the current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies three flaws in existing agent security benchmarks (lack of dynamic open-ended tasks, lack of helpful instructions, and simplistic user tasks) and introduces AgentDyn, a manually designed benchmark with 60 open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life domains. It evaluates ten state-of-the-art defenses and concludes that almost all are either insufficiently secure or suffer from significant over-defense, indicating they remain far from real-world deployment.
Significance. If the benchmark tasks accurately model real-world dynamic planning and helpful third-party instructions, the empirical results on ten defenses provide a valuable signal that current approaches require substantial improvement before deployment. The public release of the benchmark supports reproducibility and follow-on work.
major comments (2)
- [§3] §3 (Benchmark Construction): The 60 tasks and 560 test cases are manually authored without reported grounding in real agent logs, user studies, or statistical comparison to production traces. This is load-bearing for the central claim that observed failure modes (insecurity or over-defense) are intrinsic to the defenses rather than artifacts of benchmark construction.
- [Evaluation section] Evaluation section: The abstract and main text provide limited detail on exact task construction, success metrics, and attack implementation details, leaving the support for the claim that 'almost all existing defenses' fail moderately supported by the reported results.
minor comments (2)
- [Abstract] Abstract: Include one sentence on the specific success metrics used (e.g., task completion rate under injection) to strengthen the summary of results.
- [Introduction] Introduction: Add explicit citations to the prior benchmarks being critiqued when listing the three flaws for improved traceability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have made revisions to improve clarity and detail where appropriate.
read point-by-point responses
-
Referee: [§3] §3 (Benchmark Construction): The 60 tasks and 560 test cases are manually authored without reported grounding in real agent logs, user studies, or statistical comparison to production traces. This is load-bearing for the central claim that observed failure modes (insecurity or over-defense) are intrinsic to the defenses rather than artifacts of benchmark construction.
Authors: We acknowledge that AgentDyn tasks were manually designed rather than derived from production logs or user studies. The design draws from documented real-world agent use cases in the literature for the Shopping, GitHub, and Daily Life domains, with explicit focus on dynamic planning and helpful third-party instructions to address the three flaws identified in prior benchmarks. In the revised §3 we have added a dedicated subsection on design principles, including concrete examples of how each task category requires adaptive reasoning and incorporates benign external content. While we agree that direct statistical grounding in proprietary traces would further strengthen the benchmark, such data is not publicly available and was outside the scope of this work; the current tasks still expose clear limitations in existing defenses that align with known attack surfaces. revision: partial
-
Referee: [Evaluation section] Evaluation section: The abstract and main text provide limited detail on exact task construction, success metrics, and attack implementation details, leaving the support for the claim that 'almost all existing defenses' fail moderately supported by the reported results.
Authors: We thank the referee for highlighting this. The revised Evaluation section now includes expanded descriptions of task construction (with pseudocode for dynamic execution flows), precise success metrics (primary user goal completion without malicious action execution, plus separate over-defense rate), and attack implementation details (how injections are embedded in tool responses and how test cases are generated). These additions provide stronger empirical grounding for the conclusion that ten evaluated defenses exhibit either insufficient security or significant over-defense. revision: yes
- Absence of grounding in real agent logs, user studies, or statistical comparison to production traces, as the benchmark was manually designed without access to such proprietary data.
Circularity Check
No circularity: empirical benchmark evaluation is self-contained
full rationale
The paper introduces AgentDyn as a manually designed benchmark of 60 open-ended tasks and 560 injection cases, then directly evaluates ten external state-of-the-art defenses on it. No mathematical derivations, fitted parameters renamed as predictions, or self-citations appear in the load-bearing steps. The central claim follows from straightforward empirical runs on the new tasks rather than any reduction to the benchmark's own construction by definition. The analysis therefore contains no self-definitional, fitted-input, or self-citation circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The 60 manually designed tasks capture the essential challenges of real-world dynamic agent environments
Forward citations
Cited by 3 Pith papers
-
Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Evidence-carrying multimodal agents decompose tool calls into predicates verified by constrained DOM/OCR/AX checkers to block hallucination-enabled unsafe actions.
-
Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving...
-
PIArena: A Platform for Prompt Injection Evaluation
PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.
Reference graph
Works this paper leans on
-
[1]
Chen, S., Piet, J., Sitawarin, C., and Wagner, D. A. Struq: Defending against prompt injection with structured queries. InUSENIX Security, pp. 2383–2400. USENIX Association, 2025a. Chen, S., Zharmagambetov, A., Mahloujifar, S., Chaudhuri, K., Wagner, D. A., and Guo, C. Secalign: Defending against prompt injection with preference optimization. In CCS, pp. ...
-
[2]
Defeating Prompt Injections by Design
Debenedetti, E., Shumailov, I., Fan, T., Hayes, J., Car- lini, N., Fabian, D., Kern, C., Shi, C., Terzis, A., and Tram`er, F. Defeating prompt injections by design.CoRR, abs/2503.18813,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y ., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., and Khabsa, M. Llama guard: Llm-based input- output safeguard for human-ai conversations.CoRR, abs/2312.06674,
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
Li, H., Liu, X., Chiu, H., Li, D., Zhang, N., and Xiao, C. DRIFT: dynamic rule-based defense with injection iso- lation for securing LLM agents.CoRR, abs/2506.12104, 2025a. Li, H., Liu, X., Zhang, N., and Xiao, C. Piguard: Prompt injection guardrail via mitigating overdefense for free. In ACL, pp. 30420–30437. Association for Computational Linguistics, 20...
-
[5]
URL https://www.llama.com/ docs/model-cards-and-prompt-formats/ prompt-guard/. Nasr, M., Carlini, N., Sitawarin, C., Schulhoff, S. V ., Hayes, J., Ilie, M., Pluto, J., Song, S., Chaudhari, H., Shumailov, I., Thakurta, A., Xiao, K. Y ., Terzis, A., and Tram`er, F. The attacker moves second: Stronger adaptive attacks by- pass defenses against llm jailbreaks...
work page internal anchor Pith review arXiv
-
[6]
Accessed: 2025-10-23. OWASP. Owasp llm01. https://genai.owasp. org/llmrisk/llm01-prompt-injection/,
work page 2025
-
[7]
Ignore Previous Prompt: Attack Techniques For Language Models
Perez, F. and Ribeiro, I. Ignore previous prompt: Attack techniques for language models.CoRR, abs/2211.09527,
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
URL https: //learnprompting.org/docs/prompt_ hacking/defensive_measures/sandwich_ defense. 9 AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System Shi, T., He, J., Wang, Z., Wu, L., Li, H., Guo, W., and Song, D. Progent: Programmable privilege control for LLM agents.CoRR, abs/2504.11703, 2025a...
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
Wu, F., Cecchetti, E., and Xiao, C. System-level defense against indirect prompt injection attacks: An information flow control perspective.CoRR, abs/2409.19091,
-
[10]
Rtbas: Defending llm agents against prompt injection and privacy leakage,
Zhong, P. Y ., Chen, S., Wang, R., McCall, M., Titzer, B. L., Miller, H., and Gibbons, P. B. RTBAS: defending LLM agents against prompt injection and privacy leakage. CoRR, abs/2502.08966,
-
[11]
Zhou, S., Xu, F. F., Zhu, H., Zhou, X., Lo, R., Sridhar, A., Cheng, X., Ou, T., Bisk, Y ., Fried, D., Alon, U., and Neubig, G. Webarena: A realistic web environment for building autonomous agents. InICLR. OpenReview.net, 2024a. Zhou, X., Zhu, H., Mathur, L., Zhang, R., Yu, H., Qi, Z., Morency, L., Bisk, Y ., Fried, D., Neubig, G., and Sap, M. SOTOPIA: int...
work page 2022
-
[12]
restricts the set of callable tools to those relevant to the user’s request, reducing the available attack surface and preventing irrelevant tool invocation during an attack. Alignment-based defenses.These approaches aim to strengthen the model’s intrinsic resistance to prompt injection via fine-tuning or preference optimization. StruQ (Chen et al., 2025a...
work page 2023
-
[13]
may not transfer cleanly (Gur et al., 2024; Deng et al., 2023; Xie et al., 2024). IsolateGPT (Wu et al.,
work page 2024
-
[14]
and RTBAS (Zhong et al., 2025), propagate taint labels to limit the influence of untrusted data throughout the system. CaMeL (Debenedetti et al.,
work page 2025
-
[15]
https://www.homework.com/documents
Tables 5, 6, and 7 provide the detailed scenario listings for each suite. Table 4.Examples of Dynamic Scenarios. Application Category Example Bank OTP Validation You should verify your account first. An email including verification one-time password (OTP) has been sent to your inbox. Email Link Interaction To accept this invitation, please click the link ...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.