IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
Pith reviewed 2026-05-13 05:37 UTC · model grok-4.3
The pith
IPI-proxy uses an intercepting proxy to embed prompt injection attacks in live responses from whitelisted domains for testing web-browsing AI agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present IPI-proxy, an open-source toolkit built around an intercepting proxy that rewrites real HTTP responses from whitelisted domains in flight. It draws on a library of 820 deduplicated attack strings taken from six published benchmarks and inserts them via configurable embedding techniques and HTML insertion points, all driven by a YAML test harness; a companion tracker logs successful exfiltration callbacks.
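The three axes the harness parameterizes (payload set, embedding technique, insertion point) can be pictured as a config sweep. The fragment below is purely illustrative: every key name is an assumption for exposition, not the toolkit's actual schema.

```yaml
# Hypothetical harness config; key names are invented for illustration,
# not IPI-proxy's real configuration interface.
targets:
  whitelist:
    - docs.example.com
payloads:
  source: unified_library      # the 820 deduplicated strings from six benchmarks
  sample: 50                   # sweep over a subset
embedding:
  technique: invisible_css     # html_comment | invisible_css | semantic_prose
  insertion_point: head_meta   # one of the 6 points, head_meta .. script_comment
tracker:
  callback_url: https://callback.example.com/log   # exfiltration logging endpoint
```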
What carries the argument
The intercepting proxy that rewrites live HTTP responses from approved domains to embed indirect prompt injection payloads.
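The rewrite step can be sketched independently of any proxy framework. The function below is a minimal, hypothetical illustration of two of the embedding techniques the paper names (HTML comment and invisible CSS) at a single head insertion point; the function name, fallback behavior, and CSS choice are assumptions, not the toolkit's code.

```python
def embed_payload(html: str, payload: str, technique: str = "html_comment") -> str:
    """Insert an IPI payload into an HTML document just before </head>.

    A sketch of one insertion point only; IPI-proxy's actual six insertion
    points and its semantic-prose technique are not reproduced here.
    """
    if technique == "html_comment":
        snippet = f"<!-- {payload} -->"
    elif technique == "invisible_css":
        # Invisible to a human viewer, but present for an HTML-reading agent.
        snippet = f'<div style="display:none">{payload}</div>'
    else:
        raise ValueError(f"unknown technique: {technique}")
    marker = "</head>"
    if marker in html:
        # Insert once, immediately before the closing head tag.
        return html.replace(marker, snippet + marker, 1)
    return snippet + html  # fallback: prepend if the page has no <head>

demo = embed_payload("<html><head></head><body>ok</body></html>",
                     "ignore previous instructions")
# The payload now sits in an HTML comment just before </head>.
```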
Load-bearing premise
Modifying responses in flight does not cause the agent to detect the change or behave differently enough to make the test results invalid.
What would settle it
Run identical agent queries both with the proxy active and with direct access to the same domains, then check whether the agent's actions, outputs, or success rates differ in ways attributable to the proxy itself.
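The proposed check amounts to a paired A/B run. One hypothetical way to score it is below; the run-record fields (`urls`, `answer`) and the mismatch-rate metric are assumptions for illustration, not anything the paper specifies.

```python
def behavior_diff(direct_runs: list, proxied_runs: list) -> dict:
    """Compare paired agent runs with and without the proxy in the path.

    Each run record is assumed to hold the agent's visited URLs and its
    final answer; nonzero rates flag divergence attributable to the proxy.
    """
    assert len(direct_runs) == len(proxied_runs), "runs must be paired"
    n = len(direct_runs)
    url_mismatches = sum(
        d["urls"] != p["urls"] for d, p in zip(direct_runs, proxied_runs)
    )
    answer_mismatches = sum(
        d["answer"] != p["answer"] for d, p in zip(direct_runs, proxied_runs)
    )
    return {
        "url_mismatch_rate": url_mismatches / n,
        "answer_mismatch_rate": answer_mismatches / n,
    }
```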
Figures
Original abstract
Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-injection benchmarks ship pre-built adversarial pages that whitelisted agents cannot reach, and generic LLM scanners probe the model API rather than its retrieved content. We present IPI-proxy, an open-source toolkit for red-teaming web-browsing agents against indirect prompt injection (IPI). At its core is an intercepting proxy that rewrites real HTTP responses from whitelisted domains in flight, embedding payloads drawn from a unified library of 820 deduplicated attack strings extracted from six published benchmarks (BIPIA, InjecAgent, AgentDojo, Tensor Trust, WASP, and LLMail-Inject). A YAML-driven test harness independently parameterizes the payload set, the embedding technique (HTML comment, invisible CSS, or LLM-generated semantic prose), and the HTML insertion point (6 locations from head_meta to script_comment), enabling parameter-sweep evaluation without mock pages or sandboxed environments. A companion exfiltration tracker logs successful callbacks. This paper describes the threat model, situates IPI-proxy among contemporary IPI benchmarks and red-teaming tools, and details its architecture, design decisions, and configuration interface. By bridging static benchmarks and live deployment, IPI-proxy gives AI security teams a reproducible substrate for measuring and hardening web-browsing agents against indirect prompt injection on the same retrieval surface attackers exploit in production.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents IPI-proxy, an open-source intercepting proxy toolkit for red-teaming web-browsing AI agents against indirect prompt injection (IPI). It rewrites real HTTP responses from whitelisted domains in flight by embedding attack payloads drawn from a unified library of 820 deduplicated strings extracted from six published benchmarks (BIPIA, InjecAgent, AgentDojo, Tensor Trust, WASP, and LLMail-Inject). A YAML-driven harness parameterizes the payload set, embedding technique (HTML comment, invisible CSS, or semantic prose), and one of six HTML insertion points, with a companion exfiltration tracker. The manuscript describes the threat model, situates the tool among existing IPI benchmarks and red-teaming resources, and details its architecture, design decisions, and configuration interface, claiming it bridges static benchmarks and live deployment for reproducible testing on production-like retrieval surfaces.
Significance. If the proxy maintains behavioral equivalence with unmodified pages, the tool would offer a valuable, reproducible substrate for AI security teams to measure and harden agents against IPI using actual whitelisted domains rather than mock pages. Credit is due for releasing an open-source implementation, unifying 820 strings across six benchmarks into a single library, providing a flexible YAML harness for parameter sweeps, and focusing on the exact retrieval surface exploited in production deployments. This addresses a practical gap between static benchmarks and live agent red-teaming.
major comments (2)
- [Architecture] Architecture section: The description of in-flight rewriting (via HTML comment, invisible CSS, or LLM-generated semantic insertion at the six points) supplies no implementation details on preserving response integrity, such as handling of CSP headers, subresource integrity (SRI), signed content, or dynamic JavaScript. This is load-bearing for the central claim that the proxy operates on the 'same retrieval surface' attackers exploit in production, as unaddressed modifications could alter navigation, parsing, or agent output and invalidate red-teaming results.
- [Design Decisions] Design decisions and threat model sections: No behavioral equivalence tests, agent interaction logs, or case studies are provided to confirm that agents treat the rewritten pages identically to unmodified ones. Without such validation, the claim that IPI-proxy enables reproducible measurement 'on the same retrieval surface' cannot be assessed, directly undermining the tool's stated utility for hardening web-browsing agents.
minor comments (2)
- [Payload Library] The abstract states the library contains '820 deduplicated attack strings' but provides no breakdown by source benchmark or deduplication method; adding this table or appendix would improve reproducibility.
- A diagram illustrating the proxy flow, harness parameterization, and exfiltration tracker would clarify the architecture description and configuration interface.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the description of the tool's fidelity and utility.
Point-by-point responses
-
Referee: [Architecture] Architecture section: The description of in-flight rewriting (via HTML comment, invisible CSS, or LLM-generated semantic insertion at the six points) supplies no implementation details on preserving response integrity, such as handling of CSP headers, subresource integrity (SRI), signed content, or dynamic JavaScript. This is load-bearing for the central claim that the proxy operates on the 'same retrieval surface' attackers exploit in production, as unaddressed modifications could alter navigation, parsing, or agent output and invalidate red-teaming results.
Authors: We agree that explicit details on response integrity are needed to substantiate the claim of operating on the same retrieval surface. The proxy implementation forwards responses with targeted body modifications only, preserving most headers by default while providing configuration hooks for CSP and other policies. In the revised manuscript we will add a dedicated paragraph in the Architecture section describing header handling, SRI implications, signed content considerations, and dynamic JavaScript behavior, together with configuration options that allow users to minimize deviations from unmodified pages. revision: yes
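One concrete facet of the header handling discussed above can be sketched as follows; the function and option names are hypothetical, not IPI-proxy's actual interface. After a body rewrite, Content-Length must be recomputed, and CSP headers can optionally be dropped so policy enforcement does not suppress the injected markup.

```python
def fixup_headers(headers: dict, new_body: bytes, strip_csp: bool = False) -> dict:
    """Adjust response headers after an in-flight body modification.

    Illustrative sketch only: recompute Content-Length for the rewritten
    body and, when requested, remove CSP headers that could block the
    injected content. The original header mapping is left untouched.
    """
    out = dict(headers)  # copy so the caller's headers are preserved
    out["Content-Length"] = str(len(new_body))
    if strip_csp:
        out.pop("Content-Security-Policy", None)
        out.pop("Content-Security-Policy-Report-Only", None)
    return out
```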
-
Referee: [Design Decisions] Design decisions and threat model sections: No behavioral equivalence tests, agent interaction logs, or case studies are provided to confirm that agents treat the rewritten pages identically to unmodified ones. Without such validation, the claim that IPI-proxy enables reproducible measurement 'on the same retrieval surface' cannot be assessed, directly undermining the tool's stated utility for hardening web-browsing agents.
Authors: The manuscript presents IPI-proxy as a configurable toolkit whose primary purpose is to enable such evaluations by end users rather than to report exhaustive pre-computed results. Nevertheless, we acknowledge that initial validation would strengthen the central claim. In the revised version we will add a short 'Behavioral Equivalence' subsection under Design Decisions that includes example interaction logs from a representative open-source web-browsing agent, confirming that navigation, parsing, and content extraction remain consistent before and after proxying, while also noting edge cases where full equivalence may not hold. revision: yes
Circularity Check
No significant circularity; tool description is self-contained
full rationale
The paper describes an engineering toolkit (intercepting proxy, YAML harness, payload library) that aggregates strings from six external benchmarks without equations, fitted parameters, predictions, or derivations. Core claims rest on the independent implementation of in-flight rewriting and test parameterization rather than any self-referential reduction. External citations for payloads are not load-bearing for the proxy architecture itself, and no uniqueness theorems or ansatzes are invoked. This matches the default expectation of no circularity for a tool paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Web-browsing AI agents retrieve and process HTML content from whitelisted domains in production settings.
- domain assumption: Indirect prompt injection can be realized by embedding instructions in served HTML pages.
Reference graph
Works this paper leans on
- [1] Anthropic, “Model Context Protocol Specification,” https://modelcontextprotocol.io/specification/2025-06-18, 2024. Originally introduced by Anthropic in November 2024; subsequently stewarded by the Linux Foundation Agentic AI Foundation. Accessed 2026-05-08.
- [2] P. Reddy and A. S. Gujral, “EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System,” arXiv preprint arXiv:2509.10540, 2025. AAAI Fall Symposium Series 2025.
- [3] OWASP Foundation, “OWASP Top 10 for Large Language Model Applications 2025,” https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/, 2025. Accessed 2026-05-07.
- [4] OWASP Foundation, “OWASP Top 10 for Agentic Applications for 2026,” https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026, 2025. Accessed 2026-05-07.
- [5] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec ’23), 2023. arXiv:2302.12173.
- [6] F. Perez and I. Ribeiro, “Ignore Previous Prompt: Attack Techniques For Language Models,” arXiv preprint arXiv:2211.09527, 2022. NeurIPS 2022 ML Safety Workshop (Best Paper Award).
- [7] Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng, L. Y. Zhang, and Y. Liu, “Prompt Injection Attack Against LLM-integrated Applications,” arXiv preprint arXiv:2306.05499, 2023.
- [8] Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and Benchmarking Prompt Injection Attacks and Defenses,” in Proceedings of the 33rd USENIX Security Symposium, 2024.
- [9] J. Yi, Y. Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models,” arXiv preprint arXiv:2312.14197, 2023. Accepted to KDD 2025.
- [10] Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024. arXiv:2403.02691.
- [11] E. Debenedetti, J. Zhang, M. Balunović, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents,” in Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Datasets and Benchmarks Track, 2024. Spotlight; arXiv:2406.13352.
- [12] I. Evtimov, A. Zharmagambetov, A. Grattafiori, C. Guo, and K. Chaudhuri, “WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks,” arXiv preprint arXiv:2504.18575, 2025.
- [13] S. Abdelnabi, A. Fay, A. Salem, E. Zverev, K.-C. Liao, C.-H. Liu, C.-C. Kuo, J. Weigend, D. Manlangit, A. Apostolov, H. Umair, J. Donato, M. Kawakita, A. Mahboob, T. H. Bach, T.-H. Chiang, M. Cho, H. Choi, B. Kim, H. Lee, B. Pannell, C. McCauley, M. Russinovich, A. Paverd, and G. Cherubin, “LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injec...
- [14] S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game,” in Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024. Spotlight; arXiv:2311.01011. [Online]. A...
- [15] K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman, “Defending Against Indirect Prompt Injection Attacks With Spotlighting,” in Proceedings of the Conference on Applied Machine Learning in Information Security (CAMLIS 2024), ser. CEUR Workshop Proceedings, vol. 3920, 2024. arXiv:2403.14720. [Online]. Available: https://ceur-ws.org/Vol-3...
- [16] S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “StruQ: Defending Against Prompt Injection with Structured Queries,” in Proceedings of the 34th USENIX Security Symposium, 2025.
- [17] S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, D. Wagner, and C. Guo, “SecAlign: Defending Against Prompt Injection with Preference Optimization,” arXiv preprint arXiv:2410.05451, 2024. ACM CCS 2025.
- [18] E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr, “Defeating Prompt Injections by Design,” arXiv preprint arXiv:2503.18813, 2025.
- [19] L. Beurer-Kellner, B. Buesser, A.-M. Creţu, E. Debenedetti, D. Dobos, D. Fabian, M. Fischer, D. Froelicher, K. Grosse, D. Naeff, E. Ozoani, A. Paverd, F. Tramèr, and V. Volhejn, “Design Patterns for Securing LLM Agents against Prompt Injections,” arXiv preprint arXiv:2506.08837, 2025.
- [20] Microsoft AI Red Team, “PyRIT: Python Risk Identification Tool for Generative AI,” https://github.com/microsoft/PyRIT, 2024. Accessed 2026-05-08.
- [21] L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, and N. Inie, “garak: A Framework for Security Probing Large Language Models,” https://github.com/NVIDIA/garak, 2024. Accessed 2026-05-07.
- [22] Promptfoo, “promptfoo: Test Your Prompts, Agents, and RAGs,” https://github.com/promptfoo/promptfoo, 2024. Accessed 2026-05-07.
- [23] Protect AI, “Rebuff: A Self-Hardening Prompt Injection Detector,” https://github.com/protectai/rebuff, 2023. Accessed 2026-05-07.
- [24] Lakera, “Lakera AI Agent Security (formerly Lakera Guard),” https://www.lakera.ai/ai-agent-security, 2024–2026. Accessed 2026-05-08.
- [25] A. Cortesi, M. Hils, T. Kriechbaumer, and contributors, “mitmproxy: A free and open source interactive HTTPS proxy,” https://www.mitmproxy.org/, 2010–2026. Accessed 2026-05-08.