IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
Pith reviewed 2026-05-13 05:37 UTC · model grok-4.3
The pith
IPI-proxy uses an intercepting proxy to embed prompt injection attacks in live responses from whitelisted domains for testing web-browsing AI agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present IPI-proxy, an open-source toolkit built around an intercepting proxy that rewrites real HTTP responses from whitelisted domains in flight. It draws on a library of 820 deduplicated attack strings taken from six published benchmarks and inserts them via configurable embedding techniques and HTML insertion points, all driven by a YAML test harness; a companion tracker logs successful exfiltration callbacks.
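The three axes the harness parameterizes (payload set, embedding technique, insertion point) can be pictured as a config sweep. The fragment below is purely illustrative: every key name is an assumption for exposition, not the toolkit's actual schema.

```yaml
# Hypothetical harness config; key names are invented for illustration,
# not IPI-proxy's real configuration interface.
targets:
  whitelist:
    - docs.example.com
payloads:
  source: unified_library      # the 820 deduplicated strings from six benchmarks
  sample: 50                   # sweep over a subset
embedding:
  technique: invisible_css     # html_comment | invisible_css | semantic_prose
  insertion_point: head_meta   # one of the 6 points, head_meta .. script_comment
tracker:
  callback_url: https://callback.example.com/log   # exfiltration logging endpoint
```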
What carries the argument
The intercepting proxy that rewrites live HTTP responses from approved domains to embed indirect prompt injection payloads.
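The rewrite step can be sketched independently of any proxy framework. The function below is a minimal, hypothetical illustration of two of the embedding techniques the paper names (HTML comment and invisible CSS) at a single head insertion point; the function name, fallback behavior, and CSS choice are assumptions, not the toolkit's code.

```python
def embed_payload(html: str, payload: str, technique: str = "html_comment") -> str:
    """Insert an IPI payload into an HTML document just before </head>.

    A sketch of one insertion point only; IPI-proxy's actual six insertion
    points and its semantic-prose technique are not reproduced here.
    """
    if technique == "html_comment":
        snippet = f"<!-- {payload} -->"
    elif technique == "invisible_css":
        # Invisible to a human viewer, but present for an HTML-reading agent.
        snippet = f'<div style="display:none">{payload}</div>'
    else:
        raise ValueError(f"unknown technique: {technique}")
    marker = "</head>"
    if marker in html:
        # Insert once, immediately before the closing head tag.
        return html.replace(marker, snippet + marker, 1)
    return snippet + html  # fallback: prepend if the page has no <head>

demo = embed_payload("<html><head></head><body>ok</body></html>",
                     "ignore previous instructions")
# The payload now sits in an HTML comment just before </head>.
```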
Load-bearing premise
Modifying responses in flight does not cause the agent to detect the change or behave differently enough to make the test results invalid.
What would settle it
Run identical agent queries both with the proxy active and with direct access to the same domains, then check whether the agent's actions, outputs, or success rates differ in ways attributable to the proxy itself.
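The proposed check amounts to a paired A/B run. One hypothetical way to score it is below; the run-record fields (`urls`, `answer`) and the mismatch-rate metric are assumptions for illustration, not anything the paper specifies.

```python
def behavior_diff(direct_runs: list, proxied_runs: list) -> dict:
    """Compare paired agent runs with and without the proxy in the path.

    Each run record is assumed to hold the agent's visited URLs and its
    final answer; nonzero rates flag divergence attributable to the proxy.
    """
    assert len(direct_runs) == len(proxied_runs), "runs must be paired"
    n = len(direct_runs)
    url_mismatches = sum(
        d["urls"] != p["urls"] for d, p in zip(direct_runs, proxied_runs)
    )
    answer_mismatches = sum(
        d["answer"] != p["answer"] for d, p in zip(direct_runs, proxied_runs)
    )
    return {
        "url_mismatch_rate": url_mismatches / n,
        "answer_mismatch_rate": answer_mismatches / n,
    }
```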
Figures
Original abstract
Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-injection benchmarks ship pre-built adversarial pages that whitelisted agents cannot reach, and generic LLM scanners probe the model API rather than its retrieved content. We present IPI-proxy, an open-source toolkit for red-teaming web-browsing agents against indirect prompt injection (IPI). At its core is an intercepting proxy that rewrites real HTTP responses from whitelisted domains in flight, embedding payloads drawn from a unified library of 820 deduplicated attack strings extracted from six published benchmarks (BIPIA, InjecAgent, AgentDojo, Tensor Trust, WASP, and LLMail-Inject). A YAML-driven test harness independently parameterizes the payload set, the embedding technique (HTML comment, invisible CSS, or LLM-generated semantic prose), and the HTML insertion point (6 locations from head_meta to script_comment), enabling parameter-sweep evaluation without mock pages or sandboxed environments. A companion exfiltration tracker logs successful callbacks. This paper describes the threat model, situates IPI-proxy among contemporary IPI benchmarks and red-teaming tools, and details its architecture, design decisions, and configuration interface. By bridging static benchmarks and live deployment, IPI-proxy gives AI security teams a reproducible substrate for measuring and hardening web-browsing agents against indirect prompt injection on the same retrieval surface attackers exploit in production.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents IPI-proxy, an open-source intercepting proxy toolkit for red-teaming web-browsing AI agents against indirect prompt injection (IPI). It rewrites real HTTP responses from whitelisted domains in flight by embedding attack payloads drawn from a unified library of 820 deduplicated strings extracted from six published benchmarks (BIPIA, InjecAgent, AgentDojo, Tensor Trust, WASP, and LLMail-Inject). A YAML-driven harness parameterizes the payload set, embedding technique (HTML comment, invisible CSS, or semantic prose), and one of six HTML insertion points, with a companion exfiltration tracker. The manuscript describes the threat model, situates the tool among existing IPI benchmarks and red-teaming resources, and details its architecture, design decisions, and configuration interface, claiming it bridges static benchmarks and live deployment for reproducible testing on production-like retrieval surfaces.
Significance. If the proxy maintains behavioral equivalence with unmodified pages, the tool would offer a valuable, reproducible substrate for AI security teams to measure and harden agents against IPI using actual whitelisted domains rather than mock pages. Credit is due for releasing an open-source implementation, unifying 820 strings across six benchmarks into a single library, providing a flexible YAML harness for parameter sweeps, and focusing on the exact retrieval surface exploited in production deployments. This addresses a practical gap between static benchmarks and live agent red-teaming.
major comments (2)
- [Architecture] Architecture section: The description of in-flight rewriting (via HTML comment, invisible CSS, or LLM-generated semantic insertion at the six points) supplies no implementation details on preserving response integrity, such as handling of CSP headers, subresource integrity (SRI), signed content, or dynamic JavaScript. This is load-bearing for the central claim that the proxy operates on the 'same retrieval surface' attackers exploit in production, as unaddressed modifications could alter navigation, parsing, or agent output and invalidate red-teaming results.
- [Design Decisions] Design decisions and threat model sections: No behavioral equivalence tests, agent interaction logs, or case studies are provided to confirm that agents treat the rewritten pages identically to unmodified ones. Without such validation, the claim that IPI-proxy enables reproducible measurement 'on the same retrieval surface' cannot be assessed, directly undermining the tool's stated utility for hardening web-browsing agents.
minor comments (2)
- [Payload Library] The abstract states the library contains '820 deduplicated attack strings' but provides no breakdown by source benchmark or deduplication method; adding this table or appendix would improve reproducibility.
- A diagram illustrating the proxy flow, harness parameterization, and exfiltration tracker would clarify the architecture description and configuration interface.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the description of the tool's fidelity and utility.
Point-by-point responses
-
Referee: [Architecture] Architecture section: The description of in-flight rewriting (via HTML comment, invisible CSS, or LLM-generated semantic insertion at the six points) supplies no implementation details on preserving response integrity, such as handling of CSP headers, subresource integrity (SRI), signed content, or dynamic JavaScript. This is load-bearing for the central claim that the proxy operates on the 'same retrieval surface' attackers exploit in production, as unaddressed modifications could alter navigation, parsing, or agent output and invalidate red-teaming results.
Authors: We agree that explicit details on response integrity are needed to substantiate the claim of operating on the same retrieval surface. The proxy implementation forwards responses with targeted body modifications only, preserving most headers by default while providing configuration hooks for CSP and other policies. In the revised manuscript we will add a dedicated paragraph in the Architecture section describing header handling, SRI implications, signed content considerations, and dynamic JavaScript behavior, together with configuration options that allow users to minimize deviations from unmodified pages. revision: yes
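One concrete facet of the header handling discussed above can be sketched as follows; the function and option names are hypothetical, not IPI-proxy's actual interface. After a body rewrite, Content-Length must be recomputed, and CSP headers can optionally be dropped so policy enforcement does not suppress the injected markup.

```python
def fixup_headers(headers: dict, new_body: bytes, strip_csp: bool = False) -> dict:
    """Adjust response headers after an in-flight body modification.

    Illustrative sketch only: recompute Content-Length for the rewritten
    body and, when requested, remove CSP headers that could block the
    injected content. The original header mapping is left untouched.
    """
    out = dict(headers)  # copy so the caller's headers are preserved
    out["Content-Length"] = str(len(new_body))
    if strip_csp:
        out.pop("Content-Security-Policy", None)
        out.pop("Content-Security-Policy-Report-Only", None)
    return out
```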
-
Referee: [Design Decisions] Design decisions and threat model sections: No behavioral equivalence tests, agent interaction logs, or case studies are provided to confirm that agents treat the rewritten pages identically to unmodified ones. Without such validation, the claim that IPI-proxy enables reproducible measurement 'on the same retrieval surface' cannot be assessed, directly undermining the tool's stated utility for hardening web-browsing agents.
Authors: The manuscript presents IPI-proxy as a configurable toolkit whose primary purpose is to enable such evaluations by end users rather than to report exhaustive pre-computed results. Nevertheless, we acknowledge that initial validation would strengthen the central claim. In the revised version we will add a short 'Behavioral Equivalence' subsection under Design Decisions that includes example interaction logs from a representative open-source web-browsing agent, confirming that navigation, parsing, and content extraction remain consistent before and after proxying, while also noting edge cases where full equivalence may not hold. revision: yes
Circularity Check
No significant circularity; tool description is self-contained
full rationale
The paper describes an engineering toolkit (intercepting proxy, YAML harness, payload library) that aggregates strings from six external benchmarks without equations, fitted parameters, predictions, or derivations. Core claims rest on the independent implementation of in-flight rewriting and test parameterization rather than any self-referential reduction. External citations for payloads are not load-bearing for the proxy architecture itself, and no uniqueness theorems or ansatzes are invoked. This matches the default expectation of no circularity for a tool paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Web-browsing AI agents retrieve and process HTML content from whitelisted domains in production settings.
- domain assumption: Indirect prompt injection can be realized by embedding instructions in served HTML pages.
Reference graph
Works this paper leans on
- [1] Anthropic, “Model Context Protocol Specification,” https://modelcontextprotocol.io/specification/2025-06-18, 2024. Originally introduced by Anthropic in November 2024; subsequently stewarded by the Linux Foundation Agentic AI Foundation. Accessed 2026-05-08.
- [2] P. Reddy and A. S. Gujral, “EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System,” arXiv preprint arXiv:2509.10540, 2025. AAAI Fall Symposium Series 2025.
- [3] OWASP Foundation, “OWASP Top 10 for Large Language Model Applications 2025,” https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/, 2025. Accessed 2026-05-07.
- [4] OWASP Foundation, “OWASP Top 10 for Agentic Applications for 2026,” https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026, 2025. Accessed 2026-05-07.
- [5] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec ’23), 2023. arXiv:2302.12173.
- [6] F. Perez and I. Ribeiro, “Ignore Previous Prompt: Attack Techniques For Language Models,” arXiv preprint arXiv:2211.09527, 2022. NeurIPS 2022 ML Safety Workshop (Best Paper Award).
- [7] Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng, L. Y. Zhang, and Y. Liu, “Prompt Injection Attack Against LLM-integrated Applications,” arXiv preprint arXiv:2306.05499, 2023.
- [8] Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and Benchmarking Prompt Injection Attacks and Defenses,” in Proceedings of the 33rd USENIX Security Symposium, 2024.
- [9] J. Yi, Y. Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models,” arXiv preprint arXiv:2312.14197, 2023. Accepted to KDD 2025.
- [10] Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024. arXiv:2403.02691.
- [11] E. Debenedetti, J. Zhang, M. Balunović, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents,” in Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Datasets and Benchmarks Track, 2024. Spotlight; arXiv:2406.13352.
- [12] I. Evtimov, A. Zharmagambetov, A. Grattafiori, C. Guo, and K. Chaudhuri, “WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks,” arXiv preprint arXiv:2504.18575, 2025.
- [13] S. Abdelnabi, A. Fay, A. Salem, E. Zverev, K.-C. Liao, C.-H. Liu, C.-C. Kuo, J. Weigend, D. Manlangit, A. Apostolov, H. Umair, J. Donato, M. Kawakita, A. Mahboob, T. H. Bach, T.-H. Chiang, M. Cho, H. Choi, B. Kim, H. Lee, B. Pannell, C. McCauley, M. Russinovich, A. Paverd, and G. Cherubin, “LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injec...
- [14] S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game,” in Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024. Spotlight; arXiv:2311.01011. [Online]. A...
- [15] K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman, “Defending Against Indirect Prompt Injection Attacks With Spotlighting,” in Proceedings of the Conference on Applied Machine Learning in Information Security (CAMLIS 2024), ser. CEUR Workshop Proceedings, vol. 3920, 2024. arXiv:2403.14720. [Online]. Available: https://ceur-ws.org/Vol-3...
- [16] S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “StruQ: Defending Against Prompt Injection with Structured Queries,” in Proceedings of the 34th USENIX Security Symposium, 2025.
- [17] S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, D. Wagner, and C. Guo, “SecAlign: Defending Against Prompt Injection with Preference Optimization,” arXiv preprint arXiv:2410.05451, 2024. ACM CCS 2025.
- [18] E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr, “Defeating Prompt Injections by Design,” arXiv preprint arXiv:2503.18813, 2025.
- [19] L. Beurer-Kellner, B. Buesser, A.-M. Creţu, E. Debenedetti, D. Dobos, D. Fabian, M. Fischer, D. Froelicher, K. Grosse, D. Naeff, E. Ozoani, A. Paverd, F. Tramèr, and V. Volhejn, “Design Patterns for Securing LLM Agents against Prompt Injections,” arXiv preprint arXiv:2506.08837, 2025.
- [20] Microsoft AI Red Team, “PyRIT: Python Risk Identification Tool for Generative AI,” https://github.com/microsoft/PyRIT, 2024. Accessed 2026-05-08.
- [21] L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, and N. Inie, “garak: A Framework for Security Probing Large Language Models,” https://github.com/NVIDIA/garak, 2024. Accessed 2026-05-07.
- [22] Promptfoo, “promptfoo: Test Your Prompts, Agents, and RAGs,” https://github.com/promptfoo/promptfoo, 2024. Accessed 2026-05-07.
- [23] Protect AI, “Rebuff: A Self-Hardening Prompt Injection Detector,” https://github.com/protectai/rebuff, 2023. Accessed 2026-05-07.
- [24] Lakera, “Lakera AI Agent Security (formerly Lakera Guard),” https://www.lakera.ai/ai-agent-security, 2024–2026. Accessed 2026-05-08.
- [25] A. Cortesi, M. Hils, T. Kriechbaumer, and contributors, “mitmproxy: A free and open source interactive HTTPS proxy,” https://www.mitmproxy.org/, 2010–2026. Accessed 2026-05-08.