arxiv: 2605.05509 · v1 · submitted 2026-05-06 · 💻 cs.CR

Recognition: unknown

WAAA! Web Adversaries Against Agentic Browsers

Sohom Datta , Alex Nahapetyan , William Enck , Alexandros Kapravelos

Authors on Pith no claims yet

Pith reviewed 2026-05-08 16:04 UTC · model grok-4.3

classification 💻 cs.CR

keywords agentic browsersweb securityLLM agentsprompt injectionthreat modelbrowser vulnerabilitiesconfused deputyweb attacks

0 comments

The pith

Agentic browsers exhibit five major failure modes against traditional web attacks and LLM threats, requiring rearchitecture to handle the current web safely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that browsers enhanced with large language models to act autonomously on the web create new security problems by exposing agents to both classic web attacks designed for humans and newer LLM-specific manipulations. It builds a threat model that treats the agent as a confused deputy, derives a taxonomy of 20 attacks, implements 18 of them, and demonstrates that ten traditional web threats reappear in stronger forms. A sympathetic reader would care because these systems are already being deployed, yet they cannot reliably tell malicious page content from legitimate task instructions. Testing shows the problems appear consistently across four different LLM models from multiple vendors.

Core claim

Agentic browsers exhibit five major failure modes when facing traditional and LLM web threats. The work extends the See→Act model to cover all browser components and frames the agent as a confused deputy unable to distinguish task steps from attacks. A taxonomy of 20 attacks is derived, 18 are implemented, and a generalizability study on 14 attacks across four LLMs shows that ten web threats reemerge often in amplified forms, proving that current designs are not ready for the live web.

What carries the argument

The extended See→Act browser agent model, which accounts for all browser components and frames the agent as a confused deputy unable to separate legitimate task steps from untrusted web content.

If this is right

Ten classic web threats, including social engineering attacks, return in amplified forms once an agent can be influenced by untrusted page content.
The attacks reproduce across four major LLM models spanning multiple vendors.
Agentic browsers must be rearchitected before they can be considered ready for the current web.
The confused deputy framing reveals that agents cannot reliably separate task instructions from malicious content.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers of agentic browsers may need to add independent verification layers that check page trustworthiness before executing actions.
Similar confused-deputy problems could appear in other LLM-controlled interfaces that interact with untrusted external data.
Web standards might eventually need explicit signals that help agents detect when content is trying to hijack their goals.

Load-bearing premise

The 18 implemented attacks plus the tests of 14 attacks on four LLMs are assumed to represent real-world agent behavior on live websites, and the extended See→Act model is assumed to capture every relevant browser interaction without omissions.

What would settle it

A working agentic browser that completes representative user tasks on live websites without triggering any of the five failure modes or falling for the 20 attacks in the taxonomy would show the central claim does not hold.

Figures

Figures reproduced from arXiv: 2605.05509 by Alexandros Kapravelos, Alex Nahapetyan, Sohom Datta, William Enck.

**Figure 1.** Figure 1: A demonstration of indirect prompt injection vs traditional web attacks view at source ↗

**Figure 2.** Figure 2: Agentic browsers under the See→Act model Definition (Agentic browser). We define a agentic browser as a high-level system that consists of an LLM and a browser, with the LLM given access to a set of tools to interact with the user, the website, and the browser itself. There are typically two kinds of agentic browser products: (1) where the LLM cannot invoke any tools to interact with the browser and only p… view at source ↗

**Figure 3.** Figure 3: Agentic browsers in our threat model within the view at source ↗

**Figure 4.** Figure 4: Taxonomy of agentic browsers. against indirect prompt injections, rather than safeguarding capabilities away from a traditional web attacker’s goals. 5.3 Broad Failure Modes Using our taxonomy, we identify 5 broad, web-browser-focused failure modes that can be used to design defenses against a traditional web adversary targeting an agentic browser. These failure modes are as follows: 5.3.1 Agents Bridge … view at source ↗

**Figure 5.** Figure 5: System prompt used for proof-of-concepts view at source ↗

**Figure 6.** Figure 6: The obfuscated JavaScript for the XS-4 that was view at source ↗

read the original abstract

Large language models (LLMs) are increasingly being integrated into web browsers to create agentic browsing systems that execute actions on behalf of the user. Prior work considering the security of agentic browsers focuses exclusively on indirect prompt-injection attacks. However, by failing to consider traditional web attacks, previous agentic browser threat models have a blind spot to web social engineering attacks originally designed to trick humans. In this paper, we propose the first web-focused threat model for agentic browsers and use it to derive a taxonomy of 20 attacks across both the web and LLM space, and implement 18 of the attacks. Our threat model extends the original See$\rightarrow$Act browser agent model to account for all components of a browser, and frames the agent as a confused deputy unable to distinguish task steps from traditional web attacks. We show that 10 web threats can reemerge often in amplified forms once an agent can be influenced by untrusted page content. We further conduct a generalizability study on 14 of the 20 attacks, showing that our attacks reproduce across 4 major LLM models spanning multiple vendors. We show that agentic browsers exhibit five major failure modes when facing traditional and LLM web threats, demonstrating the need to rearchitect agentic browsers before they are ready for the current web.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper usefully extends agentic browser threat models to traditional web attacks and delivers a concrete taxonomy plus implementations, but the five failure modes rest on a narrow set of tests that may not yet justify broad rearchitecture claims.

read the letter

The core contribution here is showing that agentic browsers are exposed to classic web attacks like social engineering and UI deception, not just prompt injection. The authors extend the See→Act model to cover full browser components and produce a taxonomy of 20 attacks, then implement 18 and reproduce 14 across four LLMs from different vendors. That reproduction step is the part that stands out as actual work rather than just modeling.

Referee Report

3 major / 2 minor

Summary. The paper proposes the first web-focused threat model for agentic browsers by extending the See→Act model to treat the agent as a confused deputy. It derives a taxonomy of 20 attacks spanning web and LLM threats (implementing 18), shows that 10 traditional web threats reemerge in amplified forms, conducts a generalizability study reproducing 14 attacks across 4 LLMs, identifies five major failure modes, and concludes that agentic browsers require rearchitecting before deployment on the current web.

Significance. If the empirical results hold under more detailed scrutiny, the work is significant for bridging traditional web security and LLM agent threats in browsers. The implementation of 18 attacks and cross-model reproduction on 4 LLMs provide concrete, falsifiable examples of vulnerabilities that could influence secure design of agentic systems. This is a timely contribution given the rapid integration of LLMs into browsers.

major comments (3)

[Threat Model] Threat Model section: the extension of the See→Act model to account for all browser components (security policies, rendering, consent flows) is central to framing the confused deputy and deriving the taxonomy. The paper does not detail how these components are modeled or integrated, raising the possibility that the five failure modes are artifacts of an incomplete model rather than inherent to agentic browsers.
[Generalizability Study] Generalizability Study section: the reproduction of 14 attacks across 4 LLMs is used to support the five failure modes and the rearchitecting claim. Without full methods, controls, exact success metrics, or raw data, it is not possible to confirm that the modes are general rather than setup-specific, directly undermining the central claim as noted in the soundness assessment.
[Evaluation] Evaluation of Implemented Attacks: the claim that 10 web threats 'reemerge often in amplified forms' is load-bearing for the taxonomy and conclusion. No quantitative comparison (e.g., success rates or amplification factors versus non-agent baselines) is provided to substantiate 'amplified' or the broad need for rearchitecting.

minor comments (2)

[Abstract] Abstract: the five failure modes are referenced but not enumerated, reducing clarity for readers skimming the contribution.
[Taxonomy] The taxonomy derivation process could be more explicitly tied to the threat model with a table or diagram showing how each attack maps to See→Act components.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments and the opportunity to clarify and strengthen our manuscript. We address each of the major comments point-by-point below.

read point-by-point responses

Referee: [Threat Model] Threat Model section: the extension of the See→Act model to account for all browser components (security policies, rendering, consent flows) is central to framing the confused deputy and deriving the taxonomy. The paper does not detail how these components are modeled or integrated, raising the possibility that the five failure modes are artifacts of an incomplete model rather than inherent to agentic browsers.

Authors: The See→Act extension is presented in Section 3, where we explicitly incorporate browser components such as security policies, rendering, and consent flows into the agent's decision process to frame it as a confused deputy. This modeling directly informs the taxonomy by showing how attacks can exploit these interfaces. To address the concern about potential artifacts, we will provide a more detailed diagram and step-by-step integration description in the revised manuscript to demonstrate that the failure modes arise from the inherent architecture rather than modeling gaps. revision: partial
Referee: [Generalizability Study] Generalizability Study section: the reproduction of 14 attacks across 4 LLMs is used to support the five failure modes and the rearchitecting claim. Without full methods, controls, exact success metrics, or raw data, it is not possible to confirm that the modes are general rather than setup-specific, directly undermining the central claim as noted in the soundness assessment.

Authors: We agree that the current description of the generalizability study lacks sufficient detail for full reproducibility. In the revised manuscript, we will expand the Generalizability Study section to include complete methods, experimental controls, precise success metrics (e.g., attack success rate thresholds), and we will release the raw data and prompts used in the experiments via an open repository. This will enable verification that the five failure modes are consistent across the tested LLMs. revision: yes
Referee: [Evaluation] Evaluation of Implemented Attacks: the claim that 10 web threats 'reemerge often in amplified forms' is load-bearing for the taxonomy and conclusion. No quantitative comparison (e.g., success rates or amplification factors versus non-agent baselines) is provided to substantiate 'amplified' or the broad need for rearchitecting.

Authors: The observation that 10 web threats reemerge in amplified forms is based on the successful implementation of 18 attacks and the analysis showing that agentic execution removes human oversight, leading to higher success and impact in cases like automated credential theft or content manipulation. While we did not include direct quantitative baselines in the original submission, we will add a comparative evaluation in the revised paper, including success rate comparisons drawn from prior web security literature for non-agentic scenarios where applicable, to better substantiate the amplification claim and the rearchitecting recommendation. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical attack implementations and taxonomy derivation are independent of inputs.

full rationale

The paper extends the See→Act model to create a threat model, derives a taxonomy of 20 attacks from it, implements 18 attacks, and reproduces 14 across 4 LLMs to identify five failure modes. These steps consist of concrete engineering, attack construction, and empirical testing rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. The central claim that agentic browsers exhibit the failure modes rests on the observed behavior of the implemented attacks, which are falsifiable outside the paper and do not reduce to the threat model by construction. No equations or uniqueness theorems are invoked in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical security paper with no mathematical derivations, free parameters, or invented physical entities; the threat model and taxonomy are conceptual constructs grounded in implemented attacks.

pith-pipeline@v0.9.0 · 5535 in / 1082 out tokens · 38405 ms · 2026-05-08T16:04:15.506173+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

59 extracted references · 19 canonical work pages · 1 internal anchor

[1]

d.].Blog | Windsurf

[n. d.].Blog | Windsurf. https://windsurf.com/blog
[2]

d.].Cursor Docs

Cursor Documentation [n. d.].Cursor Docs. Cursor Documentation. https: //cursor.com/docs
[3]

d.].Dia Browser | AI Chat With Your Tabs

Dia Browser [n. d.].Dia Browser | AI Chat With Your Tabs. Dia Browser. https: //www.diabrowser.com
[4]

d.].GitHub Copilot·Your AI Pair Programmer

GitHub [n. d.].GitHub Copilot·Your AI Pair Programmer. GitHub. https: //github.com/features/copilot
[5]

d.].Introducing Claude Sonnet 4.5

[n. d.].Introducing Claude Sonnet 4.5. https://www.anthropic.com/news/claude- sonnet-4-5
[6]

d.].Microsoft/Playwright-Mcp

Microsoft [n. d.].Microsoft/Playwright-Mcp. Microsoft. https://github.com/ microsoft/playwright-mcp
[7]

d.].Piloting Claude for Chrome

[n. d.].Piloting Claude for Chrome. https://www.anthropic.com/news/claude- for-chrome
[8]

https://www

2024.Remove Polyfill.Io Code from Your Website Immediately. https://www. theregister.com/2024/06/25/polyfillio_china_crisis/

2024
[9]

https://brave.com/blog/comet-prompt-injection/

2025. https://brave.com/blog/comet-prompt-injection/

2025
[10]

https://brave.com/blog/unseeable-prompt-injections/

2025. https://brave.com/blog/unseeable-prompt-injections/

2025
[11]

https://neuraltrust.ai/blog/openai-atlas-omnibox-prompt-injection

2025. https://neuraltrust.ai/blog/openai-atlas-omnibox-prompt-injection

2025
[12]

Microsoft Copi- lot

Microsoft Copilot 2025.AI Browser: Copilot Mode in Edge. Microsoft Copi- lot. https://www.microsoft.com/en-us/microsoft-copilot/for-individuals/do- more-with-ai/ai-for-daily-life/ai-browser-innovation-with-copilot-in-edge

2025
[13]

Browseros-Ai/BrowserOS

2025. Browseros-Ai/BrowserOS. BrowserOS

2025
[14]

https://www.claude.com/product/claude-code

2025.Claude Code | Claude. https://www.claude.com/product/claude-code

2025
[15]

https://www.perplexity.ai/comet/

2025.Comet Browser: A Personal AI Assistant. https://www.perplexity.ai/comet/

2025
[16]

https: //www.google.com/chrome/ai-innovations/

2025.Gemini in Chrome | The next Generation of AI in Chrome | Chrome. https: //www.google.com/chrome/ai-innovations/

2025
[17]

Introducing ChatGPT Agent: Bridging Research and Action

2025. Introducing ChatGPT Agent: Bridging Research and Action. https: //openai.com/index/introducing-chatgpt-agent/

2025
[18]

https://openai.com/index/introducing-chatgpt- atlas/

2025.Introducing ChatGPT Atlas. https://openai.com/index/introducing-chatgpt- atlas/

2025
[19]

Google DeepMind

Google DeepMind 2025.Project Mariner. Google DeepMind. https://deepmind. google/models/project-mariner/

2025
[20]

Steel-Dev/Awesome-Web-Agents

2025. Steel-Dev/Awesome-Web-Agents. Steel

2025
[21]

Devdatta Akhawe, Adam Barth, Peifung E Lam, John Mitchell, and Dawn Song
[22]

In2010 23rd IEEE Computer Security Foundations Symposium

Towards a formal foundation of web security. In2010 23rd IEEE Computer Security Foundations Symposium. IEEE, 290–304
[23]

Adam Barth, Collin Jackson, Charles Reis, TGC Team, et al. 2008. The security architecture of the chromium browser. InTechnical report. Stanford University

2008
[24]

Microsoft Corporate Blogs. [n. d.]. Introducing NLWeb: Bringing Conversational Interfaces Directly to the Web
[25]

Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, and Krishnamurthy Dvijotham. 2025. DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats. https: //doi.org/10.48550/arXiv.2504.14064 arXiv:2504.14064 [cs]

work page doi:10.48550/arxiv.2504.14064 2025
[26]

CoRR abs/2502.20383(2025) PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization 17

Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, and Yizheng Chen. 2025.Why Are Web AI Agents More Vulnerable Than Stan- dalone LLMs? A Security Analysis. https://doi.org/10.48550/arXiv.2502.20383 arXiv:2502.20383 [cs]

work page doi:10.48550/arxiv.2502.20383 2025
[27]

Marco Cova, Christopher Kruegel, and Giovanni Vigna. 2010. Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code. In Proceedings of the 19th International Conference on World Wide Web (WWW ’10). Association for Computing Machinery, New York, NY, USA, 281–290. https://doi.org/10.1145/1772690.1772720

work page doi:10.1145/1772690.1772720 2010
[28]

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2Web: Towards a Generalist Agent for the Web. https://doi.org/10.48550/arXiv.2306.06070 arXiv:2306.06070 [cs]

work page doi:10.48550/arxiv.2306.06070 2023
[29]

W ASP: Benchmarking web agent security against prompt injection attacks.arXiv preprint arXiv:2504.18575, 2025

Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, and Ka- malika Chaudhuri. 2025.W ASP: Benchmarking Web Agent Security Against Prompt Injection Attacks. https://doi.org/10.48550/arXiv.2504.18575 arXiv:2504.18575 [cs]

work page doi:10.48550/arxiv.2504.18575 2025
[30]

Fellou. [n. d.].Fellou Browser 2.0: Faster, More Amazing, and More Reliable than Ever.https://fellou.ai/blog/fellou-v2-launch/
[31]

Firefox. [n. d.]. Access AI Chatbots in Firefox. ([n. d.]). https://support.mozilla. org/en-US/kb/ai-chatbot#w_what-to-keep-in-mind-when-using-ai-chatbots
[32]

Anny Gakhokidze and Neha Kochar. 2021. Introducing Site Isolation in Firefox

2021
[33]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not What You’ve Signed up for: Compromising Real- World Llm-Integrated Applications with Indirect Prompt Injection. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security(2023). 79–90

2023
[34]

Wang, Stuart Schecter, and Collin Jackson

Lin-Shung Huang, Alex Moshchuk, Helen J. Wang, Stuart Schecter, and Collin Jackson. [n. d.]. Clickjacking: Attacks and Defenses. 413–
[35]

https://www.usenix.org/conference/usenixsecurity12/technical- sessions/presentation/huang
[36]

Lukas Knittel, Christian Mainka, Marcus Niemietz, Dominik Trevor Noß, and Jörg Schwenk. 2021. XSinator.Com: From a Formal Model to the Automatic Evaluation of Cross-Site Leaks in Web Browsers. InProceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21). Association for Computing Machinery, New York, NY, USA, 1771–1788...

work page doi:10.1145/3460120.3484739 2021
[37]

Pierre Laperdrix, Oleksii Starov, Quan Chen, Alexandros Kapravelos, and Nick Nikiforakis. 2021. Fingerprinting in Style: Detecting Browser Extensions via Injected Style Sheets. InProceedings of the USENIX Security Symposium. 2507– 2524

2021
[38]

Inala, Chenglong Wang, Steven M

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. 2025.EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage. https://doi.org/10.48550/arXiv.2409. 11295 arXiv:2409.11295 [cs]

work page doi:10.48550/arxiv.2409 2025
[39]

Jungwon Lim, Yonghwi Jin, Mansour Alharthi, Xiaokuan Zhang, Jinho Jung, Rajat Gupta, Kuilin Li, Daehee Jang, and Taesoo Kim. 2021. SOK: On the Analysis of Web Browser Security. https://doi.org/10.48550/arXiv.2112.15561 arXiv:2112.15561 [cs]

work page doi:10.48550/arxiv.2112.15561 2021
[40]

Jungwon Lim, Yonghwi Jin, Mansour Alharthi, Xiaokuan Zhang, Jinho Jung, Rajat Gupta, Kuilin Li, Daehee Jang, and Taesoo Kim. 2021. SOK: On the Analysis of Web Browser Security. arXiv:2112.15561 [cs.CR] https://arxiv.org/abs/2112.15561

work page arXiv 2021
[41]

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, and Hai Zhao. [n. d.].Caution for the Environment: Multimodal LLM Agents Are Susceptible to Environmental Distractions. https://doi.org/10.48550/arXiv.2408. 02544 arXiv:2408.02544 [cs]

work page doi:10.48550/arxiv.2408
[42]

Mathur, G

Arunesh Mathur, Gunes Acar, Michael J. Friedman, Eli Lucherini, Jonathan Mayer, Marshini Chetty, and Arvind Narayanan. 2019. Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites.Proc. ACM Hum.-Comput. Interact.3, CSCW (Nov. 2019), 81:1–81:32. https://doi.org/10.1145/3359183

work page doi:10.1145/3359183 2019
[43]

2024.Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Itay Nakash, George Kour, Guy Uziel, and Ateret Anaby-Tavor. 2024.Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. https://doi.org/10.48550/ arXiv.2410.16950 arXiv:2410.16950 [cs]

work page arXiv 2024
[44]

Adam Oest, Penghui Zhang, Brad Wardman, Eric Nunes, Jakub Burgis, Ali Zand, Kurt Thomas, Adam Doupé, and Gail-Joon Ahn. 2020. Sunrise to Sunset: Analyz- ing the End-to-end Life Cycle and Effectiveness of Phishing Attacks at Scale. In 29th USENIX Security Symposium (USENIX Security 20). 361–377

2020
[45]

Harun Oz, Ahmet Aris, Abbas Acar, Güliz Seray Tuncay, Leonardo Babun, and Selcuk Uluagac. 2023. RøB: Ransomware over Modern Web Browsers. In32nd USENIX Security Symposium (USENIX Security 23). 7073–7090

2023
[46]

Nikolaos Pantelaios and Alexandros Kapravelos. 2024. FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques. In33rd USENIX Security Symposium (USENIX Security 24). 3747–3764

2024
[47]

Qwen Team. 2026. Qwen3.6-Plus: Towards Real World Agents. https://qwen.ai/ blog?id=qwen3.6

2026
[48]

Charles Reis, Alexander Moshchuk, and Nasko Oskov. 2019. Site Isolation: Process Separation for Web Sites within the Browser. In28th USENIX Security Symposium (USENIX Security 19). 1661–1678. 13 ACM CCS ’26, June 03–05, 2018, Woodstock, NY Anonymous Author(s)

2019
[49]

Ax Sharma. [n. d.].Third Npm Protestware: ’event-Source-Polyfill’ Calls Russia Out. BleepingComputer. https://www.bleepingcomputer.com/news/security/third- npm-protestware-event-source-polyfill-calls-russia-out/
[50]

Opera Software. [n. d.].Opera Neon. This Browser Is Built to Act.Opera Neon. https://operaneon.com
[51]

Jeffrey Spaulding, DaeHun Nyang, and Aziz Mohaisen. 2017. Understanding the Effectiveness of Typosquatting Techniques. InProceedings of the Fifth ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb ’17). Association for Computing Machinery, New York, NY, USA, 1–8. https://doi.org/10.1145/ 3132465.3132467

work page arXiv 2017
[52]

Kevin Stubbings. 2024. Attacking Browser Extensions

2024
[53]

Antoine Vastel, Walter Rudametkin, Romain Rouvoy, and Xavier Blanc. 2020. FP- Crawlers: Studying the Resilience of Browser Fingerprinting to Block Crawlers. InMADWeb’20 - NDSS Workshop on Measurements, Attacks, and Defenses for the Web, Oleksii Starov, Alexandros Kapravelos, and Nick Nikiforakis (Eds.). San Diego, United States. https://doi.org/10.14722/n...

work page doi:10.14722/ndss.2020.23xxx 2020
[54]

Michelle Warburg. 2025. LayerX Finds that Perplexity’s Comet Browser is Up To 85% More Vulnerable to Phishing and Web Attacks Than Chrome. https://layerxsecurity.com/blog/layerx-finds-that-perplexitys-comet-browser- is-up-to-85-more-vulnerable-to-phishing-and-web-attacks-than-chrome/

2025
[55]

Quantum error thresholds for gauge-redundant digitiza- tions of lattice field theories

Fangzhou Wu, Shutong Wu, Yulong Cao, and Chaowei Xiao. 2024.WIPI: A New Web Threat for LLM-Driven Web Agents. https://doi.org/10.48550/arXiv.2402. 16965 arXiv:2402.16965 [cs]

work page doi:10.48550/arxiv.2402 2024
[56]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. [n. d.].ReAct: Synergizing Reasoning and Acting in Language Models. https://doi.org/10.48550/arXiv.2210.03629 arXiv:2210.03629 [cs]

work page internal anchor Pith review doi:10.48550/arxiv.2210.03629
[57]

Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, and Ninghui Li. 2025. BrowseSafe: Understanding and Preventing Prompt Injec- tion Within AI Browser Agents. arXiv:2511.20597 [cs.LG] https://arxiv.org/abs/ 2511.20597

work page arXiv 2025
[58]

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. 2024. GPT-4V(ision) is a Generalist Web Agent, if Grounded. arXiv:2401.01614 [cs.IR] https://arxiv. org/abs/2401.01614

work page arXiv 2024
[59]

WebArena: A Realistic Web Environment for Building Autonomous Agents

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. https://doi.org/10.48550/arXiv.2307.13854 arXiv:2307.13854 [cs] A Open Science We will release the code for our proof-of-co...

work page Pith review doi:10.48550/arxiv.2307.13854 2024