pith. machine review for the scientific record.

arxiv: 2605.09497 · v1 · submitted 2026-05-10 · 💻 cs.AI · cs.CR

Recognition: 1 theorem link · Lean Theorem

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:05 UTC · model grok-4.3

classification 💻 cs.AI cs.CR
keywords web agents · deceptive interfaces · vision-language models · hybrid reward learning · UI deception · benchmark · agent robustness

The pith

DUDE trains web agents to resist deceptive interface elements, cutting deception susceptibility by 53.8 percent while keeping task performance intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language model web agents can browse and act on websites but easily fall for misleading designs such as fake buttons or hidden traps. The paper presents DUDE, a two-stage method that combines hybrid rewards with asymmetric penalties for errors and summarizes past failures into guidance that agents can reuse. It also releases the RUC benchmark containing 1,407 real-world scenarios across multiple domains and deception types. Experiments show the approach cuts how often agents click deceptive elements by more than half. This matters for safe deployment of autonomous agents that handle everyday web tasks without being tricked into unwanted actions.

Core claim

DUDE is a two-stage framework that combines hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance, reducing deception susceptibility by 53.8 percent on the RUC benchmark of 1,407 scenarios while maintaining task performance.

What carries the argument

DUDE (Deceptive UI Detector & Evaluator), a two-stage training process that applies hybrid-reward learning with asymmetric penalties for deceptive clicks and summarizes experience to create reusable avoidance rules.
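
To make the asymmetric-penalty idea concrete, here is a minimal sketch of what such a reward shape could look like. The function, box encoding, and penalty magnitudes are illustrative assumptions, not the paper's hyperparameters; the only grounded ingredient is the asymmetry, which penalizes a deceptive click harder than an ordinary miss.

```python
# Sketch of a hybrid reward with asymmetric penalties. The magnitudes
# below are illustrative assumptions, not values from the paper; only
# the ordering (deceptive click < ordinary miss < correct click) is
# taken from the paper's description.

TASK_REWARD = 1.0         # click lands on the correct target
MISS_PENALTY = -0.5       # ordinary wrong click
DECEPTION_PENALTY = -2.0  # click on a deceptive element: penalized harder

def inside(x, y, box):
    """Boxes are (x0, y0, x1, y1) pixel rectangles (assumed encoding)."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def hybrid_reward(x, y, correct_box, deceptive_boxes):
    """Score one click against the labeled clickboxes of a screenshot."""
    if any(inside(x, y, b) for b in deceptive_boxes):
        return DECEPTION_PENALTY  # the asymmetry: worst possible outcome
    if inside(x, y, correct_box):
        return TASK_REWARD
    return MISS_PENALTY
```

The asymmetry encodes that a deceptive click is often irreversible (a purchase, a data grant), so a trained policy should prefer a harmless miss over a deceptive hit.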

If this is right

  • Existing vision-language web agents can be made markedly less likely to engage with deceptive UI elements.
  • Task completion rates on legitimate web interactions remain unchanged after the training process.
  • The method supplies a concrete starting point for building agents that operate reliably in environments with manipulative designs.
  • The RUC benchmark offers a standardized way to measure and compare resistance to interface deception.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training pattern could extend to spotting non-visual manipulation such as misleading text prompts or social engineering attempts.
  • Production web agents equipped with this defense might lower the chance of unintended actions like sharing private data or making unwanted purchases.
  • Layering DUDE with other safety checks could produce agents that handle increasingly complex and realistic web environments.

Load-bearing premise

That hybrid-reward learning with asymmetric penalties and experience summarization integrates cleanly into existing vision-language model web agents, and that the RUC benchmark captures the full range of real-world deceptive interfaces.
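
As a sketch of how that integration could look, the loop below follows the stage-2 description in the figure captions: negative-reward samples are collected into a failure pool, a meta-summarizer distills them into guidance, and the guidance is reused in the agent's prompt. The class names, record fields, and the `summarize_failures` callable are illustrative assumptions.

```python
# Sketch of stage-2 experience summarization. Field names and the
# summarizer callable are illustrative assumptions; the flow (failure
# pool -> meta-summarizer -> reusable guidance) follows the paper's
# figure captions.

from dataclasses import dataclass, field

@dataclass
class FailureSample:
    screenshot: bytes   # per the paper, each sample keeps its full context:
    task: str           # task specification,
    click_xy: tuple     # click coordinate,
    label: str          # ground-truth label, and
    reward: float       # the (negative) evaluator reward

@dataclass
class ExperiencePool:
    failures: list = field(default_factory=list)
    guidance: str = ""  # reusable rules prepended to the agent prompt

    def collect(self, sample: FailureSample) -> None:
        if sample.reward < 0:  # only negative-reward samples enter the pool
            self.failures.append(sample)

    def distill(self, summarize_failures) -> str:
        # summarize_failures stands in for the meta-summarizer VLM call.
        self.guidance = summarize_failures(self.guidance, self.failures)
        self.failures.clear()  # an exhausted pool is one stop condition
        return self.guidance
```

At inference, the distilled guidance string rides along with each task prompt, which is what makes the failure patterns transferable to new pages.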

What would settle it

Testing DUDE on a fresh collection of deceptive web scenarios outside the RUC benchmark and finding no reduction in susceptibility or a drop in task success rates.
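
A sketch of how such a held-out suite could be scored. The two rates follow the metric definitions extracted from the paper's appendix: task success rate (SR) counts episodes whose click lands in the correct box without first triggering a deceptive element, and deception-induced failure rate (DFR) counts episodes whose click lands in a deceptive box. The episode record is an assumed structure.

```python
# Score a held-out deception suite with the paper's two reported rates.
# Metric definitions follow the extracted appendix; the episode record
# (ordered clicks plus labeled boxes) is an assumed structure.

def inside(xy, box):
    (x, y), (x0, y0, x1, y1) = xy, box
    return x0 <= x <= x1 and y0 <= y <= y1

def score_suite(episodes):
    successes = deceived = 0
    for ep in episodes:
        done = tricked = False
        for xy in ep["clicks"]:  # ordered agent actions
            if any(inside(xy, b) for b in ep["deceptive_boxes"]):
                tricked = True   # DFR event; the episode fails
                break
            if inside(xy, ep["correct_box"]):
                done = True      # SR event
                break
        deceived += tricked
        successes += done
    n = len(episodes)
    return {"SR": successes / n, "DFR": deceived / n}
```

Running this on scenarios outside RUC, before and after DUDE training, would make the transfer question concrete: the claim survives only if DFR drops on the fresh suite, roughly in line with the in-benchmark 53.8 percent, while SR holds.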

Figures

Figures reproduced from arXiv: 2605.09497 by Chunyu Wei, Xin Wang, Yilin Zhang, Yingkai Hua, Yueguo Chen.

Figure 1. Overview of DUDE. Left: agents without defenses succumb to deception; simple refusal causes over-conservatism. Right: DUDE achieves calibrated evaluation through hybrid-reward learning and experience summarization.
Figure 2. The DUDE framework. Stage 1 performs hybrid reward learning. Stage 2 conducts experience summarization. Inference applies deception-aware gating.
Figure 3. General performance improvement with DUDE (↑ better).
Figure 4. (i) Training progress measured by validation reward over steps: the reward steadily improves, indicating the evaluator learns a better policy under the hybrid reward. (ii) Policy dispersion measured by entropy over steps: entropy initially rises, then falls significantly and remains low in later stages while the reward continues to improve.
Figure 5. Reward-entropy: behavior policy collapse/ex…
Figure 6. Effect of prompt strategies on robustness un…
Figure 7. Visualization of normal samples. The green box indicates the user's intended target.
Figure 8. Visualization of deception samples. The green box is the correct target, while the red box highlights the deceptive pattern.
read the original abstract

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and propose DUDE (Deceptive UI Detector & Evaluator), a two-stage framework combining hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance. We introduce RUC (Real UI Clickboxes), a benchmark of 1,407 scenarios spanning four domains and deception categories. Experiments show DUDE reduces deception susceptibility by 53.8% while maintaining task performance, establishing an effective foundation for robust web agent deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces DUDE, a two-stage framework for VLM-based web agents that uses hybrid-reward learning with asymmetric penalties and experience summarization to reduce susceptibility to deceptive UI elements. It presents the RUC benchmark consisting of 1,407 scenarios across four domains and deception categories, and reports that DUDE achieves a 53.8% reduction in deception susceptibility while preserving task performance.

Significance. If the empirical claims hold under broader validation, the work would provide a concrete defense mechanism and a new benchmark for an important safety issue in autonomous web agents. The hybrid-reward plus summarization approach offers a transferable way to distill failure patterns, which could influence future agent training pipelines.

major comments (2)
  1. [Abstract] The headline result of a 53.8% reduction in deception susceptibility is stated without any description of baselines, control conditions, variance, error bars, or statistical significance tests. This absence makes it impossible to evaluate whether the improvement is robust or merely an artifact of the evaluation setup.
  2. [RUC benchmark and Experiments] RUC benchmark description and experimental section: The 53.8% reduction and the claim of an 'effective foundation for robust web agent deployment' rest entirely on performance inside the closed RUC set of 1,407 scenarios. No evidence is supplied that the four domains and deception categories capture adaptive, context-dependent, or VLM-specific exploits that would appear on live web pages, so the transfer argument remains untested.
minor comments (1)
  1. [Abstract] The abstract and introduction would benefit from a short explicit statement of the precise metrics used to compute 'deception susceptibility' and 'task performance' so readers can immediately understand the reported trade-off.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of how our results are presented and the scope of our evaluation. We address each major comment below and will make revisions to improve clarity and appropriately scope our claims.

read point-by-point responses
  1. Referee: [Abstract] The headline result of a 53.8% reduction in deception susceptibility is stated without any description of baselines, control conditions, variance, error bars, or statistical significance tests. This absence makes it impossible to evaluate whether the improvement is robust or merely an artifact of the evaluation setup.

    Authors: We agree that the abstract would benefit from additional context. The full manuscript reports comparisons against standard VLM-based web agents as baselines, with results averaged across multiple runs including standard deviations. In the revised version, we will expand the abstract to concisely note the evaluation setup (e.g., 'relative to baseline VLM agents, averaged over runs with reported variance') and reference the statistical analysis in the experiments section. This addresses the concern without exceeding typical abstract length limits. revision: yes

  2. Referee: [RUC benchmark and Experiments] RUC benchmark description and experimental section: The 53.8% reduction and the claim of an 'effective foundation for robust web agent deployment' rest entirely on performance inside the closed RUC set of 1,407 scenarios. No evidence is supplied that the four domains and deception categories capture adaptive, context-dependent, or VLM-specific exploits that would appear on live web pages, so the transfer argument remains untested.

    Authors: The RUC benchmark was constructed from observed real-world deceptive UI patterns across four domains to enable systematic, reproducible evaluation. We acknowledge that all reported results are on this static benchmark and do not include direct tests on live websites or against adaptive adversaries. In the revision, we will revise the abstract and conclusion to qualify the 'effective foundation' claim as applying to controlled settings, add explicit discussion of this limitation, and outline future directions for live-web validation. We cannot add new live-web experiments at this stage. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical result on introduced benchmark is independent of inputs

full rationale

The paper proposes the DUDE framework (hybrid-reward learning with asymmetric penalties and experience summarization) and introduces the RUC benchmark of 1,407 scenarios. It then reports an experimental outcome of 53.8% reduction in deception susceptibility while preserving task performance. No equations, parameter fits, self-citations, or uniqueness theorems are described that would reduce this measured improvement to a definitional or fitted tautology. The result is presented as an external evaluation on the newly defined benchmark, leaving the derivation chain self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract-only review surfaces no explicit free parameters, axioms, or invented entities; the full paper would be required for a complete ledger.

pith-pipeline@v0.9.0 · 5414 in / 1040 out tokens · 41507 ms · 2026-05-12T04:05:48.211991+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

    Time-Sensitive Intent: For tasks with explicit time limits (e.g., ’Offer expires in 5:00’), identify the user’s true goal first. Is the timer a genuine constraint or a ’Urgency’ tactic to force a hasty, incorrect decision?" "Deceptive UI elements often appear as modal overlays, pop-ups, or prominent buttons designed to interrupt the user" H Stage-2 Hyperp...