Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces
Recognition: 1 theorem link
Pith reviewed 2026-05-12 04:05 UTC · model grok-4.3
The pith
DUDE cuts web agents' susceptibility to deceptive interface elements by 53.8 percent while keeping task performance intact.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DUDE is a two-stage framework that combines hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance, reducing deception susceptibility by 53.8 percent on the RUC benchmark of 1,407 scenarios while maintaining task performance.
What carries the argument
DUDE (Deceptive UI Detector & Evaluator), a two-stage training process that applies hybrid-reward learning with asymmetric penalties for deceptive clicks and summarizes experience to create reusable avoidance rules.
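The asymmetric-penalty component can be pictured as simple reward shaping: a click on a deceptive element costs the agent far more than an ordinary failed step, so avoidance is learned without suppressing action altogether. A minimal sketch of that idea, with hypothetical function name and penalty magnitudes not taken from the paper:

```python
def shaped_reward(clicked_target, task_completed,
                  deception_penalty=-2.0,  # hypothetical: heavy cost for deceptive clicks
                  failure_penalty=-0.5,    # hypothetical: mild cost for ordinary failure
                  success_reward=1.0):
    """Asymmetric reward shaping: a deceptive click is penalized far more
    than an ordinary unsuccessful step, while success is rewarded normally."""
    if clicked_target == "deceptive":
        return deception_penalty
    if task_completed:
        return success_reward
    return failure_penalty
```

The asymmetry is the point: if deceptive clicks and ordinary failures carried equal cost, the cheapest policy could be inaction rather than learning to distinguish safe from deceptive targets.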
If this is right
- Existing vision-language web agents can be made markedly less likely to engage with deceptive UI elements.
- Task completion rates on legitimate web interactions remain unchanged after the training process.
- The method supplies a concrete starting point for building agents that operate reliably in environments with manipulative designs.
- The RUC benchmark offers a standardized way to measure and compare resistance to interface deception.
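A note on reading the headline number: the 53.8 percent figure is most naturally interpreted as a relative reduction in the deception-failure rate, not an absolute one. A small illustrative computation, where the baseline rate is hypothetical and chosen only for the example:

```python
def relative_reduction(baseline_rate, treated_rate):
    """Relative reduction in a failure rate, the usual reading of headline
    claims like 'reduces deception susceptibility by 53.8%'."""
    return (baseline_rate - treated_rate) / baseline_rate

# Hypothetical example: if a baseline agent fell for deceptive elements
# 50% of the time, a 53.8% relative reduction would bring that to 23.1%.
```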
Where Pith is reading between the lines
- The same training pattern could extend to spotting non-visual manipulation such as misleading text prompts or social engineering attempts.
- Production web agents equipped with this defense might lower the chance of unintended actions like sharing private data or making unwanted purchases.
- Layering DUDE with other safety checks could produce agents that handle increasingly complex and realistic web environments.
Load-bearing premise
Hybrid-reward learning with asymmetric penalties and experience summarization integrates cleanly into existing vision-language web agents, and the RUC benchmark captures the full range of real-world deceptive interfaces.
What would settle it
Testing DUDE on a fresh collection of deceptive web scenarios outside the RUC benchmark and finding no reduction in susceptibility or a drop in task success rates.
Original abstract
Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and propose DUDE (Deceptive UI Detector & Evaluator), a two-stage framework combining hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance. We introduce RUC (Real UI Clickboxes), a benchmark of 1,407 scenarios spanning four domains and deception categories. Experiments show DUDE reduces deception susceptibility by 53.8% while maintaining task performance, establishing an effective foundation for robust web agent deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DUDE, a two-stage framework for VLM-based web agents that uses hybrid-reward learning with asymmetric penalties and experience summarization to reduce susceptibility to deceptive UI elements. It presents the RUC benchmark consisting of 1,407 scenarios across four domains and deception categories, and reports that DUDE achieves a 53.8% reduction in deception susceptibility while preserving task performance.
Significance. If the empirical claims hold under broader validation, the work would provide a concrete defense mechanism and a new benchmark for an important safety issue in autonomous web agents. The hybrid-reward plus summarization approach offers a transferable way to distill failure patterns, which could influence future agent training pipelines.
major comments (2)
- [Abstract] The headline result of a 53.8% reduction in deception susceptibility is stated without any description of baselines, control conditions, variance, error bars, or statistical significance tests. This absence makes it impossible to evaluate whether the improvement is robust or merely an artifact of the evaluation setup.
- [RUC benchmark and Experiments] The 53.8% reduction and the claim of an 'effective foundation for robust web agent deployment' rest entirely on performance inside the closed RUC set of 1,407 scenarios. No evidence is supplied that the four domains and deception categories capture adaptive, context-dependent, or VLM-specific exploits that would appear on live web pages, so the transfer argument remains untested.
minor comments (1)
- [Abstract] The abstract and introduction would benefit from a short explicit statement of the precise metrics used to compute 'deception susceptibility' and 'task performance' so readers can immediately understand the reported trade-off.
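Per the paper's appendix, Task Success Rate (SR) is the share of episodes where the agent clicks inside the correct box without first triggering a deceptive element, and Deception-Induced Failure Rate (DFR) is the share of episodes where the agent clicks a deceptive element. A minimal sketch of how these two metrics could be computed from logged episodes; the episode schema and field names here are hypothetical:

```python
def compute_metrics(episodes):
    """SR: fraction of episodes with a correct click and no deceptive click.
    DFR: fraction of episodes where the agent clicked a deceptive element.
    Each episode is a dict with boolean fields (hypothetical schema):
    'clicked_correct' and 'clicked_deceptive'."""
    n = len(episodes)
    sr = sum(1 for e in episodes
             if e["clicked_correct"] and not e["clicked_deceptive"]) / n
    dfr = sum(1 for e in episodes if e["clicked_deceptive"]) / n
    return sr, dfr
```

Stating the trade-off in exactly these terms (SR held steady, DFR reduced) would make the abstract's "maintaining task performance" claim immediately checkable.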
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of how our results are presented and the scope of our evaluation. We address each major comment below and will make revisions to improve clarity and appropriately scope our claims.
Point-by-point responses
- Referee: [Abstract] The headline result of a 53.8% reduction in deception susceptibility is stated without any description of baselines, control conditions, variance, error bars, or statistical significance tests. This absence makes it impossible to evaluate whether the improvement is robust or merely an artifact of the evaluation setup.
Authors: We agree that the abstract would benefit from additional context. The full manuscript reports comparisons against standard VLM-based web agents as baselines, with results averaged across multiple runs including standard deviations. In the revised version, we will expand the abstract to concisely note the evaluation setup (e.g., 'relative to baseline VLM agents, averaged over runs with reported variance') and reference the statistical analysis in the experiments section. This addresses the concern without exceeding typical abstract length limits. revision: yes
- Referee: [RUC benchmark and Experiments] The 53.8% reduction and the claim of an 'effective foundation for robust web agent deployment' rest entirely on performance inside the closed RUC set of 1,407 scenarios. No evidence is supplied that the four domains and deception categories capture adaptive, context-dependent, or VLM-specific exploits that would appear on live web pages, so the transfer argument remains untested.
Authors: The RUC benchmark was constructed from observed real-world deceptive UI patterns across four domains to enable systematic, reproducible evaluation. We acknowledge that all reported results are on this static benchmark and do not include direct tests on live websites or against adaptive adversaries. In the revision, we will revise the abstract and conclusion to qualify the 'effective foundation' claim as applying to controlled settings, add explicit discussion of this limitation, and outline future directions for live-web validation. We cannot add new live-web experiments at this stage. revision: partial
Circularity Check
No significant circularity; the empirical result on the newly introduced benchmark is independent of the framework's own definitions.
Full rationale
The paper proposes the DUDE framework (hybrid-reward learning with asymmetric penalties and experience summarization) and introduces the RUC benchmark of 1,407 scenarios. It then reports an experimental outcome of 53.8% reduction in deception susceptibility while preserving task performance. No equations, parameter fits, self-citations, or uniqueness theorems are described that would reduce this measured improvement to a definitional or fitted tautology. The result is presented as an external evaluation on the newly defined benchmark, leaving the derivation chain self-contained and non-circular.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tagged: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Matched passage: "We propose DUDE ... hybrid-reward learning with asymmetric penalties and experience summarization ... RUC benchmark of 1,407 scenarios"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.