Attacking Multimodal OS Agents with Malicious Image Patches
6 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted (6)
Roles: background (2)
Polarities: background (2)
Representative citing papers
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
citing papers explorer
- Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces
  UI traces of actions and timings from LLM browser agents enable identification of the underlying model with up to 96% F1 across 14 models and multiple tasks.
- MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents
  MIRAGE creates perceptually benign adversarial images using diffusion and curvature-aware optimization to enable targeted prompt injection attacks on web agents like SeeAct and OpenClaw within attacker-controlled boundaries.
- WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections
  WARD is a guard model trained on 177K web samples and adversarially hardened via attacker-guard co-evolution to achieve high recall on prompt injections with low false positives and no added latency.
- WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents
  WebAgentGuard is a reasoning-driven multimodal model trained on large synthetic data via supervised fine-tuning and reinforcement learning to detect prompt injections in web agents better than prior defenses.
- Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
  The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.