Attacking Multimodal OS Agents with Malicious Image Patches

2025 · arXiv 2503.10809

6 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

UI traces of actions and timings from LLM browser agents enable identification of the underlying model with up to 96% F1 across 14 models and multiple tasks.

MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents

cs.CV · 2026-06-16 · unverdicted · novelty 5.0

MIRAGE creates perceptually benign adversarial images using diffusion and curvature-aware optimization to enable targeted prompt injection attacks on web agents like SeeAct and OpenClaw within attacker-controlled boundaries.

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

cs.CR · 2026-05-14 · unverdicted · novelty 5.0

WARD is a guard model trained on 177K web samples and adversarially hardened via attacker-guard co-evolution to achieve high recall on prompt injections with low false positives and no added latency.

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

cs.CR · 2026-04-14 · unverdicted · novelty 5.0

WebAgentGuard is a reasoning-driven multimodal model, trained on a large synthetic dataset via supervised fine-tuning and reinforcement learning, that detects prompt injections in web agents better than prior defenses.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

