Dissecting Adversarial Robustness of Multimodal LM Agents

Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan · 2024 · arXiv 2406.12814

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

representative citing papers

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

cs.CR · 2026-04-28 · unverdicted · novelty 6.0

SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.

How Adversarial Environments Mislead Agentic AI?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning

cs.CR · 2026-04-18 · unverdicted · novelty 6.0

Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.

Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

cs.CR · 2026-04-22 · unverdicted · novelty 5.0

Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 37
HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 62
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents cs.CR · 2026-04-28 · unverdicted · none · ref 47
SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.
How Adversarial Environments Mislead Agentic AI? cs.AI · 2026-04-20 · unverdicted · none · ref 30
Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.
Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning cs.CR · 2026-04-18 · unverdicted · none · ref 9
Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing cs.CR · 2026-04-22 · unverdicted · none · ref 35
Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.

Dissecting Adversarial Robustness of Multimodal LM Agents

fields

years

verdicts

representative citing papers

citing papers explorer