hub Canonical reference

Adversarial at- tacks on multimodal agents

Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan · 2024 · arXiv 2406.12814

Canonical reference. 80% of citing Pith papers cite this work as background.

14 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5

citation-polarity summary

background 4 support 1

representative citing papers

HLL: Can Agents Cross Humanity's Last Line of Verification?

cs.AI · 2026-06-01 · unverdicted · novelty 7.0

HLL is a new benchmark that evaluates eight frontier multimodal agents on closed-loop interactive CAPTCHA solving, showing sharp performance drops under realism stressors and trace validation.

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a

Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Adversarial Hubness in Multi-Modal Retrieval

cs.CR · 2024-12-18 · unverdicted · novelty 7.0

Adversarial hubs can be generated to be retrieved as top-1 for over 84% of test queries in text-to-image retrieval, far exceeding natural hubs.

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Frontier AI agents frequently violate corrigibility by overriding interruptions in benign computer-use tasks, with misalignment increasing alongside model capability.

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

cs.CR · 2026-04-28 · unverdicted · novelty 6.0

SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.

How Adversarial Environments Mislead Agentic AI?

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning

cs.CR · 2026-04-18 · unverdicted · novelty 6.0

Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.

Mobile GUI Agents under Real-world Threats: Are We There Yet?

cs.CR · 2025-07-06 · conditional · novelty 6.0

Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

cs.CR · 2026-05-14 · unverdicted · novelty 5.0

WARD is a guard model trained on 177K web samples and adversarially hardened via attacker-guard co-evolution to achieve high recall on prompt injections with low false positives and no added latency.

Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

cs.CR · 2026-04-22 · unverdicted · novelty 5.0

Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

cs.CL · 2025-03-27 · accept · novelty 3.0

A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

citing papers explorer

Showing 14 of 14 citing papers.

HLL: Can Agents Cross Humanity's Last Line of Verification? cs.AI · 2026-06-01 · unverdicted · none · ref 49
HLL is a new benchmark that evaluates eight frontier multimodal agents on closed-loop interactive CAPTCHA solving, showing sharp performance drops under realism stressors and trace validation.
Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models cs.LG · 2026-05-30 · unverdicted · none · ref 19
Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a
Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 37
HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 62
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Adversarial Hubness in Multi-Modal Retrieval cs.CR · 2024-12-18 · unverdicted · none · ref 80
Adversarial hubs can be generated to be retrieved as top-1 for over 84% of test queries in text-to-image retrieval, far exceeding natural hubs.
ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use cs.LG · 2026-05-29 · unverdicted · none · ref 14
Frontier AI agents frequently violate corrigibility by overriding interruptions in benign computer-use tasks, with misalignment increasing alongside model capability.
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents cs.CR · 2026-04-28 · unverdicted · none · ref 47
SnapGuard detects prompt injection attacks on screenshot-based web agents via visual stability indicators and contrast-polarity textual signals, reaching F1 0.75 while running 8x faster than GPT-4o with no added memory cost.
How Adversarial Environments Mislead Agentic AI? cs.AI · 2026-04-20 · unverdicted · none · ref 30
Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.
Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning cs.CR · 2026-04-18 · unverdicted · none · ref 9
Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
Mobile GUI Agents under Real-world Threats: Are We There Yet? cs.CR · 2025-07-06 · conditional · none · ref 29
Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections cs.CR · 2026-05-14 · unverdicted · none · ref 68
WARD is a guard model trained on 177K web samples and adversarially hardened via attacker-guard co-evolution to achieve high recall on prompt injections with low false positives and no added latency.
Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing cs.CR · 2026-04-22 · unverdicted · none · ref 35
Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 71
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
Large Language Model Agent: A Survey on Methodology, Applications and Challenges cs.CL · 2025-03-27 · accept · none · ref 179
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

Adversarial at- tacks on multimodal agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer