Caution for the environment: Multimodal agents are susceptible to environmental distractions,

Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao · 2024 · arXiv 2408.02544

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

GUIGuard-Bench: Toward a General Evaluation for Privacy-Preserving GUI Agents

cs.CR · 2026-01-26 · unverdicted · novelty 8.0

GUIGuard-Bench is a new benchmark with annotated GUI screenshots that measures privacy recognition, planning fidelity under protection, and utility impact for trajectory-based GUI agents.

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

cs.AI · 2026-02-24 · unverdicted · novelty 7.0

The work creates a new benchmark for humanizing GUI agent touch dynamics via a MinMax detector-agent model, a mobile touch dataset, and methods showing agents can match human behavior without losing task performance.

Mobile GUI Agents under Real-world Threats: Are We There Yet?

cs.CR · 2025-07-06 · conditional · novelty 6.0

Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

cs.CR · 2025-07-13 · conditional · novelty 5.0

LaSM is a layer-wise scaling mechanism that amplifies attention and MLP modules in critical layers to defend GUI agents against pop-up attacks by correcting attention misalignment.

citing papers explorer

Showing 4 of 4 citing papers.

GUIGuard-Bench: Toward a General Evaluation for Privacy-Preserving GUI Agents cs.CR · 2026-01-26 · unverdicted · none · ref 32
GUIGuard-Bench is a new benchmark with annotated GUI screenshots that measures privacy recognition, planning fidelity under protection, and utility impact for trajectory-based GUI agents.
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization cs.AI · 2026-02-24 · unverdicted · none · ref 30
The work creates a new benchmark for humanizing GUI agent touch dynamics via a MinMax detector-agent model, a mobile touch dataset, and methods showing agents can match human behavior without losing task performance.
Mobile GUI Agents under Real-world Threats: Are We There Yet? cs.CR · 2025-07-06 · conditional · none · ref 15
Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.
LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents cs.CR · 2025-07-13 · conditional · none · ref 10
LaSM is a layer-wise scaling mechanism that amplifies attention and MLP modules in critical layers to defend GUI agents against pop-up attacks by correcting attention misalignment.

Caution for the environment: Multimodal agents are susceptible to environmental distractions,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer