From grounding to planning: Benchmarking bottlenecks in web agents. arXiv preprint arXiv:2409.01927

Segev Shlomov, Ben Wiesel, Aviad Sela, Ido Levy, Liane Galanti, Roy Abitbol · 2025 · arXiv 2409.01927

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

WinDeskGround is a parametrically generated benchmark of 1,356 instruction-target pairs that reveals accuracy declines in state-of-the-art MLLMs under partial occlusion in multi-window GUI settings.

Governance by Construction for Generalist Agents

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

CUGA introduces a runtime governance architecture that enforces policies at five checkpoints in generalist agent execution pipelines for predictable and compliant behavior.

citing papers explorer

Showing 2 of 2 citing papers.

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments

cs.CV · 2026-05-13 · unverdicted · none · ref 15

Governance by Construction for Generalist Agents

cs.AI · 2026-05-20 · unverdicted · none · ref 19
