GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.
Autonomous evaluation and refinement of digital agents
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
AWM induces reusable workflows from agent experiences and provides them selectively to improve success rates by 24.6% on Mind2Web and 51.1% on WebArena while reducing steps taken.
citing papers explorer
-
GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis
GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.
-
Agent Workflow Memory
AWM induces reusable workflows from agent experiences and provides them selectively to improve success rates by 24.6% on Mind2Web and 51.1% on WebArena while reducing steps taken.