HiViG is a test-time critic that combines macro-action history summarization with visual grounding of execution coordinates to reduce short-sighted and visually erroneous actions in long-horizon GUI agents.
CUARewardBench: A benchmark for evaluating reward models on computer-using agent
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
dataset 2representative citing papers
PRO-CUA trains CUAs via decoupled on-policy rollouts and PRM-guided step-level optimization to enable dense credit assignment without expert trajectories or golden answers.
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
AliyunConsoleAgent-32B reaches 63.52% success on a 278-task cloud console benchmark, closing to 1.82pp of frontier models at 92% lower cost via SFT distillation and GRPO RL.
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.
citing papers explorer
-
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.