CUARewardBench: A benchmark for evaluating reward models on computer-using agent
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: dataset 2
representative citing papers
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.
citing papers explorer
- PRO-CUA: Process-Reward Optimization for Computer Use Agents
PRO-CUA trains CUAs via decoupled on-policy rollouts and PRM-guided step-level optimization to enable dense credit assignment without expert trajectories or golden answers.