GUI-PRA: Process reward agent for GUI tasks
2 Pith papers cite this work.
Fields: cs.AI (2)
Years: 2026 (2)
Representative citing papers:
- PRO-CUA: Process-Reward Optimization for Computer Use Agents
  PRO-CUA trains computer use agents (CUAs) with decoupled on-policy rollouts and step-level optimization guided by a process reward model (PRM), enabling dense credit assignment without expert trajectories or golden answers.
- Xiaomi-GUI-0 Technical Report