CUARewardBench: A benchmark for evaluating reward models on computer-using agent

Haojia Lin, Xiaoyu Tan, Yulei Qin, Zihan Xu, Yuchen Shi, Zongyi Li · 2025 · arXiv 2510.18596

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 2

citation-polarity summary

background 1 · use dataset 1

representative citing papers

A History-Aware Visually Grounded Critic for Computer Use Agents

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

HiViG is a test-time critic that combines macro-action history summarization with visual grounding of execution coordinates to reduce short-sighted and visually erroneous actions in long-horizon GUI agents.

PRO-CUA: Process-Reward Optimization for Computer Use Agents

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

PRO-CUA trains CUAs via decoupled on-policy rollouts and PRM-guided step-level optimization to enable dense credit assignment without expert trajectories or golden answers.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning

cs.AI · 2026-06-08 · unverdicted · novelty 5.0

AliyunConsoleAgent-32B reaches 63.52% success on a 278-task cloud console benchmark, coming within 1.82 pp of frontier models at 92% lower cost, using SFT distillation and GRPO RL.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16

