VideoAgentTrek: Computer use pretraining from unlabeled videos, 2025

Dunjie Lu, Yiheng Xu, Junli Wang, Haoyuan Wu, Xinyuan Wang, Zekun Wang, Junlin Yang, Hongjin Su, Jixuan Chen, Junda Chen, Yuchen Mao, Jingren Zhou, Junyang Lin, Binyuan Hui, Tao Yu · 2025 · arXiv 2510.19488

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

cs.AI · 2026-05-25 · conditional · novelty 7.0

CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.

PhoneWorld: Scaling Phone-Use Agent Environments

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

PhoneWorld is a pipeline that converts real mobile trajectories into scalable controllable environments, yielding large gains on four benchmarks when used to supplement training data.

citing papers explorer

Showing 3 of 3 citing papers.

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents cs.AI · 2026-05-25 · conditional · none · ref 12
CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks cs.AI · 2026-06-28 · unverdicted · none · ref 56
OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.
PhoneWorld: Scaling Phone-Use Agent Environments cs.CL · 2026-05-28 · unverdicted · none · ref 7
PhoneWorld is a pipeline that converts real mobile trajectories into scalable controllable environments, yielding large gains on four benchmarks when used to supplement training data.

VideoAgentTrek: Computer use pretraining from unlabeled videos, 2025

fields

years

verdicts

representative citing papers

citing papers explorer