pith. machine review for the scientific record

arxiv: 2602.07153 · v2 · submitted 2026-02-06 · 💻 cs.AI

Recognition: unknown

ANCHOR: Branch-Point Data Generation for GUI Agents

authors on Pith: no claims yet
classification 💻 cs.AI
keywords agents · desktop · task · anchor · data · demonstrations · seed · supervision
abstract

End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data, yet collecting human demonstrations is expensive and existing synthetic pipelines often suffer from limited task diversity or noisy, goal-drifting trajectories. We present Anchor, a trajectory expansion framework that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Starting from each seed, we identify branch points that correspond to meaningful state changes and propose new, state-grounded task variants conditioned on the current GUI context. An executing agent then follows the proposed instructions to generate new trajectories, while a verifier enforces task completion via state-aware checks and trajectory-level consistency. To improve supervision quality, we further apply task-conditioned step-level filtering to remove ungrounded actions and denoise post-branch segments to maintain coherent intent. Experiments on standard desktop benchmarks, OSWorld and WindowsAgentArena, show that models fine-tuned on our expanded corpus achieve consistent improvements over zero-shot agents and representative synthesis baselines, and generalize across applications and operating systems.
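The abstract outlines a four-stage pipeline: branch-point detection, state-grounded task proposal, agent execution, and verification with step-level filtering. A minimal sketch of that loop follows; all class names, interfaces, and the stage functions are hypothetical, since this page only describes the stages, not an API.

```python
# Hypothetical sketch of an Anchor-style expansion loop, following the
# pipeline stages named in the abstract. Not the paper's actual code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    observation: str   # serialized GUI state (e.g., an accessibility tree)
    action: str        # e.g., "click(node_42)" or "type('hello')"

@dataclass
class Trajectory:
    task: str
    steps: list[Step]

def expand_seed(
    seed: Trajectory,
    find_branch_points: Callable[[Trajectory], list[int]],  # indices of meaningful state changes
    propose_variant: Callable[[str, str], str],             # (seed task, GUI state) -> new instruction
    execute: Callable[[str, list[Step]], Trajectory],       # run the agent from a shared prefix
    verify: Callable[[Trajectory], bool],                   # state-aware completion + consistency check
    step_is_grounded: Callable[[str, Step], bool],          # task-conditioned step-level filter
) -> list[Trajectory]:
    """Bootstrap new verified trajectories from one seed demonstration."""
    expanded: list[Trajectory] = []
    for i in find_branch_points(seed):
        prefix = seed.steps[: i + 1]
        # Propose a new, state-grounded task conditioned on the GUI at the branch point.
        new_task = propose_variant(seed.task, prefix[-1].observation)
        traj = execute(new_task, prefix)
        if not verify(traj):
            continue  # discard goal-drifting or incomplete rollouts
        # Step-level filtering: keep only post-branch actions grounded in the
        # new task, denoising the segment so intent stays coherent.
        kept = prefix + [s for s in traj.steps[len(prefix):] if step_is_grounded(new_task, s)]
        expanded.append(Trajectory(task=new_task, steps=kept))
    return expanded
```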

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

    cs.LG 2026-04 unverdicted novelty 7.0

    Android Coach improves online agent training efficiency by enabling multiple actions per state via a critic-based coach, process reward model, and group-wise advantage estimation, delivering 7.5-8.3% success rate gain...

  2. Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

    cs.LG 2026-04 unverdicted novelty 5.0

    Android Coach enables Single State Multiple Actions in online RL via a critic coach with process rewards and group-wise advantage estimation, yielding 7.5-8.3% higher success rates and 1.4x training efficiency on Andr...
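Both citing summaries above mention group-wise advantage estimation. As a rough, generic illustration of that technique (the excerpts do not give Android Coach's exact formulation), each rollout's reward is standardized against the group of rollouts sampled from the same state:

```python
# Generic group-wise (GRPO-style) advantage estimation; an assumption, not
# necessarily Android Coach's exact method.
def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of each rollout = its reward standardized within the group
    of rollouts sampled from the same state."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four candidate actions from one state, with binary verifier rewards.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # ~ [1.0, -1.0, -1.0, 1.0]
```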