pith.

arxiv: 2510.27266 · v2 · pith:TLX7ACJAnew · submitted 2025-10-31 · 💻 cs.CV

Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning

classification 💻 cs.CV
keywords: grounding, confidence, hyperclick, learning, reinforcement, confidence-based, correctness, reward

Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often provide confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions. To address this, we propose HyperClick, a novel framework that enhances trustworthy GUI grounding through self-critiqued reinforcement learning (SCRL). HyperClick combines a correctness reward and a confidence alignment reward, training the policy model to output both a click prediction and an explicit confidence estimate. This approach jointly optimizes grounding accuracy and confidence reliability through confidence-based self-assessment. Extensive experiments on challenging benchmarks show that HyperClick maintains strong grounding performance while providing better-aligned confidence estimates. By exposing uncertainty alongside GUI actions, HyperClick supports confidence-based abstention in GUI automation. Code will be released here.
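The abstract names two reward terms (grounding correctness and confidence alignment) and a confidence-based abstention rule, but it does not give their formulas. The sketch below is a minimal illustration under stated assumptions and is not HyperClick's implementation. The names `BBox`, `correctness_reward`, `confidence_alignment_reward`, `total_reward`, and `act_or_abstain`, as well as the weight `alpha` and the abstention `threshold`, are all hypothetical. Correctness is assumed to be a 0/1 "click falls inside the target box" check. Alignment is assumed to be a Brier-style squared-error term, a swapped-in choice that rewards a stated confidence close to the actual 0/1 outcome.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BBox:
    """Hypothetical ground-truth element box in screen pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        # True when the point (x, y) lies inside the box, edges included.
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def correctness_reward(x: float, y: float, target: BBox) -> float:
    """Assumed 0/1 reward: 1.0 if the predicted click lands inside the target."""
    return 1.0 if target.contains(x, y) else 0.0


def confidence_alignment_reward(confidence: float, correct: float) -> float:
    """Assumed Brier-style term: 1.0 when confidence equals the 0/1 outcome."""
    confidence = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - (confidence - correct) ** 2


def total_reward(x: float, y: float, confidence: float,
                 target: BBox, alpha: float = 0.5) -> float:
    """Hypothetical combined reward: correctness plus weighted alignment."""
    correct = correctness_reward(x, y, target)
    return correct + alpha * confidence_alignment_reward(confidence, correct)


def act_or_abstain(x: float, y: float, confidence: float,
                   threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Confidence-based abstention: click only if confidence clears the threshold."""
    if confidence < threshold:
        return None  # abstain, e.g. hand control back to the user
    return (x, y)
```

Under these assumptions, a wrong click reported with confidence 0.1 earns more than the same wrong click reported with confidence 0.9. This is the kind of pressure against overconfidence the abstract describes. The additive weighting is only one plausible way to "jointly optimize" the two signals.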

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

    cs.AI 2026-05 unverdicted novelty 7.0

    GUI-SD is the first on-policy self-distillation framework for GUI grounding that adds privileged bounding-box context and entropy-guided weighting to outperform GRPO methods on six benchmarks in accuracy and efficiency.

  2. Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

    cs.AI 2026-05 accept novelty 7.0

    GUI-SD introduces on-policy self-distillation with visually enriched privileged context and entropy-guided weighting, outperforming GRPO and naive OPSD on six GUI grounding benchmarks while improving training efficiency.