Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning

Anan Du; Bin Qin; Jiahui Yang; Jian Luan; Pei Fu; Ruoceng Zhang; Shaojie Zhang; Shaokang Wang; Xiuwen Xi; Ying Huang

arxiv: 2510.27266 · v2 · pith:TLX7ACJAnew · submitted 2025-10-31 · 💻 cs.CV

Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, Jian Luan

classification 💻 cs.CV

keywords grounding · confidence · hyperclick · learning · reinforcement · confidence-based · correctness · reward

Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often provide confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions. To address this, we propose HyperClick, a novel framework that enhances trustworthy GUI grounding through self-critiqued reinforcement learning (SCRL). HyperClick combines a correctness reward and a confidence alignment reward, training the policy model to output both a click prediction and an explicit confidence estimate. This approach jointly optimizes grounding accuracy and confidence reliability through confidence-based self-assessment. Extensive experiments on challenging benchmarks show that HyperClick maintains strong grounding performance while providing better-aligned confidence estimates. By exposing uncertainty alongside GUI actions, HyperClick supports confidence-based abstention in GUI automation. Code will be released here.
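
The abstract names two reward terms and confidence-based abstention but gives no formulas. The sketch below illustrates how such a combination could look, and it is not the authors' implementation. Everything in it is an assumption made for illustration: the `Prediction` type, the point-in-box correctness check, the Brier-style alignment term, the `weight` parameter, and the abstention `threshold`.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    x: float           # predicted click x-coordinate (pixels)
    y: float           # predicted click y-coordinate (pixels)
    confidence: float  # self-reported confidence in [0, 1]


def correctness_reward(pred, bbox):
    """Return 1.0 if the click lands inside the target box, else 0.0.

    bbox is (left, top, right, bottom); this hit test is an assumed
    definition of grounding correctness.
    """
    left, top, right, bottom = bbox
    hit = left <= pred.x <= right and top <= pred.y <= bottom
    return 1.0 if hit else 0.0


def confidence_alignment_reward(pred, correct):
    """Assumed Brier-style term: highest when confidence matches correctness."""
    return 1.0 - (pred.confidence - correct) ** 2


def total_reward(pred, bbox, weight=1.0):
    """Sum of the two terms; the weighting scheme is hypothetical."""
    correct = correctness_reward(pred, bbox)
    return correct + weight * confidence_alignment_reward(pred, correct)


def should_act(pred, threshold=0.5):
    """Confidence-based abstention: act only at or above the threshold."""
    return pred.confidence >= threshold
```

Under these assumptions, a confident miss is penalized more heavily than a hesitant miss, which is the behavior the abstract describes as better-aligned confidence. An agent could call `should_act` before executing a click and defer to the user when it returns `False`.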

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
cs.AI 2026-05 unverdicted novelty 7.0

GUI-SD is the first on-policy self-distillation framework for GUI grounding that adds privileged bounding-box context and entropy-guided weighting to outperform GRPO methods on six benchmarks in accuracy and efficiency.

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
cs.AI 2026-05 accept novelty 7.0

GUI-SD introduces on-policy self-distillation with visually enriched privileged context and entropy-guided weighting, outperforming GRPO and naive OPSD on six GUI grounding benchmarks while improving training efficiency.