{"total":13,"items":[{"citing_arxiv_id":"2605.30884","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning","primary_cat":"cs.CV","submitted_at":"2026-05-29T06:17:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"GUI-C² pairs a difficulty-scoring data pipeline with an area-gated coarse-to-fine RL mechanism to improve GUI grounding accuracy and training stability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28629","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents","primary_cat":"cs.CL","submitted_at":"2026-05-27T15:37:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Mobile-Aptus uses supervised fine-tuning followed by semantic similarity retrieval and direct preference optimization to calibrate confidence scores in mobile agents, yielding over 17% average task success improvement on four benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12501","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Covering Human Action Space for Computer Use: Data Synthesis and Benchmark","primary_cat":"cs.CV","submitted_at":"2026-05-12T17:59:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer 
interactions.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Uground-V1-2B∗[10] 2024-10 27.1 12.8 14.3 10.5 0.0 9.4 6.2 0.0 5.2 Uground-V1-7B∗[10] 2024-10 31.1 12.9 18.2 18.4 0.0 3.1 9.4 2.4 6.7 OS-Atlas-Base-7B∗[11] 2024-10 18.9 9.0 9.9 15.8 0.0 12.5 10.9 0.0 7.8 InfiGUI-R1-3B [18] 2025-04 45.2 22.0 23.2 23.7 3.1 9.4 7.8 0.0 8.8 UI-Venus-Ground-7B [19] 2025-08 50.8 26.5 24.3 23.7 3.1 18.8 9.4 0.0 11.0 GUI-G2-7B [20] 2025-07 47.5 26.4 21.1 23.7 6.2 15.6 7.8 4.8 11.6 MAI-UI-2B†[22] 2025-12 57.4 30.3 27.1 18.4 3.1 18.8 12.5 9.5 12.5 GUI-Owl-1.5-8B-Think [23] 2026-02 57.6 33.2 24.4 23.7 9.4 18.8 10.9 7.1 14.0 MAI-UI-8B†[22] 2025-12 65.8 40.7 25.1 26.3 18.8 18.8 7.8 4.8 15.3 GUI-Owl-1.5-8B-Instruct [23] 2026-02 71.1 37.4 33.7 23.7 15.6 18.8 9.4 9.5 15.4 UI-Venus-Ground-72B [19] 2025-08 61."},{"citing_arxiv_id":"2605.06664","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BAMI: Training-Free Bias Mitigation in GUI Grounding","primary_cat":"cs.CV","submitted_at":"2026-05-07T17:59:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"enhancing spatial reasoning for GUI grounding tasks. Following this, UI-R1 [18] and GUI-R1 [20] were among the first to apply GRPO in GUI tasks. InfiGUI-R1 [16] focused on reward function design, emphasizing IoU-based metrics to improve localization accuracy. GUI-G1 [37] introduced box-attribute constraints to regulate bounding-box geometry, while GUI-G2 [26] modeled spatial distributions using Gaussian functions. TianXi-Action [27] focused on generating high-quality reinforcement learning data. 
Collectively, these studies affirm the efficacy of reinforcement learning in enhancing spatial reasoning in GUI tasks. 2.3. Inference Enhancement Significant attention has been given to optimizing inference strategies to exploit the capabilities of MLLMs."},{"citing_arxiv_id":"2604.27955","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GUI Agents with Reinforcement Learning: Toward Digital Inhabitants","primary_cat":"cs.AI","submitted_at":"2026-04-30T14:51:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24348","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OS-SPEAR: A Toolkit for the Safety, Performance,Efficiency, and Robustness Analysis of OS Agents","primary_cat":"cs.CL","submitted_at":"2026-04-27T11:44:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"systems (computers, smartphones, and tablets) by performing actions such as clicks, swipes, and text input based on graphical user interfaces, in response to user instructions. 
Existing works have approached the construction of OS agents through various methods, including pre-training [19], [20], mid-training [21], [22], supervised fine-tuning [23], [24], reinforcement learning [25]-[28], prompt engineering [29], and multi-agent systems [30], [31]. These approaches have enhanced the OS agents' capabilities in grounding, reasoning, and task completion from different perspectives. However, in order to evolve OS agents from mere tools to trustworthy partners, it is essential to consider not only their task completion performance but also their safety [32]-"},{"citing_arxiv_id":"2604.21268","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding","primary_cat":"cs.LG","submitted_at":"2026-04-23T04:23:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13531","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management","primary_cat":"cs.AI","submitted_at":"2026-04-15T06:27:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[41] Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, 
and Timothy Lillicrap. Androidinthewild: A large-scale dataset for android device control. Advances in Neural Information Processing Systems, 36:59708-59728, 2023. [42] Marta Sumyk and Oleksandr Kosovan. Cuaaudit: Meta-evaluation of vision-language models as auditors of autonomous computer-use agents. arXiv preprint arXiv:2603.10577, 2026. [43] Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, et al. Gui-g2: Gaussian reward modeling for gui grounding. arXiv preprint arXiv:2507.15846, 2025. [44] Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi k2."},{"citing_arxiv_id":"2604.13019","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PrecisionCUA: Iterative Visual Refinement for Pixel-Precise Cursor Grounding in Code Editors","primary_cat":"cs.CV","submitted_at":"2026-04-14T17:55:46+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09442","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UIPress: Bringing Optical Token Compression to UI-to-Code Generation","primary_cat":"cs.CL","submitted_at":"2026-04-10T15:58:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UIPress is the first encoder-side learned optical compression method for UI-to-Code that compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% CLIP score and the best inference-time baseline by 4.6% while delivering 9.1x TTFT speedup.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"GUI-Xplore: Empowering Generalizable GUI Agents with One 
Exploration. arXiv preprint arXiv:2503.17709 (2025). [45] Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, and Yueting Zhuang. 2025. GUI-G2: Gaussian Reward Modeling for GUI Grounding. arXiv:2507.15846 [cs.LG] https://arxiv.org/abs/2507.15846 [46] Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Qi Zhang, Kang Wang, Yunke Zhang, and Yuran Wang. 2025. MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline"},{"citing_arxiv_id":"2603.26041","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives","primary_cat":"cs.CV","submitted_at":"2026-03-27T03:21:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.21982","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RISK: A Framework for GUI Agents in E-commerce Risk Management","primary_cat":"cs.AI","submitted_at":"2025-09-26T07:05:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline 
parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.07553","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents","primary_cat":"cs.CL","submitted_at":"2025-09-09T09:46:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VeriOS-Agent is an OS agent that proactively queries humans in untrustworthy scenarios via a query-driven framework and three-stage training, achieving 19.72% higher step-wise success rate over baselines while preserving normal performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}