Canonical reference

arXiv preprint arXiv:2505.13227 , year=

Scaling Computer-Use Grounding via User Interface Decomposition, Synthesis , author= · 2025 · arXiv 2505.13227

Canonical reference. 86% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 86% of classified citations

read on arXiv browse 18 citing papers

citation-role summary

background 7

citation-polarity summary

background 6 unclear 1

representative citing papers

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

cs.AI · 2026-05-25 · conditional · novelty 7.0

CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer interactions.

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

cs.AI · 2026-05-01 · accept · novelty 7.0 · 2 refs

GUI-SD introduces on-policy self-distillation with visually enriched privileged context and entropy-guided weighting, outperforming GRPO and naive OPSD on six GUI grounding benchmarks while improving training efficiency.

Benchmarking and Improving GUI Agents in High-Dynamic Environments

cs.CV · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

DynamicUI improves GUI agent performance in high-dynamic environments by processing interaction videos with frame clustering, action-conditioned refinement, and reflection, outperforming prior approaches on the new DynamicGUIBench spanning ten applications.

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.

OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents

cs.AI · 2025-06-19 · unverdicted · novelty 7.0

AI agents on OSWorld take 2.7-4.3 times more steps than human trajectories, with latency rising sharply due to repeated large model calls for planning and reflection.

Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

Failure-driven self-improvement raises OpenCUA-72B success rate on OSWorld from 42.3% to 48.9% via LLM diagnosis and inference-time code patches, without retraining.

One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

InnerZoom bridges cross-layer evidence in one forward pass to achieve SOTA GUI grounding accuracy on six benchmarks while cutting latency up to 31.8% versus two-pass baselines.

WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

WorldBench is a visually diverse multimodal reasoning benchmark where the strongest of 15 tested MLLMs reaches only 64% accuracy.

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

UI-Zoomer uses uncertainty quantification to trigger and size adaptive zoom-ins only on uncertain GUI grounding predictions, yielding up to 13.4% gains on benchmarks with no training.

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

cs.AI · 2025-10-28 · unverdicted · novelty 6.0

MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.

RISK: A Framework for GUI Agents in E-commerce Risk Management

cs.AI · 2025-09-26 · unverdicted · novelty 6.0

RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline parameters.

GTA1: GUI Test-time Scaling Agent

cs.AI · 2025-07-08 · unverdicted · novelty 6.0

GTA1 combines test-time scaling for action plan selection with RL-based grounding to achieve SOTA results on GUI agent benchmarks.

GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

GUI-C² pairs a difficulty-scoring data pipeline with an area-gated coarse-to-fine RL mechanism to improve GUI grounding accuracy and training stability.

MUIAnno: An Expert-Annotated Dataset and Evaluation Benchmark for Mobile UI Understanding

cs.HC · 2026-05-17 · unverdicted · novelty 5.0

MUIAnno is an expert-annotated dataset of mobile UI screens from iOS apps with structured JSON labels and baseline results for UI element detection.

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

cs.LG · 2026-04-15

citing papers explorer

Showing 18 of 18 citing papers.

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents cs.AI · 2026-05-25 · conditional · none · ref 29
CUA-Gym generates 32,112 verified RLVR tuples across 110 mock environments, enabling trained models to reach 62.1% and 72.6% on OSWorld-Verified while transferring to WebArena.
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark cs.CV · 2026-05-12 · unverdicted · none · ref 24
Presents CUActSpot benchmark and renderer-LLM data synthesis that lets a 4B model outperform larger open-source models on complex computer interactions.
Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding cs.AI · 2026-05-01 · accept · none · ref 42 · 2 links
GUI-SD introduces on-policy self-distillation with visually enriched privileged context and entropy-guided weighting, outperforming GRPO and naive OPSD on six GUI grounding benchmarks while improving training efficiency.
Benchmarking and Improving GUI Agents in High-Dynamic Environments cs.CV · 2026-04-28 · unverdicted · none · ref 37 · 2 links
DynamicUI improves GUI agent performance in high-dynamic environments by processing interaction videos with frame clustering, action-conditioned refinement, and reflection, outperforming prior approaches on the new DynamicGUIBench spanning ten applications.
RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management cs.AI · 2026-04-15 · unverdicted · none · ref 51
RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents cs.AI · 2025-06-19 · unverdicted · none · ref 12
AI agents on OSWorld take 2.7-4.3 times more steps than human trajectories, with latency rising sharply due to repeated large model calls for planning and reflection.
Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents cs.CV · 2026-06-30 · unverdicted · none · ref 25
Failure-driven self-improvement raises OpenCUA-72B success rate on OSWorld from 42.3% to 48.9% via LLM diagnosis and inference-time code patches, without retraining.
One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding cs.CV · 2026-06-29 · unverdicted · none · ref 55
InnerZoom bridges cross-layer evidence in one forward pass to achieve SOTA GUI grounding accuracy on six benchmarks while cutting latency up to 31.8% versus two-pass baselines.
WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark cs.CV · 2026-06-04 · unverdicted · none · ref 173
WorldBench is a visually diverse multimodal reasoning benchmark where the strongest of 15 tested MLLMs reaches only 64% accuracy.
AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding cs.CV · 2026-05-04 · unverdicted · none · ref 37
AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding cs.CV · 2026-04-15 · unverdicted · none · ref 21
UI-Zoomer uses uncertainty quantification to trigger and size adaptive zoom-ins only on uncertain GUI grounding predictions, yielding up to 13.4% gains on benchmarks with no training.
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction cs.AI · 2025-10-28 · unverdicted · none · ref 33
MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.
RISK: A Framework for GUI Agents in E-commerce Risk Management cs.AI · 2025-09-26 · unverdicted · none · ref 23
RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline parameters.
GTA1: GUI Test-time Scaling Agent cs.AI · 2025-07-08 · unverdicted · none · ref 6
GTA1 combines test-time scaling for action plan selection with RL-based grounding to achieve SOTA results on GUI agent benchmarks.
GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning cs.CV · 2026-05-29 · unverdicted · none · ref 39
GUI-C² pairs a difficulty-scoring data pipeline with an area-gated coarse-to-fine RL mechanism to improve GUI grounding accuracy and training stability.
MUIAnno: An Expert-Annotated Dataset and Evaluation Benchmark for Mobile UI Understanding cs.HC · 2026-05-17 · unverdicted · none · ref 31
MUIAnno is an expert-annotated dataset of mobile UI screens from iOS apps with structured JSON labels and baseline results for UI element detection.
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding cs.LG · 2026-04-23 · unverdicted · none · ref 13
A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models cs.LG · 2026-04-15 · unreviewed · ref 15

arXiv preprint arXiv:2505.13227 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer