Visual test-time scaling for gui agent grounding

Visual test-time scaling for gui agent grounding , author= · 2025 · arXiv 2505.00684

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

SceneDiver introduces a coarse-to-fine focus plan generation approach for VLMs that constructs holistic scene graphs then iteratively decomposes tasks, plus a distillation adapter for VLAs, to reduce visual hallucinations in embodied AI benchmarks.

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

citing papers explorer

Showing 1 of 1 citing paper after filters.

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding cs.CV · 2026-05-04 · unverdicted · none · ref 22
AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.

Visual test-time scaling for gui agent grounding

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer