SceneDiver introduces a coarse-to-fine focus plan generation approach for VLMs that constructs holistic scene graphs then iteratively decomposes tasks, plus a distillation adapter for VLAs, to reduce visual hallucinations in embodied AI benchmarks.
Visual test-time scaling for gui agent grounding
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 2polarities
background 2representative citing papers
AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.
A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
citing papers explorer
-
Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation
SceneDiver introduces a coarse-to-fine focus plan generation approach for VLMs that constructs holistic scene graphs then iteratively decomposes tasks, plus a distillation adapter for VLAs, to reduce visual hallucinations in embodied AI benchmarks.
-
AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding
AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.
-
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
A co-evolving proposer-critic RL framework improves GUI grounding accuracy by letting the model critique its own proposals rendered on screenshots.
-
Large Language Model-Brained GUI Agents: A Survey
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.