Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology
6 Pith papers cite this work.
citing papers explorer
- SPRITE: From Static Mockups to Engine-Ready Game UI
  SPRITE converts static game UI screenshots into editable engine-ready assets by using VLMs to parse complex layouts into a YAML intermediate representation.
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
  VLAA-GUI adds mandatory visual verifiers, multi-tier loop breakers, and on-demand search to GUI agents, reaching 77.5% on OSWorld and 61.0% on WindowsAgentArena with some models exceeding human performance.
- DroidRetriever: A Transparent and Steerable Automation System for Collaborative Mobile Information Seeking
  DroidRetriever is a transparent steerable mobile automation system that decomposes information-seeking tasks with multi-LLM agents, navigates apps, synthesizes reports with screenshots, and provides a dashboard for real-time user intervention and privacy pauses.
- Crepe: A Mobile Screen Data Collector Using Graph Query
  Crepe introduces a graph-query technique in a no-code Android app for flexible collection of targeted mobile screen data with privacy and consent controls.
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
  SeeClick improves visual GUI agents via GUI grounding pre-training on automatically curated data and introduces the ScreenSpot benchmark, with results indicating that stronger grounding boosts downstream task performance.
- MUIAnno: An Expert-Annotated Dataset and Evaluation Benchmark for Mobile UI Understanding
  MUIAnno is an expert-annotated dataset of mobile UI screens from iOS apps with structured JSON labels and baseline results for UI element detection.