CogAgent: A Visual Language Model for GUI Agents
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
SiRA uses LLM world models for simulative reasoning to achieve up to 124% higher task completion and 32.2% navigation success versus reactive baselines in web environments.
citing papers explorer
- CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
The CutVerse benchmark evaluates GUI agents on 186 complex media post-production tasks in seven apps and reports a 36% success rate for existing models.
- General Agentic Planning Through Simulative Reasoning with World Models
SiRA uses LLM world models for simulative reasoning to achieve up to 124% higher task completion and 32.2% navigation success versus reactive baselines in web environments.