hub Canonical reference

Gui agents with foundation models: A comprehensive survey

Gui agents with foundation models: A comprehensive survey , author= · 2024 · arXiv 2411.04890

Canonical reference. 86% of citing Pith papers cite this work as background.

21 Pith papers citing it

Background 86% of classified citations

read on arXiv browse 21 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 method 1

citation-polarity summary

background 6 use method 1

representative citing papers

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

cs.CR · 2026-04-07 · unverdicted · novelty 8.0

The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs

cs.CV · 2026-05-10 · conditional · novelty 7.0

GUI grounding in VLMs is bottlenecked by prefill-stage candidate selection that decoding cannot fix, so Re-Prefill uses attention to extract and re-inject target tokens for up to 4.3% gains on ScreenSpot-Pro.

Faithful Mobile GUI Agents with Guided Advantage Estimator

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

Faithful-Agent raises Trap SR in GUI agents from 13.88% to 80.21% via faithfulness-oriented SFT and GuAE-enhanced RFT with consistency rewards while retaining general performance.

OS-SPEAR: A Toolkit for the Safety, Performance,Efficiency, and Robustness Analysis of OS Agents

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

PEEU enables a 7B MLLM to reach 30.6% accuracy on GUI task planning by autonomous exploration and hindsight experience synthesis, outperforming a 32B model through stronger high-level OOD generalization.

Where Do CoT Training Gains Land in LLM based Agents?

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

CoT training in LLM agents improves prompt-action quality more than the advantage of generated reasoning, and selectively masking action supervision improves out-of-domain generalization.

Diagnosing Task Insensitivity in Language Agents

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

The paper diagnoses task insensitivity in LLM agents as a cause of weak OOD generalization, links it to attention drift, and proposes Task-Perturbed NLL Optimization as a contrastive regularizer to improve task dependence.

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

SCALE introduces three adversarial roles (Selector, Predictor, Judger) and a graph exploration method (SCALE-Hop) to enable MLLM-based web agents to self-discover limitations and improve, backed by the SCALE-20k dataset from 19 websites.

DocOS: Towards Proactive Document-Guided Actions in GUI Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

Introduces DocOS benchmark to test GUI agents on proactively locating, comprehending, and executing instructions from online documentation in interactive web settings.

Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

TIPO applies preference-intensity weighting and padding gating to stabilize preference optimization for privacy personalization in mobile GUI agents, yielding higher alignment and distinction metrics than prior methods.

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

cs.AI · 2025-10-28 · unverdicted · novelty 6.0

MGA is a memory-driven GUI agent that uses an observer for bias-free screen reading and structured memory for compact state transitions to enable efficient long-horizon automation.

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

cs.LG · 2025-06-11 · unverdicted · novelty 6.0

LPO optimizes GUI agent positional accuracy by combining information entropy for zone selection with a physical-distance reward inside a Group Relative Preference Optimization framework, claiming SOTA results on benchmarks and online tests.

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

StainFlow proposes global entity stain tracking and local stain evidence linking modules to improve process rewards for GUI agents, reporting 3.2% relative gain in online RL success and 1.8% in judgment accuracy on AndroidWorld and OGRBench.

Mobile-Aptus: Confidence-Driven Proactive and Robust Interaction in MLLM-based Mobile-Using Agents

cs.CL · 2026-05-27 · unverdicted · novelty 5.0

Mobile-Aptus uses supervised fine-tuning followed by semantic similarity retrieval and direct preference optimization to calibrate confidence scores in mobile agents, yielding over 17% average task success improvement on four benchmarks.

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

cs.CV · 2026-03-27 · unverdicted · novelty 5.0

Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

cs.LG · 2026-02-11 · unverdicted · novelty 5.0

UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

cs.AI · 2025-09-02 · conditional · novelty 5.0

UI-TARS-2 reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld while attaining 59.8 mean normalized score on a 15-game suite through multi-turn RL and scalable data generation.

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

cs.AI · 2025-01-27 · unverdicted · novelty 5.0

A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

GUI-AC: Enhancing Continual Learning in GUI Agents

cs.CV · 2026-06-09 · unverdicted · novelty 4.0

GUI-AC stabilizes RFT for non-stationary GUI data by down-weighting noisy advantages and relaxing clipping bounds via a grounding certainty term.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions cs.AI · 2025-01-27 · unverdicted · none · ref 162
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

Gui agents with foundation models: A comprehensive survey

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer