hub Canonical reference

The dawn of gui agent: A preliminary case study with claude 3.5 computer use

Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou · 2024 · arXiv 2411.10323

Canonical reference. 71% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 71% of classified citations

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 2 method 1

citation-polarity summary

background 5 baseline 2

representative citing papers

AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.

Group-in-Group Policy Optimization for LLM Agent Training

cs.LG · 2025-05-16 · unverdicted · novelty 7.0

GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.

BAMI: Training-Free Bias Mitigation in GUI Grounding

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.

When Should Users Check? Modeling Confirmation Frequency inMulti-Step Agentic AI Tasks

cs.HC · 2025-10-06 · conditional · novelty 6.0

A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.

Benchmarking and Mitigating Sycophancy in Medical Vision Language Models

cs.CV · 2025-09-26 · unverdicted · novelty 6.0 · 2 refs

The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.

DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

cs.AI · 2026-05-15 · unverdicted · novelty 5.0

DRS-GUI introduces a dynamic region search method with Focus/Shift/Scatter actions and MCTS-based planning that improves GUI grounding accuracy by 14% on ScreenSpot-Pro for both general and GUI-specific MLLMs without any training.

Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.

Perceptual Flow Network for Visually Grounded Reasoning

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

cs.AI · 2025-01-27 · unverdicted · novelty 5.0

A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

cs.AI · 2025-04-28 · accept · novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

citing papers explorer

Showing 12 of 12 citing papers.

AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design cs.AI · 2026-05-09 · unverdicted · none · ref 29
AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.
Group-in-Group Policy Optimization for LLM Agent Training cs.LG · 2025-05-16 · unverdicted · none · ref 27
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
BAMI: Training-Free Bias Mitigation in GUI Grounding cs.CV · 2026-05-07 · unverdicted · none · ref 11
BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents cs.CV · 2026-04-08 · unverdicted · none · ref 35
GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.
When Should Users Check? Modeling Confirmation Frequency inMulti-Step Agentic AI Tasks cs.HC · 2025-10-06 · conditional · none · ref 31
A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.
Benchmarking and Mitigating Sycophancy in Medical Vision Language Models cs.CV · 2025-09-26 · unverdicted · none · ref 12 · 2 links
The paper benchmarks sycophancy in medical VLMs using hierarchical VQA templates and proposes VIPER to filter non-evidence social cues, reducing sycophancy while preserving interpretability.
DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding cs.AI · 2026-05-15 · unverdicted · none · ref 15
DRS-GUI introduces a dynamic region search method with Focus/Shift/Scatter actions and MCTS-based planning that improves GUI grounding accuracy by 14% on ScreenSpot-Pro for both general and GUI-specific MLLMs without any training.
Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents cs.AI · 2026-05-04 · unverdicted · none · ref 27
A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.
Perceptual Flow Network for Visually Grounded Reasoning cs.CV · 2026-05-04 · unverdicted · none · ref 14
PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions cs.AI · 2025-01-27 · unverdicted · none · ref 61
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review cs.AI · 2025-04-28 · accept · none · ref 136
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
Large Language Model-Brained GUI Agents: A Survey cs.AI · 2024-11-27 · unverdicted · none · ref 169
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

The dawn of gui agent: A preliminary case study with claude 3.5 computer use

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer