AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.
The dawn of GUI agent: A preliminary case study with claude 3.5 computer use
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 6representative citing papers
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.
GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.
A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.
PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).
citing papers explorer
-
AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design
AHD Agent trains a 4B-parameter LLM via agentic RL to actively use tools for automatic heuristic design, matching or exceeding larger baselines across eight domains with fewer evaluations.
-
Group-in-Group Policy Optimization for LLM Agent Training
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint.
-
BAMI: Training-Free Bias Mitigation in GUI Grounding
BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.
-
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
GameWorld is a new benchmark providing standardized interfaces, 34 games, 170 tasks, and verifiable outcome metrics to evaluate multimodal large language model agents in video game environments.
-
Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents
A new algorithm learns correct agent behavior models from few traces by combining dominator analysis, LLMs, and automata to validate sequential executions with high accuracy.
-
Perceptual Flow Network for Visually Grounded Reasoning
PFlowNet decouples perception from reasoning, integrates multi-dimensional rewards with vicinal geometric shaping via variational RL, and reports new SOTA results on V* Bench (90.6%) and MME-RealWorld-lite (67.0%).