hub

arXiv preprint arXiv:2302.01560 , year=

· 2023 · arXiv 2302.01560

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

VLATIM benchmark reveals large VLMs excel at high-level planning in physics puzzles but struggle with precise visual grounding and mouse control, so they lack human-like problem-solving capabilities.

World2Minecraft: Occupancy-Driven Simulated Scenes Construction

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

World2Minecraft turns real scenes into Minecraft worlds via occupancy prediction and releases a large indoor occupancy dataset to improve such models.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

Voyager: An Open-Ended Embodied Agent with Large Language Models

cs.AI · 2023-05-25 · unverdicted · novelty 7.0

Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more unique items and 15.3x faster milestone unlocks than prior methods while generalizing技能

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

cs.AI · 2023-04-22 · accept · novelty 7.0

LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.

MMSkills: Towards Multimodal Skills for General Visual Agents

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.

From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 6.0

AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

cs.CR · 2026-02-24 · unverdicted · novelty 6.0

The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

cs.CL · 2024-10-30 · unverdicted · novelty 6.0

OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

PaLM-E: An Embodied Multimodal Language Model

cs.LG · 2023-03-06 · conditional · novelty 6.0

PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive transfer from joint training on language and robotics data.

Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game

cs.MA · 2026-04-21 · unverdicted · novelty 5.0

Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · unverdicted · novelty 4.0

The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

citing papers explorer

Showing 16 of 16 citing papers.

State-Centric Decision Process cs.AI · 2026-05-12 · unverdicted · none · ref 47
SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.
Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games? cs.AI · 2026-05-11 · unverdicted · none · ref 7
VLATIM benchmark reveals large VLMs excel at high-level planning in physics puzzles but struggle with precise visual grounding and mouse control, so they lack human-like problem-solving capabilities.
World2Minecraft: Occupancy-Driven Simulated Scenes Construction cs.CV · 2026-04-30 · unverdicted · none · ref 11
World2Minecraft turns real scenes into Minecraft worlds via occupancy prediction and releases a large indoor occupancy dataset to improve such models.
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace cs.AI · 2026-04-09 · unverdicted · none · ref 51
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis cs.AI · 2026-04-06 · unverdicted · none · ref 20
GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 164
Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
Voyager: An Open-Ended Embodied Agent with Large Language Models cs.AI · 2023-05-25 · unverdicted · none · ref 55
Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more unique items and 15.3x faster milestone unlocks than prior methods while generalizing技能
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency cs.AI · 2023-04-22 · accept · none · ref 58
LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.
MMSkills: Towards Multimodal Skills for General Visual Agents cs.AI · 2026-05-13 · unverdicted · none · ref 29
MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.
From Reaction to Anticipation: Proactive Failure Recovery through Agentic Task Graph for Robotic Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 44
AgentChord models manipulation tasks as directed graphs enriched with anticipatory recovery branches, using specialized agents to enable immediate, low-latency failure responses and improve success on long-horizon bimanual tasks.
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents cs.CR · 2026-02-24 · unverdicted · none · ref 37
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents cs.CL · 2024-10-30 · unverdicted · none · ref 104
OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
PaLM-E: An Embodied Multimodal Language Model cs.LG · 2023-03-06 · conditional · none · ref 36
PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive transfer from joint training on language and robotics data.
Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game cs.MA · 2026-04-21 · unverdicted · none · ref 30
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unverdicted · none · ref 29
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
The Rise and Potential of Large Language Model Based Agents: A Survey cs.AI · 2023-09-14 · accept · none · ref 184
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

arXiv preprint arXiv:2302.01560 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer