
Canonical reference

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Canonical reference. 89% of citing Pith papers cite this work as background.

30 Pith papers citing it
Background: 89% of classified citations
abstract

The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, current research predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate on the "ObtainDiamond" task is only around 20%, highlighting the limitations of the Reinforcement Learning (RL) based controllers used in existing methods. To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common-sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments through text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, improving the success rate on the "ObtainDiamond" task by +47.5% and showing superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM requires no GPU for training; a single CPU node with 32 CPU cores is sufficient. This research shows the potential of LLMs for building capable agents that handle long-horizon, complex tasks and adapt to uncertainties in open-world environments. See the project website at https://github.com/OpenGVLab/GITM.
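The abstract describes several components: text-based knowledge, structured actions, and LLM-generated action plans. The sketch below illustrates the general idea of turning a goal item into an ordered plan of structured actions. Several parts are assumptions rather than details from the source:

- In place of an LLM, it uses a deterministic post-order walk over a hand-written dependency table.
- The table (`KNOWLEDGE`), the action names (`mine`, `craft`, `smelt`), and the helpers `Action` and `plan` are all hypothetical.
- Item quantities and smelting fuel are ignored.

It is an illustration under these assumptions, not GITM's implementation.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified text-based knowledge:
# item -> (structured action that obtains it, prerequisite items).
# Quantities and fuel are deliberately ignored; this is NOT GITM's knowledge base.
KNOWLEDGE = {
    "log": ("mine", []),
    "planks": ("craft", ["log"]),
    "stick": ("craft", ["planks"]),
    "crafting_table": ("craft", ["planks"]),
    "wooden_pickaxe": ("craft", ["planks", "stick", "crafting_table"]),
    "cobblestone": ("mine", ["wooden_pickaxe"]),
    "stone_pickaxe": ("craft", ["cobblestone", "stick", "crafting_table"]),
    "iron_ore": ("mine", ["stone_pickaxe"]),
    "furnace": ("craft", ["cobblestone", "crafting_table"]),
    "iron_ingot": ("smelt", ["iron_ore", "furnace"]),
    "iron_pickaxe": ("craft", ["iron_ingot", "stick", "crafting_table"]),
    "diamond": ("mine", ["iron_pickaxe"]),
}


@dataclass(frozen=True)
class Action:
    name: str    # hypothetical structured-action name, e.g. "mine" or "craft"
    target: str  # item the action is meant to produce


def plan(goal, inventory=frozenset()):
    """Return structured actions for `goal`, prerequisites first, each item once.

    Items already in `inventory` are treated as satisfied and skipped.
    """
    steps, seen = [], set(inventory)

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        action, prereqs = KNOWLEDGE[item]
        for prereq in prereqs:   # resolve every prerequisite before the item itself
            visit(prereq)
        steps.append(Action(action, item))

    visit(goal)
    return steps


for step in plan("diamond"):
    print(step.name, step.target)
```

Passing a partial inventory shortens the plan. For example, `plan("stone_pickaxe", inventory={"wooden_pickaxe", "stick", "crafting_table"})` yields only the cobblestone mining step and the pickaxe crafting step. An LLM-based planner would replace the fixed table walk with generated plans, which is what lets such an agent cope with situations the table does not cover.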


citation-role summary

background: 7 · method: 2


representative citing papers

2.5-D Decomposition for LLM-Based Spatial Construction

cs.AI · 2026-05-08 · unverdicted · novelty 6.0 · 3 refs

A neuro-symbolic 2.5-D decomposition pipeline separates LLM horizontal planning from deterministic vertical execution, achieving 94.6% structural accuracy on the Build What I Mean benchmark.
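One way to picture the split between horizontal planning and vertical execution is a height map. In this reading, a 2-D grid gives the number of blocks to stack in each column, and a deterministic routine expands it into block placements. This is an assumption about what "2.5-D" means here, not the cited paper's code.

The function `vertical_execution` and the `HeightMap` layout are hypothetical. The horizontal plan is hard-coded where the cited pipeline would use an LLM.

```python
from typing import List, Tuple

# Hypothetical horizontal plan: heights[z][x] = blocks to stack in column (x, z).
# In the cited pipeline an LLM produces the horizontal layout; here it is given directly.
HeightMap = List[List[int]]


def vertical_execution(heights: HeightMap, base_y: int = 0) -> List[Tuple[int, int, int]]:
    """Deterministically expand a height map into (x, y, z) placements, bottom-up."""
    placements = []
    for z, row in enumerate(heights):
        for x, h in enumerate(row):
            if h < 0:
                raise ValueError(f"negative height at column ({x}, {z})")
            for dy in range(h):
                placements.append((x, base_y + dy, z))
    # Order by layer (y first) so each block is placed after the one beneath it.
    placements.sort(key=lambda p: (p[1], p[2], p[0]))
    return placements


print(vertical_execution([[2, 0], [1, 3]]))
```

Keeping the vertical step free of model calls means that stacking order and support constraints are guaranteed by construction. The planner only has to get the footprint and heights right.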

SkillDroid: Compile Once, Reuse Forever

cs.HC · 2026-04-16 · conditional · novelty 6.0

SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching an 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds, while the stateless baseline drops to 44%.
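A "compile once, reuse" cache with a matching cascade can be sketched as three stages:

1. An exact-instruction lookup.
2. A parameterized template match.
3. A fallback to an expensive planner whose result is cached.

Everything here is hypothetical and is not SkillDroid's actual code. That includes `SkillCache`, the regex-based templates with named groups, the step strings, and the stub planner that stands in for an LLM call. The `llm_calls` counter shows why the cascade reduces model usage.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple


class SkillCache:
    """Hypothetical skill cache with a three-stage matching cascade."""

    def __init__(self, llm_planner: Callable[[str], List[str]]):
        self.llm_planner = llm_planner                         # stand-in for an LLM call
        self.exact: Dict[str, List[str]] = {}                  # stage 1: instruction -> steps
        self.templates: List[Tuple[Pattern[str], List[str]]] = []  # stage 2
        self.llm_calls = 0

    def add_template(self, pattern: str, steps: List[str]) -> None:
        # Named groups in `pattern` fill the {placeholders} in each step string.
        self.templates.append((re.compile(pattern), steps))

    def run(self, instruction: str) -> List[str]:
        if instruction in self.exact:                          # stage 1: exact replay
            return self.exact[instruction]
        for regex, steps in self.templates:                    # stage 2: parameterized replay
            match = regex.fullmatch(instruction)
            if match:
                return [s.format(**match.groupdict()) for s in steps]
        self.llm_calls += 1                                    # stage 3: fall back to the planner
        trajectory = self.llm_planner(instruction)
        self.exact[instruction] = trajectory                   # cache so the next call is free
        return trajectory


cache = SkillCache(lambda instr: [f"llm-plan: {instr}"])
cache.add_template(r"message (?P<name>\w+) saying (?P<text>.+)",
                   ["open messages", "select {name}", "type {text}", "tap send"])
print(cache.run("message Ana saying hi"), cache.llm_calls)
```

Only the third stage calls the planner. Repeated or template-matching instructions therefore cost no model calls, which is the mechanism behind the "fewer LLM calls" claim in the summary above.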

A Survey on Large Language Model based Autonomous Agents

cs.AI · 2023-08-22 · accept · novelty 6.0

A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.

Experience Transfer for Multimodal LLM Agents in Minecraft Game

cs.AI · 2026-04-07 · unverdicted · novelty 5.0

The Echo framework enables experience transfer for multimodal LLM agents in Minecraft by decomposing knowledge into structure, attribute, process, function, and interaction dimensions and applying in-context analogy learning, achieving a 1.3x-1.7x speedup on object-unlocking tasks with burst-like chain-un
