pith. machine review for the scientific record.

arxiv: 2305.16291 · v2 · submitted 2023-05-25 · 💻 cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

Pith reviewed 2026-05-10 13:06 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords embodied agent · large language models · Minecraft · lifelong learning · skill library · automatic curriculum · iterative prompting · exploration

The pith

Voyager uses GPT-4 to build a reusable library of executable skills that lets it explore Minecraft far more effectively than prior agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Voyager as the first embodied agent that uses a large language model to explore Minecraft continuously, acquire diverse skills, and make discoveries entirely without human intervention. It achieves this with three components: an automatic curriculum that drives exploration, an ever-growing library of executable code for storing and retrieving behaviors, and an iterative prompting loop that refines programs using environment feedback, execution errors, and self-verification. The resulting skills are compositional and interpretable, allowing abilities to compound over time while avoiding catastrophic forgetting. A sympathetic reader would care because the approach bypasses model fine-tuning and demonstrates strong transfer to new worlds, pointing toward agents that can keep learning in open-ended settings.

Core claim

Voyager is an LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. It consists of an automatic curriculum that maximizes exploration, an ever-growing skill library of executable code for storing and retrieving complex behaviors, and a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, bypassing the need for model parameter fine-tuning. The skills developed are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting.
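The three-component loop can be sketched in miniature. This is a toy, runnable illustration of the control flow the claim describes, not the paper's actual API: the curriculum, the skill library, and the code-generating "LLM" are all stand-ins with hypothetical names.

```python
# Toy sketch of the curriculum -> code generation -> verification -> storage
# loop. All classes and names here are illustrative stand-ins, not Voyager's
# real interfaces.

class ToyCurriculum:
    """Automatic curriculum: propose the next unexplored task."""
    def __init__(self, tasks):
        self.tasks = list(tasks)

    def propose(self, completed):
        remaining = [t for t in self.tasks if t not in completed]
        return remaining[0] if remaining else None

class ToySkillLibrary:
    """Ever-growing store of named executable skills."""
    def __init__(self):
        self.skills = {}

    def add(self, name, fn):
        self.skills[name] = fn

def lifelong_loop(curriculum, library, make_skill, max_steps=10):
    """Run the curriculum until exhausted, storing each verified skill."""
    completed = []
    for _ in range(max_steps):
        task = curriculum.propose(completed)
        if task is None:
            break
        skill = make_skill(task)   # stands in for LLM code generation
        if skill():                # stands in for self-verification
            library.add(task, skill)
            completed.append(task)
    return completed

curriculum = ToyCurriculum(["mine log", "craft planks", "craft table"])
library = ToySkillLibrary()
done = lifelong_loop(curriculum, library, lambda t: (lambda: True))
# After the loop, `done` lists all three tasks and the library holds one
# skill per task; abilities compound because stored skills feed later prompts.
```

The point of the sketch is the dependency structure: the curriculum conditions on what is already completed, and only verified programs enter the library, which is why skills accumulate rather than being relearned.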

What carries the argument

The ever-growing skill library of executable code that stores and retrieves complex behaviors, combined with iterative prompting that refines programs using environment feedback and self-verification.
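Retrieval from the skill library can be illustrated as nearest-neighbor lookup over skill descriptions. The paper indexes skills by an embedding of their description; this hedged sketch substitutes bag-of-words cosine similarity for a learned embedding, and the stored skill snippets are hypothetical.

```python
# Hedged sketch of skill retrieval: rank stored skills by similarity of
# their descriptions to the current task, return the top-k as in-context
# examples. Bag-of-words cosine stands in for a real embedding model.
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillIndex:
    def __init__(self):
        self.entries = []  # (description, code) pairs

    def add(self, description, code):
        self.entries.append((description, code))

    def top_k(self, query, k=2):
        q = embed(query)
        scored = sorted(self.entries,
                        key=lambda e: cosine(q, embed(e[0])),
                        reverse=True)
        return [code for _, code in scored[:k]]

index = SkillIndex()
index.add("chop a tree to collect logs", "async function chopTree(bot) {...}")
index.add("craft wooden planks from logs", "async function craftPlanks(bot) {...}")
index.add("fight a zombie at night", "async function fightZombie(bot) {...}")
hits = index.top_k("collect logs from a tree", k=1)
# `hits` contains the chopTree skill, the closest match by description.
```

Retrieved skills are what makes the library load-bearing: they are injected into the next prompt as examples, so earlier behaviors compose into later ones.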

Load-bearing premise

That repeated black-box queries to GPT-4, augmented only by environment feedback, execution errors, and self-verification, will reliably produce correct, temporally extended, and compositional executable code for increasingly complex behaviors without human debugging or fine-tuning.
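The premise can be made concrete as a refine-until-verified loop. The sketch below is a minimal stand-in, assuming a `generate`/`execute`/`verify` decomposition that is illustrative rather than the paper's interface; the toy "LLM" here simply fixes its program once it has seen the error message.

```python
# Hedged sketch of iterative prompting: generate a program, run it, feed
# execution errors back into the next prompt, stop when self-verification
# passes or the retry budget runs out.

def iterative_refine(generate, execute, verify, max_rounds=4):
    feedback = None
    for round_idx in range(max_rounds):
        program = generate(feedback)   # prompt includes prior error text
        ok, result = execute(program)
        if ok and verify(result):      # self-verification of the goal state
            return program, round_idx + 1
        feedback = result              # error message refines the next prompt
    return None, max_rounds

# Toy stand-ins: the first attempt fails with an "error", the second succeeds.
def fake_llm(feedback):
    return "v2" if feedback else "v1"

def fake_execute(program):
    if program == "v2":
        return (True, "goal reached")
    return (False, "NameError: ...")

program, rounds = iterative_refine(fake_llm, fake_execute,
                                   lambda r: r == "goal reached")
# program == "v2" after 2 rounds
```

The premise is exactly that this loop converges for real GPT-4 outputs on real Minecraft errors; the referee's second major comment asks for the iteration-count and failure-mode statistics that would test it.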

What would settle it

Observing that Voyager's transferred skill library fails to solve novel tasks in a fresh Minecraft world at rates substantially above baselines, or that the library stops growing after limited exploration, would disprove the central claim of effective lifelong learning.

read the original abstract

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft. It uses three components—an automatic curriculum to maximize exploration, an ever-growing library of executable skills, and an iterative prompting mechanism with GPT-4 that incorporates environment feedback, execution errors, and self-verification—to enable continuous skill acquisition and novel discoveries without human intervention or model fine-tuning. The skills are claimed to be temporally extended, interpretable, and compositional. Empirically, Voyager is reported to achieve 3.3x more unique items, 2.3x longer distances traveled, and up to 15.3x faster unlocking of tech-tree milestones than prior SOTA, while also demonstrating zero-shot transfer to new worlds via the skill library.

Significance. If the empirical results hold under rigorous verification, the work would represent a meaningful advance in open-ended embodied agents by demonstrating how in-context learning with LLMs can support compounding, lifelong skill acquisition in complex environments. The open-sourcing of the full codebase and prompts is a clear strength that supports reproducibility and community follow-up.

major comments (3)
  1. [Section 4] Section 4 (Experiments): The reported performance multipliers (3.3x items, 2.3x distance, 15.3x faster milestones) are presented without ablations that isolate the iterative prompting mechanism from the curriculum and skill library; this is load-bearing for attributing the gains to the three-component loop rather than any single element.
  2. [Section 3.3] Section 3.3 (Iterative Prompting): No quantitative breakdown is provided of iteration counts per skill, success/failure rates of the self-verification loop, or common error modes when GPT-4 generates temporally extended code; without these, the claim that black-box queries reliably produce correct compositional programs without human repair cannot be fully assessed.
  3. [Section 4.2] Section 4.2 (Comparisons): The evaluation against prior SOTA does not state whether baselines were re-run under identical conditions, random seeds, or compute budgets, or whether published numbers were used; this directly affects the reliability of the cross-method claims.
minor comments (2)
  1. [Abstract] The abstract and Section 1 could more explicitly define the precise metrics for 'unique items' and 'tech tree milestones' to aid immediate interpretation.
  2. [Figure 2] Figure 2 (architecture diagram) would benefit from clearer labeling of the data flow between the skill library retrieval and the prompting module.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We believe the suggested clarifications and additions will improve the paper's rigor and transparency. We address each major comment below.

read point-by-point responses
  1. Referee: [Section 4] Section 4 (Experiments): The reported performance multipliers (3.3x items, 2.3x distance, 15.3x faster milestones) are presented without ablations that isolate the iterative prompting mechanism from the curriculum and skill library; this is load-bearing for attributing the gains to the three-component loop rather than any single element.

    Authors: We agree that isolating the contribution of the iterative prompting mechanism through targeted ablations would strengthen the attribution of performance gains to the full three-component architecture. In the revised version, we will include new ablation experiments that compare the full Voyager system against a variant without iterative prompting (while retaining the automatic curriculum and skill library), providing quantitative evidence for its role in the observed improvements. revision: yes

  2. Referee: [Section 3.3] Section 3.3 (Iterative Prompting): No quantitative breakdown is provided of iteration counts per skill, success/failure rates of the self-verification loop, or common error modes when GPT-4 generates temporally extended code; without these, the claim that black-box queries reliably produce correct compositional programs without human repair cannot be fully assessed.

    Authors: We acknowledge the value of providing more detailed statistics on the iterative prompting process to allow readers to better evaluate the reliability of the black-box LLM queries. We will add a quantitative analysis in Section 3.3, including average iteration counts per skill, success and failure rates of the self-verification loop, and a categorization of common error modes encountered during code generation and execution. revision: yes

  3. Referee: [Section 4.2] Section 4.2 (Comparisons): The evaluation against prior SOTA does not state whether baselines were re-run under identical conditions, random seeds, or compute budgets, or whether published numbers were used; this directly affects the reliability of the cross-method claims.

    Authors: To ensure fair and reproducible comparisons, all baseline methods were re-implemented and evaluated in our experimental setup using the same Minecraft environment, task definitions, and evaluation protocols as Voyager. We will explicitly document this in the revised Section 4.2, including details on the use of consistent random seeds and compute budgets where applicable. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture with external benchmarks

full rationale

The paper introduces Voyager as an LLM-based agent with three components (automatic curriculum, skill library, iterative prompting) and supports its claims solely through empirical Minecraft experiments comparing against prior SOTA baselines. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text. Performance metrics (3.3x items, 2.3x distance, 15.3x faster milestones) are measured outcomes, not quantities defined in terms of themselves or obtained by renaming inputs. Self-citations, if present in the full text, are not load-bearing for any central claim; the architecture is described directly and the results are externally falsifiable via reproduction. This is a standard empirical systems paper with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the pre-existing capabilities of GPT-4 and the Minecraft simulator; no new free parameters, axioms beyond standard LLM prompting assumptions, or invented entities are introduced.

axioms (1)
  • domain assumption Large language models accessed via black-box prompting can iteratively generate and correct executable code for controlling an embodied agent when given environment feedback and execution errors.
    This assumption underpins the iterative prompting mechanism and the claim of autonomous skill improvement.

pith-pipeline@v0.9.0 · 5544 in / 1385 out tokens · 30367 ms · 2026-05-10T13:06:05.697222+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • HierarchyEmergence, HierarchyForcing — `hierarchy_emergence_forces_phi` echoes:

    "The skills developed by VOYAGER are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting."

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Continual Harness: Online Adaptation for Self-Improving Foundation Agents

    cs.LG 2026-05 conditional novelty 8.0

    Continual Harness automates online self-improvement for foundation-model embodied agents by refining prompts, sub-agents, skills, and memory within one run, cutting button-press costs on Pokemon Red and Emerald and cl...

  2. Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models

    cs.CV 2026-05 unverdicted novelty 8.0

    Flame3D enables zero-shot compositional 3D scene reasoning by representing scenes as editable visual-textual memories exposed to agentic MLLMs through composable and synthesizable spatial tools.

  3. ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

    cs.CR 2026-05 unverdicted novelty 8.0

    ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.

  4. RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    RS-Claw enables remote sensing agents to actively explore tools via hierarchical skill trees, achieving up to 86% token compression and outperforming flat registration and RAG baselines on Earth-Bench.

  5. AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

    cs.SE 2026-05 conditional novelty 7.0

    10.7% of passing SWE-agent trajectories are Lucky Passes with chaotic behaviors, and a quality score based on process references changes model rankings across eight backends.

  6. State-Centric Decision Process

    cs.AI 2026-05 unverdicted novelty 7.0

    SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

  7. GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

    cs.LG 2026-05 unverdicted novelty 7.0

    GEAR reshapes GRPO trajectory advantages using divergence signals from a ground-truth-conditioned teacher to create adaptive token- and segment-level credit regions.

  8. Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

    cs.AI 2026-05 unverdicted novelty 7.0

    VLATIM benchmark reveals large VLMs excel at high-level planning in physics puzzles but struggle with precise visual grounding and mouse control, so they lack human-like problem-solving capabilities.

  9. MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

    cs.AI 2026-05 unverdicted novelty 7.0

    MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

  10. TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning

    cs.AI 2026-05 unverdicted novelty 7.0

    TimeClaw is an exploratory execution learning system that turns multiple valid tool-use paths into hierarchical distilled experience for improved time-series reasoning without test-time adaptation.

  11. EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

    cs.AI 2026-05 unverdicted novelty 7.0

    EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...

  12. FORTIS: Benchmarking Over-Privilege in Agent Skills

    cs.AI 2026-05 unverdicted novelty 7.0

    FORTIS benchmark shows over-privilege is the norm in LLM agent skill selection and execution, with models reaching for higher-privilege skills and tools than required across ten frontier models and three domains.

  13. Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

    cs.SE 2026-05 conditional novelty 7.0

    SkillGuard extracts executable environment contracts from LLM skill documents to detect only relevant drifts, reporting zero false positives on 599 cases, 100% precision in known-drift tests, and raising one-round rep...

  14. RewardHarness: Self-Evolving Agentic Post-Training

    cs.AI 2026-05 unverdicted novelty 7.0

    RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

  15. MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

    cs.RO 2026-05 unverdicted novelty 7.0

    MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency...

  16. Tools as Continuous Flow for Evolving Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    FlowAgent models tool chaining as continuous latent trajectory generation with conditional flow matching to deliver global planning, formal utility bounds, and better robustness on long-horizon tasks, plus a new plan-...

  17. When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

    cs.AI 2026-05 unverdicted novelty 7.0

    A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

  18. Towards Autonomous Business Intelligence via Data-to-Insight Discovery Agent

    cs.AI 2026-05 unverdicted novelty 7.0

    AIDA is the first end-to-end autonomous agent that combines a domain-specific language with Pareto-guided reinforcement learning to discover insights from complex business data.

  19. Agentick: A Unified Benchmark for General Sequential Decision-Making Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method dominates.

  20. SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.

  21. More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

    cs.AI 2026-05 conditional novelty 7.0

    Full factorial testing of five LLM agent components reveals that the complete 'All-In' combination is consistently outperformed by smaller subsets due to cross-component interference, with optimal subsets being task- ...

  22. Inference-Time Budget Control for LLM Search Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.

  23. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem stores multiple candidate conclusions with probabilities in agent memory and updates them via Noisy-OR rules to preserve uncertainty under partial observability.

  24. Belief Memory: Agent Memory Under Partial Observability

    cs.AI 2026-05 unverdicted novelty 7.0

    BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines o...

  25. ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

    cs.CL 2026-05 unverdicted novelty 7.0

    LLM reasoning traces can be compiled into reusable symbolic solvers that achieve high accuracy on program synthesis benchmarks at zero inference cost and transfer to other domains.

  26. SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting

    cs.NI 2026-05 unverdicted novelty 7.0

    SADE encodes a Cisco-style phase-gated diagnostic policy into LLM agents, delivering a 37-point root-cause F1 gain on the NIKA benchmark with 22 points attributable to the policy itself.

  27. Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems

    cs.AI 2026-05 unverdicted novelty 7.0

    A foresight-based local purification method using multi-persona simulations and recursive diagnosis reduces infectious jailbreak spread in multi-agent systems from over 95% to below 5.47% while matching benign perform...

  28. Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

    cs.SE 2026-04 unverdicted novelty 7.0

    Comet-H orchestrates LLMs via deficit-scoring prompt selection and half-life task tracking to co-evolve research software components, demonstrated by a static analysis tool reaching F1=0.768 versus a 0.364 baseline.

  29. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 7.0

    A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...

  30. Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

    cs.CL 2026-04 unverdicted novelty 7.0

    AHE automates coding-agent harness evolution via component, experience, and decision observability, raising Terminal-Bench 2 pass@1 from 69.7% to 77.0% with transfer gains across models and benchmarks.

  31. Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

    cs.GR 2026-04 unverdicted novelty 7.0

    Cutscene Agent uses a multi-agent LLM system and a new toolkit for game engine control to automate end-to-end 3D cutscene generation, evaluated on the introduced CutsceneBench.

  32. Skill Retrieval Augmentation for Agentic AI

    cs.CL 2026-04 unverdicted novelty 7.0

    Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.

  33. Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

    cs.SE 2026-04 unverdicted novelty 7.0

    TraceToChain models LLM agent traces as absorbing DTMCs using automatic clustering and smoothed MLE, with KS and AIC validation, to reconcile pass@k, pass^k, and RDC as projections of a single first-passage success-ti...

  34. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  35. AEL: Agent Evolving Learning for Open-Ended Environments

    cs.CL 2026-04 conditional novelty 7.0

    AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechan...

  36. The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook

    cs.CY 2026-04 unverdicted novelty 7.0

    Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.

  37. Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

    cs.AI 2026-04 unverdicted novelty 7.0

    COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.

  38. Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

    cs.CR 2026-04 unverdicted novelty 7.0

    AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new z...

  39. Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes

    cs.CV 2026-04 unverdicted novelty 7.0

    Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.

  40. Structural Verification for Reliable EDA Code Generation without Tool-in-the-Loop Debugging

    cs.SE 2026-04 unverdicted novelty 7.0

    Structural dependency graphs and staged pre-execution verification raise LLM-based EDA code pass rates to 82.5% (single-step) and 70-84% (multi-step) while halving tool calls by catching dependency violations before runtime.

  41. AI scientists produce results without reasoning scientifically

    cs.AI 2026-04 conditional novelty 7.0

    LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.

  42. SAGER: Self-Evolving User Policy Skills for Recommendation Agent

    cs.IR 2026-04 unverdicted novelty 7.0

    SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to mem...

  43. Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis

    cs.LG 2026-04 unverdicted novelty 7.0

    RL expands the capability boundary of LLM agents on compositional tool-use tasks, shown by non-converging pass curves at large k with increasing T, while SFT regresses it and the effect is absent on simpler tasks.

  44. SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation

    cs.CV 2026-04 unverdicted novelty 7.0

    SemiFA is a four-agent LangGraph pipeline that combines DINOv2 and LLaVA image analysis with SECS/GEM telemetry and vector retrieval to produce complete FA reports in 48 seconds.

  45. Exploration and Exploitation Errors Are Measurable for Language Model Agents

    cs.AI 2026-04 unverdicted novelty 7.0

    A policy-agnostic metric and controllable 2D grid environments with task DAGs enable measurement of exploration and exploitation errors in language model agents from observed actions.

  46. Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution

    cs.RO 2026-04 unverdicted novelty 7.0

    A runtime governance framework for embodied agents achieves 96.2% interception of unauthorized actions and 91.4% recovery success in 1000 simulation trials by externalizing policy enforcement.

  47. ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback

    cs.AI 2026-04 unverdicted novelty 7.0

    ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.

  48. GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

    cs.AI 2026-04 unverdicted novelty 7.0

    GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.

  49. SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

    cs.AI 2026-04 unverdicted novelty 7.0

    SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.

  50. MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

    cond-mat.mtrl-sci 2026-04 conditional novelty 7.0

    MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and R...

  51. C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

    cs.AI 2026-03 unverdicted novelty 7.0

    C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

  52. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

    cs.LG 2024-01 conditional novelty 7.0

    Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

  53. VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    cs.RO 2023-07 unverdicted novelty 7.0

    VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

  54. Mastering Diverse Domains through World Models

    cs.AI 2023-01 unverdicted novelty 7.0

    DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

  55. Cognifold: Always-On Proactive Memory via Cognitive Folding

    cs.AI 2026-05 unverdicted novelty 6.0

    Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...

  56. MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-...

  57. When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

    stat.ML 2026-05 unverdicted novelty 6.0

    A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.

  58. PriorZero: Bridging Language Priors and World Models for Decision Making

    cs.LG 2026-05 unverdicted novelty 6.0

    PriorZero uses root-only LLM prior injection in MCTS and alternating world-model training with LLM fine-tuning to raise exploration efficiency and final performance on Jericho text games and BabyAI gridworlds.

  59. SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

    cs.CR 2026-05 unverdicted novelty 6.0

    SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.

  60. Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    SLIM dynamically optimizes active external skills in agentic RL via leave-one-skill-out marginal contribution estimates and three lifecycle operations, outperforming baselines by 7.1% on ALFWorld and SearchQA while sh...

Reference graph

Works this paper leans on

123 extracted references · 123 canonical work pages · cited by 142 Pith papers · 23 internal anchors

  1. [1]

    AI2-THOR: An Interactive 3D Environment for Visual AI

    Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, and Ali Farhadi. Ai2-thor: An interactive 3d environment for visual ai. arXiv preprint arXiv:1712.05474, 2017

  2. [2]

    Habitat: A platform for embodied AI research

    Manolis Savva, Jitendra Malik, Devi Parikh, Dhruv Batra, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, and Vladlen Koltun. Habitat: A platform for embodied AI research. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pages 9...

  3. [3]

    robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

    Yuke Zhu, Josiah Wong, Ajay Mandlekar, and Roberto Martín-Martín. robosuite: A modular simulation framework and benchmark for robot learning. arXiv preprint arXiv:2009.12293, 2020

  4. [4]

    Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

    Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, and Silvio Savarese. Interactive gibson benchmark (igibson 0.5): A benchmark for interactive navigation in cluttered environments. arXiv preprint arXiv:1910.14442, 2019

  5. [5]

    iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes

    Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D’Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, and Silvio Savarese. igibson 1.0: a simulation environment for interactive tasks in large realistic scenes. arXiv preprint arXiv:2012.02...

  6. [6]

    Reinforcement learning in robotics: A survey

    Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013

  7. [7]

    Deep reinforcement learning: A brief survey

    Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38, 2017

  8. [8]

    Video pretraining (vpt): Learning to act by watching unlabeled online videos

    Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, and Jeff Clune. Video pretraining (vpt): Learning to act by watching unlabeled online videos. arXiv preprint arXiv:2206.11795, 2022

  9. [9]

    Creating multimodal interactive agents with imitation and self-supervised learning

    DeepMind Interactive Agents Team, Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex Goldin, Mansi Gupta, Tim Harley, Felix Hill, Peter C Humphreys, Alden Hung, Jessica Landon, Timothy Lillicrap, Hamza Merzic, Alistair Muldal, Adam Santoro, Guy Scully, Tamara von Glehn, Greg Wayne, Nathaniel Won...

  10. [10]

    Alphastar: Mastering the real-time strategy game starcraft ii

    Oriol Vinyals, Igor Babuschkin, Junyoung Chung, Michael Mathieu, Max Jaderberg, Wojciech M Czarnecki, Andrew Dudzik, Aja Huang, Petko Georgiev, Richard Powell, et al. Alphastar: Mastering the real-time strategy game starcraft ii. DeepMind blog, 2, 2019

  11. [11]

    Go-Explore: A New Approach for Hard-Exploration Problems

    Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Go-explore: a new approach for hard-exploration problems. arXiv preprint arXiv:1901.10995, 2019

  12. [12]

    Evolving multimodal robot behavior via many stepping stones with the combinatorial multiobjective evolutionary algorithm

    Joost Huizinga and Jeff Clune. Evolving multimodal robot behavior via many stepping stones with the combinatorial multiobjective evolutionary algorithm. Evolutionary Computation, 30(2):131–164, 2022

  13. [13]

    Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and Their Solutions

    Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeffrey Clune, and Kenneth O. Stanley. Enhanced POET: open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedin...

  14. [14]

    Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft

    Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, and Jeff Clune. Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft. arXiv preprint arXiv:2106.14876, 2021

  15. [15]

    Emergent Complexity and Zero-Shot Transfer via Unsupervised Environment Design

    Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre M. Bayen, Stuart Russell, Andrew Critch, and Sergey Levine. Emergent complexity and zero-shot transfer via unsupervised environment design. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 3...

  16. [16]

    Code as Policies: Language Model Programs for Embodied Control

    Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753, 2022

  17. [17]

    Program Guided Agent

    Shao-Hua Sun, Te-Lin Wu, and Joseph J. Lim. Program guided agent. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020

  18. [18]

    Proto: Program-guided transformer for program-guided tasks

    Zelin Zhao, Karan Samel, Binghong Chen, and Le Song. Proto: Program-guided transformer for program-guided tasks. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021...

  19. [19]

    VIMA: General Robot Manipulation with Multimodal Prompts

    Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, and Linxi (Jim) Fan. Vima: General robot manipulation with multimodal prompts. arXiv preprint, 2022

  20. [20]

    Cliport: What and where pathways for robotic manipulation

    Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Cliport: What and where pathways for robotic manipulation. arXiv preprint arXiv:2109.12098, 2021

  21. [21]

    SECANT: self-expert cloning for zero-shot generalization of visual policies

    Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, and Animashree Anandkumar. SECANT: self-expert cloning for zero-shot generalization of visual policies. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings o...

  22. [22]

    ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models

    Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302, 2022

  23. [23]

    MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

    Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, and Anima Anandkumar. Minedojo: Building open-ended embodied agents with internet-scale knowledge. arXiv preprint arXiv:2206.08853, 2022

  24. [24]

    Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

    Andy Zeng, Adrian Wong, Stefan Welker, Krzysztof Choromanski, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, and Pete Florence. Socratic models: Composing zero-shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598, 2022

  25. [25]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Le...

  26. [26]

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022

  27. [27]

    Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

    Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, U...

  28. [28]

    Significant-gravitas/auto-gpt: An experimental open-source attempt to make gpt-4 fully autonomous, 2023

  29. [29]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022

  30. [30]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023

  31. [31]

    Continual Lifelong Learning with Neural Networks: A Review

    German Ignacio Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019

  32. [32]

    A Comprehensive Survey of Continual Learning: Theory, Method and Application

    Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: Theory, method and application. arXiv preprint arXiv:2302.00487, 2023

  33. [33]

    Playing Atari with Deep Reinforcement Learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013

  34. [34]

    OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, I...

  35. [35]

    GPT-4 Technical Report

    OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  36. [36]

    Emergent Abilities of Large Language Models

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022

  37. [37]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...

  38. [38]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21:140:1–140:67, 2020

  39. [39]

    Diversity is all you need: Learning skills without a reward function

    Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. Diversity is all you need: Learning skills without a reward function. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019

  40. [40]

    Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

    Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in N...

  41. [41]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  42. [42]

    Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

    Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O. Stanley. Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753, 2019

  43. [43]

    Automatic Curriculum Learning for Deep RL: A Short Survey

    Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, and Pierre-Yves Oudeyer. Automatic curriculum learning for deep RL: A short survey. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 4819–4825. ijcai.org, 2020

  44. [44]

    Intrinsically motivated goal exploration processes with automatic curriculum learning

    Sébastien Forestier, Rémy Portelas, Yoan Mollard, and Pierre-Yves Oudeyer. Intrinsically motivated goal exploration processes with automatic curriculum learning. The Journal of Machine Learning Research, 23(1):6818–6858, 2022

  45. [45]

    DreamCoder: Growing Generalizable, Interpretable Knowledge with Wake-Sleep Bayesian Program Learning

    Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, and Joshua B. Tenenbaum. Dreamcoder: Growing generalizable, interpretable knowledge with wake-sleep bayesian program learning. arXiv preprint arXiv:2006.08381, 2020

  46. [46]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903, 2022

  47. [47]

    Asynchronous Methods for Deep Reinforcement Learning

    Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, US...

  48. [48]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  49. [49]

    Continuous Control with Deep Reinforcement Learning

    Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In Yoshua Bengio and Yann LeCun, editors, 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceed...

  50. [50]

    Introducing ChatGPT, 2022

  51. [51]

    New and improved embedding model, 2022

  52. [52]

    PrismarineJS/mineflayer: Create Minecraft Bots with a Powerful, Stable, and High Level JavaScript API

    PrismarineJS. Prismarinejs/mineflayer: Create minecraft bots with a powerful, stable, and high level javascript api, 2013

  53. [53]

    Do embodied agents dream of pixelated sheep?: Embodied decision making using language guided world modelling

    Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hanna Hajishirzi, Sameer Singh, and Roy Fox. Do embodied agents dream of pixelated sheep?: Embodied decision making using language guided world modelling. arXiv preprint, 2023

  54. [54]

    Open-world multi-task control through goal-aware representation learning and adaptive horizon prediction

    Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, and Yitao Liang. Open-world multi-task control through goal-aware representation learning and adaptive horizon prediction. arXiv preprint arXiv:2301.10034, 2023

  55. [55]

    Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents

    Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, and Yitao Liang. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023

  56. [56]

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023

  57. [57]

    Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models

    Yiheng Liu, Tianle Han, Siyuan Ma, Jiayue Zhang, Yuanyuan Yang, Jiaming Tian, Hao He, Antong Li, Mengshen He, Zhengliang Liu, Zihao Wu, Dajiang Zhu, Xiang Li, Ning Qiang, Dingang Shen, Tianming Liu, and Bao Ge. Summary of chatgpt/gpt-4 research and perspective towards the future of large language models. arXiv preprint arXiv:2304.01852, 2023

  58. [58]

    Prismer: A vision-language model with an ensemble of experts

    Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, and Anima Anandkumar. Prismer: A vision-language model with an ensemble of experts. arXiv preprint arXiv:2303.02506, 2023

  59. [59]

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. Palm-e: An embodied ...

  60. [60]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  61. [61]

    MineRL: A Large-Scale Dataset of Minecraft Demonstrations

    William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, and Ruslan Salakhutdinov. Minerl: A large-scale dataset of minecraft demonstrations. In Sarit Kraus, editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 2442–244...

  62. [62]

    The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors

    William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, and Phillip Wang. The minerl 2019 competition on sample efficient reinforcement learning using human priors. arXiv preprint arXiv:1904.10079, 2019

  63. [63]

    The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors

    William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov, John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, and Oriol Vinyals. The minerl 2020 competition on sample efficient reinforcement learning using human priors...

  64. [64]

    Minerl diamond 2021 competition: Overview, results, and lessons learned

    Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, and Aleksei Shpilman. Minerl diamond 2021 compe...

  65. [65]

    The malmo platform for artificial intelligence experimentation

    Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The malmo platform for artificial intelligence experimentation. In Subbarao Kambhampati, editor, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 4246–4247. IJCAI/AAAI Press, 2016

  66. [66]

    Juewu-mc: Playing minecraft with sample-efficient hierarchical reinforcement learning

    Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, and Wei Yang. Juewu-mc: Playing minecraft with sample-efficient hierarchical reinforcement learning. arXiv preprint arXiv:2112.04907, 2021

  67. [67]

    Seihai: A sample-efficient hierarchical ai for the minerl competition

    Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie Wu, Jianye Hao, Dong Li, and Pingzhong Tang. Seihai: A sample-efficient hierarchical ai for the minerl competition. arXiv preprint arXiv:2111.08857, 2021

  68. [68]

    Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft

    Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, and Aleksandr I. Panov. Hierarchical deep q-network from imperfect demonstrations in minecraft. Cogn. Syst. Res., 65:74–78, 2021

  69. [69]

    Mastering Diverse Domains through World Models

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023

  70. [70]

    Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code

    Ryan Volum, Sudha Rao, Michael Xu, Gabriel DesGarennes, Chris Brockett, Benjamin Van Durme, Olivia Deng, Akanksha Malhotra, and Bill Dolan. Craft an iron sword: Dynamically generating interactive game characters by prompting large language models tuned on code. In Proceedings of the 3rd Wordplay: When Language Meets Games Workshop (Wordplay 2022), page...

  71. [71]

    Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks

    Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, and Zongqing Lu. Plan4mc: Skill reinforcement learning and planning for open-world minecraft tasks. arXiv preprint arXiv:2303.16563, 2023

  72. [72]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stef...

  73. [73]

    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradb...

  74. [74]

    Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean,...

  75. [75]

    A survey of embodied AI: from simulators to research tasks

    Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied AI: from simulators to research tasks. IEEE Trans. Emerg. Top. Comput. Intell., 6(2):230–244, 2022

  76. [76]

    Rearrangement: A Challenge for Embodied AI

    Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, and Hao Su. Rearrangement: A challenge for embodied ai. arXiv preprint arXiv:2011.01975, 2020

  77. [77]

    Recent advances in robot learning from demonstration

    Harish Ravichandar, Athanasios S Polydoros, Sonia Chernova, and Aude Billard. Recent advances in robot learning from demonstration. Annual review of control, robotics, and autonomous systems, 3:297–330, 2020

  78. [78]

    A review of physics simulators for robotic applications

    Jack Collins, Shelvin Chand, Anthony Vanderkop, and David Howard. A review of physics simulators for robotic applications. IEEE Access, 9:51416–51431, 2021

  79. [79]

    FILM: Following Instructions in Language with Modular Methods

    So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, and R. Salakhutdinov. Film: Following instructions in language with modular methods. International Conference on Learning Representations, 2021

  80. [80]

    A persistent spatial semantic representation for high-level natural language instruction execution

    Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, and Yoav Artzi. A persistent spatial semantic representation for high-level natural language instruction execution. In 5th Annual Conference on Robot Learning, 2021

Showing first 80 references.