Canonical reference

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

2022 · arXiv 2201.07207

Canonical reference. 86% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 86% of classified citations

citation-role summary

background 6 baseline 1

citation-polarity summary

background 6 baseline 1

representative citing papers

Code as Policies: Language Model Programs for Embodied Control

cs.RO · 2022-09-16 · accept · novelty 8.0

Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.

FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

cs.CV · 2025-06-26 · unverdicted · novelty 7.0

FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates of up to 84.3% and only limited defense effectiveness.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a fine-tuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving state-of-the-art zero-shot performance on CALVIN and in real-world manipulation.

A Generalist Agent

cs.AI · 2022-05-12 · accept · novelty 7.0

Gato is a multi-modal, multi-task, multi-embodiment generalist policy that uses a single transformer network to handle text, vision, games, and robotics tasks.

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

cs.RO · 2022-04-04 · accept · novelty 7.0

SayCan combines an LLM's high-level semantic knowledge with robot skill value functions to select only feasible actions, enabling completion of abstract natural-language instructions on a real mobile manipulator.

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

cs.CV · 2022-04-01 · unverdicted · novelty 7.0

Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

cs.AI · 2026-06-26 · unverdicted · novelty 6.0

GILP combines a small parameterized world model with LLM agent reasoning via a consistency gate, reducing hallucinated-state rate from 0.176 to 0.035 and raising success from 0.668 to 0.838 on graph planning benchmarks.

How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts, increasing the number of steps LLM agents take by 3.57x on average (up to 25x) across 8 agents.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

cs.RO · 2025-02-27 · accept · novelty 6.0

OpenVLA-OFT fine-tuning raises the LIBERO success rate from 76.5% to 97.1%, speeds up action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

cs.AI · 2023-03-31 · conditional · novelty 6.0

CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

cs.RO · 2022-09-22 · unverdicted · novelty 6.0

ProgPrompt generates situated robot task plans by prompting LLMs with program-like specifications of actions, objects, and executable examples, achieving state-of-the-art success in VirtualHome tasks and physical robot deployment.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities that are present in large language models but absent in smaller ones, and that cannot be predicted by extrapolating the performance of smaller models.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

cs.AI · 2026-05-27 · unverdicted · novelty 5.0

This work synthesizes existing Tree-of-Thoughts research into a unified taxonomy using classical heuristic search terminology and identifies design patterns across shallow and deep reasoning tasks.

MORN: Metacognitive Object-Goal Regulation for Resource-Rational Long-Horizon Navigation

cs.RO · 2026-05-16 · unverdicted · novelty 5.0

MORN augments frozen VLM-based object navigation agents with a System 2 meta-controller using Potentiality Index, Persistence Gating, and Evidence Accumulation, raising the goal completion rate from 0.23 to 0.30 and reducing wasted steps on the HM3D dataset.

citing papers explorer

Showing 18 of 18 citing papers.

Code as Policies: Language Model Programs for Embodied Control · cs.RO · 2022-09-16 · accept · none · ref 14
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing · cs.CV · 2025-06-26 · unverdicted · none · ref 15
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents · cs.CR · 2024-10-03 · unverdicted · none · ref 104
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models · cs.RO · 2023-10-16 · conditional · none · ref 28
A Generalist Agent · cs.AI · 2022-05-12 · accept · none · ref 28
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances · cs.RO · 2022-04-04 · accept · none · ref 23
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language · cs.CV · 2022-04-01 · unverdicted · none · ref 6
Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents · cs.AI · 2026-06-26 · unverdicted · none · ref 8
How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning · cs.RO · 2026-05-16 · unverdicted · none · ref 14
LoopTrap: Termination Poisoning Attacks on LLM Agents · cs.CR · 2026-05-07 · unverdicted · none · ref 15
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control · cs.RO · 2026-02-13 · unverdicted · none · ref 38
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success · cs.RO · 2025-02-27 · accept · none · ref 16
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society · cs.AI · 2023-03-31 · conditional · none · ref 50
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models · cs.RO · 2022-09-22 · unverdicted · none · ref 2
Emergent Abilities of Large Language Models · cs.CL · 2022-06-15 · unverdicted · none · ref 37
GPT-NeoX-20B: An Open-Source Autoregressive Language Model · cs.CL · 2022-04-14 · accept · none · ref 39
Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns · cs.AI · 2026-05-27 · unverdicted · none · ref 2
MORN: Metacognitive Object-Goal Regulation for Resource-Rational Long-Horizon Navigation · cs.RO · 2026-05-16 · unverdicted · none · ref 38