hub Canonical reference

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch · 2022 · arXiv 2201.07207

Canonical reference. 86% of citing Pith papers cite this work as background.

16 Pith papers citing it

Background 86% of classified citations

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 baseline 1

citation-polarity summary

background 6 baseline 1

representative citing papers

Code as Policies: Language Model Programs for Embodied Control

cs.RO · 2022-09-16 · accept · novelty 8.0

Language models generate robot policy code from natural language commands via few-shot prompting, enabling spatial-geometric reasoning, generalization, and precise control on real robots.

FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

cs.CV · 2025-06-26 · unverdicted · novelty 7.0

FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

A Generalist Agent

cs.AI · 2022-05-12 · accept · novelty 7.0

Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

cs.RO · 2022-04-04 · accept · novelty 7.0

SayCan combines an LLM's high-level semantic knowledge with robot skill value functions to select only feasible actions, enabling completion of abstract natural-language instructions on a real mobile manipulator.

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

cs.CV · 2022-04-01 · unverdicted · novelty 7.0

Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.

How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

cs.RO · 2025-02-27 · accept · novelty 6.0

OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

cs.AI · 2023-03-31 · conditional · novelty 6.0

CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

cs.RO · 2022-09-22 · unverdicted · novelty 6.0

ProgPrompt generates situated robot task plans by prompting LLMs with program-like specifications of actions, objects, and executable examples, achieving state-of-the-art success in VirtualHome tasks and physical robot deployment.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

MORN: Metacognitive Object-Goal Regulation for Resource-Rational Long-Horizon Navigation

cs.RO · 2026-05-16 · unverdicted · novelty 5.0

MORN augments frozen VLM-based object navigation agents with a System 2 meta-controller using Potentiality Index, Persistence Gating, and Evidence Accumulation to improve goal completion rate from 0.23 to 0.30 and reduce wasted steps on the HM3D dataset.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Emergent Abilities of Large Language Models cs.CL · 2022-06-15 · unverdicted · none · ref 37
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
GPT-NeoX-20B: An Open-Source Autoregressive Language Model cs.CL · 2022-04-14 · accept · none · ref 39
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer