pith. sign in

super hub Canonical reference

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Canonical reference. 86% of citing Pith papers cite this work as background.

153 Pith papers citing it
Background 86% of classified citations
abstract

Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.

hub tools

citation-role summary

background 52 method 3 baseline 2 dataset 1 other 1

citation-polarity summary

claims ledger

  • abstract Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to

authors

co-cited works

representative citing papers

Mind2Web: Towards a Generalist Agent for the Web

cs.CL · 2023-06-09 · accept · novelty 8.0

Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.

PAL: Program-aided Language Models

cs.CL · 2022-11-18 · conditional · novelty 8.0

PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.

State-Centric Decision Process

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

SDP constructs a task-induced state space from raw text by having agents commit to and certify natural-language predicates as states, enabling structured planning and analysis in unstructured language environments.

Effective Explanations Support Planning Under Uncertainty

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Explanations scored higher by an LLM-plus-planner model are judged more helpful by people and produce measurably better navigation performance in uncertain environments than lower-scored or no explanations.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

Semantic Area Graph Reasoning for Multi-Robot Language-Guided Search

cs.RO · 2026-04-17 · unverdicted · novelty 7.0

SAGR builds a semantic area graph from occupancy maps so LLMs can assign rooms to robots for language-guided search, staying competitive with standard exploration while improving semantic target finding by up to 18.8% in large environments.

citing papers explorer

Showing 50 of 153 citing papers.