Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction

2018 · cs.CL · arXiv 1809.00786

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

We propose to decompose instruction execution to goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a language-conditioned image generation network, and then generates the actions required to complete them. Our model is trained from demonstration only without external resources. To evaluate our approach, we introduce two benchmarks for instruction following: LANI, a navigation task; and CHAI, where an agent executes household instructions. Our evaluation demonstrates the advantages of our model decomposition, and illustrates the challenges posed by our new benchmarks.

representative citing papers

Sentinel: Embodied Cooperative Spatial Reasoning and Planning

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

Introduces Sentinel Challenge benchmark and CoSaR framework for cooperative spatial reasoning and planning among 3-5 decentralized embodied agents across 14 city-scale scenes.

citing papers explorer

Showing 1 of 1 citing paper.

Sentinel: Embodied Cooperative Spatial Reasoning and Planning

cs.CV · 2026-05-25 · unverdicted · none · ref 30 · internal anchor

Introduces Sentinel Challenge benchmark and CoSaR framework for cooperative spatial reasoning and planning among 3-5 decentralized embodied agents across 14 city-scale scenes.
