CLIP on Wheels: Zero-shot object navigation as object localization and exploration
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
ViTL uses an LLM to translate natural language into LTL for DFA-coordinated VLM navigation, enabling zero-shot long-horizon tasks with temporal constraints on HM3D.
Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
ReFineVLA adds teacher-generated reasoning steps to VLA training and reports state-of-the-art success rates on SimplerEnv WidowX and Google Robot benchmarks.
citing papers explorer
Sentinel: Embodied Cooperative Spatial Reasoning and Planning
Introduces Sentinel Challenge benchmark and CoSaR framework for cooperative spatial reasoning and planning among 3-5 decentralized embodied agents across 14 city-scale scenes.