A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)
3 Pith papers cite this work.
Citing papers

- Spatial Reasoning via Modality Switching Between Language and Symbolic Representation: Introduces a trustworthiness-and-complexity switching metric that lets LLMs choose between language and grid modalities for spatial reasoning, yielding up to 42% gains in tested settings.
- Einstein World Models: Einstein World Models integrate visual rollouts from a callable world-module into LLM reasoning traces to support complex thought beyond language.