Emergent world models and latent variable estimation in chess-playing language models
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners
Supervised fine-tuning lets LLMs linearly encode action validity and state predicates, with broader state-space coverage during training improving world-model recovery.
- Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform
In the Flux environment, RL agents with explicit latent state access achieve ~79% win rate versus ~11% for LLMs on long-horizon tasks, illustrating limitations of sequence prediction for dynamic reasoning.