Supervised fine-tuning lets LLMs linearly encode action validity and state predicates, with broader state-space coverage during training improving world-model recovery.
Revisiting the othello world model hypothesis.arXiv preprint arXiv:2503.04421,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners
Supervised fine-tuning lets LLMs linearly encode action validity and state predicates, with broader state-space coverage during training improving world-model recovery.