WMF-AM is a depth-parameterized benchmark that measures LLMs' cumulative state tracking ability without scratchpads, validated on 28 models across arithmetic and non-arithmetic tasks with ablations confirming the construct.
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
Representative citing papers
Language models encode modal categories via linear difference vectors in their activations that predict fine-grained human plausibility judgments better than prior reports suggested.
citing papers explorer
- WMF-AM: Probing LLM Working Memory via Depth-Parameterized Cumulative State Tracking
WMF-AM is a depth-parameterized benchmark that measures LLMs' cumulative state tracking ability without scratchpads, validated on 28 models across arithmetic and non-arithmetic tasks with ablations confirming the construct.
- Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Language models encode modal categories via linear difference vectors in their activations that predict fine-grained human plausibility judgments better than prior reports suggested.