Transformer hidden states contain rank-indexed orientation signatures for true r-argument relations (r=3-6) that survive surface controls and can be patched to alter model outputs on relation tasks.
The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Large language models (LLMs) form implicit beliefs (posteriors over latent variables) from prompts, but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 infers the parameters of a normal distribution from in-context samples. We show that parameter posteriors are encoded as curved manifolds in representation space, and trace how they evolve along the prompt. Standard linear steering moves representations off-manifold, inducing unintended, coupled changes, whereas geometry-aware methods preserve the target belief family. Our work demonstrates an example of linear field probing (LFP) as a principled approach to tile the data manifold and make interventions that respect the underlying geometry. Our results suggest that LLM beliefs are inherently geometric objects, and that globally linear representations are often inadequate abstractions.
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames
Transformer hidden states contain rank-indexed orientation signatures for true r-argument relations (r=3-6) that survive surface controls and can be patched to alter model outputs on relation tasks.