The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

2026 · cs.CL · arXiv 2602.02315

1 Pith paper cites this work.

abstract

Large language models (LLMs) form implicit beliefs (posteriors over latent variables) from prompts, but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 infers the parameters of a normal distribution from in-context samples. We show that parameter posteriors are encoded as curved manifolds in representation space, and trace how they evolve along the prompt. Standard linear steering moves representations off-manifold, inducing unintended, coupled changes, whereas geometry-aware methods preserve the target belief family. Our work demonstrates an example of linear field probing (LFP) as a principled approach to tile the data manifold and make interventions that respect the underlying geometry. Our results suggest that LLM beliefs are inherently geometric objects, and that globally linear representations are often inadequate abstractions.
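A toy model can illustrate the abstract's contrast between linear steering and geometry-aware steering. The sketch below assumes, purely for illustration, that a scalar belief parameter is encoded on the unit circle in a 2-D space. The functions `encode`, `decode`, `linear_steer` and `manifold_steer` are hypothetical. They are not the paper's LFP method, and they do not model its Llama-3.2 setup. The example estimates one fixed steering direction from a pair of points and then reuses that direction at a different point on the manifold.

```python
import math

Vec = tuple[float, float]

# Toy assumption (not from the paper): a scalar belief parameter theta is stored
# as a point on a curved 1-D manifold, here the unit circle in a 2-D space.
def encode(theta: float) -> Vec:
    """Map a belief parameter to its point on the toy manifold."""
    return (math.cos(theta), math.sin(theta))


def decode(h: Vec) -> float:
    """Read the belief parameter back as the angle of h, in (-pi, pi]."""
    return math.atan2(h[1], h[0])


def off_manifold_distance(h: Vec) -> float:
    """Euclidean distance from h to the unit circle; 0 means on-manifold."""
    return abs(math.hypot(h[0], h[1]) - 1.0)


def linear_steer(h: Vec, source: Vec, target: Vec, alpha: float = 1.0) -> Vec:
    """Add alpha * (target - source): one fixed direction, reused everywhere."""
    v = (target[0] - source[0], target[1] - source[1])
    return (h[0] + alpha * v[0], h[1] + alpha * v[1])


def manifold_steer(h: Vec, delta_theta: float) -> Vec:
    """Shift the decoded parameter, then re-encode, so the result stays on the circle."""
    return encode(decode(h) + delta_theta)


if __name__ == "__main__":
    # Steering direction estimated from the pair theta=0 -> theta=pi/2.
    source, target = encode(0.0), encode(math.pi / 2)
    h = encode(math.pi / 2)  # a different starting belief
    lin = linear_steer(h, source, target)
    geo = manifold_steer(h, math.pi / 2)
    print(f"linear:   off-manifold {off_manifold_distance(lin):.3f}, theta {decode(lin):.3f}")
    print(f"manifold: off-manifold {off_manifold_distance(geo):.3f}, theta {decode(geo):.3f}")
```

Linear steering hits the target exactly at the point where its direction was estimated. At θ = π/2, the same vector lands near (-1, 2). That point sits about 1.24 units off the circle and decodes to roughly 2.03 rad instead of the intended π. The on-manifold step reaches θ = π with no off-manifold error. In a real model, the manifold is not known in closed form, so a geometry-aware method has to estimate it from data. This toy avoids that step by hard-coding the circle.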

representative citing papers

Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Transformer hidden states contain rank-indexed orientation signatures for true r-argument relations (r=3-6) that survive surface controls and can be patched to alter model outputs on relation tasks.
