Representational Curvature Modulates Behavioral Uncertainty in Large Language Models
Pith reviewed 2026-05-08 03:42 UTC · model grok-4.3
The pith
Contextual curvature of representational trajectories in LLMs correlates with and causally modulates next-token entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Contextual curvature—a geometric measure of how sharply the representational trajectory bends over recent context—is correlated with next-token entropy across GPT-2 XL and Pythia-2.8B. The correlation emerges over the course of training. Trajectory-aligned perturbations that change curvature reliably alter entropy, while geometrically misaligned perturbations leave entropy unchanged. Regularizing representations to be straighter during training modestly reduces token-level entropy without degrading validation loss.
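The review below notes that the paper's exact curvature formula is not reproduced with full precision. For orientation only, one common discretization of trajectory curvature is the mean turning angle between successive steps of the hidden-state sequence. The sketch below is a hypothetical operationalization; the function name, window handling, and choice of metric are our assumptions, not the paper's stated definition.

```python
import numpy as np

def contextual_curvature(hidden_states: np.ndarray) -> float:
    """Mean turning angle (radians) between successive steps of a
    hidden-state trajectory of shape (T, d). A straight trajectory
    gives 0; sharper bends give larger values. Illustrative only:
    the paper's exact operationalization is not specified here."""
    deltas = np.diff(hidden_states, axis=0)             # (T-1, d) step vectors
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    cosines = np.sum(deltas[:-1] * deltas[1:], axis=1)  # cosine of each bend
    return float(np.mean(np.arccos(np.clip(cosines, -1.0, 1.0))))
```

On a collinear sequence of states this returns 0; on a right-angle zigzag it returns π/2, so the measure behaves as a per-token "bend" score over the recent context window.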
What carries the argument
Contextual curvature, a geometric measure of how sharply the representational trajectory bends over recent context; it is the single construct linking internal geometry to behavioral uncertainty.
If this is right
- The relationship between curvature and entropy strengthens as training proceeds, indicating it is acquired rather than present from initialization.
- Only perturbations that preserve the geometric alignment of the trajectory affect entropy, showing that the effect is not explained by generic changes to representation magnitude or direction.
- Adding a straightening regularizer to the training objective lowers next-token entropy at no cost to validation loss.
- The same curvature-entropy link appears in two architecturally distinct models, suggesting it is not an artifact of one specific network family.
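The behavioral side of the link, next-token entropy, is simply the Shannon entropy of the model's output distribution over the vocabulary. A minimal computation from raw logits (standard definition, not specific to this paper):

```python
import numpy as np

def next_token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (nats) of the softmax distribution implied by a
    single next-token logit vector."""
    z = logits - logits.max()           # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-np.sum(p * np.log(p + 1e-12)))
```

Uniform logits over a vocabulary of size V give log V nats (maximal uncertainty); a sharply peaked distribution gives a value near zero, which is the quantity the curvature measurements are correlated against.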
Where Pith is reading between the lines
- If curvature can be read out from hidden states, it may offer a low-cost way to estimate local prediction uncertainty without sampling multiple tokens.
- The finding raises the possibility that other geometric descriptors of trajectories, such as torsion or speed, could relate to additional behavioral quantities like calibration or hallucination rates.
- Training objectives that explicitly penalize curvature might be tested as a lightweight alternative to temperature scaling or other post-hoc uncertainty adjustments.
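The regularizer idea in the last bullet can be sketched as an auxiliary penalty that is zero for a perfectly straight trajectory and grows with bending, added to the language-modeling loss as loss + λ · penalty. The formulation below is our own hypothetical version; the paper's actual regularizer is not specified in this review.

```python
import numpy as np

def straightening_penalty(hidden_states: np.ndarray) -> float:
    """Penalty in [0, 2]: mean of (1 - cosine similarity) between
    successive step vectors of a trajectory of shape (T, d). Zero for
    a straight trajectory. Hypothetical auxiliary term, added to the
    LM loss as `loss + lam * penalty` in the sketch described above."""
    deltas = np.diff(hidden_states, axis=0)
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(deltas[:-1] * deltas[1:], axis=1)))
```

In an actual training loop this would be computed on a differentiable tensor of hidden states rather than a numpy array, with λ tuned so that validation loss is unaffected, mirroring the trade-off the paper reports.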
Load-bearing premise
Trajectory-aligned interventions change only curvature and leave all other unmeasured representational properties that could affect entropy untouched.
What would settle it
If a trajectory-aligned perturbation that successfully alters measured curvature produces no change in next-token entropy, or if a misaligned perturbation produces a reliable entropy change, the claimed selective dependence would be falsified.
Figures
Original abstract
In autoregressive large language models (LLMs), temporal straightening offers an account of how the next-token prediction objective shapes representations. Models learn to progressively straighten the representational trajectory of input sequences across layers, potentially facilitating next-token prediction via linear extrapolation. However, a direct link between this trajectory and token-level behavior has been missing. We provide such a link by relating contextual curvature (a geometric measure of how sharply the representational trajectory bends over recent context) to next-token entropy. Across two models (GPT-2 XL and Pythia-2.8B), contextual curvature is correlated with entropy, and this relationship emerges during training. Perturbation experiments reveal selective dependence: manipulating curvature through trajectory-aligned interventions reliably modulates entropy, while geometrically misaligned perturbations have no effect. Finally, regularizing representations to be straighter during training modestly reduces token-level entropy without degrading validation loss. These results identify trajectory curvature as a task-aligned representational feature that influences behavioral uncertainty in LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in autoregressive LLMs, contextual curvature—a geometric measure of bending in the representational trajectory over recent context—directly modulates next-token entropy. It reports positive correlations between curvature and entropy in GPT-2 XL and Pythia-2.8B, shows the relationship emerging during training, demonstrates selective entropy modulation via trajectory-aligned perturbations (but not misaligned ones), and finds that regularizing representations toward straighter trajectories during training reduces entropy without increasing validation loss.
Significance. If the central causal link holds, the work supplies a concrete geometric account of how representational structure influences behavioral uncertainty, extending temporal straightening ideas to token-level predictions. The combination of correlational, developmental, interventional, and regularization evidence is a strength; the regularization result in particular offers a falsifiable, task-aligned manipulation with potential downstream utility for uncertainty control.
Major comments (3)
- [Perturbation experiments] Perturbation experiments section: the claim that trajectory-aligned interventions selectively modulate curvature (and thereby entropy) while misaligned ones do not requires explicit verification that the two perturbation classes are matched on all other statistics that could affect entropy (e.g., activation magnitude, direction relative to the residual stream, or higher-order moments). Without such controls or an ablation showing that entropy changes scale with the curvature component alone, the selective dependence could arise from unintended side effects rather than curvature per se.
- [Methods] Methods / curvature definition: the exact operationalization of 'contextual curvature' (e.g., the precise formula for trajectory bending over recent context, choice of distance metric, and window size) is not stated with sufficient precision to allow independent replication or to rule out post-hoc parameter choices that could inflate the reported correlations.
- [Training dynamics] Results on emergence during training: the reported emergence of the curvature-entropy correlation lacks details on statistical controls (e.g., multiple-comparison correction across layers and checkpoints, sample sizes per checkpoint, and exclusion criteria for sequences), making it difficult to assess whether the developmental pattern is robust or sensitive to analysis decisions.
Minor comments (3)
- [Abstract / Introduction] The abstract and introduction would benefit from a brief equation or pseudocode for the curvature metric to orient readers before the experimental claims.
- [Figures] Figure captions for the perturbation results should explicitly state the number of trials, error bars (e.g., SEM or 95% CI), and whether the aligned/misaligned conditions were yoked on perturbation magnitude.
- [Regularization results] The regularization experiment reports 'modest' entropy reduction; quantitative effect sizes and comparison to a matched baseline (e.g., random regularization) would strengthen the claim.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed and insightful comments, which have helped us identify areas for improvement. We respond to each major comment below and will incorporate the suggested revisions to enhance the clarity and rigor of the manuscript.
Point-by-point responses
Referee: [Perturbation experiments] Perturbation experiments section: the claim that trajectory-aligned interventions selectively modulate curvature (and thereby entropy) while misaligned ones do not requires explicit verification that the two perturbation classes are matched on all other statistics that could affect entropy (e.g., activation magnitude, direction relative to the residual stream, or higher-order moments). Without such controls or an ablation showing that entropy changes scale with the curvature component alone, the selective dependence could arise from unintended side effects rather than curvature per se.
Authors: We acknowledge the importance of ruling out alternative explanations for the selective effects observed in our perturbation experiments. In the revised version, we will add explicit controls and matching statistics for the aligned and misaligned perturbations, including comparisons of activation magnitudes, directions in the residual stream, and higher-order moments. Additionally, we will include an ablation analysis demonstrating that the entropy modulation scales with the curvature change induced by the perturbations. revision: yes
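As a concrete reading of the control the authors promise, one could build norm-matched perturbation pairs: one inside the plane spanned by the last two trajectory steps (which changes the local bend) and one orthogonal to that plane (which should not). The construction below is our own illustrative sketch, not the paper's intervention protocol.

```python
import numpy as np

def aligned_and_misaligned(hidden_states: np.ndarray, eps: float = 0.1,
                           seed: int = 0):
    """Return two norm-matched perturbations of the final hidden state:
    one inside the span of the last two trajectory steps (trajectory-
    aligned, changes the local bend) and one orthogonal to that span
    (misaligned, leaves the bend alone). Illustrative control only."""
    rng = np.random.default_rng(seed)
    d1 = hidden_states[-1] - hidden_states[-2]           # last step
    d2 = hidden_states[-2] - hidden_states[-3]           # previous step
    basis, _ = np.linalg.qr(np.stack([d1, d2], axis=1))  # (d, 2) plane basis
    v = rng.standard_normal(hidden_states.shape[1])
    in_plane = basis @ (basis.T @ v)                     # projection onto plane
    out_plane = v - in_plane                             # orthogonal residual
    aligned = eps * in_plane / np.linalg.norm(in_plane)
    misaligned = eps * out_plane / np.linalg.norm(out_plane)
    return aligned, misaligned
```

Because both perturbations share the same norm, any differential effect on entropy cannot be attributed to perturbation magnitude, which is exactly the matching the referee asks the authors to verify.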
Referee: [Methods] Methods / curvature definition: the exact operationalization of 'contextual curvature' (e.g., the precise formula for trajectory bending over recent context, choice of distance metric, and window size) is not stated with sufficient precision to allow independent replication or to rule out post-hoc parameter choices that could inflate the reported correlations.
Authors: We agree that the methods section requires greater precision for replicability. In the revision, we will provide the exact mathematical definition of contextual curvature, including the formula for trajectory bending, the distance metric employed (Euclidean distance in the representation space), and the specific window size used for the recent context. We will also include pseudocode and details on how parameters were selected to ensure transparency. revision: yes
Referee: [Training dynamics] Results on emergence during training: the reported emergence of the curvature-entropy correlation lacks details on statistical controls (e.g., multiple-comparison correction across layers and checkpoints, sample sizes per checkpoint, and exclusion criteria for sequences), making it difficult to assess whether the developmental pattern is robust or sensitive to analysis decisions.
Authors: We appreciate the referee's point regarding the need for rigorous statistical reporting. In the updated manuscript, we will include detailed information on the statistical analyses, such as the application of multiple-comparison corrections (e.g., Bonferroni or FDR), the sample sizes for each checkpoint and layer, and the criteria used for sequence inclusion or exclusion. This will allow readers to better evaluate the robustness of the emergence pattern. revision: yes
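The FDR correction mentioned here is typically the Benjamini-Hochberg step-up procedure applied to the pooled p-values across layers and checkpoints. A minimal self-contained version (standard method, not code from the paper):

```python
import numpy as np

def benjamini_hochberg(pvals: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected at FDR level alpha, using the
    standard Benjamini-Hochberg step-up procedure over m p-values."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    # Step-up bound: compare the i-th smallest p-value to alpha * i / m.
    below = ranked <= alpha * (np.arange(1, m + 1) / m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting the bound
        reject[order[: k + 1]] = True
    return reject
```

Applied across all layer-checkpoint pairs, this controls the expected proportion of false positives among the correlations reported as significant, which is the robustness check the referee requests.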
Circularity Check
No significant circularity; claims rest on empirical correlations and interventions
full rationale
The paper links contextual curvature to next-token entropy via direct measurements (correlations across models and training), perturbation experiments (trajectory-aligned vs. misaligned interventions), and a regularization experiment during training. No load-bearing step reduces, through the paper's own equations or self-citations, to a fitted parameter, a self-defined quantity, or an ansatz imported from the authors' prior work. The central results are externally falsifiable via the described experiments and do not rely on renaming known patterns or on uniqueness theorems. This is the expected non-finding for an empirical study whose chain of argument consists of data collection and statistical tests rather than algebraic closure.