H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
Pith reviewed 2026-05-10 14:03 UTC · model grok-4.3
The pith
Language models encode hierarchical depth and pairwise distances in low-dimensional subspaces of their latent representations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
H-probes recover subspaces in language model activations that encode node depth and pairwise node distance in hierarchical structures. On synthetic tree traversal problems these subspaces are low-dimensional, their targeted removal degrades task performance, and the probes remain effective across different trees and even on out-of-distribution examples. The same kind of signal, though attenuated, appears in the reasoning traces of mathematical problems.
What carries the argument
H-probes: a collection of linear probes trained to predict depth and pairwise distance from latent representations.
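A minimal sketch of what such probe training might look like, assuming per-node hidden states with known depths and pairwise tree distances. All names and shapes are illustrative assumptions, and the distance objective follows Hewitt and Manning's structural-probe recipe as a plausible stand-in for the paper's exact loss:

```python
# Hypothetical H-probe training sketch; tensor names and losses are
# assumptions, not the paper's code.
import torch

def fit_depth_probe(H, depths, steps=500, lr=1e-2):
    """Fit w, b so that H @ w + b approximates node depth.
    H: (n, d) hidden states; depths: (n,) node depths."""
    d = H.shape[1]
    w = torch.zeros(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((H @ w + b - depths) ** 2).mean()
        loss.backward()
        opt.step()
    return w.detach(), b.detach()

def fit_distance_probe(H, D, rank=32, steps=500, lr=1e-2):
    """Learn B (d, rank) so that ||B^T (h_i - h_j)||^2 approximates the
    pairwise tree distance D[i, j] (structural-probe-style objective)."""
    n, d = H.shape
    B = torch.randn(d, rank, requires_grad=True)
    opt = torch.optim.Adam([B], lr=lr)
    diffs = H.unsqueeze(1) - H.unsqueeze(0)   # (n, n, d) pairwise differences
    for _ in range(steps):
        opt.zero_grad()
        pred = (diffs @ B).pow(2).sum(-1)     # (n, n) predicted squared dists
        loss = (pred - D).abs().mean()
        loss.backward()
        opt.step()
    return B.detach()
```

The span of the learned directions (the column space of `B` plus the depth direction `w`) is one natural candidate for the low-dimensional hierarchy subspace the review describes.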
If this is right
- The hierarchy subspaces are low-dimensional.
- Removing them reduces task accuracy, showing causal importance.
- The subspaces support generalization both within and across domains.
- Similar hierarchical signals exist, more faintly, in mathematical reasoning traces.
Where Pith is reading between the lines
- If the subspaces are causally important, targeted interventions on them could steer the model's hierarchical decisions.
- The linear extractability suggests that other abstract relations such as causality or ordering may be isolable by similar probes.
- Success on synthetic trees raises the question of whether the same subspaces remain critical when models handle richer, real-world hierarchies.
Load-bearing premise
Depth and pairwise distance capture the hierarchical information that models use for reasoning, and this information is linearly readable from the activations.
What would settle it
The probes fail to predict depth or distance above chance on held-out synthetic trees, or ablating the identified dimensions produces no measurable drop in task accuracy.
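As an illustration of the second falsifier, here is a sketch of the ablation check under assumed names; `model`, `run_task`, the layer index, and the hook plumbing are stand-ins, not the paper's harness:

```python
# Hypothetical ablation check: project hidden states off the probe subspace
# at one layer via a forward hook, then rerun the task.
import torch
from functools import partial

def project_out(h, basis):
    """Remove the component of h lying in span(basis); basis: (d, k) with
    orthonormal columns (e.g., from torch.linalg.qr on probe directions)."""
    return h - (h @ basis) @ basis.T

def ablation_hook(module, inputs, output, basis):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = project_out(hidden, basis)
    return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden

# Usage sketch (assumed harness):
#   handle = model.layers[layer_idx].register_forward_hook(
#       partial(ablation_hook, basis=probe_basis))
#   acc_ablated = run_task(model, eval_set)
#   handle.remove()
# A drop in acc_ablated relative to the unablated run is what the causal
# claim predicts; no measurable drop would count against it.
```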
Original abstract
Representing and navigating hierarchy is a fundamental primitive of reasoning. Large language models have demonstrated proficiency in a wide variety of tasks requiring hierarchical reasoning, but there exists limited analysis on how the models geometrically represent the necessary latent constructions for such thinking. To this end, we develop H-probes, a collection of linear probes that extract hierarchical structure, specifically depth and pairwise distance, from latent representations. In synthetic tree traversal tasks, the H-probes robustly find the subspaces containing hierarchical structure necessary to complete the tasks; furthermore, in comprehensive ablation experiments, we show that these hierarchy-containing subspaces are low-dimensional, causally important for high task performance, and generalize within- and out-of-domain. Furthermore, we find analogous, though weaker, hierarchical structure in real-world hierarchical contexts such as mathematical reasoning traces. These results demonstrate that models represent hierarchy not only at the level of syntax and concepts, but at deeper levels of abstraction -- including the reasoning process itself.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces H-probes, linear probes for extracting hierarchical structure (specifically depth and pairwise distance) from LLM latent representations. It reports that in synthetic tree traversal tasks these probes identify low-dimensional subspaces containing the necessary hierarchy, that ablation experiments demonstrate these subspaces are causally important for task performance and generalize within- and out-of-domain, and that weaker analogous structure appears in real mathematical reasoning traces.
Significance. If the results hold, particularly the causal necessity of the identified subspaces, the work would provide concrete geometric evidence that LLMs encode hierarchy at the level of reasoning processes rather than only syntax or concepts. The use of controlled synthetic tasks combined with ablation-based tests of importance and generalization is a methodological strength that could inform both interpretability research and efforts to enhance hierarchical reasoning in models.
major comments (1)
- [Ablation experiments] The claim that the low-dimensional subspaces identified by H-probes are causally important for task performance rests on performance drops after projecting out or zeroing the probe directions. However, these interventions are not compared against controls that remove random or non-hierarchical subspaces of matched dimensionality while preserving overall representation norm and other linear features. Without such controls, the observed drops could result from generic dimensionality reduction rather than from the specific removal of hierarchy. This control is load-bearing for the central causal claim, especially in the synthetic tree tasks, where the input explicitly encodes tree structure and multiple redundant pathways may exist.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief explicit statement of how H-probes are trained (e.g., supervision signal, loss, and whether they are trained on the same data used for the downstream tasks).
- [Figures] Figure captions and axis labels for the ablation plots should include error bars or confidence intervals and state the number of random seeds or runs.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our work's potential significance and for the constructive major comment on the ablation experiments. We address the point below and will revise the manuscript accordingly to strengthen the causal claims.
Point-by-point responses
- Referee: The claim that the low-dimensional subspaces identified by H-probes are causally important for task performance rests on performance drops after projecting out or zeroing the probe directions. However, these interventions are not compared against controls that remove random or non-hierarchical subspaces of matched dimensionality while preserving overall representation norm and other linear features. Without such controls, the observed drops could result from generic dimensionality reduction rather than from the specific removal of hierarchy. This control is load-bearing for the central causal claim, especially in the synthetic tree tasks, where the input explicitly encodes tree structure and multiple redundant pathways may exist.
Authors: We appreciate the referee identifying this gap in our ablation analysis. The referee is correct that the manuscript does not currently include controls that ablate random or non-hierarchical subspaces of matched dimensionality while preserving norm, which leaves open the possibility that the observed performance drops stem from generic dimensionality reduction rather than from the specific removal of hierarchical structure. To address this directly, we will add new control experiments to the revised manuscript. These will sample random directions in activation space with the same dimensionality as the H-probe subspaces (e.g., via Gaussian sampling followed by orthogonalization against the probe directions where needed), apply identical projection and zeroing interventions, and compare the resulting drops in task performance against those from the hierarchy-specific directions. We will also preserve overall representation norms in the controls, as suggested, providing a direct test of specificity. In the synthetic tree tasks, where inputs encode tree structure and redundant pathways may exist, the added controls will quantify whether the H-probe directions produce larger drops than random ones, thereby supporting the causal role of the identified subspaces.
revision: yes
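A minimal sketch of the proposed control, under the same illustrative assumptions as the earlier snippets (basis shapes and helper names are hypothetical, not the authors' code): sample a random basis of matched dimensionality, orthogonalize it against the probe subspace, and ablate while rescaling to preserve the original norm.

```python
# Hypothetical matched-dimensionality control for the ablation experiments.
import torch

def random_control_basis(d, k, probe_basis=None, seed=0):
    """Sample a k-dimensional orthonormal basis in R^d, optionally made
    orthogonal to the H-probe subspace (probe_basis: (d, m), orthonormal)."""
    g = torch.Generator().manual_seed(seed)
    dirs = torch.randn(d, k, generator=g)
    if probe_basis is not None:
        # Project out any overlap with the probe subspace so the control
        # ablates an equal-dimensional but disjoint set of directions.
        dirs = dirs - probe_basis @ (probe_basis.T @ dirs)
    q, _ = torch.linalg.qr(dirs)
    return q[:, :k]

def ablate_preserving_norm(h, basis, eps=1e-8):
    """Project h off span(basis), then rescale rows to their original norm,
    per the norm-preservation point in the referee's comment."""
    out = h - (h @ basis) @ basis.T
    scale = h.norm(dim=-1, keepdim=True) / out.norm(dim=-1, keepdim=True).clamp_min(eps)
    return out * scale
```

Comparing accuracy drops from `ablate_preserving_norm` under the probe basis against several seeds of `random_control_basis` is exactly the specificity test the rebuttal commits to.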
Circularity Check
No significant circularity in empirical probing methodology
Rationale
The paper develops linear H-probes to extract depth and pairwise distance from latent representations, then evaluates them via experiments on synthetic tree tasks and ablations for dimensionality, causal importance, and generalization. These steps consist of standard probe training followed by intervention-based testing on task performance; no load-bearing claim reduces by construction to a fitted parameter or self-citation chain. The central results rest on external metrics (accuracy drops, generalization scores) rather than any self-definitional loop, satisfying the criteria for a self-contained empirical analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Hierarchical structure in reasoning tasks is linearly representable in model latent spaces.
invented entities (1)
- H-probes (no independent evidence)