Rethinking Intrinsic Dimension Estimation in Neural Representations
Pith reviewed 2026-05-10 00:47 UTC · model grok-4.3
The pith
Common intrinsic dimension estimators do not track the true underlying ID of neural network representations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We theoretically and empirically demonstrate that common ID estimators are not tracking the true underlying ID of the representation. We contrast this negative result with an investigation of the underlying factors that may drive commonly reported ID-related results on neural representations in the literature and offer a new perspective on ID estimation in neural representations.
What carries the argument
The mismatch between theoretical intrinsic dimension of a neural activation manifold and the numerical output of standard local or global ID estimators applied to the same points.
If this is right
- Interpretations that link ID growth or shrinkage directly to learning dynamics or generalization require additional validation.
- Reported ID trends in the literature are more likely driven by changes in local density, curvature, or optimization trajectory than by the manifold's intrinsic dimension.
- Future analyses of neural representations should prioritize diagnostics that separate estimator bias from geometric properties of the data.
- New ID estimation procedures may need to be developed that explicitly account for the non-uniform sampling and training-induced structure present in neural activations.
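The density-and-sampling point above can be illustrated with a toy case in which the sampling scale, not the intrinsic dimension, moves the estimator's reading: a densely sampled 1-D spiral (true ID = 1) read by the Levina-Bickel MLE at two neighborhood sizes. The estimator choice and all parameters here are illustrative, not the paper's protocol.

```python
import numpy as np

def mle_id(X, k):
    """Levina-Bickel MLE intrinsic dimension, averaged over all points."""
    sq = np.sum(X * X, axis=1)
    # squared pairwise distances via the Gram-matrix identity
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    D = np.sqrt(np.sort(D2, axis=1))
    T = D[:, 1:k + 1]  # distances to neighbors 1..k (index 0 is self)
    return float(np.mean((k - 1) / np.sum(np.log(T[:, -1:] / T[:, :-1]), axis=1)))

rng = np.random.default_rng(1)
# t ~ sqrt(uniform) samples the spiral approximately uniformly in arc length
t = 20 * np.pi * np.sqrt(rng.uniform(size=3000))
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])  # a 1-D curve in R^2

fine = mle_id(spiral, k=5)      # neighborhoods stay on one arm: reads near 1
coarse = mle_id(spiral, k=150)  # neighborhoods span adjacent arms: reads closer to 2
```

Nothing about the manifold changes between the two calls; only the neighborhood size does, yet the reported dimension shifts, which is exactly the kind of scale artifact that can masquerade as an ID trend.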
Where Pith is reading between the lines
- The result implies that ID-based arguments for capacity or compressibility in deep networks may need to be replaced by direct measurements of effective dimension via other means such as Hessian spectra or pruning sensitivity.
- This opens the possibility that many phenomena previously attributed to ID changes are instead symptoms of how gradient descent shapes the distribution of activations.
- Testable extension: apply the same estimator mismatch analysis to transformer attention heads or diffusion model latents to check whether the discrepancy generalizes beyond standard feed-forward layers.
Load-bearing premise
A well-defined true intrinsic dimension exists for neural representations independently of the estimators being used and can be meaningfully contrasted with those estimators' outputs.
What would settle it
A controlled synthetic dataset whose activations lie on a known low-dimensional manifold where standard estimators such as MLE or correlation dimension recover the ground-truth dimension across varying sample sizes and noise levels.
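The settling experiment can be sketched directly. Below is a minimal sanity check, assuming the Levina-Bickel MLE estimator and a 2-D Gaussian latent embedded linearly into R^10 with small ambient noise; the estimator, sample size, and noise level are illustrative choices, and a full version would sweep them as the criterion above demands.

```python
import numpy as np

def mle_id(X, k=10):
    """Levina-Bickel MLE intrinsic dimension, averaged over all points."""
    sq = np.sum(X * X, axis=1)
    # squared pairwise distances via the Gram-matrix identity
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    D = np.sqrt(np.sort(D2, axis=1))
    T = D[:, 1:k + 1]                           # neighbors 1..k (index 0 is self)
    log_ratios = np.log(T[:, -1:] / T[:, :-1])  # log(T_k / T_j), j = 1..k-1
    return float(np.mean((k - 1) / np.sum(log_ratios, axis=1)))

rng = np.random.default_rng(0)
n, d, ambient = 1000, 2, 10
Z = rng.normal(size=(n, d))                      # latent 2-D coordinates
A = rng.normal(size=(d, ambient))                # linear embedding into R^10
X = Z @ A + 0.01 * rng.normal(size=(n, ambient)) # small ambient noise
est = mle_id(X, k=10)                            # typically close to the true d = 2
```

If an estimator cannot recover d in a benign case like this, the discrepancy is purely procedural; the harder question the paper targets is what happens when the geometry is induced by training rather than by construction.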
Original abstract
The analysis of neural representation has become an integral part of research aiming to better understand the inner workings of neural networks. While there are many different approaches to investigate neural representations, an important line of research has focused on doing so through the lens of intrinsic dimensions (IDs). Although this perspective has provided valuable insights and stimulated substantial follow-up research, important limitations of this approach have remained largely unaddressed. In this paper, we highlight a crucial discrepancy between theory and practice of IDs in neural representations, theoretically and empirically showing that common ID estimators are, in fact, not tracking the true underlying ID of the representation. We contrast this negative result with an investigation of the underlying factors that may drive commonly reported ID-related results on neural representations in the literature. Building on these insights, we offer a new perspective on ID estimation in neural representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that common intrinsic dimension (ID) estimators do not track the true underlying ID of neural representations. It provides theoretical arguments and empirical evidence for a discrepancy between theory and practice, investigates factors that may drive commonly reported ID-related findings in the literature, and proposes a new perspective on ID estimation for neural representations.
Significance. If the central negative result holds, the work would be significant for the field of neural representation analysis. It challenges reliance on standard ID estimators in interpretability research and offers both a critique and constructive investigation of driving factors plus an alternative viewpoint. The combination of theoretical and empirical components, along with the focus on underlying factors rather than solely a negative claim, strengthens its potential impact if the independence of the 'true ID' notion is adequately established.
major comments (2)
- [Theoretical and empirical evidence sections (as referenced in abstract)] The central negative result—that common ID estimators fail to track the true underlying ID—requires an explicit, estimator-independent definition or construction of that 'true' ID for neural representations (which are induced by optimization rather than given as a priori manifolds). Without this, the discrepancy may reflect disagreement among procedures rather than failure to track an objective truth. This is load-bearing for the subsequent analysis of driving factors and the new perspective.
- [Empirical evaluation] The empirical demonstrations should include controls that isolate whether observed discrepancies arise from the estimators themselves or from properties of the trained representations (e.g., via synthetic manifolds with known ground-truth ID that match the geometry induced by neural training).
minor comments (2)
- Clarify notation for ID estimators and any new quantities introduced in the 'new perspective' to avoid ambiguity for readers familiar with prior ID literature.
- Ensure all figures comparing estimator outputs include statistical details such as variance across runs or seeds.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our central claims. We address each major point below and will revise the manuscript accordingly where needed.
Point-by-point responses
-
Referee: [Theoretical and empirical evidence sections (as referenced in abstract)] The central negative result—that common ID estimators fail to track the true underlying ID—requires an explicit, estimator-independent definition or construction of that 'true' ID for neural representations (which are induced by optimization rather than given as a priori manifolds). Without this, the discrepancy may reflect disagreement among procedures rather than failure to track an objective truth. This is load-bearing for the subsequent analysis of driving factors and the new perspective.
Authors: We agree that an explicit, estimator-independent definition of the true ID is necessary to ground the negative result. In the manuscript, the true ID is defined as the minimal dimensionality of the data-generating process that explains the observed variability in the neural activations, derived from the optimization-induced distribution rather than from any ID estimator. This draws on standard manifold assumptions but accounts for the fact that the representation is the output of training. We will revise the theoretical section (and add a dedicated paragraph in the introduction) to state this definition formally and upfront, including why it is independent of the estimators under consideration. This should address the concern that the discrepancy could be merely procedural. revision: yes
-
Referee: [Empirical evaluation] The empirical demonstrations should include controls that isolate whether observed discrepancies arise from the estimators themselves or from properties of the trained representations (e.g., via synthetic manifolds with known ground-truth ID that match the geometry induced by neural training).
Authors: We acknowledge the value of such isolating controls. Our current experiments already include some synthetic settings with known IDs, but they do not fully replicate the precise geometry arising from neural optimization on real data. Constructing exact synthetic proxies for trained representations is non-trivial, yet we will add a new controlled experiment using low-dimensional synthetic manifolds (with ground-truth ID) on which we train simple networks to induce comparable local geometries. This will help separate estimator behavior from representation properties and will be included in the revised empirical section. revision: partial
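A minimal sketch of the proposed control, with an untrained random ReLU layer standing in for the trained networks the authors describe; the `correlation_dimension` helper (a Grassberger-Procaccia-style slope fit), the percentile fit range, the layer width, and the sample size are all illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def correlation_dimension(X, q_lo=5, q_hi=25):
    """Slope of log C(r) vs log r, with r taken from percentiles of the
    pairwise-distance distribution (Grassberger-Procaccia style)."""
    n = X.shape[0]
    sq = np.sum(X * X, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(D2[np.triu_indices(n, k=1)])        # all pairwise distances
    radii = np.percentile(d, np.linspace(q_lo, q_hi, 8))
    C = np.array([np.mean(d < r) for r in radii])   # correlation integral C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return float(slope)

rng = np.random.default_rng(2)
Z = rng.uniform(size=(1500, 2))                 # latent manifold with known ID = 2
W = rng.normal(size=(2, 50))
b = rng.normal(size=50)
X = np.maximum(Z @ W + b, 0.0)                  # random ReLU layer into R^50
est = correlation_dimension(X)
```

Because the ReLU map is piecewise linear and generically rank-2 here, the image still has intrinsic dimension 2; the interesting control is whether the estimate survives once the map is shaped by optimization on real data rather than drawn at random.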
Circularity Check
No circularity: negative result rests on independent ground-truth constructions rather than estimator-dependent definitions.
Full rationale
The paper advances a negative result by contrasting common ID estimators against controlled settings (synthetic data and theoretical models) where the underlying manifold dimension is fixed by construction of the data-generating process, independent of any estimator. The subsequent analysis of driving factors (e.g., curvature, sampling effects) and the offered new perspective are derived from these discrepancies without redefining the target ID via the estimators under test or via self-citation chains. No load-bearing step reduces to a fitted parameter or ansatz imported from the authors' prior work; the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Neural representations possess a well-defined true intrinsic dimension independent of common estimators.