pith. machine review for the scientific record. sign in

arxiv: 2605.06304 · v1 · submitted 2026-05-07 · 🧬 q-bio.NC

Recognition: unknown

A multi-scale information geometry reveals the structure of mutual information in neural populations

Matthew Chalk, Simone Azeglio, Steeve Laquitaine, Ulisse Ferrari

Pith reviewed 2026-05-08 03:12 UTC · model grok-4.3

classification 🧬 q-bio.NC
keywords representational geometrymutual informationneural population codesRiemannian metricFisher informationcoarse-grainingmulti-scale analysisdiffusion models
0
0 comments X

The pith

A unique Riemannian geometry on stimulus space emerges from how distances contract with lost resolution and directly encodes mutual information in neural populations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a multi-scale Riemannian metric by starting from first principles on how stimulus distances shrink when resolution is lost through coarse-graining. This produces an extension of the Fisher information metric that stretches directions contributing more to mutual information and shrinks those contributing less. The geometry therefore gives a single, information-theoretic account of representational structure from fine details to coarse distinctions. The metric tensor is practical to estimate with diffusion models and, when applied to visual cortex responses to natural images, its eigenvectors highlight the stimulus variations that carry the most transmitted information.

Core claim

A unique Riemannian representational geometry emerges from first principles governing how distances contract as stimulus resolution is lost through coarse-graining. This results in a multi-scale extension of the Fisher information metric that captures encoding structure from fine stimulus details to coarse global distinctions. The resulting geometry is exactly related to the mutual information encoded by the population: well-encoded stimulus directions are expanded while poorly encoded directions are contracted. The metric tensor can be estimated using diffusion models and, when applied to visual cortical responses, its eigenvectors identify the stimulus variations that contribute most to信息传

What carries the argument

The multi-scale Riemannian metric tensor obtained by enforcing first-principles contraction of distances under coarse-graining; it expands stimulus directions in proportion to their contribution to mutual information.

If this is right

  • Eigenvectors of the estimated metric identify stimulus features that contribute most to mutual information transmission.
  • The same geometry applies unchanged from fine local distinctions to coarse global ones.
  • Diffusion-model estimation makes the metric computable for high-dimensional stimuli and large populations.
  • Features recovered from cortical responses remain stable across different modelling choices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same contraction principle could be applied to artificial networks to test whether their internal geometries also expand informative directions.
  • Behavioral discrimination thresholds measured at multiple scales could be compared directly against the predicted metric contractions.
  • The framework offers a way to ask whether efficient coding theories should be stated at a single scale or across multiple coarse-graining levels simultaneously.

Load-bearing premise

Distances between stimuli must contract according to specific first principles whenever resolution is lost through coarse-graining.

What would settle it

Compute the metric tensor from a simple known neural population model whose mutual information contributions are calculated directly; the tensor eigenvectors will fail to align with the highest-information directions if the claim is false.

Figures

Figures reproduced from arXiv: 2605.06304 by Matthew Chalk, Simone Azeglio, Steeve Laquitaine, Ulisse Ferrari.

Figure 1
Figure 1. Figure 1: A D E x x B f( x ) x t bell-shaped tuning curve monotonic tuning curve x f( x ) t J ( x ) x bell shaped monotonic C x G ( x ) bell shaped monotonic F contribution at scale t view at source ↗
Figure 2
Figure 2. Figure 2: uniform prior ridge prior G(x) firing rate A B D t = 0 t = 1.5 t = 4 tuning curves C G(x) neuron 1 neuron 2 f( x ) stim dim 1. stim dim 2 stim dim 2 p(x) stim dim 1. stim dim 1. stim dim 1. stim dim 1. stim dim 1. stim dim 2 stim dim 1. E p(x) view at source ↗
Figure 3
Figure 3. Figure 3: Multi-scale geometry of neural population recordings in V1 and V4. (A) Deep CNN encoding model used to predict V1 and V4 neural responses to natural images (Appendix C.2). (B–C) Measured versus predicted responses across all neurons and images, for V1 (B) and V4 (C). (D) Example image with V1 and V4 receptive field regions outlined in blue and yellow, respectively. (E) Leading four eigenvectors of the mult… view at source ↗
Figure 4
Figure 4. Figure 4: Robustness across encoding model architectures. (A) Predictive performance for Inception, ResNet, and ONOFF models, for V1 and V4. (B) Leading eigenvectors of the multi-scale Fisher G(x) for an example image (Fig. 3D), for each model architecture fitted to V1 (top) and V4 (bottom). (C) Same as B, for the Fisher information J(x). (D–E) Subspace alignment of the top k eigenvectors of G(x) (D) and J(x) (E), c… view at source ↗
Figure 5
Figure 5. Figure 5: Local metric field projected onto the leading eigenvectors of G(x). (A) Ellipses show the multi-scale Fisher G(x) computed from Inception (blue) and ResNet (orange) encoding models, as the reference image is perturbed along the leading two eigenvectors of G(x) from the Inception model. At each perturbed image, both metrics are projected onto the same two-dimensional subspace (leading Inception eigenvectors… view at source ↗
Figure 6
Figure 6. Figure 6: Bures–Wasserstein distance between reconstructed geometries. dBW computed from the top-k eigenvalue/eigenvector pairs (Eq. (41)) as a function of k, averaged across images, for four encoder/area comparisons: V1 vs V4 within each backbone (Inception, ResNet) and Inception vs ResNet within each area (V1, V4). (Left) Multi-scale metric G(x); cross-model distances (same area, different backbone) lie below cros… view at source ↗
read the original abstract

Understanding how neural population responses represent sensory information is a central problem in systems neuroscience. One approach is to define a representational geometry on stimulus space in which distances reflect how reliably stimuli can be distinguished from neural activity. However, different constructions of these distances can lead to qualitatively different conclusions about the neural code. Here, we show that a unique Riemannian representational geometry emerges from first principles governing how distances contract as stimulus resolution is lost through coarse-graining. This results in a multi-scale extension of the Fisher information metric, capturing encoding structure from fine stimulus details to coarse global distinctions. The resulting geometry is exactly related to the mutual information encoded by the population: well encoded stimulus directions - those contributing more to mutual information - are expanded, whereas poorly encoded directions are contracted. The metric tensor can be estimated using diffusion models, making the framework practical for large neural populations and high-dimensional stimuli. Applied to visual cortical responses to natural images, the eigenvectors of the metric tensor identify stimulus variations that contribute most to information transmission, yielding interpretable features that are robust to modelling choices. Together, these results provide a principled, information-theoretic framework for characterising neural population codes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that a unique Riemannian representational geometry on stimulus space emerges from first principles on distance contraction under stimulus coarse-graining. This yields a multi-scale extension of the Fisher information metric whose local geometry is exactly related to the mutual information encoded by a neural population (well-encoded directions expanded, poorly encoded directions contracted). The metric tensor is estimated via diffusion models and applied to visual cortical responses to natural images, where its eigenvectors identify robust, interpretable stimulus features contributing most to information transmission.

Significance. If the derivation of uniqueness and the exact MI relation hold without auxiliary choices, the framework would supply a principled, parameter-free link between representational geometry and information content, extending Fisher geometry to multiple scales while remaining practical for high-dimensional data. The diffusion-model estimation and application to natural-image responses are concrete strengths that could make the approach widely usable in systems neuroscience.

major comments (2)
  1. [§3] §3 (derivation of the metric tensor): The contraction principles under coarse-graining are asserted to fix a unique Riemannian metric, but the text does not supply an explicit uniqueness proof or demonstrate that the stated axioms exclude other candidate metrics (e.g., other monotonic contractions or non-Riemannian extensions). Without this, the claim that the geometry 'emerges' uniquely remains unverified and load-bearing for the central result.
  2. [§4, Eq. (12)–(15)] §4, Eq. (12)–(15): The exact relation between the multi-scale metric and mutual information is presented as following directly from the contraction rule, yet the steps appear to introduce an additional identification (integration of the metric yielding KL terms). If this identification is not forced by the contraction axioms alone, the 'exactly related' assertion risks circularity and requires a clearer separation between derived and imposed elements.
minor comments (3)
  1. [Figure 2] Figure 2: the schematic of multi-scale coarse-graining lacks explicit stimulus examples or quantitative contraction factors, reducing clarity for readers unfamiliar with the construction.
  2. [Notation] Notation: the multi-scale parameter is introduced as λ but later appears as a vector in some equations; consistent vector/scalar distinction would aid readability.
  3. [Methods] The abstract states that the metric 'can be estimated using diffusion models' but the main text does not compare estimation error or bias against direct sampling methods; a short paragraph on this would strengthen the practical claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us clarify key aspects of the derivation and strengthen the manuscript. We address each major comment point by point below. Revisions have been made to incorporate explicit proofs and clearer logical separations where the original text was insufficiently detailed.

read point-by-point responses
  1. Referee: [§3] §3 (derivation of the metric tensor): The contraction principles under coarse-graining are asserted to fix a unique Riemannian metric, but the text does not supply an explicit uniqueness proof or demonstrate that the stated axioms exclude other candidate metrics (e.g., other monotonic contractions or non-Riemannian extensions). Without this, the claim that the geometry 'emerges' uniquely remains unverified and load-bearing for the central result.

    Authors: We agree that the original presentation would benefit from an explicit uniqueness argument. In the revised manuscript, we have added a new subsection in §3 that provides a rigorous uniqueness proof. Starting from the contraction axioms (monotonicity under coarse-graining, locality, and smoothness), we derive a functional equation for the metric tensor and show that its unique solution (up to global scaling) is the multi-scale extension of the Fisher metric. We further demonstrate that alternative monotonic contractions or non-Riemannian structures violate either the locality condition or the requirement that distances reduce to the Fisher metric in the infinitesimal limit. This establishes that the geometry emerges uniquely from the stated principles. revision: yes

  2. Referee: [§4, Eq. (12)–(15)] §4, Eq. (12)–(15): The exact relation between the multi-scale metric and mutual information is presented as following directly from the contraction rule, yet the steps appear to introduce an additional identification (integration of the metric yielding KL terms). If this identification is not forced by the contraction axioms alone, the 'exactly related' assertion risks circularity and requires a clearer separation between derived and imposed elements.

    Authors: We thank the referee for identifying the need for clearer separation. In the revised §4, we have restructured the derivation into two distinct stages. First, the metric tensor is obtained solely from the contraction axioms without reference to mutual information. Second, we show that integrating this metric along stimulus-space paths yields the mutual information contribution exactly, because the integral equals the expected KL divergence between the neural response distributions conditioned on nearby stimuli (a direct consequence of the definition of MI and the contraction rule). No auxiliary identification is imposed; the equality follows from the chain rule for KL divergences under the coarse-graining map. The revised text includes an explicit paragraph delineating these steps and a new equation clarifying the integration. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained from independent contraction principles

full rationale

The paper states that a unique Riemannian geometry emerges from first principles on how distances contract under coarse-graining of stimulus resolution, yielding a multi-scale Fisher extension whose local properties relate to mutual information contributions. No quoted equations or steps reduce the claimed uniqueness or exact MI relation to a fitted parameter, self-citation chain, or definitional tautology; the contraction rules are presented as external inputs that fix the metric tensor, after which the MI link is derived rather than presupposed. The framework remains independent of the target result and does not rename known patterns or smuggle ansatzes via self-citation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on an unstated but load-bearing assumption about the functional form of distance contraction under coarse-graining; no free parameters or new entities are explicitly introduced in the abstract, though estimation via diffusion models will introduce modeling choices.

axioms (1)
  • domain assumption Distances between stimuli contract in a specific, first-principles-governed manner when stimulus resolution is lost through coarse-graining.
    This contraction rule is invoked to derive the unique Riemannian geometry and its multi-scale Fisher extension.

pith-pipeline@v0.9.0 · 5508 in / 1315 out tokens · 48840 ms · 2026-05-08T03:12:22.105409+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Matching categorical object representations in inferior temporal cortex of man and monkey.Neuron, 60(6):1126–1141, 2008

    Nikolaus Kriegeskorte, Marieke Mur, Douglas A Ruff, Roozbeh Kiani, Jerzy Bodurka, Hossein Esteky, Keiji Tanaka, and Peter A Bandettini. Matching categorical object representations in inferior temporal cortex of man and monkey.Neuron, 60(6):1126–1141, 2008

  2. [2]

    Quantifying differences in neural population activity with shape metrics.bioRxiv, pages 2025–01, 2025

    Joao Barbosa, Amin Nejatbakhsh, Lyndon Duong, Sarah E Harvey, Scott L Brincat, Markus Siegel, Earl K Miller, and Alex H Williams. Quantifying differences in neural population activity with shape metrics.bioRxiv, pages 2025–01, 2025

  3. [3]

    SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability.Advances in Neural Information Processing Systems, 30, 2017

    Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability.Advances in Neural Information Processing Systems, 30, 2017

  4. [4]

    Insights on representational similarity in neural networks with canonical correlation

    Ari S Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. InAdvances in Neural Information Processing Systems, volume 31, pages 5732–5741, 2018

  5. [5]

    Deep supervised, but not unsuper- vised, models may explain it cortical representation.PLoS computational biology, 10(11): e1003915, 2014

    Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsuper- vised, models may explain it cortical representation.PLoS computational biology, 10(11): e1003915, 2014

  6. [6]

    Similarity of neural network representations revisited

    Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. InInternational Conference on Machine Learning, pages 3519–3529. PMLR, 2019

  7. [7]

    Springer, 2016

    Shun-ichi Amari.Information geometry and its applications. Springer, 2016

  8. [8]

    Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

    Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geometry.Nature Reviews Neuroscience, 22(11):703–718, 2021

  9. [9]

    Cambridge University Press, Cambridge, 2nd edition, 2009

    David Applebaum.Lévy Processes and Stochastic Calculus, volume 116 ofCambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2nd edition, 2009

  10. [10]

    Evans.Partial Differential Equations, volume 19 ofGraduate Studies in Mathe- matics

    Lawrence C. Evans.Partial Differential Equations, volume 19 ofGraduate Studies in Mathe- matics. American Mathematical Society, Providence, Rhode Island, 2nd edition, 2010

  11. [11]

    Decompos- ing stimulus-specific sensory neural information via diffusion models

    Steeve Laquitaine, Simone Azeglio, Carlo Paris, Ulisse Ferrari, and Matthew Chalk. Decompos- ing stimulus-specific sensory neural information via diffusion models. InAdvances in Neural Information Processing Systems, volume 38, 2025

  12. [12]

    Tweedie’s formula and selection bias.Journal of the American Statistical Association, 106(496):1602–1614, 2011

    Bradley Efron. Tweedie’s formula and selection bias.Journal of the American Statistical Association, 106(496):1602–1614, 2011

  13. [13]

    Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  14. [14]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021

  15. [15]

    Mmg: Mutual information estimation via the mmse gap in diffusion.arXiv preprint arXiv:2509.20609, 2025

    Longxuan Yu, Xing Shi, Xianghao Kong, Tong Jia, and Greg Ver Steeg. Mmg: Mutual information estimation via the mmse gap in diffusion.arXiv preprint arXiv:2509.20609, 2025

  16. [16]

    Interpretable dif- fusion via information decomposition

    Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, and Greg Ver Steeg. Interpretable dif- fusion via information decomposition. InThe Twelfth International Conference on Learning Representations, 2024

  17. [17]

    Diffusion pid: Interpreting diffusion via partial information decomposition

    Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, and Yonatan Bisk. Diffusion pid: Interpreting diffusion via partial information decomposition. Advances in Neural Information Processing Systems, 37:2045–2079, 2024

  18. [18]

    Diffusion posterior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael T McCann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. InInternational Conference on Learning Representations, 2023. 11

  19. [19]

    An extensive dataset of spiking activity to reveal the syntax of the ventral stream.Neuron, 113(4):539–553, 2025

    Paolo Papale, Feng Wang, Matthew W Self, and Pieter R Roelfsema. An extensive dataset of spiking activity to reveal the syntax of the ventral stream.Neuron, 113(4):539–553, 2025

  20. [20]

    Going deeper with convolutions

    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015

  21. [21]

    Deep residual learning for im- age recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for im- age recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  22. [22]

    On the bures–wasserstein distance between positive definite matrices.Expositiones mathematicae, 37(2):165–191, 2019

    Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. On the bures–wasserstein distance between positive definite matrices.Expositiones mathematicae, 37(2):165–191, 2019

  23. [23]

    Eigen-distortions of hierarchical representations.Advances in Neural Information Processing Systems, 30, 2017

    Alexander Berardino, Valero Laparra, Johannes Ballé, and Eero Simoncelli. Eigen-distortions of hierarchical representations.Advances in Neural Information Processing Systems, 30, 2017

  24. [24]

    Mutual information and minimum mean- square error in gaussian channels.IEEE Transactions on Information Theory, 51(4):1261–1282, 2005

    Dongning Guo, Shlomo Shamai, and Sergio Verdú. Mutual information and minimum mean- square error in gaussian channels.IEEE Transactions on Information Theory, 51(4):1261–1282, 2005

  25. [25]

    Information-theoretic diffusion

    Xianghao Kong, Rob Brekelmans, and Greg Ver Steeg. Information-theoretic diffusion. InThe Eleventh International Conference on Learning Representations, 2023

  26. [26]

    Fiquet, Florentin Guth, Jona Ballé, and Eero P

    Guy Ohayon, Pierre-Étienne H. Fiquet, Florentin Guth, Jona Ballé, and Eero P. Simoncelli. Learning a distance measure from the information-estimation geometry of data. InThe F our- teenth International Conference on Learning Representations, 2026

  27. [27]

    Statistical inference on representational geometries.Elife, 12:e82566, 2023

    Heiko H Schütt, Alexander D Kipnis, Jörn Diedrichsen, and Nikolaus Kriegeskorte. Statistical inference on representational geometries.Elife, 12:e82566, 2023

  28. [28]

    Generalized shape metrics on neural representations.Advances in Neural Information Processing Systems, 34:4738–4750, 2021

    Alex H Williams, Erin Kunz, Simon Kornblith, and Scott Linderman. Generalized shape metrics on neural representations.Advances in Neural Information Processing Systems, 34:4738–4750, 2021

  29. [29]

    The unrea- sonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018

  30. [30]

    Tuning landscapes of the ventral stream.Cell Reports, 41(6), 2022

    Binxu Wang and Carlos R Ponce. Tuning landscapes of the ventral stream.Cell Reports, 41(6), 2022

  31. [31]

    Discriminating image representations with principal distortions

    Jenelle Feather, David Lipshutz, Sarah E Harvey, Alex H Williams, and Eero P Simoncelli. Discriminating image representations with principal distortions. InThe Thirteenth International Conference on Learning Representations, 2025

  32. [32]

    Comparing neural models using their perceptual discriminability predictions

    Jingyang Zhou, Chanwoo Chun, Ajay Subramanian, and Eero P Simoncelli. Comparing neural models using their perceptual discriminability predictions. InProceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, volume 243, pages 170–181. PMLR, 2024

  33. [33]

    Information geometry of the retinal representation manifold.Advances in Neural Information Processing Systems, 36:44310–44322, 2023

    Xuehao Ding, Dongsoo Lee, Joshua Melander, George Sivulka, Surya Ganguli, and Stephen Baccus. Information geometry of the retinal representation manifold.Advances in Neural Information Processing Systems, 36:44310–44322, 2023

  34. [34]

    arXiv preprint arXiv:2206.08666 , year=

    Konstantin F Willeke, Paul G Fahey, Mohammad Bashiri, Laura Pede, Max F Burg, Christoph Blessing, Santiago A Cadena, Zhiwei Ding, Konstantin-Klemens Lurz, Kayla Ponder, et al. The sensorium competition on predicting large-scale mouse primary visual cortex activity.arXiv preprint arXiv:2206.08666, 2022

  35. [35]

    The dynamic sensorium competition for predicting large-scale mouse visual cortex activity from videos.ArXiv, pages arXiv–2305, 2024

    Polina Turishcheva, Paul G Fahey, Michaela Vystrˇcilová, Laura Hansel, Rachel Froebe, Kayla Ponder, Yongrong Qiu, Konstantin F Willeke, Mohammad Bashiri, Eric Wang, et al. The dynamic sensorium competition for predicting large-scale mouse visual cortex activity from videos.ArXiv, pages arXiv–2305, 2024. 12

  36. [36]

    Majaj, Rishi Rajalingham, Elias B

    Martin Schrimpf, Jonas Kubilius, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Franziska Geiger, Kailyn Schmidt, Daniel L. K. Yamins, and James J. DiCarlo. Brain-score: Which artificial neural network for object recognition is most brain-like?bioRxiv, page 407007, 2018

  37. [37]

    OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

    Konstantin F Willeke, Polina Turishcheva, Alex Gilbert, Goirik Chakrabarty, Hasan A Bedel, Paul G Fahey, Yongrong Qiu, Marissa A Weis, Michaela Vystrˇcilová, Taliah Muhammad, et al. Omnimouse: Scaling properties of multi-modal, multi-task brain models on 150b neural tokens. arXiv preprint arXiv:2604.18827, 2026

  38. [38]

    The renormalization group: A probabilistic view.Il Nuovo Cimento B (1971-1996), 26(1):99–119, 1975

    G Jona-Lasinio. The renormalization group: A probabilistic view.Il Nuovo Cimento B (1971-1996), 26(1):99–119, 1975

  39. [39]

    Information loss under coarse graining: A geometric approach.Physical Review E, 98(5):052112, 2018

    Archishman Raju, Benjamin B Machta, and James P Sethna. Information loss under coarse graining: A geometric approach.Physical Review E, 98(5):052112, 2018

  40. [40]

    Information geometry and the renormalization group.Physical Review E, 92(5):052101, 2015

    Reevu Maity, Subhash Mahapatra, and Tapobrata Sarkar. Information geometry and the renormalization group.Physical Review E, 92(5):052101, 2015

  41. [41]

    Ya Le and Xuan S. Yang. Tiny imagenet visual recognition challenge. 2015. URL https: //api.semanticscholar.org/CorpusID:16664790

  42. [42]

    Things: A database of 1,854 object concepts and more than 26,000 naturalistic object images.PloS one, 14(10):e0223792, 2019

    Martin N Hebart, Adam H Dickter, Alexis Kidder, Wan Y Kwok, Anna Corriveau, Caitlin Van Wicklin, and Chris I Baker. Things: A database of 1,854 object concepts and more than 26,000 naturalistic object images.PloS one, 14(10):e0223792, 2019

  43. [43]

    Re- thinking the inception architecture for computer vision

    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Re- thinking the inception architecture for computer vision. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016

  44. [44]

    Using goal-driven deep learning models to understand sensory cortex.Nature Neuroscience, 19(3):356–365, 2016

    Daniel LK Yamins and James J DiCarlo. Using goal-driven deep learning models to understand sensory cortex.Nature Neuroscience, 19(3):356–365, 2016

  45. [45]

    Torchvision: Pytorch’s computer vision library

    TorchVision maintainers and contributors. Torchvision: Pytorch’s computer vision library. https://github.com/pytorch/vision, 2016

  46. [46]

    Plenoptic: A platform for synthesizing model-optimized visual stimuli.Journal of Vision, 23(9):5822–5822, 2023

    Lyndon Duong, Kathryn Bonnen, William Broderick, Pierre-Étienne Fiquet, Nikhil Parthasarathy, Thomas Yerxa, Xinyuan Zhao, and Eero Simoncelli. Plenoptic: A platform for synthesizing model-optimized visual stimuli.Journal of Vision, 23(9):5822–5822, 2023

  47. [47]

    JHU press, 2013

    Gene H Golub and Charles F Van Loan.Matrix computations. JHU press, 2013

  48. [48]

    What representational similarity measures imply about decodable information

    Sarah E Harvey, David Lipshutz, and Alex H Williams. What representational similarity measures imply about decodable information. InProceedings of UniReps: the Second Edition of the Workshop on Unifying Representations in Neural Models, volume 285, pages 140–151. PMLR, 2024. 13 A Relation between geometry and mutual information Recently, Laquitaine et al....