
arxiv: 2606.09287 · v2 · pith:24CUQR4Cnew · submitted 2026-06-08 · 💻 cs.LG

Trajectory Geometry of Transformer Representations Across Layers

Pith reviewed 2026-06-27 16:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords transformer representations · trajectory geometry · layerwise cosine similarity · mechanistic interpretability · semantic convergence · representational stability · representation manifold

The pith

Layerwise cosine similarity reveals a universal three-phase structure in how transformer representations evolve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats each forward pass as a trajectory of token representations through successive layers in a high-dimensional space. Five metrics computed directly in that space track how the trajectories change: their length and curvature, how much related prompts converge, how stable representations remain, and the similarity between one layer and the next. Across GPT-2, TinyLlama, and Qwen2.5, related prompts draw closer in middle and late layers, reasoning prompts trace more curved paths than lexical ones, ambiguous tokens split into widely separated paths, and the layer-to-layer similarity trace breaks into the same three phases in all three model families. These patterns disappear when layers are randomly reordered or embeddings are randomized.
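To make the first two metrics concrete, here is a minimal NumPy sketch. It assumes a trajectory is stored as an array of shape (num_layers + 1, d) for a single token, that length is the sum of Euclidean step sizes, and that curvature is the angle between consecutive step vectors (the definition given in the simulated rebuttal further down). Averaging the angles into one number, and the function names, are assumptions of this sketch and not details confirmed by the paper.

```python
import numpy as np

def trajectory_length(states: np.ndarray) -> float:
    """Sum of Euclidean step sizes along a (num_layers + 1, d) trajectory."""
    steps = np.diff(states, axis=0)  # discrete "velocity" vectors between layers
    return float(np.linalg.norm(steps, axis=1).sum())

def mean_curvature(states: np.ndarray, eps: float = 1e-12) -> float:
    """Mean turning angle in radians between consecutive step vectors.

    Needs at least three states. Averaging over layers is an assumption
    of this sketch, not something the summary page specifies.
    """
    steps = np.diff(states, axis=0)
    a, b = steps[:-1], steps[1:]
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.sum(a * b, axis=1) / np.maximum(denom, eps)
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).mean())

# Toy trajectories: one right-angle turn, and one straight line.
corner = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
```

On the toy data, `corner` has one right-angle turn and `straight` has none, which is the kind of contrast the reported radian ranges describe.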

Core claim

Viewing the transformer forward pass as a discrete population trajectory and measuring it with five ambient-space metrics shows that semantically related prompts converge in middle-to-late layers, reasoning tasks produce higher-curvature trajectories, ambiguous tokens cause representational bifurcation up to 5.6 times greater than controls, and layerwise cosine similarity exposes a consistent three-phase structure of encoding, elaboration, and output preparation across three model families.
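A possible reading of the convergence measurement, assuming the index at each layer is the mean pairwise cosine similarity across a group of semantically related prompts (as the simulated rebuttal below describes it). The array layout, the function name, and the restriction to distinct pairs are assumptions; the paper may normalize or baseline the index differently.

```python
import numpy as np

def convergence_index(group: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-layer mean pairwise cosine similarity within one prompt family.

    group: array of shape (num_prompts, num_layers + 1, d), one
           representation per prompt per layer (hypothetical layout).
    Returns an array of shape (num_layers + 1,).
    """
    norms = np.linalg.norm(group, axis=2, keepdims=True)
    unit = group / np.maximum(norms, eps)
    # gram[l, i, j] is the cosine between prompts i and j at layer l.
    gram = np.einsum("ild,jld->lij", unit, unit)
    i_idx, j_idx = np.triu_indices(group.shape[0], k=1)  # distinct pairs only
    return gram[:, i_idx, j_idx].mean(axis=1)

# Two prompts that start orthogonal and end up pointing the same way.
toy = np.array([
    [[1.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [2.0, 2.0]],
])
ci = convergence_index(toy)
```

A rising curve across layers would correspond to the convergence the paper reports for related prompts.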

What carries the argument

Layerwise cosine similarity computed on the sequence of layer activations, used to expose the three-phase trajectory structure.
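A minimal sketch of that quantity, assuming the activations for one token are stacked into an array of shape (num_layers + 1, d) and compared in raw ambient space with no centering. Whether the paper averages over tokens or prompts before or after taking the cosine is not stated on this page, so the single-token form here is an assumption.

```python
import numpy as np

def layerwise_cosine_similarity(hidden: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine similarity between each layer's state and the next layer's.

    hidden: array of shape (num_layers + 1, d).
    Returns an array of shape (num_layers,).
    """
    a, b = hidden[:-1], hidden[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, eps)

# Toy input: a 90-degree rotation followed by a pure rescaling.
hidden = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
sim = layerwise_cosine_similarity(hidden)
```

Because cosine similarity ignores vector norm, the rescaling step registers as no change at all, which is worth keeping in mind when reading the similarity trace.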

If this is right

  • Semantically related prompts converge significantly in middle-to-late layers.
  • Reasoning tasks produce trajectories of greater curvature than lexical variations.
  • Ambiguous tokens exhibit trajectory bifurcation with up to 5.6x representational separation by the final layer.
  • The three-phase structure holds across GPT-2, TinyLlama, and Qwen2.5.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metrics could flag when a model is handling ambiguity by tracking whether bifurcation appears on new inputs.
  • Curvature differences might serve as a task-agnostic indicator of computational load across prompt types.
  • Re-running the pipeline on models fine-tuned for specific domains would test whether the three phases shift or remain fixed.

Load-bearing premise

The five metrics computed directly in ambient space capture the model's actual computational dynamics rather than incidental features of the embedding geometry or tokenization.

What would settle it

The three-phase pattern in layerwise cosine similarity fails to appear consistently across the three architectures, or the four reported effects remain after shuffled-layer and random-embedding controls are applied.
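One way a shuffled-layer null could be built is sketched below. The page does not say whether the paper permutes the recorded activations or reorders the model's blocks before the forward pass; this sketch assumes the former, and the helper names are hypothetical.

```python
import numpy as np

def path_length(states: np.ndarray) -> float:
    """Sum of Euclidean step sizes along a (num_states, d) trajectory."""
    return float(np.linalg.norm(np.diff(states, axis=0), axis=1).sum())

def shuffled_null(states: np.ndarray, metric, n_perm: int = 100, seed: int = 0) -> np.ndarray:
    """Evaluate `metric` on trajectories whose layer order is randomly permuted.

    Assumption: the control permutes the order of recorded hidden states.
    Reordering the model's blocks themselves would need a new forward pass.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        metric(states[rng.permutation(len(states))]) for _ in range(n_perm)
    ])

# Toy trajectory moving steadily along one axis.
ordered = np.arange(5.0).reshape(5, 1)
null = shuffled_null(ordered, path_length, n_perm=50, seed=0)
```

For the toy trajectory the ordered path is the shortest possible, so every shuffled value is at least as large. An effect that survives such a shuffle would not depend on layer order.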

Figures

Figures reproduced from arXiv: 2606.09287 by Gopal Singh, Vishal Pandey, Yacine Mahdid.

Figure 1: Analytical pipeline. From prompt input to hidden state extraction, high-dimensional metric computation, …
Figure 2: Global PCA (left) and UMAP (right) projections of the …
Figure 3: Trajectory Convergence Index CI(l) across layers for GPT-2, TinyLlama, and Qwen2.5, plotted on a normalized layer axis. Shaded bands show 95% bootstrap CIs, and the grey band shows the null distribution under C1 (random labels). Non-overlapping CIs in middle-to-late layers confirm statistically significant semantic compression.
    Adjacent paper text: 5.2 Finding 2: Curvature Encodes Computational Complexity. Claim: Reasoning and …
Figure 4: Total trajectory length L(τ) grouped by prompt family, aggregated across all three models. This figure includes the full five prompt families (F1–F5), showing that reasoning prompts (F4) traverse significantly longer paths than lexical variations (F2) (p < 0.001, d > 1.8). Error bars show 95% bootstrap CIs.
    Adjacent paper text: For matched unambiguous controls in equivalent syntactic structures, the mean separation ratio is 1…
Figure 5: Trajectory bifurcation signatures for ambiguous vs. unambiguous prompt pairs. Red curve (ambiguous pairs, …
Figure 6: Figure 6 shows SIM(l) across layers for all three models. We identify three phases with consistent proportional boundaries:
  • Phase I - Encoding (l ≤ ⌊L/4⌋): Low cosine similarity (0.35–0.55 in GPT-2), indicating rapid representational change as shallow contextual structure is established.
  • Phase II - Elaboration (⌊L/4⌋ < l ≤ ⌊3L/4⌋): Stabilized similarity (0.70–0.85), coinciding with the semantic convergence and …
Figure 7: 2D PCA overlay of trajectory keyframes across five selected layers of GPT-2 Small (12 layers total). All five …
Figure 8: 2D PCA overlay of trajectory keyframes across five selected layers of TinyLlama (22 layers total). All five …
Figure 9: 2D PCA overlay of trajectory keyframes across five selected layers of Qwen2.5-1.5B (28 layers total). All five …
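
The proportional phase boundaries quoted in the Figure 6 text above can be sketched as a small helper. The Phase I and Phase II ranges come from that text. The Phase III range is cut off in the excerpt, so treating it as every layer above ⌊3L/4⌋ is an assumption, as are the function name and the 1-based layer indexing.

```python
def phase_of_layer(layer: int, num_layers: int) -> str:
    """Map a 1-based layer index to one of the three named phases."""
    if not 1 <= layer <= num_layers:
        raise ValueError("layer must lie in 1..num_layers")
    first_cut = num_layers // 4          # floor(L / 4)
    second_cut = (3 * num_layers) // 4   # floor(3L / 4)
    if layer <= first_cut:
        return "encoding"
    if layer <= second_cut:
        return "elaboration"
    # Assumption: the truncated third phase covers the remaining layers.
    return "output preparation"

# Layer counts taken from the Figure 7-9 captions.
gpt2_phases = [phase_of_layer(l, 12) for l in range(1, 13)]
```

With 12 layers this puts layers 1 to 3 in encoding, 4 to 9 in elaboration, and 10 to 12 in output preparation. For 22 and 28 layers the cuts fall at 5 and 16, and at 7 and 21.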
Original abstract

Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold, drawing on geometric tools from computational neuroscience. Rather than probing for pre-specified features, we characterize trajectory geometry using five metrics computed directly in the ambient space: trajectory length, curvature, a semantic convergence index, layerwise cosine similarity, and representational stability. Across three model families (GPT-2, TinyLlama, Qwen2.5) and five controlled prompt families, we report four findings. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41–0.58, p<0.001, Mann-Whitney U), consistent with attractor-like dynamics. Second, reasoning tasks produce trajectories of greater curvature than lexical variations (0.71–0.83 rad vs. 0.27–0.31 rad), suggesting curvature encodes computational complexity. Third, ambiguous tokens exhibit trajectory bifurcation with up to 5.6x representational separation by the final layer, absent in unambiguous controls. Fourth, layerwise cosine similarity reveals a universal three-phase structure: encoding, elaboration, and output preparation, consistent across all three architectures. All four effects vanish under shuffled-layer and random-embedding controls. We release a fully open-source, model-agnostic pipeline and argue that trajectory geometry constitutes a principled, probe-free lens for mechanistic interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript recasts the transformer forward pass as a discrete trajectory through a high-dimensional representation manifold and computes five geometric metrics directly in ambient space (trajectory length, curvature, semantic convergence index, layerwise cosine similarity, representational stability). Across GPT-2, TinyLlama and Qwen2.5 on five prompt families it reports four main findings: semantically related prompts converge in middle-to-late layers (peak CI 0.41-0.58, p<0.001), reasoning tasks yield higher curvature than lexical ones (0.71-0.83 rad vs 0.27-0.31 rad), ambiguous tokens produce up to 5.6x representational separation, and layerwise cosine similarity exhibits a universal three-phase structure (encoding, elaboration, output preparation); all effects vanish under shuffled-layer and random-embedding controls. A model-agnostic open-source pipeline is released.

Significance. If the metrics can be shown to isolate computational dynamics rather than embedding geometry, the work supplies a probe-free, geometry-based lens for mechanistic interpretability together with an immediately usable open pipeline; the reported cross-architecture consistency would then constitute a concrete, falsifiable signature of layer-wise computation.

major comments (3)
  1. [Abstract] Abstract: the exact mathematical definitions (or equations) for curvature and the semantic convergence index are not supplied, yet the text reports precise numerical results (peak CI 0.41-0.58, p<0.001, curvature ranges 0.71-0.83 rad); without these formulas the statistical claims cannot be reproduced or verified.
  2. [Results (layerwise cosine similarity)] Results section on layerwise cosine similarity: the three-phase structure is obtained from direct ambient-space cosine similarity with no mention of per-layer centering, normalization, or dimensionality reduction; because the random-embedding control description leaves open whether initial embedding statistics are preserved, it remains possible that the phases are geometric artifacts rather than signatures of distinct computational stages.
  3. [Methods (controls)] Methods (controls subsection): the random-embedding control is stated to eliminate the reported effects, but the text does not specify whether the embedding matrix is replaced by an independent random matrix of the same shape while keeping the identical vocabulary and token-frequency distribution; if the latter properties are retained, the control does not isolate embedding geometry from layer-wise computation.
minor comments (2)
  1. [Abstract] Abstract: the five metrics are listed but no forward reference is given to the sections or equations that define them.
  2. [Results] Throughout: reported p-values and effect sizes are not accompanied by exact sample sizes, degrees of freedom, or full test statistics beyond the Mann-Whitney U label.
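
Major comment 2 turns on whether uncentered cosine similarity can be dominated by a component shared across all inputs. The toy sketch below illustrates that concern on synthetic data, not on model activations: two "layers" share a large common offset but have independent per-prompt fluctuations. The offset size, dimensions, and helper names are arbitrary choices made for illustration.

```python
import numpy as np

def mean_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cosine similarity between matched rows of two (num_prompts, d) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / np.maximum(den, eps)))

def centered(x: np.ndarray) -> np.ndarray:
    """Subtract the per-layer mean taken over the prompt population."""
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
offset = 50.0 * np.ones(64)                         # shared direction, synthetic
layer_a = offset + rng.standard_normal((200, 64))
layer_b = offset + rng.standard_normal((200, 64))   # independent fluctuations

raw = mean_cosine(layer_a, layer_b)
ctr = mean_cosine(centered(layer_a), centered(layer_b))
```

Here `raw` is close to 1 while `ctr` is close to 0, even though the per-prompt content of the two layers is unrelated. Reporting both variants would let a reader judge how much of the three-phase trace comes from a shared offset.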

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, with clarifications and commitments to revision where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the exact mathematical definitions (or equations) for curvature and the semantic convergence index are not supplied, yet the text reports precise numerical results (peak CI 0.41-0.58, p<0.001, curvature ranges 0.71-0.83 rad); without these formulas the statistical claims cannot be reproduced or verified.

    Authors: We agree that the absence of explicit definitions in the abstract limits immediate reproducibility. In the revised manuscript we will insert concise mathematical definitions for both quantities (curvature as the angle between consecutive discrete velocity vectors; semantic convergence index as the mean pairwise cosine similarity across semantically related trajectories) directly into the abstract while respecting length limits. revision: yes

  2. Referee: [Results (layerwise cosine similarity)] Results section on layerwise cosine similarity: the three-phase structure is obtained from direct ambient-space cosine similarity with no mention of per-layer centering, normalization, or dimensionality reduction; because the random-embedding control description leaves open whether initial embedding statistics are preserved, it remains possible that the phases are geometric artifacts rather than signatures of distinct computational stages.

    Authors: The layerwise cosine similarity is deliberately computed in raw ambient space without centering, normalization or reduction in order to characterize the native geometry; we will add an explicit statement of this design choice and its rationale to the results section. For the random-embedding control we will expand the description to state that the embedding matrix is replaced by a random matrix of identical shape whose entries are sampled from a distribution matching the original mean and variance (while the vocabulary and prompt token frequencies are unchanged). This control is intended to demonstrate that the three-phase pattern requires the specific learned embedding geometry rather than generic high-dimensional properties. revision: yes

  3. Referee: [Methods (controls)] Methods (controls subsection): the random-embedding control is stated to eliminate the reported effects, but the text does not specify whether the embedding matrix is replaced by an independent random matrix of the same shape while keeping the identical vocabulary and token-frequency distribution; if the latter properties are retained, the control does not isolate embedding geometry from layer-wise computation.

    Authors: We will revise the methods section to give the precise specification requested: the learned embedding matrix is replaced by an independent random matrix of the same shape, with entries drawn from a normal distribution whose first two moments match those of the original embeddings, while the vocabulary and the token-frequency distribution induced by the prompt set are retained. We maintain that this construction isolates the contribution of the trained embedding geometry because the transformer weights and layer computations remain fixed; the fact that all reported effects disappear under this randomization indicates that the patterns arise from the interaction between the learned embeddings and the subsequent layers rather than from arbitrary embedding-space geometry alone. revision: yes
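
The random-embedding control described in the responses above can be sketched as follows, assuming the replacement matrix is drawn from a normal distribution whose mean and standard deviation match those of the original matrix. Matching the moments globally (and not per dimension or per token) is an assumption, as is the function name. Wiring the matrix back into a model is left out.

```python
import numpy as np

def random_embedding_like(embedding: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return an independent Gaussian matrix with the same shape and first two moments.

    Assumption: the mean and standard deviation are computed over all
    entries of the original matrix, not per column or per row.
    """
    rng = np.random.default_rng(seed)
    mu = embedding.mean()
    sigma = embedding.std()
    sample = rng.normal(loc=mu, scale=sigma, size=embedding.shape)
    return sample.astype(embedding.dtype)

# Stand-in for a learned embedding matrix (vocabulary of 1000, width 32).
learned = np.random.default_rng(1).normal(0.3, 2.0, size=(1000, 32))
control = random_embedding_like(learned)
```

Only the first two moments are preserved. Any structure among token vectors, such as clusters or a shared direction, is destroyed, which is the property the control relies on.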

Circularity Check

0 steps flagged

No circularity: metrics are direct computations from activations; findings are empirical observations.

full rationale

The paper defines its five metrics (trajectory length, curvature, semantic convergence index, layerwise cosine similarity, representational stability) explicitly as quantities computed directly in the ambient space from layer activations in the transformer forward pass. The central claim of a universal three-phase structure is an empirical pattern observed in the layerwise cosine similarity values across models, with no equations, fitted parameters, or self-citations that reduce any result to its own inputs by construction. Controls (shuffled-layer, random-embedding) are external to the metric definitions. No load-bearing step matches any enumerated circularity pattern; the derivation chain consists of direct measurement and observation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; ledger therefore limited to claims explicitly stated in the abstract. No free parameters or invented entities are mentioned. Two domain assumptions are required for the central framing.

axioms (2)
  • domain assumption: The transformer forward pass can be recast as a discrete population trajectory through a high-dimensional representation manifold.
    The second sentence of the abstract states this recasting as the starting point.
  • domain assumption: The five listed metrics (trajectory length, curvature, semantic convergence index, layerwise cosine similarity, representational stability), computed directly in ambient space, characterize the geometry of these trajectories.
    The abstract states that the metrics are computed directly and used to report the four findings.


