Finite-Lag Operator Geometry of Recurrent Representations

Kanishka Reddy

arxiv: 2607.01746 · v1 · pith:H3HZ7EDYnew · submitted 2026-07-02 · 💻 cs.LG

Pith reviewed 2026-07-03 17:39 UTC · model grok-4.3

classification 💻 cs.LG

keywords finite-lag operator geometry · recurrent representations · conditional transport law · source-centered transport tensor · coordinate circulation · deterministic recurrent motion · carre-du-champ geometry · dense Gaussian estimator

The pith

A finite-lag conditional transport law from source-successor pairs decomposes recurrent dynamics into conditional spread and coherent displacement plus directed circulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a geometry for recurrent hidden states that treats them as trajectories observed at finite time lags rather than as static points. It begins with the conditional transport law Q_Δ(dy|x) between a source state and its successor after lag Δ, which is estimated by a dense Gaussian smoothing operator on observed pairs. From this law the authors derive a source-centered transport tensor G_Δ that splits exactly into a spread component and a coherent displacement component, together with an antisymmetric circulation tensor W_Δ^ρ that records net directed flow. They establish affine covariance of the derived quantities, stability of the dense estimator on bounded trajectory clouds, and a separation theorem showing that the finite-lag objects detect deterministic recurrent motion invisible to infinitesimal carre-du-champ geometry. The linear-Gaussian case supplies an explicit calibration in terms of the update matrix and covariances, and controlled experiments confirm the decomposition and the architecture-dependent differences it reveals in repeat-copy networks.

Core claim

From the directed finite-lag law we derive a source-centered transport tensor G_Δ, which decomposes exactly into conditional spread and coherent displacement, and an antisymmetric coordinate circulation W_Δ^ρ, which summarizes directed lagged flow. We prove affine covariance with explicit metric dependence of scalar summaries, dense estimator stability on bounded trajectory clouds, and a finite-lag separation result showing that source-centered transport detects deterministic recurrent motion not recorded by infinitesimal carre-du-champ geometry.

What carries the argument

The conditional transport law Q_Δ(dy|x) estimated by a dense Gaussian source-smoothing operator, which produces the source-centered transport tensor G_Δ and the antisymmetric circulation W_Δ^ρ.
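
As a rough illustration of how such an estimator and decomposition could be computed, the sketch below smooths over sources with row-normalized Gaussian weights and splits the second moment of the displacement about each source into a conditional covariance plus an outer product of the mean shift. It assumes that G_Δ at a source x is the smoothed second moment of (y - x); the paper's exact definitions, normalization, and bandwidth rule are not given here, and the names `gaussian_source_weights`, `transport_decomposition`, and `bandwidth` are hypothetical.

```python
import numpy as np

def gaussian_source_weights(X, bandwidth):
    """Row-stochastic dense Gaussian weights between source points (assumed form)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq / (2.0 * bandwidth ** 2)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

def transport_decomposition(X, Y, bandwidth):
    """Per-source transport tensor with spread and displacement parts.

    X: (n, d) sources, Y: (n, d) successors. Returns three (n, d, d) arrays.
    """
    W = gaussian_source_weights(X, bandwidth)
    mean_Y = W @ Y                                    # smoothed E[Y | x_i]
    second = np.einsum("ij,ja,jb->iab", W, Y, Y)      # smoothed E[Y Y^T | x_i]
    spread = second - np.einsum("ia,ib->iab", mean_Y, mean_Y)
    shift = mean_Y - X
    displacement = np.einsum("ia,ib->iab", shift, shift)
    # Second moment of (Y - x_i) about the source, computed independently.
    G = (second
         - np.einsum("ia,ib->iab", mean_Y, X)
         - np.einsum("ia,ib->iab", X, mean_Y)
         + np.einsum("ia,ib->iab", X, X))
    return G, spread, displacement

# Illustrative data: a noisy linear update (not taken from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
A = np.array([[0.9, -0.2, 0.0], [0.2, 0.9, 0.0], [0.0, 0.0, 0.5]])
Y = X @ A.T + 0.1 * rng.normal(size=X.shape)
G, spread, displacement = transport_decomposition(X, Y, bandwidth=0.5)
print("max |G - (spread + displacement)| =", np.abs(G - spread - displacement).max())
```

Under this assumed definition the split is exact for any weights that sum to one per source, which is why the printed residual is at floating-point level regardless of the bandwidth.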

If this is right

G_Δ is affine covariant and its scalar summaries depend explicitly on the chosen metric.
The dense estimator for G_Δ and W_Δ^ρ remains stable whenever trajectories remain inside a bounded cloud.
Source-centered transport detects deterministic recurrent motion that infinitesimal carre-du-champ geometry does not record.
In the linear-Gaussian case the quantities reduce to closed-form expressions involving the update matrix A_Δ, source covariance, and innovation covariance.
Architecture-dependent differences appear in total transport scale and coherent displacement trace when the same task is solved by different repeat-copy networks.
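
For orientation, a closed form of this kind can be written down under simple assumptions. The block below is a sketch, not the paper's stated result: it assumes a linear update with independent Gaussian innovation, and it assumes that the transport tensor at a source x is the conditional second moment of the displacement about x. The symbols $\Sigma_\varepsilon$, $\Sigma_X$, $\mu$, and $\bar G_\Delta$ are introduced here for illustration.

```latex
\begin{aligned}
X_{t+\Delta} &= A_\Delta X_t + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, \Sigma_\varepsilon) \ \text{independent of } X_t, \\[4pt]
G_\Delta(x) &:= \mathbb{E}\big[(X_{t+\Delta}-x)(X_{t+\Delta}-x)^\top \mid X_t = x\big] \\
  &= \underbrace{\Sigma_\varepsilon}_{\text{conditional spread}}
   + \underbrace{(A_\Delta - I)\,x x^\top (A_\Delta - I)^\top}_{\text{coherent displacement}}, \\[4pt]
\bar G_\Delta &:= \mathbb{E}_{X_t}\big[G_\Delta(X_t)\big]
   = \Sigma_\varepsilon + (A_\Delta - I)\,(\Sigma_X + \mu\mu^\top)\,(A_\Delta - I)^\top ,
\end{aligned}
```

Here $\mu$ and $\Sigma_X$ denote the source mean and covariance. The cross terms vanish because the innovation has zero mean and is independent of the source, so the spread term carries only the innovation covariance and the displacement term carries only the update matrix and the source moments.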

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decomposition could be used to compare internal flow structure across recurrent architectures even when task performance is matched.
Because the circulation term is antisymmetric, it may serve as a diagnostic for directed information flow that is invisible to symmetric distance-based measures.
Metric dependence of the scalar summaries suggests that practitioners should select the embedding distance according to the physical or representational scale they wish to emphasize.
The finite-lag separation result raises the question of whether similar lag-based operators can be defined for non-Euclidean state spaces common in modern sequence models.
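
To make the antisymmetry point concrete, here is a minimal sketch of one plausible circulation estimate: the antisymmetric part of the weighted cross-moment between centered sources and their lagged displacements. This definition is an assumption made for illustration, with ρ read as a reference weighting over sources (uniform by default); the function name `coordinate_circulation` is hypothetical and the paper's W_Δ^ρ may be defined differently.

```python
import numpy as np

def coordinate_circulation(X, Y, weights=None):
    """Antisymmetric part of the source/displacement cross-moment (assumed form)."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if weights is None else weights / weights.sum()
    Xc = X - w @ X                                   # center sources under the weighting
    C = np.einsum("i,ia,ib->ab", w, Xc, Y - X)       # E[(x - mean) (y - x)^T]
    return 0.5 * (C - C.T)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))

theta = 0.3                                          # illustrative rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
contraction = np.diag([0.8, 0.5])                    # symmetric update, no rotation

W_rot = coordinate_circulation(X, X @ rotation.T)
W_sym = coordinate_circulation(X, X @ contraction.T)
print("Frobenius norm, rotation:   ", np.linalg.norm(W_rot))
print("Frobenius norm, contraction:", np.linalg.norm(W_sym))
```

A symmetric distance-based summary treats the two updates alike in kind, whereas the antisymmetric part is large for the rotation and close to zero for the symmetric contraction, up to sampling error.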

Load-bearing premise

The conditional transport law can be reliably estimated from observed source-successor pairs by a dense Gaussian source-smoothing operator, which requires bounded trajectory clouds.

What would settle it

In a linear-Gaussian recurrent system engineered to contain deterministic periodic orbits, measure whether the coherent-displacement trace of G_Δ is detectably positive while the corresponding carre-du-champ quadratic form remains zero.
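
A toy version of this test can be sketched with a noise-free planar rotation, which is a linear system whose trajectories are periodic orbits. For a purely deterministic flow with generator L = b·∇, the carre du champ Γ(f, g) = ½(L(fg) − f Lg − g Lf) vanishes identically because L is a derivation, so only the finite-lag side needs computing. The code below uses a Gaussian-smoothed estimator of the same assumed form as elsewhere on this page (second moment about the source); the angle, sample count, and bandwidths are illustrative choices, not values from the paper.

```python
import numpy as np

theta = 0.5                                          # rotation angle per lag (illustrative)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
phi = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.stack([np.cos(phi), np.sin(phi)], axis=1)     # sources on a periodic orbit
Y = X @ R.T                                          # deterministic successors

def spread_and_displacement_traces(X, Y, bandwidth):
    """Mean traces of the smoothed conditional spread and coherent displacement."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * bandwidth ** 2))
    W /= W.sum(axis=1, keepdims=True)
    mean_Y = W @ Y
    second_trace = W @ np.sum(Y ** 2, axis=1)        # smoothed E[|Y|^2 | x_i]
    spread_tr = second_trace - np.sum(mean_Y ** 2, axis=1)
    disp_tr = np.sum((mean_Y - X) ** 2, axis=1)
    return spread_tr.mean(), disp_tr.mean()

for h in (0.2, 0.05, 0.02):
    s, d = spread_and_displacement_traces(X, Y, h)
    print(f"h={h:.2f}  spread trace={s:.5f}  displacement trace={d:.5f}")
print("reference 4 sin^2(theta/2) =", 4 * np.sin(theta / 2) ** 2)
```

As the bandwidth shrinks, the spread trace falls toward zero while the displacement trace approaches the squared chord length of the rotation. Any residual spread in this toy is a smoothing artifact, since the dynamics contain no noise.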

Figures

Figures reproduced from arXiv: 2607.01746 by Kanishka Reddy.

**Figure 2.** Phase-resolved finite-lag geometry on repeat-copy with delay ...

**Figure 3.** Controlled circulation sweep. The Frobenius norm of coordinate circulation increases with ...

Original abstract

Recurrent representations are trajectories, but representation geometry is often measured from static snapshots. We develop finite-lag operator geometry for recurrent hidden states from observed source-successor pairs $(X_t,X_{t+\Delta})$. The primitive is the conditional transport law $Q_\Delta(dy\mid x)$, estimated by a dense Gaussian source-smoothing operator. From this directed finite-lag law we derive a source-centered transport tensor $G_\Delta$, which decomposes exactly into conditional spread and coherent displacement, and an antisymmetric coordinate circulation $W_\Delta^\rho$, which summarizes directed lagged flow. We prove affine covariance with explicit metric dependence of scalar summaries, dense estimator stability on bounded trajectory clouds, and a finite-lag separation result showing that source-centered transport detects deterministic recurrent motion not recorded by infinitesimal carre-du-champ geometry. A linear-Gaussian closed form calibrates the quantities in terms of the update $A_\Delta$, source covariance, and innovation covariance. Controlled experiments validate the decomposition, circulation, covariance, and stability predictions. In performance matched repeat-copy networks, the framework reveals architecture dependent differences in total transport scale and coherent displacement trace, while coherent displacement fraction is metric and resolution dependent.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a finite-lag transport geometry for recurrent states that exactly decomposes into spread and coherent displacement and separates from carre-du-champ geometry.

The main thing here is a finite-lag operator geometry built from observed source-successor pairs in recurrent hidden states. They estimate a conditional transport law via dense Gaussian smoothing, then derive a source-centered tensor G_Δ that splits exactly into conditional spread and coherent displacement, plus an antisymmetric circulation W_Δ^ρ for directed lagged flow. They prove affine covariance with explicit metric dependence, estimator stability on bounded trajectory clouds, and a separation result showing this detects deterministic recurrent motion missed by infinitesimal carre-du-champ geometry. A linear-Gaussian closed form ties everything to the update matrix and covariances.

The decomposition is exact by construction and the proofs cover the stated properties. Controlled experiments check the predictions, and the repeat-copy network tests show architecture-dependent differences in transport scale and displacement trace, with the fraction being metric and resolution dependent.

The soft spot is the Gaussian smoothing assumption for the transport law. Stability is proved only for bounded trajectory clouds; the restriction is stated, but it narrows the practical range. If trajectories are sparse or unbounded the estimator may degrade, and because the scalar summaries depend on the metric, some of them must be compared with care across setups. The separation claim rests on the finite-lag construction, which, judging from the description given, does appear distinct from the infinitesimal one.

This is for people working on dynamics in recurrent representations who need lagged measures beyond static snapshots. A reader focused on operator geometry or transport methods would find the decomposition and separation useful.

It deserves peer review. The claims are specific, the math is laid out for the linear case, and the experiments are targeted enough for referees to evaluate the derivations and the claimed distinctions.

Referee Report

0 major / 1 minor

Summary. The paper claims to develop finite-lag operator geometry for recurrent hidden states from observed source-successor pairs (X_t, X_{t+Δ}). The primitive is the conditional transport law Q_Δ(dy|x) estimated by a dense Gaussian source-smoothing operator. From this it derives the source-centered transport tensor G_Δ, which decomposes exactly into conditional spread and coherent displacement, and the antisymmetric coordinate circulation W_Δ^ρ summarizing directed lagged flow. It proves affine covariance with explicit metric dependence of scalar summaries, dense estimator stability on bounded trajectory clouds, and a finite-lag separation result showing source-centered transport detects deterministic recurrent motion not recorded by infinitesimal carre-du-champ geometry. A linear-Gaussian closed form calibrates the quantities in terms of the update A_Δ, source covariance, and innovation covariance. Controlled experiments validate the decomposition, circulation, covariance, and stability predictions, and the framework is applied to performance-matched repeat-copy networks.

Significance. If the derivations hold, the work supplies a new geometric framework for analyzing directed lagged flows in recurrent representations, with an explicit decomposition of transport and a proved distinction from existing infinitesimal geometry. The linear-Gaussian closed form and the controlled experiments validating multiple predictions are concrete strengths that support calibration and empirical checks.

minor comments (1)

The description of the dense Gaussian source-smoothing operator used to estimate Q_Δ(dy|x) would benefit from an explicit statement of the bandwidth selection procedure and its sensitivity, as this directly affects the practical estimator whose stability is claimed.
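
The kind of sensitivity check this comment asks for could look like the sketch below, which sweeps the bandwidth around a median-distance baseline and reports the coherent displacement fraction, taken here as the ratio of displacement trace to total transport trace. The median heuristic, the multipliers, and the toy data are assumptions made for illustration; none of them is the paper's procedure, and `displacement_fraction` is a hypothetical helper.

```python
import numpy as np

def displacement_fraction(X, Y, bandwidth):
    """Ratio of coherent-displacement trace to total transport trace (assumed form)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * bandwidth ** 2))
    W /= W.sum(axis=1, keepdims=True)
    mean_Y = W @ Y
    # Smoothed E[|Y - x_i|^2 | x_i], expanded so only weighted moments are needed.
    total = (W @ np.sum(Y ** 2, axis=1)
             - 2.0 * np.sum(mean_Y * X, axis=1)
             + np.sum(X ** 2, axis=1))
    disp = np.sum((mean_Y - X) ** 2, axis=1)
    return disp.sum() / total.sum()

# Toy data: damped rotation plus innovation noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
angle = 0.4
A = 0.95 * np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
Y = X @ A.T + 0.1 * rng.normal(size=X.shape)

# Median pairwise source distance as a baseline bandwidth (a common heuristic).
pair_sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
h0 = np.median(np.sqrt(pair_sq[np.triu_indices(len(X), k=1)]))

multipliers = (0.05, 0.2, 1.0, 4.0)
fractions = [displacement_fraction(X, Y, m * h0) for m in multipliers]
for m, f in zip(multipliers, fractions):
    print(f"bandwidth = {m:>4} x median distance  ->  displacement fraction {f:.3f}")
```

On this toy system the fraction moves substantially across the sweep, in line with the abstract's remark that the coherent displacement fraction is resolution dependent. Reporting such a sweep alongside the chosen bandwidth would let readers judge how much of a reported fraction is due to the smoothing scale.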

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and positive assessment of the manuscript, including the recognition of the derivations, the linear-Gaussian closed form, the controlled experiments, and the distinction from infinitesimal geometry. The recommendation for minor revision is appreciated. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity

The derivation begins from the externally estimated conditional transport law Q_Δ(dy|x) obtained from observed source-successor pairs via a dense Gaussian source-smoothing operator. All subsequent objects (G_Δ, W_Δ^ρ, scalar summaries, affine covariance, stability bounds, and the finite-lag separation from carre-du-champ geometry) are obtained by explicit algebraic decomposition or proved consequences of this law under the stated bounded-trajectory-cloud assumption. The linear-Gaussian closed form is a direct specialization of the same transport law rather than a fitted parameter renamed as a prediction. No self-citation chain, self-definitional loop, or ansatz smuggled via prior work is present; the central claims remain independent of the target results and are externally falsifiable on the observed pairs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Based solely on the abstract, the framework rests on the Gaussian smoothing estimator and bounded-cloud stability; new tensors and circulation are introduced without independent evidence outside the derivation.

axioms (2)

standard math: Affine covariance of scalar summaries with explicit metric dependence. Stated as proved in the paper for the transport tensor.
domain assumption: Dense estimator stability on bounded trajectory clouds. Invoked to support the Gaussian source-smoothing operator.
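
A small numerical check conveys what affine covariance with metric-dependent scalars means in practice. The sketch below uses the unsmoothed mean second moment of displacements as a stand-in for the transport tensor, which is a simplifying assumption and not the paper's estimator. It applies an invertible affine map x ↦ Bx + c to both sources and successors and compares traces taken against the identity metric and against the pulled-back metric B⁻ᵀB⁻¹. The matrix `B`, the offset `c`, and the helper name are illustrative.

```python
import numpy as np

def mean_transport_tensor(X, Y):
    """Unsmoothed mean of (y - x)(y - x)^T over all pairs (stand-in tensor)."""
    D = Y - X
    return D.T @ D / len(D)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = 0.9 * X + 0.2 * rng.normal(size=X.shape)

B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.5, 0.5],
              [0.3, 0.0, 1.0]])                      # invertible (determinant 3.15)
c = np.array([1.0, -2.0, 0.5])

G = mean_transport_tensor(X, Y)
G_new = mean_transport_tensor(X @ B.T + c, Y @ B.T + c)

B_inv = np.linalg.inv(B)
M_new = B_inv.T @ B_inv                              # identity metric carried to new coordinates

print("tensor covariance error:", np.abs(G_new - B @ G @ B.T).max())
print("trace, identity metric, old vs new:", np.trace(G), np.trace(G_new))
print("trace, transported metric, new:    ", np.trace(M_new @ G_new))
```

The tensor itself transforms as B G Bᵀ and the translation drops out, but a trace against a fixed identity metric changes with the coordinates. The scalar is preserved only when the metric is transported along with the data, which is one way to read "explicit metric dependence".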

invented entities (2)

source-centered transport tensor G_Δ (no independent evidence)
purpose: Decomposes conditional transport into spread and coherent displacement. New object derived from the finite-lag law.
antisymmetric coordinate circulation W_Δ^ρ (no independent evidence)
purpose: Summarizes directed lagged flow. New object derived from the finite-lag law.

pith-pipeline@v0.9.1-grok · 5727 in / 1389 out tokens · 27298 ms · 2026-07-03T17:39:21.981832+00:00 · methodology


[1] [1]

SVCCA: Singu- lar vector canonical correlation analysis for deep learning dynamics and interpretability

Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. SVCCA: Singu- lar vector canonical correlation analysis for deep learning dynamics and interpretability. In Advances in Neural Information Processing Systems, volume 30, pages 6076–6085, 2017

2017

[2] [2]

Similarity of neural network representations revisited

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 3519–3529. PMLR, 2019

2019

[3] [3]

Macke, and Davide Zoccolan

Alessio Ansuini, Alessandro Laio, Jakob H. Macke, and Davide Zoccolan. Intrinsic dimension of data representations in deep neural networks. InAdvances in Neural Information Processing Systems, volume 32, 2019

2019

[4] [4]

Lee, and Haim Sompolinsky

Uri Cohen, SueYeon Chung, Daniel D. Lee, and Haim Sompolinsky. Separability and geometry of object manifolds in deep neural networks.Nature Communications, 11(1):746, 2020. doi: 10.1038/s41467-020-14578-5

work page doi:10.1038/s41467-020-14578-5 2020

[5] [5]

Vardan Papyan, X. Y . Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117 (40):24652–24663, 2020. doi: 10.1073/pnas.2015509117

work page doi:10.1073/pnas.2015509117 2020

[6] [6]

Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, and Smita Krishnaswamy

Danqi Liao, Chen Liu, Benjamin W. Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, and Smita Krishnaswamy. Assessing neural network representations during training using noise-resilient diffusion spectral entropy, 2023

2023

[7] [7]

Steindl, Selma Mazioud, Ellie Schueler, Folu Ogundipe, Ellen Zhang, Yvan Grinspan, Kristof Reimann, Peyton Crevasse, Dhananjay Bhaskar, Siddharth Viswanath, Yanlei Zhang, Tim G

Elliott Abel, Andrew J. Steindl, Selma Mazioud, Ellie Schueler, Folu Ogundipe, Ellen Zhang, Yvan Grinspan, Kristof Reimann, Peyton Crevasse, Dhananjay Bhaskar, Siddharth Viswanath, Yanlei Zhang, Tim G. J. Rudner, Ian Adelstein, and Smita Krishnaswamy. Exploring the manifold of neural networks using diffusion geometry, 2024

2024

[8] [8]

Laplacian eigenmaps for dimensionality reduction and data representation.Neural Computation, 15(6):1373–1396, 2003

Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.Neural Computation, 15(6):1373–1396, 2003. doi: 10.1162/ 089976603321780317

2003

[9] [9]

Coifman and Stéphane Lafon

Ronald R. Coifman and Stéphane Lafon. Diffusion maps.Applied and Computational Harmonic Analysis, 21(1):5–30, 2006. doi: 10.1016/j.acha.2006.04.006

work page doi:10.1016/j.acha.2006.04.006 2006

[10] [10]

Diffusion operator geometry of feedforward representations, 2026

Kanishka Reddy. Diffusion operator geometry of feedforward representations, 2026

2026

[11] [11]

Peter J. Schmid. Dynamic mode decomposition of numerical and experimental data.Journal of Fluid Mechanics, 656:5–28, 2010. doi: 10.1017/S0022112010001217

work page doi:10.1017/s0022112010001217 2010

[12] [12]

Williams, Ioannis G

Matthew O. Williams, Ioannis G. Kevrekidis, and Clarence W. Rowley. A data-driven ap- proximation of the koopman operator: Extending dynamic mode decomposition.Journal of Nonlinear Science, 25(6):1307–1346, 2015. doi: 10.1007/s00332-015-9258-5

work page doi:10.1007/s00332-015-9258-5 2015

[13] [13]

Identification of slow molecular order parameters for markov model construction.The Journal of Chemical Physics, 139(1):015102, 2013

Guillermo Pérez-Hernández, Fabian Paul, Toni Giorgino, Gianni De Fabritiis, and Frank Noé. Identification of slow molecular order parameters for markov model construction.The Journal of Chemical Physics, 139(1):015102, 2013. doi: 10.1063/1.4811489

work page doi:10.1063/1.4811489 2013

[14] Christian R. Schwantes and Vijay S. Pande. Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9. Journal of Chemical Theory and Computation, 9(4):2000–2009, 2013. doi: 10.1021/ct300878a

[15] Hao Wu and Frank Noé. Variational approach for learning Markov processes from time series data. Journal of Nonlinear Science, 30(1):23–66, 2020. doi: 10.1007/s00332-019-09567-y

[16] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. VAMPnets for deep learning of molecular kinetics. Nature Communications, 9(1):5, 2018. doi: 10.1038/s41467-017-02388-1

[17] Dominique Bakry and Michel Émery. Diffusions hypercontractives. Séminaire de Probabilités XIX 1983/84, 1123:177–206, 1985. doi: 10.1007/BFb0075847

[18] Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and Geometry of Markov Diffusion Operators, volume 348 of Grundlehren der mathematischen Wissenschaften. Springer, 2014. doi: 10.1007/978-3-319-00227-9

[19] Boaz Nadler, Stephane Lafon, Ronald R. Coifman, and Ioannis G. Kevrekidis. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis, 21(1):113–127, 2006. doi: 10.1016/j.acha.2005.07.004

[20] Iolo Jones. Diffusion geometry, 2024

[21] Iolo Jones. Manifold diffusion geometry: Curvature, tangent spaces, and dimension, 2024

[22] Iolo Jones and David Lanners. Computing diffusion geometry, 2026

[23] Matthew O. Williams, Clarence W. Rowley, and Ioannis G. Kevrekidis. A kernel-based method for data-driven Koopman spectral analysis. Journal of Computational Dynamics, 2(2):247–265, 2015. doi: 10.3934/jcd.2015005

[25] Stefan Klus, Feliks Nuske, Péter Koltai, Hao Wu, Ioannis Kevrekidis, Christof Schutte, and Frank Noé. Data-driven model reduction and transfer operator approximation. Journal of Nonlinear Science, 28(3):985–1010, 2018. doi: 10.1007/s00332-017-9437-7

[26] Stefan Klus, Ingmar Schuster, and Krikamol Muandet. Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. Journal of Nonlinear Science, 30:283–315, 2020. doi: 10.1007/s00332-019-09574-z

[27] Gary Froyland. An analytic framework for identifying finite-time coherent sets in time-dependent dynamical systems. Physica D: Nonlinear Phenomena, 250:1–19, 2013. doi: 10.1016/j.physd.2013.01.013

[28] Gary Froyland. Dynamic isoperimetry and the geometry of Lagrangian coherent structures. Nonlinearity, 28(10):3587–3622, 2015. doi: 10.1088/0951-7715/28/10/3587

[29] Ronald R. Coifman and Matthew J. Hirn. Diffusion maps for changing data. Applied and Computational Harmonic Analysis, 36(1):79–107, 2014. doi: 10.1016/j.acha.2013.03.001

[30] Nicholas F. Marshall and Matthew J. Hirn. Time coupled diffusion maps. Applied and Computational Harmonic Analysis, 45(3):709–728, 2018. doi: 10.1016/j.acha.2017.08.007

[31] David Sussillo and Omri Barak. Opening the black box: Low-dimensional dynamics in high-dimensional recurrent neural networks. In Advances in Neural Information Processing Systems, volume 26, 2013

[32] David Sussillo. Neural circuits as computational dynamical systems. Current Opinion in Neurobiology, 25:156–163, 2014. doi: 10.1016/j.conb.2014.01.008

[33] Niru Maheswaranathan, Lane T. McIntosh, David B. Kastner, Josh Melander, L. E. Brezovec, Aran Nayebi, Julia Wang, Surya Ganguli, and Stephen A. Baccus. Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics, 2019

[34] Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, and David Sussillo. Universality and individuality in neural dynamics across large populations of recurrent networks. In Advances in Neural Information Processing Systems, volume 32, 2019

[35] Rylan Schaeffer, Mikail Khona, Leenoy Meshulam, International Brain Laboratory, and Ila R. Fiete. Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice. In Advances in Neural Information Processing Systems, volume 33, pages 4584–4596, 2020

[36] Jimmy T. H. Smith, Scott W. Linderman, and David Sussillo. Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems. In Advances in Neural Information Processing Systems, volume 34, pages 16700–16713, 2021

[37] Francesca Mastrogiuseppe and Srdjan Ostojic. Linking connectivity, dynamics, and computations in low-rank recurrent neural networks. Neuron, 99(3):609–623.e29, 2018. doi: 10.1016/j.neuron.2018.07.003

[38] Manuel Beiran, Alexis Dubreuil, Adrian Valente, Francesca Mastrogiuseppe, and Srdjan Ostojic. Shaping dynamics with multiple populations in low-rank recurrent networks. Neural Computation, 33(6):1572–1615, 2021. doi: 10.1162/neco_a_01381

[39] Adrian Valente, Jonathan W. Pillow, and Srdjan Ostojic. Extracting computational mechanisms from neural data using low-rank RNNs. In Advances in Neural Information Processing Systems, volume 35, pages 24072–24086, 2022

[40] Matthijs Pals, Jakob H. Macke, and Omri Barak. Trained recurrent neural networks develop phase-locked limit cycles in a working memory task. PLOS Computational Biology, 20(2):e1011852, 2024. doi: 10.1371/journal.pcbi.1011852

[41] Chen Liu et al. Recurrent neural networks with transient trajectory explain working memory encoding mechanisms. Communications Biology, 8:88, 2025. doi: 10.1038/s42003-024-07282-3

[42] Bariscan Kurtkaya, Fatih Dinc, Mert Yuksekgonul, Marta Blanco-Pozo, Ege Cirakman, Mark Schnitzer, Yucel Yemez, Hidenori Tanaka, Peng Yuan, and Nina Miolane. Dynamical phases of short-term memory mechanisms in RNNs. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 32032–320..., 2025

[43] Arjun Karuvally, Terrence J. Sejnowski, and Hava T. Siegelmann. Hidden traveling waves bind working memory variables in recurrent neural networks. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research. PMLR, 2024

[44] Laura N. Driscoll, Krishna V. Shenoy, and David Sussillo. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. Nature Neuroscience, 27(7):1349–1363, 2024. doi: 10.1038/s41593-024-01668-6

[46] Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735

[47] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724–1734, 2014. doi: 10.3115/v1/D14-1179

[48] Jeff Calder, Nicolás García Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022. doi: 10.1137/20M1356610

[49] Daniel Ting, Ling Huang, and Michael I. Jordan. An analysis of the convergence of graph Laplacians. In Proceedings of the 27th International Conference on Machine Learning, pages 1079–1086, 2010

A Empirical operator construction

Here we give the operational details of the empirical finite-lag operator P̂_Δ used throughout the paper. The dense Gaussian so...