pith. machine review for the scientific record.

arxiv: 2604.20595 · v1 · submitted 2026-04-22 · 💻 cs.NE · cs.LG · nlin.AO


An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling

Alexandra N. Busch, Anif N. Shikder, Arthur Powanwe, Ján Mináč, Luisa Liboni, Lyle E. Muller, Ramit Dey, Roberto C. Budzinski, Sayantan Auddy

Pith reviewed 2026-05-09 22:40 UTC · model grok-4.3

classification 💻 cs.NE · cs.LG · nlin.AO
keywords correspondence · network · nonlinear · enable · exact · expression · implementation · mathematical

The pith

S4D state space models correspond exactly to wave propagation and nonlinear wave interactions in a one-dimensional ring oscillator network, with a closed-form operator describing the complete input-output map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

State space models such as S4 are used in modern AI to process long sequences like text or time series. The authors show that one efficient version, called S4D, can be rewritten as activity in a circle of coupled nonlinear oscillators. Recent inputs create traveling waves around this ring. A final nonlinear step then lets these waves interact, producing the model's output for tasks like classification. They derive a single exact operator that captures every step of this process without approximation. The result reframes the abstract matrix operations of S4D as concrete wave dynamics with a clear physical meaning.
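To make the setup concrete, here is a minimal sketch of the kind of diagonal recurrence S4D computes: a complex diagonal state update driven by the input, followed by a linear readout. Every value below (state size, eigenvalues A, projections B and C) is an illustrative placeholder, not the paper's trained parameters.

```python
import numpy as np

# Illustrative parameters -- NOT the authors' trained values.
N = 64                                                      # number of diagonal modes
rng = np.random.default_rng(0)
# S4D-style diagonal recurrence: stable complex eigenvalues inside the unit disk.
A = np.exp(-0.05 + 1j * 2 * np.pi * rng.uniform(size=N))    # diagonal entries
B = np.ones(N, dtype=complex)                               # input projection
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)    # output projection

def s4d_forward(u):
    """Run the diagonal linear recurrence x_t = A*x_{t-1} + B*u_t, y_t = Re(C.x_t)."""
    x = np.zeros(N, dtype=complex)
    ys = []
    for u_t in u:
        x = A * x + B * u_t       # elementwise: each mode evolves independently
        ys.append((C @ x).real)
    return np.array(ys)

u = np.sin(2 * np.pi * 0.05 * np.arange(200))   # toy input sequence
y = s4d_forward(u)
```

Because A is diagonal, each complex mode is an independently damped, rotating oscillator; the paper's contribution is to show these modes are equivalently waves on a ring.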

Core claim

We derive an exact operator expression for the full forward pass of S4D, yielding an analytical characterization of its complete input-output map. This expression reveals that the nonlinear decoder in the system induces interactions between these information-carrying waves that enable classifying real-world sequences.

Load-bearing premise

The diagonal linear time-invariant implementation of S4 can be exactly embedded into a ring network topology in which inputs are encoded as waves of activity, and this embedding preserves the full computation without loss or approximation.
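This premise can be checked numerically for the linear part: conjugating a diagonal recurrence operator D by the unitary DFT matrix, K = F D F†, always yields a circulant coupling matrix — coupling that depends only on the distance between nodes on a ring — and the two descriptions generate identical trajectories up to the change of basis. A sketch with made-up eigenvalues (not the paper's parameters):

```python
import numpy as np

N = 32
rng = np.random.default_rng(1)
# Diagonal recurrence operator D (stand-in for discretized S4D eigenvalues;
# illustrative values, not the paper's trained parameters).
d = np.exp(-0.1 + 1j * 2 * np.pi * rng.uniform(size=N))
D = np.diag(d)

# Unitary DFT matrix F.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)

# Change of basis: K = F D F^dagger is the coupling matrix of the ring network.
K = F @ D @ F.conj().T

# K is circulant: every row is a cyclic shift of the first, i.e. coupling
# depends only on the distance between nodes on the ring.
for i in range(N):
    assert np.allclose(K[i], np.roll(K[0], i))

# The embedding preserves the computation: evolving x under D and z = F x
# under K give the same trajectory, related by the unitary change of basis.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
z = F @ x
for _ in range(50):
    x = D @ x          # modal (S4D) coordinates
    z = K @ z          # ring-network coordinates
assert np.allclose(z, F @ x)
```

This only exercises the linear recurrence; the paper's exactness claim additionally covers the nonlinear decoder, which this sketch does not touch.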

Figures

Figures reproduced from arXiv: 2604.20595 by Alexandra N. Busch, Anif N. Shikder, Arthur Powanwe, Ján Mináč, Luisa Liboni, Lyle E. Muller, Ramit Dey, Roberto C. Budzinski, Sayantan Auddy.

Figure 1
Figure 1: The nonlinear oscillator network produces rich spatiotemporal dynamics across its ring topology. (a) The network is composed of N nodes arranged on a one-dimensional ring (left), resulting in a network adjacency matrix with connections between nodes in a neighborhood of n steps on the ring with boundary conditions (right). (b) For specific combinations of network connectivity and phase-lags, traveling… view at source ↗
Figure 2
Figure 2: The oscillator network correspondence reveals S4D generates traveling waves in its dynamical state. (a) The mathematical correspondence embeds S4D into a ring topology with a specific structure of network connections, where connection strengths weaken with distance between nodes and phase delays shift systematically. The network adjacency matrix is depicted at right, with magnitude (top) and phase (bottom)… view at source ↗
Figure 3
Figure 3: Traveling waves distinguish different inputs in a simple dataset. (a) Representative samples of timeseries data from each of the three classes. Class 1 contains 15 Hz sinusoids embedded in noise; Class 2 contains 20 Hz sinusoids embedded in noise; and Class 3 consists of signals with noise only. (b) We calculate the difference in modal energy E_j = Σ_k |μ_j(k)|² across inputs, which is the summed differenc… view at source ↗
Figure 4
Figure 4: Operator description of S4 performing classification on real-world input sequences. (a) The input-to-output mapping in S4 admits a closed-form operator expression: the recurrent operator Dτ evolves the latent state in the Fourier eigenbasis, producing modal amplitudes μ_i(k) that encode traveling-wave dynamics. The mixing matrix C forms intermediate features y_k, and the nonlinear readout is expressed at the… view at source ↗
Figure 5
Figure 5: Compares several diagonal SSMs by visualizing the networks that result from translating their recurrence operators into the oscillator network context, using the expression K = F D F† (as in …) view at source ↗
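The modal-energy signature of Figure 3b can be sketched with toy data: drive a diagonal recurrence with a noisy 15 Hz sinusoid versus pure noise, and accumulate |μ|² per mode over time. All values below (mode count, damping, sampling rate, noise level) are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, fs = 64, 256, 100.0          # modes, timesteps, sampling rate (toy values)
# Diagonal modes spanning the frequency axis, with light damping.
A = np.exp(-0.02 + 1j * 2 * np.pi * np.linspace(0, 1, N, endpoint=False))
t = np.arange(T) / fs

def modal_energy(u):
    """Drive the diagonal recurrence with u; return per-mode energy summed over
    time, a stand-in for the E_j = sum_k |mu_j(k)|^2 quantity in Figure 3b."""
    x = np.zeros(N, dtype=complex)
    E = np.zeros(N)
    for u_t in u:
        x = A * x + u_t            # unit input projection, for simplicity
        E += np.abs(x) ** 2
    return E

sine_15 = np.sin(2 * np.pi * 15 * t) + 0.1 * rng.standard_normal(T)  # "Class 1"
noise = 0.1 * rng.standard_normal(T)                                 # "Class 3"
dE = modal_energy(sine_15) - modal_energy(noise)   # class-separating signature
```

Modes whose rotation frequency is near 15 Hz resonate with the Class 1 input and accumulate far more energy than under noise alone, which is the separation the figure exploits.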
Original abstract

We establish a mathematical correspondence between state space models, a state-of-the-art architecture for capturing long-range dependencies in data, and an exactly solvable nonlinear oscillator network. As a specific example of this general correspondence, we analyze the diagonal linear time-invariant implementation of the Structured State Space Sequence model (S4). The correspondence embeds S4D, a specific implementation of S4, into a ring network topology, in which recent inputs are encoded as waves of activity traveling over the one-dimensional spatial layout of the network. We then derive an exact operator expression for the full forward pass of S4D, yielding an analytical characterization of its complete input-output map. This expression reveals that the nonlinear decoder in the system induces interactions between these information-carrying waves that enable classifying real-world sequences. These results generalize across modern SSM architectures, and show that they admit an exact mathematical description with a clear physical interpretation. These insights enable a new level of interpretability for these systems in terms of nonlinear oscillator networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No circularity: the paper presents an independent mathematical embedding and operator derivation.

full rationale

The provided abstract and context describe establishing a correspondence by embedding S4D into a ring network of oscillators and deriving an exact operator for the forward pass. No quoted equations or steps reduce the claimed result to a re-expression of fitted parameters, self-citations, or ansatzes by construction. The embedding is asserted to preserve the computation exactly, and the operator is presented as newly derived from that structure. Per hard rules, absent specific quotes exhibiting reduction (e.g., Eq. X = input by definition), no circularity is identified. This is the expected outcome for a self-contained mathematical correspondence paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one core domain assumption extracted from the abstract; no free parameters or new invented entities are introduced in the provided text.

axioms (1)
  • domain assumption The diagonal linear time-invariant S4D implementation admits an exact embedding into a ring network of nonlinear oscillators that preserves the full forward pass.
    This embedding is the load-bearing premise that allows the wave interpretation and the exact operator derivation.

pith-pipeline@v0.9.0 · 5520 in / 1168 out tokens · 27094 ms · 2026-05-09T22:40:17.502517+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 12 canonical work pages · 3 internal anchors

  1. [1]

    "and in trained recurrent neural networks [29]. It has previously been recognized that this property can be a useful way to store long-term dependencies directly in a network's activity structure [3, 30], but has not previously been expressed in a direct mathematical form. We can now show that, when driven by input, S4D indeed stores information about th…"

  2. [2]

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems, Vol. 30 (2017)

  3. [3]

    D. Bahdanau, Neural machine translation by jointly learning to align and translate, arXiv:1409.0473 (2014)

  4. [4]

    L. Muller, P. S. Churchland, and T. J. Sejnowski, Transformers and cortical waves: encoders for pulling in context across time, Trends in Neurosciences (2024)

  5. [5]

    Y. Tay, M. Dehghani, D. Bahri, and D. Metzler, Efficient transformers: A survey, arXiv:2009.06732 (2020)

  6. [6]

    R. Child, Generating long sequences with sparse transformers, arXiv:1904.10509 (2019)

  7. [7]

    A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, Transformers are RNNs: Fast autoregressive transformers with linear attention, in International Conference on Machine Learning (2020)

  8. [8]

    A. Gu, K. Goel, and C. Ré, Efficiently modeling long sequences with structured state spaces, arXiv:2111.00396 (2021)

  9. [9]

    A. Gu, K. Goel, A. Gupta, and C. Ré, On the parameterization and initialization of diagonal state space models, Advances in Neural Information Processing Systems 35 (2022)

  10. [10]

    A. Gu and T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, arXiv:2312.00752 (2023)

  11. [11]

    J. T. H. Smith, A. Warrington, and S. W. Linderman, Simplified state space layers for sequence modeling, arXiv:2208.04933 (2022)

  12. [12]

    A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De, Resurrecting recurrent neural networks for long sequences, in International Conference on Machine Learning (PMLR, 2023)

  13. [13]

    N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly, et al., A mathematical framework for transformer circuits, Transformer Circuits Thread 1, 12 (2021)

  14. [14]

    S. Wang and B. Xue, State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory, in Advances in Neural Information Processing Systems, Vol. 36 (2023)

  15. [15]

    N. Muca Cirone, A. Orvieto, B. Walker, C. Salvi, and T. Lyons, Theoretical foundations of deep selective state-space models, in Advances in Neural Information Processing Systems, Vol. 37 (2024)

  16. [16]

    L. Muller, J. Mináč, and T. T. Nguyen, Algebraic approach to the Kuramoto model, Physical Review E 104, L022201 (2021)

  17. [17]

    R. C. Budzinski, A. N. Busch, S. Mestern, E. Martin, L. H. B. Liboni, F. W. Pasini, J. Mináč, T. Coleman, W. Inoue, and L. E. Muller, An exact mathematical description of computation with transient spatiotemporal dynamics in a complex-valued neural network, Communications Physics 7, 239 (2024)

  18. [18]

    A. Gupta, A. Gu, and J. Berant, Diagonal state spaces are as effective as structured state spaces, in Advances in Neural Information Processing Systems, Vol. 35 (2022)

  19. [19]

    S. H. Strogatz and R. E. Mirollo, Collective synchronisation in lattices of nonlinear oscillators with randomness, Journal of Physics A: Mathematical and General 21, L699 (1988)

  20. [20]

    D. M. Abrams and S. H. Strogatz, Chimera states for coupled oscillators, Physical Review Letters 93, 174102 (2004)

  21. [21]

    L. H. B. Liboni, R. C. Budzinski, A. N. Busch, S. Löwe, T. A. Keller, M. Welling, and L. E. Muller, Image segmentation with traveling waves in an exactly solvable recurrent neural network, Proceedings of the National Academy of Sciences 122, e2321319121 (2025)

  22. [22]

    P. J. Davis, Circulant Matrices (Wiley, 1979)

  23. [23]

    Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler, Long range arena: A benchmark for efficient transformers, in International Conference on Learning Representations (2021)

  24. [24]

    R. C. Budzinski, T. T. Nguyen, J. Doàn, J. Mináč, T. J. Sejnowski, and L. E. Muller, Geometry unites synchrony, chimeras, and waves in nonlinear oscillator networks, Chaos: An Interdisciplinary Journal of Nonlinear Science 32, 031104 (2022)

  25. [25]

    R. C. Budzinski, T. T. Nguyen, G. B. Benigno, J. Doàn, J. Mináč, T. J. Sejnowski, and L. E. Muller, Analytical prediction of specific spatiotemporal patterns in nonlinear oscillator networks with distance-dependent time delays, Physical Review Research 5, 013159 (2023)

  26. [26]

    L. Muller, F. Chavane, J. Reynolds, and T. J. Sejnowski, Cortical travelling waves: mechanisms and computational principles, Nature Reviews Neuroscience 19, 255 (2018)

  27. [27]

    G. B. Benigno, R. C. Budzinski, Z. W. Davis, J. H. Reynolds, and L. Muller, Waves traveling over a map of visual space can ignite short-term predictions of sensory input, Nature Communications 14, 3409 (2023)

  28. [28]

    T. A. Keller, L. Muller, T. Sejnowski, and M. Welling, Traveling waves encode the recent past and enhance sequence learning, in ICLR (2024)

  29. [29]

    S. Perrard and M. Labousse, Transition to chaos in wave memory dynamics in a harmonic well: Deterministic and noise-driven behavior, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (2018)

  30. [30]

    T. A. Keller and M. Welling, Neural wave machines: learning spatiotemporally structured representations with locally coupled oscillatory recurrent neural networks, in International Conference on Machine Learning (2023)

  31. [31]

    T. A. Keller, L. Muller, T. J. Sejnowski, and M. Welling, A spatiotemporal perspective on dynamical computation in neural information processing systems, arXiv (2026)

  32. [32]

    T. Carleman, Application de la théorie des polynômes orthogonaux à un problème de la théorie des fonctions analytiques, Arkiv för Matematik, Astronomi och Fysik 17, 1 (1932)

  33. [33]

    A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, et al., The UEA multivariate time series classification archive, 2018, arXiv:1811.00075 (2018)

  34. [34]

    A. Amini, C. Zheng, Q. Sun, and N. Motee, Carleman linearization of nonlinear systems and its finite-section approximations, Discrete and Continuous Dynamical Systems - B 30, 577 (2025)

  35. [35]

    A. M. Saxe, J. L. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, arXiv:1312.6120 (2013)

  36. [36]

    D. J. Heeger and W. E. Mackey, Oscillatory recurrent gated neural integrator circuits (ORGaNICs), a unifying theoretical framework for neural dynamics, Proceedings of the National Academy of Sciences 116, 22783 (2019)

  37. [37]

    T. K. Rusch and D. Rus, Oscillatory state-space models, arXiv:2410.03943 (2024)

  38. [38]

    T. Miyato, S. Löwe, A. Geiger, and M. Welling, Artificial Kuramoto oscillatory neurons, arXiv:2410.13821 (2024)

  39. [39]

    A. Karuvally, T. J. Sejnowski, and H. T. Siegelmann, Hidden traveling waves bind working memory variables in recurrent neural networks, arXiv:2402.10163 (2024)

  40. [40]

    S. Muzellec, A. Alamia, T. Serre, and R. VanRullen, Enhancing deep neural networks through complex-valued representations and Kuramoto synchronization dynamics, arXiv:2502.21077 (2025)

  41. [41]

    T. A. Engel and N. A. Steinmetz, New perspectives on dimensionality and variability from large-scale cortical dynamics, Current Opinion in Neurobiology 58, 181 (2019)

  42. [42]

    J. D. Hart, L. Larger, T. E. Murphy, and R. Roy, Delayed dynamical systems: networks, chimeras and reservoir computing, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 377, 20180389 (2019)

  43. [43]

    Y. Ebato, K. Nakajima, and R. Masuda, Impact of time-history terms on reservoir dynamics and prediction accuracy in echo state networks, Scientific Reports 14, 8871 (2024)

  44. [44]

    S. K. Tavakoli and A. Longtin, Boosting reservoir computer performance with multiple delays, Physical Review E 109, 054203 (2024)

  45. [45]

    S. Marzen, Time delays improve performance of certain neural networks, Physics 17, 111 (2024)

  46. [46]

    N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt, Progress measures for grokking via mechanistic interpretability, in International Conference on Learning Representations (2023)
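The paper's appendix ("Closed-form diagonalization of circulant operators") and reference [22] (Davis, Circulant Matrices) rest on a standard fact that is easy to check numerically: a circulant matrix acts as a circular convolution, so it is diagonalized by the DFT, with Fourier modes as eigenvectors. A small self-contained verification (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 16
c = rng.standard_normal(N)        # generating vector (first column of C)
x = rng.standard_normal(N)

# Build the circulant matrix C[i, j] = c[(i - j) mod N] explicitly.
C = np.array([[c[(i - j) % N] for j in range(N)] for i in range(N)])

# Applying C is a circular convolution, so it can be computed in Fourier space.
direct = C @ x
via_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
assert np.allclose(direct, via_fft)

# Fourier modes are eigenvectors of C, with eigenvalues given by the DFT of c.
k = 3
v = np.exp(2j * np.pi * k * np.arange(N) / N)   # k-th Fourier mode
assert np.allclose(C @ v, np.fft.fft(c)[k] * v)
```

This is the same structure invoked in Figure 5's K = F D F† translation: diagonal in the Fourier basis, circulant (ring-coupled) in the node basis.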