Recognition: unknown
An explicit operator explains end-to-end computation in the modern neural networks used for sequence and language modeling
Pith reviewed 2026-05-09 22:40 UTC · model grok-4.3
The pith
S4D state space models correspond exactly to wave propagation and nonlinear wave interactions in a one-dimensional ring oscillator network, with a closed-form operator describing the complete input-output map.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive an exact operator expression for the full forward pass of S4D, yielding an analytical characterization of its complete input-output map. This expression reveals that the nonlinear decoder induces interactions between the information-carrying waves that encode recent inputs, and that these interactions enable classifying real-world sequences.
Load-bearing premise
The diagonal linear time-invariant implementation of S4 can be exactly embedded into a ring network topology in which inputs are encoded as waves of activity, and this embedding preserves the full computation without loss or approximation.
Original abstract
We establish a mathematical correspondence between state space models, a state-of-the-art architecture for capturing long-range dependencies in data, and an exactly solvable nonlinear oscillator network. As a specific example of this general correspondence, we analyze the diagonal linear time-invariant implementation of the Structured State Space Sequence model (S4). The correspondence embeds S4D, a specific implementation of S4, into a ring network topology, in which recent inputs are encoded as waves of activity traveling over the one-dimensional spatial layout of the network. We then derive an exact operator expression for the full forward pass of S4D, yielding an analytical characterization of its complete input-output map. This expression reveals that the nonlinear decoder in the system induces interactions between these information-carrying waves that enable classifying real-world sequences. These results generalize across modern SSM architectures and show that they admit an exact mathematical description with a clear physical interpretation. These insights enable a new level of interpretability for these systems in terms of nonlinear oscillator networks.
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No circularity: the paper presents an independent mathematical embedding and derives the operator expression from it.
Full rationale
The provided abstract and context describe establishing a correspondence by embedding S4D into a ring network of oscillators and deriving an exact operator for the forward pass. No quoted equations or steps reduce the claimed result to a re-expression of fitted parameters, self-citations, or ansatzes by construction. The embedding is asserted to preserve the computation exactly, and the operator is presented as newly derived from that structure. Per hard rules, absent specific quotes exhibiting reduction (e.g., Eq. X = input by definition), no circularity is identified. This is the expected outcome for a self-contained mathematical correspondence paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The diagonal linear time-invariant S4D implementation admits an exact embedding into a ring network of nonlinear oscillators that preserves the full forward pass.
Reference graph
Works this paper leans on
- [1] Context excerpt (citation not extracted): "...and in trained recurrent neural networks [29]. It has previously been recognized that this property can be a useful way to store long-term dependencies directly in a network's activity structure [3, 30], but has not previously been expressed in a direct mathematical form. We can now show that, when driven by input, S4D indeed stores information about th..."
- [2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems, Vol. 30 (2017).
- [3] D. Bahdanau, Neural machine translation by jointly learning to align and translate, arXiv:1409.0473 (2014).
- [4] L. Muller, P. S. Churchland, and T. J. Sejnowski, Transformers and cortical waves: encoders for pulling in context across time, Trends in Neurosciences (2024).
- [5]
- [6] R. Child, Generating long sequences with sparse transformers, arXiv:1904.10509 (2019).
- [7] A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, Transformers are RNNs: fast autoregressive transformers with linear attention, in International Conference on Machine Learning (2020).
- [8] A. Gu, K. Goel, and C. Ré, Efficiently modeling long sequences with structured state spaces, arXiv:2111.00396 (2021).
- [9] A. Gu, K. Goel, A. Gupta, and C. Ré, On the parameterization and initialization of diagonal state space models, Advances in Neural Information Processing Systems 35 (2022).
- [10] A. Gu and T. Dao, Mamba: linear-time sequence modeling with selective state spaces, arXiv:2312.00752 (2023).
- [11]
- [12] A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De, Resurrecting recurrent neural networks for long sequences, in International Conference on Machine Learning (PMLR, 2023).
- [13] N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly, et al., A mathematical framework for transformer circuits, Transformer Circuits Thread 1, 12 (2021).
- [14] S. Wang and B. Xue, State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory, in Advances in Neural Information Processing Systems, Vol. 36 (2023).
- [15] N. Muca Cirone, A. Orvieto, B. Walker, C. Salvi, and T. Lyons, Theoretical foundations of deep selective state-space models, in Advances in Neural Information Processing Systems, Vol. 37 (2024).
- [16] L. Muller, J. Mináč, and T. T. Nguyen, Algebraic approach to the Kuramoto model, Physical Review E 104, L022201 (2021).
- [17] R. C. Budzinski, A. N. Busch, S. Mestern, E. Martin, L. H. B. Liboni, F. W. Pasini, J. Mináč, T. Coleman, W. Inoue, and L. E. Muller, An exact mathematical description of computation with transient spatiotemporal dynamics in a complex-valued neural network, Communications Physics 7, 239 (2024).
- [18] A. Gupta, A. Gu, and J. Berant, Diagonal state spaces are as effective as structured state spaces, in Advances in Neural Information Processing Systems, Vol. 35 (2022).
- [19] S. H. Strogatz and R. E. Mirollo, Collective synchronisation in lattices of nonlinear oscillators with randomness, Journal of Physics A: Mathematical and General 21, L699 (1988).
- [20] D. M. Abrams and S. H. Strogatz, Chimera states for coupled oscillators, Physical Review Letters 93, 174102 (2004).
- [21] L. H. B. Liboni, R. C. Budzinski, A. N. Busch, S. Löwe, T. A. Keller, M. Welling, and L. E. Muller, Image segmentation with traveling waves in an exactly solvable recurrent neural network, Proceedings of the National Academy of Sciences 122, e2321319121 (2025).
- [22] P. J. Davis, Circulant Matrices (Wiley, 1979).
- [23] Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler, Long range arena: a benchmark for efficient transformers, in International Conference on Learning Representations (2021).
- [24] R. C. Budzinski, T. T. Nguyen, J. Doàn, J. Mináč, T. J. Sejnowski, and L. E. Muller, Geometry unites synchrony, chimeras, and waves in nonlinear oscillator networks, Chaos: An Interdisciplinary Journal of Nonlinear Science 32, 031104 (2022).
- [25] R. C. Budzinski, T. T. Nguyen, G. B. Benigno, J. Doàn, J. Mináč, T. J. Sejnowski, and L. E. Muller, Analytical prediction of specific spatiotemporal patterns in nonlinear oscillator networks with distance-dependent time delays, Physical Review Research 5, 013159 (2023).
- [26] L. Muller, F. Chavane, J. Reynolds, and T. J. Sejnowski, Cortical travelling waves: mechanisms and computational principles, Nature Reviews Neuroscience 19, 255 (2018).
- [27] G. B. Benigno, R. C. Budzinski, Z. W. Davis, J. H. Reynolds, and L. Muller, Waves traveling over a map of visual space can ignite short-term predictions of sensory input, Nature Communications 14, 3409 (2023).
- [28] T. A. Keller, L. Muller, T. Sejnowski, and M. Welling, Traveling waves encode the recent past and enhance sequence learning, in ICLR (2024).
- [29] S. Perrard and M. Labousse, Transition to chaos in wave memory dynamics in a harmonic well: deterministic and noise-driven behavior, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (2018).
- [30] T. A. Keller and M. Welling, Neural wave machines: learning spatiotemporally structured representations with locally coupled oscillatory recurrent neural networks, in International Conference on Machine Learning (2023).
- [31] T. A. Keller, L. Muller, T. J. Sejnowski, and M. Welling, A spatiotemporal perspective on dynamical computation in neural information processing systems, arXiv (2026).
- [32] T. Carleman, Application de la théorie des polynômes orthogonaux à un problème de la théorie des fonctions analytiques, Arkiv för Matematik, Astronomi och Fysik 17, 1 (1932).
- [33] A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, The UEA multivariate time series classification archive, 2018, arXiv:1811.00075 (2018).
- [34] A. Amini, C. Zheng, Q. Sun, and N. Motee, Carleman linearization of nonlinear systems and its finite-section approximations, Discrete and Continuous Dynamical Systems - B 30, 577 (2025).
- [35] A. M. Saxe, J. L. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, arXiv:1312.6120 (2013).
- [36] D. J. Heeger and W. E. Mackey, Oscillatory recurrent gated neural integrator circuits (ORGaNICs), a unifying theoretical framework for neural dynamics, Proceedings of the National Academy of Sciences 116, 22783 (2019).
- [37]
- [38]
- [39] A. Karuvally, T. J. Sejnowski, and H. T. Siegelmann, Hidden traveling waves bind working memory variables in recurrent neural networks, arXiv:2402.10163 (2024).
- [40] S. Muzellec, A. Alamia, T. Serre, and R. VanRullen, Enhancing deep neural networks through complex-valued representations and Kuramoto synchronization dynamics, arXiv:2502.21077 (2025).
- [41] T. A. Engel and N. A. Steinmetz, New perspectives on dimensionality and variability from large-scale cortical dynamics, Current Opinion in Neurobiology 58, 181 (2019).
- [42] J. D. Hart, L. Larger, T. E. Murphy, and R. Roy, Delayed dynamical systems: networks, chimeras and reservoir computing, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 377, 20180389 (2019).
- [43] Y. Ebato, K. Nakajima, and R. Masuda, Impact of time-history terms on reservoir dynamics and prediction accuracy in echo state networks, Scientific Reports 14, 8871 (2024).
- [44] S. K. Tavakoli and A. Longtin, Boosting reservoir computer performance with multiple delays, Physical Review E 109, 054203 (2024).
- [45] S. Marzen, Time delays improve performance of certain neural networks, Physics 17, 111 (2024).
- [46] N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt, Progress measures for grokking via mechanistic interpretability, in International Conference on Learning Representations (2023).