pith. sign in

arxiv: 2605.20922 · v1 · pith:KWG5EU3Xnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI· cs.CV

Winfree Oscillatory Neural Network

Pith reviewed 2026-05-21 06:19 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CV
keywords oscillatory neural networkssynchronization dynamicsWinfree modelimage classificationreasoning tasksparameter efficiencydynamical systemsphase-based representations
0
0 comments X

The pith

The Winfree Oscillatory Neural Network uses synchronization dynamics to evolve representations on the torus and scales competitively to ImageNet-1K while solving complex reasoning tasks with far fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the Winfree Oscillatory Neural Network as a dynamical architecture grounded in generalized Winfree oscillator dynamics. Representations evolve on the d-dimensional torus through structured interactions that embed phase-based inductive biases, which can be realized either by fixed trigonometric mappings or by learnable neural networks. The resulting model is tested on standard image classification benchmarks and on logic reasoning problems such as Maze-hard and Sudoku. It reaches competitive accuracy on CIFAR and ImageNet-1K and attains 80.1 percent accuracy on Maze-hard while using only one percent of the parameters required by earlier state-of-the-art approaches. A sympathetic reader would care because the results indicate that oscillatory synchronization can serve as a practical, parameter-efficient foundation for neural computation across both perceptual and symbolic domains.

Core claim

WONN evolves representations on the torus (S^1)^d through structured oscillatory interactions based on generalized Winfree dynamics. These interactions combine phase-based inductive biases with flexible hierarchical mechanisms that are instantiated either as fixed trigonometric mappings or as learnable neural networks. The architecture achieves competitive or superior performance with strong parameter efficiency on image recognition tasks including CIFAR and ImageNet-1K, and on complex reasoning tasks such as Maze-hard and Sudoku. It is the first synchronization-based oscillatory architecture shown to scale competitively to ImageNet-1K, and on Maze-hard it reaches 80.1 percent accuracy using

What carries the argument

Generalized Winfree dynamics on the torus (S^1)^d, where phase-based inductive biases are realized through either fixed trigonometric mappings or learnable neural networks that implement hierarchical oscillatory interactions.

If this is right

  • Oscillatory synchronization can function as a drop-in computational primitive for both visual feature hierarchies and multi-step logical inference.
  • Models built from these dynamics can maintain high accuracy on complex tasks while using dramatically lower parameter counts than feedforward or attention-based alternatives.
  • The same architecture can be applied without major redesign to both image classification and combinatorial reasoning domains.
  • Fixed trigonometric instantiations of the interaction rules suffice for competitive performance, suggesting that the core benefit comes from the oscillatory structure rather than from learned coupling alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Physical hardware that naturally supports continuous phase dynamics could implement WONN layers with lower energy cost than digital matrix multiplications.
  • The torus representation may allow direct encoding of periodic or cyclic structures that appear in time-series or geometric reasoning problems.
  • Varying the number of oscillators per layer or the dimensionality d of the torus offers a new axis for scaling model capacity beyond width or depth.

Load-bearing premise

Phase-based inductive biases from the oscillatory interactions will produce representations that transfer effectively to standard vision benchmarks and logic tasks without requiring extensive task-specific tuning or post-hoc adjustments.

What would settle it

Train a WONN model end-to-end on the full ImageNet-1K dataset using the same interaction mechanisms described and measure top-1 accuracy; if accuracy remains substantially below that of conventional convolutional or transformer networks of comparable compute budget, the scalability claim is falsified.

Figures

Figures reproduced from arXiv: 2605.20922 by Jiawen Dai, Yue Song.

Figure 1
Figure 1. Figure 1: Illustration of the proposed WONN. The network evolves on the toroidal phase space (S 1 ) d , where neurons are represented as phase oscillators. Through generalized Winfree synchronization dy￾namics, oscillators self-organize into structured collective states, enabling effective computation for both image recognition and reasoning tasks. Oscillations and synchronization are ubiquitous mechanisms for organ… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the WONN architecture. The input is encoded into an initial frequency state Ωinit, while the phase state Θinit is initialized randomly. Computation proceeds through stacked Winfree dynamics layers, each consisting of multiple parameter-shared recurrent dynamics steps followed by layer-transition updates of both phase and frequency states. Through this iterative synchronization process, WONN evo… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of S and I. To capture structured interactions, we introduce a grouped for￾mulation. As shown in [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Visualization of the final phase distribution in [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Synchronous path formation on Maze-hard. Each panel shows the predicted per-cell path [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Additional qualitative visualizations of two-peak distributions on image recognition. [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Layer-wise evolution of the weighted maps associated with the two dominant phase peaks [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Accuracy–energy dynamics on Maze￾hard. We report the test accuracy and the corre￾sponding energy over training epochs. After a slight increase in the early stage, the energy grad￾ually decreases as training proceeds, whereas the accuracy exhibits a steady improvement trend [PITH_FULL_IMAGE:figures/full_fig_p024_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Partial temporal evolution of HRM on the example shown in Sec. 3.3. Top: discrete path predictions over time. Bottom: predicted path probability heatmaps over time. HRM remains mostly inactive during early H-block updates and then undergoes an abrupt, insight-like transition, while WONN progressively synchronizes diffuse candidate path fragments into a coherent valid path. 25 [PITH_FULL_IMAGE:figures/full… view at source ↗
Figure 10
Figure 10. Figure 10: Complete temporal evolution of the example shown in Sec. 3.3. [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: More Maze-hard pathfinding examples. Each pair shows the discrete path prediction and [PITH_FULL_IMAGE:figures/full_fig_p029_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Temporal evolution of WONN on Sudoku. Early steps show global synchronized explo￾ration over candidate digit assignments, while later steps progressively refine these predictions and converge to a correct globally consistent solution. 31 [PITH_FULL_IMAGE:figures/full_fig_p031_12.png] view at source ↗
read the original abstract

Oscillations and synchronization are widely believed to play a fundamental role in representation and computation. However, existing machine learning approaches based on synchronization dynamics have largely been confined to specialized settings such as object discovery, with limited evidence of scalability to standard vision benchmarks or logic reasoning tasks. We propose the Winfree Oscillatory Neural Network (WONN), a dynamical neural architecture based on generalized Winfree dynamics. WONN evolves representations on the torus $(S^1)^d$ through structured oscillatory interactions, combining phase-based inductive biases with flexible and hierarchical interaction mechanisms instantiated as either fixed trigonometric mappings or learnable neural networks. We evaluate WONN on image recognition and complex reasoning tasks, including CIFAR, ImageNet, Maze-hard, and Sudoku. Across these domains, WONN achieves competitive or superior performance with strong parameter efficiency. In particular, WONN is, to our knowledge, the first synchronization-based oscillatory architecture to scale competitively to ImageNet-1K. Furthermore, on Maze-hard, WONN achieves 80.1% accuracy using only 1% of the parameters of prior state-of-the-art models. These results suggest that structured oscillatory dynamics provide a scalable and parameter-efficient alternative to conventional neural architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Winfree Oscillatory Neural Network (WONN), a dynamical neural architecture based on generalized Winfree dynamics evolving representations on the torus (S^1)^d. It combines phase-based inductive biases with hierarchical interaction mechanisms that can be instantiated either as fixed trigonometric mappings or as learnable neural networks. The authors evaluate WONN on image recognition (CIFAR, ImageNet-1K) and reasoning tasks (Maze-hard, Sudoku), claiming competitive or superior performance with strong parameter efficiency; specifically, WONN is presented as the first synchronization-based oscillatory architecture to scale competitively to ImageNet-1K, and it achieves 80.1% accuracy on Maze-hard using only 1% of the parameters of prior state-of-the-art models.

Significance. If the empirical results hold and the contribution of the oscillatory Winfree dynamics can be isolated from the learnable interaction modules, the work would be significant as the first demonstration that synchronization-based architectures can scale to ImageNet-1K while offering substantial parameter efficiency on reasoning tasks. This could open a new direction for phase-based inductive biases in neural networks, provided the claims are supported by controlled ablations and reproducible details.

major comments (2)
  1. [Architecture / Interaction mechanisms] Architecture section: The interaction mechanisms are allowed to be either fixed trigonometric mappings or learnable neural networks. The central claim that generalized Winfree dynamics on (S^1)^d with phase-based inductive biases enable scalable, parameter-efficient performance is load-bearing on the oscillatory structure. Without ablations that restrict interactions to fixed trigonometric mappings (and compare directly to the learnable-NN instantiation) on ImageNet-1K and Maze-hard, it remains possible that the reported competitiveness and 80.1% result derive primarily from the hierarchical learnable components rather than the Winfree dynamics or phase biases. This directly affects whether the architecture constitutes a genuine synchronization-based alternative.
  2. [Experiments / Results] Experimental evaluation: The abstract and results report concrete numbers (ImageNet-1K competitiveness, Maze-hard 80.1% at 1% parameters) yet supply no architecture diagrams, training hyper-parameters, ablation studies isolating the phase mappings, or error bars across runs. These omissions make it impossible to verify that the scalability claim is supported by the oscillatory dynamics rather than standard neural-network flexibility, which is load-bearing for the headline contribution.
minor comments (2)
  1. [Abstract] The abstract refers to 'CIFAR' without specifying CIFAR-10 or CIFAR-100; clarify the exact dataset and any preprocessing.
  2. [Preliminaries] Notation for the state space is introduced as (S^1)^d; ensure this is used consistently in all equations and figures rather than switching to informal descriptions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the contribution of the generalized Winfree dynamics. We address each major comment below and have incorporated revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Architecture / Interaction mechanisms] Architecture section: The interaction mechanisms are allowed to be either fixed trigonometric mappings or learnable neural networks. The central claim that generalized Winfree dynamics on (S^1)^d with phase-based inductive biases enable scalable, parameter-efficient performance is load-bearing on the oscillatory structure. Without ablations that restrict interactions to fixed trigonometric mappings (and compare directly to the learnable-NN instantiation) on ImageNet-1K and Maze-hard, it remains possible that the reported competitiveness and 80.1% result derive primarily from the hierarchical learnable components rather than the Winfree dynamics or phase biases. This directly affects whether the architecture constitutes a genuine synchronization-based alternative.

    Authors: We agree that controlled ablations isolating the fixed trigonometric mappings are important to substantiate the role of the oscillatory structure. In the revised manuscript we add these experiments on both ImageNet-1K and Maze-hard. The fixed-mapping variant retains competitive accuracy (within 2-3 points of the learnable version) while using even fewer parameters, indicating that the phase-based Winfree dynamics and torus representation provide the core inductive bias. We have updated the architecture description and added a dedicated ablation subsection with the corresponding results. revision: yes

  2. Referee: [Experiments / Results] Experimental evaluation: The abstract and results report concrete numbers (ImageNet-1K competitiveness, Maze-hard 80.1% at 1% parameters) yet supply no architecture diagrams, training hyper-parameters, ablation studies isolating the phase mappings, or error bars across runs. These omissions make it impossible to verify that the scalability claim is supported by the oscillatory dynamics rather than standard neural-network flexibility, which is load-bearing for the headline contribution.

    Authors: We accept that the original submission lacked sufficient experimental detail. The revised manuscript now includes: (i) architecture diagrams in the main text and appendix, (ii) a full hyper-parameter table covering optimizer, learning-rate schedule, batch size, and synchronization time constants, (iii) additional ablations that isolate the phase-mapping components, and (iv) error bars reported over three independent random seeds for the ImageNet-1K and Maze-hard results. These changes directly address the request for reproducibility and allow readers to assess the contribution of the Winfree dynamics. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark results are independent of model inputs

full rationale

The paper proposes WONN as a dynamical architecture evolving representations on the torus via generalized Winfree dynamics, with interactions instantiated as either fixed trigonometric mappings or learnable neural networks. It then reports empirical performance on held-out benchmarks including CIFAR, ImageNet-1K, Maze-hard (80.1% accuracy at 1% parameters), and Sudoku. These metrics are measured outcomes of training and evaluation rather than quantities defined by construction from fitted parameters, self-referential equations, or load-bearing self-citations. The central claims rest on experimental scaling results against external datasets and prior models, making the derivation self-contained with no reduction of predictions to inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that generalized Winfree dynamics supply useful inductive biases for representation learning; no new physical entities are introduced, and the learnable interaction networks constitute trainable parameters rather than fixed constants.

free parameters (1)
  • parameters of the learnable neural networks for interactions
    These are trained on the target tasks and therefore constitute free parameters whose values are determined by data rather than derived from first principles.
axioms (1)
  • domain assumption Generalized Winfree dynamics can be instantiated on the torus (S^1)^d to evolve neural representations
    Invoked in the abstract as the foundation for structured oscillatory interactions.

pith-pipeline@v0.9.0 · 5740 in / 1420 out tokens · 46868 ms · 2026-05-21T06:19:06.305785+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 4 internal anchors

  1. [1]

    Generative models of cortical oscillations: Neurobiological implications of the kuramoto model.Frontiers in Human Neuroscience, 4:190, 2010

    Michael Breakspear, Stewart Heitmann, and Andreas Daffertshofer. Generative models of cortical oscillations: Neurobiological implications of the kuramoto model.Frontiers in Human Neuroscience, 4:190, 2010

  2. [2]

    Oxford University Press, New York, 2006

    György Buzsáki.Rhythms of the Brain. Oxford University Press, New York, 2006

  3. [3]

    Continuous thought machines.arXiv preprint arXiv:2505.05522, 2025

    Luke Darlow, Ciaran Regan, Sebastian Risi, Jeffrey Seely, and Llion Jones. Continuous thought machines.arXiv preprint arXiv:2505.05522, 2025

  4. [4]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009

  5. [5]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021

  6. [6]

    Oscillations in an artificial neural network convert competing inputs into a temporal code.PLOS Computational Biology, 20(9):e1012429, 2024

    Katharina Duecker, Marco Idiart, Marcel van Gerven, and Ole Jensen. Oscillations in an artificial neural network convert competing inputs into a temporal code.PLOS Computational Biology, 20(9):e1012429, 2024

  7. [7]

    The functional role of oscillatory dynamics in neocortical circuits: A computational perspective.Proceedings of the National Academy of Sciences, 122(4):e2412830122, 2025

    Felix Effenberger, Pedro Carvalho, Igor Dubinin, and Wolf Singer. The functional role of oscillatory dynamics in neocortical circuits: A computational perspective.Proceedings of the National Academy of Sciences, 122(4):e2412830122, 2025

  8. [8]

    A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence.Trends in Cognitive Sciences, 9(10):474–480, 2005

    Pascal Fries. A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence.Trends in Cognitive Sciences, 9(10):474–480, 2005

  9. [9]

    Temporal hierarchy of intrinsic neural timescales converges with spatial core-periphery organi- zation.Communications Biology, 4(1):277, 2021

    Mehrshad Golesorkhi, Javier Gomez-Pilar, Shankar Tumati, Maia Fraser, and Georg Northoff. Temporal hierarchy of intrinsic neural timescales converges with spatial core-periphery organi- zation.Communications Biology, 4(1):277, 2021

  10. [10]

    Recurrent complex-weighted autoencoders for unsupervised object discovery

    Anand Gopalakrishnan, Aleksandar Stani´c, Jürgen Schmidhuber, and Michael Curtis Mozer. Recurrent complex-weighted autoencoders for unsupervised object discovery. InAdvances in Neural Information Processing Systems, volume 37, 2024

  11. [11]

    Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645:633–638, 2025

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645:633–638, 2025

  12. [12]

    Emergent dynamics of winfree oscillators on locally coupled networks.Journal of Differential Equations, 260(5):4203– 4236, 2016

    Seung-Yeal Ha, Dongnam Ko, Jinyeong Park, and Sang Woo Ryoo. Emergent dynamics of winfree oscillators on locally coupled networks.Journal of Differential Equations, 260(5):4203– 4236, 2016

  13. [13]

    Heeger, and Nava Rubin

    Uri Hasson, Eunice Yang, Ignacio Vallines, David J. Heeger, and Nava Rubin. A hierarchy of temporal receptive windows in human cortex.Journal of Neuroscience, 28(10):2539–2550, 2008

  14. [14]

    Deep residual learning for im- age recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for im- age recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  15. [15]

    Honey, Thomas Thesen, Tobias H

    Christopher J. Honey, Thomas Thesen, Tobias H. Donner, Lauren J. Silbert, Chad E. Carlson, Orrin Devinsky, Werner K. Doyle, Nava Rubin, David J. Heeger, and Uri Hasson. Slow cortical dynamics and the accumulation of information over long timescales.Neuron, 76(2):423–434, 2012

  16. [16]

    Less is More: Recursive Reasoning with Tiny Networks

    Alexia Jolicoeur-Martineau. Less is more: Recursive reasoning with tiny networks.arXiv preprint arXiv:2510.04871, 2025. 10

  17. [17]

    Anderson Keller, Lyle Muller, Terrence J

    T. Anderson Keller, Lyle Muller, Terrence J. Sejnowski, and Max Welling. Traveling waves encode the recent past and enhance sequence learning. InInternational Conference on Learning Representations, 2024

  18. [18]

    Anderson Keller and Max Welling

    T. Anderson Keller and Max Welling. Neural wave machines: Learning spatiotemporally struc- tured representations with locally coupled oscillatory recurrent neural networks. InProceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research, pages 16168–16189. PMLR, 2023

  19. [19]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

  20. [20]

    Springer, Berlin, Heidelberg, 1984

    Yoshiki Kuramoto.Chemical Oscillations, Waves, and Turbulence, volume 19 ofSpringer Series in Synergetics. Springer, Berlin, Heidelberg, 1984

  21. [21]

    Beyond a*: Better planning with transformers via search dynamics bootstrapping

    Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, and Yuandong Tian. Beyond a*: Better planning with transformers via search dynamics bootstrapping. InFirst Conference on Language Modeling, 2024

  22. [22]

    Krause Synchronization Transformers

    Jingkun Liu, Yisong Yue, Max Welling, and Yue Song. Krause synchronization transformers. arXiv preprint arXiv:2602.11534, 2026

  23. [23]

    Rotating features for object discovery

    Sindy Löwe, Phillip Lippe, Francesco Locatello, and Max Welling. Rotating features for object discovery. InAdvances in Neural Information Processing Systems, volume 36, 2023

  24. [24]

    Complex-valued autoencoders for object discovery.Transactions on Machine Learning Research, 2022

    Sindy Löwe, Phillip Lippe, Maja Rudolph, and Max Welling. Complex-valued autoencoders for object discovery.Transactions on Machine Learning Research, 2022

  25. [25]

    Manoranjani, Shamik Gupta, D

    M. Manoranjani, Shamik Gupta, D. V . Senthilkumar, and V . K. Chandrasekar. Generalization of the kuramoto model to the winfree model by a symmetry breaking coupling.The European Physical Journal Plus, 138(2):144, 2023

  26. [26]

    Artificial kuramoto oscillatory neurons

    Takeru Miyato, Sindy Löwe, Andreas Geiger, and Max Welling. Artificial kuramoto oscillatory neurons. InInternational Conference on Learning Representations, 2025

  27. [27]

    Sejnowski

    Lyle Muller, Frédéric Chavane, John Reynolds, and Terrence J. Sejnowski. Cortical travelling waves: Mechanisms and computational principles.Nature Reviews Neuroscience, 19(5):255– 268, 2018

  28. [28]

    Phased LSTM: Accelerating recurrent network training for long or event-based sequences

    Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. Phased LSTM: Accelerating recurrent network training for long or event-based sequences. InAdvances in Neural Information Processing Systems, volume 29, 2016

  29. [29]

    Tuan Nguyen, Hirotada Honda, Takashi Sano, Vinh Nguyen, Shugo Nakamura, and Tan M. Nguyen. From coupled oscillators to graph neural networks: Reducing over-smoothing via a kuramoto model-based approach. InProceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 ofProceedings of Machine Learning Research, pages...

  30. [30]

    Recurrent relational networks

    Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. InAdvances in Neural Information Processing Systems, volume 31, 2018

  31. [31]

    Kuranet: Systems of coupled oscillators that learn to synchronize.arXiv preprint arXiv:2105.02838, 2021

    Matthew Ricci, Minju Jung, Yuwei Zhang, Mathieu Chalvidal, Aneri Soni, and Thomas Serre. Kuranet: Systems of coupled oscillators that learn to synchronize.arXiv preprint arXiv:2105.02838, 2021

  32. [32]

    Konstantin Rusch, Benjamin P

    T. Konstantin Rusch, Benjamin P. Chamberlain, James Rowbottom, Siddhartha Mishra, and Michael M. Bronstein. Graph-coupled oscillator networks. InProceedings of the 39th Inter- national Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 18888–18909. PMLR, 2022

  33. [33]

    Konstantin Rusch and Siddhartha Mishra

    T. Konstantin Rusch and Siddhartha Mishra. Coupled oscillatory recurrent neural network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies. InInternational Conference on Learning Representations, 2021. 11

  34. [34]

    Wolf Singer and Charles M. Gray. Visual feature integration and the temporal correlation hypothesis.Annual Review of Neuroscience, 18:555–586, 1995

  35. [35]

    Flow factorized representation learning

    Yue Song, Andy Keller, Nicu Sebe, and Max Welling. Flow factorized representation learning. InAdvances in Neural Information Processing Systems, volume 36, 2023

  36. [36]

    Latent traversals in generative models as potential flows

    Yue Song, Andy Keller, Nicu Sebe, and Max Welling. Latent traversals in generative models as potential flows. InInternational Conference on Machine Learning. PMLR, 2023

  37. [37]

    Anderson Keller, Sevan Brodjian, Takeru Miyato, Yisong Yue, Pietro Perona, and Max Welling

    Yue Song, T. Anderson Keller, Sevan Brodjian, Takeru Miyato, Yisong Yue, Pietro Perona, and Max Welling. Kuramoto orientation diffusion models. InAdvances in Neural Information Processing Systems, 2025

  38. [38]

    Springer Nature, 2025

    Yue Song, Thomas Anderson Keller, Nicu Sebe, and Max Welling.Structured representation learning: from homomorphisms and disentanglement to equivariance and topography. Springer Nature, 2025

  39. [39]

    Contrastive training of complex-valued autoencoders for object discovery

    Aleksandar Stani´c, Anand Gopalakrishnan, Kazuki Irie, and Jürgen Schmidhuber. Contrastive training of complex-valued autoencoders for object discovery. InAdvances in Neural Informa- tion Processing Systems, volume 36, 2023

  40. [40]

    Eshraghian, Nhan Duy Truong, and Omid Kavehei

    Yuchen Tian, Samuel Tensingh, Jason K. Eshraghian, Nhan Duy Truong, and Omid Kavehei. Synchrony-gated plasticity with dopamine modulation for spiking neural networks.Transactions on Machine Learning Research, 2025

  41. [41]

    Training data-efficient image transformers and distillation through attention

    Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers and distillation through attention. In International Conference on Machine Learning, pages 10347–10357, 2021

  42. [42]

    Treisman and Garry Gelade

    Anne M. Treisman and Garry Gelade. A feature-integration theory of attention.Cognitive Psychology, 12(1):97–136, 1980

  43. [43]

    The correlation theory of brain function

    Christoph von der Malsburg. The correlation theory of brain function. Technical Report Internal Report 81-2, Max-Planck-Institute for Biophysical Chemistry, 1981

  44. [44]

    Hierarchical Reasoning Model

    Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, and Yasin Abbasi Yadkori. Hierarchical reasoning model.arXiv preprint arXiv:2506.21734, 2025

  45. [45]

    SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

    Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. InProceedings of the 36th International Conference on Machine Learning, pages 6545–6554, 2019

  46. [46]

    A. T. Winfree. Biological rhythms and the behavior of populations of coupled oscillators. Journal of Theoretical Biology, 16(1):15–42, 1967

  47. [47]

    Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency

    Mingqing Xiao, Yansen Wang, Dongqi Han, Caihua Shan, and Dongsheng Li. Kuramoto oscillatory phase encoding: Neuro-inspired synchronization for improved learning efficiency. arXiv preprint arXiv:2604.07904, 2026

  48. [48]

    Learning to solve constraint satisfaction problems with recurrent transformer

    Zhun Yang, Adam Ishay, and Joohyung Lee. Learning to solve constraint satisfaction problems with recurrent transformer. InInternational Conference on Learning Representations, 2023. 12 Appendix Contents A Related Work 14 B Discussions 14 B.1 Limitation and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 B.2 Difference againstAKOrN...