pith. machine review for the scientific record.

arxiv: 2605.10356 · v1 · submitted 2026-05-11 · 🧬 q-bio.NC

Recognition: no theorem link

Cortico-cerebellar modularity as an architectural inductive bias for efficient temporal learning

Alexandra Voce, Claudia Clopath, Emmanouil Giannakakis

Pith reviewed 2026-05-12 05:05 UTC · model grok-4.3

classification 🧬 q-bio.NC
keywords cortico-cerebellar · recurrent neural networks · temporal learning · modularity · inductive bias · cerebellar module · learning efficiency · RNN architecture

The pith

The cortico-cerebellar RNN learns temporal tasks faster and to higher performance than standard recurrent networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether copying the modular split between cerebral cortex and cerebellum can improve how artificial recurrent networks handle sequences over time. It builds a hybrid model with a recurrent core plus a separate feedforward module modeled on cerebellar circuitry. This CB-RNN reaches better accuracy and trains more quickly than fully recurrent networks that use the same total number of parameters. The advantage remains even when the recurrent core is frozen after only a short initial phase and the cerebellar module alone continues to adapt, which suggests the modularity itself supplies the efficiency gain.
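To make the architecture concrete, here is a minimal PyTorch-style sketch of the wiring described above and in Figure 1: a recurrent core whose update receives a learned bias from a feedforward GC → PC pathway that reads the current input and the previous hidden state. Layer sizes, nonlinearities, and the exact point at which the bias enters the recurrent update are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a CB-RNN step (illustrative, not the paper's implementation).
import torch
import torch.nn as nn


class CerebellarModule(nn.Module):
    """Feedforward GC -> PC pathway mapping (input, hidden state) to a bias."""

    def __init__(self, input_dim, hidden_dim, gc_dim=256):
        super().__init__()
        self.gc = nn.Linear(input_dim + hidden_dim, gc_dim)  # granule-cell-like expansion layer
        self.pc = nn.Linear(gc_dim, hidden_dim)              # Purkinje-cell-like readout layer

    def forward(self, x_t, h_prev):
        gc_act = torch.relu(self.gc(torch.cat([x_t, h_prev], dim=-1)))
        return self.pc(gc_act)                               # learned bias fed back to the RNN


class CBRNN(nn.Module):
    """Recurrent 'cortical' core coupled to the cerebellar feedforward module."""

    def __init__(self, input_dim, hidden_dim, output_dim, gc_dim=256):
        super().__init__()
        self.w_in = nn.Linear(input_dim, hidden_dim)
        self.w_rec = nn.Linear(hidden_dim, hidden_dim)
        self.cb = CerebellarModule(input_dim, hidden_dim, gc_dim)
        self.readout = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):  # x: (batch, time, input_dim)
        h = x.new_zeros(x.size(0), self.w_rec.in_features)
        outputs = []
        for t in range(x.size(1)):
            x_t = x[:, t]
            cb_bias = self.cb(x_t, h)                         # cerebellar bias for this step
            h = torch.tanh(self.w_in(x_t) + self.w_rec(h) + cb_bias)
            outputs.append(self.readout(h))
        return torch.stack(outputs, dim=1)                    # (batch, time, output_dim)
```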

Core claim

The cortico-cerebellar RNN augments a recurrent cortical module with a cerebellar-inspired feedforward module. This architecture produces faster learning and higher final performance on temporal tasks than parameter-matched fully recurrent baselines. The efficiency advantage survives when the recurrent core is frozen after minimal training and subsequent updates are restricted to the cerebellar module. The results indicate that the cerebellar module is the main source of the improvement and that the cortical recurrent network can largely serve as a fixed reservoir.
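The freezing protocol can be sketched on top of the illustrative CBRNN class above: after a brief warm-up, the recurrent core's parameters stop receiving gradients and only the cerebellar module continues to learn. Whether the output readout also stays plastic is an assumption made here for illustration; the claim itself does not specify it.

```python
import torch


def make_reservoir_params(model: "CBRNN", keep_readout_plastic: bool = True):
    """Freeze the recurrent core; leave the cerebellar module (and optionally the
    readout) trainable. Returns the parameters to hand to the optimizer."""
    for module in (model.w_in, model.w_rec):
        for p in module.parameters():
            p.requires_grad = False
    if not keep_readout_plastic:
        for p in model.readout.parameters():
            p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


# Usage: brief warm-up with all parameters trainable, then continue training
# with only the cerebellar module (and, here, the readout) left plastic.
# optimizer = torch.optim.Adam(make_reservoir_params(model), lr=1e-3)
```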

What carries the argument

The heterogeneous modular architecture of the CB-RNN, formed by coupling a recurrent cortical core to a feedforward cerebellar module, which supplies a structural inductive bias for temporal learning.

If this is right

  • The CB-RNN learns faster than fully recurrent baselines across temporal tasks of varying difficulty.
  • It reaches higher maximum performance than the baselines.
  • Freezing the recurrent core after minimal training and delegating learning to the cerebellar module preserves the efficiency gains.
  • The cortical network can largely function as a fixed reservoir once initial training is complete.
  • Heterogeneous modular architectures act as a structural inductive bias that improves learning in neural systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same split could lower the cost of online adaptation in deployed recurrent models by limiting weight updates to the smaller module.
  • The principle might generalize to other paired brain regions and corresponding artificial modules for non-temporal tasks.
  • Testing the architecture on larger-scale sequence problems would reveal whether the efficiency benefit scales with network size.
  • The results offer a concrete way to implement reservoir-style computation inside modern recurrent networks without hand-designing the reservoir.

Load-bearing premise

The performance gains arise specifically from the cerebellar-inspired modularity rather than from any unaccounted differences in effective capacity, initialization, or optimization dynamics.
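One elementary control implied by this premise is checking that the CB-RNN and the RNN-only baseline really carry the same number of trainable parameters. A small sanity check along those lines might look like the following (the 2% tolerance is an arbitrary illustrative choice):

```python
import torch.nn as nn


def n_trainable_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def assert_parameter_matched(cb_model: nn.Module, baseline: nn.Module, rel_tol: float = 0.02) -> None:
    """Fail loudly if the two models differ in trainable-parameter count by more
    than rel_tol (relative to the baseline)."""
    n_cb, n_base = n_trainable_params(cb_model), n_trainable_params(baseline)
    assert abs(n_cb - n_base) <= rel_tol * n_base, (
        f"not parameter-matched: CB-RNN has {n_cb} parameters, baseline has {n_base}"
    )
```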

What would settle it

An experiment that replaces the cerebellar module with a random feedforward network of identical size and connectivity statistics and, with all other factors matched, shows that the performance gap disappears would falsify the claim.
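A hedged sketch of that control, reusing the illustrative CBRNN class from earlier: the cerebellar pathway is re-initialized with matched weight statistics and frozen, so it contributes a random feedforward transform of identical size while every other factor is left untouched. Whether the swapped-in module should stay fixed or remain trainable is an interpretive choice; this sketch keeps it fixed.

```python
import copy

import torch.nn as nn


def make_random_cb_control(model: "CBRNN") -> "CBRNN":
    """Copy the model, re-initialize the cerebellar pathway with matched weight
    statistics, and freeze it so it acts as a fixed random feedforward module."""
    control = copy.deepcopy(model)
    for layer in (control.cb.gc, control.cb.pc):
        std = layer.weight.detach().std().item()   # match the connectivity scale
        nn.init.normal_(layer.weight, mean=0.0, std=std)
        nn.init.zeros_(layer.bias)
        layer.weight.requires_grad = False
        layer.bias.requires_grad = False
    return control
```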

Figures

Figures reproduced from arXiv: 2605.10356 by Alexandra Voce, Claudia Clopath, Emmanouil Giannakakis.

Figure 1. The cortico-cerebellar RNN (CB-RNN) architecture and curriculum design. (a) Cortico-CB loop motivating the model architecture. (b) The CB-RNN combines a recurrent core with a CB-inspired feedforward module, which receives the current input and recurrent hidden state, processes them through GC- and PC-like layers, and returns a learned bias to the RNN. Models trained on N-DMS and N-Parity tasks. (c) Curricu…

Figure 2. CB-RNN modularity improves curriculum learning efficiency and final performance. (a) Task difficulty N reached over training for DMS (left) and Parity (right); shading denotes AUC. (b) Epochs to half of the global maximum N across model scales. CB-RNN models consistently learned faster and reached higher difficulty levels than parameter-matched RNN-only baselines at all model sizes. Across both tasks, CB-R…

Figure 3. CB-RNN models retain learning advantage under multi-task and task-switching demands. (a) Curriculum progression during multi-task training; shading denotes AUC. (b) Phase-resolved curriculum trajectories during task switching; flat segments reflect epochs where the other task was active, dashed lines mark switch epochs. CB-RNNs accumulated more learning progress and adapted faster following each switch tha…

Figure 4. Cerebellar module drives learning efficiency even when RNN plasticity is restricted. (a) Single-task and (b) multi-task curriculum progression; (c) task-switching trajectories. Reservoir variants outpaced the RNN-only baseline, demonstrating that the CB module is a primary driver of the efficiency advantage. In the full CB-RNN, all modules initially expanded their representational dimensionality with incre…

Figure 5. Restricted recurrent plasticity shifts representational expansion across model modules. d95 across task difficulty N for DMS (a) and Parity (b), in full (left) and reservoir (right) models. In the full model, CB populations compress at higher Ns while RNN hidden state continues to expand; this compression is absent in reservoir models, where CB modules instead sustain representational growth. Lines show me…

Figure 6. Reservoir constraints redistribute intrinsic timescales across recurrent and cerebellar modules. Population timescale (τpop) across task difficulty N for DMS (a) and Parity (b), in full (left) and reservoir (right) models. In the full model, the RNN hidden state dominates long timescales at high Ns, while CB populations decline; in reservoir models, timescales redistribute across CB modules to compensate f…

Figure 7. Cerebellar bias is necessary for task performance and supports class-discriminative representations. (a) Ablation schematic. (b, c) Accuracy across N for full, CB-ablated, and RNN-only models on DMS and Parity. CB ablation substantially reduced accuracy, with partial recovery on DMS but not Parity. Lines show mean across runs; shading indicates SD. (d, e) t-SNE projections of recurrent hidden-state activit…

Figure 8. GRU-based controls reveal an attenuated and task-dependent CB benefit. Single-task DMS and Parity curriculum progression was compared between CB-GRU models and parameter-matched GRU-only baselines. On DMS, CB-GRU showed faster early curriculum progression but reached a similar final difficulty to the GRU-only baseline. On Parity, CB-GRU progressed faster and reached higher final task difficulty. Lines show…

Figure 9. CB-RNN performance advantage is preserved across model sizes. Single-task curriculum progression for DMS (a) and Parity (b) across CB-RNNs with different GC-layer sizes and approximately parameter-matched RNN-only baselines. CB-RNNs generally progressed faster and reached higher task difficulty across model scales. To test whether GC-layer expansion contributed to the CB-RNN advantage, we compared the AUC…

Figure 10. GC-layer expansion increases the CB-RNN AUC advantage. Difference in curriculum AUC between CB-RNNs and their approximately parameter-matched RNN-only baselines for non-expansive (GC = 64) and expanded (GC = 256) CB modules. The expanded GC model showed a larger AUC advantage, most clearly on DMS. Bars show mean AUC difference; error bars show SD across random seed repeats.

Figure 11. Representational metrics are broadly preserved across GC expansion sizes. (a, b) Dimensionality (d95) and (c, d) population timescale (τpop) were compared between GC = 256 (left) and GC = 512 (right) CB-RNNs. Across both tasks, the qualitative patterns were similar across GC sizes, indicating that the main representational results were not strongly dependent on the chosen GC expansion size. Shaded regions…

Figure 12. CB-RNN performance depends on the information available to the cerebellar module. Single-task DMS and Parity runs were repeated with restricted CB input configurations. (a) When CB input is only hidden state activity ht, DMS performance is largely preserved but Parity performance falls to match baseline. (b) When CB input is restricted to task input xt only, performance across DMS and Parity falls to matc…

Figure 13. Readout-only retraining does not rescue cerebellar pathway ablation. (a) In DMS, disabling the CB pathway strongly reduced accuracy, and retraining only the output readout produced only partial recovery. (b) In Parity, readout retraining produced little recovery after CB pathway ablation. In both tasks, the residual gap from the intact CB-RNN remained substantial, indicating that the CB pathway supports t…
Original abstract

The cerebellum and cerebral cortex form tightly coupled circuits thought to support flexible and efficient temporal processing. How this interaction shapes cortical learning dynamics, and whether such heterogeneous modularity can benefit artificial systems, remains unclear. Here, we augment a recurrent neural network (RNN) with a cerebellar-inspired feedforward module and evaluate the resulting architecture on temporal tasks of varying difficulty. The cortico-cerebellar RNN (CB-RNN) learns faster and reaches higher maximum performance than parameter-matched fully recurrent baselines across a variety of regimes. Crucially, freezing the recurrent core after minimal training and delegating subsequent learning to the cerebellar module preserves superior learning efficiency, suggesting the cerebellar module is a primary driver of efficiency and that the cortical network can largely function as a fixed reservoir. Our results suggest that heterogeneous modular architectures can act as a powerful structural inductive bias in neural systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a cortico-cerebellar RNN (CB-RNN) architecture that augments a recurrent cortical core with a cerebellar-inspired feedforward module. It claims that this hybrid model learns temporal tasks faster and achieves higher maximum performance than parameter-matched fully recurrent baselines. A key result is that freezing the recurrent core after minimal training and continuing learning only in the cerebellar module preserves the efficiency advantage, suggesting the cortical network can largely function as a fixed reservoir and that the cerebellar module is the primary driver of efficiency. The authors conclude that cortico-cerebellar modularity provides a structural inductive bias for efficient temporal learning.

Significance. If the empirical advantages are robust and specifically attributable to the proposed modularity rather than generic differences in recurrent size or optimization dynamics, the work would demonstrate that heterogeneous modular architectures inspired by biology can serve as effective inductive biases in artificial recurrent networks. The freezing protocol is a useful control for isolating the feedforward module's contribution and could inform more efficient RNN designs for temporal processing.

major comments (3)
  1. [Abstract] The central claim that the CB-RNN 'learns faster and reaches higher maximum performance' than parameter-matched fully recurrent baselines is presented without any quantitative metrics, effect sizes, learning-curve references, or statistical tests, preventing direct evaluation of the magnitude and reliability of the reported advantage.
  2. [Freezing experiment] As described in the abstract and results, freezing the recurrent core after minimal training and training only the cerebellar module preserves superior efficiency, but the design lacks a control condition using a randomly initialized fixed recurrent reservoir (not derived from the cortico-cerebellar architecture) paired with an equivalent feedforward module. This omission leaves open the possibility that any fixed recurrent core plus feedforward module would yield similar gains, undermining the attribution to cortico-cerebellar modularity specifically.
  3. [Architecture and experimental setup] The claim of parameter-matched baselines requires explicit reporting of recurrent-unit counts (the CB-RNN recurrent core must be smaller than the baseline's to accommodate the cerebellar parameters). Without this, differences in effective recurrent capacity, gradient propagation, or optimization landscape could explain the results independently of the modular inductive bias.
minor comments (2)
  1. [Results] All performance comparisons should report means and error bars across multiple random seeds together with appropriate statistical tests; this is especially important given the emphasis on 'faster learning' and 'higher maximum performance'.
  2. [Methods] A schematic diagram or explicit equations defining the cerebellar-inspired feedforward module (connectivity, activation, parameter count) would improve clarity and reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment point by point below, outlining the revisions we will make to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the CB-RNN 'learns faster and reaches higher maximum performance' than parameter-matched fully recurrent baselines is presented without any quantitative metrics, effect sizes, learning-curve references, or statistical tests, preventing direct evaluation of the magnitude and reliability of the reported advantage.

    Authors: We agree that the abstract would benefit from greater specificity to allow immediate assessment of the claims. In the revised manuscript, we will update the abstract to include quantitative details such as approximate effect sizes for learning speed and peak performance (drawn from the main results), along with explicit references to the relevant learning curves and statistical tests reported in the results section. revision: yes

  2. Referee: [Freezing experiment] As described in the abstract and results, freezing the recurrent core after minimal training and training only the cerebellar module preserves superior efficiency, but the design lacks a control condition using a randomly initialized fixed recurrent reservoir (not derived from the cortico-cerebellar architecture) paired with an equivalent feedforward module. This omission leaves open the possibility that any fixed recurrent core plus feedforward module would yield similar gains, undermining the attribution to cortico-cerebellar modularity specifically.

    Authors: This is a fair and important point. Our freezing protocol uses a minimally trained cortical core to show that the cerebellar module can sustain efficient learning, but we acknowledge that a random fixed-reservoir control would more cleanly isolate whether the advantage stems specifically from cortico-cerebellar modularity rather than any fixed recurrent component plus feedforward module. We will add this control condition in the revised manuscript. revision: yes

  3. Referee: [Architecture and experimental setup] The claim of parameter-matched baselines requires explicit reporting of recurrent-unit counts (the CB-RNN recurrent core must be smaller than the baseline's to accommodate the cerebellar parameters). Without this, differences in effective recurrent capacity, gradient propagation, or optimization landscape could explain the results independently of the modular inductive bias.

    Authors: We thank the referee for highlighting this need for transparency. Although the total parameter counts are matched between conditions, we will revise the methods and results sections to explicitly report the number of recurrent units in the CB-RNN cortical core (reduced to accommodate cerebellar parameters) versus the fully recurrent baseline. This will allow readers to directly evaluate any differences in recurrent capacity or dynamics. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical architecture comparison with no derivations or self-referential fitting.

Full rationale

The paper contains no equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations. All claims rest on direct simulation results comparing the CB-RNN to parameter-matched fully recurrent baselines, including the freezing experiment. These are external benchmarks (task performance metrics) that do not reduce to the paper's own inputs by construction. The skeptic concern about unaccounted capacity differences is a question of experimental controls, not circularity in any claimed derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract; the work is purely empirical.

pith-pipeline@v0.9.0 · 5451 in / 1139 out tokens · 40228 ms · 2026-05-12T05:05:18.480342+00:00 · methodology

