Distilling a Modular Reservoir Through a Genomic Bottleneck

Charley M. Wu; Emmanouil Giannakakis; Mani Hamidi; Sina Khajehabdollahi

arxiv: 2606.28380 · v1 · pith:L7H324ANnew · submitted 2026-06-20 · 💻 cs.NE · cs.AI

Pith reviewed 2026-06-30 11:29 UTC · model grok-4.3

keywords: hypernetworks · modular reservoir computing · curriculum meta-learning · sparse recurrent networks · temporal tasks · genomic bottleneck · connectivity generation

The pith

Hypernetworks learn a compressed generative process that produces sparse modular reservoirs capable of solving difficult temporal tasks with minimal training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a hypernetwork, trained via curriculum-based meta-learning, can act as a generative model to decode connectivity for a modular reservoir from a compressed representation. This draws from the biological analogy of a genome guiding initial neural structure before experience-based refinement. If successful, the resulting sparse recurrent networks handle temporal processing tasks efficiently without needing extensive additional optimization or losing robustness. A sympathetic reader would care because the method offers a way to initialize structured networks that combine evolutionary-style compression with developmental plasticity.

Core claim

A hypernetwork trained through curriculum-based meta-learning can generate the connectivity of a modular reservoir from a compressed blueprint, yielding sparse recurrent networks that solve difficult temporal tasks with minimal training and without concessions to robustness.

What carries the argument

The hypernetwork as a compressed generative process that produces modular and sparse reservoir connectivity.
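To make the idea concrete, here is a minimal sketch of a hypernetwork decoding sparse inter-module connectivity from a small parameter set. It is not the authors' implementation: the coordinate inputs (module depth, source index, target index), the two-layer MLP, the layer sizes, and the top-k magnitude sparsification are all illustrative assumptions, and the g-net here is randomly initialized rather than meta-trained.

```python
import numpy as np

def init_gnet(rng, in_dim=3, hidden=16):
    """Random parameters for a tiny two-layer MLP g-net (hypothetical sizes)."""
    return {
        "W1": rng.normal(0.0, 1.0, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)),
        "b2": np.zeros(1),
    }

def gnet_forward(params, coords):
    """Map each (depth, source, target) coordinate row to one scalar weight."""
    h = np.tanh(coords @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).squeeze(-1)

def generate_inter_module_weights(params, n_modules, module_size, density=0.1):
    """Decode one weight matrix per pair of consecutive modules.

    Assumption: sparsity is imposed by keeping only the largest-magnitude
    entries of each decoded matrix; the paper may do this differently.
    """
    idx = np.linspace(-1.0, 1.0, module_size)
    src, dst = np.meshgrid(idx, idx, indexing="ij")
    weights = []
    for m in range(n_modules - 1):
        depth = np.full_like(src, m / max(n_modules - 1, 1))
        coords = np.stack([depth, src, dst], axis=-1).reshape(-1, 3)
        W = gnet_forward(params, coords).reshape(module_size, module_size)
        k = max(1, int(round(density * W.size)))
        threshold = np.sort(np.abs(W), axis=None)[-k]
        weights.append(np.where(np.abs(W) >= threshold, W, 0.0))
    return weights

rng = np.random.default_rng(0)
params = init_gnet(rng)
weights = generate_inter_module_weights(params, n_modules=5, module_size=20)

n_genome = sum(p.size for p in params.values())   # parameters in the g-net
n_decoded = sum(W.size for W in weights)          # weights it decodes
print(n_genome, n_decoded)
```

With these toy sizes the g-net holds 81 parameters and decodes 1600 connection weights, which illustrates the compression. The decoded count grows with the number of modules while the g-net size stays fixed.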

If this is right

The generated networks require only minimal training on new temporal tasks.
Sparsity and modularity in the produced connectivity preserve task performance and robustness.
Curriculum meta-learning enables the hypernetwork to scale the generative process across varying task difficulties.
The approach bridges compressed blueprint generation with subsequent plasticity for efficient recurrent computation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be extended to generate initial connectivity for other recurrent architectures beyond reservoirs.
If the hypernetwork generalizes across domains, it might reduce the need for task-specific architecture search in sequential data problems.
Testing the generated networks on real-world time series benchmarks would clarify practical utility beyond synthetic tasks.

Load-bearing premise

A hypernetwork trained via curriculum meta-learning can reliably produce functional modular and sparse connectivity that generalizes to difficult temporal tasks.

What would settle it

A direct test in which hypernetwork-generated reservoirs consistently fail to solve the target temporal tasks or require substantial further training to reach performance would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.28380 by Charley M. Wu, Emmanouil Giannakakis, Mani Hamidi, Sina Khajehabdollahi.

**Figure 2.** Implementing an indirect learning scheme using a g-net. Learning takes place at two levels: at the “evolutionary” or genomic level, the hypernetworks (“g-nets”) are trained to generate just the inter-module weights of the RNN (the “p-net”). In the “lifelong learning” phase, the remaining input and output parameters of the p-net are further trained. b) following past (Khajehabdollahi et al., 2024; Hamidi…

**Figure 3.** Performance and parameter efficiency. A. Parameter count (|Θ|) scaling: directly trained parameters (solid) grow with N and M², while indirectly trained parameters (dashed) scale much more favorably. B. Learning efficiency (maximum N solved per trainable parameter, N_max / |Θ|) over training epochs. Solid lines = direct (both tasks), dashed = indirect (parity), dotted = indirect (DMS). The indirect adva…

**Figure 4.** A, B show how hierarchical networks’ performance is affected by perturbations of their connectivity weights for both parity and DMS, where we measured the average accuracy with which the perturbed networks continued to solve the tasks. We perturbed the learned W^F_m connections of both directly and indirectly trained networks that had learned to solve up to and including N = 40. Different magnitudes of pert…

**Figure 5.** Compressibility of W^F across modules. Panels A–D show representative parity networks. A. Directly trained connections are fully uncorrelated between modules, while indirectly learned connections (B) are highly conserved. C. Directly trained weights change rapidly across modules; D. indirectly learned weights show minimal variation. E. SVD rank-90 (number of components for 90% reconstruction fidelity) for b…

**Figure 6.** Within-module compressibility of W^F_m. A. Hierarchically clustered weight heatmaps at module m=20 for a representative parity network (dashed border = direct, solid = indirect). Indirectly learned weights exhibit more regular block structure. B. Within-module SVD rank-90 across depth, split by training method: Direct (left) and Indirect (right, note different y-scale). Solid lines = parity, dashdot = DM…
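The "SVD rank-90" measure named in the Figure 5 and Figure 6 captions can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes "90% reconstruction fidelity" means the fraction of squared singular values (Frobenius energy) retained, and the across-module comparison stacks flattened module matrices as rows, which is one plausible reading of the captions. The toy "conserved" and "uncorrelated" weight sets are synthetic stand-ins.

```python
import numpy as np

def svd_rank_90(W, fidelity=0.90):
    """Smallest number of singular components whose squared singular values
    account for at least `fidelity` of the total (assumed fidelity measure)."""
    s = np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
    total = np.sum(s ** 2)
    if total == 0.0:
        raise ValueError("rank-90 is undefined for an all-zero matrix")
    energy = np.cumsum(s ** 2) / total
    # First index where the cumulative energy reaches the target, as a count.
    return int(np.searchsorted(energy, fidelity) + 1)

rng = np.random.default_rng(0)
n_modules, module_size = 10, 20

# Toy "conserved" case: every module reuses one pattern plus small noise.
base = rng.normal(size=(module_size, module_size))
conserved = [base + 0.01 * rng.normal(size=base.shape) for _ in range(n_modules)]

# Toy "uncorrelated" case: every module gets independent weights.
uncorrelated = [rng.normal(size=base.shape) for _ in range(n_modules)]

def across_module_rank(mats):
    """Stack flattened module matrices as rows, then measure rank-90."""
    return svd_rank_90(np.stack([W.ravel() for W in mats]))

rank_conserved = across_module_rank(conserved)
rank_uncorrelated = across_module_rank(uncorrelated)
print(rank_conserved, rank_uncorrelated)
```

A low across-module rank means a few shared patterns reconstruct most of the weights, which is the sense in which conserved connectivity is more compressible than independently varying connectivity.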

Abstract

The intricate structures of biological neural networks largely emerge during development, guided by a comparatively compressed blueprint encoded in the genome. The connectivity that emerges from this decoding process is rich in structure, and already equips the organism with functional modules upon birth. This initial structure serves as a scaffold that can be gradually refined and fine-tuned through lifelong experience, via a variety of plasticity mechanisms. Drawing inspiration from this interaction between evolutionary and developmental modes of learning, we use hypernetworks to learn a compressed generative process that generates the connectivity of a modular reservoir. We show that this marriage between curriculum-based meta-learning and modular reservoir computing can generate sparse recurrent networks that solve difficult temporal tasks with minimal training and without concessions to robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Using a hypernetwork as a genomic bottleneck for modular reservoirs is a fresh framing, but the abstract asserts performance without any supporting data or details.

The one thing to know is that this paper claims a hypernetwork trained with curriculum meta-learning can generate sparse modular reservoirs that solve difficult temporal tasks efficiently, but the abstract gives no numbers or details to support that.

What is new is the use of hypernetworks specifically as a compressed blueprint for reservoir connectivity, combined with meta-learning to produce modular and sparse structures inspired by biological development.

The paper does a good job explaining the motivation from evolutionary and developmental learning and how it translates to the technical setup of generating recurrent network weights via the hypernetwork.

On the positive side, if it works, it could offer a way to create reservoirs with built-in structure rather than relying on random initialization.

The soft spots are clear from the abstract: the soundness is weak because there are no quantitative results, baselines, ablation studies, or error bars provided. The claim about minimal training and no concessions to robustness is asserted without evidence of the tasks used, the curriculum schedule, or any robustness measures like noise tolerance.

The weakest assumption is that the generated connectivity generalizes to held-out difficult tasks while maintaining efficiency and robustness, and that assumption is not evidenced here.

This paper is for researchers in reservoir computing and meta-learning interested in biologically inspired methods for generating network architectures.

A reader who wants to see new generative techniques for structured reservoirs might find value if the full paper delivers on the experiments.

I would recommend sending it for peer review to evaluate the actual results and methods in detail.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes using hypernetworks trained via curriculum-based meta-learning to learn a compressed generative process ('genomic bottleneck') that produces the connectivity of a modular reservoir. The central claim is that this combination generates sparse recurrent networks capable of solving difficult temporal tasks with minimal training and without concessions to robustness, drawing an analogy to biological development where a compressed genome guides initial structured connectivity that is later refined.

Significance. If the empirical claims hold with proper validation, the approach could provide a biologically inspired route to generating structured, sparse reservoirs that generalize efficiently, potentially advancing meta-learning applications in reservoir computing by reducing the need for extensive per-task training while preserving robustness properties.

major comments (2)

[Abstract] The central performance claim ('generate sparse recurrent networks that solve difficult temporal tasks with minimal training and without concessions to robustness') is asserted without any quantitative results, baselines, ablation studies, error bars, task descriptions, curriculum schedule, sparsity/modularity metrics, training budget comparisons, or robustness measures (e.g., noise tolerance). This makes it impossible to assess whether the generated connectivity transfers to held-out tasks while preserving the stated efficiency and robustness.
[Abstract / Introduction] The assumption that the hypernetwork reliably produces functional modular/sparse connectivity that generalizes is stated as the core contribution but lacks any description of the temporal tasks, how the curriculum meta-learning schedule is constructed, or quantitative evidence of generalization beyond meta-training.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We address the major points below and outline revisions to improve clarity and accessibility of the empirical claims.

Referee: [Abstract] The central performance claim ('generate sparse recurrent networks that solve difficult temporal tasks with minimal training and without concessions to robustness') is asserted without any quantitative results, baselines, ablation studies, error bars, task descriptions, curriculum schedule, sparsity/modularity metrics, training budget comparisons, or robustness measures (e.g., noise tolerance). This makes it impossible to assess whether the generated connectivity transfers to held-out tasks while preserving the stated efficiency and robustness.

Authors: We agree that the abstract, as a high-level summary, does not include the requested quantitative details, metrics, or evidence. The full manuscript reports these in the Experiments section, including task performance numbers, baseline comparisons, ablations on the genomic bottleneck, error bars, curriculum details, sparsity and modularity metrics, training budgets, and robustness measures such as noise tolerance. To address the concern directly, we will revise the abstract to incorporate key quantitative highlights supporting the central claim.

revision: yes

Referee: [Abstract / Introduction] The assumption that the hypernetwork reliably produces functional modular/sparse connectivity that generalizes is stated as the core contribution but lacks any description of the temporal tasks, how the curriculum meta-learning schedule is constructed, or quantitative evidence of generalization beyond meta-training.

Authors: The manuscript describes the temporal tasks, curriculum meta-learning schedule construction, and quantitative generalization evidence in the Methods and Results sections. However, we acknowledge that the abstract and introduction do not sufficiently preview these elements to support the core claim upfront. We will add a concise overview of the tasks, schedule, and generalization metrics to the introduction.

revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual framework with no fitted predictions or self-referential reductions

The provided abstract and description contain no equations, parameter fits, or derivation steps. The central claim is an empirical assertion that a hypernetwork trained via curriculum meta-learning generates functional modular reservoirs; this is presented as a demonstration rather than a mathematical reduction of outputs to inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked in the given text. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations or implementation details, so free parameters, axioms, and invented entities cannot be enumerated.

Reference graph

Works this paper leans on

87 extracted references · 10 canonical work pages · 4 internal anchors

[1] Calabretta, R and Nolfi, S and Parisi, D and Wagner, G P. Duplication of modules facilitates the evolution of functional specialization. Artif. Life.
[2] From DNA to diversity: molecular genetics and the evolution of animal design. 2013.
[3] Liu, Ruishan and Fusi, Nicolo and Mackey, Lester. Teacher-student compression with generative adversarial networks. arXiv [cs.LG].
[4] Malik, Shaiq Munir and Haider, Muhammad Umair and Tharani, Mohbat and Rasheed, Musab and Taj, Murtaza. Teacher-class network: A neural network compression mechanism. arXiv [cs.LG].
[5] Cheung, Brian and Terekhov, Alex and Chen, Yubei and Agrawal, Pulkit and Olshausen, Bruno. Superposition of many models into one. arXiv:1902.05522.
[6] Bae, Juhan and Zhang, Michael R and Ruan, Michael and Wang, Eric and Hasegawa, So and Ba, Jimmy and Grosse, Roger. Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve. arXiv [cs.LG].
[7] Teasing Apart Architecture and Initial Weights as Sources of Inductive Bias in Neural Networks. arXiv preprint arXiv:2502.20237.
[8] Generalization in reinforcement learning with selective noise injection and information bottleneck. Advances in neural information processing systems.
[9] Meta-Learning an Evolvable Developmental Encoding. ALIFE 2024: Proceedings of the 2024 Artificial Life Conference. 2024.
[10] Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents. Proceedings of the Genetic and Evolutionary Computation Conference.
[11] Growing artificial neural networks for control: the role of neuronal diversity. Proceedings of the Genetic and Evolutionary Computation Conference Companion.
[12] Towards self-assembling artificial neural networks through neural developmental programs. Artificial Life Conference Proceedings 35. 2023.
[13] Najarro, Elias and Sudhakaran, Shyam and Risi, Sebastian. 2023. doi:10.1162/isal_a_00697.
[14] Tanaka, Gouhei and Yamane, Toshiyuki and Héroux, Jean Benoit and Nakane, Ryosho and Kanazawa, Naoki and Takeda, Seiji and Numata, Hidetoshi and Nakano, Daiju and Hirose, Akira. Recent advances in physical reservoir computing: A review. Neural Netw.
[15] Understanding axon guidance: are we nearly there yet? Development. 2018.
[16] Crosscombe, Michael and Sato, Hiroki. On the existence of information bottlenecks in living and non-living systems. The 2023 Conference on Artificial Life. 2023.
[17] Grillner, Sten and Robertson, Brita. The basal ganglia over 500 million years. Curr. Biol.
[18] Stephenson-Jones, Marcus and Samuelsson, Ebba and Ericsson, Jesper and Robertson, Brita and Grillner, Sten. Evolutionary conservation of the basal ganglia as a common vertebrate mechanism for action selection. Curr. Biol.
[19] Cisek, Paul. Resynthesizing behavior through phylogenetic refinement. Atten. Percept. Psychophys.
[20] Fernando, Chrisantha Thomas and Sygnowski, Jakub and Osindero, Simon and Wang, Jane and Schaul, Tom and Teplyashin, Denis and Sprechmann, Pablo and Pritzel, Alexander and Rusu, Andrei A. Meta-learning by the Baldwin effect. arXiv [cs.NE].
[21] Czégel, Dániel and Giaffar, Hamza and Csillag, Márton and Futó, Bálint and Szathmáry, Eörs. Novelty and imitation within the brain: a Darwinian neurodynamic approach to combinatorial problems. Sci. Rep.
[22] A cortical information bottleneck during decision-making. bioRxiv.
[23] Circuits for integrating learned and innate valences in the insect brain. Elife. 2021.
[24] Reservoir computing approaches to recurrent neural network training. Computer science review. 2009.
[25] Gallicchio, Claudio and Micheli, Alessio. Deep Reservoir Computing. Reservoir Computing: Theory, Physical Implementations, and Applications. 2021. doi:10.1007/978-981-13-1687-6_4.
[26] Deacon, Terrence W. A role for relaxed selection in the evolution of the language capacity. Proc. Natl. Acad. Sci. U. S. A.
[27] Inductive biases of neural network modularity in spatial navigation. Science Advances. 2024.
[28] Laversanne-Finot, Adrien and Péré, Alexandre and Oudeyer, Pierre-Yves. Curiosity driven exploration of learned disentangled goal spaces. arXiv [cs.LG].
[29] Islam, Riashat and Zang, Hongyu and Tomar, Manan and Didolkar, Aniket and Islam, Md Mofijul and Arnob, Samin Yeasar and Iqbal, Tariq and Li, Xin and Goyal, Anirudh and Heess, Nicolas and Lamb, Alex. Representation learning in deep RL via discrete information bottleneck. arXiv [cs.LG].
[30] Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annual review of biochemistry. 2004.
[31] Error-correcting codes and information in biology. BioSystems. 2019.
[32] Natural selection finds natural gradient. arXiv preprint arXiv:2001.08028.
[33] Akiba, Takuya and Shing, Makoto and Tang, Yujin and Sun, Qi and Ha, David. Evolutionary Optimization of Model Merging Recipes. arXiv [cs.NE].
[34] Designing neural networks through neuroevolution. Nature Machine Intelligence. 2019.
[35] Kim, Jaekyeom and Kim, Minjung and Woo, Dongyeon and Kim, Gunhee. Drop-bottleneck: Learning discrete compressed representation for noise-robust exploration. arXiv [cs.LG].
[36] Andreas, Jacob. Measuring compositionality in representation learning. Int Conf Learn Represent.
[37] A fast and independent architecture of artificial neural network for permeability prediction. Journal of Petroleum Science and Engineering. 2012.
[38] Optimal modularity and memory capacity of neural reservoirs. Network Neuroscience. 2019.
[39] The modular organization of the cerebral cortex: Evolutionary significance and possible links to neurodevelopmental conditions. Journal of Comparative Neurology. 2019.
[40] The modular and integrative functional architecture of the human brain. Proceedings of the National Academy of Sciences. 2015.
[41] Design and evolution of modular neural network architectures. Neural networks. 1994.
[42] Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks. The Twelfth International Conference on Learning Representations.
[43] Modular and hierarchically modular organization of brain networks. Frontiers in neuroscience. 2010.
[44] Principled Weight Initialization for Hypernetworks. 2023.
[45] Boopathy, Akhilan and Jiang, Sunshine and Yue, William and Hwang, Jaedong and Iyer, Abhiram and Fiete, Ila. Breaking neural network scaling laws with modularity. arXiv [cs.LG].
[46] Dorrell, Will and Hsu, Kyle and Hollingsworth, Luke and Lee, Jin Hwa and Wu, Jiajun and Finn, Chelsea and Latham, Peter E and Behrens, Tim E J and Whittington, James C R. Don't cut corners: Exact conditions for modularity in biologically inspired representations. arXiv [q-bio.NC].
[47] Zhang, Ruiyi and Pitkow, Xaq and Angelaki, Dora E. Inductive biases of neural network modularity in spatial navigation. Sci. Adv.
[48] A critique of pure learning and what artificial neural networks can learn from animal brains. Nature communications. 2019.
[49] Encoding innate ability through a genomic bottleneck. Proceedings of the National Academy of Sciences. 2024.
[50] Ha, David and Dai, Andrew and Le, Quoc V. HyperNetworks. arXiv [cs.LG].
[51] Collinet, Claudio and Lecuit, Thomas. Programmed and self-organized flow of information during morphogenesis. Nat. Rev. Mol. Cell Biol.
[52] Mitchell, Kevin J and Cheney, Nick. The Genomic Code: the genome instantiates a generative model of the organism. Trends Genet.
[53] Compositional pattern producing networks: A novel abstraction of development. Genetic programming and evolvable machines. 2007.
[54] Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning. ALIFE 2024: Proceedings of the 2024 Artificial Life Conference. 2024.
[55] Meta-Learning an Evolvable Developmental Encoding. arXiv preprint arXiv:2406.09020.
[56] Whittington, James C R and Dorrell, William and Behrens, Timothy E J and Ganguli, Surya and El-Gaby, Mohamady. A tale of two algorithms: Structured slots explain prefrontal sequence memory and are unified with hippocampal cognitive maps. Neuron.
[57] Lan, Nur and Geyer, Michal and Chemla, Emmanuel and Katzir, Roni. Minimum Description Length recurrent neural networks. arXiv [cs.CL].
[58] A brief review of hypernetworks in deep learning. Artificial Intelligence Review. 2024.
[59] An enhanced hypercube-based encoding for evolving the placement, density, and connectivity of neurons. Artificial life. 2012.
[60] Saxe, Andrew M and Sodhani, Shagun and Lewallen, Sam. The Neural Race Reduction: Dynamics of Abstraction in Gated Networks. arXiv [cs.LG].
[61] Malach, Eran and Yehudai, Gilad and Shalev-Shwartz, Shai and Shamir, Ohad. Proving the Lottery Ticket Hypothesis: Pruning is All You Need. International Conference on Machine Learning.
[62] Shalev-Shwartz, Shai and Shamir, Ohad and Shammah, Shaked. Failures of gradient-based Deep Learning. arXiv [cs.LG].
[63] Frankle, Jonathan and Carbin, Michael. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv [cs.LG].
[64] Jayakumar, Siddhant M and Menick, Jacob and Czarnecki, Wojciech M and Schwarz, Jonathan and Rae, Jack W and Osindero, Simon and Teh, Y and Harley, Tim and Pascanu, Razvan. Multiplicative interactions and where to find them. Int Conf Learn Represent.
[65] Garcia-Fernàndez, Jordi. The genesis and evolution of homeobox gene clusters. Nat. Rev. Genet.
[66] Stanley, Kenneth O and Clune, Jeff and Lehman, Joel and Miikkulainen, Risto. Designing neural networks through neuroevolution. Nature Machine Intelligence.
[67] Barabási, Dániel L and Beynon, Taliesin and Katona, Ádam and Perez-Nieves, Nicolas. Complex computation from developmental priors. Nat. Commun.
[68] From Nodes to Networks: Evolving Recurrent Neural Networks. 2018.
[69] MacKay, Matthew and Vicol, Paul and Lorraine, Jon and Duvenaud, David and Grosse, Roger. Self-Tuning Networks: Bilevel optimization of hyperparameters using structured best-response functions. arXiv [cs.LG].
[70] Hamidi, Mani and Khajehabdollahi, Sina and Giannakakis, Emmanouil and Schäfer, Tim J and Levina, Anna and Wu, Charley M. Modular growth of hierarchical networks: Efficient, general, and robust curriculum learning. The 2024 Conference on Artificial Life. 2024.
[71] Lorraine, Jonathan and Duvenaud, David. Stochastic Hyperparameter Optimization through Hypernetworks. arXiv [cs.LG].
[72] The evolutionary origins of modularity. Proceedings of the Royal Society b: Biological sciences. 2013.
[73] Tancik, Matthew and Srinivasan, Pratul P and Mildenhall, Ben and Fridovich-Keil, Sara and Raghavan, Nithin and Singhal, Utkarsh and Ramamoorthi, Ravi and Barron, Jonathan T and Ng, Ren. Fourier features let networks learn high frequency functions in low dimensional domains. arXiv [cs.CV].
[74] Model-agnostic meta-learning for fast adaptation of deep networks. International conference on machine learning. 2017.
[75] DARTS: Differentiable Architecture Search. arXiv preprint arXiv:1806.09055.
[76] Meta architecture search. Advances in Neural Information Processing Systems.
[77] Meta-learning of neural architectures for few-shot learning. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
[78] A reservoir of timescales in random neural networks. arXiv:2110.09165 [cond-mat, physics:nlin, q-bio]. 2021.
[79] Training Compute-Optimal Large Language Models. arXiv preprint arXiv:2203.15556.
[80] LoRA: Low-rank adaptation of large language models. ICLR.

Showing first 80 references.


[12] [12]

Artificial Life Conference Proceedings 35 , volume=

Towards self-assembling artificial neural networks through neural developmental programs , author=. Artificial Life Conference Proceedings 35 , volume=. 2023 , organization=

2023

[13] [13]

2023 , month =

Najarro, Elias and Sudhakaran, Shyam and Risi, Sebastian , title = ". 2023 , month =. doi:10.1162/isal_a_00697 , url =

work page doi:10.1162/isal_a_00697 2023

[14] [14]

Recent advances in physical reservoir computing: A review

Tanaka, Gouhei and Yamane, Toshiyuki and Héroux, Jean Benoit and Nakane, Ryosho and Kanazawa, Naoki and Takeda, Seiji and Numata, Hidetoshi and Nakano, Daiju and Hirose, Akira. Recent advances in physical reservoir computing: A review. Neural Netw

[15] [15]

Development , volume=

Understanding axon guidance: are we nearly there yet? , author=. Development , volume=. 2018 , publisher=

2018

[16] [16]

On the existence of information bottlenecks in living and non-living systems

Crosscombe, Michael and Sato, Hiroki. On the existence of information bottlenecks in living and non-living systems. The 2023 Conference on Artificial Life

2023

[17] [17]

The basal ganglia over 500 million years

Grillner, Sten and Robertson, Brita. The basal ganglia over 500 million years. Curr. Biol

[18] [18]

Evolutionary conservation of the basal ganglia as a common vertebrate mechanism for action selection

Stephenson-Jones, Marcus and Samuelsson, Ebba and Ericsson, Jesper and Robertson, Brita and Grillner, Sten. Evolutionary conservation of the basal ganglia as a common vertebrate mechanism for action selection. Curr. Biol

[19] [19]

Resynthesizing behavior through phylogenetic refinement

Cisek, Paul. Resynthesizing behavior through phylogenetic refinement. Atten. Percept. Psychophys

[20] [20]

Meta-learning by the Baldwin effect

Fernando, Chrisantha Thomas and Sygnowski, Jakub and Osindero, Simon and Wang, Jane and Schaul, Tom and Teplyashin, Denis and Sprechmann, Pablo and Pritzel, Alexander and Rusu, Andrei A. Meta-learning by the Baldwin effect. arXiv [cs.NE]

[21] [21]

Novelty and imitation within the brain: a Darwinian neurodynamic approach to combinatorial problems

Czégel, Dániel and Giaffar, Hamza and Csillag, Márton and Futó, Bálint and Szathmáry, Eörs. Novelty and imitation within the brain: a Darwinian neurodynamic approach to combinatorial problems. Sci. Rep

[22] [22]

bioRxiv , year=

A cortical information bottleneck during decision-making , author=. bioRxiv , year=

[23] [23]

Elife , volume=

Circuits for integrating learned and innate valences in the insect brain , author=. Elife , volume=. 2021 , publisher=

2021

[24] [24]

Computer science review , volume=

Reservoir computing approaches to recurrent neural network training , author=. Computer science review , volume=. 2009 , publisher=

2009

[25] [25]

Deep Reservoir Computing

Gallicchio, Claudio and Micheli, Alessio. Deep Reservoir Computing. Reservoir Computing: Theory, Physical Implementations, and Applications. 2021. doi:10.1007/978-981-13-1687-6_4

work page doi:10.1007/978-981-13-1687-6_4 2021

[26] [26]

A role for relaxed selection in the evolution of the language capacity

Deacon, Terrence W. A role for relaxed selection in the evolution of the language capacity. Proc. Natl. Acad. Sci. U. S. A

[27] [27]

Science Advances , volume=

Inductive biases of neural network modularity in spatial navigation , author=. Science Advances , volume=. 2024 , publisher=

2024

[28] [28]

Curiosity driven exploration of learned disentangled goal spaces

Laversanne-Finot, Adrien and Péré, Alexandre and Oudeyer, Pierre-Yves. Curiosity driven exploration of learned disentangled goal spaces. arXiv [cs.LG]

[29] [29]

Representation learning in deep RL via discrete information bottleneck

Islam, Riashat and Zang, Hongyu and Tomar, Manan and Didolkar, Aniket and Islam, Md Mofijul and Arnob, Samin Yeasar and Iqbal, Tariq and Li, Xin and Goyal, Anirudh and Heess, Nicolas and Lamb, Alex. Representation learning in deep RL via discrete information bottleneck. arXiv [cs.LG]

[30] [30]

Annual review of biochemistry , volume=

Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints , author=. Annual review of biochemistry , volume=. 2004 , publisher=

2004

[31] [31]

BioSystems , volume=

Error-correcting codes and information in biology , author=. BioSystems , volume=. 2019 , publisher=

2019

[32] [32]

arXiv preprint arXiv:2001.08028 , year=

Natural selection finds natural gradient , author=. arXiv preprint arXiv:2001.08028 , year=

work page arXiv 2001

[33] [33]

Evolutionary Optimization of Model Merging Recipes

Akiba, Takuya and Shing, Makoto and Tang, Yujin and Sun, Qi and Ha, David. Evolutionary Optimization of Model Merging Recipes. arXiv [cs.NE]

[34] [34]

Nature Machine Intelligence , volume=

Designing neural networks through neuroevolution , author=. Nature Machine Intelligence , volume=. 2019 , publisher=

2019

[35] [35]

Drop-bottleneck: Learning discrete compressed representation for noise-robust exploration

Kim, Jaekyeom and Kim, Minjung and Woo, Dongyeon and Kim, Gunhee. Drop-bottleneck: Learning discrete compressed representation for noise-robust exploration. arXiv [cs.LG]

[36] [36]

Measuring compositionality in representation learning

Andreas, Jacob. Measuring compositionality in representation learning. Int Conf Learn Represent

[37] [37]

Journal of Petroleum Science and Engineering , volume=

A fast and independent architecture of artificial neural network for permeability prediction , author=. Journal of Petroleum Science and Engineering , volume=. 2012 , publisher=

2012

[38] [38]

Network Neuroscience , volume=

Optimal modularity and memory capacity of neural reservoirs , author=. Network Neuroscience , volume=. 2019 , publisher=

2019

[39] [39]

Journal of Comparative Neurology , volume=

The modular organization of the cerebral cortex: Evolutionary significance and possible links to neurodevelopmental conditions , author=. Journal of Comparative Neurology , volume=. 2019 , publisher=

2019

[40] [40]

Proceedings of the National Academy of Sciences , volume=

The modular and integrative functional architecture of the human brain , author=. Proceedings of the National Academy of Sciences , volume=. 2015 , publisher=

2015

[41] [41]

Neural networks , volume=

Design and evolution of modular neural network architectures , author=. Neural networks , volume=. 1994 , publisher=

1994

[42] [42]

The Twelfth International Conference on Learning Representations , year=

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks , author=. The Twelfth International Conference on Learning Representations , year=

[43] [43]

Frontiers in neuroscience , volume=

Modular and hierarchically modular organization of brain networks , author=. Frontiers in neuroscience , volume=. 2010 , publisher=

2010

[44] [44]

2023 , eprint=

Principled Weight Initialization for Hypernetworks , author=. 2023 , eprint=

2023

[45] [45]

Breaking neural network scaling laws with modularity

Boopathy, Akhilan and Jiang, Sunshine and Yue, William and Hwang, Jaedong and Iyer, Abhiram and Fiete, Ila. Breaking neural network scaling laws with modularity. arXiv [cs.LG]

[46] [46]

Don't cut corners: Exact conditions for modularity in biologically inspired representations

Dorrell, Will and Hsu, Kyle and Hollingsworth, Luke and Lee, Jin Hwa and Wu, Jiajun and Finn, Chelsea and Latham, Peter E and Behrens, Tim E J and Whittington, James C R. Don't cut corners: Exact conditions for modularity in biologically inspired representations. arXiv [q-bio.NC]

[47] [47]

Inductive biases of neural network modularity in spatial navigation

Zhang, Ruiyi and Pitkow, Xaq and Angelaki, Dora E. Inductive biases of neural network modularity in spatial navigation. Sci. Adv

[48] [48]

Nature communications , volume=

A critique of pure learning and what artificial neural networks can learn from animal brains , author=. Nature communications , volume=. 2019 , publisher=

2019

[49] [49]

Proceedings of the National Academy of Sciences , volume=

Encoding innate ability through a genomic bottleneck , author=. Proceedings of the National Academy of Sciences , volume=. 2024 , publisher=

2024

[50] [50]

HyperNetworks

Ha, David and Dai, Andrew and Le, Quoc V. HyperNetworks. arXiv [cs.LG]

[51] [51]

Programmed and self-organized flow of information during morphogenesis

Collinet, Claudio and Lecuit, Thomas. Programmed and self-organized flow of information during morphogenesis. Nat. Rev. Mol. Cell Biol

[52] [52]

The Genomic Code: the genome instantiates a generative model of the organism

Mitchell, Kevin J and Cheney, Nick. The Genomic Code: the genome instantiates a generative model of the organism. Trends Genet

[53] [53]

Genetic programming and evolvable machines , volume=

Compositional pattern producing networks: A novel abstraction of development , author=. Genetic programming and evolvable machines , volume=. 2007 , publisher=

2007

[54] [54]

ALIFE 2024: Proceedings of the 2024 Artificial Life Conference , year=

Evolving Self-Assembling Neural Networks: From Spontaneous Activity to Experience-Dependent Learning , author=. ALIFE 2024: Proceedings of the 2024 Artificial Life Conference , year=

2024

[55] [55]

arXiv preprint arXiv:2406.09020 , year=

Meta-Learning an Evolvable Developmental Encoding , author=. arXiv preprint arXiv:2406.09020 , year=

work page arXiv

[56] [56]

A tale of two algorithms: Structured slots explain prefrontal sequence memory and are unified with hippocampal cognitive maps

Whittington, James C R and Dorrell, William and Behrens, Timothy E J and Ganguli, Surya and El-Gaby, Mohamady. A tale of two algorithms: Structured slots explain prefrontal sequence memory and are unified with hippocampal cognitive maps. Neuron

[57] [57]

Minimum Description Length recurrent neural networks

Lan, Nur and Geyer, Michal and Chemla, Emmanuel and Katzir, Roni. Minimum Description Length recurrent neural networks. arXiv [cs.CL]

[58] [58]

Artificial Intelligence Review , volume=

A brief review of hypernetworks in deep learning , author=. Artificial Intelligence Review , volume=. 2024 , publisher=

2024

[59] [59]

Artificial life , volume=

An enhanced hypercube-based encoding for evolving the placement, density, and connectivity of neurons , author=. Artificial life , volume=. 2012 , publisher=

2012

[60] [60]

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

Saxe, Andrew M and Sodhani, Shagun and Lewallen, Sam. The Neural Race Reduction: Dynamics of Abstraction in Gated Networks. arXiv [cs.LG]

[61] [61]

Proving the Lottery Ticket Hypothesis: Pruning is All You Need

Malach, Eran and Yehudai, Gilad and Shalev-Schwartz, Shai and Shamir, Ohad. Proving the Lottery Ticket Hypothesis: Pruning is All You Need. International Conference on Machine Learning

[62] [62]

Failures of gradient-based Deep Learning

Shalev-Shwartz, Shai and Shamir, Ohad and Shammah, Shaked. Failures of gradient-based Deep Learning. arXiv [cs.LG]

[63] [63]

The lottery ticket hypothesis: Finding sparse, trainable neural networks

Frankle, Jonathan and Carbin, Michael. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv [cs.LG]

[64] [64]

Multiplicative interactions and where to find them

Jayakumar, Siddhant M and Menick, Jacob and Czarnecki, Wojciech M and Schwarz, Jonathan and Rae, Jack W and Osindero, Simon and Teh, Y and Harley, Tim and Pascanu, Razvan. Multiplicative interactions and where to find them. Int Conf Learn Represent

[65] [65]

The genesis and evolution of homeobox gene clusters

Garcia-Fernàndez, Jordi. The genesis and evolution of homeobox gene clusters. Nat. Rev. Genet

[66] [66]

Designing neural networks through neuroevolution

Stanley, Kenneth O and Clune, Jeff and Lehman, Joel and Miikkulainen, Risto. Designing neural networks through neuroevolution. Nature Machine Intelligence

[67] [67]

Complex computation from developmental priors

Barabási, Dániel L and Beynon, Taliesin and Katona, \'A dam and Perez-Nieves, Nicolas. Complex computation from developmental priors. Nat. Commun

[68] [68]

2018 , eprint=

From Nodes to Networks: Evolving Recurrent Neural Networks , author=. 2018 , eprint=

2018

[69] [69]

Self-Tuning Networks: Bilevel optimization of hyperparameters using structured best-response functions

MacKay, Matthew and Vicol, Paul and Lorraine, Jon and Duvenaud, David and Grosse, Roger. Self-Tuning Networks: Bilevel optimization of hyperparameters using structured best-response functions. arXiv [cs.LG]

[70] [70]

Modular growth of hierarchical networks: Efficient, general, and robust curriculum learning

Hamidi, Mani and Khajehabdollahi, Sina and Giannakakis, Emmanouil and Schäfer, Tim J and Levina, Anna and Wu, Charley M. Modular growth of hierarchical networks: Efficient, general, and robust curriculum learning. The 2024 Conference on Artificial Life

2024

[71] [71]

Stochastic Hyperparameter Optimization through Hypernetworks

Lorraine, Jonathan and Duvenaud, David. Stochastic Hyperparameter Optimization through Hypernetworks. arXiv [cs.LG]

[72] [72]

Proceedings of the Royal Society b: Biological sciences , volume=

The evolutionary origins of modularity , author=. Proceedings of the Royal Society b: Biological sciences , volume=. 2013 , publisher=

2013

[73] [73]

Fourier features let networks learn high frequency functions in low dimensional domains

Tancik, Matthew and Srinivasan, Pratul P and Mildenhall, Ben and Fridovich-Keil, Sara and Raghavan, Nithin and Singhal, Utkarsh and Ramamoorthi, Ravi and Barron, Jonathan T and Ng, Ren. Fourier features let networks learn high frequency functions in low dimensional domains. arXiv [cs.CV]

[74] [74]

International conference on machine learning , pages=

Model-agnostic meta-learning for fast adaptation of deep networks , author=. International conference on machine learning , pages=. 2017 , organization=

2017

[75] [75]

DARTS: Differentiable Architecture Search

Darts: Differentiable architecture search , author=. arXiv preprint arXiv:1806.09055 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[76] [76]

Advances in Neural Information Processing Systems , volume=

Meta architecture search , author=. Advances in Neural Information Processing Systems , volume=

[77] [77]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Meta-learning of neural architectures for few-shot learning , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

[78] [78]

arXiv:2110.09165 [cond-mat, physics:nlin, q-bio] , author =

A reservoir of timescales in random neural networks , url =. arXiv:2110.09165 [cond-mat, physics:nlin, q-bio] , author =. 2021 , note =

work page arXiv 2021

[79] [79]

Training Compute-Optimal Large Language Models

Training compute-optimal large language models , author=. arXiv preprint arXiv:2203.15556 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[80] [80]

, author=

Lora: Low-rank adaptation of large language models. , author=. ICLR , volume=