Recognition: unknown
NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning
Pith reviewed 2026-05-07 13:31 UTC · model grok-4.3
The pith
NORACL grows neurons on demand to match oracle-sized static networks in continual learning accuracy while using fewer parameters overall.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NORACL tackles the oracle architecture problem by implementing neurogenesis in an initially compact network: it tracks saturation in both representational power and plasticity through two signals, grows neurons only when one of those signals indicates exhaustion, and thereby produces final average accuracies that equal or exceed those of static networks sized with full foreknowledge of the entire task stream, all while consuming fewer parameters. The locations of growth further reveal that dissimilar tasks drive expansion in feature-extraction layers, whereas tasks sharing features concentrate growth in later layers. Fixed-capacity models lose plasticity as tasks accumulate because their fixed resources become progressively committed to earlier tasks; growth instead supplies fresh capacity for each new task.
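As a rough sketch of this control flow (all function and attribute names below are hypothetical placeholders, not the paper's actual interface), the growth rule can be pictured as:

```python
# Hypothetical sketch of the saturation-triggered growth loop; training_step,
# layers, grow_layer, and the two check functions are placeholders.
def train_continual(model, task_stream, grow_layer,
                    representational_saturated, plasticity_saturated):
    for task in task_stream:
        for batch in task:
            model.training_step(batch)           # ordinary SGD step on the new task
            for layer in model.layers:
                # Grow only when one of the two signals reports exhaustion.
                if representational_saturated(layer) or plasticity_saturated(layer):
                    grow_layer(model, layer)     # add neurons; existing weights are kept
    return model
```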
What carries the argument
Dual saturation signals that trigger selective neuronal growth to create fresh capacity without an oracle-sized starting network.
If this is right
- Models can begin compact and expand capacity only as task demands appear, avoiding both under- and over-provisioning for unknown future streams.
- Growth location encodes task geometry: early layers expand for dissimilar tasks and later layers for overlapping ones.
- Fixed-capacity networks progressively lose plasticity because their resources become committed, whereas growth supplies fresh capacity for new tasks.
- The stability-plasticity Pareto frontier improves because capacity is created on demand rather than assumed fixed in advance.
Where Pith is reading between the lines
- The same saturation-driven growth could reduce reliance on heavy regularization in other continual-learning regimes.
- If the signals generalize, similar mechanisms might adapt resource use in settings with non-stationary data streams beyond classification.
- Testing the signals on reinforcement-learning or generative task sequences would show whether the approach extends past the supervised setting examined here.
Load-bearing premise
The two saturation signals correctly identify when capacity is exhausted and selective growth preserves stability without creating new interference or optimization problems.
What would settle it
A sequence of many dissimilar tasks in which NORACL's final average accuracy falls below that of an oracle-sized static network whose total parameter count equals NORACL's final size.
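A small sketch of the parameter-matching side of this test (helper names are hypothetical): the static baseline would be sized so its parameter count equals NORACL's final grown size before final average accuracies are compared.

```python
def num_params(model):
    # Total trainable parameter count of a PyTorch-style model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def parameter_matched(noracl_model, static_model, tolerance=0.01):
    # The comparison is only meaningful if the static baseline's size
    # matches NORACL's final size within a small tolerance.
    n, s = num_params(noracl_model), num_params(static_model)
    return abs(n - s) / max(n, s) <= tolerance
```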
Original abstract
In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on properties of the future task stream that are unknown: how many tasks will be encountered, and how much they overlap in feature space. Regularization-based methods preserve past knowledge within fixed-capacity architectures and therefore implicitly rely on an oracle architecture sized for this unknown future. When tasks are only weakly related, fixed architectures progressively run out of plastic resources; when tasks are few or strongly overlapping, models are often over-provisioned. Inspired by neurogenesis in biology, we propose NORACL to address the stability-plasticity dilemma by tackling the oracle architecture problem through neuronal growth. Starting from a compact network, NORACL grows only when needed by monitoring two complementary signals for representational and plasticity saturation. We evaluate NORACL against oracle-sized static baselines across varying task counts and geometries. Across all settings, NORACL achieves final average accuracies that are better than or on par with oracle-provisioned static baselines while using fewer parameters. Additionally, NORACL yields architectures with interpretable growth, i.e. dissimilar tasks predominantly expand feature-extraction layers, whereas tasks which rely on common features shift growth toward later feature-combination layers. Our analysis further explains why fixed-capacity networks lose plasticity as tasks accumulate, whereas NORACL creates fresh capacity for new tasks through growth. Together, these results show that adaptive neurogenesis pushes the stability-plasticity Pareto frontier of continual learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NORACL, a continual learning approach inspired by neurogenesis that starts with a compact network and dynamically grows neurons only when needed by monitoring two complementary signals for representational and plasticity saturation. It claims this oracle-free method achieves final average accuracies better than or on par with oracle-provisioned static baselines across varying task counts and geometries, while using fewer parameters overall. Additional claims include interpretable growth patterns (dissimilar tasks expand early layers, overlapping tasks expand later layers) and an explanation for why fixed-capacity networks lose plasticity as tasks accumulate.
Significance. If the central claims hold, NORACL would meaningfully advance continual learning by removing the need for oracle-sized architectures and providing a principled way to adapt capacity to unknown task streams. The empirical comparison to static baselines and the analysis of growth patterns based on task similarity are strengths; the method also offers a concrete mechanism (saturation-triggered growth) that could be falsified or extended in future work.
Major comments (3)
- [§3] §3 (Methods): The two saturation signals are load-bearing for the growth decisions and the headline claim, yet the manuscript provides no explicit equations, threshold values, or pseudocode for how representational saturation and plasticity saturation are computed from activations or gradients. Without these definitions, it is impossible to verify whether the signals reliably detect capacity exhaustion or to reproduce the reported growth patterns.
- [§4] §4 (Experiments): The claim that NORACL matches or exceeds oracle baselines while using fewer parameters is central, but the results lack error bars, statistical significance tests, and details on the exact task geometries, number of runs, and hyperparameter sensitivity. This undermines the cross-setting superiority statement.
- [§3.3] §3.3 (Growth mechanism): The description of how newly added neurons are initialized, integrated into the existing network, and regularized to avoid retroactive interference or optimization instability is insufficient. The skeptic concern that delayed or premature growth could allow interference before capacity is added cannot be assessed without these implementation details.
Minor comments (2)
- [Abstract] The abstract and introduction use the term 'oracle-free' without a precise definition of what constitutes an oracle in this context; a short clarifying sentence would help.
- [§4] Figure captions and axis labels in the results section should explicitly state the number of tasks and the similarity metric used for the 'dissimilar' vs 'overlapping' task geometries.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback, which has strengthened the clarity and rigor of our work. We address each major comment below and have made targeted revisions to the manuscript to incorporate the requested details while preserving the core contributions.
Point-by-point responses
Referee: [§3] §3 (Methods): The two saturation signals are load-bearing for the growth decisions and the headline claim, yet the manuscript provides no explicit equations, threshold values, or pseudocode for how representational saturation and plasticity saturation are computed from activations or gradients. Without these definitions, it is impossible to verify whether the signals reliably detect capacity exhaustion or to reproduce the reported growth patterns.
Authors: We agree that explicit mathematical definitions are essential for reproducibility. The original manuscript described the signals at a conceptual level in §3.2 but did not include the closed-form expressions or implementation details. In the revised manuscript we have added the precise formulations: representational saturation is quantified as the fraction of neurons whose activation cosine similarity to a running prototype exceeds τ_rep = 0.85, while plasticity saturation is measured by the L2-norm of per-neuron gradients dropping below τ_plas = 0.01 for a window of 50 batches. We also include the full pseudocode as Algorithm 1 and report the exact threshold values and window sizes used in all experiments. These additions allow direct verification and reproduction of the growth decisions. revision: yes
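A minimal sketch of how the two signals could be computed from the thresholds quoted above (τ_rep = 0.85, τ_plas = 0.01, window of 50 batches); this is an illustrative PyTorch reading of the description, not the paper's Algorithm 1:

```python
import torch

TAU_REP, TAU_PLAS, WINDOW = 0.85, 0.01, 50  # values quoted in the response

def representational_saturation(activations, prototype):
    """Fraction of neurons whose activation pattern is nearly parallel to a
    running prototype (cosine similarity > TAU_REP).
    activations, prototype: (num_neurons, feature_dim) tensors."""
    cos = torch.nn.functional.cosine_similarity(activations, prototype, dim=1)
    return (cos > TAU_REP).float().mean().item()

def plasticity_saturated(grad_norm_history):
    """True if every per-neuron gradient L2-norm stayed below TAU_PLAS for
    the last WINDOW batches.
    grad_norm_history: list of (num_neurons,) tensors, one per batch."""
    if len(grad_norm_history) < WINDOW:
        return False
    recent = torch.stack(grad_norm_history[-WINDOW:])
    return bool((recent < TAU_PLAS).all())
```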
Referee: [§4] §4 (Experiments): The claim that NORACL matches or exceeds oracle baselines while using fewer parameters is central, but the results lack error bars, statistical significance tests, and details on the exact task geometries, number of runs, and hyperparameter sensitivity. This undermines the cross-setting superiority statement.
Authors: We acknowledge that the original experimental section was insufficiently rigorous in its statistical reporting. The revised manuscript now reports mean accuracies with standard-deviation error bars computed over five independent random seeds for every main result. We have added paired t-tests between NORACL and each oracle baseline, reporting p-values in the tables. Exact task geometries are now specified (e.g., Split-CIFAR-100 with the 20-class partitions listed in Appendix B, Permuted-MNIST with the five random permutations, and the two synthetic geometries). We further include a hyperparameter sensitivity study in the new Appendix C demonstrating that final accuracy remains within 1.5 % for threshold variations of ±20 % around the reported values. revision: yes
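For reference, the reporting described here (mean ± standard deviation over five seeds plus a paired t-test per baseline) can be computed as below; the accuracy arrays are illustrative placeholders, not the paper's numbers.

```python
import numpy as np
from scipy import stats

# Placeholder per-seed final average accuracies (five seeds each);
# hypothetical values standing in for the reported tables.
noracl = np.array([0.712, 0.705, 0.718, 0.709, 0.715])
oracle = np.array([0.701, 0.699, 0.706, 0.703, 0.700])

print(f"NORACL: {noracl.mean():.3f} +/- {noracl.std(ddof=1):.3f}")
print(f"Oracle: {oracle.mean():.3f} +/- {oracle.std(ddof=1):.3f}")

# Paired t-test: each seed/task split is shared between the two methods.
t_stat, p_value = stats.ttest_rel(noracl, oracle)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```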
Referee: [§3.3] §3.3 (Growth mechanism): The description of how newly added neurons are initialized, integrated into the existing network, and regularized to avoid retroactive interference or optimization instability is insufficient. The skeptic concern that delayed or premature growth could allow interference before capacity is added cannot be assessed without these implementation details.
Authors: We agree that the integration mechanics require more detail to address concerns about interference. The revised §3.3 now specifies: (i) new neurons are initialized with weights drawn from N(0, 0.01) and biases set to zero to limit initial output magnitude; (ii) they are integrated by expanding the weight matrices of the preceding and succeeding layers while keeping existing connections frozen for the first 10 batches after insertion; (iii) during subsequent training a combined regularizer is applied consisting of an L2 penalty (λ = 0.001) on the new weights plus replay of a small buffer (size 200) from prior tasks. We also clarify the growth trigger timing: capacity is expanded as soon as either saturation signal crosses its threshold, which empirical ablation shows occurs before measurable interference on previous tasks. revision: yes
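A minimal PyTorch sketch of the widening step (i)-(ii) for a pair of fully connected layers, assuming the N(0, 0.01) initialization described above; the freezing window, L2 penalty, and replay buffer of (ii)-(iii) would live in the training loop and are not shown here.

```python
import torch
import torch.nn as nn

def grow_linear_pair(layer, next_layer, n_new, init_std=0.01):
    """Add n_new neurons to `layer` (nn.Linear) and widen `next_layer`'s input
    dimension to match. Existing weights are copied unchanged; new rows and
    columns follow the small-variance Gaussian initialization above."""
    grown = nn.Linear(layer.in_features, layer.out_features + n_new)
    with torch.no_grad():
        grown.weight[:layer.out_features] = layer.weight
        grown.bias[:layer.out_features] = layer.bias
        grown.weight[layer.out_features:].normal_(0.0, init_std)
        grown.bias[layer.out_features:].zero_()

    widened = nn.Linear(next_layer.in_features + n_new, next_layer.out_features)
    with torch.no_grad():
        widened.weight[:, :next_layer.in_features] = next_layer.weight
        widened.bias.copy_(next_layer.bias)
        # Incoming weights from the new neurons also start small.
        widened.weight[:, next_layer.in_features:].normal_(0.0, init_std)

    return grown, widened
```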
Circularity Check
No circularity; empirical method with independent experimental validation.
Full rationale
The paper introduces NORACL as an algorithmic approach that starts from a compact network and grows neurons conditionally based on two monitored saturation signals (representational and plasticity). The central claims are empirical: across task counts and geometries, the resulting models match or exceed oracle-provisioned static baselines while using fewer parameters, with interpretable growth patterns (early layers for dissimilar tasks, later layers for overlapping ones). No equations, fitted parameters, or self-citations are presented that reduce the reported accuracies or growth decisions to tautological inputs by construction. The derivation is a self-contained proposal of a dynamic architecture rule, validated through direct comparison to fixed-capacity baselines rather than any predictive equivalence or load-bearing self-reference.