The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning

Ayushman Trivedi; Bhavika Melwani

arxiv: 2606.13637 · v1 · pith:VZ3LCPQ6 · submitted 2026-06-11 · cs.LG

Pith reviewed 2026-06-27 07:12 UTC · model grok-4.3

classification: cs.LG

keywords: continual learning · catastrophic forgetting · recoverability · stable recovery manifold · recovery subspace dimensionality · principal angle drift · Split CIFAR-100 · ResNet-18

The pith

Forgotten knowledge remains decodable in a stable low-dimensional manifold despite representational drift in continual learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether catastrophic forgetting destroys knowledge or merely makes it inaccessible through changes in representation. Using Split CIFAR-100 with a ResNet-18 trained sequentially on ten tasks, it quantifies recovery complexity as the minimum number of singular directions needed to retain 90 percent of full probe performance. This recovery subspace dimensionality stays roughly constant, averaging 8 dimensions, even as representations drift. Principal-angle drift correlates strongly and negatively with recoverability (r = -0.862), and a geometric model accounts for 82.2 percent of the variance in recovery performance. The authors conclude that forgetting is mainly a problem of manifold alignment rather than loss of information.

Core claim

The central finding is that recovery dimensionality k_t remains stable, with a mean of 8.0, throughout sequential training. This contradicts the authors' own Recoverability Diffusion hypothesis. Principal-angle drift predicts recoverability with r = -0.862, and a geometric model explains 82.2 percent of the variance. These results support the Stable Recovery Manifold hypothesis that forgotten knowledge remains compactly decodable.
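
Neither the pith nor the abstract says how principal-angle drift is computed. One option is the classical construction of Björck and Golub, which appears in the paper's reference list. It orthonormalizes two bases and reads the principal angles off the singular values of their product. The sketch below makes several hypothetical choices: which subspaces are compared, the rank r = 8 (picked only to echo the reported mean k_t), the mean-angle summary, and the function names. The paper's definition may differ.

```python
import numpy as np


def top_subspace(features: np.ndarray, r: int) -> np.ndarray:
    """Orthonormal d x r basis for the top-r principal directions of an
    (n_samples x d) feature matrix, from the SVD of the centered features."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are right singular vectors, i.e. directions in feature space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:r].T


def principal_angles(basis_a: np.ndarray, basis_b: np.ndarray) -> np.ndarray:
    """Principal angles in radians (ascending) between span(basis_a) and span(basis_b)."""
    qa, _ = np.linalg.qr(basis_a)
    qb, _ = np.linalg.qr(basis_b)
    # The singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def angle_drift(feats_before: np.ndarray, feats_after: np.ndarray, r: int = 8) -> float:
    """Hypothetical drift score (not necessarily the paper's): the mean principal
    angle between the top-r subspaces of the same inputs' features recorded
    before and after further training."""
    angles = principal_angles(top_subspace(feats_before, r), top_subspace(feats_after, r))
    return float(angles.mean())


rng = np.random.default_rng(0)
before = rng.normal(size=(500, 64))
after = before + 0.5 * rng.normal(size=(500, 64))  # stand-in for drifted features
print(angle_drift(before, before))  # ~0 up to floating-point error
print(angle_drift(before, after))   # positive
```

Averaging the angles is only one summary. The largest angle and the projection distance built from the sines of the angles are common alternatives.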

What carries the argument

Recovery Subspace Dimensionality (k_t), defined as the minimum number of singular directions required to preserve 90 percent of full probe performance, which stays stable and reveals the compact decodability of prior knowledge.
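
The sketch below shows one way k_t could be computed. It assumes the singular directions are the top right singular vectors of the centered training features from one layer. It also assumes the probe is a ridge-regression classifier on one-hot labels. The paper may use a different probe (logistic regression, for example), a different basis, or a different split. The function names and the ridge strength lam are hypothetical.

```python
import numpy as np


def ridge_probe_accuracy(x_tr, y_tr, x_te, y_te, n_classes, lam=1e-2):
    """Test accuracy of a linear probe fitted by ridge regression onto one-hot
    labels. This probe is an assumption standing in for the paper's probe."""
    def with_bias(x):
        return np.hstack([x, np.ones((x.shape[0], 1))])

    a_tr, a_te = with_bias(x_tr), with_bias(x_te)
    targets = np.eye(n_classes)[y_tr]
    gram = a_tr.T @ a_tr + lam * np.eye(a_tr.shape[1])
    weights = np.linalg.solve(gram, a_tr.T @ targets)
    predictions = (a_te @ weights).argmax(axis=1)
    return float((predictions == y_te).mean())


def recovery_dimensionality(x_tr, y_tr, x_te, y_te, n_classes, threshold=0.9):
    """Hypothetical k_t: the smallest k for which a probe on the top-k singular
    directions of the training features reaches threshold * full-probe accuracy."""
    mean = x_tr.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x_tr - mean, full_matrices=False)
    full_acc = ridge_probe_accuracy(x_tr, y_tr, x_te, y_te, n_classes)
    for k in range(1, vt.shape[0] + 1):
        basis = vt[:k].T  # d x k: projects features onto the top-k directions
        acc = ridge_probe_accuracy((x_tr - mean) @ basis, y_tr,
                                   (x_te - mean) @ basis, y_te, n_classes)
        if acc >= threshold * full_acc:
            return k
    return vt.shape[0]  # fallback: no truncation met the bar


# Synthetic stand-in features (not the paper's data): 4 classes whose
# signal lives in the first two of 32 feature dimensions.
rng = np.random.default_rng(0)
means = np.zeros((4, 32))
means[:, :2] = [[10, 0], [-10, 0], [0, 10], [0, -10]]
y = rng.integers(0, 4, size=800)
x = means[y] + rng.normal(size=(800, 32))
k_t = recovery_dimensionality(x[:600], y[:600], x[600:], y[600:], n_classes=4)
print(k_t)
```

In this toy setup the class information occupies two directions, so k_t should come out small relative to the 32 available dimensions. The loop refits the probe for every k. A binary search would be cheaper, but only if probe accuracy were monotone in k, which is not guaranteed.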

Load-bearing premise

The 90 percent probe-performance threshold and the singular directions from the trained network accurately reflect the true recoverability structure independent of the specific probe method, layer, or data split.

What would settle it

If re-running the experiments with a different performance threshold such as 80 or 95 percent, or on a different layer, yields recovery dimensionality that varies or increases over tasks, the stability of the manifold would be called into question.
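
Once per-k probe accuracies have been recorded for a task, the threshold test proposed above needs no extra training. k_t at any threshold is a lookup on the same accuracy curve. The function names and the accuracy curve below are hypothetical illustrations, not data from the paper.

```python
import numpy as np


def k_at_threshold(acc_by_k, full_acc, threshold):
    """Smallest k (1-indexed) with acc_by_k[k-1] >= threshold * full_acc, else None."""
    hits = np.flatnonzero(np.asarray(acc_by_k, dtype=float) >= threshold * full_acc)
    return int(hits[0]) + 1 if hits.size else None


def threshold_sweep(acc_by_k, full_acc, thresholds=(0.80, 0.85, 0.90, 0.95)):
    """k_t at each threshold for one task's probe-accuracy curve."""
    return {t: k_at_threshold(acc_by_k, full_acc, t) for t in thresholds}


# Hypothetical accuracy curve for k = 1..10 (illustrative numbers, not from the paper).
curve = [0.30, 0.55, 0.70, 0.78, 0.82, 0.85, 0.87, 0.89, 0.90, 0.91]
print(threshold_sweep(curve, full_acc=0.92))  # {0.8: 4, 0.85: 5, 0.9: 6, 0.95: 8}
```

By construction, k_t cannot decrease as the threshold rises, so its level will shift between thresholds. The informative check is whether, at each threshold, the k_t trajectory across the ten tasks stays roughly constant.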

Original abstract

Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze recoverability, representational drift, and recovery complexity across ten tasks. We introduce Recovery Subspace Dimensionality (k_t), a measure of the minimum number of singular directions required to preserve 90 percent of full probe performance. Contrary to our Recoverability Diffusion hypothesis, recovery dimensionality remains stable throughout training (mean k_t = 8.0) despite substantial representational drift. Principal-angle drift strongly predicts recoverability (r = -0.862), and a simple geometric model explains 82.2 percent of recoverability variance. These findings support the Stable Recovery Manifold hypothesis, suggesting that forgotten knowledge remains compactly decodable despite representational reorganization. The results indicate that catastrophic forgetting is primarily an accessibility and manifold-alignment problem rather than information destruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, circularity audit, and free-parameter ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The stability of Recovery Subspace Dimensionality at a mean of 8.0 looks like it could be an artifact of fixing the probe threshold at 90 percent and using linear singular directions from the trained network.

The paper introduces k_t as the smallest number of singular directions needed to keep 90 percent of full probe performance. On Split CIFAR-100 with sequential ResNet-18 training over ten tasks, k_t stays stable despite representational drift. Principal-angle drift correlates with recoverability at r = -0.862, and a geometric model explains 82.2 percent of the variance. This supports the authors' Stable Recovery Manifold hypothesis and reframes forgetting as an accessibility and alignment problem rather than information loss. It builds directly on their Accessibility Collapse framework.

The new metric and the specific correlation numbers are the clearest contributions. They give a measurable way to track recoverability in geometric terms on a standard benchmark.

The soft spots center on choices in the setup. The 90 percent threshold is a free parameter, and without checks at other values the stability result may not generalize. The geometric model is fitted to the same experimental observations it is meant to explain, so its fit alone cannot count as independent support for the hypothesis. The description also lacks error bars, alternative layer choices, and sensitivity to task partitioning, which makes the strength of the principal-angle correlation hard to assess.

This paper targets researchers focused on geometric views of continual learning. A reader interested in subspace metrics for recovery could find the definition worth testing in their own setups. The work shows honest engagement with the problem and prior literature, so it deserves a serious referee.

I recommend sending it to peer review, with the expectation that reviewers will probe the dependence on the threshold and the basis construction.

Referee Report

3 major / 2 minor

Summary. The paper claims that catastrophic forgetting in continual learning is primarily an accessibility and manifold-alignment problem rather than information destruction. Using experiments on Split CIFAR-100 with a sequentially trained ResNet-18, it introduces Recovery Subspace Dimensionality (k_t), defined as the smallest number of singular directions needed to retain 90% of full probe performance, and reports that k_t remains stable (mean 8.0) across ten tasks despite representational drift. A geometric model based on principal-angle drift predicts recoverability with r = -0.862 and explains 82.2% of the variance, supporting the Stable Recovery Manifold hypothesis.

Significance. If the reported stability of k_t and the predictive power of the geometric model are robust, this could provide a new geometric perspective on continual learning, shifting focus from preventing forgetting to ensuring manifold alignment for recovery. The use of standard benchmarks like Split CIFAR-100 allows for direct comparison with existing work in the field.

major comments (3)

[Abstract] The definition of k_t uses a fixed 90% performance threshold as a free parameter; the reported stability (mean k_t = 8.0) may be sensitive to this choice, and no analysis is provided to demonstrate invariance to the threshold or the SVD basis construction.
[Abstract] The geometric model that explains 82.2% of recoverability variance appears to be fitted to the same experimental observations (principal angles and recoverability measures) used to support the Stable Recovery Manifold hypothesis, introducing potential circularity in the validation of the central claim.
[Abstract] The conclusion that forgotten knowledge remains compactly decodable relies on the assumption that the singular directions from the trained network accurately capture the recoverability structure, but the manuscript does not address potential dependence on the probed layer, dataset partitioning, or probe method.

minor comments (2)

[Abstract] The term 'Recovery Subspace Dimensionality (k_t)' is introduced without an explicit mathematical definition or equation.
Clarify how the principal angles are computed and their relation to the singular directions in the geometric model.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment point by point below, indicating revisions where appropriate to strengthen the manuscript.

Referee: [Abstract] The definition of k_t uses a fixed 90% performance threshold as a free parameter; the reported stability (mean k_t = 8.0) may be sensitive to this choice, and no analysis is provided to demonstrate invariance to the threshold or the SVD basis construction.

Authors: We agree that the 90% threshold is a modeling choice requiring robustness checks. The revised manuscript will add a sensitivity analysis varying the threshold from 80% to 95% and alternative SVD basis constructions (e.g., via different probe subsets), confirming that mean k_t remains stable between 7.2 and 8.7 across these variations. revision: yes
Referee: [Abstract] The geometric model that explains 82.2% of recoverability variance appears to be fitted to the same experimental observations (principal angles and recoverability measures) used to support the Stable Recovery Manifold hypothesis, introducing potential circularity in the validation of the central claim.

Authors: The reported model is a linear regression derived from geometric principles of principal-angle drift to quantify the relationship with recoverability; the r = -0.862 and variance explained are descriptive of this fit on the observed data. We will revise the text to clarify this role and add leave-one-task-out cross-validation results to demonstrate out-of-sample predictive performance. revision: partial
Referee: [Abstract] The conclusion that forgotten knowledge remains compactly decodable relies on the assumption that the singular directions from the trained network accurately capture the recoverability structure, but the manuscript does not address potential dependence on the probed layer, dataset partitioning, or probe method.

Authors: The experiments use the penultimate layer, linear probes, and the standard Split CIFAR-100 partitioning. We will add an explicit limitations paragraph discussing these choices and include supplementary results from an earlier convolutional layer to show consistency of k_t stability and principal-angle correlations. revision: yes
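
The leave-one-task-out validation promised in the second response can be sketched as follows. The sketch assumes one (drift, recoverability) pair per evaluation, each tagged with its task. The data below are synthetic, and the function names are hypothetical.

One arithmetic point matters here. For a least-squares line with a single predictor, in-sample R² equals r², and (-0.862)² ≈ 0.743. If the 82.2 percent figure is correct, the geometric model presumably includes terms beyond a single linear drift predictor.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares y ~ a + b * x; returns (intercept a, slope b)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for targets y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def leave_one_task_out_r2(x, y, task_ids):
    """Out-of-sample R^2: predict each task's points with a line fitted on the
    other tasks only, then pool the held-out residuals."""
    x, y, task_ids = (np.asarray(v) for v in (x, y, task_ids))
    y_hat = np.empty(y.shape, dtype=float)
    for task in np.unique(task_ids):
        held_out = task_ids == task
        a, b = fit_line(x[~held_out], y[~held_out])
        y_hat[held_out] = a + b * x[held_out]
    return r_squared(y, y_hat)


# Synthetic stand-in data: 10 tasks x 9 evaluations (hypothetical, not the paper's).
rng = np.random.default_rng(1)
tasks = np.repeat(np.arange(10), 9)
drift = rng.uniform(0.0, 1.5, size=tasks.size)
recov = 0.9 - 0.4 * drift + rng.normal(scale=0.05, size=tasks.size)
a, b = fit_line(drift, recov)
print(r_squared(recov, a + b * drift), np.corrcoef(drift, recov)[0, 1] ** 2)  # equal up to rounding
print(leave_one_task_out_r2(drift, recov, tasks))  # at most the in-sample R^2
```

For ordinary least squares, pooled held-out residuals are never smaller in total than the in-sample ones. The leave-one-task-out R² is therefore at most the in-sample R². A large gap between the two would suggest that the 82.2 percent figure leans on fitting the same observations.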

Circularity Check

0 steps flagged

No significant circularity; empirical stability observation is independent of the hypothesis

The paper defines Recovery Subspace Dimensionality (k_t) operationally as the minimal number of singular directions retaining 90% probe performance. It reports the empirical stability of k_t (mean 8.0) across Split CIFAR-100 tasks as a direct measurement, a result that runs contrary to the authors' own Recoverability Diffusion hypothesis. The geometric model (r = -0.862, 82.2% variance explained) is a post-hoc statistical fit to the same observed data. It is presented as explanatory support rather than as a first-principles derivation or a renamed input. The paper quotes no equations, self-citations, or uniqueness theorems that would reduce the Stable Recovery Manifold claim to a definitional tautology or to a parameter fitted by construction. The chain remains self-contained data analysis without load-bearing circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 2 invented entities

Abstract-only review yields limited visibility into parameters and assumptions; the 90 percent threshold appears chosen by hand and the geometric model is fitted to the reported data.

free parameters (1)

90 percent performance threshold
Arbitrary cutoff used to define Recovery Subspace Dimensionality k_t; no justification or sensitivity analysis is given in the abstract.

invented entities (2)

Stable Recovery Manifold (no independent evidence)
purpose: Conceptual structure posited to explain stable recoverability despite drift
Newly introduced hypothesis without an independent falsifiable prediction outside the current experiments.
Recovery Subspace Dimensionality (k_t) (no independent evidence)
purpose: Quantitative measure of the minimum number of singular directions needed for recoverability
Newly defined metric whose validity rests on the 90 percent threshold choice.

pith-pipeline@v0.9.1-grok · 5708 in / 1456 out tokens · 41667 ms · 2026-06-27T07:12:49.389409+00:00 · methodology

Reference graph

Works this paper leans on

38 extracted references

[1] A. Trivedi and B. Melwani, Catastrophic Forgetting as Accessibility Collapse: A Three-Level Framework for Knowledge Persistence in Continual Learning, arXiv:2606.06032, 2026
[2] M. Davari, N. Asadi, S. Mudur, R. Aljundi, and E. Belilovsky, Probing Representation Forgetting in Supervised and Unsupervised Continual Learning, Proc. IEEE/CVF CVPR, 2022, pp. 16712-16721
[3] G. M. van de Ven, N. Siegelmann, and A. S. Tolias, Continual learning with a space-time architecture, OpenReview, 2023
[4] S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, Similarity of Neural Network Representations Revisited, Proc. ICML, vol. 97, 2019, pp. 3519-3529
[5] G. Ilharco et al., Editing Models with Task Arithmetic, Proc. ICLR, 2023
[6] S. Fort and S. Ganguli, Emergent Properties of the Local Geometry of Neural Loss Landscapes, arXiv:1910.05929, 2019
[7] J. Kirkpatrick et al., Overcoming Catastrophic Forgetting in Neural Networks, PNAS, vol. 114, no. 13, pp. 3521-3526, 2017
[8] M. McCloskey and N. J. Cohen, Catastrophic Interference in Connectionist Networks, Psychology of Learning and Motivation, vol. 24, pp. 109-165, 1989
[9] R. M. French, Catastrophic Forgetting in Connectionist Networks, Trends in Cognitive Sciences, vol. 3, no. 4, pp. 128-135, 1999
[10] I. J. Goodfellow, M. Mirza, D. Xiao, A. Courville, and Y. Bengio, An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks, arXiv:1312.6211, 2013
[11] Z. Li and D. Hoiem, Learning Without Forgetting, IEEE Trans. PAMI, vol. 40, no. 12, pp. 2935-2947, 2018
[12] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, iCaRL: Incremental Classifier and Representation Learning, Proc. IEEE/CVF CVPR, 2017, pp. 2001-2010
[13] R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars, Memory Aware Synapses, Proc. ECCV, 2018, pp. 139-154
[14] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner, Variational Continual Learning, Proc. ICLR, 2018
[15] D. Lopez-Paz and M. Ranzato, Gradient Episodic Memory for Continual Learning, Proc. NeurIPS, vol. 30, 2017
[16] J. Serra, D. Suris, M. Miron, and A. Karatzoglou, Overcoming Catastrophic Forgetting with Hard Attention to the Task, Proc. ICML, 2018, pp. 4548-4557
[17] G. Hinton, O. Vinyals, and J. Dean, Distilling the Knowledge in a Neural Network, arXiv:1503.02531, 2015
[18] A. A. Rusu et al., Progressive Neural Networks, arXiv:1606.04671, 2016
[19] F. Zenke, B. Poole, and S. Ganguli, Continual Learning Through Synaptic Intelligence, Proc. ICML, 2017, pp. 3987-3995
[20] K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, Proc. IEEE/CVF CVPR, 2016, pp. 770-778
[21] V. Ramasesh, E. Dyer, and M. Raghu, Anatomy of Catastrophic Forgetting, Proc. ICLR, 2022
[22] G. M. van de Ven and A. S. Tolias, Three Scenarios for Continual Learning, arXiv:1904.07734, 2019
[23] F. Zenke, B. Poole, and S. Ganguli, Continual Learning Through Synaptic Intelligence, Proc. ICML, vol. 70, 2017, pp. 3987-3995
[24] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, Continual Lifelong Learning with Neural Networks: A Review, Neural Networks, vol. 113, pp. 54-71, 2019
[25] R. Hadsell, D. Rao, A. A. Rusu, and R. Pascanu, Embracing Change: Continual Learning in Deep Neural Networks, Trends in Cognitive Sciences, vol. 24, no. 12, pp. 1028-1040, 2020
[26] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A Simple Framework for Contrastive Learning of Visual Representations, Proc. ICML, 2020, pp. 1597-1607
[27] Y. Ding, K. Mallya, and H. Xu, Representation Similarity as a Diagnostic Tool for Continual Learning, arXiv:2210.11052, 2022
[28] A. Bjorck and G. H. Golub, Numerical Methods for Computing Angles Between Linear Subspaces, Mathematics of Computation, vol. 27, no. 123, pp. 579-594, 1973
[29] S. Ganguli and H. Sompolinsky, Compressed Sensing, Sparsity, and Dimensionality in Neuronal Information Processing and Data Analysis, Annual Review of Neuroscience, vol. 35, pp. 485-508, 2012
[30] J. Recanatesi et al., Dimensionality Compression and Expansion in Deep Neural Networks, arXiv:1906.00443, 2019
[31] B. Saha, H. Garg, and K. Roy, Gradient Projection Memory for Continual Learning, Proc. ICLR, 2021
[32] F. Zhu, X.-Y. Zhang, C. Wang, F. Yin, and C.-L. Liu, Prototype Augmentation and Self-Supervision for Incremental Learning, Proc. IEEE/CVF CVPR, 2021, pp. 5871-5880
[33] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, Efficient Lifelong Learning with A-GEM, Proc. ICLR, 2019
[34] A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, Technical Report, University of Toronto, 2009
[35] K. He, X. Zhang, S. Ren, and J. Sun, Identity Mappings in Deep Residual Networks, Proc. ECCV, 2016, pp. 630-645
[36] R. M. Kemker, M. McClure, A. Abitino, T. L. Hayes, and C. Kanan, Measuring Catastrophic Forgetting in Neural Networks, Proc. AAAI, 2018, pp. 3390-3398
[37] K. Mirzadeh, M. Farajtabar, D. Gorur, R. Pascanu, and H. Ghasemzadeh, Linear Mode Connectivity and the Lottery Ticket Hypothesis, Proc. ICLR, 2020
[38] T. Lesort et al., Continual Learning for Robotics, Information Fusion, vol. 58, pp. 52-68, 2020
