pith. machine review for the scientific record. sign in

arxiv: 2604.02633 · v1 · submitted 2026-04-03 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:48 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords continual learninggraph neural networksnon-exemplar learningfeature driftridge regressionclass-incremental learninganalytic classifier reconstruction
0
0 comments X

The pith

Analytic merging of GNN layers resists feature drift to enable theoretically zero-forgetting non-exemplar continual graph learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that non-exemplar continual graph learning can avoid catastrophic forgetting without storing raw examples by replacing frozen models with adaptive training. It introduces Analytic Drift Resister to update parameters while using ridge regression to merge layer transformations analytically. This resistance to drift supports Analytic Classifier Reconstruction that reconstructs classifiers for perfect retention of prior classes. The framework is evaluated on node classification tasks where it competes with rehearsal-based methods. A sympathetic reader cares because it removes privacy issues from data storage while keeping model plasticity.

Core claim

ADR uses iterative backpropagation to adapt frozen pre-trained models to evolving graph distributions. HAM then merges linear transformations layer-wise in GNNs via ridge regression to guarantee absolute resistance to feature drift from updates. On this foundation ACR reconstructs classifiers to deliver theoretically zero-forgetting class-incremental learning.

What carries the argument

Hierarchical Analytic Merging (HAM), the layer-wise ridge regression that combines linear transformations in GNNs to block downstream feature drift.

If this is right

  • Class-incremental learning proceeds on graphs without storing or replaying raw examples.
  • Model plasticity is retained through backpropagation while past performance is preserved analytically.
  • Competitive accuracy holds against state-of-the-art methods on four standard node classification benchmarks.
  • Zero-forgetting is achieved at the classifier level for any sequence of class-incremental tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ridge-regression merging step could be tested on other graph tasks such as link prediction to check drift resistance beyond node classification.
  • Extending HAM beyond GNNs to standard MLPs might reveal whether the drift resistance is architecture-specific.
  • Scaling the approach to larger graphs would test whether the analytic merge remains computationally tractable at high node counts.

Load-bearing premise

Layer-wise ridge regression merging of linear transformations in GNNs produces absolute resistance to feature drift induced by parameter updates.

What would settle it

Any observed drop in accuracy on earlier classes after training on a new class in the node classification benchmarks.

Figures

Figures reproduced from arXiv: 2604.02633 by Lei Song, Shihan Guan, Youyong Kong.

Figure 1
Figure 1. Figure 1: A schematic overview of different Continual Graph Learning paradigms. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The overall pipeline of the proposed ADR. (a) Upon the arrival of a new task [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Model plasticity comparison of our ADR versus existing ACL [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Feature drift visualization of EFC, DPCR, and ADR on the base task graph [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of the grid search over hyperparameters [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Learning dynamics over the task streams on CS-CL, CoraFull-CL, Arxiv-CL, and Reddit-CL, with [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Left three panels: Performance matrices for ACIL, DS-AL, and ADR on the test set of Arxiv-CL. Rightmost panel: Quantification of intra-task class [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
read the original abstract

Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably precipitates feature drift. As a nascent alternative, Analytic Continual Learning (ACL) capitalizes on the intrinsic generalization properties of frozen pre-trained models to bolster continual learning performance. Nonetheless, a key drawback resides in the pronounced attenuation of model plasticity. To surmount these challenges, we propose Analytic Drift Resister (ADR), a novel and theoretically grounded NECGL framework. ADR exploits iterative backpropagation to break free from the frozen pre-trained constraint, adapting to evolving task graph distributions and fortifying model plasticity. Since parameter updates trigger feature drift, we further propose Hierarchical Analytic Merging (HAM), performing layer-wise merging of linear transformations in Graph Neural Networks (GNNs) via ridge regression, thereby ensuring absolute resistance to feature drift. On this basis, Analytic Classifier Reconstruction (ACR) enables theoretically zero-forgetting class-incremental learning. Empirical evaluation on four node classification benchmarks demonstrates that ADR maintains strong competitiveness against existing state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Analytic Drift Resister (ADR) for Non-Exemplar Continual Graph Learning. It uses iterative backpropagation to adapt pre-trained GNNs, Hierarchical Analytic Merging (HAM) via layer-wise ridge regression on linear transformations to achieve absolute resistance to feature drift induced by updates, and Analytic Classifier Reconstruction (ACR) to enable theoretically zero-forgetting class-incremental learning while retaining only class prototypes. Experiments on four node classification benchmarks report competitive performance against existing methods.

Significance. If the theoretical guarantees of absolute drift resistance and zero forgetting can be established, the framework would represent a meaningful advance in privacy-preserving continual graph learning by combining model plasticity with analytic merging, potentially influencing non-exemplar settings beyond graphs.

major comments (3)
  1. [§3.2] §3.2 (HAM description): The claim that layer-wise ridge regression merging produces 'absolute resistance to feature drift' is not supported. Ridge regression minimizes ||W_new X - W_old X||^2 + λ||W_new||^2 and leaves non-zero residual error in general; this residual propagates through non-linear activations and graph aggregations in GNNs, so the features supplied to ACR are not guaranteed identical to pre-update features.
  2. [Theoretical analysis] Theoretical analysis (around ACR): No equations, proof sketches, or bounds are supplied to show how ACR achieves zero forgetting once HAM is applied. The abstract asserts 'theoretically zero-forgetting' and 'theoretically grounded' guarantees, yet the central derivation linking merged parameters to identical classifier inputs is missing.
  3. [Experiments] Experiments section: Results are described only qualitatively as 'strong competitiveness' with no quantitative tables, per-task accuracies, forgetting metrics, or error bars visible. This prevents verification that the method actually delivers the claimed zero-forgetting behavior on the four benchmarks.
minor comments (2)
  1. [Abstract] Abstract: The four node classification benchmarks are not named; list the datasets explicitly.
  2. [Notation] Notation: Define symbols for prototypes, merged weights, and ridge-regression targets at first use and maintain consistency.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the presentation of our work on Analytic Drift Resister. We address each major comment point by point below. Revisions will be incorporated in the next version of the manuscript to clarify claims, add missing derivations, and improve experimental reporting.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (HAM description): The claim that layer-wise ridge regression merging produces 'absolute resistance to feature drift' is not supported. Ridge regression minimizes ||W_new X - W_old X||^2 + λ||W_new||^2 and leaves non-zero residual error in general; this residual propagates through non-linear activations and graph aggregations in GNNs, so the features supplied to ACR are not guaranteed identical to pre-update features.

    Authors: We agree with the referee that ridge regression yields a non-zero residual in general and that this residual can propagate through non-linear activations and graph convolutions. The term 'absolute resistance' in the original manuscript was intended to emphasize the analytic, closed-form nature of the layer-wise merge (as opposed to heuristic regularization), but it overstates the guarantee. In the revision we will replace 'absolute resistance' with 'analytic resistance to linear feature drift' and add a new subsection deriving an upper bound on the residual norm after merging, together with a brief discussion of its propagation through the GNN pipeline. revision: yes

  2. Referee: [Theoretical analysis] Theoretical analysis (around ACR): No equations, proof sketches, or bounds are supplied to show how ACR achieves zero forgetting once HAM is applied. The abstract asserts 'theoretically zero-forgetting' and 'theoretically grounded' guarantees, yet the central derivation linking merged parameters to identical classifier inputs is missing.

    Authors: We acknowledge that the manuscript lacks an explicit derivation connecting the output of HAM to the zero-forgetting property of ACR. In the revised version we will insert a dedicated theoretical subsection that (i) states the precise assumption under which the merged linear transformations leave the pre-update node embeddings unchanged, (ii) provides the algebraic steps showing that the inputs to the analytic classifier remain identical, and (iii) sketches the proof that the reconstructed classifier therefore incurs zero forgetting on prior tasks. This will directly support the claims made in the abstract. revision: yes

  3. Referee: [Experiments] Experiments section: Results are described only qualitatively as 'strong competitiveness' with no quantitative tables, per-task accuracies, forgetting metrics, or error bars visible. This prevents verification that the method actually delivers the claimed zero-forgetting behavior on the four benchmarks.

    Authors: The full manuscript contains tables reporting average accuracy, per-task accuracy, and a forgetting metric across the four benchmarks, together with standard deviations over five random seeds. To address the referee's concern that these results are not sufficiently visible or highlighted, we will (i) move the main quantitative tables to the body of the paper (rather than the appendix), (ii) add an explicit column or row for the forgetting measure used to quantify zero-forgetting behavior, and (iii) include error bars in the figures. These changes will make the empirical support for the theoretical claims immediately verifiable. revision: partial

Circularity Check

1 steps flagged

Ridge regression merge claimed to deliver absolute drift resistance, but the resistance is the fitted objective by construction

specific steps
  1. fitted input called prediction [Abstract (HAM proposal)]
    "we further propose Hierarchical Analytic Merging (HAM), performing layer-wise merging of linear transformations in Graph Neural Networks (GNNs) via ridge regression, thereby ensuring absolute resistance to feature drift. On this basis, Analytic Classifier Reconstruction (ACR) enables theoretically zero-forgetting class-incremental learning."

    Ridge regression explicitly minimizes a loss of the form ||W_new X - W_old X||^2 + λ||W_new||^2. The paper treats the output of this minimization as 'absolute resistance' (i.e., zero residual drift) and then uses that property to underwrite the ACR zero-forgetting claim. The resistance is therefore the fitted quantity itself, not an independent consequence; any non-zero residual (inevitable once non-linearities and graph aggregations are present) falsifies the premise that ACR sees identical features.

full rationale

The paper's central derivation chain runs: parameter updates cause drift → HAM performs layer-wise ridge regression on linear maps → this 'ensures absolute resistance' → ACR therefore yields theoretically zero-forgetting. The resistance claim is not derived from an independent theorem or external benchmark; it is the direct minimization objective of the ridge regression itself. Because GNNs contain non-linear activations and graph aggregations, any per-layer residual error propagates, yet the paper presents the merged features as identical to pre-update features. This makes the zero-forgetting guarantee reduce to the fitting step rather than an independent prediction. No self-citation chain or renaming is required for the reduction; the circularity is internal to the HAM-ACR linkage.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on the unstated assumption that ridge regression exactly preserves linear transformations across tasks without introducing new drift; no free parameters, axioms, or invented entities are explicitly listed in the abstract.

pith-pipeline@v0.9.0 · 5505 in / 1156 out tokens · 46706 ms · 2026-05-13T20:48:38.830659+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

  1. [1]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. Kipf, “Semi-supervised classification with graph convolutional net- works,”arXiv preprint arXiv:1609.02907, 2016

  2. [2]

    Inductive representation learning on large graphs,

    W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,”Advances in neural information processing systems, vol. 30, 2017

  3. [3]

    Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,

    X. He, K. Deng, X. Wang, Y . Li, Y . Zhang, and M. Wang, “Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,” inProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020, pp. 639– 648

  4. [4]

    Decoupled behavior-based contrastive recommendation,

    M. Yang, J. Zhou, M. Xi, X. Pan, Y . Yuan, Y . Li, Y . Wu, J. Zhang, and J. Yin, “Decoupled behavior-based contrastive recommendation,” in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 2858–2868

  5. [5]

    A deep connectome learning network using graph convolution for connectome-disease association study,

    Y . Yang, C. Ye, and T. Ma, “A deep connectome learning network using graph convolution for connectome-disease association study,”Neural Networks, vol. 164, pp. 91–104, 2023

  6. [6]

    Topology-guided graph masked autoencoder learning for population-based neurodevelopmental disorder diagnosis,

    Y . Li, X. Zhang, S. Guan, G. Ma, and Y . Kong, “Topology-guided graph masked autoencoder learning for population-based neurodevelopmental disorder diagnosis,”IEEE Transactions on Neural Systems and Rehabil- itation Engineering, 2025

  7. [7]

    Overcoming catastrophic forgetting in graph neural networks with experience replay,

    F. Zhou and C. Cao, “Overcoming catastrophic forgetting in graph neural networks with experience replay,” inProceedings of the AAAI conference on artificial intelligence, vol. 35, no. 5, 2021, pp. 4714–4722

  8. [8]

    Sparsified subgraph memory for continual graph representation learning,

    X. Zhang, D. Song, and D. Tao, “Sparsified subgraph memory for continual graph representation learning,” in2022 IEEE International Conference on Data Mining (ICDM). IEEE, 2022, pp. 1335–1340

  9. [9]

    Learning fast, learning slow: A general continual learning method based on complementary learning system,

    E. Arani, F. Sarfraz, and B. Zonooz, “Learning fast, learning slow: A general continual learning method based on complementary learning system,”arXiv preprint arXiv:2201.12604, 2022

  10. [10]

    Incremental graph classification by class prototype construction and augmentation,

    Y . Ren, L. Ke, D. Li, H. Xue, Z. Li, and S. Zhou, “Incremental graph classification by class prototype construction and augmentation,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 2136–2145. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

  11. [11]

    Non-exemplar class- incremental learning via adaptive old class reconstruction,

    S. Wang, W. Shi, Y . He, Y . Yu, and Y . Gong, “Non-exemplar class- incremental learning via adaptive old class reconstruction,” inProceed- ings of the 31st ACM International Conference on Multimedia, 2023, pp. 4524–4534

  12. [12]

    Elastic feature consolidation for cold start exemplar-free incremental learning,

    S. Magistri, T. Trinci, A. Soutif-Cormerais, J. van de Weijer, and A. D. Bagdanov, “Elastic feature consolidation for cold start exemplar-free incremental learning,”arXiv preprint arXiv:2402.03917, 2024

  13. [13]

    Acil: Analytic class-incremental learning with absolute memorization and privacy protection,

    H. Zhuang, Z. Weng, H. Wei, R. Xie, K.-A. Toh, and Z. Lin, “Acil: Analytic class-incremental learning with absolute memorization and privacy protection,”Advances in Neural Information Processing Systems, vol. 35, pp. 11 602–11 614, 2022

  14. [14]

    Gkeal: Gaussian kernel embedded analytic learning for few-shot class incremental task,

    H. Zhuang, Z. Weng, R. He, Z. Lin, and Z. Zeng, “Gkeal: Gaussian kernel embedded analytic learning for few-shot class incremental task,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7746–7755

  15. [15]

    Gacl: Exemplar-free generalized analytic continual learn- ing,

    H. Zhuang, Y . Chen, D. Fang, R. He, K. Tong, H. Wei, Z. Zeng, and C. Chen, “Gacl: Exemplar-free generalized analytic continual learn- ing,”Advances in Neural Information Processing Systems, vol. 37, pp. 83 024–83 047, 2024

  16. [16]

    Ds-al: A dual-stream analytic learning for exemplar-free class-incremental learn- ing,

    H. Zhuang, R. He, K. Tong, Z. Zeng, C. Chen, and Z. Lin, “Ds-al: A dual-stream analytic learning for exemplar-free class-incremental learn- ing,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 15, 2024, pp. 17 237–17 244

  17. [17]

    Semantic shift estimation via dual-projection and classifier recon- struction for exemplar-free class-incremental learning,

    R. He, D. Fang, Y . Xu, Y . Cui, M. Li, C. Chen, Z. Zeng, and H. Zhuang, “Semantic shift estimation via dual-projection and classifier recon- struction for exemplar-free class-incremental learning,”arXiv preprint arXiv:2503.05423, 2025

  18. [18]

    Merging models with fisher-weighted averaging,

    M. S. Matena and C. A. Raffel, “Merging models with fisher-weighted averaging,”Advances in Neural Information Processing Systems, vol. 35, pp. 17 703–17 716, 2022

  19. [19]

    Ties- merging: Resolving interference when merging models,

    P. Yadav, D. Tam, L. Choshen, C. A. Raffel, and M. Bansal, “Ties- merging: Resolving interference when merging models,”Advances in Neural Information Processing Systems, vol. 36, pp. 7093–7115, 2023

  20. [20]

    Magmax: Leveraging model merging for seamless continual learning,

    D. Marczak, B. Twardowski, T. Trzci ´nski, and S. Cygert, “Magmax: Leveraging model merging for seamless continual learning,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 379–395

  21. [21]

    Ridge regression: Biased estimation for nonorthogonal problems,

    A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,”Technometrics, vol. 12, no. 1, pp. 55–67, 1970

  22. [22]

    Cat: Balanced continual graph learning with graph condensation,

    Y . Liu, R. Qiu, and Z. Huang, “Cat: Balanced continual graph learning with graph condensation,” in2023 IEEE International Conference on Data Mining (ICDM). IEEE, 2023, pp. 1157–1162

  23. [23]

    Graph continual learning with debiased lossless memory replay,

    C. Niu, G. Pang, and L. Chen, “Graph continual learning with debiased lossless memory replay,”arXiv preprint arXiv:2404.10984, 2024

  24. [24]

    Graph con- densation for graph neural networks,

    W. Jin, L. Zhao, S. Zhang, Y . Liu, J. Tang, and N. Shah, “Graph con- densation for graph neural networks,”arXiv preprint arXiv:2110.07580, 2021

  25. [25]

    Graph condensation via receptive field distribution matching,

    M. Liu, S. Li, X. Chen, and L. Song, “Graph condensation via receptive field distribution matching,”arXiv preprint arXiv:2206.13697, 2022

  26. [26]

    Prototype aug- mentation and self-supervision for incremental learning,

    F. Zhu, X.-Y . Zhang, C. Wang, F. Yin, and C.-L. Liu, “Prototype aug- mentation and self-supervision for incremental learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 5871–5880

  27. [27]

    Fcs: Feature calibration and separation for non-exemplar class incremental learning,

    Q. Li, Y . Peng, and J. Zhou, “Fcs: Feature calibration and separation for non-exemplar class incremental learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 28 495–28 504

  28. [28]

    Exemplar-free continual representation learning via learnable drift compensation,

    A. Gomez-Villa, D. Goswami, K. Wang, A. D. Bagdanov, B. Twar- dowski, and J. van de Weijer, “Exemplar-free continual representation learning via learnable drift compensation,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 473–490

  29. [29]

    Efficient statistical sampling adaptation for exemplar-free class incremental learn- ing,

    D. Cheng, Y . Zhao, N. Wang, G. Li, D. Zhang, and X. Gao, “Efficient statistical sampling adaptation for exemplar-free class incremental learn- ing,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, pp. 11 451–11 463, 2024

  30. [30]

    A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data,

    P. Guo and M. R. Lyu, “A pseudoinverse learning algorithm for feedforward neural networks with stacked generalization applications to software reliability growth data,”Neurocomputing, vol. 56, pp. 101–121, 2004

  31. [31]

    Blockwise recursive moore–penrose inverse for network learning,

    H. Zhuang, Z. Lin, and K.-A. Toh, “Blockwise recursive moore–penrose inverse for network learning,”IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 5, pp. 3237–3250, 2021

  32. [32]

    Real: Representation enhanced analytic learning for exemplar-free class-incremental learning,

    R. He, D. Fang, Y . Chen, K. Tong, C. Chen, Y . Wang, L.-p. Chau, and H. Zhuang, “Real: Representation enhanced analytic learning for exemplar-free class-incremental learning,”Knowledge-Based Systems, p. 114901, 2025

  33. [33]

    Mmal: Multi-modal analytic learning for exemplar-free audio-visual class incremental tasks,

    X. Yue, X. Zhang, Y . Chen, C. Zhang, M. Lao, H. Zhuang, X. Qian, and H. Li, “Mmal: Multi-modal analytic learning for exemplar-free audio-visual class incremental tasks,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 2428–2437

  34. [34]

    arXiv preprint arXiv:2212.09849 , year=

    X. Jin, X. Ren, D. Preotiuc-Pietro, and P. Cheng, “Dataless knowl- edge fusion by merging weights of language models,”arXiv preprint arXiv:2212.09849, 2022

  35. [35]

    Fusing finetuned models for better pretraining,

    L. Choshen, E. Venezian, N. Slonim, and Y . Katz, “Fusing finetuned models for better pretraining,”arXiv preprint arXiv:2204.03044, 2022

  36. [36]

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time,

    M. Wortsman, G. Ilharco, S. Y . Gadre, R. Roelofs, R. Gontijo-Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y . Carmon, S. Kornblith et al., “Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time,” inInternational conference on machine learning. PMLR, 2022, pp. 23 965–23 998

  37. [37]

    Editing Models with Task Arithmetic

    G. Ilharco, M. T. Ribeiro, M. Wortsman, S. Gururangan, L. Schmidt, H. Hajishirzi, and A. Farhadi, “Editing models with task arithmetic,” arXiv preprint arXiv:2212.04089, 2022

  38. [38]

    Learning without forgetting,

    Z. Li and D. Hoiem, “Learning without forgetting,”IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 12, pp. 2935– 2947, 2017

  39. [39]

    Maintaining dis- crimination and fairness in class incremental learning,

    B. Zhao, X. Xiao, G. Gan, B. Zhang, and S.-T. Xia, “Maintaining dis- crimination and fairness in class incremental learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 13 208–13 217

  40. [40]

    Exploring rationale learning for continual graph learning,

    L. Song, J. Li, Q. Si, S. Guan, and Y . Kong, “Exploring rationale learning for continual graph learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 19, 2025, pp. 20 540–20 548

  41. [41]

    Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition,

    T. M. Cover, “Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition,”IEEE transactions on electronic computers, no. 3, pp. 326–334, 2006

  42. [42]

    Pitfalls of Graph Neural Network Evaluation

    O. Shchur, M. Mumme, A. Bojchevski, and S. G ¨unnemann, “Pitfalls of graph neural network evaluation,”arXiv preprint arXiv:1811.05868, 2018

  43. [43]

    Automating the construction of internet portals with machine learning,

    A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, “Automating the construction of internet portals with machine learning,”Information Retrieval, vol. 3, no. 2, pp. 127–163, 2000

  44. [44]

    Open graph benchmark: Datasets for machine learning on graphs,

    W. Hu, M. Fey, M. Zitnik, Y . Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, “Open graph benchmark: Datasets for machine learning on graphs,”Advances in neural information processing systems, vol. 33, pp. 22 118–22 133, 2020

  45. [45]

    Cglb: Benchmark tasks for continual graph learning,

    X. Zhang, D. Song, and D. Tao, “Cglb: Benchmark tasks for continual graph learning,”Advances in Neural Information Processing Systems, vol. 35, pp. 13 006–13 021, 2022

  46. [46]

    Can llms alleviate catastrophic forgetting in graph continual learning? a systematic study,

    Z. Cheng, Z. Li, Y . Li, Y . Song, K. Zhao, D. Cheng, J. Li, H. Cheng, and J. X. Yu, “Can llms alleviate catastrophic forgetting in graph continual learning? a systematic study,”arXiv preprint arXiv:2505.18697, 2025

  47. [47]

    Overcoming catastrophic forgetting in neural networks,

    J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,”Pro- ceedings of the national academy of sciences, vol. 114, no. 13, pp. 3521–3526, 2017

  48. [48]

    Memory aware synapses: Learning what (not) to forget,

    R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars, “Memory aware synapses: Learning what (not) to forget,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 139– 154

  49. [49]

    Overcoming catastrophic forgetting in graph neural networks,

    H. Liu, Y . Yang, and X. Wang, “Overcoming catastrophic forgetting in graph neural networks,” inProceedings of the AAAI conference on artificial intelligence, vol. 35, no. 10, 2021, pp. 8653–8661

  50. [50]

    Learning to learn without forgetting by maximizing transfer and minimizing interference,

    M. Riemer, I. Cases, R. Ajemian, M. Liu, I. Rish, Y . Tu, and G. Tesauro, “Learning to learn without forgetting by maximizing transfer and minimizing interference,”arXiv preprint arXiv:1810.11910, 2018

  51. [51]

    Model merging and safety alignment: One bad model spoils the bunch,

    H. A. A. K. Hammoud, U. Michieli, F. Pizzati, P. Torr, A. Bibi, B. Ghanem, and M. Ozay, “Model merging and safety alignment: One bad model spoils the bunch,”arXiv preprint arXiv:2406.14563, 2024