pith. machine review for the scientific record.

arxiv: 2605.12998 · v1 · submitted 2026-05-13 · 💻 cs.LG

Recognition: unknown

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

Dongjin Song, Guiquan Sun, Jingchao Ni, Xikun Zhang

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:38 UTC · model grok-4.3

classification 💻 cs.LG
keywords continual graph learning · task-free learning · distribution drift · benchmark · catastrophic forgetting · graph streams · non-stationary data

The pith

Many existing continual graph learning methods implicitly depend on task boundary information and degrade under continuous distribution shifts when that information is absent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that typical continual graph learning setups assume discrete tasks with known boundaries, while real data drifts smoothly and without labels. It proposes a task-free formulation in which the stream is a time-varying mixture of hidden task distributions whose transitions follow a Gaussian parameterization. The resulting DRIFT benchmark lets researchers test methods across a spectrum from hard task switches to gentle drifts. Experiments show substantial performance losses, suggesting that current techniques depend on task cues. This points to the need for new methods built for boundary-free streams.

Core claim

By modeling the data stream as a time-varying mixture of latent task distributions with Gaussian-parameterized transitions, the DRIFT benchmark demonstrates that representative continual learning methods suffer substantial performance degradation in task-free settings compared to traditional task-based protocols, indicating implicit reliance on task boundary information.

What carries the argument

A unified formulation modeling the data stream as a time-varying mixture of latent task distributions with Gaussian-parameterized transition dynamics, realized as the DRIFT benchmark.
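
To make the formulation concrete, here is a minimal sketch of how such a stream could be generated, assuming mixture weights follow Gaussian bumps of width σ centered on each latent task's active period. The function names, the toy feature model, and the parameter values are illustrative assumptions, not DRIFT's actual generator.

```python
import numpy as np

def mixture_weights(t, centers, sigma):
    """Normalized Gaussian weight of each latent task at time t.

    Small sigma -> near-hard task switches; large sigma -> smooth drift.
    """
    w = np.exp(-0.5 * ((t - centers) / sigma) ** 2)
    return w / w.sum()

def sample_batch(t, centers, sigma, task_means, batch_size=10, dim=16, rng=None):
    """Draw a batch from the time-varying mixture: pick a latent task per
    sample according to the current weights, then sample its features."""
    rng = rng or np.random.default_rng(0)
    w = mixture_weights(t, centers, sigma)
    tasks = rng.choice(len(centers), size=batch_size, p=w)
    # Toy per-task distribution: Gaussian features around a task-specific mean.
    return task_means[tasks] + rng.normal(size=(batch_size, dim)), tasks

# Example: 5 latent tasks active around evenly spaced time centers.
centers = np.linspace(0, 100, 5)
task_means = np.random.default_rng(1).normal(scale=3.0, size=(5, 16))
for t in [0, 25, 50]:
    print(t, np.round(mixture_weights(t, centers, sigma=5.0), 3))
```

As σ approaches 0, each time step is dominated by a single latent task, recovering the task-based setting; large σ blends neighboring tasks into a smooth drift, which matches the spectrum the benchmark claims to span.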

If this is right

  • Standard continual learning techniques require modification to handle unknown task boundaries and continuous shifts.
  • Performance metrics from task-based evaluations overestimate real-world applicability for graph streams.
  • New algorithms must detect or adapt to distribution changes without explicit task signals.
  • Benchmarks like DRIFT provide a way to compare methods under varying levels of drift smoothness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Methods successful on DRIFT may transfer better to other non-stationary data like sensor streams or social networks.
  • Extending the Gaussian mixture model to other transition types could test robustness of the findings.
  • Integration with online learning techniques that monitor distribution changes might address the identified gaps; a sketch of one such monitor follows below.
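
On that last point, a minimal sketch of one such monitor, assuming a simple sliding-window comparison on the online loss; the window sizes, threshold ratio, and class name are illustrative, not from the paper.

```python
from collections import deque

class LossDriftMonitor:
    """Flags a distribution change when recent losses rise well above
    the longer-run average. A crude stand-in for a task-boundary signal."""

    def __init__(self, short=20, long=200, ratio=1.5):
        self.short_win = deque(maxlen=short)
        self.long_win = deque(maxlen=long)
        self.ratio = ratio  # how much recent loss must exceed baseline

    def update(self, loss: float) -> bool:
        self.short_win.append(loss)
        self.long_win.append(loss)
        if len(self.long_win) < self.long_win.maxlen:
            return False  # not enough history yet
        recent = sum(self.short_win) / len(self.short_win)
        baseline = sum(self.long_win) / len(self.long_win)
        return recent > self.ratio * baseline

# Usage: call monitor.update(batch_loss) every step; on True, e.g.,
# refresh the replay buffer or consolidate weights.
```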

Load-bearing premise

Real graph data streams can be modeled accurately as time-varying mixtures of latent task distributions using Gaussian transition dynamics.

What would settle it

If top continual learning methods achieved accuracy on DRIFT's task-free streams comparable to their accuracy on standard task-based benchmarks, the claim of implicit boundary reliance would be falsified.

Figures

Figures reproduced from arXiv: 2605.12998 by Dongjin Song, Guiquan Sun, Jingchao Ni, Xikun Zhang.

Figure 1: Overview of DRIFT. We propose a unified formulation of task-free continual graph learning.
Figure 2: The dynamics of test accuracy of all implemented baselines on four datasets.
Figure 3: Effect of the transition scale σ on CoraFull-CL. However, improved adaptation is accompanied by increased forgetting. ER degrades from −37.6% forgetting at σ=3 to −48.0% at σ=20, while A-GEM drops from −38.4% to −53.9%. Although smoother transitions improve online adaptation, they simultaneously weaken the effective training signal associated with each latent distribution, making old knowledge harder to …
Figure 4: Effect of with- vs. without-replacement sampling.
Figure 5: t-SNE visualization of learned node embeddings on Reddit-CL.
read the original abstract

Continual graph learning (CGL) aims to learn from dynamically evolving graphs while mitigating catastrophic forgetting. Existing CGL approaches typically adopt a task-based formulation, where the data stream is partitioned into a sequence of discrete tasks with pre-defined boundaries. However, such assumptions rarely hold in real-world environments, where data distributions evolve continuously and task identity is often unavailable. To better reflect realistic non-stationary environments, we revisit continual graph learning from a task-free perspective. We propose a unified formulation that models the data stream as a time-varying mixture of latent task distributions, enabling continuous modeling of distribution drift. Based on this formulation, we construct DRIFT, a benchmark that spans a spectrum of transition dynamics ranging from hard task switches to smooth distributional drift through a Gaussian parameterization. We evaluate representative continual learning methods under this task-free setting and observe substantial performance degradation compared to traditional task-based protocols. Our findings indicate that many existing approaches implicitly rely on task boundary information and struggle under realistic task-free graph streams. This work highlights the importance of studying continual graph learning under realistic non-stationary conditions and provides a benchmark for future research in this direction. Our code is available at https://github.com/gqBond/DRIFT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces DRIFT, a benchmark for task-free continual graph learning. It formulates the data stream as a time-varying mixture of latent task distributions with Gaussian-parameterized transitions spanning hard switches to smooth drifts, constructs the benchmark accordingly, evaluates representative continual learning methods, and reports substantial performance degradation relative to task-based settings, concluding that many existing approaches implicitly rely on task boundary information.

Significance. If the benchmark's synthetic transitions faithfully capture realistic continuous shifts, the work is significant for the CGL community by supplying a reproducible testbed (code is publicly released) and by quantifying the gap between task-based protocols and task-free streams. This could steer future research toward boundary-free methods.

major comments (1)
  1. [Abstract and formulation] Abstract and unified formulation: the central claim that existing methods 'struggle under realistic task-free graph streams' rests on DRIFT's Gaussian parameterization of latent-task transitions being representative; the manuscript provides no validation (e.g., statistical comparison of generated node-feature or topology statistics) against real non-stationary graph streams such as temporal citation or social networks, so the observed degradation could be generator-specific rather than general evidence.
minor comments (2)
  1. [Experiments] Experiments section: supply full quantitative tables with means, standard deviations, and statistical tests for the reported degradation to allow precise assessment of effect sizes (one possible form of such a test is sketched after this list).
  2. [Benchmark construction] Notation: clarify whether the mixture weights and Gaussian parameters are fixed across runs or re-sampled, as this affects reproducibility of the transition dynamics.
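
For the first minor comment, one minimal form such a report could take, assuming per-seed accuracies under both protocols are available; the numbers below are placeholders, not results from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-seed test accuracies for one method (placeholders,
# NOT values from the paper): task-based protocol vs. DRIFT task-free.
task_based = np.array([0.72, 0.70, 0.74, 0.71, 0.73])
task_free = np.array([0.55, 0.53, 0.58, 0.54, 0.56])

print(f"mean±std task-based: {task_based.mean():.3f}±{task_based.std(ddof=1):.3f}")
print(f"mean±std task-free:  {task_free.mean():.3f}±{task_free.std(ddof=1):.3f}")

# Paired t-test across seeds quantifies whether the degradation is
# statistically significant rather than seed noise.
t, p = stats.ttest_rel(task_based, task_free)
print(f"paired t={t:.2f}, p={p:.4f}")
```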

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the representativeness of DRIFT. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract and formulation] Abstract and unified formulation: the central claim that existing methods 'struggle under realistic task-free graph streams' rests on DRIFT's Gaussian parameterization of latent-task transitions being representative; the manuscript provides no validation (e.g., statistical comparison of generated node-feature or topology statistics) against real non-stationary graph streams such as temporal citation or social networks, so the observed degradation could be generator-specific rather than general evidence.

    Authors: We agree that the manuscript does not provide direct statistical comparisons (e.g., Kolmogorov-Smirnov tests or moment matching on node features and graph statistics) between DRIFT-generated streams and real non-stationary graphs such as temporal citation or social networks. This is a valid limitation: the observed degradation could partly reflect properties of the Gaussian mixture transitions rather than being fully general. Our formulation intentionally uses a controllable Gaussian parameterization to span the full spectrum from abrupt switches to smooth drifts while keeping latent task identities unavailable, which is the core contribution for studying task-free CGL. Real streams rarely come with ground-truth latent task labels, making direct validation difficult. In the revision we will add a dedicated Limitations subsection that (i) explicitly states the synthetic nature of the generator, (ii) provides qualitative discussion of how the modeled drifts align with observed gradual shifts in citation networks (e.g., evolving research topics), and (iii) cites prior work on real temporal graphs exhibiting mixture-like behavior. We will also release additional diagnostic plots comparing basic statistics. These changes clarify the benchmark's scope without altering the central empirical finding that standard methods degrade when boundary information is removed. revision: yes
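
A minimal sketch of the kind of fidelity diagnostic the rebuttal concedes is missing, assuming node features from a generated stream and a real stream are available as matrices; the Kolmogorov-Smirnov test and moment gap follow the rebuttal's own suggestions, and the data here are random placeholders.

```python
import numpy as np
from scipy import stats

def compare_streams(synthetic_feats, real_feats):
    """Two-sample KS test per feature dimension plus a first-moment gap,
    as a basic fidelity check between generated and real streams."""
    results = []
    for d in range(synthetic_feats.shape[1]):
        ks = stats.ks_2samp(synthetic_feats[:, d], real_feats[:, d])
        results.append((d, ks.statistic, ks.pvalue))
    moment_gap = np.abs(synthetic_feats.mean(0) - real_feats.mean(0)).mean()
    return results, moment_gap

# Toy usage with random placeholders standing in for node-feature matrices.
rng = np.random.default_rng(0)
synth = rng.normal(0.0, 1.0, size=(500, 8))
real = rng.normal(0.2, 1.1, size=(500, 8))
per_dim, gap = compare_streams(synth, real)
print(f"mean |difference of means| = {gap:.3f}; dims with p<0.05: "
      f"{sum(p < 0.05 for _, _, p in per_dim)}/8")
```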

Circularity Check

0 steps flagged

No circularity; benchmark construction and empirical evaluation are self-contained

full rationale

The paper defines a modeling formulation for task-free CGL as a time-varying mixture of latent tasks with Gaussian transitions, builds the DRIFT benchmark from that formulation, and reports direct empirical evaluations of existing methods on the resulting streams. No equations, parameters, or predictions are shown that reduce by construction to fitted inputs or prior self-citations. The central observation (performance degradation under the new protocol) follows from running the methods on the generated data rather than from any self-referential derivation. This is a standard benchmark paper whose claims rest on the fidelity of the synthetic generator, not on circular logic.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about latent task mixtures and Gaussian parameterization of drifts; no free parameters are fitted to target results and no new entities are postulated.

free parameters (1)
  • Gaussian parameters controlling transition dynamics
    Control the spectrum from hard task switches to smooth distributional drift in benchmark construction.
axioms (2)
  • domain assumption Real-world graph data streams evolve continuously without predefined task boundaries
    Motivates the shift from task-based to task-free formulation.
  • domain assumption Data can be represented as a time-varying mixture of latent task distributions
    Core modeling choice enabling continuous drift simulation.

pith-pipeline@v0.9.0 · 5525 in / 1371 out tokens · 63014 ms · 2026-05-14T19:38:51.391559+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

51 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1]

    Inductive representation learning on large graphs

    Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, volume 30, 2017

  2. [2]

    Open graph benchmark: Datasets for machine learning on graphs

    Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems, 33:22118–22133, 2020

  3. [3]

    Knowledge graph embedding: A survey of approaches and applications

    Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE transactions on knowledge and data engineering, 29:2724–2743, 2017

  4. [4]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017

  5. [5]

    On Tiny Episodic Memories in Continual Learning

    Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc’Aurelio Ranzato. On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019

  6. [6]

    CGLB: Benchmark tasks for continual graph learning

    Xikun Zhang, Dongjin Song, and Dacheng Tao. CGLB: Benchmark tasks for continual graph learning. Advances in Neural Information Processing Systems, 35:13006–13021, 2022

  7. [7]

    Temporal graph networks for deep learning on dynamic graphs

    Emanuele Rossi, Ben Chamberlain, Fabrizio Frasca, Davide Eynard, Federico Monti, and Michael Bronstein. Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637, 2020

  8. [8]

    Online continual learning on class incremental blurry task configuration with anytime inference

    Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on class incremental blurry task configuration with anytime inference. In International Conference on Learning Representations, 2022

  9. [9]

    Online class incremental learning on stochastic blurry task boundary via mask and visual prompt tuning

    Jun-Yeong Moon, Keon-Hee Park, Jung Uk Kim, and Gyeong-Moon Park. Online class incremental learning on stochastic blurry task boundary via mask and visual prompt tuning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11731–11741, 2023

  10. [10]

    Efficient Lifelong Learning with A-GEM

    Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM. arXiv preprint arXiv:1812.00420, 2018

  11. [11]

    Gradient based sample selection for online continual learning

    Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. Gradient based sample selection for online continual learning. Advances in neural information processing systems, 32, 2019

  12. [12]

    Dark experience for general continual learning: a strong, simple baseline

    Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. Advances in neural information processing systems, 33:15920–15930, 2020

  13. [13]

    Task-free continual learning

    Rahaf Aljundi, Klaas Kelchtermans, and Tinne Tuytelaars. Task-free continual learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11254–11263, 2019

  14. [14]

    Learning without forgetting

    Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence, 40(12):2935–2947, 2017

  15. [15]

    Continual learning on dynamic graphs via parameter isolation

    Peiyan Zhang, Yuchen Yan, Chaozhuo Li, Senzhang Wang, Xing Xie, Guojie Song, and Sunghun Kim. Continual learning on dynamic graphs via parameter isolation. In Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval, pages 601–611, 2023

  16. [16]

    Online continual learning in image classification: An empirical survey

    Zheda Mai, Ruiwen Li, Jihwan Jeong, David Quispe, Hyunwoo Kim, and Scott Sanner. Online continual learning in image classification: An empirical survey. Neurocomputing, 469:28–51, 2022

  17. [17]

    A topology-aware graph coarsening framework for continual graph learning

    Xiaoxue Han, Zhuo Feng, and Yue Ning. A topology-aware graph coarsening framework for continual graph learning. Advances in Neural Information Processing Systems, 37:132491–132523, 2024

  18. [18]

    Topology-aware embedding memory for continual learning on expanding networks

    Xikun Zhang, Dongjin Song, Yixin Chen, and Dacheng Tao. Topology-aware embedding memory for continual learning on expanding networks. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4326–4337, 2024

  19. [19]

    Hierarchical prototype networks for continual graph representation learning

    Xikun Zhang, Dongjin Song, and Dacheng Tao. Hierarchical prototype networks for continual graph representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4622–4636, 2022

  20. [20]

    Hero: Heterogeneous continual graph learning via meta-knowledge distillation

    Guiquan Sun, Xikun Zhang, Jingchao Ni, and Dongjin Song. Hero: Heterogeneous continual graph learning via meta-knowledge distillation. arXiv preprint arXiv:2505.17458, 2025

  21. [21]

    Cat: Balanced continual graph learning with graph condensation

    Yilun Liu, Ruihong Qiu, and Zi Huang. Cat: Balanced continual graph learning with graph condensation. In 2023 IEEE International Conference on Data Mining (ICDM), pages 1157–

  22. [22]

    Lifelong graph learning

    Chen Wang, Yuheng Qiu, Dasong Gao, and Sebastian Scherer. Lifelong graph learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13719–13728, 2022

  23. [23]

    Replay-and-forget-free graph class-incremental learning: A task profiling and prompting approach

    Chaoxi Niu, Guansong Pang, Ling Chen, and Bing Liu. Replay-and-forget-free graph class-incremental learning: A task profiling and prompting approach. Advances in Neural Information Processing Systems, 37:87978–88002, 2024

  24. [24]

    Class-domain incremental learning on graphs via disentangled knowledge distillation

    Qin Tian, Chen Zhao, Xintao Wu, Dong Li, Minglai Shao, Xujiang Zhao, and Wenjun Wang. Class-domain incremental learning on graphs via disentangled knowledge distillation. In Proceedings of the ACM Web Conference 2026, pages 452–462, 2026

  25. [25]

    What matters in graph class incremental learning? An information preservation perspective

    Jialu Li, Yu Wang, Pengfei Zhu, Wanyu Lin, and Qinghua Hu. What matters in graph class incremental learning? An information preservation perspective. Advances in Neural Information Processing Systems, 37:26195–26223, 2024

  26. [26]

    Overcoming catastrophic forgetting in graph neural networks with experience replay

    Fan Zhou and Chengtai Cao. Overcoming catastrophic forgetting in graph neural networks with experience replay. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 4714–4722, 2021

  27. [27]

    Streaming graph neural networks via continual learning

    Junshan Wang, Guojie Song, Yi Wu, and Liang Wang. Streaming graph neural networks via continual learning. In Proceedings of the 29th ACM international conference on information & knowledge management, pages 1515–1524, 2020

  28. [28]

    Overcoming catastrophic forgetting in graph neural networks

    Huihui Liu, Yiding Yang, and Xinchao Wang. Overcoming catastrophic forgetting in graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 8653–8661, 2021

  29. [29]

    Sparsified subgraph memory for continual graph representation learning

    Xikun Zhang, Dongjin Song, and Dacheng Tao. Sparsified subgraph memory for continual graph representation learning. In 2022 IEEE International Conference on Data Mining (ICDM), pages 1335–1340. IEEE, 2022

  30. [30]

    Ricci curvature-based graph sparsification for continual graph representation learning

    Xikun Zhang, Dongjin Song, and Dacheng Tao. Ricci curvature-based graph sparsification for continual graph representation learning. IEEE Transactions on Neural Networks and Learning Systems, 35(12):17398–17410, 2023

  31. [31]

    Towards continuous reuse of graph models via holistic memory diversification

    Ziyue Qiao, Junren Xiao, Qingqiang Sun, Meng Xiao, Xiao Luo, and Hui Xiong. Towards continuous reuse of graph models via holistic memory diversification. In The Thirteenth International Conference on Learning Representations, 2025

  32. [32]

    Continual learning on graphs: Challenges, solutions, and opportunities

    Xikun Zhang, Dongjin Song, and Dacheng Tao. Continual learning on graphs: Challenges, solutions, and opportunities. arXiv preprint arXiv:2402.11565, 2024

  33. [33]

    Online continual graph learning

    Giovanni Donghi, Luca Pasa, Daniele Zambon, Cesare Alippi, and Nicolò Navarin. Online continual graph learning. arXiv preprint arXiv:2508.03283, 2025

  34. [34]

    Inductive representation learning on temporal graphs

    Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Inductive representation learning on temporal graphs. arXiv preprint arXiv:2002.07962, 2020

  35. [35]

    Dysat: Deep neural representation learning on dynamic graphs via self-attention networks

    Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. Dysat: Deep neural representation learning on dynamic graphs via self-attention networks. In Proceedings of the 13th international conference on web search and data mining, pages 519–527, 2020

  36. [36]

    Simulating task-free continual learning streams from existing datasets

    Aristotelis Chrysakis and Marie-Francine Moens. Simulating task-free continual learning streams from existing datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2516–2524, 2023

  37. [37]

    Automating the construction of internet portals with machine learning

    Andrew McCallum, Kamal Nigam, Jason D. M. Rennie, and Kristie Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3:127–163, 2000

  38. [38]

    A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

    Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, and Liudmila Prokhorenkova. A critical look at the evaluation of GNNs under heterophily: Are we really making progress? In The Eleventh International Conference on Learning Representations, 2023

  39. [39]

    Beyond homophily in graph neural networks: Current limitations and effective designs

    Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in neural information processing systems, 33:7793–7804, 2020

  40. [40]

    Semi-Supervised Classification with Graph Convolutional Networks

    Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016

  41. [41]

    Gradient episodic memory for continual learning

    David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30, 2017

  42. [42]

    Memory aware synapses: Learning what (not) to forget

    Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European conference on computer vision (ECCV), pages 139–154, 2018

  43. [43]

    Bare model

    Bare model denotes the backbone GNN without the continual learning technique. Therefore, this can be viewed as the lower bound on the continual learning performance.

  44. [44]

    A-GEM

    A-GEM [10] is an efficient version of GEM [41], which ensures that the average loss for historical tasks does not increase by projecting the gradient of incoming data onto the orthogonal space of the gradient of historical data. We use Reservoir Sampling to select nodes.

  45. [45]

    Experience Replay (ER)

    Experience Replay (ER) [5] selects nodes from the incoming batch to be stored in the memory buffer by Reservoir Sampling, which is a simple yet effective method for CL. New incoming batches for training are then augmented with nodes sampled uniformly from the buffer.

  46. [46]

    Gradient-based Sample Selection (GSS)

    Gradient-based Sample Selection (GSS) [11] selects representative samples from the incoming data stream by measuring the diversity of their gradients. Specifically, it maintains samples whose gradients are less aligned with those already stored in the memory buffer, thereby promoting gradient diversity and reducing redundancy. New batches for training are...

  47. [47]

    Memory Aware Synapses (MAS)*

    Memory Aware Synapses (MAS)* [13] is a task-free version of MAS [42], which adds a detector guiding the model when to update the important weights in a streaming fashion.

  48. [48]

    Sparsified Subgraph Memory (SSM)

    Sparsified Subgraph Memory (SSM) [29] stores representative subgraphs instead of individual nodes to preserve both structural and feature information. It constructs sparsified subgraphs by selecting important nodes based on their contribution to the graph topology to reduce redundancy. Reservoir sampling is used as the sampling strategy.

  49. [49]

    Subgraph Episodic Memory (SEM)

    Subgraph Episodic Memory (SEM) [30] extends subgraph-based memory by introducing a curvature-guided sparsification mechanism. It constructs Subgraph Episodic Memory (SEM) to store computation subgraphs, and further prunes edges based on Ricci curvature to preserve the most informative topological relationships for message passing. This approach reduces red...

  50. [50]

    Diversified Memory Selection and Generation (DMSG)

    Diversified Memory Selection and Generation (DMSG) [31] maintains a diversified memory buffer by jointly considering intra- and inter-class diversity when selecting samples. To adequately reuse the knowledge preserved in the buffer, it utilizes a variational layer to generate the distribution of buffer node embeddings and sample synthesized ones for re...

  51. [51]

    Optimization details

    We do not use dropout or batch normalization. All experiments use Adam with learning rate 5×10⁻³. The mini-batch size is fixed at B=10. Each incoming batch is processed for one epoch before the next batch arrives. No learning-rate scheduling, gradient clipping, or warm-up is applied. Method-specific settings: whenever possible, we fol...
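
Pulling together the optimization details in [51] with the reservoir-sampled replay described in [44] and [45], a minimal sketch of the streaming protocol might look as follows. Only the Adam learning rate 5×10⁻³, the batch size B=10, and the one-epoch-per-batch regime come from the text above; the model, data, and buffer capacity are placeholders.

```python
import random
import torch
import torch.nn as nn

class ReservoirBuffer:
    """Classic reservoir sampling: every item seen so far has equal
    probability of residing in the fixed-size buffer."""
    def __init__(self, capacity=500):
        self.capacity, self.seen, self.items = capacity, 0, []

    def add(self, x, y):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((x, y))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (x, y)

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

# Placeholder model standing in for the backbone GNN.
model = nn.Linear(16, 5)
opt = torch.optim.Adam(model.parameters(), lr=5e-3)  # per [51]
loss_fn = nn.CrossEntropyLoss()
buffer = ReservoirBuffer()

def stream_step(x_batch, y_batch):
    """One ER-style update: one epoch on the incoming batch (B=10),
    augmented with uniformly sampled buffer nodes, per [45] and [51]."""
    replay = buffer.sample(len(x_batch))
    if replay:
        rx = torch.stack([x for x, _ in replay])
        ry = torch.stack([y for _, y in replay])
        x_batch = torch.cat([x_batch, rx])
        y_batch = torch.cat([y_batch, ry])
    opt.zero_grad()
    loss_fn(model(x_batch), y_batch).backward()
    opt.step()
    # No LR scheduling, gradient clipping, or warm-up, per [51].

# Toy stream: ten incoming batches of B=10.
for _ in range(10):
    x, y = torch.randn(10, 16), torch.randint(0, 5, (10,))
    stream_step(x, y)
    for xi, yi in zip(x, y):
        buffer.add(xi, yi)
```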