pith. machine review for the scientific record.

arxiv: 2605.09664 · v2 · submitted 2026-05-10 · 💻 cs.CR · cs.LG

Recognition: 2 theorem links · Lean Theorem

FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:30 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords continual learning · malware detection · parameter interpolation · memory-free learning · catastrophic forgetting · class-incremental learning · domain-incremental learning

The pith

Adaptive layer-wise interpolation between task optima enables memory-free continual learning for malware detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a model can learn new malware samples or domains by adaptively interpolating parameters layer by layer between the current and previous task optima. This approach preserves accuracy on earlier threats without storing any old data or replaying past samples. A sympathetic reader would care because antivirus systems must handle over 200 million new malware samples yearly, yet full retraining on historical data is prohibitively expensive while naive updates create blind spots against older threats. The method exploits the geometric observation that successive warm-started optima remain linked by low-loss paths in parameter space, allowing simple blending to retain prior knowledge. On the EMBER Windows and AZ Android benchmarks, FreeMOCA outperforms eleven baselines in class-incremental settings and achieves the strongest retention, with accuracy gains of up to 42 percent and 37 percent, respectively.

Core claim

FreeMOCA achieves continual learning for malicious code analysis by adaptive layer-wise interpolation between consecutive task optima in parameter space, leveraging low-loss connectivity of warm-started solutions to avoid catastrophic forgetting without memory or replay, and outperforming baselines on Class-IL and Domain-IL settings for EMBER and AZ benchmarks.

What carries the argument

adaptive layer-wise interpolation between consecutive task updates, which exploits low-loss paths connecting warm-started task optima
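
To make the mechanism concrete, here is a minimal sketch of layer-wise interpolation between consecutive task optima in PyTorch. The per-layer weights (`lam_per_layer`), the default coefficient, and the helper name are illustrative assumptions, not the authors' exact adaptive procedure; the released FreeMOCA code is the reference implementation.

```python
import copy
import torch

def interpolate_layerwise(prev_model, curr_model, lam_per_layer, default_lam=0.5):
    """Blend two warm-started task optima parameter by parameter.

    prev_model: model after training on task t-1
    curr_model: same architecture, warm-started from prev_model and fine-tuned on task t
    lam_per_layer: dict mapping parameter name -> interpolation weight
                   (0.0 keeps the previous optimum, 1.0 keeps the new one)
    """
    merged = copy.deepcopy(curr_model)  # buffers (e.g. BatchNorm stats) are kept from curr_model
    prev_params = dict(prev_model.named_parameters())
    with torch.no_grad():
        for name, p_curr in merged.named_parameters():
            lam = lam_per_layer.get(name, default_lam)
            # theta_merged = (1 - lam) * theta_{t-1} + lam * theta_t
            p_curr.copy_((1.0 - lam) * prev_params[name] + lam * p_curr)
    return merged
```

On one reading of Figure 3, the selective variant would correspond to a `lam_per_layer` that blends only early-block parameters at λ = 0.5 and leaves deeper layers at the new task's optimum, though the paper's exact schedule should be taken from its method section.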

If this is right

  • Continual updates to malware classifiers become feasible without storing previous samples or incurring high compute costs for full retraining.
  • Detectors maintain high accuracy on both new and old threat categories, closing exploitable blind spots.
  • The method scales to large benchmarks like EMBER and AZ while reducing forgetting compared to replay-based approaches.
  • Both class-incremental and domain-incremental scenarios benefit, supporting evolving threat landscapes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This geometric interpolation strategy might apply to other continual learning problems where task optima are similarly connected, such as in image classification or natural language processing.
  • If the low-loss path property holds more generally, it could reduce dependence on data replay across security and other dynamic domains.
  • Further tests on additional malware datasets or real-time streaming scenarios would help validate the scalability.

Load-bearing premise

Warm-started task optima in the model parameter space are connected by low-loss paths that permit effective interpolation.
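
Stated in the usual linear-mode-connectivity form (notation ours, not the paper's), the premise says the loss barrier along the straight path between consecutive warm-started optima is negligible on earlier tasks' data:

```latex
% theta*_{t-1}, theta*_t: consecutive warm-started task optima
% L_{<t}: loss on held-out data from tasks before t
\theta(\lambda) = (1-\lambda)\,\theta^{*}_{t-1} + \lambda\,\theta^{*}_{t}, \qquad \lambda \in [0,1],
\qquad
B = \max_{\lambda\in[0,1]} \Big[ \mathcal{L}_{<t}\big(\theta(\lambda)\big)
    - \big((1-\lambda)\,\mathcal{L}_{<t}(\theta^{*}_{t-1}) + \lambda\,\mathcal{L}_{<t}(\theta^{*}_{t})\big) \Big] \approx 0
```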

What would settle it

An experiment where applying the layer-wise interpolation between two consecutive task optima causes a large drop in accuracy on the first task's test set, comparable to or worse than naive fine-tuning.
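
A hedged sketch of that check, reusing the `interpolate_layerwise` helper sketched above: sweep λ along the straight path between the two optima and score the first task's held-out set at each point. `eval_accuracy` and `task1_loader` are placeholders, not names from the paper.

```python
import numpy as np

def accuracy_along_path(prev_model, curr_model, task1_loader, eval_accuracy, num_points=11):
    """Accuracy on the first task's test set at evenly spaced points along
    the straight line between two consecutive task optima.

    A sharp dip away from the endpoints would indicate a loss barrier,
    i.e. the load-bearing connectivity premise fails for this pair of optima.
    """
    lambdas = np.linspace(0.0, 1.0, num_points)
    accs = []
    for lam in lambdas:
        # same lambda for every layer here; pass a dict for per-layer schedules
        merged = interpolate_layerwise(prev_model, curr_model, {}, default_lam=float(lam))
        accs.append(eval_accuracy(merged, task1_loader))
    return lambdas, accs
```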

Figures

Figures reproduced from arXiv: 2605.09664 by Haeseung Jeon, Md Mahmuduzzaman Kamol, Mohammad Saidur Rahman, Se Eun Oh, Sohyun Han, Zahra Asadi.

Figure 2
Figure 2: LMC in warm-started CL. (A) Independent solutions may face high-loss barriers. (B) Warm-starting keeps consecutive task solutions close, enabling low-loss interpolation. (C) FreeMOCA uses this property for replay-free consolidation. view at source ↗
Figure 3
Figure 3: Selective block interpolation analysis with fixed λ = 0.5. Interpolating only Block 1 yields the highest accuracy on both datasets, suggesting early layers, which capture more transferable low-level representations, are better suited for interpolation than deeper, more task-specific layers. (a) EMBER-Class (b) AZ-Class. view at source ↗
Figure 5
Figure 5: Task-wise classification accuracy across … view at source ↗
read the original abstract

As over 200 million new malware samples are identified each year, antivirus systems must continuously adapt to the evolving threat landscape. However, retraining solely on new samples leads to catastrophic forgetting and exploitable blind spots, while retraining on the entire dataset incurs substantial computational cost. We propose FreeMOCA, a memory- and compute-efficient continual learning framework for malicious code analysis that preserves prior knowledge via adaptive layer-wise interpolation between consecutive task updates, leveraging the fact that warm-started task optima are connected by low-loss paths in parameter space. We evaluate FreeMOCA in both class-incremental (Class-IL) and domain-incremental (Domain-IL) settings on large-scale Windows (EMBER) and Android (AZ) malware benchmarks. FreeMOCA achieves substantial gains in Class-IL, outperforming 11 baselines on both EMBER and AZ benchmarks. It also significantly reduces forgetting, achieving the best retention across baselines, and improving accuracy by up to 42% and 37% on EMBER and AZ, respectively. These results demonstrate that warm-started interpolation in parameter space provides a scalable and effective alternative to replay for continual malware detection. Code is available at: https://github.com/IQSeC-Lab/FreeMOCA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes FreeMOCA, a memory- and compute-efficient continual learning framework for malicious code analysis. It preserves prior knowledge by performing adaptive layer-wise interpolation between consecutive task optima in parameter space, based on the assumption that warm-started optima are connected by low-loss paths. The method is evaluated in Class-IL and Domain-IL settings on the EMBER (Windows) and AZ (Android) malware benchmarks, claiming to outperform 11 baselines, achieve the best retention, and deliver accuracy gains of up to 42% on EMBER and 37% on AZ.

Significance. If the low-loss connectivity premise holds for malware models, FreeMOCA offers a practical, replay-free alternative for handling the high volume of new malware samples, which is valuable for resource-constrained antivirus systems. The open-sourced code is a positive factor for reproducibility. The result would be significant for continual learning in security domains if the central assumption is empirically supported.

major comments (2)
  1. [Method section (around the interpolation description)] The central claim in the method description rests on the premise that warm-started task optima are connected by low-loss paths in parameter space, enabling interpolation to avoid forgetting. No empirical verification is provided, such as loss curves or barrier analysis along the interpolation path on held-out prior-task data for the EMBER and AZ feature distributions and model architectures. This check is load-bearing because domain shifts in malware data can induce barriers that would invalidate the interpolation step.
  2. [Evaluation and results section] The quantitative results section reports outperformance over 11 baselines and specific accuracy/retention gains but provides no error bars, statistical significance tests, or full details on baseline implementations and the exact adaptive interpolation procedure. Without these, the robustness of the claimed 42% and 37% improvements cannot be assessed.
minor comments (2)
  1. [Abstract] The abstract could briefly clarify how layer-wise adaptation is computed (e.g., the selection criterion for interpolation weights).
  2. [Results tables and figures] Ensure all tables include standard deviations or confidence intervals and that figure captions fully describe the plotted quantities.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our paper. We address the major comments point by point below, agreeing to incorporate additional analyses and details to enhance the manuscript's rigor.

read point-by-point responses
  1. Referee: [Method section (around the interpolation description)] The central claim in the method description rests on the premise that warm-started task optima are connected by low-loss paths in parameter space, enabling interpolation to avoid forgetting. No empirical verification is provided, such as loss curves or barrier analysis along the interpolation path on held-out prior-task data for the EMBER and AZ feature distributions and model architectures. This check is load-bearing because domain shifts in malware data can induce barriers that would invalidate the interpolation step.

    Authors: We concur that providing empirical support for the low-loss path assumption is important, particularly given potential domain shifts in malware distributions. Although the original manuscript relies on this premise from prior continual learning literature, we will add a new subsection with barrier analysis and loss curves along interpolation paths using held-out data from previous tasks on both EMBER and AZ datasets. This will confirm the absence of significant barriers for the model architectures used. revision: yes

  2. Referee: [Evaluation and results section] The quantitative results section reports outperformance over 11 baselines and specific accuracy/retention gains but provides no error bars, statistical significance tests, or full details on baseline implementations and the exact adaptive interpolation procedure. Without these, the robustness of the claimed 42% and 37% improvements cannot be assessed.

    Authors: We agree that including error bars, statistical tests, and more implementation details will improve the reliability assessment of our results. In the revised version, we will report mean accuracies with standard deviations over 5 random seeds, include paired t-tests or similar for significance against baselines, provide expanded descriptions of how each baseline was implemented (with references to their original papers and our adaptations), and detail the exact adaptive layer-wise interpolation procedure, including how the interpolation coefficients are computed per layer. revision: yes
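
For context on the promised significance analysis, a minimal sketch (not the authors' script) of reporting mean ± standard deviation over seeds and a paired t-test against a baseline; the accuracy arrays below are hypothetical placeholders, and `scipy.stats.ttest_rel` pairs the two methods' runs by seed.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-seed final average accuracies for 5 seeds; real values
# would come from the revised experiments, not from this sketch.
freemoca = np.array([0.78, 0.80, 0.79, 0.81, 0.77])
baseline = np.array([0.55, 0.58, 0.54, 0.57, 0.56])

print(f"FreeMOCA: {freemoca.mean():.3f} +/- {freemoca.std(ddof=1):.3f}")
print(f"Baseline: {baseline.mean():.3f} +/- {baseline.std(ddof=1):.3f}")

t_stat, p_value = ttest_rel(freemoca, baseline)  # paired by random seed
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```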

Circularity Check

0 steps flagged

No circularity: method applies external connectivity property without self-referential reduction

full rationale

The paper frames its core mechanism as leveraging the known property that warm-started task optima lie on low-loss paths, then applies adaptive layer-wise interpolation for continual learning on malware benchmarks. No equations, fitted parameters, or self-citations are shown reducing the claimed predictions or gains to the inputs by construction. The derivation remains an application of an external fact to the EMBER/AZ tasks, with reported empirical outperformance serving as independent validation rather than tautological output.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption of low-loss connectivity between task optima; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Warm-started task optima are connected by low-loss paths in parameter space
    This property is invoked to justify the effectiveness of interpolation without memory replay.

pith-pipeline@v0.9.0 · 5540 in / 1200 out tokens · 40733 ms · 2026-05-15T05:30:21.924811+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 3 internal anchors

  1. [1]

    Git Re-Basin: Merging models modulo permutation symmetries

    Samuel Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git Re-Basin: Merging models modulo permutation symmetries. In International Conference on Learning Representations (ICLR), 2023

  2. [2]

    AndroZoo: Collecting millions of Android apps for the research community

    Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, and Yves Le Traon. AndroZoo: Collecting millions of Android apps for the research community. In International Conference on Mining Software Repositories (MSR), 2016

  3. [3]

    EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

    Hyrum S Anderson and Phil Roth. EMBER: An open dataset for training static PE malware machine learning models. arXiv:1804.04637, 2018

  4. [4]

    Drebin: Effective and explainable detection of android malware in your pocket

    Daniel Arp, Michael Spreitzenbarth, Malte Hubner, Hugo Gascon, and Konrad Rieck. Drebin: Effective and explainable detection of android malware in your pocket. In Network and Distributed System Security Symposium (NDSS), 2014

  5. [5]

    Malware statistics and trends report

    AV-TEST. Malware statistics and trends report. https://www.av-test.org/en/statistics/malware/, 2025

  6. [6]

    Nested learning: The illusion of deep learning architectures

    Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. Nested learning: The illusion of deep learning architectures. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  7. [7]

    Task-aware information routing from common representation space in lifelong learning

    Prashant Bhat, Bahram Zonooz, and Elahe Arani. Task-aware information routing from common representation space in lifelong learning. In International Conference on Learning Representations (ICLR), 2023

  8. [8]

    Optimization methods for large-scale machine learning

    Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 2018

  9. [10]

    Forget forgetting: Continual learning in a world of abundant memory, 2026

    Dongkyu Cho, Taesup Moon, Rumi Chunara, Kyunghyun Cho, and Sungmin Cha. Forget forgetting: Continual learning in a world of abundant memory, 2026. URL https://arxiv.org/abs/2502.07274

  10. [11]

    Beyond the TESSERACT: Trustworthy dataset curation for sound evaluations of android malware classifiers

    Theo Chow, Mario D’Onghia, Lorenz Linhardt, Zeliang Kan, Daniel Arp, Lorenzo Cavallaro, and Fabio Pierazzi. Beyond the TESSERACT: Trustworthy dataset curation for sound evaluations of android malware classifiers. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2026

  11. [12]

    Don't forget, there is more than forgetting: new metrics for Continual Learning

    Natalia Díaz-Rodríguez, Vincenzo Lomonaco, David Filliat, and Davide Maltoni. Don’t forget, there is more than forgetting: new metrics for continual learning. arXiv preprint arXiv:1810.13166, 2018

  12. [13]

    Continual learning beyond a single model

    Thang Doan, Seyed Iman Mirzadeh, and Mehrdad Farajtabar. Continual learning beyond a single model. In Conference on Lifelong Learning Agents (CoLLAs), 2023

  13. [14]

    Essentially no barriers in neural network energy landscape

    Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred Hamprecht. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning (ICML), 2018

  14. [15]

    Linear mode connectivity and the lottery ticket hypothesis

    Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin. Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning (ICML), 2020

  15. [16]

    Catastrophic forgetting in connectionist networks

    Robert M French. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 1999

  16. [17]

    Loss surfaces, mode connectivity, and fast ensembling of DNNs

    Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry P Vetrov, and Andrew G Wilson. Loss surfaces, mode connectivity, and fast ensembling of DNNs. Advances in Neural Information Processing Systems (NeurIPS), 2018

  17. [18]

    CITADEL: A semi-supervised active learning framework for malware detection under continuous distribution drift

    Md Ahsanul Haque, Md Mahmuduzzaman Kamol, Suresh Kumar Amalapuram, Vladik Kreinovich, and Mohammad Saidur Rahman. CITADEL: A semi-supervised active learning framework for malware detection under continuous distribution drift. arXiv preprint arXiv:2511.11979, 2025

  18. [19]

    LAMDA: A longitudinal android malware benchmark for concept drift analysis

    Md Ahsanul Haque, Ismail Hossain, Md Mahmuduzzaman Kamol, Md Jahangir Alam, Suresh Kumar Amalapuram, Sajedul Talukder, and Mohammad Saidur Rahman. LAMDA: A longitudinal android malware benchmark for concept drift analysis. In International Conference on Learning Representations (ICLR), 2026

  19. [20]

    Batch Normalization: Accelerating deep network training by reducing internal covariate shift

    Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015

  20. [21]

    Memory-free continual learning with null space adaptation for zero-shot vision-language models

    Yujin Jo and Taesup Kim. Memory-free continual learning with null space adaptation for zero-shot vision-language models. In International Conference on Learning Representations (ICLR), 2026

  21. [23]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences (PNAS), 2017

  22. [24]

    FireEye MalwareGuard uses machine learning to detect malware

    Eduard Kovacs. FireEye MalwareGuard uses machine learning to detect malware. https://www.securityweek.com/fireeye-malwareguard-uses-machine-learning-detect-malware/, 2018

  23. [25]

    Continual learning with weight interpolation

    Jędrzej Kozal, Jan Wasilewski, Bartosz Krawczyk, and Michał Woźniak. Continual learning with weight interpolation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  24. [26]

    Continual learning in deep networks: an analysis of the last layer

    Timothée Lesort, Thomas George, and Irina Rish. Continual learning in deep networks: an analysis of the last layer. arXiv preprint arXiv:2106.01834, 2021

  25. [27]

    Learning without forgetting

    Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017

  26. [28]

    Activation function design sustains plasticity in continual learning

    Lute Lillo and Nick Cheney. Activation function design sustains plasticity in continual learning. In International Conference on Learning Representations (ICLR), 2026

  27. [29]

    KAN: Kolmogorov-Arnold Networks

    Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756, 2024

  28. [30]

    Deep learning via hessian-free optimization

    James Martens. Deep learning via hessian-free optimization. In International Conference on Machine Learning (ICML), 2010

  29. [31]

    Communication-efficient learning of deep networks from decentralized data

    Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (AISTATS), 2017

  30. [32]

    Linear mode connectivity in multitask and continual learning

    Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, and Hassan Ghasemzadeh. Linear mode connectivity in multitask and continual learning. arXiv preprint arXiv:2010.04495, 2020

  31. [33]

    Linear mode connectivity in multitask and continual learning

    Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu, and Hassan Ghasemzadeh. Linear mode connectivity in multitask and continual learning. In International Conference on Learning Representations (ICLR), 2021

  32. [34]

    What is being transferred in transfer learning?

    Behnam Neyshabur, Hanie Sedghi, and Chiyuan Zhang. What is being transferred in transfer learning? Advances in Neural Information Processing Systems (NeurIPS), 2020

  33. [35]

    MalCL: Leveraging GAN-based generative replay to combat catastrophic forgetting in malware classification

    Jimin Park, AHyun Ji, Minji Park, Mohammad Saidur Rahman, and Se Eun Oh. MalCL: Leveraging GAN-based generative replay to combat catastrophic forgetting in malware classification. In AAAI Conference on Artificial Intelligence, 2025

  34. [36]

    Learn more, but bother less: parameter efficient continual learning

    Fuli Qiao and Mehrdad Mahdavi. Learn more, but bother less: parameter efficient continual learning. Advances in Neural Information Processing Systems (NeurIPS), 2024

  35. [37]

    Transfusion: Understanding transfer learning for medical imaging

    Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, and Samy Bengio. Transfusion: Understanding transfer learning for medical imaging. Advances in Neural Information Processing Systems (NeurIPS), 2019

  36. [38]

    On the limitations of continual learning for malware classification

    Mohammad Saidur Rahman, Scott Coull, and Matthew Wright. On the limitations of continual learning for malware classification. In Conference on Lifelong Learning Agents (CoLLAs), 2022

  37. [39]

    MADAR: Efficient continual learning for malware analysis with distribution-aware replay

    Mohammad Saidur Rahman, Scott Coull, Qi Yu, and Matthew Wright. MADAR: Efficient continual learning for malware analysis with distribution-aware replay. In Conference on Applied Machine Learning in Information Security (CAMLIS), 2025

  38. [40]

    iCaRL: Incremental classifier and representation learning

    Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017

  39. [41]

    Analyzing and reducing catastrophic forgetting in parameter efficient tuning

    Weijieying Ren, Xinlong Li, Lei Wang, Tianxiang Zhao, and Wei Qin. Analyzing and reducing catastrophic forgetting in parameter efficient tuning. arXiv preprint arXiv:2402.18865, 2024

  40. [42]

    Experience replay for continual learning

    David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In Advances in Neural Information Processing Systems (NeurIPS), 2019

  41. [43]

    Understanding concept drift with deprecated permissions in android malware detection

    Ahmed Sabbah, Radi Jarrar, Samer Zein, and David Mohaisen. Understanding concept drift with deprecated permissions in android malware detection. IEEE Transactions on Dependable and Secure Computing (TDSC), 2026

  42. [44]

    Budgeted online continual learning by adaptive layer freezing and frequency-based sampling

    Minhyuk Seo, Hyunseo Koh, and Jonghyun Choi. Budgeted online continual learning by adaptive layer freezing and frequency-based sampling. In International Conference on Learning Representations (ICLR), 2025

  43. [45]

    Sleep-like unsupervised replay reduces catastrophic forgetting in artificial neural networks

    Timothy Tadros, Giri P Krishnan, Ramyaa Ramyaa, and Maxim Bazhenov. Sleep-like unsupervised replay reduces catastrophic forgetting in artificial neural networks. Nature Communications, 2022

  44. [46]

    Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems (NeurIPS), 2017

  45. [47]

    Optimizing mode connectivity via neuron alignment

    Norman Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, and Rongjie Lai. Optimizing mode connectivity via neuron alignment. Advances in Neural Information Processing Systems (NeurIPS), 2020

  46. [48]

    Brain-inspired replay for continual learning with artificial neural networks

    Gido M van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 2020

  47. [49]

    Brain-inspired replay for continual learning with artificial neural networks

    Gido M Van de Ven, Hava T Siegelmann, and Andreas S Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 2020

  48. [50]

    VirusTotal – Stats

    VirusTotal. VirusTotal – Stats. https://www.virustotal.com/gui/stats, 2025

  49. [51]

    Continual test-time domain adaptation

    Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  50. [52]

    Rethinking continual learning with progressive neural collapse

    Zheng Wang, Wanhao Yu, Li Yang, and Sen Lin. Rethinking continual learning with progressive neural collapse. In International Conference on Learning Representations (ICLR), 2026

  51. [53]

    Optimizing mode connectivity for class incremental learning

    Haitao Wen, Haoyang Cheng, Heqian Qiu, Lanxiao Wang, Lili Pan, and Hongliang Li. Optimizing mode connectivity for class incremental learning. In International Conference on Machine Learning (ICML), 2023

  52. [54]

    How transferable are features in deep neural networks?

    Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems (NeurIPS), 2014

  53. [55]

    Continual learning through synaptic intelligence

    Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. Journal of Machine Learning Research (JMLR), 2017

  54. [56]

    Does continual learning equally forget all parameters?

    Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, and Chengqi Zhang. Does continual learning equally forget all parameters? In International Conference on Machine Learning (ICML), 2023

  55. [57]

    Exploring tradeoffs through mode connectivity for multi-task learning

    Zhipeng Zhou, Ziqiao Meng, Pengcheng Wu, Peilin Zhao, and Chunyan Miao. Exploring tradeoffs through mode connectivity for multi-task learning. In Neural Information Processing Systems (NeurIPS), 2026