pith. machine review for the scientific record.

arxiv: 2605.11804 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.CV

Recognition: 2 theorem links

· Lean Theorem

Stop Marginalizing My Dreams: Model Inversion via Laplace Kernel for Continual Learning

Jacek Tabor, Łukasz Struski, Marek Śmieja, Patryk Krukowski, Przemysław Spurek

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 07:14 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords continual learning · model inversion · data-free · Laplace kernel · covariance modeling · catastrophic forgetting · synthetic data

The pith

Modeling feature correlations via Laplace kernel improves data-free continual learning by generating higher-fidelity synthetic samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that existing data-free continual learning methods fail because they model feature distributions with diagonal covariance, which ignores correlations that shape the geometry of representations and leads to low-quality pseudo-samples. It introduces REMIX as a way to parameterize full covariance using a Laplace kernel, so that memory scales linearly with feature dimension and computation grows only by a logarithmic factor. This change produces more coherent synthetic data and raises performance on standard benchmarks. A sympathetic reader would see the result as evidence that respecting feature dependencies is necessary for scalable continual learning without access to old data.

Core claim

We show that modeling feature dependencies is a key ingredient for effective DFCIL. We introduce REMIX, a structured covariance modeling framework that enables scalable full-covariance modeling without the prohibitive cost of dense matrix inversion and log-determinant computation. By leveraging a Laplace kernel parameterization, REMIX captures structured feature dependencies using memory that scales linearly with the feature dimensionality, while requiring only an additional logarithmic factor in computation. Modeling these correlations produces more coherent synthetic samples and consistently improves performance across standard DFCIL benchmarks.

What carries the argument

Laplace kernel parameterization of the covariance matrix, which encodes feature correlations with linear memory cost instead of dense matrix operations.
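The memory and compute claims can be made concrete with a small numerical sketch. It assumes the simplest instance of the idea: a Laplace kernel over feature indices, K[i,j] = σᵢσⱼ·exp(−γ|i−j|), with a single global length-scale γ. This is an illustrative form, not necessarily the paper's exact parameterization; in this special case the precision matrix factorizes bidiagonally, so the exact Gaussian log-likelihood costs O(C) memory and time rather than the O(C²) storage and O(C³) inversion of a dense covariance.

```python
import numpy as np

def laplace_kernel_cov(C, gamma, sigma):
    """Dense C x C covariance K[i,j] = sigma[i]*sigma[j]*exp(-gamma*|i-j|).
    Materialized only for checking; the fast path below never builds it."""
    idx = np.arange(C)
    return np.outer(sigma, sigma) * np.exp(-gamma * np.abs(idx[:, None] - idx[None, :]))

def laplace_nll(x, mu, gamma, sigma):
    """Exact Gaussian negative log-likelihood under the covariance above,
    in O(C) time and memory. For an exponential kernel on a 1-D index set
    the precision factorizes as Q = L^T D^{-1} L, with L unit
    lower-bidiagonal (subdiagonal -rho) and D diagonal."""
    z = (x - mu) / sigma                 # strip per-feature scales
    rho = np.exp(-gamma)                 # adjacent-feature correlation
    d = np.empty_like(z)
    d[0] = 1.0
    d[1:] = 1.0 - rho ** 2
    Lz = np.empty_like(z)                # apply L to z without forming L
    Lz[0] = z[0]
    Lz[1:] = z[1:] - rho * z[:-1]
    quad = np.sum(Lz ** 2 / d)           # z^T Kz^{-1} z
    logdet = np.sum(np.log(d)) + 2.0 * np.sum(np.log(sigma))
    return 0.5 * (quad + logdet + len(z) * np.log(2.0 * np.pi))
```

Sanity check: for small C this matches the dense NLL computed with `np.linalg.solve` and `np.linalg.slogdet` to machine precision, while storing only σ and γ, i.e. C + 1 numbers instead of C(C+1)/2. In this toy form the overhead over a diagonal model is even below the logarithmic factor quoted above.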

If this is right

  • Synthetic samples retain more task knowledge because correlations between features are preserved.
  • Full-covariance modeling becomes practical for high-dimensional representations without quadratic memory.
  • Performance gains appear consistently across standard data-free continual learning benchmarks.
  • Diagonal assumptions are shown to be a limiting factor that must be removed for further progress.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same kernel approach could be tested on other generative tasks that currently rely on diagonal noise assumptions.
  • Alternative kernels might reveal different correlation structures that further improve retention.
  • Combining this inversion method with replay buffers or regularization techniques might compound the gains.

Load-bearing premise

The Laplace kernel parameterization captures the relevant feature correlations without introducing artifacts or requiring task-specific tuning that would break scalability.

What would settle it

Reverting to diagonal covariance while keeping all other components of REMIX produces no drop in synthetic sample quality or benchmark accuracy.
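The proposed control can be phrased as a small property test. Under the illustrative form K[i,j] = σᵢσⱼ·exp(−γ|i−j|) (hypothetical here, not the paper's code), driving γ toward infinity zeroes every off-diagonal correlation, so the structured likelihood must collapse exactly onto the diagonal baseline; the ablation then asks whether sample quality and accuracy follow.

```python
import numpy as np

def gaussian_nll(x, mu, K):
    """Dense Gaussian negative log-likelihood, used as the reference."""
    r = x - mu
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (r @ np.linalg.solve(K, r) + logdet + len(x) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
C = 8
sigma = rng.uniform(0.5, 2.0, C)
x, mu = rng.normal(size=C), np.zeros(C)

idx = np.arange(C)
# gamma = 50 makes every off-diagonal term ~exp(-50) ~ 2e-22: numerically diagonal.
K_laplace = np.outer(sigma, sigma) * np.exp(-50.0 * np.abs(idx[:, None] - idx[None, :]))
K_diag = np.diag(sigma ** 2)

assert abs(gaussian_nll(x, mu, K_laplace) - gaussian_nll(x, mu, K_diag)) < 1e-9
```

Because the diagonal model is a limit point of the kernel family, any gain REMIX reports over the diagonal ablation is attributable to the correlation structure itself rather than to a different model class.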

Figures

Figures reproduced from arXiv: 2605.11804 by Jacek Tabor, Łukasz Struski, Marek Śmieja, Patryk Krukowski, Przemysław Spurek.

Figure 1: Overview of the shared feature extraction pipeline in DFCIL used by the proposed […]
Figure 2: Construction of the proposed REMIX covariance matrix.
Figure 3: Qualitative comparison of synthetic samples generated using diagonal covariance modeling […]
Figure 5: Sensitivity to λF on CUB-200 (ViT + MoE-Adapter). We analyze the sensitivity of REMIX to the Frobenius regularization weight λF by sweeping values in the range [10⁻⁴, 5·10⁻²]. The study is conducted on the CUB-200 dataset using a ViT backbone with the MoE-Adapter framework. The average incremental accuracy […]
Figure 4: Log-likelihood comparison between diagonal and full-feature covariance models across […]
Figure 6: Task-wise performance comparison on CIFAR-100 (ResNet-32). Solid lines denote mean […]
Figure 7: Task-wise performance on Tiny-ImageNet with a ResNet-32 backbone. Left: accuracy on […]
Figure 8: Qualitative comparison of generated samples using the ViT backbone. Left: prior […]
Figure 9: Qualitative comparison of generated samples using the ResNet-34 backbone. Left: prior […]
Figure 10: Log-likelihood comparison between diagonal and full-feature covariance models across […]
Original abstract

Data-free continual learning (DFCIL) relies on model inversion to synthesize pseudo-samples and mitigate catastrophic forgetting. However, existing inversion methods are fundamentally limited by a simplifying assumption: they model feature distributions using diagonal covariance, effectively ignoring correlations that define the geometry of learned representations. As a result, synthesized samples often lack fidelity, limiting knowledge retention. In this work, we show that modeling feature dependencies is a key ingredient for effective DFCIL. We introduce REMIX, a structured covariance modeling framework that enables scalable full-covariance modeling without the prohibitive cost of dense matrix inversion and log-determinant computation. By leveraging a Laplace kernel parameterization, REMIX captures structured feature dependencies using memory that scales linearly with the feature dimensionality, while requiring only an additional logarithmic factor in computation. Modeling these correlations produces more coherent synthetic samples and consistently improves performance across standard DFCIL benchmarks. Our results demonstrate that moving beyond diagonal assumptions is essential for effective and scalable data-free continual learning. Our code is available at https://github. com/pkrukowski1/REMIX-Model-Inversion-via-Laplace-Kernel.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that data-free continual learning (DFCIL) is limited by diagonal-covariance assumptions in model inversion, which ignore feature correlations. It introduces REMIX, a Laplace-kernel parameterization that enables scalable structured full-covariance modeling with linear memory cost and only logarithmic extra compute, producing higher-fidelity synthetic samples and consistent benchmark gains. The central thesis is that moving beyond diagonal assumptions is essential for effective and scalable DFCIL.

Significance. If the empirical claims hold, the work would establish that structured covariance modeling is a key missing ingredient in DFCIL and supply a practical, memory-efficient mechanism to incorporate it. The linear-memory kernel approach could become a standard building block for future inversion-based continual-learning methods.

major comments (2)
  1. [Abstract] Abstract: the assertion that REMIX performs 'scalable full-covariance modeling' and that 'moving beyond diagonal assumptions is essential' is undercut by the fact that the Laplace kernel imposes a specific positive-definite structure (typically of the form exp(−γ‖·‖)) rather than an arbitrary covariance matrix. Without an eigenvalue-spectrum or approximation-error analysis showing that this structure can recover general feature correlations, the broader claim that any departure from diagonal covariance is necessary does not follow.
  2. [Abstract] Abstract: the statement that REMIX 'consistently improves performance across standard DFCIL benchmarks' is presented without any quantitative numbers, tables, ablation results, or error analysis. Because these results are the sole empirical support for the central claim, their absence prevents verification of effect size, statistical significance, or whether gains arise from the kernel structure itself rather than from better-conditioned sampling.
minor comments (1)
  1. [Abstract] The GitHub link in the abstract contains an extraneous space ('https://github. com/pkrukowski1/REMIX-Model-Inversion-via-Laplace-Kernel'); this should be corrected for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below, clarifying the scope of our claims about the Laplace kernel and committing to revisions that strengthen the presentation of our empirical results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that REMIX performs 'scalable full-covariance modeling' and that 'moving beyond diagonal assumptions is essential' is undercut by the fact that the Laplace kernel imposes a specific positive-definite structure (typically of the form exp(−γ‖·‖)) rather than an arbitrary covariance matrix. Without an eigenvalue-spectrum or approximation-error analysis showing that this structure can recover general feature correlations, the broader claim that any departure from diagonal covariance is necessary does not follow.

    Authors: We appreciate the referee's observation that the Laplace kernel induces a specific structured covariance rather than an arbitrary full matrix. REMIX is explicitly designed to parameterize structured (non-diagonal) covariances via the kernel, enabling dense feature correlations at linear memory cost; this is what we mean by 'scalable full-covariance modeling' in contrast to the diagonal assumption used in prior DFCIL work. The Laplace kernel is a standard positive-definite choice in Gaussian processes that can capture a range of correlation geometries through its length-scale parameters. While the current manuscript relies on empirical evidence that this structure produces higher-fidelity inversions and better retention, we agree that additional discussion of its approximation properties would be valuable. We will add a paragraph in the revised manuscript discussing the spectral properties of the Laplace kernel and its ability to model feature dependencies beyond the diagonal case. revision: partial

  2. Referee: [Abstract] Abstract: the statement that REMIX 'consistently improves performance across standard DFCIL benchmarks' is presented without any quantitative numbers, tables, ablation results, or error analysis. Because these results are the sole empirical support for the central claim, their absence prevents verification of effect size, statistical significance, or whether gains arise from the kernel structure itself rather than from better-conditioned sampling.

    Authors: We agree that the abstract would be strengthened by including quantitative highlights. The full manuscript contains tables and ablations (including comparisons to diagonal baselines, kernel ablations, and error bars across multiple runs) demonstrating consistent gains on standard DFCIL benchmarks. To address the referee's concern, we will revise the abstract to include specific quantitative statements referencing the magnitude of improvements and the experimental controls that isolate the contribution of the structured covariance. revision: yes
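The spectral discussion the authors promise can be previewed in a few lines. The snippet below is illustrative only: γ is a hypothetical length-scale, and the paper's kernel may act on learned feature distances rather than raw indices. It checks the two properties debated above: the Laplace-kernel Gram matrix is strictly positive definite for any γ > 0, yet its Toeplitz structure confines it to a low-dimensional family of spectra, so it cannot represent an arbitrary covariance.

```python
import numpy as np

def laplace_gram(C, gamma):
    # Toeplitz Gram matrix K[i,j] = exp(-gamma * |i - j|)
    idx = np.arange(C)
    return np.exp(-gamma * np.abs(idx[:, None] - idx[None, :]))

C = 64
for gamma in (0.1, 1.0, 5.0):
    w = np.linalg.eigvalsh(laplace_gram(C, gamma))
    assert w.min() > 0  # strictly positive definite for any gamma > 0
    print(f"gamma={gamma}: eigenvalues in [{w.min():.4f}, {w.max():.4f}]")

# An arbitrary SPD covariance has C(C+1)/2 degrees of freedom; this kernel
# has one (gamma), plus per-feature scales if added -- O(C) in total.
```

This is exactly the gap the referee identifies: positive definiteness comes for free, but an approximation-error analysis is needed to say which correlation geometries the family can and cannot reach.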

Circularity Check

0 steps flagged

No circularity: explicit new parameterization independent of target fits

full rationale

The paper introduces REMIX as a novel structured covariance framework that adopts a Laplace kernel parameterization to model feature dependencies scalably. This is an explicit design choice presented in the abstract and method, not a quantity derived from or fitted to the target data by construction. No self-citations are invoked as load-bearing for the core premise, no uniqueness theorems are imported, and no 'predictions' reduce to renamed fitted inputs. Empirical gains on DFCIL benchmarks are claimed from the new modeling approach rather than tautological redefinitions. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract provides limited technical detail; the central claim rests on the domain assumption that structured covariance via Laplace kernel improves inversion fidelity, with no free parameters or invented entities explicitly named.

axioms (1)
  • domain assumption Laplace kernel parameterization captures the necessary feature dependencies for high-fidelity model inversion
    Invoked as the core modeling choice that replaces diagonal covariance.

pith-pipeline@v0.9.0 · 5518 in / 1124 out tokens · 81008 ms · 2026-05-13T07:14:55.209290+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
