BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning
Pith reviewed 2026-05-10 15:03 UTC · model grok-4.3
The pith
BID-LoRA uses three separate low-rank adapter paths plus escape unlearning to add new knowledge and delete old knowledge while limiting leakage across cycles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BID-LoRA is a parameter-efficient framework that applies three dedicated low-rank adapter pathways (retain, new, and unlearn) to attention layers together with escape unlearning that repositions forget-class embeddings at maximum distance from retained knowledge, thereby satisfying the three CLU goals of precise deletion, efficient integration, and minimal leakage while updating only 5 percent of parameters.
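The abstract gives no equations for escape unlearning, but the description ("repositions forget-class embeddings at maximum distance from retained knowledge") admits a plausible formalization. The symbols below are illustrative, not taken from the paper: an encoder $f_\theta$, a forget set $D_f$, and retained-class prototypes $\mu_1,\dots,\mu_K$:

```latex
% One plausible escape-unlearning objective (illustrative; not from the paper).
% Minimizing L_escape maximizes each forget embedding's distance to its
% nearest retained-class prototype.
\mathcal{L}_{\text{escape}}
  \;=\;
  -\,\frac{1}{|D_f|}\sum_{x \in D_f}\,
  \min_{k \in \{1,\dots,K\}}
  \bigl\lVert f_\theta(x) - \mu_k \bigr\rVert_2^2
```

In practice such an objective would be combined with a retention loss on the retain pathway, since an unconstrained repulsion term alone could distort the shared representation.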
What carries the argument
Three dedicated adapter pathways (retain, new, unlearn) combined with escape unlearning that maximizes the distance of forget-class embeddings from retained knowledge.
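The paper does not specify how the three pathways are combined or routed, so the following is a minimal numpy sketch under the assumption of an additive LoRA-style update on a single frozen attention projection. All names (`lora_pair`, the three pathway matrices, the scaling convention) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4     # model width, adapter rank (the ledger's free parameters)
alpha = 8.0      # LoRA-style scaling; effective scale is alpha / r

W = rng.normal(size=(d, d))  # frozen attention projection (not updated)

def lora_pair(d, r, rng):
    # Standard LoRA init: A small Gaussian, B zero, so each pathway
    # contributes nothing until trained.
    return rng.normal(size=(r, d)) * 0.01, np.zeros((d, r))

A_retain, B_retain = lora_pair(d, r, rng)
A_new, B_new = lora_pair(d, r, rng)
A_unlearn, B_unlearn = lora_pair(d, r, rng)

def forward(x):
    # Hypothetical additive combination; the actual routing among the
    # retain/new/unlearn pathways is not described in the abstract.
    delta = B_retain @ A_retain + B_new @ A_new + B_unlearn @ A_unlearn
    return x @ (W + (alpha / r) * delta).T

x = rng.normal(size=(2, d))
print(forward(x).shape)  # (2, 64)

# Trainable fraction for this one layer; at realistic widths (e.g. d=768)
# this shrinks toward a few percent, consistent in spirit with the paper's
# stated 5% budget.
frac = 3 * (A_retain.size + B_retain.size) / W.size
print(frac)
```

Because the `B` matrices start at zero, the combined update is the identity at initialization; training then specializes each pathway's low-rank delta.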
If this is right
- Multiple cycles of adding and removing knowledge can be performed without progressive degradation of retained performance.
- Real-world identity systems can enroll new users and remove withdrawn users while updating only 5 percent of parameters.
- Unified CLU training reduces reliance on separate continual-learning and machine-unlearning pipelines.
- The same model design applies to both image classification and face-recognition tasks without structural changes.
Where Pith is reading between the lines
- The escape-unlearning distance mechanism could be tested on language models to forget specific training examples while acquiring new capabilities.
- Repeated-cycle stability might allow deployment in privacy-regulated environments where user data must be added and deleted on demand.
- The three-pathway design could be combined with other low-rank methods to handle larger base models.
Load-bearing premise
The three separate adapter pathways and escape unlearning together will block knowledge leakage and preserve retained-task accuracy without new instabilities or dataset-specific retuning.
What would settle it
If accuracy on retained classes declines or forgotten data can be reconstructed after several adaptation cycles on CIFAR-100 or CASIA-Face100, the leakage-prevention claim is falsified.
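The falsification condition above can be sketched as a simple drift check over per-cycle retained-class accuracy. The function, tolerance, and accuracy values are hypothetical, for illustration only:

```python
def retained_accuracy_drift(acc_by_cycle, tolerance=0.02):
    """acc_by_cycle: retained-class accuracy measured after each CLU cycle.

    Returns (max_drop, leaked): leaked=True means accuracy fell more than
    `tolerance` below the first-cycle baseline, falsifying the
    leakage-prevention claim under this (illustrative) criterion.
    """
    baseline = acc_by_cycle[0]
    max_drop = max(baseline - a for a in acc_by_cycle)
    return max_drop, max_drop > tolerance

# Stable cycles, consistent with the paper's claim:
drop, leaked = retained_accuracy_drift([0.91, 0.90, 0.91, 0.90])
print(drop, leaked)   # small drop, not flagged

# Progressive degradation, the failure mode the claim rules out:
drop, leaked = retained_accuracy_drift([0.91, 0.88, 0.85, 0.80])
print(drop, leaked)   # large drop, flagged as leakage
```

A full test would pair this with a reconstruction or membership-inference probe on the forgotten classes, since accuracy retention alone does not show the deleted data is unrecoverable.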
Original abstract
Recent advances in deep learning underscore the need for systems that can not only acquire new knowledge through Continual Learning (CL) but also remove outdated, sensitive, or private information through Machine Unlearning (MU). However, while CL methods are well-developed, MU techniques remain in early stages, creating a critical gap for unified frameworks that depend on both capabilities. We find that naively combining existing CL and MU approaches results in knowledge leakage: a gradual degradation of foundational knowledge across repeated adaptation cycles. To address this, we formalize Continual Learning Unlearning (CLU) as a unified paradigm with three key goals: (i) precise deletion of unwanted knowledge, (ii) efficient integration of new knowledge while preserving prior information, and (iii) minimizing knowledge leakage across cycles. We propose Bi-Directional Low-Rank Adaptation (BID-LoRA), a novel framework featuring three dedicated adapter pathways (retain, new, and unlearn) applied to attention layers, combined with escape unlearning that pushes forget-class embeddings to positions maximally distant from retained knowledge, updating only 5% of parameters. Experiments on CIFAR-100 show that BID-LoRA outperforms CLU baselines across multiple adaptation cycles. We further evaluate on CASIA-Face100, a curated face recognition subset, demonstrating practical applicability to real-world identity management systems where new users must be enrolled and withdrawn users removed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes BID-LoRA as a parameter-efficient framework for the unified Continual Learning Unlearning (CLU) paradigm. It introduces three dedicated low-rank adapter pathways (retain, new, and unlearn) applied to attention layers, combined with an escape unlearning strategy that pushes forget-class embeddings maximally distant from retained knowledge while updating only 5% of parameters. The central claims are that this prevents knowledge leakage across repeated adaptation cycles and outperforms existing CLU baselines on CIFAR-100, with additional evaluation on CASIA-Face100 demonstrating applicability to real-world identity management.
Significance. If the empirical results and stability claims hold, the work could be significant for addressing the gap between continual learning and machine unlearning in a single efficient framework. The 5% parameter update budget and explicit handling of leakage via dedicated pathways and embedding distancing offer a practical advance for privacy-sensitive continual adaptation tasks, such as enrolling and removing users in face recognition systems. The architectural separation of pathways is a clear strength if it can be shown to avoid the instabilities noted in naive combinations.
major comments (2)
- [Abstract] The claim that 'Experiments on CIFAR-100 show that BID-LoRA outperforms CLU baselines across multiple adaptation cycles' supplies no quantitative metrics, baseline names, error bars, leakage-measurement protocol, or cycle counts, leaving the paper's load-bearing empirical claim unverifiable.
- [Method (escape unlearning)] The strategy of maximizing embedding distance for forget classes assumes the retain and new adapters maintain stable decision boundaries without interference or gradient conflicts in a continual regime. No analysis of cross-pathway interactions, retained-class embedding distances, or boundary shifts under the 5% update constraint (applied only to attention layers) is provided, which directly undermines the no-leakage and stability guarantees.
minor comments (1)
- [Introduction] The three CLU goals are listed but could be more explicitly tied to the architectural choices in the introduction for improved clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to improve clarity and completeness.
Point-by-point responses
Referee: [Abstract] The claim that 'Experiments on CIFAR-100 show that BID-LoRA outperforms CLU baselines across multiple adaptation cycles' supplies no quantitative metrics, baseline names, error bars, leakage-measurement protocol, or cycle counts, leaving the paper's load-bearing empirical claim unverifiable.
Authors: We agree that the abstract would be strengthened by including more specific details. In the revised manuscript, we will update the abstract to reference key quantitative results from the CIFAR-100 experiments, including accuracy metrics with error bars, the specific CLU baselines used for comparison, the leakage measurement protocol, and the number of adaptation cycles evaluated. Given typical abstract length constraints, we will prioritize the most salient metrics while ensuring the full experimental protocols, tables, and figures remain detailed in the Experiments section. revision: yes
Referee: [Method (escape unlearning)] The strategy of maximizing embedding distance for forget classes assumes the retain and new adapters maintain stable decision boundaries without interference or gradient conflicts in a continual regime. No analysis of cross-pathway interactions, retained-class embedding distances, or boundary shifts under the 5% update constraint (applied only to attention layers) is provided, which directly undermines the no-leakage and stability guarantees.
Authors: We acknowledge this as a valid observation regarding the depth of analysis in the current draft. Although the empirical results demonstrate reduced leakage across cycles, the manuscript does not explicitly analyze cross-pathway interactions or boundary dynamics. In the revision, we will add a new analysis subsection (likely in Experiments or an extended Methods discussion) that quantifies: (i) embedding distances for both forget and retain classes before and after escape unlearning, (ii) potential interference or gradient conflicts among the retain/new/unlearn pathways, and (iii) decision boundary stability under the 5% parameter budget restricted to attention layers. This will include additional metrics, visualizations, and discussion to support the stability claims. revision: yes
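The interference analysis promised here could start from a simple diagnostic: pairwise cosine similarity between the gradients each pathway's loss induces on shared parameters. The function and toy gradient values below are illustrative sketches, not from the paper:

```python
import numpy as np

def grad_conflict(g1, g2):
    """Cosine similarity between two pathway gradients on shared parameters.

    Values near -1 mean the two objectives (e.g. retain vs. unlearn) pull
    the shared base in opposing directions, one concrete signal the
    promised cross-pathway analysis could report per layer.
    """
    g1, g2 = np.ravel(g1), np.ravel(g2)
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

# Toy gradients for illustration; real values would come from backprop
# through the retain and unlearn losses.
g_retain = np.array([1.0, 0.5, -0.2])
g_unlearn = np.array([-1.0, -0.5, 0.2])
print(grad_conflict(g_retain, g_unlearn))  # close to -1: fully conflicting
```

Tracking this quantity per layer across adaptation cycles, alongside retained-class embedding distances, would directly address the referee's stability concern.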
Circularity Check
No circularity: BID-LoRA is an empirical architecture proposal with no self-referential derivations.
Full rationale
The paper defines BID-LoRA as a new three-pathway adapter architecture plus escape unlearning, then reports empirical outperformance on CIFAR-100 and CASIA-Face100. No equations, uniqueness theorems, or first-principles results are presented that reduce by construction to quantities defined in terms of the method's own fitted parameters or prior self-citations. The central claims rest on experimental comparisons rather than tautological redefinitions or load-bearing self-references.
Axiom & Free-Parameter Ledger
free parameters (1)
- adapter rank and scaling
axioms (2)
- domain assumption: Independent low-rank adapters applied to attention layers can be trained without mutual interference or base-model degradation.
- ad hoc to paper: Maximizing embedding distance for forget classes achieves precise deletion without collateral damage to retained knowledge.
invented entities (2)
- retain, new, and unlearn adapter pathways (no independent evidence)
- escape unlearning (no independent evidence)
Forward citations
Cited by 1 Pith paper
- BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models
  BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning to reach 1% ASR and 99% clean accuracy while preserving watermarks.