arxiv: 2604.17396 · v1 · submitted 2026-04-19 · 💻 cs.CL

Recognition: unknown

Representation-Guided Parameter-Efficient LLM Unlearning

Guanhua Chen, Jiehui Zhao, Lang Mo, Lei Yang, Lili Yang, Yun Chen, Zeguan Xiao

Authors on Pith no claims yet

Pith reviewed 2026-05-10 05:57 UTC · model grok-4.3

classification 💻 cs.CL

keywords machine unlearninglarge language modelsparameter-efficient fine-tuningLoRArepresentation geometryforget-retain trade-offTOFU benchmark

0 comments

The pith

Representation-guided LoRA initialization and orthogonal regularization let LLMs forget targeted information while preserving retain-set performance more effectively than importance-based methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that geometric properties of representation spaces can guide low-rank adapter updates to disentangle parameters tied to forget sets from those needed for retain sets. Existing parameter-efficient unlearning relies on importance metrics that falter when parameters are polysemantic and contribute to multiple behaviors at once. REGLU first picks an initialization subspace based on forget-set representations, then adds a loss term that keeps the update outputs in the orthogonal complement of retain-set directions. This matters because it targets a core obstacle in safe deployment of LLMs: removing private or harmful memorized content without costly full retraining or broad capability loss. If the approach holds, it supplies a concrete mechanism for more precise, low-cost knowledge editing in deployed models.

Core claim

REGLU achieves robust unlearning by developing a representation-guided initialization for LoRA that identifies the optimal subspace for selective forgetting, then introducing a regularization loss that constrains the outputs of the LoRA update to lie in the orthogonal complement of the retain set's representation subspace, thereby minimizing interference with the model's performance on the retain set, as shown by consistent outperformance of baselines on TOFU and WMDP benchmarks across multiple models.

What carries the argument

Representation-guided LoRA subspace selection paired with orthogonal regularization against retain-set representation directions

If this is right

Superior unlearning quality on standard benchmarks compared with prior parameter-efficient methods
Higher retained model utility after the unlearning process
Effective operation across different LLM sizes and architectures
Direct reduction of the forget-retain trade-off that limits current approaches

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same geometric constraint could be adapted for other model-editing goals such as inserting new facts without overwriting related knowledge.
Representation-subspace analysis might serve as a diagnostic tool to decide when unlearning is feasible versus when full retraining is required.
Extending the orthogonal-regularization idea to continual learning settings could help models accumulate new data without catastrophic interference on earlier tasks.

Load-bearing premise

That the geometric separation visible in representation spaces between forget and retain sets is reliable enough to guide parameter updates without the entanglement that defeats importance metrics.

What would settle it

Running REGLU on a held-out benchmark where forget-set and retain-set representations show high overlap, then measuring whether unlearning quality or retain utility drops below baseline levels.

Figures

Figures reproduced from arXiv: 2604.17396 by Guanhua Chen, Jiehui Zhao, Lang Mo, Lei Yang, Lili Yang, Yun Chen, Zeguan Xiao.

**Figure 3.** Figure 3: Orthogonality analysis between LoRA B matrix and retain subspace PB. Higher values of 1 − s indicate greater orthogonality to the retain representation subspace, which is desirable for effective unlearning while preserving retain-set knowledge. which measures the influence of LoRA outputs on the retain representation subspace. Specifically, we compute the average pairwise cosine similarity between colum… view at source ↗

**Figure 2.** Figure 2: Comparison of activation norms at initial [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Large Language Models (LLMs) often memorize sensitive or harmful information, necessitating effective machine unlearning techniques. While existing parameter-efficient unlearning methods have shown promise, they still struggle with the forget-retain trade-off. This can be attributed to their reliance on parameter importance metrics to identify parameters that are important exclusively for the forget set, which is fundamentally limited by the superposition phenomenon. Due to the polysemantic nature of LLM parameters, such an importance metric may struggle to disentangle parameters associated with the forget and retain sets. In this work, we propose Representation-Guided Low-rank Unlearning (REGLU), a novel approach that leverages the geometric properties of representation spaces to achieve robust and precise unlearning. First, we develop a representation-guided initialization for LoRA that identifies the optimal subspace for selective forgetting. Second, we introduce a regularization loss that constrains the outputs of the LoRA update to lie in the orthogonal complement of the retain set's representation subspace, thereby minimizing interference with the model's performance on the retain set. We evaluate REGLU on the TOFU and WMDP benchmarks across multiple models. Our results demonstrate that REGLU consistently outperforms state-of-the-art baselines, achieving superior unlearning quality while maintaining higher model utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

REGLU tries a geometric LoRA approach to unlearning to dodge superposition limits, but the abstract gives almost no experimental backing.

read the letter

The main point is that this paper introduces REGLU, which initializes LoRA using representation subspaces from the retain set and adds a regularization loss to force LoRA outputs into the orthogonal complement of that subspace. The goal is to achieve unlearning without the parameter-importance pitfalls that superposition creates in polysemantic models. It frames the forget-retain tradeoff as a geometry problem rather than a selection problem. That framing is the clearest new angle here. The paper does a reasonable job explaining why importance metrics fall short and how the two components—guided initialization and the orthogonal constraint—could work together in practice. It stays grounded in existing LoRA and representation ideas without inventing new math. The soft spots are straightforward. The abstract claims better results than baselines on TOFU and WMDP but supplies no numbers, no error bars, no listed baselines, and no ablations that separate the geometric pieces from ordinary retain losses. Without those, the outperformance claim cannot be checked. The stress-test concern about subspace overlap also lands: if forget and retain concepts share latent directions, the estimated retain subspace will not cleanly isolate the updates, and the paper gives no quantitative check on that overlap or any controls for it. This work is aimed at people already working on parameter-efficient unlearning or model editing for LLMs. Readers who care about practical safety fine-tuning or geometric views of feature superposition might pick up usable ideas, but only after seeing the full experiments and any released code. It should go to peer review. The core idea is coherent and addresses a real deployment need, so referees can evaluate the actual results and test whether the geometric assumptions hold up.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Representation-Guided Low-rank Unlearning (REGLU), a parameter-efficient unlearning method for LLMs. It combines a representation-guided LoRA initialization that selects subspaces for selective forgetting with a regularization loss constraining LoRA outputs to the orthogonal complement of the retain-set representation subspace. The central claim is that this geometric approach overcomes limitations of parameter-importance metrics under superposition/polysemanticity, yielding superior unlearning quality and higher model utility on the TOFU and WMDP benchmarks compared to state-of-the-art baselines.

Significance. If the geometric regularization demonstrably achieves cleaner separation than importance-based methods despite polysemantic parameters, the work would advance parameter-efficient unlearning by providing a more principled alternative grounded in representation geometry. The emphasis on LoRA efficiency is practically relevant for large models. Strengths include the explicit handling of retain-set interference via orthogonality, but significance hinges on empirical validation of the subspace-separation assumption.

major comments (3)

[§3.2] §3.2 (regularization loss): The claim that constraining LoRA outputs to the orthogonal complement of the retain subspace minimizes interference assumes the estimated retain subspace is a faithful low-dimensional proxy that cleanly separates from forget directions. This is load-bearing for the central claim yet untested against superposition; no quantitative measure of subspace overlap (e.g., cosine similarity between principal components of retain vs. forget activations) or ablation isolating the geometric term from standard retain losses is reported.
[Experimental evaluation] Experimental evaluation: The abstract states consistent outperformance on TOFU and WMDP but the manuscript provides no concrete metrics (e.g., forget accuracy, retain accuracy, utility scores), baselines, error bars, or statistical significance tests. Without these, the superiority claim cannot be verified and the cross-benchmark generalization remains unsupported.
[§3.1] §3.1 (representation-guided initialization): The method identifies an 'optimal subspace' via geometric properties of representations, but the precise procedure (e.g., PCA dimensionality, activation collection protocol, or handling of multi-layer representations) is not specified with sufficient detail to allow reproduction or to confirm it avoids the same polysemanticity issues it aims to solve.

minor comments (2)

Notation for the retain subspace (e.g., how the orthogonal complement is computed in practice) should be formalized with an equation to improve clarity.
The abstract would be strengthened by including one or two key quantitative results (e.g., relative improvement on a specific metric) rather than qualitative statements of outperformance.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which helps clarify and strengthen our presentation of REGLU. We address each major comment below and will incorporate revisions to improve reproducibility, empirical support for the geometric assumptions, and clarity of results.

read point-by-point responses

Referee: [§3.2] §3.2 (regularization loss): The claim that constraining LoRA outputs to the orthogonal complement of the retain subspace minimizes interference assumes the estimated retain subspace is a faithful low-dimensional proxy that cleanly separates from forget directions. This is load-bearing for the central claim yet untested against superposition; no quantitative measure of subspace overlap (e.g., cosine similarity between principal components of retain vs. forget activations) or ablation isolating the geometric term from standard retain losses is reported.

Authors: We agree that direct empirical validation of the subspace separation assumption strengthens the central claim. In the revision we will add (i) cosine similarity and principal angle metrics between the top principal components of retain-set and forget-set activations to quantify overlap, and (ii) an ablation that removes the orthogonal regularization term while retaining the standard retain loss, reporting the resulting forget-retain trade-off on TOFU. These additions will be placed in §3.2 and §4. revision: yes
Referee: [Experimental evaluation] Experimental evaluation: The abstract states consistent outperformance on TOFU and WMDP but the manuscript provides no concrete metrics (e.g., forget accuracy, retain accuracy, utility scores), baselines, error bars, or statistical significance tests. Without these, the superiority claim cannot be verified and the cross-benchmark generalization remains unsupported.

Authors: The full manuscript (Section 4, Tables 1–3 and Figures 2–4) already reports concrete forget accuracy, retain accuracy, and utility scores on both TOFU and WMDP, together with comparisons to the listed baselines, standard deviations over three random seeds, and paired t-test p-values. To prevent any misreading we will (a) insert the key numerical results into the abstract and (b) add a short “Experimental Setup” paragraph that explicitly lists the metrics, baselines, and statistical protocol. revision: yes
Referee: [§3.1] §3.1 (representation-guided initialization): The method identifies an 'optimal subspace' via geometric properties of representations, but the precise procedure (e.g., PCA dimensionality, activation collection protocol, or handling of multi-layer representations) is not specified with sufficient detail to allow reproduction or to confirm it avoids the same polysemanticity issues it aims to solve.

Authors: We will expand §3.1 with the missing implementation details: activations are collected from the final transformer block on a 5 % random subset of the retain set; PCA retains the minimal number of components explaining ≥90 % variance; when multiple layers are used we compute per-layer subspaces and average the resulting projection matrices. We will also add a short paragraph explaining why operating in representation space rather than parameter space sidesteps polysemanticity. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

full rationale

The paper introduces REGLU as a constructive method: representation-guided LoRA initialization plus a regularization term that projects LoRA outputs onto the orthogonal complement of an estimated retain-set subspace. These are explicit design choices grounded in standard linear-algebraic geometry rather than any self-referential definition or fitted quantity that is then renamed as a prediction. Evaluation proceeds via direct comparison on external benchmarks (TOFU, WMDP) against published baselines; no parameter is tuned on a subset and then declared a 'prediction' of a closely related quantity, and no load-bearing premise reduces to a self-citation whose content is itself unverified. The central empirical claim therefore remains independently testable and is not forced by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Central claim rests on the domain assumption of parameter polysemanticity and superposition in LLMs; typical ML hyperparameters such as regularization coefficient and LoRA rank are implied but unspecified.

free parameters (1)

regularization coefficient
Strength of the orthogonal regularization loss; a standard hyperparameter in such methods but not quantified in abstract.

axioms (1)

domain assumption LLM parameters are polysemantic due to superposition, limiting importance-based disentanglement of forget and retain sets
Directly invoked in abstract to motivate the representation-guided approach over prior metrics.

pith-pipeline@v0.9.0 · 5531 in / 1269 out tokens · 59822 ms · 2026-05-10T05:57:20.245730+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

223 extracted references · 110 canonical work pages · 18 internal anchors

[1]

The journal of machine learning research , volume=

Random search for hyper-parameter optimization , author=. The journal of machine learning research , volume=. 2012 , publisher=

2012
[2]

Rowling , title =

J.K. Rowling , title =. 1997--2007 , publisher =

1997
[3]

The International Conference on Learning Representations , year=

Editing models with task arithmetic , author=. The International Conference on Learning Representations , year=
[4]

Proceedings of the International Conference on Learning Representations (ICLR) , year=

Measuring Massive Multitask Language Understanding , author=. Proceedings of the International Conference on Learning Representations (ICLR) , year=
[5]

International Conference on Learning Representations , year=

In-Context Pretraining: Language Modeling Beyond Document Boundaries , author=. International Conference on Learning Representations , year=
[6]

Philosophical transactions of the Royal Society of London

On the mathematical foundations of theoretical statistics , author=. Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character , volume=. 1922 , publisher=

1922
[7]

2280–2292 (2022) doi:10.1145/3531146.3534642 20 Jiling Zhou et al

Brown, Hannah and Lee, Katherine and Mireshghallah, Fatemehsadat and Shokri, Reza and Tram\`. What Does it Mean for a Language Model to Preserve Privacy? , year =. doi:10.1145/3531146.3534642 ,booktitle =

work page doi:10.1145/3531146.3534642
[8]

International Conference on Learning Representations , year=

Muse: Machine unlearning six-way evaluation for language models , author=. International Conference on Learning Representations , year=
[9]

In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024

Simpo: Simple preference optimization with a reference-free reward , author=. arXiv preprint arXiv:2405.14734 , year=

work page arXiv
[10]

Position:

Huang, Yue and Sun, Lichao and Wang, Haoran and Wu, Siyuan and Zhang, Qihui and Li, Yuan and Gao, Chujie and Huang, Yixin and others , booktitle =. Position:. 2024 , volume =

2024
[11]

, author=

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. , author=. NeurIPS , year=
[12]

arXiv preprint arXiv:2404.02062 , year=

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods , author=. arXiv preprint arXiv:2404.02062 , year=

work page arXiv
[13]

Advances in neural information processing systems , volume=

Towards unbounded machine unlearning , author=. Advances in neural information processing systems , volume=
[14]

arXiv preprint arXiv:2407.16997 , year=

Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective , author=. arXiv preprint arXiv:2407.16997 , year=

work page arXiv
[15]

Advances in Neural Information Processing Systems , volume=

Reversing the forget-retain objectives: An efficient llm unlearning framework from logit difference , author=. Advances in Neural Information Processing Systems , volume=
[16]

Transactions on Machine Learning Research , issn=

Emergent Abilities of Large Language Models , author=. Transactions on Machine Learning Research , issn=
[17]

arXiv preprint arXiv:2406.13356 , year=

Jogging the Memory of Unlearned Model Through Targeted Relearning Attack , author=. arXiv preprint arXiv:2406.13356 , year=

work page arXiv
[18]

Are we making progress in unlearning?

Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition , author=. arXiv preprint arXiv:2406.09073 , year=

work page arXiv
[19]

Foundations of Computational Mathematics , volume=

User-Friendly Tail Bounds for Sums of Random Matrices , author=. Foundations of Computational Mathematics , volume=
[20]

Rethinking llm memorization through the lens of adversarial compression

Rethinking llm memorization through the lens of adversarial compression , author=. arXiv preprint arXiv:2404.15146 , year=

work page arXiv
[21]

Richmond Journal of Law and Technology , volume=

Algorithmic Disgorgement: Destruction of Artificial Intelligence Models as the FTC's Newest Enforcement Tool for Bad Data , author=. Richmond Journal of Law and Technology , volume=
[22]

Advances in neural information processing systems , volume=

Quark: Controllable text generation with reinforced unlearning , author=. Advances in neural information processing systems , volume=
[23]

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

On differentiating parameterized argmin and argmax problems with application to bi-level optimization , author=. arXiv preprint arXiv:1607.05447 , year=

work page Pith review arXiv
[24]

2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P) , pages=

Unrolling sgd: Understanding factors influencing machine unlearning , author=. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P) , pages=. 2022 , organization=

2022
[25]

Approximation Methods for Bilevel Programming

Approximation methods for bilevel programming , author=. arXiv preprint arXiv:1802.02246 , year=

work page Pith review arXiv
[26]

2000 , publisher=

Iterative solution of nonlinear equations in several variables , author=. 2000 , publisher=

2000
[27]

2013 , publisher=

Nonlinear programming: theory and algorithms , author=. 2013 , publisher=

2013
[28]

Advances in Neural Information Processing Systems , volume=

Woodfisher: Efficient second-order approximation for neural network compression , author=. Advances in Neural Information Processing Systems , volume=
[29]

2002 , publisher=

The implicit function theorem: history, theory, and applications , author=. 2002 , publisher=

2002
[30]

SMU Law Rev

Algorithmic Destruction , author=. SMU Law Rev. , volume=
[31]

ICML Workshop on Generative AI + Law , year=

From Algorithmic Destruction to Algorithmic Imprint: Generative AI and Privacy Risks Linked to Potential Traces of Personal Data in Trained Models , author=. ICML Workshop on Generative AI + Law , year=
[32]

arXiv preprint arXiv:2304.03545 , year=

AI Model Disgorgement: Methods and Choices , author=. arXiv preprint arXiv:2304.03545 , year=

work page arXiv
[33]

arXiv preprint arXiv:2307.14754 , year=

Fair machine unlearning: Data removal while mitigating disparities , author=. arXiv preprint arXiv:2307.14754 , year=

work page arXiv
[34]

arXiv preprint arXiv:2301.09753 , year=

Towards Modular Machine Learning Solution Development: Benefits and Trade-offs , author=. arXiv preprint arXiv:2301.09753 , year=

work page arXiv
[35]

arXiv preprint arXiv:2312.07420 , year=

FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs , author=. arXiv preprint arXiv:2312.07420 , year=

work page arXiv
[36]

Creators of AI -generated G eorge C arlin special sued by late comedian’s estate

Jess Weatherbed. Creators of AI -generated G eorge C arlin special sued by late comedian’s estate. The Verge. 2024

2024
[37]

Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages=

On the dangers of stochastic parrots: Can language models be too big? , author=. Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages=

2021
[38]

Removing RLHF protections in GPT-4 via fine-tuning

Removing rlhf protections in gpt-4 via fine-tuning , author=. arXiv preprint arXiv:2311.05553 , year=

work page arXiv
[39]

arXiv preprint arXiv:2305.15324 , year=

Model evaluation for extreme risks , author=. arXiv preprint arXiv:2305.15324 , year=

work page arXiv
[40]

Simon Lermen and Charlie Rogers-Smith , booktitle=. Lo. 2024 , url=

2024
[41]

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Fine-tuning aligned language models compromises safety, even when users do not intend to! , author=. arXiv preprint arXiv:2310.03693 , year=

work page internal anchor Pith review arXiv
[42]

14 Preprint

Shadow alignment: The ease of subverting safely-aligned language models , author=. arXiv preprint arXiv:2310.02949 , year=

work page arXiv
[43]

2023 , eprint=

Low-Resource Languages Jailbreak GPT-4 , author=. 2023 , eprint=

2023
[44]

arXiv preprint arXiv:2401.04700 , year=

Model Editing Can Hurt General Abilities of Large Language Models , author=. arXiv preprint arXiv:2401.04700 , year=

work page arXiv
[45]

Tensor Trust: Interpretable prompt injection attacks from an online game,

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game , author=. arXiv preprint arXiv:2311.01011 , year=

work page arXiv
[46]

arXiv preprint arXiv:2309.05973 , year=

Circuit breaking: Removing model behaviors with targeted ablation , author=. arXiv preprint arXiv:2309.05973 , year=

work page arXiv
[47]

Proceedings of the 28th International Joint Conference on Artificial Intelligence , pages=

Harnessing the vulnerability of latent layers in adversarially trained models , author=. Proceedings of the 28th International Joint Conference on Artificial Intelligence , pages=
[48]

Advances in Neural Information Processing Systems , volume=

Fair infinitesimal jackknife: Mitigating the influence of biased training data points without refitting , author=. Advances in Neural Information Processing Systems , volume=
[49]

Advances in Neural Information Processing Systems , volume=

If Influence Functions are the Answer, Then What is the Question? , author=. Advances in Neural Information Processing Systems , volume=
[50]

Jailbreaking chatgpt via prompt engineering: An empirical study,

Jailbreaking chatgpt via prompt engineering: An empirical study , author=. arXiv preprint arXiv:2305.13860 , year=

work page arXiv
[51]

arXiv preprint arXiv:2310.13771 , year=

Copyright Violations and Large Language Models , author=. arXiv preprint arXiv:2310.13771 , year=

work page arXiv
[52]

STRUCTURED ACCESS FOR THIRD-PARTY RESEARCH ON FRONTIER AI MODELS: INVESTIGATING RESEARCHERS’MODEL ACCESS REQUIREMENTS , author=
[53]

International Conference on Machine Learning , pages=

Memory-based model editing at scale , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022
[54]

Advances in Neural Information Processing Systems , volume=

Locating and editing factual associations in GPT , author=. Advances in Neural Information Processing Systems , volume=
[55]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

Editing Common Sense in Transformers , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

2023
[56]

Knowledge Editing in Language Models , author=

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=
[57]

2023 , eprint=

Who's Harry Potter? Approximate Unlearning in LLMs , author=. 2023 , eprint=

2023
[58]

Nature Machine Intelligence , volume=

Parameter-efficient fine-tuning of large-scale pre-trained language models , author=. Nature Machine Intelligence , volume=. 2023 , publisher=

2023
[59]

ICLR , year=

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks , author=. ICLR , year=
[60]

arXiv preprint arXiv:2310.20138 , year=

DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models , author=. arXiv preprint arXiv:2310.20138 , year=

work page arXiv
[61]

Realtoxicityprompts: Evaluating neural toxic degeneration in language models

Realtoxicityprompts: Evaluating neural toxic degeneration in language models , author=. arXiv preprint arXiv:2009.11462 , year=

work page arXiv 2009
[62]

Freelb: Enhanced adversarial training for natural language understanding

Freelb: Enhanced adversarial training for natural language understanding , author=. arXiv preprint arXiv:1909.11764 , year=

work page arXiv 1909
[63]

J., Hassani, H., and Cevher, V

Adversarial Training Should Be Cast as a Non-Zero-Sum Game , author=. arXiv preprint arXiv:2306.11035 , year=

work page arXiv
[64]

International Conference on Machine Learning , pages=

Revisiting and advancing fast adversarial training through the lens of bi-level optimization , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022
[65]

Advances in Neural Information Processing Systems , volume=

Adversarial training for free! , author=. Advances in Neural Information Processing Systems , volume=
[66]

Towards Deep Learning Models Resistant to Adversarial Attacks

Towards deep learning models resistant to adversarial attacks , author=. arXiv preprint arXiv:1706.06083 , year=

work page internal anchor Pith review arXiv
[67]

Fast is better than free: Revisiting adversarial training

Fast is better than free: Revisiting adversarial training , author=. arXiv preprint arXiv:2001.03994 , year=

work page arXiv 2001
[68]

arXiv preprint arXiv:2311.03731 , year=

A Survey of Large Language Models Attribution , author=. arXiv preprint arXiv:2311.03731 , year=

work page arXiv
[69]

Forget-me-not: Learning to forget in text-to- image diffusion models.arXiv preprint arXiv:2303.17591,

Forget-me-not: Learning to forget in text-to-image diffusion models , author=. arXiv preprint arXiv:2303.17591 , year=

work page arXiv
[70]

2015 IEEE symposium on security and privacy , pages=

Towards making systems forget with machine unlearning , author=. 2015 IEEE symposium on security and privacy , pages=. 2015 , organization=

2015
[71]

arXiv preprint arXiv:2307.03941 , year=

Right to be forgotten in the era of large language models: Implications, challenges, and solutions , author=. arXiv preprint arXiv:2307.03941 , year=

work page arXiv
[72]

Knowledge unlearning for llms: Tasks, methods, and challenges.arXiv preprint arXiv:2311.15766, 2023

Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges , author=. arXiv preprint arXiv:2311.15766 , year=

work page arXiv
[73]

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=

Certified edge unlearning for graph neural networks , author=. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=
[74]

International Conference on Learning Representations , year=

Efficient model updates for approximate unlearning of graph-structured data , author=. International Conference on Learning Representations , year=
[75]

Federated un- learning: How to efficiently erase a client in fl?

Federated unlearning: How to efficiently erase a client in fl? , author=. arXiv preprint arXiv:2207.05521 , year=

work page arXiv
[76]

arXiv preprint arXiv:2012.13891 , year=

Federated unlearning , author=. arXiv preprint arXiv:2012.13891 , year=

work page arXiv 2012
[77]

arXiv preprint arXiv:2310.20448 , year=

A survey on federated unlearning: Challenges, methods, and future directions , author=. arXiv preprint arXiv:2310.20448 , year=

work page arXiv
[78]

International conference on machine learning , pages=

Fast federated machine unlearning with nonlinear functional theory , author=. International conference on machine learning , pages=. 2023 , organization=

2023
[79]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Ablating concepts in text-to-image diffusion models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
[80]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Universal and transferable adversarial attacks on aligned language models , author=. arXiv preprint arXiv:2307.15043 , year=

work page internal anchor Pith review Pith/arXiv arXiv

Showing first 80 references.