arxiv: 2509.22483 · v1 · submitted 2025-09-26 · 💻 cs.LG · cs.AI

OFMU: Optimization-Driven Framework for Machine Unlearning

Sadia Asif, Mohammad Mohammadi Amiri

Pith reviewed 2026-05-18 13:01 UTC · model grok-4.3


keywords machine unlearning · bi-level optimization · gradient decorrelation · forgetting efficacy · model utility · convergence guarantees · large language models


The pith

A bi-level optimization framework called OFMU improves the balance between forgetting specific data and retaining model performance in machine unlearning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new way to handle machine unlearning in large models by using a penalty-based bi-level optimization instead of simple weighted loss combinations. It prioritizes forgetting through an inner maximization step that uses a similarity-aware penalty to reduce conflicting gradients with the retention objective. An outer minimization step then restores utility on the kept data. The method includes a two-loop algorithm with theoretical convergence proofs for both convex and non-convex cases. This matters because it could enable more reliable removal of unwanted knowledge like private or copyrighted information while keeping the model useful overall.

Core claim

OFMU is a penalty-based bi-level optimization framework. It enforces forgetting via an inner maximization step whose similarity-aware penalty decorrelates the gradients of the forget and retention objectives, and it restores utility through an outer minimization step. A two-loop algorithm with provable convergence guarantees under convex and non-convex regimes implements the framework, and the paper reports better trade-offs between forgetting efficacy and model utility than prior methods.

What carries the argument

The bi-level optimization structure where the inner loop maximizes forgetting with a similarity-aware penalty for gradient decorrelation and the outer loop minimizes to preserve retention utility.
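To make the two-loop structure concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm: the quadratic losses, the targets, the step sizes, and every function name are assumptions made for illustration, and the similarity-aware penalty is replaced by a simpler stand-in that projects the forget gradient onto the orthogonal complement of the retain gradient.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forget_loss(theta, f):
    # Toy forget loss: 0.5 * ||theta - f||^2 (assumed, not from the paper).
    return 0.5 * sum((t - fi) ** 2 for t, fi in zip(theta, f))

def retain_loss(theta, r):
    # Toy retain loss: 0.5 * ||theta - r||^2 (assumed, not from the paper).
    return 0.5 * sum((t - ri) ** 2 for t, ri in zip(theta, r))

def grad_quadratic(theta, target):
    # Gradient of 0.5 * ||theta - target||^2.
    return [t - c for t, c in zip(theta, target)]

def decorrelate(g_forget, g_retain):
    # Stand-in for the similarity-aware penalty: remove the component of the
    # forget gradient that is parallel to the retain gradient.
    denom = dot(g_retain, g_retain)
    if denom == 0.0:
        return list(g_forget)
    scale = dot(g_forget, g_retain) / denom
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]

def inner_step(theta, f, r, lr_in):
    # One ascent step on the forget loss along the decorrelated direction.
    d = decorrelate(grad_quadratic(theta, f), grad_quadratic(theta, r))
    return [t + lr_in * di for t, di in zip(theta, d)]

def outer_step(theta, r, lr_out):
    # One descent step on the retain loss.
    g = grad_quadratic(theta, r)
    return [t - lr_out * gi for t, gi in zip(theta, g)]

def two_loop(theta, f, r, rounds=5, inner_steps=3, outer_steps=3,
             lr_in=0.1, lr_out=0.1):
    # Alternate an inner forgetting loop with an outer utility-restoring loop.
    for _ in range(rounds):
        for _ in range(inner_steps):
            theta = inner_step(theta, f, r, lr_in)
        for _ in range(outer_steps):
            theta = outer_step(theta, r, lr_out)
    return theta

theta0 = [0.5, 1.0]
forget_target = [0.0, 0.0]
retain_target = [1.0, 0.0]
theta_final = two_loop(theta0, forget_target, retain_target)
```

In this toy setting each inner step raises the forget loss while changing the retain loss only at second order in the step size, which is the behaviour the hierarchical design is meant to encourage; whether the paper's penalty behaves the same way on real models is an empirical question.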

If this is right

Convergence is guaranteed for the two-loop algorithm in both convex and non-convex settings.
Better trade-offs between forgetting efficacy and model utility are achieved compared to prior scalarization methods.
Scalability is ensured for large-scale models through the developed algorithm.
Consistent outperformance is shown on vision and language benchmarks in forgetting and retained utility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The hierarchical structure might help in scenarios with multiple conflicting objectives beyond unlearning.
Future work could test the method on even larger models or in online unlearning settings where data arrives continuously.
Connecting to other optimization techniques could further reduce computational overhead.

Load-bearing premise

The similarity-aware penalty term can decorrelate the gradients of forget and retention objectives in practice without adding too much computation or causing training instabilities.
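One way to probe this premise is to log the cosine similarity between the forget and retain gradients during training. The helper below is a hypothetical diagnostic, not something the paper specifies; gradients are assumed to be available as flat lists of floats.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two flat gradient vectors.
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero gradient is uncorrelated
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

# Made-up gradients, for illustration only.
aligned = cosine_similarity([3.0, 4.0], [4.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposed = cosine_similarity([1.0, 0.0], [-2.0, 0.0])
```

An ascent step on the forget loss moves along the forget gradient, while a descent step on the retain loss moves against the retain gradient, so a positive similarity means the two updates pull against each other. A value near zero is what successful decorrelation would look like under this diagnostic.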

What would settle it

Training a model with the OFMU method on a standard unlearning benchmark and observing no improvement in the forgetting-utility trade-off compared to weighted sum baselines, or seeing the algorithm fail to converge.

Figures

Figures reproduced from arXiv: 2509.22483 by Mohammad Mohammadi Amiri, Sadia Asif.

**Figure 1.** Coupling of unlearning difficulty with collateral utility loss. Harder samples induce disproportionately large utility degradation for existing methods (GA (Thudi et al., 2022), GradDiff (Maini et al., 2024a)), whereas OFMU mitigates this coupling through its similarity-aware hierarchical updates. Full detail is provided in Appendix 7.4.6.

**Figure 2.** Overall normalized performance of unlearning methods on LLaMA-2 and LLaMA-3 under …

**Figure 3.** Overall normalized performance of unlearning methods on CIFAR-10. The score is obtained by normalizing four key metrics (UA, RA, TA, and MIA Efficacy) within each scenario and averaging them into a unified value. Details of the calculation are given in Appendix 7.4.3. This unified view highlights the balance between forgetting and retention across different forget scenarios and shows how OFMU strikes the b…

**Figure 4.** Embedding similarity with the retrain model for easy vs. hard samples. Easy samples correspond to instances where the base model had low initial confidence, while hard samples are high-confidence, entangled instances. Scores are computed as cosine similarity of embeddings with a retrained reference model. OFMU maintains competitive similarity on easy samples and significantly stronger robustness on hard s…

**Figure 5.** Effect of inner-loop steps and penalty parameter …
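The unified score described in the Figure 3 caption (normalize each metric within a scenario, then average) can be sketched as follows. The exact normalization is given in the paper's Appendix 7.4.3, which is not reproduced here, so this sketch assumes min-max scaling across methods and assumes every metric is oriented so that higher is better. The method names and numbers are invented for illustration.

```python
def normalized_scores(results):
    # results: {method_name: {metric_name: value}} for a single scenario.
    metrics = sorted({m for values in results.values() for m in values})
    scores = {name: 0.0 for name in results}
    for metric in metrics:
        column = [results[name][metric] for name in results]
        lo, hi = min(column), max(column)
        span = hi - lo
        for name in results:
            # Arbitrary choice: a metric on which all methods tie scores 1.0.
            norm = 1.0 if span == 0 else (results[name][metric] - lo) / span
            scores[name] += norm / len(metrics)
    return scores

# Hypothetical numbers, not taken from the paper.
toy = {
    "method_a": {"UA": 90, "RA": 80, "TA": 70, "MIA": 60},
    "method_b": {"UA": 50, "RA": 90, "TA": 70, "MIA": 40},
}
scores = normalized_scores(toy)
```

Because min-max scaling is relative to the methods being compared, a score computed this way changes when a method is added or removed, which is worth keeping in mind when reading such aggregate plots.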

Original abstract

Large language models deployed in sensitive applications increasingly require the ability to unlearn specific knowledge, such as user requests, copyrighted materials, or outdated information, without retraining from scratch to ensure regulatory compliance, user privacy, and safety. This task, known as machine unlearning, aims to remove the influence of targeted data (forgetting) while maintaining performance on the remaining data (retention). A common approach is to formulate this as a multi-objective problem and reduce it to a single-objective problem via scalarization, where forgetting and retention losses are combined using a weighted sum. However, this often results in unstable training dynamics and degraded model utility due to conflicting gradient directions. To address these challenges, we propose OFMU, a penalty-based bi-level optimization framework that explicitly prioritizes forgetting while preserving retention through a hierarchical structure. Our method enforces forgetting via an inner maximization step that incorporates a similarity-aware penalty to decorrelate the gradients of the forget and retention objectives, and restores utility through an outer minimization step. To ensure scalability, we develop a two-loop algorithm with provable convergence guarantees under both convex and non-convex regimes. We further provide a rigorous theoretical analysis of convergence rates and show that our approach achieves better trade-offs between forgetting efficacy and model utility compared to prior methods. Extensive experiments across vision and language benchmarks demonstrate that OFMU consistently outperforms existing unlearning methods in both forgetting efficacy and retained utility.
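For orientation, the contrast the abstract draws can be written schematically. The notation is assumed here: $L_f$ and $L_r$ are the forget and retain losses and $\lambda$ is a scalarization weight. Sign and weighting conventions vary between scalarization methods, so the first line is one common form rather than the paper's definition. The second line is the penalty form quoted elsewhere on this page, with $\Phi$ the inner (forgetting) objective and $\rho$ the penalty coefficient.

```latex
% Scalarized (weighted-sum) unlearning: a single objective and a single gradient
\min_{\theta}\; \lambda\, L_r(\theta) \;-\; (1-\lambda)\, L_f(\theta),
\qquad \lambda \in [0, 1]

% Penalty reformulation of the bi-level problem
\min_{\theta}\; F(\theta) \;=\; L_r(\theta) \;+\; \rho\,\bigl\|\nabla_\theta \Phi(\theta)\bigr\|^2
```

In the first form the two gradients are simply summed, so any conflict between them shows up directly in the update. In the second, near-stationarity of the inner objective is enforced through the penalty while the retain loss is minimized.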

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes OFMU, a penalty-based bi-level optimization framework for machine unlearning. It prioritizes forgetting via an inner maximization step incorporating a similarity-aware penalty to decorrelate gradients of forget and retention objectives, followed by an outer minimization to restore utility. A two-loop algorithm is introduced with claimed provable convergence guarantees under convex and non-convex regimes, along with theoretical analysis of convergence rates. Experiments on vision and language benchmarks are reported to show improved trade-offs between forgetting efficacy and model utility relative to prior scalarization-based methods.

Significance. If the convergence analysis and empirical trade-offs hold under the stated assumptions, the work would offer a practically relevant advance in scalable machine unlearning for LLMs by addressing gradient conflicts through an explicitly derived penalty term and providing a hierarchical optimization structure with theoretical backing. The two-loop algorithm and its guarantees constitute a clear methodological contribution over standard weighted-sum approaches.

major comments (2)

[Theoretical analysis] Theoretical analysis section: the convergence rate statements for the non-convex regime are stated to hold under Lipschitz smoothness and bounded variance assumptions, yet the manuscript does not provide the explicit dependence of the rate on the penalty coefficient or verify that the similarity-aware term preserves these assumptions in practice for large-scale models; this is load-bearing for the central claim of provable guarantees.
[Experiments] Experimental section: the reported superior trade-offs (e.g., higher forgetting efficacy with retained utility) are shown via benchmarks, but hyperparameter selection for the penalty coefficient, learning rates in the two loops, and exact controls for baseline methods are not detailed; without these, the empirical support for the weakest assumption (stable decorrelation without overhead) cannot be fully assessed.

minor comments (2)

[Method] Notation for the inner and outer objectives could be clarified with explicit equation numbers when first introduced to aid readability of the bi-level formulation.
[Experiments] Figure captions for the convergence plots should include the specific values of the penalty coefficient used in each run.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the methodological contribution of the bi-level optimization approach. We address each major comment below and indicate the planned revisions.


Referee: Theoretical analysis section: the convergence rate statements for the non-convex regime are stated to hold under Lipschitz smoothness and bounded variance assumptions, yet the manuscript does not provide the explicit dependence of the rate on the penalty coefficient or verify that the similarity-aware term preserves these assumptions in practice for large-scale models; this is load-bearing for the central claim of provable guarantees.

Authors: We acknowledge that the explicit dependence of the non-convex convergence rate on the penalty coefficient is not derived in the current version. The similarity-aware penalty is a smooth quadratic term that preserves the Lipschitz smoothness and bounded-variance assumptions under the stated conditions. To strengthen the central claim, we will add the explicit rate dependence (showing the standard 1/sqrt(T) scaling modulated by the penalty) to the theorem in Section 4 and include a brief empirical verification that the assumptions continue to hold for the large-scale vision and language models used in the experiments. revision: yes

Referee: Experimental section: the reported superior trade-offs (e.g., higher forgetting efficacy with retained utility) are shown via benchmarks, but hyperparameter selection for the penalty coefficient, learning rates in the two loops, and exact controls for baseline methods are not detailed; without these, the empirical support for the weakest assumption (stable decorrelation without overhead) cannot be fully assessed.

Authors: We agree that additional experimental details are necessary for reproducibility and to fully support the claim of stable decorrelation. In the revised manuscript we will expand the experimental section with the grid-search ranges and final values chosen for the penalty coefficient, the inner- and outer-loop learning rates, and precise descriptions of how each baseline was implemented and tuned (including any early-stopping or regularization controls). These additions will allow readers to assess the overhead and stability of the decorrelation effect. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper formulates machine unlearning as a bi-level optimization problem and introduces a similarity-aware penalty term explicitly constructed to decorrelate gradients between forget and retention objectives. The two-loop algorithm and associated convergence analysis are developed from standard optimization principles with stated assumptions for convex and non-convex regimes. No load-bearing step reduces by the paper's own equations to a self-definition, a fitted parameter renamed as prediction, or a self-citation chain. The central claims rest on independent theoretical derivations and empirical benchmarks rather than circular reductions, making the framework self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework depends on the effectiveness of the introduced penalty term and the convergence properties of the two-loop algorithm, both of which require empirical and theoretical validation beyond the abstract.

free parameters (1)

penalty coefficient
Controls the strength of the similarity-aware penalty applied during the inner maximization to decorrelate gradients.

axioms (1)

domain assumption: The two-loop algorithm converges with provable rates under both convex and non-convex regimes.
Invoked to justify scalability and theoretical support for the method.

invented entities (1)

similarity-aware penalty (no independent evidence)
purpose: Decorrelates gradients of forget and retention objectives in the inner maximization step.
New term introduced to mitigate conflicting gradient directions in the multi-objective unlearning problem.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: penalty-based bi-level optimization framework … inner maximization step that incorporates a similarity-aware penalty … two-loop algorithm with provable convergence guarantees under both convex and non-convex regimes

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Paper passage: $F(\theta) = L_r(\theta) + \rho\|\nabla_\theta \Phi(\theta)\|^2$ … stationarity condition $\nabla_\theta \Phi(\theta) = 0$
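A one-dimensional toy case shows how the penalty in the passage above pushes the minimizer toward stationarity of the inner objective as $\rho$ grows. The quadratics are assumptions chosen so the minimizer has a closed form: $L_r(\theta) = \tfrac{1}{2}(\theta - a)^2$ and $\Phi(\theta) = \tfrac{1}{2}(\theta - b)^2$, so $F(\theta) = \tfrac{1}{2}(\theta - a)^2 + \rho(\theta - b)^2$. Setting $F'(\theta) = 0$ gives $\theta^*_\rho = (a + 2\rho b)/(1 + 2\rho)$.

```python
def penalized_minimizer(a, b, rho):
    # Closed-form minimizer of F(theta) = 0.5*(theta - a)**2 + rho*(theta - b)**2.
    return (a + 2.0 * rho * b) / (1.0 + 2.0 * rho)

def inner_gradient(theta, b):
    # Gradient of the toy inner objective Phi(theta) = 0.5*(theta - b)**2.
    return theta - b

# Stationarity residual |Phi'(theta*_rho)| for increasing penalty strength.
residuals = [
    abs(inner_gradient(penalized_minimizer(1.0, 0.0, rho), 0.0))
    for rho in (0.0, 0.5, 49.5)
]
```

For $a = 1$, $b = 0$ the residual equals $1/(1 + 2\rho)$, so it shrinks from 1 to 0.5 to 0.01 across the three values. It reaches zero only in the limit, which is why the choice of penalty coefficient matters in practice.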

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 4 internal anchors

[1] Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, and Mingyi Hong. Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate. arXiv preprint arXiv:2410.22086.
[2] … Accessed: 2025-09-16.
Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pp. 463–480. IEEE. doi: 10.1109/SP.2015.35.
[3] Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, and Chiyuan Zhang. Quantifying memorization across neural language models. In The Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, May. OpenReview.net.
[4] Jiaao Chen and Diyi Yang. Unlearn what you want to forget: Efficient unlearning for LLMs. arXiv preprint arXiv:2310.20150.
[5] Minseok Choi, Daniel Rim, Dohyun Lee, and Jaegul Choo. Snap: Unlearning selective knowledge in large language models with negative instructions. arXiv preprint arXiv:2406.12329.
[6] … doi: 10.1007/s10479-007-0176-2.
Quang-Vinh Dang. Right to be forgotten in the age of machine learning. In Advances in Digital Science: ICADS 2021, pp. 403–411. Springer.
[7] … doi: 10.1007/978-3-030-52119-6.
Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, and Ivan Vulic. Undial: Self-distillation with adjusted logits for robust unlearning in large language models. arXiv preprint arXiv:2402.10052.
[8] Ronen Eldan and Mark Russinovich. Who's Harry Potter? Approximate unlearning in LLMs. arXiv preprint arXiv:2310.02238, October 2023.
[9] … URL https://eur-lex.europa.eu/eli/reg/2016/679/oj. Regulation (EU) 2016/679.
Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity prevails: Rethinking negative preference optimization for LLM unlearning. arXiv preprint arXiv:2410.07163.
[10] … doi: 10.1109/TPAMI.2021.3079209.
James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, and Muhao Chen. Offset unlearning for large language models. Transactions on Machine Learning Research, May.
[11] Jie Huang, Hanyin Shao, and Kevin Chen-Chuan Chang. Are large pre-trained language models leaking your personal information? In Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 2038–2047, Abu Dhabi, United Arab Emirates, December.
[12] … URL https://doi.org/10.48550/arXiv.2304.04934. Spotlight.
Anastasia Koloskova, Youssef Allouah, Animesh Jha, Rachid Guerraoui, and Sanmi Koyejo. Certified unlearning for neural networks. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, Vancouver, Canada.
[13] … Technical Report.
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Z... The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
[14] Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, and Yang Liu. Large language model unlearning via embedding-corrupted prompts. arXiv preprint arXiv:2406.07933, 2024a.
Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, and Yang Liu. Large language model unlearning via embedding-corrupted prompts. arXiv preprint arXiv:2406.07933, 2024b.
Aleksander Madry, Aleksan... URL https://openreview.net/forum?id=rJzIBfZAb.
[15] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, and J. Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. arXiv preprint arXiv:2401.06121, 2024a.
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, and J. Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. In Int...
[16] Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, and Elita Lobo. Alternate preference optimization for unlearning factual knowledge in large language models. arXiv preprint arXiv:2409.13474.
[17] Andrei Muresanu, Anvith Thudi, Michael R Zhang, and Nicolas Papernot. Unlearnable algorithms for in-context learning. arXiv preprint arXiv:2402.00751.
[19] On First-Order Meta-Learning Algorithms. arXiv:1803.02999.
Office of the Privacy Commissioner of Canada. Announcement: Privacy commissioner seeks federal court determination on key issue for Canadians' online reputation. https://www.priv.gc.ca/en/opc-news/news-and-announcements/2018/an_181010/, Oct. Accessed: 2025-09-16.
[20] Martin Pawelczyk, Seth Neel, and Himabindu Lakkaraju. In-context unlearning: Language models as few shot unlearners. arXiv preprint arXiv:2310.07579.
[21] Yuanshun Yao, Xiaojun Xu, and Yang Liu. Large language model unlearning. arXiv preprint arXiv:2310.10683.
[22] Binchi Zhang, Yushun Dong, Tianhao Wang, and Jundong Li. Towards certified unlearning for deep neural networks. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 of Proceedings of Machine Learning Research, Vienna, Austria, 2024a. PMLR.
Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun...
[23] … URL http://proceedings.mlr.press/v97/zhang19p.html.
R. Zhang, L. Lin, Y. Bai, and S. Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. In Conference on Learning on Large Language Models (COLM), 2024b.
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, and Javier Rando. An adversarial perspect...
[24] Then, for sufficiently large $\rho$, the penalty term $\rho\|\nabla_\theta \Phi(\theta^*_\rho)\|^2$ would dominate $F(\theta^*_\rho)$, causing it to diverge to infinity, which contradicts the assumption that $F(\theta^*_\rho)$ is minimized and bounded below. Therefore, it must be that $\nabla_\theta \Phi(\theta^*) = 0$ for any accumulation point $\theta^*$ of the minimizers as $\rho \to \infty$. 7.2.2 Proof of Lemma 2. Proof. Let $d^{(t)} := \theta^*_{\mathrm{in}} - \theta'^{(t)}$. By convexit...
[25] Table 9: OFMU component ablation (TOFU forget05). Higher is better.

| Variant | FQ ↑ | MU ↑ | Hard-sample Emb. Sim. ↑ |
| --- | --- | --- | --- |
| Penalty only (no similarity-aware) | 0.36 | 0.53 | 0.71 |
| Two-loop only (no penalty) | 0.33 | 0.54 | 0.69 |
| Full OFMU | 0.38 | 0.54 | 0.73 |

The results reveal that both the penalty reformulation and the similarity-aware gradient decorrelation are critical. Removing similarity-...
[26] … and Influence Unlearning (IU) (Mehta et al., 2022). 7.4.12 Models and Experimental Setup. For TOFU, we evaluate two model architectures: LLaMA-2-7B-hf-chat and LLaMA-3.2-1B-Instruct, while WMDP experiments are carried out on Zephyr-7B-beta. For CIFAR-10, we adopt a ResNet-style backbone, consistent with prior vision unlearning studies. All experiments are...
[27] … as a framework for removing the influence of specific training instances from a trained model. Early approaches to machine unlearning focused on exact unlearning, which requires retraining the model from scratch after excluding the forget set (Bourtoule et al., 2021). While these methods provide strong correctness guarantees, retraining is computational...
[28] … propose Alternate Preference Optimization, which combines negative feedback on forget examples with positive in-domain alternatives, yielding more coherent behavior than refusal-only tuning. These methods require careful data construction for each unlearning task and risk semantic drift or factual incoherence. Additionally, performance on unrelated domains...
[29] 7.6 Hessian-Vector Product via Automatic Differentiation. The penalty term in our formulation requires computing the Hessian-vector product

$\nabla_\theta^2 \Phi(\theta_{\mathrm{in}}^{(k)})\, \nabla_\theta \Phi(\theta_{\mathrm{in}}^{(k)})$, (42)

where $\nabla_\theta \Phi(\theta_{\mathrm{in}}^{(k)}) \in \mathbb{R}^d$ is the gradient of the inner objective and $\nabla_\theta^2 \Phi(\theta_{\mathrm{in}}^{(k)}) \in \mathbb{R}^{d \times d}$ is its Hessian matrix. A naive approach would explicitly construct the Hessian and then perform a matrix-vector multiplication, which incurs $O(d^2)$ time and memory complexity.
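The Hessian-vector product in the last excerpt arises because differentiating the penalty gives $\nabla_\theta\bigl(\rho\|\nabla_\theta \Phi(\theta)\|^2\bigr) = 2\rho\, \nabla_\theta^2 \Phi(\theta)\, \nabla_\theta \Phi(\theta)$. The sketch below shows the matrix-free idea in plain Python, with one caveat stated up front: it uses a central finite difference of the gradient as a stand-in for the automatic-differentiation route the excerpt refers to. The toy objective and the function names are assumptions for illustration.

```python
def grad_phi(theta):
    # Gradient of the toy objective Phi(theta) = 0.5 * theta^T A theta
    # with A = [[2, 1], [1, 3]] (an assumed example, not from the paper).
    x, y = theta
    return [2.0 * x + 1.0 * y, 1.0 * x + 3.0 * y]

def hvp_central_difference(grad_fn, theta, v, eps=1e-5):
    # Approximates H(theta) @ v from two gradient evaluations,
    # without ever forming the d x d Hessian.
    plus = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad_fn([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

theta = [1.0, -1.0]
g = grad_phi(theta)                                # gradient of Phi at theta
hg = hvp_central_difference(grad_phi, theta, g)    # approximates A @ g
```

Both this finite-difference version and an autodiff implementation keep memory linear in the number of parameters. The autodiff version avoids the step-size sensitivity of `eps`, which is presumably why the paper uses it for large models.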