pith. machine review for the scientific record.

arxiv: 2604.23046 · v1 · submitted 2026-04-24 · 💻 cs.LG · cs.IT · cs.SI · math.IT · stat.ML

Shape of Memory: a Geometric Analysis of Machine Unlearning in Second-Order Optimizers

Pith reviewed 2026-05-08 12:09 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · cs.SI · math.IT · stat.ML
keywords machine unlearning · second-order optimizers · geometric memory · optimizer state volatility · eigendecay · data deletion · counterfactual model

The pith

Current machine unlearning definitions are underspecified for second-order optimizers because they miss residual information in the optimizer state.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares first-order and second-order learners on a data deletion task. Both approaches match an ideal counterfactual model in final performance and gradient behavior. Second-order methods, however, display ongoing volatility in their internal state, which the authors interpret as retained geometric information about the deleted data. Controlled eigendecay perturbations that erase this geometric structure restore stability and confirm information loss.

Core claim

While first- and second-order optimizers both realign with the ideal counterfactual after data deletion in terms of performance and gradients, second-order optimizers exhibit significant volatility in the optimizer state. This volatility signals residual information about the supposedly deleted data that first-order analysis cannot detect. Only eigendecay treatments that apply controlled state perturbation to erase geometric memory restore stability and achieve full information loss.

What carries the argument

Volatility in the second-order optimizer state after data deletion, detected through eigendecay treatments that mimic the loss model's memory and erase its geometric structure via controlled perturbation.
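The paper does not publish its volatility metric, but the quantity it gestures at can be sketched: track a distance between the optimizer's curvature state and the ideal counterfactual's state over post-deletion steps, and score volatility as the dispersion of step-to-step changes. The function name and the Frobenius-norm choice below are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def state_volatility(states, reference):
    """Per-step Frobenius distance between an optimizer's curvature
    state and a counterfactual reference state, plus a volatility score
    (std of step-to-step changes in that distance). `states` is a list
    of matrices recorded after deletion; `reference` is the state of a
    model retrained without the deleted data."""
    distances = np.array([np.linalg.norm(s - reference, ord="fro") for s in states])
    # Volatility: how much the distance curve jumps around, not its level.
    volatility = float(np.std(np.diff(distances)))
    return distances, volatility

# Toy illustration: a settled state vs. a fluctuating one.
rng = np.random.default_rng(0)
ref = np.eye(4)
stable = [ref + 1e-3 * rng.standard_normal((4, 4)) for _ in range(50)]
jittery = [ref + 0.5 * rng.standard_normal((4, 4)) for _ in range(50)]
_, v_stable = state_volatility(stable, ref)
_, v_jittery = state_volatility(jittery, ref)
assert v_jittery > v_stable
```

Under this reading, first-order methods would show a low score (their state is essentially the gradient, which realigns), while a second-order preconditioner that keeps jumping after deletion scores high.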

If this is right

  • Unlearning verification for second-order methods must examine optimizer state stability in addition to performance and gradient metrics.
  • Eigendecay combined with controlled perturbation erases the geometric component of retained memory.
  • First-order analysis alone is insufficient to certify successful unlearning when second-order optimizers are used.
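The eigendecay-plus-perturbation treatment the bullets refer to can be sketched in a few lines, assuming the optimizer state is a symmetric curvature estimate: decompose it, shrink the eigenvalues (the directional "memory"), and optionally re-add an isotropic term as the controlled perturbation. The `decay` and `noise_scale` knobs are illustrative, not the paper's exact schedule.

```python
import numpy as np

def eigendecay(H, decay=0.5, noise_scale=0.0, rng=None):
    """Shrink the eigenvalues of a symmetric curvature estimate H toward
    zero, erasing direction-specific structure, then optionally add an
    isotropic perturbation so the state stays well-conditioned."""
    if rng is None:
        rng = np.random.default_rng()
    eigvals, eigvecs = np.linalg.eigh(H)
    damped = eigvecs @ np.diag(decay * eigvals) @ eigvecs.T
    if noise_scale > 0:
        damped += noise_scale * np.eye(H.shape[0])
    return damped

# A curvature matrix dominated by one direction (a strong 'memory').
H = np.diag([10.0, 1.0, 0.1])
H_erased = eigendecay(H, decay=0.1)
# The dominant direction is flattened tenfold; the shape is destroyed
# uniformly rather than selectively, which is what 'erasure' means here.
assert np.isclose(np.linalg.eigvalsh(H_erased).max(), 1.0)
```

The design point is that decay acts in the eigenbasis, so it removes the geometry (which directions the optimizer remembers as steep) rather than just rescaling the whole matrix.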

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Privacy audits for models trained with Hessian-based methods may need to track higher-order state variables to close gaps left by current unlearning standards.
  • Similar state-volatility signatures could appear in other memory-sensitive settings such as continual learning or federated training.
  • The geometric framing suggests that unlearning success criteria should be stated in terms of the full optimization trajectory rather than endpoint statistics alone.

Load-bearing premise

That observed volatility in the optimizer state after data deletion directly indicates retained information about the deleted data rather than unrelated numerical or optimization dynamics.

What would settle it

A replication in which the second-order optimizer state shows no elevated volatility after deletion, or in which eigendecay without explicit geometric erasure still eliminates the volatility.
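The second replication test is essentially a one-sided comparison: is post-deletion volatility actually larger than what deletion-independent random perturbations produce? A minimal empirical-p-value check, with hypothetical volatility scores, could look like this:

```python
import numpy as np

def exceeds_random_baseline(deletion_vol, random_perturb_vols, alpha=0.05):
    """Empirical one-sided test: fraction of random-perturbation runs at
    least as volatile as the deletion run (with the +1 correction for a
    finite sample). If this is not significant, the volatility-as-memory
    interpretation fails the replication described above."""
    random_perturb_vols = np.asarray(random_perturb_vols)
    p = (np.sum(random_perturb_vols >= deletion_vol) + 1) / (len(random_perturb_vols) + 1)
    return bool(p < alpha)

# Hypothetical volatility scores from 20 random-perturbation control runs.
baseline = [0.10, 0.12, 0.09, 0.11, 0.08, 0.10, 0.13, 0.09, 0.11, 0.10,
            0.12, 0.09, 0.10, 0.11, 0.08, 0.12, 0.10, 0.09, 0.11, 0.10]
assert exceeds_random_baseline(0.50, baseline)      # clearly elevated
assert not exceeds_random_baseline(0.05, baseline)  # within baseline noise
```

A null result here would support the referee's alternative reading: generic second-order sensitivity to curvature changes rather than retained information.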

read the original abstract

We argue that current definitions of machine unlearning are underspecified for second-order optimizers. We compare first-order and second-order learners for their ability to handle the data deletion task with varying degrees of eigendecomposition to mimic the loss model memory. While both first and second-order methods realign with the ideal counterfactul in terms of performance and gradient, the second-order optimizer shows significant volatility in the optimizer state. This indicates residual information, supposedly deleted, that isn't detectable by first-order analysis. Various eigendecay treatments show that stability and information loss is regained only under controlled state pertubation where geometric information (or memory) is erased.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that current definitions of machine unlearning are underspecified for second-order optimizers. It compares first-order and second-order learners on data deletion tasks, using varying degrees of eigendecomposition to model loss memory. Both methods are reported to realign with the ideal counterfactual in performance and gradients, but the second-order optimizer exhibits significant volatility in its state, interpreted as residual geometric memory undetectable by first-order analysis. Eigendecay treatments are shown to restore stability and information loss only under controlled perturbations that erase this geometric memory.

Significance. If the central empirical claims hold with proper controls, the work would highlight an important gap in unlearning evaluation for second-order methods by focusing on optimizer-state geometry rather than performance or gradients alone. This could motivate more complete unlearning criteria that account for Hessian or preconditioner memory, with the eigendecay approach offering a potential tool for controlled forgetting. The geometric framing is a strength if the volatility is causally linked to retained data rather than generic dynamics.

major comments (2)
  1. [Abstract and experimental evaluation] The interpretation that volatility in the second-order optimizer state after deletion and eigendecay indicates retained geometric memory about the deleted data (as stated in the abstract) is load-bearing for the central claim but lacks necessary baseline controls. No comparison is reported to the optimizer state of a model trained ab initio on the retained data only, nor to expected volatility under random eigendecay perturbations independent of the deletion. This leaves open whether the observed volatility reflects specific residual information or generic sensitivity of second-order methods to curvature changes.
  2. [Abstract] The abstract states comparative findings on performance, gradient realignment, and state volatility but supplies no experimental details on datasets, quantitative metrics (e.g., state distance norms, volatility measures), error bars, or statistical analysis. Without these, the magnitude and reliability of the reported differences cannot be evaluated, undermining assessment of whether second-order volatility is 'significant' in a reproducible sense.
minor comments (2)
  1. [Abstract] The abstract contains a typo: 'counterfactul' should be 'counterfactual'.
  2. [Abstract] The description of 'various eigendecay treatments' is vague; the manuscript should specify the exact decay schedules, how they mimic memory, and the precise perturbation controls used to erase geometric information.
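The error bars major comment 2 asks for are cheap to produce once volatility scores exist per training seed; a percentile-bootstrap confidence interval is one standard choice. The seed values below are hypothetical, purely to illustrate the requested reporting.

```python
import numpy as np

def bootstrap_ci(samples, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a mean volatility
    score across training seeds, i.e. the error bar the report requests."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    means = [rng.choice(samples, size=len(samples), replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return float(lo), float(hi)

# Hypothetical volatility scores from repeated runs of each optimizer.
first_order = [0.11, 0.09, 0.10, 0.12, 0.10, 0.08, 0.11, 0.10]
second_order = [0.42, 0.51, 0.38, 0.47, 0.55, 0.40, 0.49, 0.44]
lo1, hi1 = bootstrap_ci(first_order)
lo2, hi2 = bootstrap_ci(second_order)
# Non-overlapping intervals are what would license calling the
# difference 'significant' in a reproducible sense.
assert hi1 < lo2
```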

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important ways to strengthen the empirical support for our claims about residual geometric memory in second-order optimizers. We address the major comments point by point below and will revise the manuscript to incorporate the suggested controls and details.

read point-by-point responses
  1. Referee: [Abstract and experimental evaluation] The interpretation that volatility in the second-order optimizer state after deletion and eigendecay indicates retained geometric memory about the deleted data (as stated in the abstract) is load-bearing for the central claim but lacks necessary baseline controls. No comparison is reported to the optimizer state of a model trained ab initio on the retained data only, nor to expected volatility under random eigendecay perturbations independent of the deletion. This leaves open whether the observed volatility reflects specific residual information or generic sensitivity of second-order methods to curvature changes.

    Authors: We agree that these baseline controls are essential to establish causality between the observed volatility and retained geometric memory from the deleted data, rather than generic optimizer sensitivity to curvature perturbations. In the revised manuscript, we will add direct comparisons of the post-unlearning optimizer state to (i) a model trained ab initio solely on the retained data and (ii) models subjected to random eigendecay perturbations unrelated to the deletion task. These additions will quantify whether the volatility is specific to the unlearning scenario and thereby reinforce the geometric memory interpretation. revision: yes

  2. Referee: [Abstract] The abstract states comparative findings on performance, gradient realignment, and state volatility but supplies no experimental details on datasets, quantitative metrics (e.g., state distance norms, volatility measures), error bars, or statistical analysis. Without these, the magnitude and reliability of the reported differences cannot be evaluated, undermining assessment of whether second-order volatility is 'significant' in a reproducible sense.

    Authors: We acknowledge that the abstract would benefit from greater specificity to support evaluation of the results. While abstracts are constrained in length, we will revise it to concisely include the datasets employed, the quantitative metrics used for state volatility (such as distance norms and variance measures), and explicit references to the error bars and statistical analyses detailed in the main text. This will improve reproducibility without exceeding typical abstract limits. revision: partial

Circularity Check

0 steps flagged

Empirical comparison of unlearning in first- vs second-order optimizers with no load-bearing circular derivations

full rationale

The paper conducts an empirical study comparing first- and second-order optimizers on data deletion tasks, reporting performance/gradient realignment and state volatility. No mathematical derivations, fitted parameters renamed as predictions, or self-referential definitions are present in the provided text. Any self-citations are peripheral and do not support the central observations, which rest on experimental results rather than tautological reductions. The link from volatility to 'residual geometric memory' is interpretive but not circular by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to assumptions extractable from it. The paper relies on standard domain assumptions about how unlearning success is measured.

axioms (2)
  • domain assumption Performance and gradient realignment to an ideal counterfactual constitutes evidence of successful unlearning in the first-order sense
    Invoked when stating that both methods realign in performance and gradient.
  • ad hoc to paper Volatility in optimizer state after deletion indicates residual information about deleted data
    Central interpretive step linking the observed volatility to memory retention.

pith-pipeline@v0.9.0 · 5408 in / 1319 out tokens · 25369 ms · 2026-05-08T12:09:05.094658+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

11 extracted references · 7 canonical work pages

  1. Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. arXiv:1912.03817, December 2020.
  2. Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, version 29 edition, 2023.
  3. Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
  4. Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 1st edition, 2006.
  5. Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. When machine unlearning jeopardizes privacy. arXiv:2005.02205, 2021.
  6. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–487, August 2014.
  7. Antonio Ginart, Melody Y. Guan, Gregory Valiant, and James Zou. Making AI forget you: Data deletion in machine learning. arXiv:1907.05012, November 2019.
  8. Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens van der Maaten. Certified data removal from machine learning models. arXiv:1911.03030, November 2023.
  9. Aryan Mokhtari and Alejandro Ribeiro. Global convergence of online limited memory BFGS. arXiv:1409.2045, 2014.
  10. Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. arXiv:2103.03279, 2021.
  11. Enayat Ullah, Tung Mai, Anup Rao, Ryan Rossi, and Raman Arora. Machine unlearning via algorithmic stability. arXiv:2102.13179, February 2021.