Forget by Uncertainty: Orthogonal Entropy Unlearning for Quantized Neural Networks

Jingling Yuan; Junhao Dong; Ke Xu; Tian Zhang; Yujia Tong; Yuze Wang

arxiv: 2602.00567 · v2 · pith:52WNQTTCnew · submitted 2026-01-31 · 💻 cs.LG


Tian Zhang, Yujia Tong, Junhao Dong, Ke Xu, Yuze Wang, Jingling Yuan

Pith reviewed 2026-05-25 07:03 UTC · model grok-4.3

classification 💻 cs.LG

keywords: machine unlearning, quantized neural networks, entropy maximization, orthogonal projection, gradient conflict, privacy compliance, edge deployment


The pith

Entropy maximization on forgotten data combined with orthogonal gradient projection enables unlearning in quantized networks without degrading retained performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets machine unlearning for quantized neural networks on edge devices to satisfy privacy rules requiring data deletion. Prior techniques force models to predict wrong labels on forgotten samples and rely on scalar gradient scaling that leaves directional conflicts intact. The new framework instead raises prediction entropy on data to be forgotten to produce an unbiased forgetting direction and projects those gradients onto the orthogonal complement of gradients from retained data. A sympathetic reader would care because the combination offers a route to remove specific information while keeping model accuracy on everything else, without retraining from scratch.

Core claim

The authors claim that entropy-guided unlearning supplies an unbiased forgetting direction by maximizing prediction uncertainty on forgotten data, thereby avoiding any confident misprediction toward a particular wrong class, while gradient orthogonal projection removes interference by mapping forgetting gradients into the orthogonal complement of retain gradients and supplies a first-order theoretical guarantee that utility on retained data is preserved.
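
The claim can be written out in symbols. The notation below is assumed for illustration, since the abstract fixes no symbols: $p_\theta(\cdot \mid x)$ is the softmax output over $K$ classes, $D_f$ is the forget set, $L_r$ is the retain loss, and $\eta$ is a step size. The sketch also projects against a single retain gradient $g_r$; the paper may project against a subspace spanned by several retain gradients.

```latex
% Entropy on the forget set, to be maximized (assumed notation)
H_f(\theta) = -\frac{1}{|D_f|} \sum_{x \in D_f} \sum_{k=1}^{K}
              p_\theta(k \mid x) \log p_\theta(k \mid x)

% Forgetting loss and the two gradients
L_f(\theta) = -H_f(\theta), \qquad
g_f = \nabla_\theta L_f(\theta), \qquad
g_r = \nabla_\theta L_r(\theta)

% Projection of g_f onto the orthogonal complement of g_r
\tilde{g}_f = g_f - \frac{\langle g_f, g_r \rangle}{\lVert g_r \rVert^2} \, g_r,
\qquad \langle \tilde{g}_f, g_r \rangle = 0

% First-order effect of the projected step on the retain loss
L_r(\theta - \eta \tilde{g}_f)
  = L_r(\theta) - \eta \langle g_r, \tilde{g}_f \rangle + O(\eta^2)
  = L_r(\theta) + O(\eta^2)
```

The last line is the sense in which utility is preserved "under first-order approximation": the linear term vanishes by construction, and whatever change remains is second order in the step size.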

What carries the argument

The Orthogonal Entropy Unlearning (OEU) framework consisting of entropy maximization for the forgetting direction and orthogonal projection to eliminate gradient conflicts.
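
A minimal sketch of the two components, assuming a toy linear softmax classifier in NumPy rather than the paper's quantized networks. All function names (`entropy_loss_and_grad`, `cross_entropy_grad`, `project_orthogonal`) and the data are hypothetical, and the projection uses a single retain gradient for simplicity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_loss_and_grad(W, X):
    """Forgetting loss (negative mean entropy of softmax(X @ W)) and its gradient in W."""
    P = softmax(X @ W)                       # (n, K) class probabilities
    logP = np.log(P + 1e-12)
    H = -(P * logP).sum(axis=1)              # per-sample entropy
    # For loss = -H, the gradient w.r.t. logit j is p_j * (log p_j + H).
    dZ = P * (logP + H[:, None]) / X.shape[0]
    return -H.mean(), X.T @ dZ

def cross_entropy_grad(W, X, y):
    """Gradient of mean cross-entropy on retained data."""
    P = softmax(X @ W)
    P[np.arange(len(y)), y] -= 1.0
    return X.T @ P / len(y)

def project_orthogonal(g_f, g_r, eps=1e-12):
    """Remove from g_f its component along g_r (both flattened to vectors)."""
    gf, gr = g_f.ravel(), g_r.ravel()
    coef = gf @ gr / (gr @ gr + eps)
    return (gf - coef * gr).reshape(g_f.shape)

# Hypothetical toy data: 5 features, 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
X_f = rng.normal(size=(8, 5))                # forget samples (labels unused)
X_r = rng.normal(size=(32, 5))               # retain samples
y_r = rng.integers(0, 3, size=32)

_, g_f = entropy_loss_and_grad(W, X_f)
g_r = cross_entropy_grad(W, X_r, y_r)
g_proj = project_orthogonal(g_f, g_r)
overlap = float(g_proj.ravel() @ g_r.ravel())  # inner product with retain gradient
```

Note that the forgetting gradient never touches labels, which is what keeps the direction from favoring any particular wrong class. After projection, `overlap` is zero up to floating-point error.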

If this is right

Forgetting effectiveness improves without inducing confident mispredictions on forgotten samples.
Retain accuracy remains higher than with scalar reweighting methods.
Theoretical guarantee of utility preservation holds under first-order approximation.
The method applies directly to quantized models running on edge hardware.
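
The first-order nature of the guarantee can be illustrated numerically. The sketch below assumes a smooth quadratic stand-in for the retain loss and a random stand-in for the forget gradient; neither comes from the paper. Along the projected direction the retain loss changes only at second order in the step size.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)                      # symmetric positive definite Hessian
a = rng.normal(size=d)                       # minimizer of the toy retain loss

def retain_loss(theta):
    diff = theta - a
    return 0.5 * diff @ A @ diff

def retain_grad(theta):
    return A @ (theta - a)

theta = rng.normal(size=d)
g_r = retain_grad(theta)
g_f = rng.normal(size=d)                     # stand-in forget gradient (assumption)
g_proj = g_f - (g_f @ g_r) / (g_r @ g_r) * g_r

def retain_change(direction, eta):
    """Change in retain loss after a step of size eta against `direction`."""
    return retain_loss(theta - eta * direction) - retain_loss(theta)

for eta in (1e-1, 1e-2, 1e-3):
    print(eta, retain_change(g_f, eta), retain_change(g_proj, eta))
```

Shrinking the step by a factor of 10 shrinks the projected-direction change by a factor of about 100, as expected for a purely second-order term. This holds for a smooth loss; it says nothing yet about a parameter vector that is rounded back onto a quantization grid.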

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The orthogonal projection step might be combined with other compression methods such as pruning to handle unlearning in additional constrained settings.
Repeated unlearning operations on the same model could accumulate approximation errors not captured by the single-step analysis.
The entropy objective could be tested on non-quantized full-precision models to check whether the unbiased-forgetting property generalizes.

Load-bearing premise

That raising prediction entropy on forgotten data produces unbiased forgetting without introducing new biases and that the first-order approximation for utility preservation holds in practice for quantized networks.

What would settle it

An experiment in which OEU is applied to a quantized model and the resulting predictions on forgotten data show low entropy or the accuracy on retained data falls substantially beyond the first-order prediction.
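
The entropy half of that test could be scored as in the sketch below. The helper names and the example probability tables are hypothetical, and the source states no threshold for what counts as "low" entropy, so the sketch only reports entropy normalized by its maximum, log K.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Per-sample Shannon entropy (in nats) of a table of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -(probs * np.log(probs + eps)).sum(axis=1)

def normalized_mean_entropy(probs):
    """Mean entropy divided by log(K): 1 means uniform, 0 means one-hot."""
    probs = np.asarray(probs, dtype=float)
    k = probs.shape[1]
    return float(predictive_entropy(probs).mean() / np.log(k))

# Hypothetical predictions on four forgotten samples, 10 classes.
uniform = np.full((4, 10), 0.1)              # maximal uncertainty
confident_wrong = np.full((4, 10), 0.01 / 9)
confident_wrong[:, 3] = 0.99                 # confident push toward class 3

score_uniform = normalized_mean_entropy(uniform)
score_confident = normalized_mean_entropy(confident_wrong)
```

A score near 1 on the forget set is what the entropy objective aims for; a score near 0 would indicate the confident misprediction the method is meant to avoid.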

Original abstract

The deployment of quantized neural networks on edge devices, combined with privacy regulations like GDPR, creates an urgent need for machine unlearning in quantized models. However, existing methods face critical challenges: they induce forgetting by training models to memorize incorrect labels, conflating forgetting with misremembering, and employ scalar gradient reweighting that cannot resolve directional conflicts between gradients. We propose OEU, a novel Orthogonal Entropy Unlearning framework with two key innovations: 1) Entropy-guided unlearning provides an unbiased forgetting direction by maximizing prediction uncertainty on forgotten data, avoiding confident misprediction toward any specific class, and 2) Gradient orthogonal projection eliminates interference by projecting forgetting gradients onto the orthogonal complement of retain gradients, providing theoretical guarantees for utility preservation under first-order approximation. Extensive experiments demonstrate that OEU outperforms existing methods in both forgetting effectiveness and retain accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OEU pairs entropy maximization for an unbiased forget direction with orthogonal gradient projection, but the first-order utility guarantee looks vulnerable once quantization's discrete steps are considered.


The core contribution is a concrete way to steer unlearning in quantized nets: maximize predictive entropy on the forget set so the direction does not push toward any particular wrong class, then project the resulting gradient onto the orthogonal complement of the retain gradients. The authors position this pair as distinct from prior scalar reweighting or label-flipping approaches, and the abstract claims both stronger forgetting and better retention than those baselines. That combination is the actual new piece, and it targets a practical setting (edge deployment under GDPR-style rules) where existing unlearning work has paid less attention to quantization constraints.

The experiments are described as showing clear gains on both metrics, which is the part that would interest practitioners if the numbers hold up under standard benchmarks. The load-bearing claim, however, is the theoretical guarantee that the projection preserves utility under a first-order approximation. Quantized parameters live on a discrete grid and updates are typically done in higher precision before requantization; that non-smooth mapping undercuts the continuous differentiability the Taylor argument needs. If the paper does not supply either a tighter bound that accounts for the discretization or direct empirical checks that the retain-set degradation stays small after requantization, the guarantee is weaker than stated. The entropy direction itself is secondary and less affected by this issue.

Readers working on efficient-model privacy or unlearning for constrained hardware would find the ideas worth examining, even if they end up adapting the projection step. The work is coherent enough on its own terms to merit referee time so the derivations and experimental controls can be examined in detail.

Referee Report

2 major / 2 minor

Summary. The paper proposes Orthogonal Entropy Unlearning (OEU) for machine unlearning in quantized neural networks. It introduces entropy-guided unlearning to maximize prediction uncertainty on forgotten data for an unbiased forgetting direction, and gradient orthogonal projection to eliminate interference with retain gradients, claiming theoretical guarantees for utility preservation under a first-order approximation. Experiments are said to show superiority over existing methods in forgetting effectiveness and retain accuracy for quantized models on edge devices.

Significance. If the theoretical guarantees hold under quantization, the work would be significant for enabling privacy-compliant unlearning in deployed quantized models, a practical setting under regulations like GDPR. The attempt to derive utility preservation via orthogonal projection and the use of entropy to avoid class-specific mispredictions are strengths worth noting, as is the focus on quantization-specific challenges rather than generic unlearning.

major comments (2)

[Abstract] Abstract (theoretical guarantees paragraph): The first-order Taylor approximation invoked for utility preservation after orthogonal projection assumes continuous differentiability of the loss landscape. In quantized networks, parameters reside on a discrete grid with requantization after updates, introducing a non-smooth mapping that can violate the conditions required for the expansion to bound retain-set degradation. This directly undermines the load-bearing claim of 'theoretical guarantees' for the projection step.
[Method (gradient orthogonal projection)] Method section on gradient orthogonal projection: The projection is defined in continuous space, but no derivation or bound is provided showing how the orthogonality is preserved (or approximately preserved) after the requantization operator is applied. Without this, the first-order guarantee does not transfer to the actual quantized training loop.
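
Both major comments turn on what happens to orthogonality after rounding. The sketch below shows one way such a check might look, assuming a plain uniform round-to-nearest quantizer and random stand-in gradients; the paper's actual quantization scheme and training loop are not reproduced here, and the values of `step` and `eta` are arbitrary.

```python
import numpy as np

def quantize(theta, step):
    """Uniform round-to-nearest quantizer onto a grid of spacing `step` (assumed)."""
    return np.round(theta / step) * step

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

rng = np.random.default_rng(2)
d, step, eta = 64, 0.05, 0.5
theta_q = quantize(rng.normal(size=d), step)   # current quantized parameters
g_r = rng.normal(size=d)                       # stand-in retain gradient
g_f = rng.normal(size=d)                       # stand-in forget gradient
g_proj = g_f - (g_f @ g_r) / (g_r @ g_r) * g_r

# Update in continuous space, then requantize and measure the realized step.
delta_cont = -eta * g_proj
delta_quant = quantize(theta_q + delta_cont, step) - theta_q

cos_before = cosine(delta_cont, g_r)           # zero up to floating-point error
cos_after = cosine(delta_quant, g_r)           # generally nonzero after rounding
max_rounding_error = float(np.max(np.abs(delta_quant - delta_cont)))
```

Each coordinate of the realized step differs from the intended one by at most half a grid step, so the leakage into the retain direction is bounded by the step size but is not zero in general. Reporting `cos_after` across realistic step sizes is one form the requested empirical check could take.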

minor comments (2)

[Abstract] Abstract: The phrase 'unbiased forgetting direction' is used without a precise definition (e.g., statistical unbiasedness versus directional neutrality); a short clarification would improve readability.
Notation: The orthogonal complement operation and the entropy objective should be given explicit symbols or equation numbers on first use to aid cross-referencing in the theoretical claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting important subtleties in the theoretical analysis for quantized networks. We respond point-by-point to the major comments and outline planned revisions.


Referee: [Abstract] Abstract (theoretical guarantees paragraph): The first-order Taylor approximation invoked for utility preservation after orthogonal projection assumes continuous differentiability of the loss landscape. In quantized networks, parameters reside on a discrete grid with requantization after updates, introducing a non-smooth mapping that can violate the conditions required for the expansion to bound retain-set degradation. This directly undermines the load-bearing claim of 'theoretical guarantees' for the projection step.

Authors: We agree that the first-order Taylor expansion formally requires continuous differentiability and that requantization introduces a non-smooth mapping. The derivation in the manuscript is performed in continuous parameter space prior to the quantization step. We will revise the abstract to qualify the guarantee as holding under the first-order approximation in the continuous relaxation, and we will add a short discussion in the method section acknowledging the limitation introduced by discretization. revision: yes
Referee: [Method (gradient orthogonal projection)] Method section on gradient orthogonal projection: The projection is defined in continuous space, but no derivation or bound is provided showing how the orthogonality is preserved (or approximately preserved) after the requantization operator is applied. Without this, the first-order guarantee does not transfer to the actual quantized training loop.

Authors: The projection operates on gradients in continuous space to select the update direction; requantization occurs afterward. No explicit bound on post-requantization orthogonality appears in the current manuscript. We will revise the method section to include either an empirical verification of the retained orthogonality for standard quantization step sizes or an explicit statement that the guarantee applies before requantization. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The provided abstract and context present the OEU method as introducing entropy maximization for forgetting direction and orthogonal gradient projection for utility preservation under a standard first-order Taylor approximation. No equations, self-citations, or derivations are exhibited that reduce the central claims to self-definitional inputs, fitted parameters renamed as predictions, or load-bearing self-citations. The theoretical guarantee is framed as an application of existing approximation techniques rather than a closed loop. The paper appears self-contained with independent content relative to external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no information on free parameters, axioms, or invented entities is provided, so the ledger remains empty.

pith-pipeline@v0.9.0 · 5683 in / 1035 out tokens · 32278 ms · 2026-05-25T07:03:17.413580+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: Entropy-guided unlearning maximizes prediction uncertainty... Gradient orthogonal projection... under first-order approximation (Theorems 4.1-4.4, Eq. 7-8)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: OEU framework for quantized neural networks (QAT, STE, random/class-wise forgetting)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.