
arxiv: 2605.01735 · v2 · pith:LPH6T5NMnew · submitted 2026-05-03 · 💻 cs.CL

Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

Pith reviewed 2026-05-10 15:58 UTC · model grok-4.3

classification 💻 cs.CL
keywords geometric unlearning · LLM privacy · synthetic data · ToFU benchmark · UnlearnPII · low-rank geometry · hidden state alignment · targeted forgetting

The pith

Geometric Unlearning lets LLMs forget specific private facts using only a handful of synthetic prompts while retaining general performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models deployed in practice must sometimes remove particular pieces of information to satisfy privacy or legal requirements after initial training. Existing unlearning techniques typically demand access to the original training corpus and apply broad changes that can degrade overall capabilities. This work shows that a compact low-rank representation of safe behavior can be extracted from a small collection of safe reference prompts and then used to realign the model's prompt-conditioned hidden states. Alignment occurs through projection onto this safe geometry, triggered by lightweight synthetic anchor prompts, with a teacher-distillation regularizer to limit unintended shifts. On the ToFU and UnlearnPII privacy benchmarks, the reported result is targeted forgetting with little loss on non-target tasks, suggesting that unlearning need not require the original data or broad gradient updates.

Core claim

The paper establishes that Geometric Unlearning operates directly on prompt-time planning states by first distilling a low-rank geometry of desired safe behavior from a small set of safe reference prompts, then applying projection-based alignment of hidden representations using synthetic in-context anchors, together with a teacher-distillation regularizer on non-target anchors, to suppress target information without access to the original training corpus.

What carries the argument

Geometric Unlearning (GU): extraction of a compact low-rank safe-behavior geometry from reference prompts followed by projection alignment of hidden planning states via synthetic anchors.
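The two steps named above can be illustrated with a minimal NumPy sketch. It assumes a truncated SVD as a stand-in for the distillation step and an orthogonal projector for the alignment step; the paper's actual procedure is not given on this page, and the names `safe_basis`, `project`, and `alignment_loss` are hypothetical.

```python
import numpy as np

def safe_basis(safe_states: np.ndarray, rank: int) -> np.ndarray:
    """Return a (d, rank) orthonormal basis spanning the top directions of safe_states."""
    # Assumption: an uncentered truncated SVD stands in for the distillation step.
    _, _, vt = np.linalg.svd(safe_states, full_matrices=False)
    return vt[:rank].T

def project(hidden: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project each row of hidden onto span(basis)."""
    return hidden @ basis @ basis.T

def alignment_loss(hidden: np.ndarray, basis: np.ndarray) -> float:
    """Mean squared distance between hidden states and their projections."""
    residual = hidden - project(hidden, basis)
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
d, r = 16, 3
# Toy "safe" hidden states that lie exactly in an r-dimensional subspace.
safe_states = rng.normal(size=(20, r)) @ rng.normal(size=(r, d))
basis = safe_basis(safe_states, r)

# Toy "target" hidden states with no special structure.
target_states = rng.normal(size=(5, d))
loss_before = alignment_loss(target_states, basis)
loss_after = alignment_loss(project(target_states, basis), basis)
```

In this toy setting `loss_after` is numerically zero because projected states already lie in the subspace. In a real model the hidden states would come from a chosen layer, and the loss would be minimized by updating parameters rather than by overwriting activations.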

If this is right

  • Strong suppression of target entities is achieved on ToFU and UnlearnPII benchmarks without original training data.
  • Non-target performance remains largely intact when alignment uses only minimal synthetic prompts.
  • Localized projection on hidden states avoids the broad gradient updates common in prior methods.
  • A teacher-distillation regularizer on synthetic non-target anchors limits collateral drift during unlearning.
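The last bullet can be made concrete with a small sketch of a distillation penalty. It assumes the regularizer is a KL divergence between the frozen teacher's and the updated student's output distributions on non-target anchors, which is a common choice but is not confirmed by this page; `distill_kl` is a hypothetical name.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def distill_kl(teacher_logits: np.ndarray, student_logits: np.ndarray) -> float:
    """Mean KL(teacher || student) over rows (one row per anchor position)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))

# Toy logits for two non-target anchor positions over a 3-token vocabulary.
teacher = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]])
same = distill_kl(teacher, teacher.copy())
drifted = distill_kl(teacher, teacher + np.array([0.0, 2.0, 0.0]))
```

The penalty is zero when the student reproduces the teacher and grows as the student's distribution drifts, which is the behavior a collateral-drift regularizer needs.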

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same low-rank alignment idea could be tested on unlearning tasks in non-language models such as vision or multimodal systems.
  • Organizations handling regulated data might adopt this approach to meet deletion requests without maintaining full training archives.
  • If the safe geometry remains stable across model scales, the method could support repeated unlearning cycles on the same base model.

Load-bearing premise

Hidden states can be projected onto a low-rank geometry distilled from a few safe prompts, and doing so suppresses the chosen target information without broad utility loss or access to the original data.

What would settle it

Run the method on a model in which target facts are deliberately entangled across many dimensions in the hidden states; if target suppression fails or non-target accuracy drops sharply, the geometric alignment approach does not hold.

Figures

Figures reproduced from arXiv: 2605.01735 by Chenchen Tan, Cunjian Chen, Longxiang Gao, Shujie Cui, Xinghao Li, Youyang Qu.

Figure 1. Conventional data-driven unlearning vs. our original-corpus-free unlearning (GU). Top: Standard unlearning pipelines fine-tune an LLM using target unlearning data Df and retention data Dr, which can re-expose original data and pose privacy risks. Bottom: Our approach uses only user-provided anchor points A to generate synthetic unlearning data Dvirt, and applies Geometric Unlearning on the LLM using Dvirt …
Figure 2. Overview of the proposed unlearning framework. The framework is structured into two parallel pathways to balance unlearning and preservation. The top Unlearning Pathway focuses on geometric unlearning by processing target anchors and synthetic unlearning data (Dvirt) through dynamic window masking. The aggregated hidden states (for topic z) are then projected within the Geometric Unlearning Engine to minim…
Figure 3. Privacy risk of MIAs across unlearning methods for two base models (unlearning 10% benchmark data, i.e., Forget-10) measured by the deviation from chance performance |AUC − 0.5| (lower is better). Each row contains three points computed from different MIA scoring metrics: Min-K, Reference, and Zlib. For each metric, AUC is the ROC area obtained when using the corresponding attack score to distinguish trai…
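The |AUC − 0.5| quantity in the Figure 3 caption can be computed as in the sketch below. It assumes the attack score is meant to separate training members from non-members, which is the standard membership-inference setup; the caption is truncated before it says so. The function names and toy scores are hypothetical.

```python
import numpy as np

def roc_auc(member_scores, nonmember_scores) -> float:
    """AUC as P(member score > non-member score) + 0.5 * P(tie)."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))

def mia_deviation(member_scores, nonmember_scores) -> float:
    """Deviation of the attack's AUC from chance; lower means less leakage."""
    return abs(roc_auc(member_scores, nonmember_scores) - 0.5)

# A perfectly separating attack versus one that cannot tell the groups apart.
leaky = mia_deviation([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
chance = mia_deviation([0.1, 0.4, 0.6, 0.9], [0.1, 0.4, 0.6, 0.9])
```

Taking the absolute value matters because an attack with AUC well below 0.5 is as informative as one well above it: inverting its decision recovers the signal.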
Figure 4. Effect of synthetic sample budget on unlearning, retaining, and runtime. We construct 10 to 40 anchor-conditioned synthetic samples for unlearning, paired with an equal number of synthetic retain samples (1:1 forget and retain) for each setting. We report extraction strength (lower is better), model utility (higher is better), and training time for LLaMA-2-7B and LLaMA-3.2-1B.
Figure 5. Unlearning effectiveness and model utility trade-off across model scales and forget splits on UnlearnPII benchmark. The y-axis reports target knowledge suppression (higher indicates better unlearning), and the x-axis reports retained model utility (higher indicates better utility preservation). Large-scale unlearning …
Figure 6. Training dynamics under large-scale unlearning (20% forget split) for LLaMA3.2-1B and LLaMA3.2-8B. We track accuracy on the unlearning and retaining sets over training epochs. Shaded bands indicate variability across runs.
Original abstract

As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-conditioned hidden states without access to the original training corpus. Specifically, GU distills a compact, low-rank safe-behavior subspace from a small set of safe reference prompts and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden representations to this safe subspace. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Geometric Unlearning (GU) for selective unlearning in LLMs. GU distills a compact low-rank geometry of safe behavior from a small set of safe reference prompts and performs projection-based alignment of hidden planning states using lightweight anchor-in-context synthetic prompts, plus a teacher-distillation regularizer on non-target anchors. The method requires no access to the original training corpus. On the ToFU and UnlearnPII privacy benchmarks, GU is reported to achieve strong target suppression while preserving non-target performance, using only minimal synthetic data.

Significance. If the central geometric alignment mechanism is shown to reliably remove target encodings from hidden states, the work would be significant for practical LLM governance. It reduces reliance on original data and broad updates, offering a data-efficient alternative to existing unlearning techniques. The emphasis on low-rank distillation and synthetic anchors could influence future privacy-preserving methods, provided the approach generalizes beyond output-level metrics.

major comments (2)
  1. [§3] §3 (Geometric Unlearning procedure): The core claim that projection onto the distilled low-rank safe geometry suppresses target information encoded during original training is load-bearing, yet the manuscript provides no hidden-state probing, membership-inference, or subspace analysis to confirm that target signals are removed rather than merely masked at the output level. Without such verification, residual encodings in orthogonal subspaces cannot be ruled out.
  2. [§4] §4 (Benchmark evaluation): Results on ToFU and UnlearnPII report strong target suppression with minimal non-target degradation, but the evaluation relies on output accuracy and refusal metrics. No ablation isolating the contribution of the low-rank projection versus the synthetic anchors or regularizer is presented, making it difficult to attribute success specifically to the geometric component.
minor comments (2)
  1. [§3.1] The notation for the projection operator and the rank parameter in the low-rank geometry distillation should be defined more explicitly, ideally with a small illustrative equation.
  2. [Figure 2] The caption of the method overview figure could more clearly label the flow from safe prompts to anchor alignment and the role of the teacher regularizer.
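An illustrative equation of the kind requested in the first minor comment might look as follows. The notation is hypothetical and not taken from the paper: h is a hidden state of dimension d, r is the rank of the safe geometry, and U_r is an orthonormal basis for it.

```latex
\begin{align}
  U_r &\in \mathbb{R}^{d \times r}, \qquad U_r^\top U_r = I_r, \qquad r \ll d, \\
  P_r &= U_r U_r^\top, \\
  \mathcal{L}_{\mathrm{align}}(h) &= \lVert (I_d - P_r)\, h \rVert_2^2 .
\end{align}
```

Under these assumptions P_r is an orthogonal projector (P_r^2 = P_r and P_r^\top = P_r), so the loss is the squared distance from h to the rank-r safe subspace and vanishes exactly when h already lies in it.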

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and will incorporate revisions to strengthen the empirical support for the geometric mechanism.

Point-by-point responses
  1. Referee: [§3] §3 (Geometric Unlearning procedure): The core claim that projection onto the distilled low-rank safe geometry suppresses target information encoded during original training is load-bearing, yet the manuscript provides no hidden-state probing, membership-inference, or subspace analysis to confirm that target signals are removed rather than merely masked at the output level. Without such verification, residual encodings in orthogonal subspaces cannot be ruled out.

    Authors: We agree that direct verification of hidden-state suppression is important for substantiating the central mechanism. While the method explicitly aligns planning states via projection and the output-level results on ToFU and UnlearnPII demonstrate effective target suppression with preserved utility, we acknowledge the absence of internal analysis. In the revision we will add hidden-state probing, membership-inference attacks on the target subspace, and before/after subspace overlap metrics to show that target encodings are reduced rather than merely masked at the output. revision: yes

  2. Referee: [§4] §4 (Benchmark evaluation): Results on ToFU and UnlearnPII report strong target suppression with minimal non-target degradation, but the evaluation relies on output accuracy and refusal metrics. No ablation isolating the contribution of the low-rank projection versus the synthetic anchors or regularizer is presented, making it difficult to attribute success specifically to the geometric component.

    Authors: We thank the referee for highlighting the need for component-wise attribution. The current results show the full pipeline works with minimal data, but we agree that isolating the low-rank projection is necessary. In the revised manuscript we will include ablations that remove or replace the projection step (while retaining anchors and regularizer) and report the resulting changes in target suppression and non-target performance on both benchmarks. revision: yes
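One way to realize the before/after subspace overlap metric promised in the first response is through principal angles between two subspaces. The sketch below is an assumption about what such a metric could look like, not the authors' definition; `subspace_overlap` is a hypothetical name, and both inputs are assumed to have orthonormal columns.

```python
import numpy as np

def subspace_overlap(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces.

    Both inputs are (d, k) matrices with orthonormal columns.
    Returns 1.0 for identical subspaces and 0.0 for orthogonal ones.
    """
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

eye = np.eye(4)
same = subspace_overlap(eye[:, :2], eye[:, :2])      # identical subspaces
disjoint = subspace_overlap(eye[:, :2], eye[:, 2:])  # orthogonal subspaces
partial = subspace_overlap(eye[:, :2], eye[:, 1:3])  # one shared direction
```

Applied to a basis fitted on target-entity hidden states before unlearning and another fitted after it, a drop in overlap would be evidence that the representation itself changed, which is the distinction the referee asks for.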

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

Full rationale

The paper proposes Geometric Unlearning as a new method that distills a low-rank safe geometry from a small set of reference prompts and performs projection-based alignment on hidden states using synthetic anchors, with a teacher-distillation regularizer. All load-bearing steps (geometry distillation, projection alignment, and regularizer) are defined from first principles and external synthetic data rather than fitted to target outcomes or reduced to self-citations. Empirical results on ToFU and UnlearnPII are independent external benchmarks, not constructed by definition from the method inputs. No self-definitional, fitted-prediction, or uniqueness-imported circularity is present in the described chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper introduces novel constructs for unlearning that rest on assumptions about model internals and the efficacy of geometric operations, with no independent evidence provided in the abstract for these entities.

axioms (1)
  • domain assumption: The internal hidden states of LLMs during prompt processing contain planning representations that can be aligned geometrically to achieve unlearning.
    This underpins the operation on prompt-time planning states without training data.
invented entities (2)
  • low-rank geometry of desired safe behavior (no independent evidence)
    purpose: Compact representation of safe behavior distilled from reference prompts for alignment.
    Introduced as the core of the GU method.
  • anchor-in-context synthetic prompts (no independent evidence)
    purpose: Lightweight prompts to trigger localized projection-based alignment.
    Used to minimize data disclosure.

