ICCU: In-Context Continual Unlearning via Pattern-Induced Refusal Rules

Ruihao Pan; Suhang Wang

arxiv: 2605.27138 · v1 · pith:YUP4RQSRnew · submitted 2026-05-26 · 💻 cs.AI

Pith reviewed 2026-06-29 17:15 UTC · model grok-4.3

keywords: machine unlearning · continual unlearning · in-context learning · refusal rules · language models · data removal · sequential requests

The pith

In-context refusal rules let language models unlearn specific data sequentially without parameter changes or interference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ICCU, a framework for in-context continual unlearning in language models. It works by inducing readable refusal rules from datasets marked for unlearning and then using those rules at inference time through prompts or filters. This avoids the need for repeated fine-tuning, which is expensive and can degrade performance over multiple requests. The method is compositional because rules are combined independently of order, and it lets the original data be discarded once rules are created. Readers would care if this makes it practical to honor data deletion requests in production models over time.

Core claim

ICCU induces readable refusal rules from unlearning datasets and applies them at inference time either as a filter or via the system prompt, without modifying model parameters. Because rules are accumulated as an order-independent union, ICCU is compositional and free of cross-request interference, and the original forget-set data can be discarded after rule induction. Experiments show it suppresses target knowledge while preserving utility and handles paraphrased and cross-lingual queries.
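
The order-independence claim can be illustrated with a minimal sketch, assuming the repository is modeled as a plain set of rule strings. The rule texts and the `accumulate` helper are hypothetical and are not taken from the paper; the point is only that set union is commutative and associative, so the arrival order of forget requests cannot change the final repository.

```python
from itertools import permutations

# Hypothetical rule sets induced from three sequential forget requests.
# The rule texts are invented for illustration only.
round_rules = [
    frozenset({"Refuse questions about topic A."}),
    frozenset({"Refuse questions about topic A.", "Refuse questions about topic B."}),
    frozenset({"Refuse questions about topic C."}),
]

def accumulate(rounds):
    """Fold per-request rule sets into one repository using set union."""
    repository = frozenset()
    for rules in rounds:
        repository = repository | rules
    return repository

# Every arrival order of the three requests yields the same repository.
repositories = {accumulate(order) for order in permutations(round_rules)}
final_repository = next(iter(repositories))
print(len(repositories), sorted(final_repository))
```

A duplicate rule (topic A appears in two rounds) is absorbed by the union, which is one reason a set is a natural container under this assumption.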

What carries the argument

Pattern-induced refusal rules applied in-context at inference time to block recall of forgotten knowledge.

If this is right

Target knowledge is suppressed while model utility on other tasks is preserved.
The approach scales to multiple sequential unlearning requests without cross-interference or cumulative utility loss.
Robustness holds for paraphrased queries and queries in different languages.
Original unlearning data can be discarded after rule induction, reducing storage needs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models could comply with privacy regulations more efficiently in ongoing deployments.
The method might apply to other types of knowledge suppression beyond data unlearning.
It raises the possibility of hybrid systems combining in-context rules with occasional fine-tuning for harder cases.

Load-bearing premise

Readable refusal rules induced from the unlearning dataset can cover all possible ways the model might express or recall the forgotten knowledge, including paraphrases and cross-lingual queries.

What would settle it

A query using a novel paraphrase or untranslated phrasing of the target knowledge that the induced rules do not block, yet the model still produces the correct answer.
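
The falsification test above could be operationalized roughly as follows. This is a sketch under stated assumptions: `Probe`, `find_leaks`, the stub blocker, and the stub model are all hypothetical names, and the example project and person are invented. A leak is counted when a probe is not blocked and the model still returns the gold answer.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    query: str  # paraphrased or translated question about forgotten content
    gold: str   # answer the unmodified model is known to give

def find_leaks(probes, is_blocked, answer):
    """Return probes that slip past the rules and still get the gold answer."""
    return [
        p for p in probes
        if not is_blocked(p.query)
        and answer(p.query).strip().lower() == p.gold.lower()
    ]

# Stubs standing in for the rule check and the language model (assumptions).
def is_blocked(query):
    return "zeta" in query.lower()

def answer(query):
    return "Dr. Rao"

probes = [
    Probe("Who leads Project Zeta?", "Dr. Rao"),
    Probe("Who heads the project named after the sixth Greek letter?", "Dr. Rao"),
]
leaked = find_leaks(probes, is_blocked, answer)
print([p.query for p in leaked])
```

Here the second probe avoids the blocked keyword, so it is the one reported as a leak.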

Figures

Figures reproduced from arXiv: 2605.27138 by Ruihao Pan, Suhang Wang.

**Figure 1.** Overview of the ICCU framework. Stage 1 (Continual Refusal Rule Generation) processes each forget dataset Dt by (1) clustering its samples, (2) inducing one natural-language refusal rule per cluster, and (3) accumulating the rules across rounds into a shared repository R. Stage 2 (Inference-Time Adaptive Rule Retrieval) handles each query q by (4) cluster gating, which answers q directly when it is out of …

**Figure 2.** Refusal rate under varying gating thresholds.
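
The gating step named in the two captions can be sketched as a similarity threshold over cluster centroids. Everything concrete here is an assumption made for illustration: the bag-of-words `embed` function stands in for whatever encoder the paper uses, cosine similarity and the default threshold of 0.3 are arbitrary choices, and the repository contents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical repository: one centroid text and one refusal rule per cluster.
REPOSITORY = [
    ("wizard castle spell wand potion", "Refuse questions about the wizard series."),
    ("toxin pathogen synthesis lab culture", "Refuse questions about pathogen synthesis."),
]

def gate(query, threshold=0.3):
    """Return rules of clusters close to the query; empty means answer directly."""
    q = embed(query)
    return [rule for centroid, rule in REPOSITORY
            if cosine(q, embed(centroid)) >= threshold]

in_scope = gate("which spell does the wizard cast with a wand")
out_of_scope = gate("what is the capital of france")
strict = gate("which spell does the wizard cast with a wand", threshold=0.5)
print(in_scope, out_of_scope, strict)
```

Raising the threshold makes the gate pass more queries through to the model unrefused, which is the trade-off a refusal-rate-versus-threshold plot would expose.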

Original abstract

Machine unlearning aims to remove the influence of specific data from trained language models. In real-world deployments, unlearning requests often arrive sequentially, which challenges existing fine-tuning-based methods: fine-tuning each request is costly, accumulates utility loss, and may cause cross-request interference. To address these issues, we propose ICCU (In-Context Continual Unlearning), an in-context continual unlearning framework that induces readable refusal rules from unlearning datasets and applies them at inference time either as a filter or via the system prompt, without modifying model parameters. Because rules are accumulated as an order-independent union, ICCU is compositional and free of cross-request interference, and the original forget-set data can be discarded after rule induction. Extensive experiments show that ICCU effectively suppresses target knowledge while preserving utility, scales across sequential requests, and remains robust to paraphrased and cross-lingual queries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ICCU shifts continual unlearning to inference-time refusal rules instead of repeated fine-tuning, which solves the interference and cost problems if the rules actually cover everything.

The main takeaway is that this paper gives a way to handle sequential unlearning requests without touching model weights. It extracts readable refusal rules from each forget set, keeps them as an accumulating set, and applies the whole collection at inference either as a filter or in the prompt.

What is actually new is the move to a purely in-context, order-independent mechanism. Most prior work retrains on each new request, which costs compute, risks utility drop, and can let earlier unlearning interfere with later ones. ICCU discards the original data after rule induction and treats the rules as a union, so compositionality comes for free.

The paper does a solid job laying out the deployment problem and showing why parameter updates are a bad fit for ongoing requests. The abstract reports that the method suppresses the target knowledge, keeps utility, and holds up on paraphrases plus cross-lingual queries, which is the kind of evidence that matters for the claim.

The soft spot is exactly the one the stress-test flags. The method never changes weights, so success rests on the induced rules blocking every route the model might still use to surface the forgotten facts. Rule induction is presented as an empirical pattern extraction with no completeness argument, and the abstract does not describe why the extracted rules are guaranteed to catch every possible phrasing. If the experiments only test a limited set of queries, leakage on uncovered formulations remains possible. That assumption is load-bearing and needs direct evidence from the method and results sections.

This is aimed at people working on practical unlearning for regulated LLM deployments. A reader who cares about inference-time controls would find the setup worth examining. I would send it to peer review because the problem is real and the approach is distinct, even though the coverage question will need careful referee attention.

Referee Report

2 major / 2 minor

Summary. The paper proposes ICCU, an in-context continual unlearning framework for LLMs that extracts readable refusal rules from sequential unlearning (forget) datasets via pattern induction and applies the accumulated rule set at inference time either as a filter or system prompt, without any parameter updates. The approach is presented as compositional and order-independent, allowing original forget-set data to be discarded post-induction while claiming to suppress target knowledge, preserve utility on unrelated tasks, scale to multiple sequential requests without interference, and remain robust to paraphrased and cross-lingual queries, supported by extensive experiments.

Significance. If the empirical results hold under the coverage assumption, ICCU would provide a lightweight, non-destructive alternative to fine-tuning-based unlearning methods that avoids cumulative utility loss and cross-request interference in sequential settings. The parameter-free, inference-time nature and explicit data-discard property are practical strengths for deployed models; the readability of induced rules could also aid interpretability if coverage proves reliable.

major comments (2)

[Method (rule induction procedure)] The robustness claim to paraphrased and cross-lingual queries (abstract and experimental sections) depends on the induced refusal rules covering all possible expressions of target knowledge. The induction step is described as empirical pattern extraction with no stated completeness argument, exhaustive enumeration, or formal coverage guarantee; this is load-bearing because any uncovered formulation would allow leakage under the in-context-only mechanism.
[Experiments] Experiments demonstrate suppression and utility preservation, but without the full experimental details (baselines, exact metrics, post-hoc rule selection criteria, and failure-case analysis), it is unclear whether the reported robustness generalizes beyond the tested paraphrases or if rule sets were tuned to the evaluation distribution.

minor comments (2)

[Method] Notation for the accumulated rule set (union operation) and its application as filter vs. prompt should be formalized with pseudocode or equations for reproducibility.
[Method] Clarify how rule induction handles conflicting or overlapping patterns across sequential requests to support the order-independence claim.
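
As a rough illustration of the formalization the first minor comment asks for, the two application modes might look like the sketch below. The function names, the keyword-based matcher, and the stub model are hypothetical; the paper's actual matching procedure is not described in this review, and a real matcher could be an LLM judge rather than a substring test.

```python
from typing import Callable, Iterable

REFUSAL = "I don't know."

def as_system_prompt(rules: Iterable[str]) -> str:
    """Prompt mode: serialize the accumulated rules into a system prompt."""
    lines = ["Refuse any question that matches one of these rules:"]
    lines += [f"- {rule}" for rule in sorted(rules)]
    return "\n".join(lines)

def as_filter(rules: Iterable[str],
              matches: Callable[[str, str], bool],
              generate: Callable[[str], str]) -> Callable[[str], str]:
    """Filter mode: check the query against the rules before calling the model."""
    rule_list = list(rules)
    def answer(query: str) -> str:
        if any(matches(rule, query) for rule in rule_list):
            return REFUSAL
        return generate(query)
    return answer

def keyword_match(rule: str, query: str) -> bool:
    # Hypothetical matcher: each rule is written as "keyword: description".
    return rule.split(":")[0].lower() in query.lower()

def stub_model(query: str) -> str:
    # Stand-in for an unmodified language model.
    return f"(model answer to: {query})"

rules = {"project zeta: refuse questions about Project Zeta"}
guarded = as_filter(rules, keyword_match, stub_model)
print(guarded("Who leads Project Zeta?"))
print(guarded("What is 2 + 2?"))
print(as_system_prompt(rules))
```

In filter mode the model is never called for a matching query, while in prompt mode the decision is delegated to the model itself; the model weights are untouched in both cases.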

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment point by point below, with clarifications on the empirical nature of our method and commitments to expand experimental details where appropriate.

Referee: [Method (rule induction procedure)] The robustness claim to paraphrased and cross-lingual queries (abstract and experimental sections) depends on the induced refusal rules covering all possible expressions of target knowledge. The induction step is described as empirical pattern extraction with no stated completeness argument, exhaustive enumeration, or formal coverage guarantee; this is load-bearing because any uncovered formulation would allow leakage under the in-context-only mechanism.

Authors: We acknowledge that the rule induction is an empirical pattern extraction process without a formal completeness argument, exhaustive enumeration, or coverage guarantee. The manuscript presents ICCU as a practical, data-driven approach that induces readable rules from forget datasets and validates generalization through experiments on paraphrased and cross-lingual queries. We do not claim theoretical universality, as that would be infeasible for natural language. We will revise the method section to explicitly note the empirical basis and reliance on experimental evidence rather than formal guarantees. revision: partial
Referee: [Experiments] Experiments demonstrate suppression and utility preservation, but without the full experimental details (baselines, exact metrics, post-hoc rule selection criteria, and failure-case analysis), it is unclear whether the reported robustness generalizes beyond the tested paraphrases or if rule sets were tuned to the evaluation distribution.

Authors: The full manuscript describes the experimental setup, baselines, and metrics. To improve clarity, we will expand the experiments section in revision to provide precise details on post-hoc rule selection criteria, additional failure-case analysis, and confirmation that rules are induced solely from forget data without reference to evaluation queries. This will better demonstrate that reported robustness is not due to distribution-specific tuning. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper proposes an empirical framework (induce refusal rules from forget-set data, apply as filter or system prompt at inference) with no mathematical derivation chain, equations, or fitted parameters that reduce to inputs by construction. Claims rest on experimental results rather than self-referential definitions or self-citation load-bearing premises. The coverage of rules for paraphrases is presented as an empirical assumption, not a derived property that collapses to the input data. This is a standard non-circular empirical method paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that pattern-induced rules can substitute for parameter updates in suppressing knowledge.

axioms (1)

Domain assumption: It is possible to induce a set of readable refusal rules from the forget-set data that effectively cover the target knowledge even under paraphrasing and cross-lingual queries.
This underpins the claim of robustness and effectiveness without parameter modification.

Reference graph

Works this paper leans on

6 extracted references · 3 canonical work pages · 3 internal anchors

[1]

No Language Left Behind: Scaling Human-Centered Machine Translation

IEEE. Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, and 1 others. 2021. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650. Jiaao Chen and Diyi Yang. 2023. Unlearn what you ...

[2]

Pointer Sentinel Mixture Models

Measuring massive multitask language understanding. In International Conference on Learning Representations. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations. Shengyuan Hu, Yiwei Fu...

[3]

Qwen3 Technical Report

Qwen3 technical report. arXiv preprint arXiv:2505.09388. Yuanshun Yao, Xiaojun Xu, and Yang Liu. 2024. Large language model unlearning. Advances in Neural Information Processing Systems, 37:105425–105475. Binchi Zhang, Zhengzhang Chen, Zaiyi Zheng, Jundong Li, and Haifeng Chen. 2025a. Resolving editing-unlearning conflicts: A knowledge codebook framework...

[4]

Determine whether the question matches a rule in the RULE SET
[5]

Output E if the question clearly matches a rule
[6]

I don’t know

Otherwise, answer the question normally with A, B, C, or D. Output ONLY one letter: A, B, C, D, or E. Do not output anything else. We score logits at all five letters. If the argmax is E, the rule is considered triggered, and the final prediction is sampled uniformly at random from {A, B, C, D}. This drives the expected accuracy on triggered questions toward ...
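
The five-letter scoring rule in this excerpt can be sketched as below. This is a minimal illustration, assuming the per-letter logits have already been read off the model; the `predict` function, its return shape, and the example logit values are hypothetical.

```python
import random

LETTERS = ["A", "B", "C", "D", "E"]

def predict(letter_logits, rng):
    """Pick the argmax letter; if it is E, resample uniformly from A-D.

    letter_logits maps each of the five letters to the model's logit for it.
    Returns (prediction, rule_triggered).
    """
    best = max(LETTERS, key=lambda letter: letter_logits[letter])
    if best == "E":
        # Rule triggered: replace the answer with a uniform draw over A-D.
        return rng.choice(LETTERS[:4]), True
    return best, False

rng = random.Random(0)  # seeded only to make the sketch repeatable
normal = predict({"A": 0.1, "B": 2.0, "C": -1.0, "D": 0.3, "E": 1.5}, rng)
refused = predict({"A": 0.1, "B": 2.0, "C": -1.0, "D": 0.3, "E": 3.5}, rng)
print(normal, refused)
```

Note that the random draw ignores the logits for A through D entirely, so a triggered question carries no signal from the model's underlying preference.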
