Pratiksha Thaker, Yash Maurya, Shengyuan Hu, Zhiwei Steven Wu, and Virginia Smith

Guardrail Baselines for Unlearning in LLMs · 2025 · arXiv 2403.03329

12 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning

cs.CV · 2026-04-03 · conditional · novelty 8.0

VLM-UnBench demonstrates that prompt-based training-free unlearning in VLMs leaves forget accuracy near the no-instruction baseline except under oracle conditions that reveal the target concept.

Improving LLM Unlearning Robustness via Random Perturbations

cs.CL · 2025-01-31 · unverdicted · novelty 7.0

LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.

Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Distinguishable Deletion unifies knowledge erasure and refusal for LLM unlearning via an energy index that enforces boundaries during training and enables refusal at inference.

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Targeting minor components in LLM representations during unlearning yields substantially better resistance to relearning attacks than prior methods.

CAP: Controllable Alignment Prompting for Unlearning in LLMs

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

CAP is a reinforcement-learning-driven prompt optimization framework that suppresses target knowledge in LLMs while preserving general capabilities, enabling reversible unlearning without any parameter updates.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization, outperforming prior methods on the forget-retain trade-off in LLM benchmarks.

CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

CURaTE performs continual unlearning in LLMs in real time by using sentence embeddings to detect and refuse forget requests without changing model parameters, achieving effective forgetting and perfect knowledge preservation.

Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning

cs.LG · 2025-10-01 · conditional · novelty 6.0

Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.

Short paper: Models in the dark -- Rectification and erasure under GDPR in ML supply chains

cs.LG · 2026-06-04 · unverdicted · novelty 5.0

Survey identifying technical and supply-chain barriers to GDPR data subject rights in ML, with new framing of 'models in the dark' for downstream opacity.

Runtime-Structured Task Decomposition for Agentic Coding Systems

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Runtime-structured task decomposition reduces retry costs in agentic coding systems by up to 51.7% versus monolithic prompts by rerunning only failed subtasks on two software engineering workloads.

AI as a Tool for Simulation-Based Experiments in Literary Studies

cs.CL · 2026-06-01 · unverdicted · novelty 4.0

Proposes AI-driven simulations for literary-historical experiments and reports preliminary text-generation results claiming the first limited in-distribution outputs matching human novels.

On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning

cs.CL · 2026-05-26 · unverdicted · novelty 4.0

Counterfactual tuning for LLM unlearning induces knowledge conflict and hallucination spillover, diagnosed via the RWKU+ benchmark.

citing papers explorer

Runtime-Structured Task Decomposition for Agentic Coding Systems

cs.SE · 2026-05-14 · unverdicted · none · ref 52

Runtime-structured task decomposition reduces retry costs in agentic coding systems by up to 51.7% versus monolithic prompts by rerunning only failed subtasks on two software engineering workloads.
