SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
Pith reviewed 2026-05-16 17:52 UTC · model grok-4.3
The pith
Gradient-based weight saliency enables effective unlearning of data, classes, or concepts in both image classifiers and generators while approaching exact retraining performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SalUn computes weight saliency by examining gradients of the forgetting data and then applies an optimization step that updates primarily those salient weights. The result erases targeted information in image classification models and prevents conditional diffusion models from generating specified concepts. Experiments show stability advantages on random data removal and near-100 percent unlearning accuracy on harmful-image prevention tasks, all while preserving accuracy on retained data.
What carries the argument
Gradient-based weight saliency, which ranks model parameters by their gradient magnitude or influence on the forgetting objective and thereby restricts unlearning updates to those parameters.
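A minimal sketch of this saliency step, assuming a PyTorch model and a loader over the forgetting data; the accumulated gradient magnitude and the hard top-k quantile threshold follow the review's description of the mechanism, not necessarily the paper's exact Eq. (3):

```python
import torch

def saliency_mask(model, forget_loader, loss_fn, k=0.5, device="cpu"):
    """Mark the top-k fraction of weights by gradient magnitude on the
    forgetting data: 1 = salient (updated during unlearning), 0 = frozen."""
    model.to(device).eval()
    grads = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in forget_loader:
        model.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grads[n] += p.grad.detach().abs()
    flat = torch.cat([g.flatten() for g in grads.values()])
    # torch.quantile is capped at ~16M elements; very large models would
    # need a streamed or sampled estimate of this threshold instead.
    threshold = torch.quantile(flat, 1.0 - k)
    return {n: (g >= threshold).float() for n, g in grads.items()}
```

During unlearning, each update is then restricted to the salient set, e.g. by scaling gradients elementwise with `p.grad *= mask[n]` before the optimizer step.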
Load-bearing premise
Gradient information reliably isolates the exact parameters responsible for the forgetting data without causing large unintended changes to retained knowledge.
What would settle it
If a model processed by SalUn still classifies forgotten classes or generates forbidden concepts at rates close to the original trained model on held-out test examples, the claim of effective unlearning would be refuted.
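This refutation test is mechanical to run. A sketch under the class-forgetting setting, assuming a held-out test loader; the `forget_classes` list and the comparison against the original model are illustrative:

```python
import torch

@torch.no_grad()
def forget_accuracy(model, test_loader, forget_classes, device="cpu"):
    """Accuracy restricted to held-out examples of the forgotten classes.
    Effective unlearning should drive this toward chance level for the
    unlearned model while it stays high for the original model."""
    model.to(device).eval()
    targets = torch.tensor(forget_classes)
    correct, total = 0, 0
    for x, y in test_loader:
        keep = torch.isin(y, targets)
        if keep.any():
            preds = model(x[keep].to(device)).argmax(dim=1).cpu()
            correct += (preds == y[keep]).sum().item()
            total += int(keep.sum())
    return correct / max(total, 1)
```

Running this on both the SalUn-processed and the original model makes the claim directly falsifiable: similar scores on the forgotten classes would refute effective unlearning.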
Original abstract
With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points). To the best of our knowledge, SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not. Codes are available at https://github.com/OPTML-Group/Unlearn-Saliency. (WARNING: This paper contains model outputs that may be offensive in nature.)
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SalUn, a machine unlearning method that computes gradient-based weight saliency on forgetting data to identify and selectively update a subset of model parameters, thereby erasing the influence of specific data points, classes, or concepts. It reports empirical results on image classification (0.2% gap to exact unlearning on CIFAR-10) and conditional diffusion models (near-100% unlearning accuracy, outperforming Erased Stable Diffusion and Forget-Me-Not), claiming to be the first principled approach effective for both domains.
Significance. If the isolation of salient weights holds, the work offers a unified, efficient framework for machine unlearning that narrows the gap to exact retraining while extending to generative models; the open-sourced code at the provided GitHub link is a clear strength that supports reproducibility and further testing.
major comments (2)
- [§3.2] §3.2, Eq. (3): the top-k gradient saliency computed solely on forgetting samples assumes these weights encode the target influence in isolation; however, in shared-backbone networks (ResNet/VGG for classification, U-Net for diffusion), gradients may highlight parameters also used by retained classes/concepts, and the manuscript provides no direct measurement of salient-set overlap or degradation when thresholds vary.
- [Experiments] Experimental section: reported gaps (0.2% on CIFAR-10, near-100% on diffusion) are presented without error bars, seed-wise stability checks on the saliency computation, or ablations on the saliency threshold k; these omissions make it difficult to confirm that the small advantage over baselines is robust rather than sensitive to initialization or hyperparameter choice.
minor comments (2)
- [Abstract] Abstract: the 0.2% stability advantage is stated without naming the exact metric (accuracy? loss?) or reporting variability; add this detail for clarity.
- [§3] Notation: ensure consistent use of symbols for saliency scores and update rules across equations and text; a short table summarizing symbols would aid readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have revised the paper to incorporate additional analyses for greater clarity and robustness.
Point-by-point responses
-
Referee: [§3.2] §3.2, Eq. (3): the top-k gradient saliency computed solely on forgetting samples assumes these weights encode the target influence in isolation; however, in shared-backbone networks (ResNet/VGG for classification, U-Net for diffusion), gradients may highlight parameters also used by retained classes/concepts, and the manuscript provides no direct measurement of salient-set overlap or degradation when thresholds vary.
Authors: We appreciate this observation regarding the potential for parameter overlap in shared-backbone architectures. While the saliency computation focuses on forgetting samples to identify the most affected weights, we acknowledge that some shared parameters may exist. To directly address this, the revised manuscript now includes a quantitative analysis of the overlap between the top-k salient weight sets derived from forgetting data versus retained data (or concepts). This overlap is measured across the classification and diffusion experiments and shown to be limited, supporting the targeted nature of the updates. We have also added an ablation on varying the threshold k, reporting both unlearning effectiveness and any degradation in retained performance to demonstrate stability within the chosen operating range. revision: yes
-
Referee: [Experiments] Experimental section: reported gaps (0.2% on CIFAR-10, near-100% on diffusion) are presented without error bars, seed-wise stability checks on the saliency computation, or ablations on the saliency threshold k; these omissions make it difficult to confirm that the small advantage over baselines is robust rather than sensitive to initialization or hyperparameter choice.
Authors: We agree that the absence of error bars, multi-seed checks, and k ablations limits the ability to assess robustness. The revised experimental section now reports results averaged over multiple random seeds (with standard deviations shown as error bars) for the primary metrics on CIFAR-10 and the conditional diffusion models. We have also added a dedicated ablation study on the saliency threshold k, illustrating how performance varies with different k values and confirming that the reported gaps to exact unlearning remain stable and small within the selected range. These additions substantiate that the advantages are not artifacts of a single initialization or hyperparameter setting. revision: yes
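The overlap analysis promised in the first response is cheap to instrument. A sketch, assuming saliency masks in the dictionary form of the earlier sketch; `mask_forget` and `mask_retain` (masks computed from forgetting versus retained data) are hypothetical names:

```python
import torch

def mask_overlap(mask_forget, mask_retain):
    """Jaccard overlap between two binary saliency masks. A value near 0
    supports the premise that forget-salient weights are largely disjoint
    from retain-salient ones; a value near 1 would undercut it."""
    intersection, union = 0.0, 0.0
    for name, m_f in mask_forget.items():
        a, b = m_f.bool(), mask_retain[name].bool()
        intersection += (a & b).sum().item()
        union += (a | b).sum().item()
    return intersection / max(union, 1.0)
```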
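The second response's robustness study likewise reduces to a small grid over seeds and thresholds. A sketch; `run_unlearning` is a hypothetical stand-in for the full pipeline, assumed to return the accuracy gap to exact retraining:

```python
import numpy as np

def ablate_threshold(run_unlearning, ks=(0.1, 0.3, 0.5, 0.7), seeds=(0, 1, 2)):
    """For each saliency threshold k, average the gap to exact unlearning
    over seeds; the std gives the error bar the referee asked for."""
    results = {}
    for k in ks:
        gaps = [run_unlearning(k=k, seed=s) for s in seeds]
        results[k] = (float(np.mean(gaps)), float(np.std(gaps)))
    return results  # k -> (mean gap, std)
```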
Circularity Check
No significant circularity: SalUn is a new algorithmic construction validated against external baselines.
full rationale
The paper introduces gradient-based weight saliency (Eq. 3 in §3.2) as a novel MU procedure that computes top-k salient weights from forgetting-data gradients and applies targeted updates. Performance is measured directly against exact retraining from scratch and prior MU baselines (e.g., Erased Stable Diffusion) on CIFAR-10, ImageNet, and diffusion models, with reported gaps (0.2% stability) and unlearning accuracy (~100%). No equation reduces the claimed improvement to a fitted hyperparameter by definition, no self-citation chain justifies the core premise, and the uniqueness claim is presented as an empirical observation rather than a theorem derived from prior author work. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gradient-based saliency computed on forgetting data isolates the relevant model weights for unlearning
invented entities (1)
-
weight saliency: no independent evidence
Forward citations
Cited by 17 Pith papers
-
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
-
Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.
-
Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models
CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.
-
Efficient Unlearning through Maximizing Relearning Convergence Delay
The Influence Eliminating Unlearning framework maximizes relearning convergence delay via weight decay and noise injection to remove the influence of a forgetting set while preserving accuracy on retained data.
-
Is your algorithm unlearning or untraining?
Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).
-
CURE:Circuit-Aware Unlearning for LLM-based Recommendation
CURE disentangles LLM recommendation circuits into forget-specific, retain-specific, and task-shared modules with tailored update rules to achieve more effective unlearning than weighted baselines.
-
Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning
A contrastive visual forgetting technique constrained to the null space of retained knowledge enables targeted unlearning of visual concepts in MLLMs while preserving non-target visual and all textual knowledge.
-
Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.
-
IPRU: Input-Perturbation-based Radio Frequency Fingerprinting Unlearning for LAWNs
IPRU erases target AAV radio fingerprints via an optimized input perturbation vector, delivering 1.41% unlearning accuracy, 99.41% remaining accuracy, full membership-inference resistance, and 5.79X speedup over retraining.
-
Beyond Text Prompts: Precise Concept Erasure through Text-Image Collaboration
TICoE achieves more precise and faithful concept erasure in text-to-image models by collaborating text and image data through a convex manifold and hierarchical learning, outperforming prior methods.
-
Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
DAMP performs one-shot class unlearning by extracting and projecting out forget-specific residual directions at each network depth using class prototypes and a separability-derived scaling rule.
-
BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning
BID-LoRA uses bi-directional low-rank adapters with retain/new/unlearn pathways and escape unlearning to enable continual learning and unlearning while minimizing knowledge leakage and parameter updates.
-
EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
-
Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?
Unlearning a demographic group in CLIP models redistributes bias primarily along gender boundaries rather than eliminating it.
-
Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models
Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.
-
Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
Jellyfish enables zero-shot federated unlearning through synthetic proxy data generation, channel-restricted knowledge disentanglement, and a composite loss with repair to forget target data while retaining model utility.
-
Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures
A modified SISA architecture with replay and gating achieves effective class removal from trained CNNs on image datasets while preserving accuracy and cutting retraining costs.
Reference graph
Works this paper leans on
-
[1]
Sanity checks for saliency maps
Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. Advances in neural information processing systems, 31, 2018
work page 2018
-
[2]
Gradient surgery for one-shot unlearning on generative model, 2023
Seohui Bae, Seoyoon Kim, Hyemin Jung, and Woohyung Lim. Gradient surgery for one-shot unlearning on generative model, 2023
work page 2023
-
[4]
Nudenet: Neural nets for nudity classification, detection and selective censoring, 2019
P Bedapudi. Nudenet: Neural nets for nudity classification, detection and selective censoring, 2019
work page 2019
-
[6]
Membership inference attacks from first principles
Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897--1914. IEEE, 2022
work page 2022
-
[7]
Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks
Aditya Chattopadhay, Anirban Sarkar, Prantik Howlader, and Vineeth N Balasubramanian. Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839--847. IEEE, 2018
work page 2018
-
[8]
Graph unlearning
Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. Graph unlearning. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pp. 499--513, 2022a
work page 2022
-
[9]
Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary
Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7766--7775, 2023
work page 2023
-
[10]
Quarantine: Sparsity can uncover the trojan attack trigger for free
Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, and Zhangyang Wang. Quarantine: Sparsity can uncover the trojan attack trigger for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 598--609, 2022b
work page 2022
-
[15]
Our data, ourselves: Privacy via distributed noise generation
Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual international conference on the theory and applications of cryptographic techniques, pp. 486--503. Springer, 2006
work page 2006
-
[18]
Making ai forget you: Data deletion in machine learning
Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems, 32, 2019
work page 2019
-
[19]
Eternal sunshine of the spotless net: Selective forgetting in deep networks
Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9304--9312, 2020
work page 2020
-
[20]
Amnesiac machine learning
Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11516--11524, 2021
work page 2021
-
[24]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770--778, 2016
work page 2016
-
[25]
Selective amnesia: A continual learning approach to forgetting in deep generative models, 2023
Alvin Heng and Harold Soh. Selective amnesia: A continual learning approach to forgetting in deep generative models, 2023
work page 2023
-
[27]
The european union general data protection regulation: what it is and what it means
Chris Jay Hoofnagle, Bart van der Sloot, and Frederik Zuiderveen Borgesius. The european union general data protection regulation: what it is and what it means. Information & Communications Technology Law, 28(1): 65--98, 2019
work page 2019
-
[28]
Fastai: A layered api for deep learning
Jeremy Howard and Sylvain Gugger. Fastai: A layered api for deep learning. Information, 11(2): 108, 2020
work page 2020
-
[30]
Approximate data deletion from machine learning models
Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, and James Zou. Approximate data deletion from machine learning models. In International Conference on Artificial Intelligence and Statistics, pp. 2008--2016. PMLR, 2021
work page 2021
-
[31]
A data-based perspective on transfer learning
Saachi Jain, Hadi Salman, Alaa Khaddaj, Eric Wong, Sung Min Park, and Aleksander Madry. A data-based perspective on transfer learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3613--3622, 2023
work page 2023
-
[32]
How can i explain this to you? an empirical study of deep neural network explanation methods
Jeya Vikranth Jeyakumar, Joseph Noor, Yu-Hsi Cheng, Luis Garcia, and Mani Srivastava. How can i explain this to you? an empirical study of deep neural network explanation methods. Advances in Neural Information Processing Systems, 33: 4211--4222, 2020
work page 2020
-
[34]
Understanding black-box predictions via influence functions
Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In International conference on machine learning, pp. 1885--1894. PMLR, 2017
work page 2017
-
[35]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009
work page 2009
-
[36]
Tiny imagenet visual recognition challenge
Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 7(7): 3, 2015
work page 2015
-
[39]
Swin transformer: Hierarchical vision transformer using shifted windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012--10022, 2021
work page 2021
-
[40]
Locating and editing factual associations in gpt
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt. Advances in Neural Information Processing Systems, 35: 17359--17372, 2022
work page 2022
-
[42]
Descent-to-delete: Gradient-based methods for machine unlearning
Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. Descent-to-delete: Gradient-based methods for machine unlearning. In Algorithmic Learning Theory, pp. 931--962. PMLR, 2021
work page 2021
-
[43]
Reading digits in natural images with unsupervised feature learning
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011
work page 2011
-
[45]
Proximal algorithms
Neal Parikh, Stephen Boyd, et al. Proximal algorithms. Foundations and Trends in Optimization, 1(3): 127--239, 2014
work page 2014
-
[49]
Scaling vision with sparse mixture of experts
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling vision with sparse mixture of experts. Advances in Neural Information Processing Systems, 34: 8583--8595, 2021
work page 2021
-
[50]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684--10695, 2022
work page 2022
-
[51]
Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models, 2023
Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models, 2023
work page 2023
-
[53]
Laion-5b: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35: 25278--25294, 2022
work page 2022
-
[54]
Remember what you want to forget: Algorithms for machine unlearning
Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34: 18075--18086, 2021
work page 2021
-
[55]
Grad-CAM: Visual explanations from deep networks via gradient-based localization
Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618--626, 2017
work page 2017
-
[60]
Diffusion art or digital forgery? investigating data replication in diffusion models
Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffusion art or digital forgery? investigating data replication in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6048--6058, 2023
work page 2023
-
[62]
Axiomatic attribution for deep networks
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3319--3328. JMLR.org, 2017
work page 2017
-
[63]
Unrolling sgd: Understanding factors influencing machine unlearning
Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling sgd: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pp. 303--319. IEEE, 2022a
work page 2022
-
[64]
On the necessity of auditable algorithmic definitions for machine unlearning
Anvith Thudi, Hengrui Jia, Ilia Shumailov, and Nicolas Papernot. On the necessity of auditable algorithmic definitions for machine unlearning. In 31st USENIX Security Symposium (USENIX Security 22), pp. 4007--4022, 2022b
work page 2022
-
[65]
Machine unlearning via algorithmic stability
Enayat Ullah, Tung Mai, Anup Rao, Ryan A Rossi, and Raman Arora. Machine unlearning via algorithmic stability. In Conference on Learning Theory, pp. 4126--4142. PMLR, 2021
work page 2021
-
[66]
Federated unlearning via class-discriminative pruning
Junxiao Wang, Song Guo, Xin Xie, and Heng Qi. Federated unlearning via class-discriminative pruning. In Proceedings of the ACM Web Conference 2022, pp. 622--632, 2022
work page 2022
-
[68]
Leveraging sparse linear layers for debuggable deep networks
Eric Wong, Shibani Santurkar, and Aleksander Madry. Leveraging sparse linear layers for debuggable deep networks. In International Conference on Machine Learning, pp. 11205--11216. PMLR, 2021
work page 2021
-
[69]
Federated unlearning: Guarantee the right of clients to forget
Leijie Wu, Song Guo, Junxiao Wang, Zicong Hong, Jie Zhang, and Yaohong Ding. Federated unlearning: Guarantee the right of clients to forget. IEEE Network, 36(5): 129--135, 2022
work page 2022
-
[71]
Visualizing and understanding convolutional networks
Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pp. 818--833. Springer, 2014
work page 2014
-
[74]
Learning deep features for discriminative localization
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921--2929, 2016
work page 2016
-
[76]
Can sensitive information be deleted from llms? objectives for defending against extraction attacks
arXiv preprint arXiv:2309.17410, 2023
-
[77]
Understanding instance-level impact of fairness constraints
International Conference on Machine Learning, 2022
work page 2022
-
[78]
Canadian privacy law: The personal information protection and electronic documents act (PIPEDA)
Int'l. In-House Counsel J., 2008
work page 2008
-
[79]
Towards Modular Machine Learning Solution Development: Benefits and Trade-offs
arXiv preprint arXiv:2301.09753, 2023
-
[82]
SmoothGrad: removing noise by adding noise
arXiv preprint arXiv:1706.03825, 2017
-
[84]
Prompt certified machine unlearning with randomized gradient smoothing and quantization
Advances in Neural Information Processing Systems
-
[86]
Striving for Simplicity: The All Convolutional Net
arXiv preprint arXiv:1412.6806, 2014
-
[87]
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
arXiv preprint arXiv:1312.6034, 2013
-
[90]
RISE: Randomized Input Sampling for Explanation of Black-box Models
arXiv preprint arXiv:1806.07421, 2018
-
[91]
Studying Large Language Model Generalization with Influence Functions
arXiv preprint arXiv:2308.03296, 2023
-
[92]
Data selection for language models via importance resampling
arXiv preprint arXiv:2302.03169, 2023
-
[94]
Datamodels: Predicting predictions from training data
arXiv preprint arXiv:2202.00622, 2022
-
[96]
Trak: Attributing model behavior at scale
arXiv preprint arXiv:2303.14186, 2023
-
[100]
Certified Data Removal in Sum-Product Networks
2022 IEEE International Conference on Knowledge Graph (ICKG), 2022
work page 2022
-
[102]
Fair Machine Unlearning: Data Removal while Mitigating Disparities
arXiv preprint arXiv:2307.14754, 2023
-
[106]
Dataset security for machine learning: Data poisoning, backdoor attacks, and defenses
IEEE Transactions on Pattern Analysis and Machine Intelligence
-
[108]
Continual lifelong learning with neural networks: A review
Neural Networks, 2019
work page 2019
-
[110]
Optimization with sparsity-inducing penalties
Foundations and Trends, 2012
work page 2012
-
[111]
Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting
Advances in Neural Information Processing Systems