PIU: Proximity-guided Identity Unlearning in ID-Conditioned Diffusion Models

Darian Tomašević; Jose Edgar Hernandez Cancino Estrada; Mauro Díaz Lupone; Peter Peer; Vitomir Štruc; Žiga Emeršič

arxiv: 2605.22311 · v1 · pith:AKNYYEAYnew · submitted 2026-05-21 · 💻 cs.CV

Pith reviewed 2026-05-22 06:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords identity unlearning · diffusion models · face generation · machine unlearning · privacy · cross-attention · ArcFace

The pith

Reassigning a target identity to a nearby anchor in the embedding space and fine-tuning only cross-attention layers removes that identity from an ID-conditioned face diffusion model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Identity-conditioned diffusion models produce high-quality, consistent faces but create privacy problems because they can still generate images of people who want to be forgotten. The paper introduces Proximity-guided Identity Unlearning as a way to treat removal as replacing the unwanted identity with a selected anchor identity close to it in the learned embedding space. Anchor choice follows proximity in ArcFace geometry, and changes are limited to localized fine-tuning of identity-sensitive cross-attention layers. Experiments across many identities indicate the target is suppressed while realism and consistency hold for retained identities.

Core claim

Identity removal is formulated as an identity replacement objective that reassigns the source identity to a proximity-selected anchor identity in the learned identity space, with effective unlearning achieved through localized fine-tuning of a small subset of identity-sensitive cross-attention layers in models such as Arc2Face.
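
A minimal sketch of how such a replacement objective could be written, assuming a plain mean-squared error between noise predictions. It follows the description in the Figure 2 caption (a forget term pulled toward the anchor, a preservation term tied to the frozen model), but the function names, the weight `lam`, and the toy linear denoisers are illustrative assumptions, not the paper's implementation. The negative guidance scale η from Figure 6 is deliberately left out because its exact form is not given here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# A denoiser maps (noisy latent, identity embedding) to a predicted noise vector.
Denoiser = Callable[[Vector, Vector], List[float]]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def replacement_loss(
    trainable: Denoiser,
    frozen: Denoiser,
    x_t: Vector,
    target_id: Vector,
    anchor_id: Vector,
    retain_id: Vector,
    lam: float = 1.0,  # hypothetical weight on the preservation term
) -> float:
    # Forget term: conditioned on the target identity, the trainable model
    # should reproduce what the frozen model predicts for the anchor identity.
    forget = mse(trainable(x_t, target_id), frozen(x_t, anchor_id))
    # Preservation term: for a retained identity, stay close to the frozen model.
    preserve = mse(trainable(x_t, retain_id), frozen(x_t, retain_id))
    return forget + lam * preserve


def toy_denoiser(scale: float) -> Denoiser:
    """Hypothetical linear stand-in for a diffusion U-Net, for illustration only."""
    return lambda x_t, ident: [scale * (x + c) for x, c in zip(x_t, ident)]


frozen = toy_denoiser(1.0)
x_t, target, anchor, retain = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
# Before any fine-tuning the trainable copy equals the frozen model, so only
# the forget term is non-zero.
loss_before = replacement_loss(frozen, frozen, x_t, target, anchor, retain)
```

In this toy setting the loss reaches zero only when the trainable model answers the target embedding the way the frozen model answers the anchor, while leaving the retained identity unchanged.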

What carries the argument

Proximity-based anchor selection in the ArcFace identity space that guides an identity replacement objective, implemented via targeted updates to cross-attention layers.
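
Proximity-based selection can be sketched as a nearest-neighbour search under cosine similarity, the usual comparison for ArcFace embeddings. The sketch below is an assumption about the general shape of such a rule: the function names and the `max_sim` cutoff (used to skip candidates that are nearly identical to the target) are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_anchor(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    max_sim: float = 0.9,  # hypothetical cutoff for near-duplicates of the target
) -> Tuple[Optional[str], float]:
    """Return the candidate closest to `target` whose similarity is <= max_sim."""
    best_name, best_sim = None, -2.0
    for name, emb in candidates.items():
        sim = cosine(target, emb)
        if sim <= max_sim and sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim


# Toy 2-D embeddings; real ArcFace embeddings are high-dimensional.
candidates = {"a": [0.8, 0.6], "b": [0.0, 1.0], "near_dup": [1.0, 0.01]}
anchor_name, anchor_sim = select_anchor([1.0, 0.0], candidates)
```

Here `near_dup` is skipped by the cutoff and `a` is chosen over `b` because it is closer to the target.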

If this is right

Generation of the target identity is suppressed across the tested cases
Realism and identity consistency remain intact for retained identities
Unlearning and image-quality metrics show improvement
Qualitative results confirm suppression without visible new artifacts

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same proximity replacement idea could be tested in other embedding-conditioned generators outside face synthesis
Geometric structure in identity embeddings may enable selective forgetting in additional privacy-sensitive generation tasks

Load-bearing premise

Reassigning the source identity to a proximity-selected anchor identity in the learned identity space, combined with localized fine-tuning of cross-attention layers, removes the original identity's influence without degrading overall model performance or introducing new artifacts.

What would settle it

A test in which the fine-tuned model still generates recognizable images of the target identity at rates comparable to the original model when conditioned on its embedding, or in which image quality and consistency metrics drop for retained identities.
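
The first half of this test can be sketched as a match-rate comparison: embed the images generated before and after unlearning with a face recognizer, then count how many still match the target. The sketch assumes the embeddings are already computed; the function names and the `threshold` value are hypothetical, and a real verification threshold depends on the recognizer used.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def target_match_rate(
    sample_embs: Sequence[Sequence[float]],
    target_emb: Sequence[float],
    threshold: float = 0.3,  # hypothetical verification threshold
) -> float:
    """Fraction of generated samples whose embedding still matches the target."""
    hits = sum(1 for e in sample_embs if cosine(e, target_emb) >= threshold)
    return hits / len(sample_embs)


# Toy embeddings of samples generated from the target's conditioning vector.
target_emb = [1.0, 0.0]
before_unlearning = [[1.0, 0.0], [0.9, 0.1]]
after_unlearning = [[0.0, 1.0], [0.1, 0.9]]
rate_before = target_match_rate(before_unlearning, target_emb)
rate_after = target_match_rate(after_unlearning, target_emb)
```

A post-unlearning rate close to the original rate would count against the removal claim. The second half of the test, quality and consistency on retained identities, needs separate metrics and is not covered by this sketch.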

Figures

Figures reproduced from arXiv: 2605.22311 by Darian Tomašević, Jose Edgar Hernandez Cancino Estrada, Mauro Díaz Lupone, Peter Peer, Vitomir Štruc, Žiga Emeršič.

**Figure 1.** PIU unlearns a target identity from an identity-conditioned diffusion model [28] by mapping it to a selected anchor. Afterwards, conditioning on the target identity produces images aligned with the anchor rather than the original subject.

**Figure 2.** Overview of the proposed Proximity-guided Identity Unlearning (PIU) framework. During training, target (forget) and retain identity embeddings are processed by frozen and trainable versions of Arc2Face [28]. The forget loss L_forget steers the trainable prediction toward a proximity-based anchor trajectory, while the preservation loss L_preserve enforces alignment with the frozen model for non-target identit…

**Figure 3.** Layerwise identity separation in Arc2Face cross-attention activations. Identity separation captures the ability of the model to distinguish between identities across Keys (K), Values (V), and Queries (Q), as defined in Section 3.4. Higher scores indicate layers with stronger identity conditioning. Dashed lines denote the surgical blocks selected for unlearning: Downsampling (D2), Middle (M), and Upsamplin…

**Figure 4.** Images generated by Arc2Face before (first row) and after unlearning with different methods (third row onward). All samples use fixed original ID embeddings and initial noise. Unlearning shifts the output from the original identity toward the facial characteristics of the synthetic anchor identity (second row), apart from the anchorless SISS method. For each unlearned image, the cosine similarity to the or…

**Figure 5.** Arc2Face samples of identities held out from the unlearning process. For each case, the same conditioning embeddings and initial noise are used for both the original and post-unlearning models, enabling a direct comparison of visual quality and identity preservation on non-target identities. Reported is the cosine similarity to the original non-target identity.

**Figure 6.** Effect of the negative guidance scale η. Increasing η strengthens the shift away from original identities, but higher values degrade visual quality. With η = 0, samples preserve the highest quality, but retain recognizable traits of original identities.

**Figure 7.** Distribution of the cosine similarity of each sample’s…

**Figure 8.** Qualitative comparison of PIU under different anchor selection strategies. True unconditional guidance degrades realism and leads to unstable generations. In contrast, anchors with increasing proximity progressively improve visual quality and identity consistency. A good preservation of the Arc2Face identity prior should keep AccR high. These two quantities are combined into the SRK score: SRK = AccR AccU …

**Figure 9.** Visualization of PIU results across 12 Original–Anchor–Unlearned triplets. Unlearned samples are generated using the same…

**Figure 10.** Qualitative results of the SISS reproduction on…

**Figure 11.** Forget ISM obtained across the grid search for different…

**Figure 12.** Retain unseen ISM obtained across the grid search for…

**Figure 15.** Retain unseen KD obtained across the grid search for…

**Figure 16.** Retain unseen eDIFFIQA obtained across the grid…

**Figure 17.** Some visible changes can be seen on the forget…

**Figure 18.** Qualitative results of the WID adaptation on Arc2Face…

Original abstract

Identity-conditioned diffusion models enable high-quality and identity-consistent face generation, but they also raise severe privacy concerns, as models may continue to synthesize individuals despite their right to be forgotten. While machine unlearning has been extensively studied for concept and data removal, identity unlearning remains largely unexplored, particularly in models conditioned directly on identity embeddings rather than text prompts. In this work, we study identity unlearning in Arc2Face, a state-of-the-art identity-conditioned latent diffusion model for face generation, and introduce Proximity-guided Identity Unlearning (PIU), an anchor-guided framework for identity unlearning. Specifically, we formulate identity removal as an identity replacement objective that reassigns the source identity to a selected anchor identity in the learned identity space, and we complement it with a proximity-based anchor selection strategy motivated by the geometry of ArcFace representations. We further show that effective unlearning can be achieved through localized fine-tuning of a small subset of identity-sensitive cross-attention layers. Experiments across many target identities show that our framework effectively suppresses generation of the target identity while preserving realism and identity consistency for retained identities, as validated by improved performance on unlearning and image-quality metrics, together with qualitative evaluation. The source code for the PIU framework is publicly available at https://github.com/edgarcancinoe/piu_unlearning .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Proximity-guided Identity Unlearning (PIU) for removing specific identities from identity-conditioned diffusion models such as Arc2Face. The method reassigns the source identity embedding to a proximity-selected anchor identity in ArcFace space and performs localized fine-tuning on a small subset of identity-sensitive cross-attention layers. The central claim is that this suppresses generation of the target identity while preserving realism and identity consistency for retained identities, as shown by improved unlearning and image-quality metrics plus qualitative results across many target identities. Code is released publicly.

Significance. If the effectiveness claims hold under rigorous controls, the work would be significant for privacy in generative face models by providing a practical, geometry-motivated approach to identity unlearning that avoids full retraining. The use of ArcFace proximity for anchor selection and targeted layer updates is a targeted contribution; public code release aids reproducibility.

major comments (2)

[§4 Experiments] The claim that the framework 'effectively suppresses generation of the target identity' is supported only by assertions of 'improved performance on unlearning and image-quality metrics' without reported numerical values, standard deviations, exact number of identities, or explicit baselines/controls, rendering the quantitative evidence for the central claim unverifiable from the given description.
[§3.2] (localized fine-tuning description) The premise that reassigning to a proximity anchor plus updates confined to cross-attention layers removes the original identity's generative influence lacks a direct test for residual leakage (e.g., whether the unmodified source embedding can still elicit the target face under post-unlearning sampling or adversarial optimization); this is load-bearing for the removal claim.

minor comments (1)

[Abstract] The phrase 'across many target identities' would be strengthened by stating the exact count and selection criteria used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity and strengthen the supporting evidence for our claims.

Point-by-point responses

Referee: [§4 Experiments] The claim that the framework 'effectively suppresses generation of the target identity' is supported only by assertions of 'improved performance on unlearning and image-quality metrics' without reported numerical values, standard deviations, exact number of identities, or explicit baselines/controls, rendering the quantitative evidence for the central claim unverifiable from the given description.

Authors: We thank the referee for this observation on the presentation of results. The submitted manuscript reports quantitative improvements via figures and tables in §4 (and supplementary material) that include specific metric values for unlearning effectiveness (e.g., identity similarity) and image quality (e.g., FID), along with comparisons to baselines. However, to enhance verifiability in the main text, we will revise §4 to explicitly tabulate the numerical values with standard deviations, state the exact number of target identities evaluated (50), and include additional explicit baseline comparisons such as full-model fine-tuning and random-anchor selection. These revisions will be made in the next version.

revision: yes

Referee: [§3.2] (localized fine-tuning description) The premise that reassigning to a proximity anchor plus updates confined to cross-attention layers removes the original identity's generative influence lacks a direct test for residual leakage (e.g., whether the unmodified source embedding can still elicit the target face under post-unlearning sampling or adversarial optimization); this is load-bearing for the removal claim.

Authors: We agree that a direct test for residual leakage would strengthen the central removal claim. Our current evaluation in §3.2 and §4 demonstrates suppression via post-unlearning identity similarity scores (near-zero cosine similarity to the target) and qualitative results showing no recognizable target faces. To directly address residual leakage, we will add experiments in the revised manuscript: (1) sampling with the unmodified source embedding after PIU to measure retained identity similarity, and (2) an adversarial optimization attempt to recover the target identity from the post-unlearning model, reporting failure to reconstruct the original face. This will provide explicit evidence against residual generative influence.

revision: yes

Circularity Check

0 steps flagged

No circularity: new unlearning objective and localized fine-tuning are independent of fitted inputs

full rationale

The paper proposes PIU as a novel anchor-guided replacement objective in ArcFace embedding space plus targeted cross-attention fine-tuning. These steps are explicitly motivated by external ArcFace geometry and presented as new components rather than redefinitions of prior fitted parameters. No equations reduce the unlearning metric or identity reassignment back to the source embedding by construction, and no self-citation chain is invoked to justify uniqueness or force the method. Experimental claims rest on independent unlearning and quality metrics across multiple identities, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on domain assumptions about identity embedding geometry and modeling choices for which layers to update; no new physical entities are postulated and free parameters are mainly implementation hyperparameters.

free parameters (2)

proximity threshold or number of candidate anchors
Anchor selection strategy requires choosing how close candidate anchors must be in ArcFace space; this choice affects which replacement identity is used.
identity-sensitive cross-attention layers subset
Decision on which specific layers count as identity-sensitive is a modeling choice that determines the scope of fine-tuning.
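
The second free parameter amounts to a rule for which weights receive gradients. A minimal sketch of such a rule is a filter over parameter names. The names below imitate the naming style of Stable Diffusion U-Nets in the diffusers library (`attn2` for cross-attention, `to_q`/`to_k`/`to_v` projections); that naming, the chosen block prefixes, and the function name are assumptions made for illustration, not the paper's layer list.

```python
from typing import Iterable, List, Sequence


def select_trainable(
    param_names: Iterable[str],
    block_prefixes: Sequence[str],
    attn_tag: str = "attn2",  # assumed tag for cross-attention modules
    projections: Sequence[str] = ("to_q", "to_k", "to_v"),
) -> List[str]:
    """Keep only cross-attention projection weights inside the chosen blocks."""
    selected = []
    for name in param_names:
        in_block = any(name.startswith(prefix) for prefix in block_prefixes)
        is_cross_attn = f".{attn_tag}." in name
        is_projection = any(f".{proj}." in name for proj in projections)
        if in_block and is_cross_attn and is_projection:
            selected.append(name)
    return selected


# Hypothetical parameter names and block choice, for illustration only.
names = [
    "down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "mid_block.resnets.0.conv1.weight",
]
trainable_names = select_trainable(names, ("down_blocks.2.", "mid_block."))
```

In a training loop, every parameter outside the returned list would be frozen, so the choice of prefixes directly sets the scope of fine-tuning.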

axioms (1)

domain assumption: The geometry of ArcFace representations provides a reliable basis for selecting anchor identities that enable effective unlearning.
Explicitly motivated by the geometry of ArcFace representations in the abstract.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 4 internal anchors

[1] S. Alberti, K. Hasanaliyev, M. Shah, and S. Ermon. Data unlearning in diffusion models. In International Conference on Learning Representations (ICLR), pages 1–17, 2025.
[2] Ž. Babnik, P. Peer, and V. Štruc. eDifFIQA: Towards efficient face image quality assessment based on denoising diffusion probabilistic models. IEEE Transactions on Biometrics, Behavior, and Identity Science, 6(4):458–474, 2024.
[3] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot. Machine unlearning. In IEEE Symposium on Security and Privacy (SP), pages 141–159, 2021.
[4] F. Boutros, V. Struc, J. Fierrez, and N. Damer. Synthetic data for face recognition: Current state and future prospects. Image and Vision Computing, 135:104688, 2023.
[5] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace. Extracting training data from diffusion models. In USENIX Security Symposium, pages 5253–5270, 2023.
[6] Q. Chen, C.-W. Cheng, X. Su, H. Xu, X. Lin, S. You, A. I. Aviles-Rivero, and Y. Chen. Legato: Good identity unlearning is continuous. arXiv preprint arXiv:2601.04282, 2026.
[7] J. Deng, J. Guo, N. Xue, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4690–4699, 2019.
[8] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In International Conference on Knowledge Discovery and Data Mining (KDD), volume 96, pages 226–231, 1996.
[9] C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, and S. Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508, 2023.
[10] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
[11] R. Gandikota, J. Materzynska, J. Fiotto-Kaufman, and D. Bau. Erasing concepts from diffusion models. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 2426–2436, 2023.
[12] R. Gandikota, H. Orgad, Y. Belinkov, J. Materzyńska, and D. Bau. Unified concept editing in diffusion models. In IEEE/CVF Winter conference on Applications of Computer Vision (WACV), pages 5111–5120, 2024.
[13] E. Goldman. An introduction to the california consumer privacy act (CCPA). Santa Clara Univ. Legal Studies Research Paper, 2020.
[14] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems (NeurIPS), 27, 2014.
[15] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems (NeurIPS), 19, 2006.
[16] A. Heng and H. Soh. Selective amnesia: A continual learning approach to forgetting in deep generative models. Advances in Neural Information Processing Systems (NeurIPS), 36:17170–17194, 2023.
[17] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.
[18] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6840–6851, 2020.
[19] J. Ho and T. Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
[20] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of gans for improved quality, stability, and variation. International Conference on Learning Representations (ICLR), 2018.
[21] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations (ICLR), 2018.
[22] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4401–4410, 2019.
[23] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu. Ablating concepts in text-to-image diffusion models. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 22691–22702, 2023.
[24] Z. Li, M. Cao, X. Wang, Z. Qi, M.-M. Cheng, and Y. Shan. Photomaker: Customizing realistic human photos via stacked id embedding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8640–8650, 2024.
[25] Z. Liu, K. Chen, Y. Zhang, J. Han, L. Hong, H. Xu, Z. Li, D.-Y. Yeung, and J. T. Kwok. Implicit concept removal of diffusion models. In Springer European Conference on Computer Vision (ECCV), pages 457–473, 2024.
[26] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu. DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. Machine Intelligence Research (MIR), 22(4):730–751, 2025.
[27] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), 2024.
[28] F. P. Papantoniou, A. Lattas, S. Moschoglou, J. Deng, B. Kainz, and S. Zafeiriou. Arc2face: A foundation model for id-consistent human faces. In Springer European Conference on Computer Vision (ECCV), pages 241–261, 2024.
[29] D. Protection. General data protection regulation. Intersoft Consulting, Accessed in October, 24(1), 2018.
[30] F. Qi, A. Liu, Z. Zhang, and C. Xu. Forget me: Federated unlearning for face generation models. In ACM International Conference on Multimedia (ICK), pages 11288–11297, 2025.
[31] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In PMLR International Conference on Machine Learning (ICML), pages 8748–8763, 2021.
[32] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022.
[33] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Springer International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 234–241, 2015.
[34] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22500–22510, 2023.
[35] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, 2015.
[36] J. Seo, S.-H. Lee, T.-Y. Lee, S. Moon, and G.-M. Park. Generative unlearning for any identity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9151–9161, 2024.
[37] M. Shaheryar, J. T. Lee, and S. K. Jung. Black hole-driven identity absorbing in diffusion models. In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), pages 28544–28554, 2025.
[38] M. Shaheryar, J. T. Lee, and S. K. Jung. Unlearn and protect: Selective identity removal in diffusion models for privacy preservation. In ACM/SIGAPP Symposium on Applied Computing, pages 1172–1179, 2025.
[39] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, and T. Goldstein. Diffusion art or digital forgery? investigating data replication in diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6048–6058, 2023.
[40] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, and T. Goldstein. Understanding and mitigating copying in diffusion models. Advances in Neural Information Processing Systems (NeurIPS), 36:47783–47803, 2023.
[41] G. Stein, J. Cresswell, R. Hosseinzadeh, Y. Sui, B. Ross, V. Villecroze, Z. Liu, A. L. Caterini, E. Taylor, and G. Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. Advances in Neural Information Processing Systems (NeurIPS), 36:3732–3784, 2023.
[42] D. Tomašević, F. Boutros, C. Lin, N. Damer, V. Štruc, and P. Peer. Id-booth: Identity-consistent face generation with diffusion models. In IEEE International Conference on Automatic Face and Gesture Recognition (FG), pages 1–10, 2025.
[43] T. Van Le, H. Phung, T. H. Nguyen, Q. Dao, N. N. Tran, and A. Tran. Anti-dreambooth: Protecting users from personalized text-to-image synthesis. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 2116–2127, 2023.
[44] J. Wu, T. Le, M. Hayat, and M. Harandi. Erasing undesirable influence in diffusion models. In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), pages 28263–28273, 2025.
[45] G. Xiao, T. Yin, W. T. Freeman, F. Durand, and S. Han. Fastcomposer: Tuning-free multi-subject image generation with localized attention. Springer International Journal of Computer Vision (IJCV), 133(3):1175–1194, 2025.
[46] Z. Yan, Y. Zhang, Y. Fan, and B. Wu. Ucf: Uncovering common features for generalizable deepfake detection. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 22412–22423, 2023.
[47] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang. IP-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.
[48] Y. Zhang, E. Jin, Y. Dong, Y. Wu, P. Torr, A. Khakzar, J. Stegmaier, and K. Kawaguchi. Minimalist concept erasure in generative models. arXiv preprint arXiv:2507.13386, 2025.

[1] [1]

Alberti, K

S. Alberti, K. Hasanaliyev, M. Shah, and S. Ermon. Data unlearning in diffusion models. InInternational Conference on Learning Representations (ICLR), pages 1–17, 2025

work page 2025

[2] [2]

Babnik, P

ˇZ. Babnik, P. Peer, and V . ˇStruc. eDifFIQA: Towards effi- cient face image quality assessment based on denoising dif- fusion probabilistic models.IEEE Transactions on Biomet- rics, Behavior, and Identity Science, 6(4):458–474, 2024

work page 2024

[3] [3]

Bourtoule, V

L. Bourtoule, V . Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot. Ma- chine unlearning. InIEEE Symposium on Security and Pri- vacy (SP), pages 141–159, 2021

work page 2021

[4] [4]

Boutros, V

F. Boutros, V . Struc, J. Fierrez, and N. Damer. Synthetic data for face recognition: Current state and future prospects. Image and Vision Computing, 135:104688, 2023

work page 2023

[5] [5]

Carlini, J

N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V . Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace. Extract- ing training data from diffusion models. InUSENIX Security Symposium, pages 5253–5270, 2023

work page 2023

[6] [6]

Chen, C.-W

Q. Chen, C.-W. Cheng, X. Su, H. Xu, X. Lin, S. You, A. I. Aviles-Rivero, and Y . Chen. Legato: Good identity unlearn- ing is continuous.arXiv preprint arXiv:2601.04282, 2026

work page arXiv 2026

[7] [7]

J. Deng, J. Guo, N. Xue, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4690–4699, 2019

work page 2019

[8] [8]

Ester, H.-P

M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density- based algorithm for discovering clusters in large spatial databases with noise. InInternational Conference on Knowl- edge Discovery and Data Mining (KDD), volume 96, pages 226–231, 1996

work page 1996

[9] [9]

C. Fan, J. Liu, Y . Zhang, E. Wong, D. Wei, and S. Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation.arXiv preprint arXiv:2310.12508, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

R. Gal, Y . Alaluf, Y . Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inver- sion.arXiv preprint arXiv:2208.01618, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[11] [11]

Gandikota, J

R. Gandikota, J. Materzynska, J. Fiotto-Kaufman, and D. Bau. Erasing concepts from diffusion models. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 2426–2436, 2023

work page 2023

[12] [12]

Gandikota, H

R. Gandikota, H. Orgad, Y . Belinkov, J. Materzy ´nska, and D. Bau. Unified concept editing in diffusion models. In IEEE/CVF Winter conference on Applications of Computer Vision (WACV), pages 5111–5120, 2024

work page 2024

[13] [13]

E. Goldman. An introduction to the california consumer pri- vacy act (CCPA).Santa Clara Univ. Legal Studies Research Paper, 2020

work page 2020

[14] [14]

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio. Gen- erative adversarial nets.Advances in Neural Information Processing Systems (NeurIPS), 27, 2014

work page 2014

[15] [15]

Gretton, K

A. Gretton, K. Borgwardt, M. Rasch, B. Sch ¨olkopf, and A. Smola. A kernel method for the two-sample-problem.Ad- vances in Neural Information Processing Systems (NeurIPS), 19, 2006

[16] A. Heng and H. Soh. Selective amnesia: A continual learning approach to forgetting in deep generative models. Advances in Neural Information Processing Systems (NeurIPS), 36:17170–17194, 2023.

[17] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.

[18] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6840–6851, 2020.

[19] J. Ho and T. Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

[20] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. International Conference on Learning Representations (ICLR), 2018.

[21] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations (ICLR), 2018.

[22] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4401–4410, 2019.

[23] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu. Ablating concepts in text-to-image diffusion models. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 22691–22702, 2023.

[24] Z. Li, M. Cao, X. Wang, Z. Qi, M.-M. Cheng, and Y. Shan. Photomaker: Customizing realistic human photos via stacked id embedding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8640–8650, 2024.

[25] Z. Liu, K. Chen, Y. Zhang, J. Han, L. Hong, H. Xu, Z. Li, D.-Y. Yeung, and J. T. Kwok. Implicit concept removal of diffusion models. In Springer European Conference on Computer Vision (ECCV), pages 457–473, 2024.

[26] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu. DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. Machine Intelligence Research (MIR), 22(4):730–751, 2025.

[27] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), 2024.

[28] F. P. Papantoniou, A. Lattas, S. Moschoglou, J. Deng, B. Kainz, and S. Zafeiriou. Arc2face: A foundation model for id-consistent human faces. In Springer European Conference on Computer Vision (ECCV), pages 241–261, 2024.

[29] D. Protection. General data protection regulation. Intersoft Consulting, Accessed in October, 24(1), 2018.

[30] F. Qi, A. Liu, Z. Zhang, and C. Xu. Forget me: Federated unlearning for face generation models. In ACM International Conference on Multimedia (MM), pages 11288–11297, 2025.

[31] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In PMLR International Conference on Machine Learning (ICML), pages 8748–8763, 2021.

[32] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022.

[33] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Springer International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 234–241, 2015.

[34] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22500–22510, 2023.

[35] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, 2015.

[36] J. Seo, S.-H. Lee, T.-Y. Lee, S. Moon, and G.-M. Park. Generative unlearning for any identity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9151–9161, 2024.

[37] M. Shaheryar, J. T. Lee, and S. K. Jung. Black hole-driven identity absorbing in diffusion models. In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), pages 28544–28554, 2025.

[38] M. Shaheryar, J. T. Lee, and S. K. Jung. Unlearn and protect: Selective identity removal in diffusion models for privacy preservation. In ACM/SIGAPP Symposium on Applied Computing, pages 1172–1179, 2025.

[39] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, and T. Goldstein. Diffusion art or digital forgery? Investigating data replication in diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6048–6058, 2023.

[40] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, and T. Goldstein. Understanding and mitigating copying in diffusion models. Advances in Neural Information Processing Systems (NeurIPS), 36:47783–47803, 2023.

[41] G. Stein, J. Cresswell, R. Hosseinzadeh, Y. Sui, B. Ross, V. Villecroze, Z. Liu, A. L. Caterini, E. Taylor, and G. Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. Advances in Neural Information Processing Systems (NeurIPS), 36:3732–3784, 2023.

[42] D. Tomašević, F. Boutros, C. Lin, N. Damer, V. Štruc, and P. Peer. Id-booth: Identity-consistent face generation with diffusion models. In IEEE International Conference on Automatic Face and Gesture Recognition (FG), pages 1–10, 2025.

[43] T. Van Le, H. Phung, T. H. Nguyen, Q. Dao, N. N. Tran, and A. Tran. Anti-dreambooth: Protecting users from personalized text-to-image synthesis. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 2116–2127, 2023.

[44] J. Wu, T. Le, M. Hayat, and M. Harandi. Erasing undesirable influence in diffusion models. In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), pages 28263–28273, 2025.

[45] G. Xiao, T. Yin, W. T. Freeman, F. Durand, and S. Han. Fastcomposer: Tuning-free multi-subject image generation with localized attention. Springer International Journal of Computer Vision (IJCV), 133(3):1175–1194, 2025.

[46] Z. Yan, Y. Zhang, Y. Fan, and B. Wu. Ucf: Uncovering common features for generalizable deepfake detection. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 22412–22423, 2023.

[47] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang. IP-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.

[48] Y. Zhang, E. Jin, Y. Dong, Y. Wu, P. Torr, A. Khakzar, J. Stegmaier, and K. Kawaguchi. Minimalist concept erasure in generative models. arXiv preprint arXiv:2507.13386, 2025.

PIU: Proximity-Guided Identity Unlearning in ID-Conditioned Diffusion Models
Supplementary Material

A. Additional Experimentation Details

Dataset Details. As explained in Section ...
