pith. sign in

arxiv: 2605.25574 · v1 · pith:MJRVLOBVnew · submitted 2026-05-25 · 💻 cs.CV · cs.AI

Mosaic: Compositional Multi-Concept Erasure via Vector Field Blending

Pith reviewed 2026-06-29 22:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords concept erasuretext-to-image modelsmulti-concept removalvector field blendingcompositional scenesflow-based modelssafe image synthesis
0
0 comments X

The pith

Mosaic erases multiple target concepts from one image by blending concept-specific masks in the vector field.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines compositional multi-concept erasure as the task of removing several specified concepts at once from a single generated scene. It introduces CoME-Bench to measure performance on both intra-category and cross-category cases. Mosaic constructs masks for each target concept from the flow-based model's vector field and blends them selectively. The method requires no extra optimization or retraining. If the approach holds, it enables safer synthesis when multiple unwanted elements appear together in complex outputs.

Core claim

Mosaic is a framework for compositional multi-concept erasure in flow-based T2I models that exploits spatial locality of target concepts in the vector field. It dynamically builds concept-specific masks and performs selective blending to remove multiple targets simultaneously while leaving non-target regions intact, all without additional optimization.

What carries the argument

Dynamic construction of concept-specific masks followed by selective vector field blending.

If this is right

  • Multiple concepts can be removed from a single complex scene in one forward pass.
  • No per-concept fine-tuning or optimization is needed after the initial model training.
  • The same blending procedure applies to both same-category and different-category concept pairs.
  • Non-target context and overall scene structure remain unchanged after erasure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If vector-field locality generalizes, the same mask-blending idea could apply to other generative architectures that produce spatial fields.
  • Real-time multi-concept editing tools could be built on top of this approach without retraining.
  • Platforms generating images on demand might adopt the method to reduce post-filtering steps for safety.

Load-bearing premise

Target concepts occupy sufficiently separate regions in the vector field that their masks can be blended without affecting each other or unrelated image content.

What would settle it

A test image containing two target concepts where the blended output still shows visible traces of either concept or alters non-target elements would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.25574 by Jong-Seok Lee, Jungwoo Kim, Junseok Ko.

Figure 1
Figure 1. Figure 1: Composable Multi-Concept Generation Results. While diffusion models (SD1.4 [3] (left) and SDXL [6] (mid)) struggle to compose multiple concepts simultaneously, the flow-based model (Flux.1-dev [24] (right)) renders all three distinct characters in a single scene. All images are generated with an identical prompt: “SpongeBob SquarePants leans forward as if checking something on the ground, Mario stands a fe… view at source ↗
Figure 2
Figure 2. Figure 2: Conceptual Comparison of Multi-Concept Erasure Benchmarks. (a) Existing bench￾marks evaluate each target concept on separate images. (b) CoME-Bench proposes a compositional protocol where multiple target concepts co-exist within a single scene. rectified flow [30] reformulated generation as learning a deterministic velocity field along straight trajectories, enabling more stable and efficient sampling than… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the Prompt Generation–Verification Pipeline. For each candidate prompt generated from the given target concepts, we synthesize N images using different random seeds and verify whether all target concepts are correctly rendered. The prompt is verified only when all N images pass verification; otherwise, another prompt is generated. (LLM), Qwen3-4B-Instruct [47], to generate diverse prompts that … view at source ↗
Figure 4
Figure 4. Figure 4: Erased Concepts are Spatially Localized in Vector Fields. (a) An image containing two target concepts (Mario and Polar bear) generated by FLUX [24]. (b) Pseudo-ground truth segmentation masks used as spatial reference for each target concept. (c) The ℓ2 norm of the vector field differences. (d) IoU [52] and soft IoU [53] between the vector field mask and pseudo-ground truth mask across different timesteps … view at source ↗
Figure 5
Figure 5. Figure 5: Overview of the Mosaic Pipeline. Mosaic generates concept-specific masks and blends vector fields to erase multiple target concepts while preserving the remaining image contents. concept using SAM3 [54], and quantify their alignment with the vector field difference maps using IoU [55], soft IoU [53], and their harmonic mean. As shown in Fig. 4d, the vector field differences are strongly localized around ea… view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative Comparison across Methods. The first and third rows illustrate intra-category settings involving two character concepts and two object concepts, respectively. The second row presents a cross-category setting with a character and an object, while the last row shows a three￾concept setting consisting of two characters and one object. Across all settings, Mosaic consistently removes the target con… view at source ↗
Figure 7
Figure 7. Figure 7: User Study Results. Participants rate each method on a 1–5 scale for target concept removal and image quality. Mosaic achieves the highest target removal score while maintaining strong image quality, showing that it better balances effective erasure and visual fidelity [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Evaluation Prompt Template for Erasing Success Rate (ESR). A VLM-based evaluator determines whether the target concept remains visually present after concept erasure. Concept-wise CLIP similarity (C-CLIP). We use CLIP similarity as an additional metric to evaluate target concept erasure. Specifically, we compute CLIP similarity [56] between the generated image and a text prompt of the form “a photo of {tar… view at source ↗
Figure 9
Figure 9. Figure 9: Evaluation prompt template for Selective Alignment (SA). A VLM-based evaluator determines whether each non-target entity remains visually present after concept erasure [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Visualization of our User Study UI. C.1 Details for Prompt Curation Pipeline We provide detailed descriptions of the prompt curation pipeline used to construct CoME-Bench, including the exact prompt templates and filtering criteria used at each stage. Prompt generation. To construct high-quality and diverse prompts, we leverage LLMs to generate candidate descriptions for each set of target concepts. Given… view at source ↗
Figure 11
Figure 11. Figure 11: Prompt Generation Instruction. The LLM generates diverse prompts in which all target concepts co-occur within a coherent scene. compositional diversity. The templates incorporate structured constraints to ensure that all concepts are jointly present, visually identifiable, and situated within a meaningful and neutral background. To further promote diversity, we introduce heterogeneous prompt styles, inclu… view at source ↗
Figure 12
Figure 12. Figure 12: Prompt Verification Instruction. The VLM verifies whether all target concepts are faithfully and consistently rendered across multiple random seeds. or implicitly described in the scene under this template. Prompts for which no valid non-target concepts can be identified are discarded, as they do not support evaluation of concept preservation. The extracted non-target concepts are stored together with the… view at source ↗
Figure 13
Figure 13. Figure 13: Instruction for Non-Target Visual Element Extraction. The VLM extracts explicitly specified non-target visual elements from the original prompt while ignoring removed target concepts. multiple concepts. A similar trend is observed in cross-category compositions, where more complex combinations lead to lower success rates. These results highlight the inherent challenge of compositional prompt generation an… view at source ↗
Figure 14
Figure 14. Figure 14: Additional Analyses of Vector Field Locality. In most cases, the vector field–based masks exhibit high overlap with the pseudo-ground truth masks during the early-to-mid timesteps. However, failure cases are also observed where the IoU varies significantly even for the same target concept. E Additional Results In this section, we provide additional qualitative and quantitative results to further analyze t… view at source ↗
Figure 15
Figure 15. Figure 15: Additional Qualitative Results. significantly distort the image, Mosaic selectively modifies only the target-related regions, resulting in more coherent and visually consistent outputs. Celebrities and Artistic Style Erasure. In [PITH_FULL_IMAGE:figures/full_fig_p023_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Celebrity Exam￾ples. E.2 Additional Results on CoME-Bench In [PITH_FULL_IMAGE:figures/full_fig_p024_16.png] view at source ↗
read the original abstract

Concept erasure has emerged as a key research direction for ensuring safe and ethical image synthesis in Text-to-Image (T2I) models. While existing studies have explored concept erasure across multiple concepts, they typically assume only a single target concept per image, a limitation increasingly exposed by modern flow-based T2I models, which can generate complex scenes with multiple concepts simultaneously. To address this gap, we introduce compositional multi-concept erasure, a new task that aims to simultaneously remove multiple target concepts within a single scene. We propose CoME-Bench, a benchmark for evaluating compositional multi-concept erasure, which covers both intra- and cross-category scenarios. We further propose Mosaic, a novel framework for multi-concept erasure in flow-based T2I models, which exploits the spatial locality of target concepts in the vector field by dynamically constructing concept-specific masks and selectively blending them without additional optimization. Extensive experiments demonstrate that Mosaic effectively removes multiple target concepts in complex compositional scenes while preserving non-target contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces the task of compositional multi-concept erasure in flow-based T2I models (removing multiple target concepts from a single complex scene), proposes the CoME-Bench benchmark covering intra- and cross-category cases, and presents the Mosaic framework. Mosaic exploits assumed spatial locality of concepts in the vector field to dynamically build per-concept masks and blend them for optimization-free erasure while preserving non-target regions. The abstract asserts that extensive experiments confirm Mosaic's effectiveness on this task.

Significance. If the spatial-locality assumption holds and the method delivers the claimed erasure/preservation tradeoff on a properly controlled benchmark, the work would address a genuine gap between single-concept erasure methods and modern flow-based generators that produce multi-concept scenes. The introduction of a dedicated benchmark is a positive step toward standardized evaluation.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'Mosaic effectively removes multiple target concepts in complex compositional scenes while preserving non-target contexts' is asserted on the basis of 'extensive experiments,' yet the abstract supplies no metrics, baselines, quantitative tables, or controls. This absence makes the headline result unverifiable and is load-bearing for the paper's contribution.
  2. [Abstract] Abstract (method description): the framework is presented as relying on 'spatial locality of target concepts in the vector field' to enable dynamic mask construction and selective blending. No validation, overlap statistics, activation visualizations, or failure-case analysis of this locality assumption is referenced, leaving the core technical premise untested against the skeptic's concern that diffuse or overlapping fields would cause leakage or incomplete erasure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We respond point-by-point to the major comments below and outline planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'Mosaic effectively removes multiple target concepts in complex compositional scenes while preserving non-target contexts' is asserted on the basis of 'extensive experiments,' yet the abstract supplies no metrics, baselines, quantitative tables, or controls. This absence makes the headline result unverifiable and is load-bearing for the paper's contribution.

    Authors: We agree that the abstract would benefit from greater specificity to support verifiability of the central claim. In the revised version we will incorporate concise quantitative indicators drawn from the CoME-Bench results (e.g., target-concept erasure success and non-target preservation scores) together with a brief statement of the primary baselines. revision: yes

  2. Referee: [Abstract] Abstract (method description): the framework is presented as relying on 'spatial locality of target concepts in the vector field' to enable dynamic mask construction and selective blending. No validation, overlap statistics, activation visualizations, or failure-case analysis of this locality assumption is referenced, leaving the core technical premise untested against the skeptic's concern that diffuse or overlapping fields would cause leakage or incomplete erasure.

    Authors: The locality assumption is foundational to the dynamic masking procedure. Although the reported experiments demonstrate effective erasure without leakage in practice, we accept that explicit supporting analysis would address potential skepticism. We will add a dedicated paragraph and accompanying figure in the method or experiments section that reports activation overlap statistics and selected vector-field visualizations, including any observed edge cases. revision: yes

Circularity Check

0 steps flagged

No circularity: method is a direct construction from stated spatial-locality assumption

full rationale

The paper presents Mosaic as a framework that exploits an assumed spatial locality property of concepts in the vector field to build and blend masks. No equations, fitted parameters, or self-citation chains are shown that reduce the claimed erasure performance to the inputs by construction. The central claim rests on an empirical assumption about locality rather than a self-referential derivation or renamed fit. This is self-contained against external benchmarks and receives the default non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the high-level description of the method and benchmark.

pith-pipeline@v0.9.1-grok · 5700 in / 1076 out tokens · 27290 ms · 2026-06-29T22:51:03.041931+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 11 canonical work pages · 6 internal anchors

  1. [1]

    Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

  2. [2]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. InThe Ninth International Conference on Learning Representations, 2021

  3. [3]

    High- resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 10684–10695, 2022

  4. [4]

    Photorealistic text-to-image diffusion models with deep language understanding

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. InAdvances in Neural Information Processing Systems, volume 35, pag...

  5. [5]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents.arXiv preprint arXiv:2204.06125, 2022

  6. [6]

    SDXL: Improving latent diffusion models for high-resolution image synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. InThe Twelfth International Conference on Learning Representations, 2024

  7. [7]

    Artists land a win in class action lawsuit against a.i

    Richard Whiddington. Artists land a win in class action lawsuit against a.i. com- panies. Artnet News, August 2024. URL https://news.artnet.com/art-world/ artists-vs-stability-ai-lawsuit-moves-ahead-2524849

  8. [8]

    Ai brad pitt dupes french woman out of C830,000

    Laura Gozzi. Ai brad pitt dupes french woman out of C830,000. BBC News, January 2025. URLhttps://www.bbc.com/news/articles/c2l0y7q3k7ro

  9. [9]

    Diffu- sion art or digital forgery? investigating data replication in diffusion models

    Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffu- sion art or digital forgery? investigating data replication in diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6048–6058, 2023

  10. [10]

    Dominating vs

    Hayeon Jeong and Jong-Seok Lee. Dominating vs. dominated: Generative collapse in diffusion models.arXiv preprint arXiv:2512.20666, 2025

  11. [11]

    Steering away from memorization: Reachability-constrained reinforcement learning for text-to-image diffusion.arXiv preprint arXiv:2603.00140, 2026

    Sathwik Karnik, Juyeop Kim, Sanmi Koyejo, Jong-Seok Lee, and Somil Bansal. Steering away from memorization: Reachability-constrained reinforcement learning for text-to-image diffusion.arXiv preprint arXiv:2603.00140, 2026

  12. [12]

    Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models

    Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pages 22522–22531, 2023

  13. [13]

    Investigating toxicity and bias in stable diffusion text-to-image models.Scientific Reports, 15(1):31401, 2025

    Matthias Schneider and Thilo Hagendorff. Investigating toxicity and bias in stable diffusion text-to-image models.Scientific Reports, 15(1):31401, 2025

  14. [14]

    Training- free framework for defending unsafe image synthesis attack

    Junha Park, Jaehui Hwang, Ian Ryu, Hyungkeun Park, Jiyoon Kim, and Jong-Seok Lee. Training- free framework for defending unsafe image synthesis attack. In2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1906–1910, 2026

  15. [15]

    Stable diffusion 2.0 release

    Stability AI. Stable diffusion 2.0 release. Stability AI Blog, November 2022. URL https: //stability.ai/news/stable-diffusion-v2-release

  16. [16]

    Ablating concepts in text-to-image diffusion models

    Nupur Kumari, Bingliang Zhang, Sheng-Yu Wang, Eli Shechtman, Richard Zhang, and Jun-Yan Zhu. Ablating concepts in text-to-image diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22634–22645, 2023

  17. [17]

    Erasing concepts from diffusion models

    Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, and David Bau. Erasing concepts from diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 2426–2436, 2023. 11

  18. [18]

    Unified concept editing in diffusion models

    Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzy´nska, and David Bau. Unified concept editing in diffusion models. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5099–5108, 2024

  19. [19]

    Mace: Mass concept erasure in diffusion models

    Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6430–6440, 2024

  20. [20]

    Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers

    Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, and Yu- Chiang Frank Wang. Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers. InEuropean Conference on Computer Vision, volume 17, pages 360–376, 2024

  21. [21]

    SAeuron: Interpretable concept unlearning in diffusion models with sparse autoencoders

    Bartosz Cywi´nski and Kamil Deja. SAeuron: Interpretable concept unlearning in diffusion models with sparse autoencoders. InForty-second International Conference on Machine Learning, 2025

  22. [22]

    Cure: Concept unlearning via orthogonal representation editing in diffusion models

    Shristi Das Biswas, Arani Roy, and Kaushik Roy. Cure: Concept unlearning via orthogonal representation editing in diffusion models. InAdvances in Neural Information Processing Systems, 2025

  23. [23]

    Erasing thousands of concepts: Towards scalable and practical concept erasure for text-to-image diffusion models

    Hoigi Seo, Byung Hyun Lee, Jaehyun Cho, Sungjin Lim, and Se Young Chun. Erasing thousands of concepts: Towards scalable and practical concept erasure for text-to-image diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, 2026

  24. [24]

    Flux.https://github.com/black-forest-labs/flux, 2024

    Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

  25. [25]

    Scaling rectified flow trans- formers for high-resolution image synthesis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow trans- formers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

  26. [26]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, 2023

  27. [27]

    Eraseanything: Enabling concept erasure in rectified flow transformers

    Daiheng Gao, Shilin Lu, Wenbo Zhou, Jiaming Chu, Jie Zhang, Mengxi Jia, Bang Zhang, Zhaoxin Fan, and Weiming Zhang. Eraseanything: Enabling concept erasure in rectified flow transformers. InForty-second International Conference on Machine Learning, 2025

  28. [28]

    Minimalist concept erasure in generative models

    Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, and Kenji Kawaguchi. Minimalist concept erasure in generative models. InForty-second International Conference on Machine Learning, 2025

  29. [29]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023

  30. [30]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

    Xingchao Liu, Chengyue Gong, and qiang liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. InThe Eleventh International Conference on Learning Representations, 2023

  31. [31]

    Editing implicit assumptions in text- to-image diffusion models

    Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. Editing implicit assumptions in text- to-image diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 7030–7038, 2023

  32. [32]

    Direct unlearning optimization for robust and safe text-to- image models

    Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, and Gayoung Lee. Direct unlearning optimization for robust and safe text-to- image models. InAdvances in Neural Information Processing Systems, volume 37, pages 80244–80267, 2024

  33. [33]

    One-dimensional adapter to rule them all: Concepts diffusion models and erasing applications

    Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan Jin, Yuan He, Hui Xue, Jungong Han, and Guiguang Ding. One-dimensional adapter to rule them all: Concepts diffusion models and erasing applications. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7559–7568, 2024. 12

  34. [34]

    SDErasure: Concept-specific trajectory shifting for concept erasure via adaptive diffusion classifier

    Fengyuan Miao, Shancheng Fang, Lingyun Yu, Yadong Qu, Yuhao Sun, Xiaorui Wang, and Hongtao Xie. SDErasure: Concept-specific trajectory shifting for concept erasure via adaptive diffusion classifier. InThe Fourteenth International Conference on Learning Representations, 2026

  35. [35]

    Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning

    Hyeonjin Kim, Hangyeol Jung, Heechan Yun, Sungjun Yun, and Dong-Jun Han. Disen- tangled sparse representations for concept-separated diffusion unlearning.arXiv preprint arXiv:2605.12122, 2026

  36. [36]

    Suma: A subspace mapping approach for robust and effective concept erasure in text-to-image diffusion models

    Kien Nguyen, Anh Tran, and Cuong Pham. Suma: A subspace mapping approach for robust and effective concept erasure in text-to-image diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 19587–19596, 2025

  37. [37]

    Defensive unlearning with adversarial training for robust concept erasure in diffusion models

    Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. InAdvances in neural information processing systems, volume 37, pages 36748–36776, 2024

  38. [38]

    Memories of forgotten concepts

    Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, and Shai Avidan. Memories of forgotten concepts. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 2966–2975, 2025

  39. [39]

    Continualflow: Learning and unlearning with neural flow matching

    Lorenzo Simone, Davide Bacciu, and Shuangge Ma. Continualflow: Learning and unlearning with neural flow matching. InICML 2025 Workshop on Machine Unlearning for Generative AI, 2025

  40. [40]

    Flowedit: Inversion-free text-based editing using pre-trained flow models

    Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, and Tomer Michaeli. Flowedit: Inversion-free text-based editing using pre-trained flow models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 19721–19730, 2025

  41. [41]

    Ring-a-bell! how reliable are concept removal methods for diffusion models? InThe Twelfth International Conference on Learning Representations, 2024

    Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia You Chen, Bo Li, Pin-Yu Chen, Chia-Mu Yu, and Chun-Ying Huang. Ring-a-bell! how reliable are concept removal methods for diffusion models? InThe Twelfth International Conference on Learning Representations, 2024

  42. [42]

    Unlearncanvas: Stylized image dataset for enhanced machine unlearning evaluation in diffusion models

    Yihua Zhang, Chongyu Fan, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Gaoyuan Zhang, Gaowen Liu, Ramana Kompella, Xiaoming Liu, and Sijia Liu. Unlearncanvas: Stylized image dataset for enhanced machine unlearning evaluation in diffusion models. InAdvances in Neural Information Processing Systems, volume 37, pages 96387–96423, 2024

  43. [43]

    A dataset and benchmark for copyright infringement unlearning from text-to-image diffusion models.arXiv preprint arXiv:2403.12052, 2024

    Rui Ma, Qiang Zhou, Yizhu Jin, Daquan Zhou, Bangjun Xiao, Xiuyu Li, Yi Qu, Aishani Singh, Kurt Keutzer, Jingtong Hu, et al. A dataset and benchmark for copyright infringement unlearning from text-to-image diffusion models.arXiv preprint arXiv:2403.12052, 2024

  44. [44]

    Holistic unlearning benchmark: A multi-faceted evaluation for text-to-image diffusion model unlearning

    Saemi Moon, Minjong Lee, Sangdon Park, and Dongwoo Kim. Holistic unlearning benchmark: A multi-faceted evaluation for text-to-image diffusion model unlearning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16356–16366, 2025

  45. [45]

    Comprehensive assessment and analysis for NSFW content erasure in text-to-image diffusion models

    Die Chen, Zhiwen Li, Cen Chen, Yuexiang Xie, Xiaodan Li, Jinyan Ye, Yingda Chen, and Yaliang Li. Comprehensive assessment and analysis for NSFW content erasure in text-to-image diffusion models. InAdvances in Neural Information Processing Systems, 2025

  46. [46]

    Emma: Concept erasure benchmark with compre- hensive semantic metrics and diverse categories.arXiv preprint arXiv:2512.17320, 2025

    Lu Wei, Yuta Nakashima, and Noa Garcia. Emma: Concept erasure benchmark with compre- hensive semantic metrics and diverse categories.arXiv preprint arXiv:2512.17320, 2025

  47. [47]

    Qwen3 Technical Report

    Qwen Team. Qwen3 technical report, 2025. URLhttps://arxiv.org/abs/2505.09388

  48. [48]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

  49. [49]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. 13

  50. [50]

    Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers

    Nanxiang Jiang, Zhaoxin Fan, Baisen Wang, Daiheng Gao, Junhang Cheng, Jifeng Guo, Yalan Qin, Yeying Jin, Hongwei Zheng, Faguo Wu, et al. Z-erase: Enabling concept erasure in single-stream diffusion transformers.arXiv preprint arXiv:2603.25074, 2026

  51. [51]

    O’Reilly Media, Inc

    Steven Bird, Ewan Klein, and Edward Loper.Natural language processing with Python: analyzing text with the natural language toolkit. " O’Reilly Media, Inc.", 2009

  52. [52]

    Fully convolutional networks for se- mantic segmentation

    Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for se- mantic segmentation. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015

  53. [53]

    Jaccard metric losses: Optimizing the jaccard index with soft labels.Advances in Neural Information Processing Systems, 36:75259–75285, 2023

    Zifu Wang, Xuefei Ning, and Matthew Blaschko. Jaccard metric losses: Optimizing the jaccard index with soft labels.Advances in Neural Information Processing Systems, 36:75259–75285, 2023

  54. [54]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

  55. [55]

    Generalized intersection over union: A metric and a loss for bounding box regression

    Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 658–666, 2019

  56. [56]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

  57. [57]

    Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

  58. [58]

    Wang, E.P

    Z. Wang, E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image quality assessment. InThe Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402 V ol.2, 2003

  59. [59]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017

  60. [60]

    a photo ofc i,

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InEuropean conference on computer vision, pages 740–755. Springer, 2014. 14 MOSAIC: Compositional Multi-Concept Erasure via Vector Field Blending Supplementary Materials A. Baseline Implemen...