Towards Safe Self-distillation of Internet-scale Text-to-image Diffusion Models

Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee · 2023 · arXiv 2307.05977

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

cs.CV · 2025-01-07 · unverdicted · novelty 7.0

PromptGuard optimizes a universal safety soft prompt (and category-specific variants) in T2I embedding space to moderate NSFW inputs, achieving average unsafe ratios of 5.84-6.18% while being 3.8x faster than prior defenses.

Closed-Form Concept Erasure via Double Projections

cs.LG · 2026-04-11 · unverdicted · novelty 6.0

A training-free double-projection linear transformation erases target concepts from generative models by first computing a proxy projection and then applying a constrained update in the left null space of known directions.

CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models

cs.CR · 2026-06-01 · unverdicted · novelty 4.0

CoreUnlearn uses a Component Extraction Module and Swap Disentangling Strategy to remove only erasure-critical components from concept embeddings in diffusion models.

citing papers explorer

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
cs.CV · 2025-01-07 · unverdicted · none · ref 12
Closed-Form Concept Erasure via Double Projections
cs.LG · 2026-04-11 · unverdicted · none · ref 37
CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models
cs.CR · 2026-06-01 · unverdicted · none · ref 24
