Towards Safe Self-distillation of Internet-scale Text-to-image Diffusion Models
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
  PromptGuard optimizes a universal safety soft prompt (and category-specific variants) in T2I embedding space to moderate NSFW inputs, achieving average unsafe ratios of 5.84-6.18% while being 3.8x faster than prior defenses.
- Closed-Form Concept Erasure via Double Projections
  A training-free double-projection linear transformation erases target concepts from generative models by computing a proxy projection then applying a constrained update in the left null space of known directions.
- CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models
  CoreUnlearn uses a Component Extraction Module and Swap Disentangling Strategy to remove only erasure-critical components from concept embeddings in diffusion models.