Towards resilient safety-driven unlearning for diffusion models against down- stream fine-tuning

· 2025 · arXiv 2507.16302

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra

Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.

citing papers explorer

Showing 2 of 2 citing papers.

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training cs.CV · 2026-05-18 · unverdicted · none · ref 95
SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra
Erasure or Erosion? Evaluating Compositional Degradation in Unlearned Text-To-Image Diffusion Models cs.CV · 2026-04-06 · unverdicted · none · ref 16
Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.

Towards resilient safety-driven unlearning for diffusion models against down- stream fine-tuning

fields

years

verdicts

representative citing papers

citing papers explorer