Obliviate erases targeted concepts from autoregressive image generators via KL supervision on visual tokens over full trajectories, cutting nudity rates sharply on benchmarks while keeping general performance.
Mma-diffusion: Multimodal attack on diffusion models
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 10verdicts
UNVERDICTED 10roles
method 1polarities
use method 1representative citing papers
VESFlow edits the learned velocity field of flow matching models via a safe-conditional posterior to produce safe images in 4 sampling steps, with an optional risk filter and VESFlow+ variant that also repels from unsafe directions.
Introduces a layered intervention framework for knowledge infusion in multimodal generative models and empirically demonstrates complementarity of layers in a safety-alignment task with diffusion models.
LA-LQR applies latent-space linear-quadratic regulator control to steer text-to-video model activations toward desired features while penalizing excessive changes.
A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.
FlowGuard detects unsafe content during diffusion image generation via linear latent decoding and curriculum learning, outperforming prior methods by over 30% F1 while reducing GPU memory by 97% and projection time to 0.2 seconds.
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
Unlearning methods that strongly erase concepts from text-to-image diffusion models consistently degrade performance on attribute binding, spatial reasoning, and counting tasks.
SPOT projects prompts to a tau-safe set via total variation to cut inappropriate content 14-44% relative to baselines while preserving benign prompt behavior in frozen T2I models.
Iterative self-improving codebooks enhance safety in autoregressive multimodal models by self-identifying unsafe generations and updating the codebook to eliminate harmful visual token mappings without external feedback.
citing papers explorer
-
Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control
LA-LQR applies latent-space linear-quadratic regulator control to steer text-to-video model activations toward desired features while penalizing excessive changes.