A method using attention head vectors detects and suppresses risky content generation in Diffusion Transformers at inference time.
ConceptAttention: Diffusion transformers learn highly interpretable features
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 6
roles: background 1
polarities: background 1
representative citing papers
UVR is a training-free framework that modulates attention according to the information-flow stages it identifies in multimodal DiT attention, erasing unsafe semantics in image synthesis and editing at rates of 91% and 77%, respectively, while preserving quality.
TS-Attn dynamically separates and rearranges attention in existing text-to-video models to improve temporal consistency and prompt adherence for videos with multiple sequential actions.
Text-to-image diffusion models exhibit varying degrees of emergent content-style separation in art generation, with content tokens primarily influencing object regions and style tokens affecting backgrounds and textures.
A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.
Vision-language models for wellbeing assessment exhibit dataset-dependent performance and demographic biases, with explainability interventions providing inconsistent fairness gains at potential accuracy costs.