MasqLoRA shows that an independent LoRA adapter can be trained on a few trigger-target pairs to backdoor diffusion models with 99.8% success rate while remaining stealthy when the trigger is absent.
hub
Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
fields
cs.CV 11representative citing papers
NeuroQuant is a modality-aware 3D VQ-VAE that uses dual-stream encoding, a shared anatomical codebook, and FiLM to achieve superior multi-modal brain MRI reconstruction.
Face2Scene uses facial restoration as an oracle to derive degradation codes that condition a diffusion model for restoring the entire degraded scene.
AdaScope adaptively selects optimal RL intervention points during diffusion denoising by monitoring structural and semantic changes, delivering 66% higher performance at 59% lower cost than full-trajectory RL baselines.
Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
SlimDiffSR uses uncertainty-guided timestep assignment and structured pruning with frequency- and direction-separable convolutions plus MMD distillation to create a 200x faster, 20x smaller diffusion SR model for remote sensing while retaining competitive quality.
Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
DeCo decouples high- and low-frequency generation in pixel diffusion via a DiT plus lightweight decoder and a frequency-aware flow-matching loss, reaching FID 1.62 at 256x256 and 2.22 at 512x512 on ImageNet while closing the gap to latent diffusion methods.
ViPO enhances GRPO for visual generation by creating spatially and temporally aware advantage maps from pretrained vision models to focus optimization on perceptually important regions.
GrOCE uses dynamic semantic graphs for online, training-free erasure of target concepts from diffusion model prompts via cluster identification and selective severing.
citing papers explorer
-
When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
MasqLoRA shows that an independent LoRA adapter can be trained on a few trigger-target pairs to backdoor diffusion models with 99.8% success rate while remaining stealthy when the trigger is absent.
-
Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI
NeuroQuant is a modality-aware 3D VQ-VAE that uses dual-stream encoding, a shared anatomical codebook, and FiLM to achieve superior multi-modal brain MRI reconstruction.
-
Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration
Face2Scene uses facial restoration as an oracle to derive degradation codes that condition a diffusion model for restoring the entire degraded scene.
-
Do Less, Achieve More: Do We Need Every-Step Optimization for RL Fine-tuning of Diffusion Models?
AdaScope adaptively selects optimal RL intervention points during diffusion denoising by monitoring structural and semantic changes, delivering 66% higher performance at 59% lower cost than full-trajectory RL baselines.
-
Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
-
SlimDiffSR: Toward Lightweight and Efficient Remote Sensing Image Super-Resolution via Diffusion Model Distillation
SlimDiffSR uses uncertainty-guided timestep assignment and structured pruning with frequency- and direction-separable convolutions plus MMD distillation to create a 200x faster, 20x smaller diffusion SR model for remote sensing while retaining competitive quality.
-
Bias at the End of the Score
Reward models used as quality scorers in text-to-image generation encode demographic biases that cause reward-guided training to sexualize female subjects, reinforce stereotypes, and reduce diversity.
-
EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure
EGLOCE erases target concepts in diffusion models at inference time by optimizing latents with dual energy guidance that repels unwanted concepts while retaining prompt alignment.
-
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
DeCo decouples high- and low-frequency generation in pixel diffusion via a DiT plus lightweight decoder and a frequency-aware flow-matching loss, reaching FID 1.62 at 256x256 and 2.22 at 512x512 on ImageNet while closing the gap to latent diffusion methods.
-
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation
ViPO enhances GRPO for visual generation by creating spatially and temporally aware advantage maps from pretrained vision models to focus optimization on perceptually important regions.
-
GrOCE:Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models
GrOCE uses dynamic semantic graphs for online, training-free erasure of target concepts from diffusion model prompts via cluster identification and selective severing.