Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.
P+: Extended Textual Conditioning in Text-to-Image Generation
23 Pith papers cite this work.
Representative citing papers
PromptEvolver recovers high-fidelity natural-language prompts for given images by evolving them with a genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.
IREU improves identity unlearning in CPG by first locating identity features offline and then applying targeted perturbations. It outperforms global updates while preserving fidelity for retained identities and generalizes across generators.
Prompt-aware weighting strategies W-Switch and W-Composite improve multi-concept LoRA composition in diffusion models without training.
Equilibrated Diffusion decomposes concepts in frequency space to optimize subject and style embeddings independently, and adds mask-guided diffusion and residual reference attention, improving subject fidelity and text alignment over baselines.
SlimDiffSR uses uncertainty-guided timestep assignment, structured pruning with frequency- and direction-separable convolutions, and MMD distillation to build a diffusion super-resolution model for remote sensing that is 200× faster and 20× smaller while retaining competitive quality.
PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.
NP-LoRA fuses subject and style LoRAs via null-space projection of the content update onto the orthogonal complement of the style subspace, with a soft variant controlled by one parameter.
OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.
DreamAudio generates audio clips that incorporate user-specified personalized audio events from reference samples while remaining aligned with text prompts.
OmniPrism proposes a disentanglement method using a new paired dataset (PCD-200K), COD contrastive training, and block embeddings to inject separated concepts into diffusion models for multi-aspect image generation.
DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.
FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.
A scalable pipeline generates an intra-consistent, inter-diverse 1.4M style image dataset from text-to-image models and uses it to train a style encoder and generalizable style transfer model.
PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.
TPGDiff introduces hierarchical triple-prior guidance in a diffusion network, placing degradation priors throughout, structural priors in shallow layers, and semantic priors in deep layers for improved all-in-one image restoration.
SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.
FA-Seg delivers state-of-the-art training-free open-vocabulary segmentation performance (43.8% mIoU average) on standard benchmarks by extracting and refining attention from a single forward pass of a pretrained diffusion model.
This work proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
Early DC component convergence in text-to-image Transformer features causes output homogeneity; selective early attenuation via DAVE improves diversity without retraining or extra cost.
TextBoost is a one-shot personalization technique that selectively fine-tunes the text encoder of diffusion models using causality-preserving adaptation and lightweight adapters to reduce parameters and storage.
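The NP-LoRA summary above describes a concrete linear-algebra step: projecting the content LoRA update onto the orthogonal complement of a style subspace before merging. The following is a minimal sketch under our own assumptions, not NP-LoRA's actual implementation. We assume the "style subspace" is spanned by the top-`rank` left singular vectors (output directions) of the style update ΔW_s, and that the merged weight is W + ΔW_s + (I - U U^T) ΔW_c. The helper names `style_basis`, `project_out_style`, and `merge_loras` are hypothetical, and the paper's one-parameter soft variant is not reproduced here.

```python
import numpy as np


def style_basis(delta_style: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (as columns) for the top-`rank` output directions of the style update.

    Assumption: the style subspace is the span of the leading left singular vectors of delta_style.
    """
    u, _, _ = np.linalg.svd(delta_style, full_matrices=False)
    return u[:, :rank]


def project_out_style(delta_content: np.ndarray, delta_style: np.ndarray, rank: int) -> np.ndarray:
    """Apply (I - U U^T) to the content update without forming the d_out x d_out projector."""
    u = style_basis(delta_style, rank)
    return delta_content - u @ (u.T @ delta_content)


def merge_loras(w_base: np.ndarray, delta_content: np.ndarray,
                delta_style: np.ndarray, rank: int) -> np.ndarray:
    """Hypothetical merge: keep the style update intact, add only the style-orthogonal part of content."""
    return w_base + delta_style + project_out_style(delta_content, delta_style, rank)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, r = 64, 32, 4
    # Low-rank LoRA-style updates of the form B @ A.
    delta_s = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
    delta_c = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
    projected = projected_c = project_out_style(delta_c, delta_s, rank=r)
    u = style_basis(delta_s, r)
    # Residual overlap with the style subspace; should be at floating-point noise level.
    print(np.abs(u.T @ projected_c).max())
```

Projecting only the content update leaves the style LoRA's contribution untouched, so in this sketch any interference is resolved in favor of style; swapping the roles of the two updates would give the opposite trade-off.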
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios