P+: Extended Textual Conditioning in Text-to-Image Generation

2023 · arXiv 2303.09522

19 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

Inline Critic Steers Image Editing

cs.CV · 2026-05-12 · conditional · novelty 7.0

Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.

PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

PromptEvolver recovers high-fidelity natural language prompts for given images by evolving them via genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

cs.CV · 2026-03-09 · unverdicted · novelty 7.0

DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.

SlimDiffSR: Toward Lightweight and Efficient Remote Sensing Image Super-Resolution via Diffusion Model Distillation

cs.CV · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

SlimDiffSR uses uncertainty-guided timestep assignment and structured pruning with frequency- and direction-separable convolutions plus MMD distillation to create a 200x faster, 20x smaller diffusion SR model for remote sensing while retaining competitive quality.

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture and RGB components, modulating them temporally in a diffusion model, and applying conditional loss plus geometric priors to preserve correct component relationships.

NP-LoRA: Null Space Projection for Subject-Style LoRA Fusion

cs.CV · 2025-11-14 · unverdicted · novelty 6.0

NP-LoRA fuses subject and style LoRAs via null-space projection of the content update onto the orthogonal complement of the style subspace, with a soft variant controlled by one parameter.

Adversarial Concept Distillation for One-Step Diffusion Personalization

cs.CV · 2025-10-23 · unverdicted · novelty 6.0

OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.

DreamAudio: Customized Text-to-Audio Generation with Diffusion Models

cs.SD · 2025-09-07 · unverdicted · novelty 6.0

DreamAudio generates audio clips that incorporate user-specified personalized audio events from reference samples while remaining aligned with text prompts.

OmniPrism: Learning Disentangled Visual Concept for Image Generation

cs.CV · 2024-12-16 · unverdicted · novelty 6.0

OmniPrism proposes a disentanglement method using a new paired dataset (PCD-200K), COD contrastive training, and block embeddings to inject separated concepts into diffusion models for multi-aspect image generation.

DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.

FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer

cs.CV · 2026-04-11 · unverdicted · novelty 5.0

FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

A scalable pipeline generates an intra-consistent, inter-diverse 1.4M style image dataset from text-to-image models and uses it to train a style encoder and generalizable style transfer model.

PureCC: Pure Learning for Text-to-Image Concept Customization

cs.CV · 2026-03-08 · unverdicted · novelty 5.0

PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.

TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration

cs.CV · 2026-01-28 · unverdicted · novelty 5.0

TPGDiff introduces hierarchical triple-prior guidance in a diffusion network, placing degradation priors throughout, structural priors in shallow layers, and semantic priors in deep layers for improved all-in-one image restoration.

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation

cs.CV · 2025-06-30 · unverdicted · novelty 5.0

SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.

FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation

cs.CV · 2025-06-29 · unverdicted · novelty 5.0

FA-Seg delivers state-of-the-art training-free open-vocabulary segmentation performance (43.8% mIoU average) on standard benchmarks by extracting and refining attention from a single forward pass of a pretrained diffusion model.

Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

cs.CV · 2025-05-26 · unverdicted · novelty 5.0

Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.

TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation

cs.CV · 2024-09-12 · unverdicted · novelty 4.0

TextBoost is a one-shot personalization technique that selectively fine-tunes the text encoder of diffusion models using causality-preserving adaptation and lightweight adapters to reduce parameters and storage.

ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation

cs.CV · 2025-06-23

citing papers explorer

Inline Critic Steers Image Editing cs.CV · 2026-05-12 · conditional · none · ref 45
PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space cs.LG · 2026-04-03 · unverdicted · none · ref 44
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation cs.CV · 2026-03-09 · unverdicted · none · ref 61
SlimDiffSR: Toward Lightweight and Efficient Remote Sensing Image Super-Resolution via Diffusion Model Distillation cs.CV · 2026-05-04 · unverdicted · none · ref 42 · 2 links
PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios cs.CV · 2026-04-15 · unverdicted · none · ref 40
NP-LoRA: Null Space Projection for Subject-Style LoRA Fusion cs.CV · 2025-11-14 · unverdicted · none · ref 12
Adversarial Concept Distillation for One-Step Diffusion Personalization cs.CV · 2025-10-23 · unverdicted · none · ref 87
DreamAudio: Customized Text-to-Audio Generation with Diffusion Models cs.SD · 2025-09-07 · unverdicted · none · ref 62
OmniPrism: Learning Disentangled Visual Concept for Image Generation cs.CV · 2024-12-16 · unverdicted · none · ref 40
DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing cs.CV · 2026-05-16 · unverdicted · none · ref 41
FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer cs.CV · 2026-04-11 · unverdicted · none · ref 30
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping cs.CV · 2026-04-09 · unverdicted · none · ref 48
PureCC: Pure Learning for Text-to-Image Concept Customization cs.CV · 2026-03-08 · unverdicted · none · ref 44
TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration cs.CV · 2026-01-28 · unverdicted · none · ref 84
SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation cs.CV · 2025-06-30 · unverdicted · none · ref 81
FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation cs.CV · 2025-06-29 · unverdicted · none · ref 45
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift cs.CV · 2025-05-26 · unverdicted · none · ref 23
TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation cs.CV · 2024-09-12 · unverdicted · none · ref 43
ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation cs.CV · 2025-06-23 · unreviewed · ref 9
