Learn Once, Edit Anywhere: Visual Direction Transfer for Diffusion Models
The rapid advancement of diffusion models has enabled the generation of high-fidelity images from textual prompts, yet achieving precise, disentangled control over specific attributes remains a significant challenge. A fundamental limitation arises because visual differences between images are often far more descriptive and nuanced than what can be captured through human-crafted text descriptions, which frequently fail to convey fine-grained semantic details. To address this, we introduce ViDiT (Visual Direction Transfer for Diffusion), a framework that expands the editing vocabulary by capturing latent semantics directly from image-edit pairs. ViDiT learns the underlying transformation by optimizing a single, global, and continuous editing direction from a small set of "before-and-after" examples. This optimization process transfers visual changes into the diffusion model's conditioning space, allowing for detailed edits that text alone cannot easily describe. ViDiT operates on a "Learn Once" principle, which completely eliminates the need for model fine-tuning or expensive per-image optimization during inference. Once learned, these continuous directions enable "Edit Anywhere" capabilities, allowing users to apply highly disentangled manipulations, such as changes in facial features, animal attributes, or artistic styles, to any image in a zero-shot manner with granular control over the edit intensity. Quantitative and qualitative evaluations demonstrate that ViDiT outperforms existing text-based editing methods in maintaining input faithfulness while achieving precise, scalable attribute control.
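The idea of a single learned direction with adjustable intensity can be illustrated with a toy sketch. This is not ViDiT's actual objective or code: the abstract does not specify the loss, so the example below assumes a simple squared-error fit on plain vectors that stand in for conditioning embeddings. The names `learn_direction`, `apply_edit`, and `alpha` are hypothetical and chosen only for illustration.

```python
# Toy illustration only: vectors stand in for a diffusion model's
# conditioning embeddings, and the loss is an assumed squared error,
# not the objective used in the paper.

def learn_direction(pairs, steps=200, lr=0.1):
    """Fit one global direction d so that before + d approximates after
    across all (before, after) pairs, using plain gradient descent."""
    dim = len(pairs[0][0])
    d = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for before, after in pairs:
            for k in range(dim):
                # Gradient of 0.5 * (before + d - after)^2 with respect to d,
                # averaged over the pairs.
                grad[k] += (before[k] + d[k] - after[k]) / len(pairs)
        d = [d[k] - lr * grad[k] for k in range(dim)]
    return d


def apply_edit(cond, direction, alpha):
    """Shift a conditioning vector along the learned direction.
    alpha controls edit intensity (0 leaves the input unchanged)."""
    return [c + alpha * v for c, v in zip(cond, direction)]


# "Learn once" from a few before/after pairs (made-up numbers).
pairs = [([0.0, 0.0], [1.0, 2.0]), ([1.0, 1.0], [2.0, 3.2])]
direction = learn_direction(pairs)

# "Edit anywhere": reuse the same direction on an unseen input,
# at half strength and at full strength.
half_edit = apply_edit([0.5, 0.5], direction, alpha=0.5)
full_edit = apply_edit([0.5, 0.5], direction, alpha=1.0)
```

Under this assumed loss the learned direction converges to the mean before-to-after difference. In a real system the direction would be optimized through the diffusion model itself, and no per-image optimization would be needed at edit time, as the abstract describes.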
Forward citations
Cited by 1 Pith paper
- DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution
Presents Decoupled Time Guidance (DTG) for training-free generative video super-resolution by temporally decoupling conditional and unconditional diffusion signals.