Learn Once, Edit Anywhere: Visual Direction Transfer for Diffusion Models

Hidir Yesiltepe; Pinar Yanardag; Yusuf Dalva

arxiv: 2403.19645 · v2 · pith:346TLTCCnew · submitted 2024-03-28 · 💻 cs.CV

classification 💻 cs.CV

keywords diffusion · vidit · visual · control · direction · edit · editing · once

abstract

The rapid advancement of diffusion models has enabled the generation of high-fidelity images from textual prompts, yet achieving precise, disentangled control over specific attributes remains a significant challenge. A fundamental limitation arises because visual differences between images are often far more descriptive and nuanced than what can be captured through human-crafted text descriptions, which frequently fail to convey fine-grained semantic details. To address this, we introduce ViDiT (Visual Direction Transfer for Diffusion), a framework that expands the editing vocabulary by capturing latent semantics directly from image-edit pairs. ViDiT learns the underlying transformation by optimizing a single, global, and continuous editing direction from a small set of "before-and-after" examples. This optimization process transfers visual changes into the diffusion model's conditioning space, allowing for detailed edits that text alone cannot easily describe. ViDiT operates on a "Learn Once" principle, which completely eliminates the need for model fine-tuning or expensive per-image optimization during inference. Once learned, these continuous directions enable "Edit Anywhere" capabilities, allowing users to apply highly disentangled manipulations, such as changes in facial features, animal attributes, or artistic styles, to any image in a zero-shot manner with granular control over the edit intensity. Quantitative and qualitative evaluations demonstrate that ViDiT outperforms existing text-based editing methods in maintaining input faithfulness while achieving precise, scalable attribute control.
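
The "learn once, edit anywhere" idea can be illustrated with a toy sketch. The abstract does not give ViDiT's actual objective, so everything below is an assumption made for illustration: the frozen diffusion model is replaced by a fixed linear map `W`, the before/after pairs are synthetic, and the names `generate`, `learn_direction`, and `apply_edit` are hypothetical. The sketch only shows the general pattern of fitting one shared direction `d` in a conditioning space from a few pairs while the model stays frozen, then reusing `d` on unseen inputs with a scalar intensity `alpha`.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, n_pairs = 8, 16, 4  # conditioning dim, toy "image" dim, number of pairs

# ASSUMPTION: a fixed linear map stands in for the frozen generative model.
W = rng.normal(size=(m, k))

def generate(cond):
    """Toy frozen model: maps a conditioning vector to an 'image' vector."""
    return W @ cond

# Synthetic before/after pairs that differ by one shared, hidden direction.
d_true = rng.normal(size=k)
conds = rng.normal(size=(n_pairs, k))
after = np.stack([generate(c + d_true) for c in conds])

def learn_direction(conds, after, steps=2000):
    """Fit a single global direction d by gradient descent; W is never updated."""
    d = np.zeros(conds.shape[1])
    # Step size chosen from the spectral norm of W so the descent is stable.
    lr = 0.5 / np.linalg.norm(W, 2) ** 2
    for _ in range(steps):
        resid = (conds + d) @ W.T - after        # (n_pairs, m) prediction errors
        grad = 2.0 * (resid @ W).mean(axis=0)    # gradient of mean squared error w.r.t. d
        d -= lr * grad
    return d

def apply_edit(cond, d, alpha):
    """Apply the learned direction to any conditioning vector with intensity alpha."""
    return generate(cond + alpha * d)

# Learn once from the pairs, then edit an unseen input at several intensities.
d = learn_direction(conds, after)
c_new = rng.normal(size=k)
edits = [apply_edit(c_new, d, a) for a in (0.0, 0.5, 1.0)]
```

In this toy setting `alpha = 0` leaves the output unchanged and larger values move it further along the learned direction, which mirrors the continuous intensity control the abstract describes. A real diffusion model is nonlinear and its training loss would differ from the squared error used here.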

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution
cs.CV 2026-05 unverdicted novelty 7.0

Presents Decoupled Time Guidance (DTG) for training-free generative video super-resolution by temporally decoupling conditional and unconditional diffusion signals.