hub

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu · 2021 · cs.CV · arXiv 2108.01073

29 Pith papers cite this work. Polarity classification is still indexing.

29 Pith papers citing it

open full Pith review browse 29 citing papers arXiv PDF

abstract

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimum effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with user guide of any type, SDEdit first adds noise to the input, then subsequently denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

RevealLayer decomposes natural images into multiple RGBA layers using diffusion models with region-aware attention, occlusion-guided adaptation, and a composite loss, outperforming prior methods on a new benchmark dataset.

A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginals without specifying the Lagrangian and outperforms gradient flow methods on periodic dynamics like vortex motion and flocking.

ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.

GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

GeoEdit constructs local tangent frames from small perturbations to initial noise, enabling Jacobian-free on-manifold edits in diffusion models via alternating tangent steps and diffusion projections.

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

cs.GR · 2026-04-23 · unverdicted · novelty 7.0

StyleID supplies human-perception-aligned benchmarks and fine-tuned encoders that improve facial identity recognition robustness across stylization types and strengths.

Latent Fourier Transform

cs.SD · 2026-04-20 · unverdicted · novelty 7.0

LatentFT uses latent-space Fourier transforms and frequency masking in diffusion autoencoders to enable timescale-specific manipulation of musical structure in generative models.

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

Your Pre-trained Diffusion Model Secretly Knows Restoration

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Pre-trained diffusion models inherently support image restoration that can be unlocked by optimizing prompt embeddings at the text encoder output using a diffusion bridge formulation, achieving competitive results on models like WAN and FLUX without fine-tuning.

CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

CAMEO uses coordinated agents for planning, prompting, generation, and quality feedback to achieve higher structural reliability in conditional image editing than single-step models.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

cs.CV · 2021-12-20 · accept · novelty 7.0

A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.

Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

V2V-Zero adapts frozen VLMs for visual conditioning via hidden states from specification pages, scoring 0.85 on GenEval and 32.7 on a new seven-task benchmark while revealing capability hierarchies in attribute binding and structural control.

Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations

cs.CE · 2026-05-09 · unverdicted · novelty 6.0

Anisotropic SPDEs preserve geometric data structure over longer timescales in score-based generative modeling, yielding better image quality than standard SDE baselines and flow matching in unconditional and conditional tasks.

Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

OT-Bridge Editor reframes localized image editing as a constrained entropic optimal transport problem to generate synthetic coronary angiograms that boost downstream stenosis detection by 27.8% on ARCADE and 23.0% on a multi-center dataset.

Conservative Flows: A New Paradigm of Generative Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Conservative flows generate by running probability-preserving stochastic dynamics initialized at data points rather than noise, using corrected Langevin or predictor-corrector mechanisms on top of any pretrained flow model and showing gains on Swiss-roll, ImageNet-256 and Oxford Flowers-102.

Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Distilled one-step consistency model from optimal-transport flow-matching teacher reconstructs high-fidelity dynamical system flows from low-fidelity data with 12x speedup, half the parameters, and 23.1% better SSIM than scratch-trained baselines.

MooD: Perception-Enhanced Efficient Affective Image Editing via Continuous Valence-Arousal Modeling

cs.CV · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MooD introduces continuous valence-arousal modeling with VA-aware retrieval and perception-enhanced guidance for efficient, controllable affective image editing, plus a new AffectSet dataset.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

FluSplat: Sparse-View 3D Editing without Test-Time Optimization

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

cs.CV · 2026-03-27 · unverdicted · novelty 6.0

MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

citing papers explorer

Showing 19 of 19 citing papers after filters.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models cs.CV · 2026-05-13 · unverdicted · none · ref 28 · internal anchor
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition cs.CV · 2026-05-12 · unverdicted · none · ref 53 · internal anchor
RevealLayer decomposes natural images into multiple RGBA layers using diffusion models with region-aware attention, occlusion-guided adaptation, and a composite loss, outperforming prior methods on a new benchmark dataset.
ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent cs.CV · 2026-04-28 · unverdicted · none · ref 12 · internal anchor
ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models cs.CV · 2026-04-19 · unverdicted · none · ref 53 · 2 links · internal anchor
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs cs.CV · 2026-04-17 · unverdicted · none · ref 36 · internal anchor
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
Your Pre-trained Diffusion Model Secretly Knows Restoration cs.CV · 2026-04-06 · unverdicted · none · ref 36 · internal anchor
Pre-trained diffusion models inherently support image restoration that can be unlocked by optimizing prompt embeddings at the text encoder output using a diffusion bridge formulation, achieving competitive results on models like WAN and FLUX without fine-tuning.
CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator cs.CV · 2026-04-03 · unverdicted · none · ref 25 · internal anchor
CAMEO uses coordinated agents for planning, prompting, generation, and quality feedback to achieve higher structural reliability in conditional image editing than single-step models.
High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 53 · internal anchor
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models cs.CV · 2021-12-20 · accept · none · ref 15 · internal anchor
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm cs.CV · 2026-05-12 · unverdicted · none · ref 20 · internal anchor
V2V-Zero adapts frozen VLMs for visual conditioning via hidden states from specification pages, scoring 0.85 on GenEval and 32.7 on a new seven-task benchmark while revealing capability hierarchies in attribute binding and structural control.
Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport cs.CV · 2026-05-09 · unverdicted · none · ref 8 · internal anchor
OT-Bridge Editor reframes localized image editing as a constrained entropic optimal transport problem to generate synthetic coronary angiograms that boost downstream stenosis detection by 27.8% on ARCADE and 23.0% on a multi-center dataset.
MooD: Perception-Enhanced Efficient Affective Image Editing via Continuous Valence-Arousal Modeling cs.CV · 2026-05-04 · unverdicted · none · ref 21 · 2 links · internal anchor
MooD introduces continuous valence-arousal modeling with VA-aware retrieval and perception-enhanced guidance for efficient, controllable affective image editing, plus a new AffectSet dataset.
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement cs.CV · 2026-04-30 · unverdicted · none · ref 30 · internal anchor
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
FluSplat: Sparse-View 3D Editing without Test-Time Optimization cs.CV · 2026-04-21 · unverdicted · none · ref 30 · internal anchor
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model cs.CV · 2026-03-27 · unverdicted · none · ref 41 · internal anchor
MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models cs.CV · 2023-08-13 · unverdicted · none · ref 50 · internal anchor
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis cs.CV · 2023-07-04 · conditional · none · ref 28 · internal anchor
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
Towards Robust Sequential Decomposition for Complex Image Editing cs.CV · 2026-05-10 · unverdicted · none · ref 20 · internal anchor
Sequential decomposition trained on synthetic editing tasks improves robustness for complex image instructions and transfers to real images via co-training.
Lightning Unified Video Editing via In-Context Sparse Attention cs.CV · 2026-05-06 · unverdicted · none · ref 33 · internal anchor
ISA prunes low-saliency context tokens and routes queries by sharpness to either full or 0-th order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods on three benchmarks.

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer