SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Accept info: ICLR 2022
Authors: Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
Affiliation: Stanford University, Carnegie Mellon University
Links: arXiv, OpenReview, project page, GitHub
Task: guided image synthesis & editing
TLDR: Generate realistic images from user guides such as strokes by adding a suitable amount of noise and then denoising with a score-based model.

1. Intuition & Motivation

Key intuition of Stochastic Differential Editing (SDEdit): hijack the generative process of SDE-based generative models.

Add a suitable amount of noise to smooth out undesirable artifacts and distortions, while still preserving the overall structure of the input user guide.

2. SDEdit

2.1. Approach overview

SDEdit
Given guide \(\mathbf{x}^{(g)}\), choose intermediate time \(t_0 \in (0, 1)\).
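Sample \(\mathbf{x}^{(g)}(t_0) \sim \mathcal{N}(\mathbf{x}^{(g)}; \sigma^2(t_0) \mathbf{I})\), i.e. perturb the guide with Gaussian noise of standard deviation \(\sigma(t_0)\) (VE-SDE case).
Then produce \(\mathbf{x}(0)\) by iterating the reverse SDE from \(t = t_0\) down to \(t = 0\).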
SDEdit with mask (keep certain parts unchanged)
editable region (\(\mathbf{\Omega}\)): simulate the reverse SDE
uneditable region (\(1 - \mathbf{\Omega}\)): gradually reduce the noise magnitude according to \(\sigma(t)\), so that the editable and uneditable regions carry a comparable amount of noise at every step
(\((1 - \mathbf{\Omega}) \odot (\mathbf{x} + \sigma(t) \mathbf{z})\), where \(\mathbf{x}\) is the clean input image and \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\))
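
The procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the geometric schedule `sigma`, the analytic `toy_score` (the exact score of Gaussian data, standing in for a trained score network), and the function name `sdedit_ve` are all hypothetical. Re-noising the uneditable region to the noise level of the current step, and pasting it back clean at the end, is a choice made for this sketch.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Assumed geometric VE noise schedule; the paper's exact sigma(t) may differ."""
    return sigma_min * (sigma_max / sigma_min) ** t

def toy_score(x, t, mean=0.5, std=0.1):
    """Exact score of N(mean, std^2 I) data after VE noising.

    Hypothetical stand-in for a trained score network s_theta(x, t).
    """
    return -(x - mean) / (std ** 2 + sigma(t) ** 2)

def sdedit_ve(guide, score_fn, t0=0.5, n_steps=500, mask=None, rng=None):
    """Noise the guide to time t0, then integrate the reverse VE-SDE to t = 0.

    mask is 1 where editing is allowed (Omega) and 0 where the input is kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Perturb the guide: x(t0) ~ N(guide, sigma(t0)^2 I).
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t, t_prev = t0 * n / n_steps, t0 * (n - 1) / n_steps
        eps2 = sigma(t) ** 2 - sigma(t_prev) ** 2
        # One reverse-time step: score-driven drift plus fresh noise.
        x = x + eps2 * score_fn(x, t) + np.sqrt(eps2) * rng.standard_normal(guide.shape)
        if mask is not None:
            # Uneditable region: the clean input re-noised to the current level.
            renoised = guide + sigma(t_prev) * rng.standard_normal(guide.shape)
            x = mask * x + (1 - mask) * renoised
    if mask is not None:
        # The assumed schedule has sigma(0) > 0, so paste the clean pixels back.
        x = mask * x + (1 - mask) * guide
    return x

guide = np.zeros((8, 8))            # a "guide" far from the toy data mean of 0.5
omega = np.zeros((8, 8))
omega[:, :4] = 1.0                  # only the left half may be edited
edited = sdedit_ve(guide, toy_score, mask=omega, rng=np.random.default_rng(0))
```

With the toy score, the editable half is pulled toward the data mean while the uneditable half is returned unchanged.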

2.2. Algorithm

Algorithm 2 - Guided image synthesis and editing (VE-SDE)

Algorithm 3 - Guided image synthesis and editing with mask (VE-SDE)

Algorithm 4 - Guided image synthesis and editing (VP-SDE)

Algorithm 5 - Guided image synthesis and editing with mask (VP-SDE)
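
For the VP-SDE variants, a comparable sketch is given below. It is an illustration under assumptions rather than a transcription of Algorithms 4 and 5: the linear schedule `beta`, the analytic `toy_score_vp`, and all function names are hypothetical, and the reverse SDE is discretized with a plain Euler-Maruyama step, which may differ from the update rule used in the paper.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Assumed linear VP noise schedule."""
    return beta_min + t * (beta_max - beta_min)

def alpha(t, beta_min=0.1, beta_max=20.0):
    """alpha(t) = exp(-integral of beta from 0 to t) for the linear schedule."""
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2))

def toy_score_vp(x, t, mean=0.5, std=0.1):
    """Exact score of N(mean, std^2 I) data after VP noising (stand-in for s_theta)."""
    a = alpha(t)
    return -(x - np.sqrt(a) * mean) / (a * std ** 2 + 1.0 - a)

def perturb_vp(x0, t, rng):
    """Sample x(t) = sqrt(alpha(t)) x0 + sqrt(1 - alpha(t)) z."""
    a = alpha(t)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)

def sdedit_vp(guide, score_fn, t0=0.5, n_steps=500, mask=None, rng=None):
    """Noise the guide to time t0, then integrate the reverse VP-SDE to t = 0."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t0 / n_steps
    x = perturb_vp(guide, t0, rng)
    for n in range(n_steps, 0, -1):
        t, t_prev = t0 * n / n_steps, t0 * (n - 1) / n_steps
        b = beta(t)
        # Euler-Maruyama step of the reverse-time SDE (taken backward in time).
        drift = 0.5 * b * x + b * score_fn(x, t)
        x = x + drift * dt + np.sqrt(b * dt) * rng.standard_normal(guide.shape)
        if mask is not None:
            # Uneditable region: the clean input noised to the current time.
            x = mask * x + (1 - mask) * perturb_vp(guide, t_prev, rng)
    return x
```

Because `alpha(0)` equals 1 in this schedule, the uneditable region of the masked variant coincides with the input after the last step without a separate paste-back.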

3. Experiments

3.1. Main results

Tasks
Stroke-based image synthesis
Stroke-based image editing
Image compositing
Metrics
Realism: Kernel Inception Distance (KID), user study
Faithfulness: L2 distance, masked LPIPS, user study
Overall human satisfaction score (realism + faithfulness): user study
Implementation details
\(t_0 \in \left [ 0.3, 0.6 \right ]\)
Ablations
Method (timestep \(t_0\))
Quality of user guide
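
Two of the automatic metrics listed above can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code: `l2_distance` and `kid_estimate` are hypothetical names, the feature arrays are assumed to be precomputed Inception features (the network itself is not included), and the KID shown is a single unbiased MMD estimate with the cubic polynomial kernel, whereas reported KID values are commonly averaged over several subsets.

```python
import numpy as np

def l2_distance(guide, output):
    """Faithfulness proxy: Euclidean distance between the guide and the output."""
    diff = np.asarray(output, dtype=float) - np.asarray(guide, dtype=float)
    return float(np.linalg.norm(diff))

def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3 on feature rows."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid_estimate(feats_real, feats_gen):
    """Unbiased squared-MMD estimate between two sets of feature vectors.

    feats_real, feats_gen: arrays of shape (num_images, feature_dim),
    assumed to be Inception features computed elsewhere.
    """
    m, n = len(feats_real), len(feats_gen)
    k_rr = polynomial_kernel(feats_real, feats_real)
    k_gg = polynomial_kernel(feats_gen, feats_gen)
    k_rg = polynomial_kernel(feats_real, feats_gen)
    # Exclude the diagonal (self-similarity) terms for the unbiased estimate.
    mean_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    mean_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(mean_rr + mean_gg - 2.0 * k_rg.mean())
```

A lower KID indicates generated features closer to the real ones, and a lower L2 distance indicates an output closer to the guide.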

3.2. Figures

Stroke-based image synthesis

Stroke-based image editing

Image compositing

Ablation - timestep
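
The effect that the timestep ablation probes can be illustrated in closed form on a toy problem. The sketch below assumes Gaussian data with standard deviation `prior_std` and a geometric VE schedule `sigma`; both are hypothetical choices, not values from the paper. For such data, the posterior mean of the clean image given the noised guide is a weighted average of the noised guide and the data mean, and `guide_weight` returns the weight placed on the guide.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Assumed geometric VE noise schedule."""
    return sigma_min * (sigma_max / sigma_min) ** t

def guide_weight(t0, prior_std=0.1):
    """Weight of the noised guide in E[x(0) | x(t0)] for Gaussian toy data.

    Values near 1 mean the output stays close to the guide (faithful);
    values near 0 mean the output is dominated by the data prior (realistic).
    """
    return prior_std ** 2 / (prior_std ** 2 + sigma(t0) ** 2)

for t0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"t0={t0:.1f}  sigma(t0)={sigma(t0):.3f}  guide weight={guide_weight(t0):.3f}")
```

In this toy setting the weight decreases monotonically with \(t_0\): a small \(t_0\) keeps the guide almost intact, while a large \(t_0\) discards most of it.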

3.3. Limitations

Requires a score-based model trained on the target domain.