SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Accept info: ICLR 2022
Authors: Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
Affiliation: Stanford University, Carnegie Mellon University
Links: arXiv, OpenReview, project page, GitHub
Task: guided image synthesis & editing
TLDR: Generate realistic images from user guides such as strokes by adding a suitable amount of noise and then denoising with a score-based model.
1. Intuition & Motivation
Key intuition of Stochastic Differential Editing (SDEdit): hijack the generative process of SDE-based generative models.
Add a suitable amount of noise to smooth out undesirable artifacts and distortions, while still preserving the overall structure of the input user guide.
2. SDEdit
2.1. Approach overview
- SDEdit
Given guide \(\mathbf{x}^{(g)}\), choose intermediate time \(t_0 \in (0, 1)\).
Sample \(\mathbf{x}^{(g)}(t_0) \sim \mathcal{N}(\mathbf{x}^{(g)}; \sigma^2(t_0) \mathbf{I})\).
Then produce \(\mathbf{x}(0)\) by iterating the reverse SDE from \(t_0\) down to 0.
- SDEdit with mask (keep certain parts unchanged)
editable region (\(\mathbf{\Omega}\)): simulate reverse SDE
uneditable region (\((1 - \mathbf{\Omega})\)): gradually reduce the noise magnitude according to \(\sigma(t)\), so that the editable and uneditable regions carry a comparable amount of noise at every step:
\((1 - \mathbf{\Omega}) \odot (\mathbf{x} + \sigma(t) \mathbf{z})\), with \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\)
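The basic (unmasked) procedure above can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the geometric noise schedule `sigma`, the constants `SIGMA_MIN` and `SIGMA_MAX`, and the closed-form `toy_score` (the exact score when the clean data are distributed as N(0, I)) are hypothetical stand-ins for a trained score network, and the update rule is a simple Euler-Maruyama-style discretization of the VE-SDE reverse process.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed noise range, chosen for this toy example


def sigma(t):
    """Assumed geometric VE-SDE noise schedule sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def toy_score(x, t):
    """Exact score of N(0, (1 + sigma(t)^2) I); stands in for a trained model."""
    return -x / (1.0 + sigma(t) ** 2)


def sdedit_ve(guide, t0, score_fn, n_steps=200, rng=None):
    """Perturb the guide to time t0, then integrate the reverse SDE down to 0."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t0 / n_steps
    # Perturbation: x(t0) ~ N(guide, sigma(t0)^2 I).
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        # Noise removed between t and t - dt.
        eps = np.sqrt(sigma(t) ** 2 - sigma(t - dt) ** 2)
        z = rng.standard_normal(guide.shape)
        x = x + eps ** 2 * score_fn(x, t) + eps * z
    return x


# A flat "guide" far from the toy data mean (0).
guide = np.full((100, 100), 3.0)
edited = sdedit_ve(guide, t0=0.5, score_fn=toy_score, rng=np.random.default_rng(0))
```

With a real model, `score_fn` would be the trained network \(s_\theta(\mathbf{x}, t)\), and `sigma` would have to match the schedule the model was trained with.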
2.2. Algorithm
Algorithm 2 - Guided image synthesis and editing (VE-SDE)
Algorithm 3 - Guided image synthesis and editing with mask (VE-SDE)
Algorithm 4 - Guided image synthesis and editing (VP-SDE)
Algorithm 5 - Guided image synthesis and editing with mask (VP-SDE)
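The masked variant can be sketched in the same toy setting. This is an illustrative NumPy sketch in the spirit of the masked VE-SDE procedure, not a transcription of the paper's algorithm: `sigma`, `toy_score`, and the constants are hypothetical stand-ins for a trained model and its schedule, and the choice to re-noise the uneditable region at level `sigma(t - dt)` (the noise level the sample has after the step) is an assumption of this sketch.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed noise range, chosen for this toy example


def sigma(t):
    """Assumed geometric VE-SDE noise schedule sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def toy_score(x, t):
    """Exact score of N(0, (1 + sigma(t)^2) I); stands in for a trained model."""
    return -x / (1.0 + sigma(t) ** 2)


def sdedit_ve_masked(guide, mask, t0, score_fn, n_steps=200, rng=None):
    """mask == 1 marks the editable region Omega; mask == 0 is kept."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t0 / n_steps
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        eps = np.sqrt(sigma(t) ** 2 - sigma(t - dt) ** 2)
        z = rng.standard_normal(guide.shape)
        # Editable region: one reverse-SDE step.
        x_edit = x + eps ** 2 * score_fn(x, t) + eps * z
        # Uneditable region: the guide plus fresh noise whose magnitude
        # shrinks with t (assumption: evaluated at t - dt).
        x_keep = guide + sigma(t - dt) * rng.standard_normal(guide.shape)
        x = mask * x_edit + (1.0 - mask) * x_keep
    return x


guide = np.full((64, 64), 3.0)
mask = np.zeros((64, 64))
mask[:, :32] = 1.0  # left half is editable
edited = sdedit_ve_masked(guide, mask, t0=0.5, score_fn=toy_score,
                          rng=np.random.default_rng(0))
```

In this sketch the uneditable half ends up equal to the guide up to noise of magnitude `sigma(0)`, while the editable half is free to move under the reverse SDE.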
3. Experiments
3.1. Main results
- Tasks
Stroke-based image synthesis
Stroke-based image editing
Image compositing
- Metrics
Realism: Kernel Inception Distance (KID), user study
Faithfulness: L2 distance, masked LPIPS, user study
Overall human satisfaction score (realism + faithfulness): user study
- Implementation details
\(t_0 \in [0.3, 0.6]\)
- Ablations
Method (timestep \(t_0\))
Quality of user guide
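The role of \(t_0\) in balancing realism and faithfulness can be illustrated on a toy problem. The sketch below is hypothetical throughout: the data distribution is assumed to be N(0, I) so that the score is known in closed form (`toy_score`), the schedule `sigma` and its constants are assumptions, faithfulness is measured as the L2 distance to the guide, and the distance of the output mean from the data mean is used only as a crude stand-in for realism (it is not KID).

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed noise range, chosen for this toy example


def sigma(t):
    """Assumed geometric VE-SDE noise schedule sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def toy_score(x, t):
    """Exact score of N(0, (1 + sigma(t)^2) I); stands in for a trained model."""
    return -x / (1.0 + sigma(t) ** 2)


def sdedit_ve(guide, t0, rng, n_steps=200):
    """Perturb the guide to time t0, then integrate the reverse SDE down to 0."""
    dt = t0 / n_steps
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        eps = np.sqrt(sigma(t) ** 2 - sigma(t - dt) ** 2)
        x = x + eps ** 2 * toy_score(x, t) + eps * rng.standard_normal(guide.shape)
    return x


def sweep_t0(guide, t0_values, seed=0):
    """Return (t0, L2 distance to guide, |output mean - data mean|) per t0."""
    rows = []
    for t0 in t0_values:
        out = sdedit_ve(guide, t0, np.random.default_rng(seed))
        l2_to_guide = float(np.linalg.norm(out - guide))  # lower = more faithful
        gap_to_data = float(abs(out.mean()))  # toy realism proxy; data mean is 0
        rows.append((t0, l2_to_guide, gap_to_data))
    return rows


guide = np.full((100, 100), 3.0)
results = sweep_t0(guide, [0.3, 0.45, 0.6])
for t0, l2, gap in results:
    print(f"t0={t0:.2f}  L2 to guide={l2:.2f}  gap to data mean={gap:.2f}")
```

In this toy setting, increasing \(t_0\) moves the output away from the guide (larger L2 distance) and toward the data distribution (smaller gap), which is the trade-off the timestep ablation examines.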
3.2. Figures
Stroke-based image synthesis
Stroke-based image editing
Image compositing
Ablation - timestep
3.3. Limitations
- Requires a score-based model trained on the target domain.