SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Accept info: ICLR 2022
Authors: Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon
Affiliation: Stanford University, Carnegie Mellon University
Links: arXiv, OpenReview, project page, GitHub
Task: guided image synthesis & editing
TLDR: Generate realistic images from strokes by adding small noise and then denoising with score-based models.

1. Intuition & Motivation

Figure2

Key intuition of Stochastic Differential Editing (SDEdit): hijack the generative process of SDE-based generative models.

Add a suitable amount of noise to smooth out undesirable artifacts and distortions, while still preserving the overall structure of the input user guide.

2. SDEdit

2.1. Approach overview

  • SDEdit
    Given guide \(\mathbf{x}^{(g)}\), choose intermediate time \(t_0 \in (0, 1)\).
    Sample \(\mathbf{x}^{(g)}(t_0) \sim \mathcal{N}(\mathbf{x}^{(g)}; \sigma^2(t_0) \mathbf{I})\).
    Then produce \(\mathbf{x}(0)\) by iterating the reverse SDE.
  • SDEdit with mask (keeps certain parts unchanged)
    editable region (\(\mathbf{\Omega}\)): simulate the reverse SDE
    uneditable region (\(1 - \mathbf{\Omega}\)): gradually reduce the noise magnitude so that the editable and uneditable regions carry a comparable amount of noise at each step
    (\((1 - \mathbf{\Omega}) \odot (\mathbf{x} + \sigma(t) \mathbf{z})\), with \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\))
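
The two procedures above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the geometric schedule `sigma`, the stand-in `toy_score` (the exact score when the data is standard Gaussian, used here in place of a trained \(\mathbf{s}_\theta\)), the Euler-Maruyama style update, and the final compositing step are all choices made for this example. The masked branch reads \(\mathbf{x}\) in the uneditable-region formula as the original image, which is also an assumption.

```python
import numpy as np

def sigma(t, sigma_min=0.01, sigma_max=10.0):
    """Assumed geometric VE noise schedule; not taken from these notes."""
    return sigma_min * (sigma_max / sigma_min) ** t

def toy_score(x, t):
    """Hypothetical stand-in for a trained score model.

    Exact score of N(0, (1 + sigma(t)^2) I), i.e. N(0, I) data under VE noise.
    """
    return -x / (1.0 + sigma(t) ** 2)

def sdedit_ve(guide, t0, n_steps, score_fn, rng, mask=None, original=None):
    """Perturb the guide to time t0, then step the reverse VE-SDE down to 0.

    mask is 1 on editable pixels and 0 elsewhere; original supplies the
    pixels that must be kept when a mask is given.
    """
    dt = t0 / n_steps
    x = guide + sigma(t0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        # Variance removed when moving from time t to time t - dt.
        eps2 = sigma(t) ** 2 - sigma(t - dt) ** 2
        z = rng.standard_normal(guide.shape)
        x = x + eps2 * score_fn(x, t) + np.sqrt(eps2) * z
        if mask is not None:
            # Uneditable region: original image re-noised to the current level,
            # so both regions carry a comparable amount of noise.
            kept = original + sigma(t - dt) * rng.standard_normal(guide.shape)
            x = mask * x + (1.0 - mask) * kept
    if mask is not None:
        # Assumed final compositing: copy the uneditable pixels back exactly.
        x = mask * x + (1.0 - mask) * original
    return x

rng = np.random.default_rng(0)
guide = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # stand-in for a stroke painting
edited = sdedit_ve(guide, t0=0.5, n_steps=50, score_fn=toy_score, rng=rng)
```

A larger `t0` injects more noise and lets the score model overwrite more of the guide; a smaller `t0` keeps the output closer to the guide.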

2.2. Algorithm

Algorithm 2 - Guided image synthesis and editing (VE-SDE) Algorithm2
Algorithm 3 - Guided image synthesis and editing with mask (VE-SDE) Algorithm3
Algorithm 4 - Guided image synthesis and editing (VP-SDE) Algorithm4
Algorithm 5 - Guided image synthesis and editing with mask (VP-SDE) Algorithm5
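
For the VP-SDE variants, the same idea can be sketched with a DDPM-style ancestral update. This is an illustrative sketch that may differ from Algorithm 4 in its details: the linear schedule `beta`, its closed-form `alpha`, and the stand-in `toy_score` (exact for standard Gaussian data, whose VP marginal stays standard Gaussian) are assumptions made for this example.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Assumed linear VP noise schedule; not taken from these notes."""
    return beta_min + t * (beta_max - beta_min)

def alpha(t, beta_min=0.1, beta_max=20.0):
    """Signal retention exp(-integral of beta from 0 to t)."""
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2))

def toy_score(x, t):
    """Hypothetical stand-in for a trained score model: score of N(0, I)."""
    return -x

def sdedit_vp(guide, t0, n_steps, score_fn, rng):
    """Perturb the guide to time t0 under the VP-SDE, then denoise to time 0."""
    dt = t0 / n_steps
    if beta(t0) * dt >= 1.0:
        raise ValueError("step too coarse: need beta(t) * dt < 1")
    a0 = alpha(t0)
    # Forward perturbation: scale the guide down and add matching noise.
    x = np.sqrt(a0) * guide + np.sqrt(1.0 - a0) * rng.standard_normal(guide.shape)
    for n in range(n_steps, 0, -1):
        t = t0 * n / n_steps
        b = beta(t) * dt
        z = rng.standard_normal(guide.shape)
        # Ancestral-sampling style reverse step.
        x = (x + b * score_fn(x, t)) / np.sqrt(1.0 - b) + np.sqrt(b) * z
    return x

rng = np.random.default_rng(0)
guide = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
edited = sdedit_vp(guide, t0=0.5, n_steps=100, score_fn=toy_score, rng=rng)
```

Unlike the VE case, the guide is scaled by \(\sqrt{\alpha(t_0)}\) before noise is added, so the total variance stays bounded.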

3. Experiments

3.1. Main results

  • Tasks
    Stroke-based image synthesis
    Stroke-based image editing
    Image compositing
  • Metrics
    Realism: Kernel Inception Distance (KID), user study
    Faithfulness: L2 distance, masked LPIPS, user study
    Overall human satisfaction score (realism + faithfulness): user study
  • Implementation details
    \(t_0 \in [0.3, 0.6]\) (range used to balance realism and faithfulness)
  • Ablations
    Method (timestep \(t_0\))
    Quality of user guide
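
As a reference for the KID realism metric listed under Metrics, the sketch below computes the usual unbiased squared-MMD estimate with a cubic polynomial kernel. It assumes the feature vectors have already been extracted (in practice from an Inception network); the function names and the random features in the usage lines are hypothetical, and common implementations additionally average the estimate over several subsets, which is omitted here.

```python
import numpy as np

def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Within-set terms exclude the diagonal (i == j) to stay unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()

rng = np.random.default_rng(0)
real = rng.standard_normal((200, 16))          # placeholder "real" features
close = rng.standard_normal((200, 16))         # same distribution
far = rng.standard_normal((200, 16)) + 1.0     # shifted distribution
score_close, score_far = kid(real, close), kid(real, far)
```

Lower values indicate that the generated features are distributed more like the real ones.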

3.2. Figures

Stroke-based image synthesis Figure4 Figure5
Stroke-based image editing Figure6
Image compositing Figure7
Ablation - timestep Figure3

3.3. Limitations

  1. Requires a score-based model pretrained on the target domain.