
Part B: Diffusion Models from Scratch

Visualizing Noising Processes

We use MNIST digits as the training and test data for this denoiser. A noisy image is generated by adding noise of strength σ to a clean image. The figure below shows how the digits look at different noise strengths.

img
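
The noising step above can be sketched in a few lines. This is a minimal illustration, assuming the noise is i.i.d. Gaussian (the usual choice, but not stated explicitly here); the function name `add_noise`, the list of strengths, and the random stand-in image are all hypothetical.

```python
import numpy as np

def add_noise(x, sigma, rng):
    """Return z = x + sigma * eps with eps ~ N(0, I), same shape as x."""
    eps = rng.standard_normal(x.shape)  # assumption: Gaussian noise
    return x + sigma * eps

rng = np.random.default_rng(0)
x = rng.random((28, 28))  # stand-in for a normalized 28x28 MNIST digit
sigmas = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]  # hypothetical strengths
noisy = [add_noise(x, s, rng) for s in sigmas]
```

At σ = 0 the image is returned unchanged, and as σ grows the digit is progressively buried in noise.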

Single-Step Denoising UNet

I first built a one-step denoiser: a UNet trained to minimize the L2 loss between its output and the clean image. Below is the architecture.

img
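
For reference, the L2 objective can be written as a mean squared error between the denoiser's output and the clean target. The sketch below assumes a mean reduction over all pixels; the name `l2_loss` is hypothetical and the arrays stand in for image batches.

```python
import numpy as np

def l2_loss(denoised, clean):
    """Mean squared error between the denoiser output and the clean image."""
    denoised = np.asarray(denoised, dtype=float)
    clean = np.asarray(clean, dtype=float)
    return float(np.mean((denoised - clean) ** 2))
```

A perfect denoiser gives a loss of zero. A denoiser that simply returns its noisy input would score roughly σ² under the Gaussian noise assumption.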

I trained this denoiser on images noised with σ = 0.5, using a batch size of 256 and a hidden dimension of D = 128. The optimizer was Adam with a learning rate of 1e-4, and the model was trained for 5 epochs.
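
To make the scale of this run concrete, the hyperparameters can be collected in a small config and turned into a step count. This is only a sketch: `TrainConfig` and `total_steps` are hypothetical names, and the count assumes the full 60,000-image MNIST training split with the last partial batch kept.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    sigma: float = 0.5            # noise strength used during training
    batch_size: int = 256
    hidden_dim: int = 128         # D in the text
    learning_rate: float = 1e-4   # Adam step size
    epochs: int = 5

MNIST_TRAIN_SIZE = 60_000  # assumption: full standard training split

def total_steps(cfg, n_examples=MNIST_TRAIN_SIZE):
    """Optimizer steps over the whole run, keeping the last partial batch."""
    steps_per_epoch = math.ceil(n_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs

cfg = TrainConfig()
```

Under these assumptions the run comes to 235 steps per epoch and 1,175 optimizer steps in total.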

img
img
Epoch 1
img
Epoch 5

Below is how the denoiser, which was trained only at σ = 0.5, performs on other σ values.

img
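
A sweep like this can also be summarized numerically. The sketch below measures mean squared error at several noise strengths; the helper `evaluate_over_sigmas`, the chosen σ values, and the random stand-in batch are hypothetical, and an identity function stands in for the trained UNet.

```python
import numpy as np

def evaluate_over_sigmas(denoiser, clean, sigmas, rng):
    """Return {sigma: mean squared error of denoiser(noisy) against clean}."""
    results = {}
    for sigma in sigmas:
        # assumption: Gaussian noise, matching the training-time noising
        noisy = clean + sigma * rng.standard_normal(clean.shape)
        results[sigma] = float(np.mean((denoiser(noisy) - clean) ** 2))
    return results

rng = np.random.default_rng(0)
clean = rng.random((16, 28, 28))   # stand-in batch, not real MNIST

def identity(z):
    return z                       # placeholder for the trained denoiser

errors = evaluate_over_sigmas(identity, clean, [0.0, 0.5, 1.0], rng)
```

Swapping `identity` for the trained model would show how quickly the error grows once σ moves away from the training value of 0.5.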

DDPM

Time-Conditioned UNet

The time-conditioned UNet takes a scalar timestep t as an additional input and injects it into the UNet architecture.
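
One common way to inject a scalar into a convolutional network is to embed it with a small MLP and broadcast the result over the spatial dimensions of a feature map. The sketch below illustrates that idea only; `FCBlock`, `inject_time`, the additive combination, and the normalization by `num_timesteps` are assumptions, not necessarily the design shown in the figures.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class FCBlock:
    """Hypothetical two-layer MLP: Linear -> GELU -> Linear."""
    def __init__(self, in_dim, out_dim, rng):
        self.w1 = rng.standard_normal((in_dim, out_dim)) * 0.1
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.standard_normal((out_dim, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        return gelu(x @ self.w1 + self.b1) @ self.w2 + self.b2

def inject_time(features, t, fc, num_timesteps):
    """Add a per-channel embedding of t to features of shape (B, C, H, W)."""
    t_norm = (np.asarray(t) / num_timesteps).reshape(-1, 1)  # (B, 1) in [0, 1]
    emb = fc(t_norm)                                         # (B, C)
    return features + emb[:, :, None, None]                  # broadcast over H, W
```

Each sample in the batch gets its own embedding, so images at different timesteps can be processed together.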

img
img
img

Class-Conditioned UNet

The class-conditioned UNet additionally takes a class-conditioning vector c, alongside the timestep t.
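
For MNIST, a natural choice for c is a one-hot vector over the ten digit classes. The sketch below builds such vectors. The optional `p_drop` argument, which zeroes a sample's vector so the model also sees unconditional inputs, is an assumption borrowed from classifier-free guidance and is not stated here; the function name `class_vector` is hypothetical.

```python
import numpy as np

def class_vector(labels, num_classes=10, p_drop=0.0, rng=None):
    """One-hot encode digit labels; optionally zero whole rows with prob p_drop."""
    labels = np.asarray(labels)
    c = np.zeros((labels.shape[0], num_classes))
    c[np.arange(labels.shape[0]), labels] = 1.0
    if p_drop > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(labels.shape[0]) >= p_drop  # True keeps the class signal
        c = c * keep[:, None]
    return c
```

With the default `p_drop=0.0`, every row is a plain one-hot vector for its label.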

img
img
Epoch 5
img
Epoch 20