Part A: The Power of Diffusion Models

Precomputed Text Embeddings

num_inference_steps = 20

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

All three images match their prompts, but some details are off; the man, for example, is cross-eyed.

num_inference_steps = 10

num_inference_steps = 20

num_inference_steps = 100

I think the middle rocket looks best, but the rightmost one could be considered the most "detailed", with more colors and shading.

I used 180 as my seed.

Forward Process

Campanile

Noisy Campanile at t=250

Noisy Campanile at t=500

Noisy Campanile at t=750
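As a rough illustration of how noisy images like the ones above can be produced, here is a minimal sketch of the standard DDPM forward process, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear beta schedule, the random stand-in image, and the `forward` helper name are assumptions for illustration only; a pretrained model ships its own `alphas_cumprod`.

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng):
    """Return (x_t, eps) with x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, eps

# Hypothetical linear beta schedule (assumption, not the model's actual schedule).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # stand-in for the clean image

# Noise the same clean image to the three timesteps shown above.
noisy = {t: forward(x0, t, alphas_cumprod, rng) for t in (250, 500, 750)}
for t in noisy:
    print(t, "signal weight:", round(float(np.sqrt(alphas_cumprod[t])), 3))
```

The signal weight shrinks as t grows, which is why the image is progressively harder to recognize at t=500 and t=750.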

Classical Denoising

Noisy Campanile at t=250

Noisy Campanile at t=500

Noisy Campanile at t=750

Gaussian Blur Denoising at t=250

Gaussian Blur Denoising at t=500

Gaussian Blur Denoising at t=750

One-Step Denoising

Noisy Campanile at t=250

Noisy Campanile at t=500

Noisy Campanile at t=750

One-Step Denoised Campanile at t=250

One-Step Denoised Campanile at t=500

One-Step Denoised Campanile at t=750

Iterative Denoising

Noisy Campanile at t=90

Noisy Campanile at t=240

Noisy Campanile at t=390

Noisy Campanile at t=540

Noisy Campanile at t=690

Original

Iteratively Denoised

One-Step Denoised

Gaussian Blurred

Diffusion Model Sampling

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

Classifier-Free Guidance

These samples look much better than the ones in the previous section.

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

Image-to-Image Translation

With prompt "a high quality photo"

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Campanile

Trying this with my own images!

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Bins

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Guitar

I then tried this with images from the web and hand-drawn images.

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Teddy Bear

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Spongebob

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Flower

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Duck

For the hand-drawn images, the i_start=20 results came extremely close to the original image! They even look a little better because the model adds proper shading, which can be seen in the duck.
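One way to see why a larger i_start stays closer to the input: i_start indexes into a decreasing list of timesteps, so a later index means the image is noised less before denoising begins. The sketch below assumes a stride-30 schedule running from 990 down to 0 (consistent with the timesteps captioned in the Iterative Denoising section, but an assumption nonetheless) and a hypothetical linear beta schedule.

```python
import numpy as np

# Assumed strided schedule: 990, 960, ..., 30, 0.
strided_timesteps = list(range(990, -1, -30))

# Hypothetical linear beta schedule (assumption, for illustration only).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

def start_noise_level(i_start):
    """Return (t, noise weight sqrt(1 - abar_t)) for a given starting index."""
    t = strided_timesteps[i_start]
    return t, float(np.sqrt(1.0 - alphas_cumprod[t]))

levels = {i: start_noise_level(i) for i in (1, 3, 5, 7, 10, 20)}
for i, (t, sigma) in levels.items():
    print(f"i_start={i:2d} -> t={t:3d}, noise weight={sigma:.3f}")
```

Under these assumptions, i_start=20 starts at a much lower timestep than i_start=1, so far more of the original image survives the noising step.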

Inpainting

By applying masks to images and only denoising within those masks (keeping the rest of the image as the original), we can create inpainted images.
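The masking step can be sketched as follows. After each denoising update, pixels outside the mask are overwritten with the original image noised to the current timestep, so only the masked region is actually generated. The function names, the toy two-entry `alphas_cumprod`, and the convention that mask == 1 marks the hole are assumptions for illustration.

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng):
    """Noise a clean image to timestep t (standard DDPM forward process)."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

def enforce_known_pixels(x_t, x_orig, mask, t, alphas_cumprod, rng):
    """Keep x_t inside the hole (mask == 1); elsewhere use the original noised to t."""
    return mask * x_t + (1.0 - mask) * forward(x_orig, t, alphas_cumprod, rng)

# Toy schedule: t=0 is fully clean, t=1 is half signal, half noise.
alphas_cumprod = np.array([1.0, 0.5])
rng = np.random.default_rng(0)

x_orig = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for the original image
x_t = rng.standard_normal((3, 8, 8))             # stand-in for the current sample
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0                          # square hole to fill

out = enforce_known_pixels(x_t, x_orig, mask, 0, alphas_cumprod, rng)
```

In a full sampling loop this function would be called once per denoising step, with t set to the timestep the sample has just reached.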

Original

Mask

Hole to Fill

Inpainted Image

Original

Mask

Hole to Fill

Inpainted Image

Original

Mask

Hole to Fill

Inpainted Image

Text-Conditional Image to Image

This was essentially the same as the earlier image-to-image translation, but with new prompts instead of "a high quality photo".

"a rocket ship"

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Campanile

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Spongebob

"a photo of a dog"

i_start=1

i_start=3

i_start=5

i_start=7

i_start=10

i_start=20

Spongebob

Visual Anagrams

We can create visual anagrams by iteratively denoising both an image and its flipped version, each with different prompts. We average the two noise estimates at each step.
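A minimal sketch of the averaged noise estimate is shown below. It assumes the flip is vertical (upside down) and that the second estimate is flipped back before averaging so both estimates are in the same orientation. `toy_predict_noise` is a hypothetical stand-in for the real text-conditioned noise predictor, and the scalar "embeddings" are placeholders.

```python
import numpy as np

def flip(x):
    """Flip an image of shape (C, H, W) upside down."""
    return x[..., ::-1, :]

def anagram_noise(x_t, t, predict_noise, emb_a, emb_b):
    """Average the upright estimate for prompt A with the un-flipped estimate for prompt B."""
    eps_a = predict_noise(x_t, t, emb_a)
    eps_b = flip(predict_noise(flip(x_t), t, emb_b))
    return 0.5 * (eps_a + eps_b)

def toy_predict_noise(x, t, emb):
    """Hypothetical predictor: adds a top-to-bottom ramp scaled by the 'embedding'."""
    ramp = np.linspace(0.0, 1.0, x.shape[-2]).reshape(-1, 1)
    return x + emb * ramp

rng = np.random.default_rng(0)
x_t = rng.standard_normal((3, 8, 8))
eps = anagram_noise(x_t, t=500, predict_noise=toy_predict_noise, emb_a=1.0, emb_b=2.0)
```

The combined estimate is then used in the usual denoising update, so the result reads as prompt A upright and prompt B when flipped.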

Prompts: "an oil painting of people around a campfire", "an oil painting of an old man"

Prompts: "an oil painting of a snowy mountain village", "a photo of the amalfi cost"

Prompts: "a guitar", "a wine bottle"

Hybrid Images

This uses a technique similar to the previous part, but instead of combining noise estimates from an image and its flipped version, it combines the low frequencies of one prompt's noise estimate with the high frequencies of the other's. The low-frequency image is visible when the picture is viewed from far away (or at a small size), and the high-frequency image is visible up close (or at a large size).
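The frequency split can be sketched with a Gaussian low-pass filter, taking the high-pass part as the residual. The kernel size, sigma, and helper names below are assumptions for illustration, and the two input arrays stand in for noise estimates from two different prompts.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 1D Gaussian kernel of odd length `size`."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def low_pass(x, size=9, sigma=2.0):
    """Separable Gaussian blur over the last two axes, with reflect padding."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    widths = [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)]
    out = np.pad(x, widths, mode="reflect")
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), -1, out)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), -2, out)
    return out

def high_pass(x, size=9, sigma=2.0):
    """Residual after removing the low frequencies."""
    return x - low_pass(x, size, sigma)

def hybrid_noise(eps_far, eps_near):
    """Low frequencies from one estimate plus high frequencies from the other."""
    return low_pass(eps_far) + high_pass(eps_near)

rng = np.random.default_rng(0)
eps_far = np.full((3, 16, 16), 2.0)          # stand-in estimate for the far-away prompt
eps_near = rng.standard_normal((3, 16, 16))  # stand-in estimate for the up-close prompt
eps_hybrid = hybrid_noise(eps_far, eps_near)
```

A larger sigma pushes more of the first prompt's structure into the result, so it is a natural knob to tune per image pair.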

Prompts: "a lithograph of a skull", "a lithograph of waterfalls"

Prompts: "a photo of a dog", "an oil painting of an old man"

I thought this one was particularly interesting because it interpreted "oil painting" as a picture of the painting itself, not making the whole picture an oil painting. The legs of the easel enable the dog to have legs!

Prompts: "a lithograph of a skull", "an oil painting of a snowy mountain village"

Click here to go to part B