
Stable Diffusion

What's the deal with all these pictures?
These pictures were generated by Stable Diffusion, a recent diffusion generative model. It can turn text prompts (e.g. "an astronaut riding a horse") into images. It can also do a variety of other things!

You may also have seen DALL·E 2, which works in a similar way.


Why should we care?
• Could be a model of imagination.
• Similar techniques could be used to generate any number of things (e.g. neural data).
• It's cool!

[Example prompt: "a lovely cat running in the desert in Van Gogh style, trending art."]
How does it work?
It's complicated… but here's the high-level idea.

[Example prompt: "Batman eating pizza in a diner"]
What do we need?
1. Method of learning to generate new stuff given many examples

[Figure: example pictures of people vs. a "bad stick figure drawing"]
What do we need?
2. Way to link text and images ("cool professor person")
3. Way to compress images (for speed in training and generation)
What do we need?
4. Way to add in good image-related inductive biases…
…since when you're generating something new, you need a way to safely go beyond the images you've seen before.

What is Inductive Bias?
• Inductive bias refers to the set of assumptions or preferences that a machine learning algorithm brings to the table when learning from data. These assumptions guide the algorithm in making predictions about unseen data.
• Why it's important: without inductive bias, a machine learning model would be unable to generalize from its training data; it would have no way to choose among the infinitely many hypotheses that could explain the data.
What do we need?
1. Method of learning to generate new stuff → forward/reverse diffusion
2. Way to link text and images → text–image representation model
3. Way to compress images → autoencoder
4. Way to add in good inductive biases → U-Net + 'attention' architecture

Making a 'good' generative model is about making all these parts work together well!
Stable Diffusion in Action
Cartoon animated with Stable Diffusion (img2img) + After Effects:
https://www.reddit.com/r/StableDiffusion/comments/xcjj7u/sd_img2img_after_effects_i_generated_2_images_and/
Outline
• Build Stable Diffusion "from scratch"
• Principle of diffusion models (sampling, learning)
• Diffusion for images – UNet architecture
• Understanding prompts – words as vectors, CLIP
• Let words modulate diffusion – conditional diffusion, cross attention
• Diffusion in latent space – AutoEncoderKL
• Training on a massive dataset – LAION-5B
Principle of Diffusion Models
Learning to generate by iterative denoising.

"Creating noise from data is easy; creating data from noise is generative modeling."
-- Yang Song
Diffusion models
• Forward diffusion (noising)
  • $x_0 \to x_1 \to \cdots \to x_T$
  • Take a data distribution $x_0 \sim p(x)$ and turn it into noise by diffusion: $x_T \sim \mathcal{N}(0, \sigma^2 I)$
• Reverse diffusion (denoising)
  • $x_T \to x_{T-1} \to \cdots \to x_0$

[Figure: the chain $x_0, x_1, \ldots, x_{T-1}, x_T$, progressively noised]
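To make the noising step concrete, here is a minimal sketch of the forward process in its common discrete (DDPM-style) form; the schedule values and tensor shapes below are illustrative, not Stable Diffusion's exact settings:

```python
import torch

# Forward diffusion in closed form (DDPM-style, variance-preserving):
# x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I)
T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule (illustrative)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # abar_t

def q_sample(x0, t, eps=None):
    """Sample x_t ~ q(x_t | x_0) without simulating intermediate steps."""
    if eps is None:
        eps = torch.randn_like(x0)
    abar = alphas_cumprod[t]
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps

x0 = torch.randn(1, 3, 64, 64)   # stand-in for a data sample
xt = q_sample(x0, t=500)         # halfway along the noising chain
```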
Math Formalism
• For a forward diffusion process
$$d\mathbf{x} = f(\mathbf{x}, t)\,dt + g(t)\,d\mathbf{w}$$
there is a backward diffusion process that reverses time:
$$d\mathbf{x} = \left[f(\mathbf{x}, t) - g(t)^2 \nabla_x \log p(\mathbf{x}, t)\right] dt + g(t)\,d\bar{\mathbf{w}}$$
• If we know the time-dependent score function $\nabla_x \log p(\mathbf{x}, t)$,
• then we can reverse the diffusion process.
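As a sketch of what "reversing the diffusion" means numerically: assuming a variance-exploding SDE ($f = 0$) and a hypothetical `score_fn` approximating $\nabla_x \log p(x, t)$, Euler–Maruyama integration of the reverse SDE looks roughly like this:

```python
import torch

def reverse_sde_sample(score_fn, shape, n_steps=500, sigma=25.0):
    """Integrate the reverse SDE dx = -g(t)^2 * score(x, t) dt + g(t) dw
    backward from t = 1 to t = 0 (variance-exploding case, f = 0)."""
    x = torch.randn(shape) * sigma          # start from the noise distribution
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt                    # time runs from 1 down to 0
        g = sigma ** t                      # g(t) for the VE SDE (illustrative)
        x = x + (g ** 2) * score_fn(x, t) * dt \
              + g * (dt ** 0.5) * torch.randn_like(x)
    return x
```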
Modelling the Score Function over the Image Domain: Introducing UNet

Convolutional Neural Network
• CNN parametrizes a function over images
• Motivation
  • Features are translation invariant
  • Extract features at different scales / abstraction levels
• Key modules
  • Convolution
  • Downsampling (max-pool)

[Figure: VGG. Features of larger scale (larger receptive field) sit at a higher abstraction level.]
CNN + inverted CNN ⇒ UNet
• An inverted CNN (generator) can generate images.
• CNN + inverted CNN could model an image → image function.

[Figure: downsampling via convolution, then upsampling via transposed convolution]
UNet: a natural architecture for image-to-image functions
• Skip connections transport information at the same resolution.

[Figure: downsampling (encoder) side and upsampling (decoder) side, joined by skip connections]
Key Ingredients of UNet
• Convolution operation
  • Saves parameters; spatially invariant
• Down/up sampling
  • Multiscale / hierarchy
  • Learn modulations at multiple scales and abstraction levels
• Skip connections
  • No bottleneck
  • Route features of the same scale directly
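A toy PyTorch sketch of these ingredients (one down level, one up level, one skip connection; real diffusion UNets stack many such levels and add time/text conditioning):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal UNet: encoder, downsample, upsample, skip connection."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                        # downsampling side
        self.mid = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)  # upsampling side
        # The decoder sees upsampled features concatenated with the skip.
        self.dec = nn.Conv2d(ch * 2, 3, 3, padding=1)

    def forward(self, x):
        h = self.enc(x)                     # full-resolution features
        m = self.mid(self.down(h))          # low-resolution features
        u = self.up(m)                      # back to full resolution
        return self.dec(torch.cat([u, h], dim=1))  # skip: route same-scale features

out = TinyUNet()(torch.randn(1, 3, 64, 64))  # same spatial shape in and out
```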
Note: Add Time Dependency
• The score function is time-dependent.
  • Target: $s(x, t) = \nabla_x \log p(x, t)$
• Add time dependency
  • Assume the time dependency is spatially homogeneous:
  • add one scalar value per channel, $f(t)$, to the conv feature tensor ($\oplus$).
  • Parametrize $f(t)$ by an MLP / linear map over a Fourier basis of the time embedding, $[\sin(\omega_i t), \cos(\omega_i t), \ldots]$.
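A sketch of this conditioning, assuming the usual sinusoidal (Fourier) time embedding; the layer sizes are illustrative:

```python
import math
import torch
import torch.nn as nn

def timestep_embedding(t, dim):
    """Fourier features [sin(w_i t), cos(w_i t)] with log-spaced frequencies."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

# An MLP maps the embedding to one scalar per channel, which is added to the
# conv feature tensor -- a spatially homogeneous modulation f(t).
time_mlp = nn.Sequential(nn.Linear(128, 256), nn.SiLU(), nn.Linear(256, 64))

feat = torch.randn(8, 64, 32, 32)            # conv feature tensor
t = torch.randint(0, 1000, (8,))             # diffusion timesteps
emb = time_mlp(timestep_embedding(t, 128))   # (8, 64): one scalar per channel
feat = feat + emb[:, :, None, None]          # broadcast over spatial dims
```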
How to understand prompts?
Language / multimodal transformers: CLIP!
Word as Vectors: Language Model 101
• Unlike pixels, the meaning of a word is not explicit in its characters.
• A word can be represented as an index in a dictionary,
  • but an index is also meaningless by itself.
• Represent words in a vector space.
  • Vector geometry ⇒ semantic relations.

Words in a sentence:  I    love  cats  and   dogs  .
Token indices:        328  793   3989  537   3255  269
(each index is then mapped to a word vector)
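In code, this is just an embedding lookup; the vocabulary size and dimension below are CLIP-like but illustrative:

```python
import torch
import torch.nn as nn

# Token indices are arbitrary integers; an embedding table maps each index
# to a learned vector whose geometry can encode semantic relations.
tokens = torch.tensor([328, 793, 3989, 537, 3255, 269])  # "I love cats and dogs ."
embed = nn.Embedding(num_embeddings=49408, embedding_dim=768)
word_vectors = embed(tokens)   # (6, 768): one vector per token
```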
Word Vectors in Context: RNN / Transformers
• The meaning of a word depends on context; it is not always the same.
  • "I book a ticket to buy that book."
• Word vectors should therefore depend on context.
• Transformers let each word "absorb" influence from other words to become "contextualized".

[Figure: N stacked Transformer blocks]

More on attention later…
Learning Word Vectors: GPT & BERT & CLIP
• Self-supervised learning of word representations
  • Predicting missing / next words in a sentence (BERT, GPT)
  • Contrastive learning, matching image and text (CLIP)
• A downstream classifier can decode: part of speech, sentiment, …

[Figure: MLM — Sentence-Transformers documentation (sbert.net)]

Joint Representation for Vision and Language: CLIP
• Learn a joint embedding space for text captions and images.
  • Maximize representation similarity between an image and its caption.
  • Minimize it for all other pairs.

[Figure: text Transformer and Vision Transformer encoders, from the CLIP paper]
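A common way to write this objective is the symmetric contrastive (InfoNCE) loss below; this is a sketch of the idea, not CLIP's exact training code:

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Matched (image, caption) pairs sit on the diagonal of the similarity
    matrix; cross-entropy pushes them up and all other pairs down."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature        # (B, B) cosine similarities
    labels = torch.arange(len(logits))          # i-th image <-> i-th caption
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

loss = clip_loss(torch.randn(16, 512), torch.randn(16, 512))
```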
Choice of text encoding
• Encoder in Stable Diffusion: pre-trained CLIP ViT-L/14 text encoder
• Word vectors can also be randomly initialized and learned online.
• Representing other conditional signals (see the loading sketch below)
  • Object categories (e.g. shark, trout, etc.): one vector per class
  • Face attributes (e.g. {female, blonde hair, with glasses, …}, {male, short hair, dark skin}): a set of vectors, one per attribute
• Time to be creative!!
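For reference, one way to load that CLIP ViT-L/14 text encoder is through the Hugging Face `transformers` port; the checkpoint name `openai/clip-vit-large-patch14` is an assumption to check against your setup:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Stable Diffusion pads/truncates prompts to 77 tokens.
batch = tokenizer(["an astronaut riding a horse"], padding="max_length",
                  max_length=77, return_tensors="pt")
with torch.no_grad():
    word_vectors = encoder(**batch).last_hidden_state  # (1, 77, 768)
```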
How does text affect diffusion?
Incoming: Cross Attention

Origin of Attention: Machine Translation (Seq2Seq)
Original sentence: "I love cats and dogs."
French translation: "J'adore les chats et les chiens."

[Figure: encoder hidden states $e_1 \ldots e_6$ (word vectors) feeding decoder hidden states $h_1, h_2, h_3$]

• Use attention to retrieve useful info from a batch of vectors.
From Dictionary to Attention: Hard Indexing
• Keys: 1, 2, 3; values: $v_1, v_2, v_3$
  • `dic = {1: v1, 2: v2, 3: v3}`
• Query: 2, i.e. `dic[2]`
  • Find 2 among the keys.
  • Get the corresponding value $v_2$.
• Retrieving values as a matrix–vector product
  • One-hot vector over the keys:
$$[v_1\ v_2\ v_3] \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = v_2$$
From Dictionary to Attention: Soft Indexing
• Soft indexing
  • Define an attention distribution $a$ over the keys.
  • Matrix–vector product:
$$[v_1\ v_2\ v_3] \begin{bmatrix} 0.1 \\ 0.8 \\ 0.1 \end{bmatrix} = 0.1\,v_1 + 0.8\,v_2 + 0.1\,v_3$$
• The distribution is based on the similarity of query and key.
QKV attention
• Query: what I need (J'adore: "I want a subject pronoun & verb")
• Key: what the target provides (I: "Here is the subject")
• Value: the information to be retrieved (a latent related to Je or J')
• All are linear projections of the "word vectors":
  • Query $q_i = W_q h_i$
  • Key $k_j = W_k e_j$
  • Value $v_j = W_v e_j$
Attention mechanism
• Compute the inner product (similarity) of key $k_j$ and query $q_i$.
• SoftMax the normalized scores to get the attention distribution:
$$a_{ij} = \mathrm{SoftMax}_j\!\left(\frac{k_j^\top q_i}{\sqrt{d}}\right), \qquad \sum_j a_{ij} = 1$$
• Use the attention distribution to take a weighted average of the values $v_j$:
$$c_i = \sum_j a_{ij} v_j$$
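These two formulas amount to a few lines of PyTorch (single-head and unbatched, for clarity):

```python
import torch

def attention(q, k, v):
    """Scaled dot-product attention.
    q: (n_q, d), k: (n_k, d), v: (n_k, d_v) -> contexts (n_q, d_v)."""
    scores = q @ k.t() / (q.shape[-1] ** 0.5)  # similarity of queries and keys
    a = torch.softmax(scores, dim=-1)          # attention distribution; rows sum to 1
    return a @ v                               # weighted average of the values

c = attention(torch.randn(3, 64), torch.randn(5, 64), torch.randn(5, 32))
```

Hard indexing is the limiting case where each row of `a` becomes one-hot.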
Visualizing the Attention Matrix $a_{ij}$
• French → English translation
• "Learnt to pay attention":
  • "la zone économique européenne" → "the European Economic Area"
  • "a été signé" → "was signed"
Cross & Self Attention
• Cross attention
  • Tokens in one language pay attention to tokens in another
    (e.g. decoder hidden states $h_1, h_2, h_3$ for "J'adore les chats" attending to the English word vectors).
• Self attention ($e_i = h_i$)
  • Tokens in a language pay attention to each other.
  • "A robot must obey the order given it."

https://jalammar.github.io/illustrated-gpt2/
Note: Feed-Forward Network
• Attention is usually followed by a 2-layer MLP and normalization.
• It learns a nonlinear transform.
Text2Image as Translation
• Source language: words (sequence dimension)
• Target language: images (spatial dimensions × channel dimension)
• Word vectors ↔ the encoded latent state of the image: patch vectors!

[Example prompt: "A ballerina chasing her cat running on the grass in the style of Monet"]
Text2Image as Translation
• Cross attention: image to words
• Self attention: image to image

[Same prompt and diagram, with attention arrows between the word vectors and the image latent state]
Spatial Transformer
• Rearrange the spatial tensor into a sequence.
• Cross attention
• Self attention
• FFN
• Rearrange back into a spatial tensor (same shape).
Tips: Implementing attention with the `einops` lib
• `einops.rearrange` function
  • Shift the order of axes
  • Split / combine dimensions
• `torch.einsum` function
  • Multiply & sum tensors along axes
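Putting the last two slides together, here is a sketch of the cross-attention step of a spatial transformer using `einops.rearrange` and `torch.einsum`; the projection matrices and sizes are illustrative:

```python
import torch
from einops import rearrange

def spatial_cross_attention(feat, words, W_q, W_k, W_v):
    """Rearrange the spatial tensor to a sequence, attend to word vectors,
    rearrange back (image-to-words cross attention)."""
    B, C, H, W = feat.shape
    x = rearrange(feat, "b c h w -> b (h w) c")     # spatial -> sequence
    q = torch.einsum("bnc,cd->bnd", x, W_q)         # queries from image patches
    k = torch.einsum("bmc,cd->bmd", words, W_k)     # keys from words
    v = torch.einsum("bmc,cd->bmd", words, W_v)     # values from words
    a = torch.softmax(torch.einsum("bnd,bmd->bnm", q, k) / q.shape[-1] ** 0.5, -1)
    out = torch.einsum("bnm,bmd->bnd", a, v)        # retrieve word info per patch
    return rearrange(out, "b (h w) d -> b d h w", h=H, w=W)  # back to spatial

feat = torch.randn(1, 64, 8, 8)     # image feature tensor
words = torch.randn(1, 77, 768)     # word vectors from the text encoder
Wq, Wk, Wv = torch.randn(64, 32), torch.randn(768, 32), torch.randn(768, 32)
out = spatial_cross_attention(feat, words, Wq, Wk, Wv)  # (1, 32, 8, 8)
```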
UNet = Giant Sandwich of Spatial Transformers + ResBlocks (Conv layers)

[Block diagram: the down path stacks ResBlock + SpatialTransformer pairs with DownSample layers between resolutions; the up path mirrors it with ResBlock + SpatialTransformer pairs and UpSample layers.]
Spatial Transformer + ResBlock (Conv layer)
• The latent tensor (4, 64, 64) flows through alternating ResBlocks and Spatial Transformers.
• The time embedding modulates the ResBlocks; the word vectors ($L_{seq}$, 768) modulate the Spatial Transformers.
• Alternating time and word modulation
• Alternating local and nonlocal operations
Diffusion in Latent Space
Adding in an AutoEncoder

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. CVPR.
Diffusion in latent space
Motivation:
• Natural images are high dimensional,
• but have many redundant details that could be compressed / statistically filled in.

[Figure: the same image downsampled, d = 2352 vs. d = 97200]

• Division of labor
  • Diffusion model → generate a low-resolution sketch
  • AutoEncoder → fill out high-resolution details
• Train a VAE model to compress images into a latent space.
  • $x \to z \to \hat{x}$, with $x, \hat{x}$ of shape [3, 512, 512] and $z$ of shape [4, 512/f, 512/f]
• Train diffusion models in the latent space of $z$.
Spatial Compression Tradeoff
• LDM-{f}: f = spatial downsampling factor
• Higher f leads to faster sampling, with degraded image quality (FID ↑).
• Fewer sampling steps lead to faster sampling, with lower quality (FID ↑).

[Figure: FID vs. sampling throughput on CelebA-HQ (faces) and ImageNet]
Spatial Compression Tradeoff
• LDM-{f}: f = spatial downsampling factor
• Too little compression (f = 1, 2) or too much compression (f = 32) makes diffusion hard to train.
Details in Stable Diffusion
• In Stable Diffusion, the spatial downsampling factor is f = 8:
  • $x$ is a (3, 512, 512) image tensor
  • $z$ is a (4, 64, 64) latent tensor
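A round trip through such an autoencoder can be sketched with the `diffusers` library; the checkpoint name below (`stabilityai/sd-vae-ft-mse`, one published SD VAE) and the SD v1 latent scaling factor 0.18215 are assumptions to check against your setup:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

x = torch.randn(1, 3, 512, 512)                       # stand-in for an image
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample() * 0.18215  # (1, 4, 64, 64) latent
    x_hat = vae.decode(z / 0.18215).sample            # (1, 3, 512, 512) reconstruction
```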
Regularizing the Latent Space
• KL regularizer
  • As in a VAE, make the latent distribution close to a Gaussian.
• VQ regularizer
  • Quantize the latent representation into a set of discrete tokens.
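For the KL case, the regularizer has the familiar closed form for a diagonal Gaussian posterior (a sketch; the reduction and loss weighting vary by implementation):

```python
import torch

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * torch.sum(mu ** 2 + logvar.exp() - 1.0 - logvar, dim=1)

kl = kl_to_standard_normal(torch.randn(8, 4096), torch.randn(8, 4096))
```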
Let the GPUs roar!
Training data & details

Large Data Training
• SD is trained on ~2 billion image–caption (English) pairs.
• Scraped from the web, filtered by CLIP.
• https://laion.ai/blog/laion-5b/
Diffusion Process Visualized

Meaning of the latent space
• The latent state contains a "sketch version" of the image.

[Figure: visualizing the first three latent channels, $z[0{:}3, :, :]$]