Diffusion model based image generation and
model maintenance
Aryansh Saxena, Bijayan Ray, Krishanu Bandyopadhyay
May 8, 2025
Contents
Dataset preparation
Diffusion model description (simple unet)
Diffusion model description
Configurations
Model training
Results
Dataset preparation I
▶ A custom dataset class named FlowerDataset is defined,
inheriting from torch.utils.data.Dataset.
▶ The constructor takes the directory path of the images and an
optional transform.
▶ It lists all files in the given directory that have a .jpg
extension.
▶ Class names are extracted from each file name by splitting at
the underscore (assuming file names are formatted as
class_index.jpg).
▶ A sorted list of unique class names is created, and a mapping
from class names to numeric indices is generated.
▶ The __len__ method returns the number of image files in the
dataset.
Dataset preparation II
▶ The __getitem__ method opens an image file, converts it to
RGB format, applies the transform (if provided), extracts the
class label from the file name, maps it to its corresponding
index, and returns the image and its label.
▶ An instance of the dataset is created using the image
directory /kaggle/input/flower-image-dataset/flowers
and a specified transform.
▶ A DataLoader is created with the dataset, a specified batch
size, shuffling enabled, and 2 worker processes for data
loading.
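A minimal PyTorch sketch of the dataset pipeline described above; the directory path, the class_index.jpg naming convention, and the batch size of 128 with 2 workers follow the slides, while the transform and variable names are illustrative assumptions.

import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class FlowerDataset(Dataset):
    def __init__(self, img_dir, transform=None):
        self.img_dir = img_dir
        self.transform = transform
        # keep only the .jpg files in the directory
        self.files = [f for f in os.listdir(img_dir) if f.endswith(".jpg")]
        # class name = part of the file name before the underscore
        classes = sorted({f.split("_")[0] for f in self.files})
        self.class_to_idx = {c: i for i, c in enumerate(classes)}

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        fname = self.files[idx]
        img = Image.open(os.path.join(self.img_dir, fname)).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        label = self.class_to_idx[fname.split("_")[0]]
        return img, label

# illustrative transform: resize to the configured image size and convert to a tensor
transform = transforms.Compose([transforms.Resize((56, 56)), transforms.ToTensor()])
train_ds = FlowerDataset("/kaggle/input/flower-image-dataset/flowers", transform)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=2)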
Diffusion model description (simple unet) I
▶ The model is a U-Net-style architecture with both time and
text conditioning, suitable for denoising in diffusion models.
The input is a noisy image x ∈ R^{B×C×H×W}, a timestep
t ∈ N^B, and an optional text embedding e_text ∈ R^{B×d_text}.
▶ Time embedding block:
▶ A learnable positional embedding layer maps timestep t to an
embedding vector:
Emb(t) ∈ R^{B×d_time}
where Emb : N → R^{d_time} is a learned embedding function.
▶ The time embedding is passed through a multilayer perceptron
(MLP) consisting of:
▶ A linear layer: W_1 ∈ R^{4d_time×d_time}
▶ SiLU activation: SiLU(x) = x · σ(x)
▶ Another linear layer: W_2 ∈ R^{d_time×4d_time}
▶ Output (a code sketch follows):
e_t = W_2 · SiLU(W_1 · Emb(t)) ∈ R^{B×d_time}
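A minimal sketch of this time-embedding block, assuming a learned nn.Embedding over the T = 1000 timesteps and the d_time → 4·d_time → d_time MLP shapes given above; module names are illustrative.

import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    def __init__(self, num_timesteps=1000, d_time=128):
        super().__init__()
        self.emb = nn.Embedding(num_timesteps, d_time)   # Emb : N -> R^{d_time}
        self.mlp = nn.Sequential(
            nn.Linear(d_time, 4 * d_time),               # W_1
            nn.SiLU(),                                   # SiLU(x) = x * sigmoid(x)
            nn.Linear(4 * d_time, d_time),               # W_2
        )

    def forward(self, t):                                # t: (B,) integer timesteps
        return self.mlp(self.emb(t))                     # e_t: (B, d_time)

e_t = TimeEmbedding()(torch.randint(0, 1000, (8,)))      # -> torch.Size([8, 128])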
Diffusion model description (simple unet) II
▶ Text conditioning:
▶ If available, the text embedding e_text ∈ R^{B×d_text} is linearly
projected:
e′_text = W_text · e_text ∈ R^{B×d_time}
▶ Combined with the time embedding as:
e_cond = e_t + e′_text
▶ Residual block structure:
▶ Input: x ∈ R^{B×C_in×H×W} and e_cond ∈ R^{B×d_time}
▶ Time embedding projection:
e_proj = W_t · e_cond ∈ R^{B×C_out}
▶ Two convolutional layers with GroupNorm and SiLU:
▶ First layer: GroupNorm → SiLU → Conv2d(C_in → C_out)
▶ Second layer: GroupNorm → SiLU → Conv2d(C_out → C_out)
Diffusion model description (simple unet) III
▶ After the second convolution, add the time embedding:
h = h + e_proj[:, :, None, None]
▶ Residual connection:
▶ If C_in ≠ C_out: apply a 1 × 1 convolution.
▶ Otherwise, use the identity.
▶ Final output (sketched in code below):
y = h + ResConv(x)
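A minimal sketch of this time-conditioned residual block; the 3 × 3 kernels and the choice of 8 groups for GroupNorm are assumptions not stated on the slides.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, c_in, c_out, d_time=128, groups=8):
        super().__init__()
        self.time_proj = nn.Linear(d_time, c_out)                  # W_t
        g_in = groups if c_in % groups == 0 else 1                 # fall back for small C_in
        self.block1 = nn.Sequential(
            nn.GroupNorm(g_in, c_in), nn.SiLU(),
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1))
        self.block2 = nn.Sequential(
            nn.GroupNorm(groups, c_out), nn.SiLU(),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1))
        # 1x1 convolution on the skip path only when the channel count changes
        self.res_conv = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x, e_cond):
        h = self.block1(x)
        h = self.block2(h)
        h = h + self.time_proj(e_cond)[:, :, None, None]           # add e_proj after conv 2
        return h + self.res_conv(x)                                # y = h + ResConv(x)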
▶ Encoder pathway (downsampling):
▶ Input x passes through the first residual block:
d_1 = ResBlock_1(x, e_cond)
▶ Max pooling with a 2 × 2 kernel:
d′_1 = MaxPool(d_1)
Diffusion model description (simple unet) IV
▶ Pass through the second residual block:
d_2 = ResBlock_2(d′_1, e_cond)
▶ Another downsampling step:
d′_2 = MaxPool(d_2)
▶ Bottleneck:
▶ Apply one residual block at the lowest resolution:
b = ResBlock_bot(d′_2, e_cond)
▶ Decoder pathway (upsampling):
▶ Upsample the bottleneck output:
u′_1 = Upsample(b)
▶ Concatenate with d_2 along the channel dimension:
u″_1 = Concat(u′_1, d_2) ∈ R^{B×(C_b+C_{d_2})×H×W}
Diffusion model description (simple unet) V
▶ Apply a residual block:
u_1 = ResBlock_3(u″_1, e_cond)
▶ Repeat the upsampling:
u′_2 = Upsample(u_1)
▶ Concatenate with d_1 and apply the final residual block:
u_2 = ResBlock_4(Concat(u′_2, d_1), e_cond)
▶ Output head (the sketch below assembles all of these pieces):
▶ GroupNorm → SiLU → Conv2d with kernel size 1 × 1:
x̂ = Conv2d_{1×1}(SiLU(GroupNorm(u_2))) ∈ R^{B×C×H×W}
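The following sketch assembles the pieces above into the simple U-Net. It reuses the TimeEmbedding and ResidualBlock sketches, so those must be defined first; the channel counts 64/128/256 are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleUNet(nn.Module):
    def __init__(self, in_ch=3, d_time=128, d_text=512):
        super().__init__()
        self.time_emb = TimeEmbedding(1000, d_time)
        self.text_proj = nn.Linear(d_text, d_time)             # W_text
        self.down1 = ResidualBlock(in_ch, 64, d_time)
        self.down2 = ResidualBlock(64, 128, d_time)
        self.bot = ResidualBlock(128, 256, d_time)
        self.up1 = ResidualBlock(256 + 128, 128, d_time)
        self.up2 = ResidualBlock(128 + 64, 64, d_time)
        self.head = nn.Sequential(nn.GroupNorm(8, 64), nn.SiLU(), nn.Conv2d(64, in_ch, 1))

    def forward(self, x, t, e_text=None):
        e_cond = self.time_emb(t)                               # e_t
        if e_text is not None:
            e_cond = e_cond + self.text_proj(e_text)            # e_cond = e_t + e'_text
        d1 = self.down1(x, e_cond)
        d2 = self.down2(F.max_pool2d(d1, 2), e_cond)
        b = self.bot(F.max_pool2d(d2, 2), e_cond)
        u1 = self.up1(torch.cat([F.interpolate(b, scale_factor=2), d2], dim=1), e_cond)
        u2 = self.up2(torch.cat([F.interpolate(u1, scale_factor=2), d1], dim=1), e_cond)
        return self.head(u2)                                    # same shape as the input x

x_hat = SimpleUNet()(torch.randn(2, 3, 56, 56),
                     torch.randint(0, 1000, (2,)),
                     torch.randn(2, 512))                       # -> torch.Size([2, 3, 56, 56])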
Diffusion model description I
▶ Sinusoidal Positional Embedding:
▶ The time embedding is generated using sine and cosine
functions.
▶ Frequencies are defined as:
f_i = 10000^{−2i/d}
where i is the dimension index and d is the total
dimensionality.
▶ The time embedding for timestep t is computed as:
emb(t) = [sin(t · f_1), cos(t · f_1), . . . , sin(t · f_{d/2}), cos(t · f_{d/2})]
▶ This embedding encodes timestep information, enabling the
network to understand temporal dependencies during training.
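A minimal sketch of this sinusoidal embedding; it concatenates the sine and cosine halves rather than interleaving them, which is a common implementation detail and does not change the information carried.

import math
import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.dim = dim

    def forward(self, t):                                          # t: (B,) timesteps
        half = self.dim // 2
        i = torch.arange(half, device=t.device).float()
        freqs = torch.exp(-math.log(10000.0) * 2 * i / self.dim)   # f_i = 10000^{-2i/d}
        args = t.float()[:, None] * freqs[None, :]                 # (B, d/2)
        return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (B, d)

emb = SinusoidalPosEmb(128)(torch.arange(4))                       # -> torch.Size([4, 128])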
▶ Conditional Dropout:
▶ Dropout is applied during training to regularize the model.
▶ A mask is created from the dropout probability p, and the
selected conditioning features are zeroed out accordingly.
Diffusion model description II
▶ Dropout is not applied during inference; the model behaves
deterministically during testing.
▶ If the dropout probability p is zero, the input embeddings
remain unchanged.
▶ This technique prevents overfitting by forcing the model to rely
less on specific features during training.
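A minimal sketch of the conditional-dropout step, under the assumption that whole conditioning embeddings (rather than individual features) are zeroed per sample; the function name is illustrative.

import torch

def conditional_dropout(e_text, p=0.1, training=True):
    # e_text: (B, d_text) conditioning embeddings
    if not training or p == 0.0:
        return e_text                                   # unchanged at inference or when p = 0
    keep = (torch.rand(e_text.size(0), 1, device=e_text.device) >= p).float()
    return e_text * keep                                # dropped rows become all-zero embeddings

e = conditional_dropout(torch.randn(4, 512), p=0.1)     # some rows may be zeroed during training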
▶ Residual Blocks:
▶ A residual block consists of a series of operations designed to
improve gradient flow.
▶ It contains a group normalization layer followed by a
convolutional layer.
▶ After the convolution, a non-linearity (SiLU) is applied.
▶ A second convolutional layer is applied to refine the features.
▶ The input x is added back to the output of the convolutions
(skip connection):
output = ResidualBlock(x) = Conv_2(SiLU(Conv_1(Norm(x)))) + x
Diffusion model description III
▶ Skip connections mitigate the vanishing gradient problem,
improving model training.
▶ Cross Attention:
▶ Cross-attention allows the model to condition on text
information while processing image features.
▶ The query Q is derived from the image features, while the key
K and value V are derived from the text embeddings.
▶ The attention scores are computed by taking the dot product
between the query and key, scaled by √d, where d is the
dimension of the key:
attn_scores = QK^T / √d
▶ Softmax is applied to the attention scores to normalize them.
▶ The attention output is computed as the weighted sum of the
value matrix V:
attn_output = Softmax(attn_scores) V
Diffusion model description IV
▶ This output is used to refine the image features by
incorporating text-driven information.
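A minimal sketch of the cross-attention just described, with queries from the flattened image features and keys/values from the text embedding; the attention dimension and the treatment of the text embedding as a length-L token sequence (L = 1 for a single embedding) are assumptions.

import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, channels, d_text=512, d_attn=128):
        super().__init__()
        self.to_q = nn.Linear(channels, d_attn)
        self.to_k = nn.Linear(d_text, d_attn)
        self.to_v = nn.Linear(d_text, d_attn)
        self.proj = nn.Linear(d_attn, channels)                # back to the expected channels

    def forward(self, x, e_text):                              # x: (B, C, H, W), e_text: (B, L, d_text)
        B, C, H, W = x.shape
        q = self.to_q(x.flatten(2).transpose(1, 2))            # (B, HW, d_attn)
        k, v = self.to_k(e_text), self.to_v(e_text)            # (B, L, d_attn)
        scores = q @ k.transpose(1, 2) / (q.size(-1) ** 0.5)   # QK^T / sqrt(d)
        out = self.proj(scores.softmax(dim=-1) @ v)            # weighted sum of the values
        return x + out.transpose(1, 2).reshape(B, C, H, W)     # refine the image features

y = CrossAttention(64)(torch.randn(2, 64, 14, 14), torch.randn(2, 1, 512))  # same shape as x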
▶ Self Attention:
▶ Self-attention allows each pixel in the image to attend to every
other pixel, capturing long-range dependencies.
▶ The query, key, and value matrices are derived from the image
features.
▶ Attention scores are computed in the same manner as in
cross-attention:
attn_scores = QK^T / √d
▶ Softmax is applied to the scores, and the output is computed
as:
attn_output = Softmax(attn_scores) V
▶ The output is reshaped back into the original image
dimensions and passed through a projection layer.
▶ The projection layer ensures that the output matches the
expected number of channels.
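A matching self-attention sketch, showing the reshape from (B, C, H, W) to a sequence of H·W pixel tokens and back, followed by the projection layer; the attention dimension is an assumption.

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, channels, d_attn=128):
        super().__init__()
        self.to_qkv = nn.Linear(channels, 3 * d_attn)
        self.proj = nn.Linear(d_attn, channels)                # match the expected channel count

    def forward(self, x):                                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)                  # (B, HW, C): each pixel is a token
        q, k, v = self.to_qkv(tokens).chunk(3, dim=-1)
        scores = q @ k.transpose(1, 2) / (q.size(-1) ** 0.5)   # QK^T / sqrt(d)
        out = self.proj(scores.softmax(dim=-1) @ v)            # (B, HW, C)
        return x + out.transpose(1, 2).reshape(B, C, H, W)     # back to image dimensions

y = SelfAttention(64)(torch.randn(2, 64, 14, 14))              # same shape as the input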
Diffusion model description V
▶ DownBlock:
▶ The DownBlock is a part of the encoder, which reduces the
spatial dimensions of the image.
▶ It consists of a residual block followed by an optional
cross-attention layer.
▶ After the residual and attention layers, a second residual block
is applied to refine the features.
▶ The final operation in the DownBlock is a convolution with
stride 2, which reduces the image dimensions by half.
▶ The output consists of the processed feature map and the
downsampled version of the feature map.
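A minimal DownBlock sketch following this description; it reuses the ResidualBlock and CrossAttention sketches above, and the way the time/text conditioning is threaded through the sub-blocks is an assumption.

import torch.nn as nn

class DownBlock(nn.Module):
    def __init__(self, c_in, c_out, d_time=128, d_text=512, use_attn=True):
        super().__init__()
        self.res1 = ResidualBlock(c_in, c_out, d_time)
        self.attn = CrossAttention(c_out, d_text) if use_attn else None   # optional
        self.res2 = ResidualBlock(c_out, c_out, d_time)
        self.down = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2, padding=1)  # halves H and W

    def forward(self, x, e_cond, e_text=None):
        h = self.res1(x, e_cond)
        if self.attn is not None and e_text is not None:
            h = self.attn(h, e_text)
        h = self.res2(h, e_cond)
        return h, self.down(h)             # (skip features, downsampled features)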
▶ UpBlock:
▶ The UpBlock is part of the decoder and is responsible for
increasing the spatial resolution of the feature map.
▶ The first operation is a transposed convolution, which
upsamples the feature map.
Diffusion model description VI
▶ The upsampled feature map is concatenated with the
corresponding skip connection from the encoder to retain
fine-grained spatial details.
▶ The concatenated feature map is passed through a residual
block and an optional cross-attention layer.
▶ A second residual block is applied to further refine the features.
▶ The final output of the UpBlock is the refined feature map
after upsampling.
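A matching UpBlock sketch, again reusing the ResidualBlock and CrossAttention sketches above; the transposed-convolution kernel size and the conditioning plumbing are assumptions.

import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, c_in, c_skip, c_out, d_time=128, d_text=512, use_attn=True):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_in, kernel_size=2, stride=2)   # doubles H and W
        self.res1 = ResidualBlock(c_in + c_skip, c_out, d_time)
        self.attn = CrossAttention(c_out, d_text) if use_attn else None     # optional
        self.res2 = ResidualBlock(c_out, c_out, d_time)

    def forward(self, x, skip, e_cond, e_text=None):
        h = torch.cat([self.up(x), skip], dim=1)    # restore resolution, keep fine-grained detail
        h = self.res1(h, e_cond)
        if self.attn is not None and e_text is not None:
            h = self.attn(h, e_text)
        return self.res2(h, e_cond)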
▶ TextGuidedUNet:
▶ The TextGuidedUNet extends the standard UNet by
incorporating text-guided conditioning during image
generation.
▶ The model consists of an encoder-decoder architecture with
additional modules for time embedding and text-based
attention.
▶ In the encoder, two DownBlocks are used to process the image
and progressively downsample it.
Diffusion model description VII
▶ The bottleneck includes residual blocks and self-attention to
capture global dependencies.
▶ The decoder uses two UpBlocks to restore the spatial
resolution of the image.
▶ Skip connections are used to preserve high-resolution details
from the encoder.
▶ The final output is generated through a 1 × 1 convolutional
layer, reducing the feature map to the desired output channels.
▶ Forward Pass:
▶ During the forward pass, the model first computes time
embeddings for the input timestep t using the sinusoidal
positional embedding.
▶ The text embedding is conditioned using the conditional
dropout mechanism, which randomly drops certain features
during training.
▶ The image is processed through the encoder (comprising
DownBlocks) and the bottleneck, where residual and
self-attention layers are applied.
Diffusion model description VIII
▶ The feature map is passed through the decoder (comprising
UpBlocks), where cross-attention and residual processing are
used.
▶ Skip connections from the encoder are concatenated with the
upsampled feature maps during decoding to retain fine-grained
spatial information.
▶ The final output is produced by a 1 × 1 convolution, which
maps the features back to the image space (e.g., RGB
channels).
Configurations I
▶ The computation device is set to use CUDA if available;
otherwise, it falls back to CPU.
▶ The batch size for training is set to 128.
▶ The number of training epochs is 50.
▶ The learning rate is set to 2 × 10^{−4}.
▶ The input image size is set to 56 pixels.
▶ The number of image channels is 3, corresponding to RGB
images.
▶ The number of diffusion timesteps T is set to 1000.
▶ The dimensionality of the time embedding is 128.
▶ The dimensionality of the text embedding is 512.
▶ The conditional dropout probability is set to 0.1.
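The same configuration collected as constants (the constant names are illustrative):

import torch

DEVICE       = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE   = 128
EPOCHS       = 50
LR           = 2e-4
IMG_SIZE     = 56
IMG_CHANNELS = 3
T            = 1000      # diffusion timesteps
TIME_DIM     = 128       # time-embedding dimensionality
TEXT_DIM     = 512       # text-embedding dimensionality
COND_DROPOUT = 0.1       # conditional dropout probability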
Model training I
▶ Initialization:
▶ Initialize empty lists:
loss_hist ← [ ], acc_hist ← [ ]
▶ Loop over epochs:
for epoch ∈ {0, 1, . . . , epochs − 1}
▶ Set the model to training mode:
model.train()
This switches training-specific layers such as dropout or batch
normalization (if present) into their training behaviour.
▶ Initialize epoch statistics:
▶ Total loss accumulator: total_loss ← 0
▶ Total MSE accumulator for pseudo-accuracy: total_mse ← 0
Model training II
▶ Iterate over training batches:
for (imgs, labels) ∈ train_loader
▶ Transfer the input images to the device:
imgs ← imgs.to(device)
▶ Compute the batch size:
b ← imgs.size(0)
▶ Sample random diffusion timesteps:
t ∼ Uniform({0, . . . , T − 1})^b
▶ Generate the noise tensor:
ϵ ∼ N(0, I), ϵ ∈ R^{B×C×H×W}
Model training III
▶ Compute ᾱ_t from the precomputed α_cumprod:
ᾱ_t = extract(α_cumprod, t, imgs.shape)
▶ Corrupt the input image using the forward diffusion process:
x_t = √(ᾱ_t) · x + √(1 − ᾱ_t) · ϵ
▶ Get the text embeddings for the labels:
e_text = label_embs[labels]
▶ Predict the noise:
ϵ̂ = model(x_t, t, e_text)
▶ Compute the loss:
L_MSE = (1 / (B·C·H·W)) Σ_{i=1}^{B} ∥ϵ̂_i − ϵ_i∥²
where B is the batch size, C the number of channels, H the
height and W the width.
Model training IV
▶ Backpropagation and optimization:
▶ Zero the gradients: opt.zero_grad()
▶ Compute the gradients: L_MSE.backward()
▶ Update the weights: opt.step()
▶ Accumulate the total loss:
total_loss += L_MSE · b
▶ Estimate the denoised image x_0 using the reparameterization:
x̂_0 = (x_t − √(1 − ᾱ_t) · ϵ̂) / √(ᾱ_t)
▶ Compute the pseudo-accuracy = 1 − MSE, where MSE is:
MSE = (1 / (B·C·H·W)) Σ_{i=1}^{B} ∥x̂_0^{(i)} − x^{(i)}∥²
Model training V
▶ Accumulate the MSE:
total_mse += MSE · b
▶ End of epoch:
▶ Normalize the accumulated loss and accuracy:
epoch_loss = total_loss / N,  epoch_acc = 1 − total_mse / N
where N = len(train_ds)
▶ Append to the history:
loss_hist.append(epoch_loss), acc_hist.append(epoch_acc)
▶ Log the accuracy to a file (a code sketch of the full loop follows):
Append “Epoch#i : Accuracy ≈ x” to accuracy_log.txt
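A sketch of this training loop; it assumes the configuration constants sketched earlier, and that model, opt, alphas_cumprod (a length-T tensor on DEVICE), label_embs, train_loader and train_ds already exist. The helper extract gathers the per-sample ᾱ_t values and reshapes them for broadcasting.

import torch
import torch.nn.functional as F

def extract(a, t, shape):
    # gather a[t] for each sample and reshape to (B, 1, 1, 1) for broadcasting
    return a.gather(0, t).view(t.size(0), *([1] * (len(shape) - 1)))

loss_hist, acc_hist = [], []
for epoch in range(EPOCHS):
    model.train()
    total_loss, total_mse = 0.0, 0.0
    for imgs, labels in train_loader:
        imgs = imgs.to(DEVICE)
        b = imgs.size(0)
        t = torch.randint(0, T, (b,), device=DEVICE)              # random diffusion timesteps
        eps = torch.randn_like(imgs)                              # Gaussian noise
        a_bar = extract(alphas_cumprod, t, imgs.shape)            # alpha_bar_t
        x_t = a_bar.sqrt() * imgs + (1 - a_bar).sqrt() * eps      # forward diffusion
        e_text = label_embs[labels]                               # text embedding for each label
        eps_hat = model(x_t, t, e_text)                           # predict the noise
        loss = F.mse_loss(eps_hat, eps)                           # mean over B*C*H*W

        opt.zero_grad()
        loss.backward()
        opt.step()

        total_loss += loss.item() * b
        x0_hat = (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()   # reparameterized x_0 estimate
        total_mse += F.mse_loss(x0_hat, imgs).item() * b

    N = len(train_ds)
    epoch_loss, epoch_acc = total_loss / N, 1 - total_mse / N
    loss_hist.append(epoch_loss)
    acc_hist.append(epoch_acc)
    with open("accuracy_log.txt", "a", encoding="utf-8") as f:
        f.write(f"Epoch#{epoch}: Accuracy ≈ {epoch_acc:.4f}\n")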
Results I
Some images generated from the diffusion model trained on the
flower dataset:
Results II
Results III
Some images generated from the diffusion model trained on the
MNIST dataset:
Results IV