
Stable Diffusion for Image Generation
Prepared by
Emad Abd Al Fatah 2K21/CO/165
Raad Ghazi 2K21/CO/360
Yaseen Mohammed 2K21/CO/538

Course coordinator: KHUSHBU GUPTA


Introduction
In this presentation, we explore stable diffusion for image generation, a technique that combines text and image generation using deep learning methods.
The mathematics used to develop the diffusion model is based on the principles outlined in the Denoising Diffusion Probabilistic Models (DDPM) paper.

We will also explore the forward and reverse processes in diffusion models.
Classifier-free guidance, the U-Net architecture, CLIP integration, the Variational Autoencoder (VAE), and the image generation process will also be introduced in this presentation.
Latent Diffusion Models
A latent diffusion model (LDM) is a type of deep learning generative model that can create and alter high-resolution images. It is a probabilistic model that begins with random noise and gradually transforms it into realistic images through a diffusion process. LDMs have several advantages, including:
Flexible conditioning: text, images, segmentation maps, and other inputs can be encoded into the latent space.
Detail preservation: the encoder-decoder structure allows images to be manipulated while preserving intricate details from the original inputs.
The key innovation of latent diffusion models is that they apply the diffusion process not to the raw pixel values of an image but to an encoded latent representation of the image.
What is a generative model?
Generative modeling is the use of artificial intelligence (AI), statistics, and probability to produce a representation or abstraction of observed phenomena or target variables that can be calculated from observations.

Generative models are pivotal in learning the probability distributions of data, enabling us to sample from these distributions to generate new images.

So how does it work?


Generative models are generally built on neural networks. To create a generative model, a large dataset is typically required. The model is trained by feeding it examples from the dataset and adjusting its parameters to better match the distribution of the data.
Once the model is trained, it can be used to generate new data by sampling from the learned distribution. The generated data can be similar to the original dataset, but with some variations or noise.
For example:
A dataset containing images of cats could be used to build a model that can generate a new image of a cat that has never existed but still looks almost realistic. This is possible because the model has learned the general rules that govern the appearance of a cat.
Joint Distribution and Conditional Probability in the Generative Model
A generative model is a class of algorithms that uses joint probability to predict the distribution of individual classes in a dataset. The joint probability distribution, P(y, x), models the distribution of each label and is equivalent to modeling the distribution of label values together with the distribution of observations given a label.
The joint probability distribution encodes the marginal distributions, which are the distributions of each individual random variable, as well as the conditional probability distributions, which describe how the outputs of one random variable are distributed given information about the outputs of the other random variable.
Generative models learn a probability distribution of the data, and we can sample from it to generate new images. If we have several distributions and need to sample from a combination of them, we use a joint distribution; we can then evaluate probabilities using conditional probability and by marginalizing over a variable, as in the identities below. We then learn the parameters of this distribution and sample from it to generate new data.
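For reference, these are the standard identities relating joint, conditional, and marginal distributions used above:

```latex
% Product rule: the joint factorizes into a conditional times a marginal
P(x, y) = P(x \mid y)\, P(y)

% Marginalization: sum (or integrate) the joint over the other variable
P(x) = \sum_{y} P(x, y)

% Bayes' rule follows from combining the two identities above
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
```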
Forward and Reverse Processes
The forward and reverse diffusion processes are the core components of diffusion models: they define how data is transformed into noise and how that noise is transformed back into data. The forward diffusion process gradually turns an image into noise, while the reverse diffusion process turns that noise back into an image.

In the forward diffusion process, Gaussian noise is added step by step to the original data
until it becomes completely random at the final step. This creates a Markov chain starting
from the original data and ending with a fully noised sample. In contrast, the reverse diffusion
process trains the model to progressively remove noise from fully noised data, learning to
reconstruct the original information. These trained models can then generate new data by
inputting random noise and applying the learned denoising process to produce outputs
resembling the original data distribution.
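Concretely, in the DDPM formulation the forward process adds Gaussian noise according to a variance schedule β₁, …, β_T, a closed form lets us noise x_0 to any step t directly, and the reverse process is a learned Gaussian:

```latex
% Forward process: add Gaussian noise with variance schedule \beta_1, \dots, \beta_T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Closed form for noising x_0 directly to step t,
% with \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, \mathbf{I}\right)

% Learned reverse process: a Gaussian whose mean (and variance) are predicted by the network
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```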
Training and Loss Functions
So, in simple terms, in the diffusion model process we take the input image x_0 and gradually add Gaussian noise to it through a series of T steps. We will call this the forward process.

Afterward, a neural network is trained to recover the original data by reversing the
noising process. By being able to model the reverse process, we can generate new data.
This is the so-called reverse diffusion process or, in general, the sampling process of a
generative model.
In diffusion models, after working through both the forward and reverse processes, we obtain the Evidence Lower Bound (ELBO) and define the loss function.
The ELBO is derived from the probability distributions of the original data and of the noised data under the reverse process. It represents a lower bound on the log-likelihood of the data given the model parameters and is calculated using probabilistic inference techniques such as variational inference. Therefore, by maximizing this lower bound we also maximize (a bound on) the likelihood.

The loss function in diffusion models quantifies the discrepancy between the original
data and the reconstructed data after the reverse process. It typically includes terms to
penalize differences in pixel values and encourage the model to learn accurate denoising
and reconstruction processes. The loss function is optimized during training using
techniques like stochastic gradient descent (SGD) to minimize the reconstruction error
and improve model performance.
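As a minimal sketch of how this loss is computed in practice, here is the simplified noise-prediction objective from the DDPM paper; the noise-prediction network `model(x_t, t)` and the precomputed `alphas_cumprod` schedule are assumed inputs, and this is an illustration rather than our exact training code:

```python
import torch
import torch.nn.functional as F

def ddpm_training_loss(model, x0, alphas_cumprod):
    """One training step of the simplified DDPM objective: predict the added noise."""
    batch_size = x0.shape[0]
    T = alphas_cumprod.shape[0]

    # Sample a random timestep for each image in the batch
    t = torch.randint(0, T, (batch_size,), device=x0.device)

    # Sample Gaussian noise and noise x0 directly to step t using the closed form
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)           # \bar{alpha}_t per sample
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise

    # The network predicts the noise; the loss is a simple MSE
    predicted_noise = model(x_t, t)
    return F.mse_loss(predicted_noise, noise)
```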
Classifier-Free Guidance
Classifier-free guidance refers to a technique where the model generates data without relying on an explicit classifier, instead using contextual cues or prompts to guide the generation process (it relies on a prompt or context signal).

This approach helps the model generate more diverse and contextually relevant outputs by leveraging the information provided in the prompts.

We therefore utilize classifier-free guidance by incorporating prompts or contextual signals to influence the generation process, enhancing the quality and relevance of the generated outputs.
U-Net Architecture
The U-Net architecture is a popular convolutional neural network (CNN) architecture commonly used in image processing tasks such as image generation and segmentation. It is particularly well known for its effectiveness in biomedical image segmentation, but it is also widely applicable to various other image-related tasks.
U-Net Components
Encoder: reduces image size and increases feature channels for feature extraction.
Decoder: upsamples feature maps and reconstructs spatial information.
Skip connections: connect encoder and decoder, preserving spatial information for precise segmentation.
Prompts: instructions for image generation, specifying desired features.
Noise levels: control random variation, enhancing diversity and realism.
Outputs: synthetic images or segmentation maps, depending on the task.
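A minimal, illustrative sketch of this encoder-decoder structure with skip connections follows; it is a toy model for intuition, not the full U-Net used in Stable Diffusion, which also includes attention blocks and timestep/prompt conditioning:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net: one downsampling stage, one upsampling stage, one skip connection."""
    def __init__(self, in_channels=3, base_channels=32):
        super().__init__()
        self.enc = nn.Sequential(                      # encoder: extract features
            nn.Conv2d(in_channels, base_channels, 3, padding=1), nn.ReLU(),
        )
        self.down = nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1)
        self.bottleneck = nn.Sequential(
            nn.Conv2d(base_channels * 2, base_channels * 2, 3, padding=1), nn.ReLU(),
        )
        self.up = nn.ConvTranspose2d(base_channels * 2, base_channels, 4, stride=2, padding=1)
        self.dec = nn.Conv2d(base_channels * 2, in_channels, 3, padding=1)  # decoder head

    def forward(self, x):
        skip = self.enc(x)                             # features kept for the skip connection
        h = self.bottleneck(self.down(skip))           # reduced spatial size, more channels
        h = self.up(h)                                 # upsample back to the input resolution
        h = torch.cat([h, skip], dim=1)                # skip connection preserves detail
        return self.dec(h)

# x = torch.randn(1, 3, 64, 64); out = TinyUNet()(x)   # out has the same shape as x
```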
Image Generation Process Steps
Overview: Initialize Data → Add Noise → Forward Diffusion Process → Reverse Diffusion Process → Adjust Parameters → Evaluate Generated Data
Initialize Data: Start with a dataset containing the original clean data you want to
generate variations of.
Add Noise: Introduce controlled noise to the clean data. This noise can be in various
forms, such as Gaussian noise or structured noise patterns. The level of noise added can
be adjusted based on the desired diversity in the generated data.
Forward Diffusion Process: In the forward diffusion process, iteratively degrade the
quality of the data by diffusing it through multiple steps. This process involves gradually
increasing the level of noise added to the data, simulating a diffusion or degradation
process over time.
Reverse Diffusion Process: After reaching a certain level of degradation or diffusion,
perform a reverse diffusion process. This involves iteratively removing the noise added
during the forward diffusion process, gradually restoring the data back to its original or a
desired state.
Adjust Parameters: Throughout the diffusion and reverse diffusion processes, you can
adjust parameters such as the diffusion step size, the intensity of added noise, and the
number of diffusion steps to control the characteristics of the generated data.
Evaluate Generated Data: Finally, evaluate the generated data to ensure that it meets
the desired criteria in terms of diversity, realism, and fidelity to the original clean data.
This evaluation can involve qualitative assessment by visual inspection as well as
quantitative measures to compare the generated data with the original dataset.
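As a minimal sketch of the reverse diffusion (sampling) loop described in the steps above, assuming a trained noise-prediction network `model(x_t, t)` and a `betas` noise schedule (simplified DDPM sampling; the names are illustrative):

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    """Generate data by starting from pure noise and iteratively denoising."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    T = betas.shape[0]

    x = torch.randn(shape)                                   # start from pure Gaussian noise
    for t in reversed(range(T)):                              # walk the chain backwards
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = model(x, t_batch)                               # predicted noise at step t

        # DDPM posterior mean: remove a small amount of the predicted noise
        coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alphas_cumprod[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])

        # Add fresh noise at every step except the last one
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```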
How to condition the reverse process?
The latest and most successful approach is called classifier-free guidance. Instead of training two networks, one conditional and one unconditional, we train a single network and, during training, set the conditioning signal to zero with some probability. This way the network becomes a mix of a conditioned and an unconditioned network. At inference time we take the conditioned and unconditioned outputs and combine them with a weight that indicates how much we want the network to pay attention to the conditioning signal.

Classifier Free Guidance (Combine output)
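The combination referred to above is the standard classifier-free guidance formula, where w is the guidance weight, c is the conditioning signal (prompt), and ∅ denotes the empty conditioning:

```latex
\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w \cdot \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)
```

In code this is a single line per denoising step, for example `guided = uncond + w * (cond - uncond)`; a larger w makes the output follow the prompt more closely at the cost of diversity.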


Integration of CLIP for Image Generation
CLIP (Contrastive Language-Image Pre-training) integration enhances our model's capabilities. By leveraging prompts and images during training and generation, we achieve a deeper understanding of contextual cues for image creation.
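As a rough, toy illustration of how CLIP text embeddings condition the image generation: in Stable Diffusion, the prompt is encoded into a sequence of token embeddings that the U-Net attends to through cross-attention at every denoising step. The shapes below are typical of Stable Diffusion v1, and the tensors are random stand-ins rather than real encoder outputs:

```python
import torch
import torch.nn.functional as F

# Stand-in for CLIP text embeddings of one prompt: 77 token embeddings of dimension 768
text_context = torch.randn(1, 77, 768)

# Stand-in for U-Net image features at one layer: 64x64 spatial positions, 768 channels
image_features = torch.randn(1, 64 * 64, 768)

# Cross-attention: image positions (queries) attend to prompt tokens (keys/values),
# which is how the prompt steers the denoising at every U-Net step
scores = image_features @ text_context.transpose(-2, -1) / (768 ** 0.5)
attended = F.softmax(scores, dim=-1) @ text_context   # shape: (1, 64*64, 768)
```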
Variational Autoencoder (VAE)
Enhancing semantic relationships with a VAE: a Variational Autoencoder (VAE) produces a better-structured latent space than a traditional autoencoder.
Stable Diffusion is a latent diffusion model: we do not learn the distribution p(x) of our dataset of images directly, but rather the distribution of a latent representation of the data obtained with a Variational Autoencoder.
This reduces the computation required for the steps needed to generate a sample, because each image is represented not by its 512x512 pixels but by its latent representation, which is 64x64.
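A toy illustration of the shapes involved, using stand-in convolutions for the encoder and decoder (the real VAE also has residual blocks, nonlinearities, and a KL regularization term; the 4-channel, factor-8 latent matches what Stable Diffusion uses):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the VAE encoder/decoder, only to illustrate the shapes:
# three stride-2 convolutions give the 8x spatial compression.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),     # 512 -> 256
    nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 256 -> 128
    nn.Conv2d(128, 4, 4, stride=2, padding=1),    # 128 -> 64, 4 latent channels
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(4, 128, 4, stride=2, padding=1),    # 64 -> 128
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 128 -> 256
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 256 -> 512
)

image = torch.randn(1, 3, 512, 512)
latent = encoder(image)          # torch.Size([1, 4, 64, 64]) -- the diffusion runs here
restored = decoder(latent)       # torch.Size([1, 3, 512, 512])
```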
Image to Image Generation
Diffusion models excel in diverse image generation tasks,
transitioning seamlessly from text-to-image to image-to-image
architectures. Their versatility lies in their ability to capture intricate
spatial relationships and generate high-fidelity outputs across
multiple domains.
Inpainting for Image Completion
Inpainting, a key aspect of our model, focuses on completing missing parts of images. By regenerating selected regions based on prompts and the original data, we achieve comprehensive and visually appealing image completion.
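As a hedged sketch of one common inpainting strategy (mask-based blending applied at each denoising step: keep a noised copy of the original outside the mask and let the model regenerate inside it; this illustrates the idea rather than our exact code):

```python
import torch

def blend_inpainting_step(x_t, noised_original, mask):
    """One blending step used in mask-based inpainting.

    x_t:             the model's current (partially denoised) sample
    noised_original: the original image/latent noised to the same timestep t
    mask:            1 where content should be regenerated, 0 where it should be kept
    """
    return mask * x_t + (1.0 - mask) * noised_original

# Toy usage: regenerate only the left half of a 64x64 latent
x_t = torch.randn(1, 4, 64, 64)
noised_original = torch.randn(1, 4, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., :, :32] = 1.0
x_t = blend_inpainting_step(x_t, noised_original, mask)
```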
Implementation and Code
Our implementation leverages PyTorch and CLIP. Key components include the decoder, encoder, diffusion model, attention module, and CLIP model. The pipeline ties these together and controls how strongly the prompt is weighted, with additional features such as a DDPM scheduler and a weights loader enhancing performance.
Attention Model File: This file implements the attention mechanism, specifically utilizing
softmax for weighting. This mechanism assigns importance to different parts of the input, aiding
in capturing dependencies and relationships crucial for the model's performance.
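A compact sketch of the softmax-weighted (scaled dot-product) attention such a file typically implements, shown as a generic function for intuition rather than our exact implementation:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: tensors of shape (batch, sequence_length, head_dim)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # pairwise similarities
    weights = F.softmax(scores, dim=-1)                        # softmax assigns importance
    return weights @ v                                         # weighted sum of values
```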
CLIP Model File: Within this file lies the implementation of the Contrastive Language-
Image Pre-training (CLIP) model. CLIP is a neural network model that learns to
associate images and text by maximizing agreement between image and text
representations across a large dataset. It encapsulates a powerful fusion of vision and
language understanding, enabling the model to understand and generate images
based on textual prompts.
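For intuition, a toy sketch of the contrastive objective described above, assuming we already have image and text embeddings for a batch of matching pairs (the real CLIP model also contains the image and text encoders and a learned temperature; this is an illustration, not its training code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim); row i of each is a matching image-text pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    logits = image_emb @ text_emb.t() / temperature                  # cosine similarities
    targets = torch.arange(image_emb.shape[0], device=image_emb.device)  # matches on the diagonal

    # Symmetric cross-entropy: each image must pick its text and vice versa
    loss_i = F.cross_entropy(logits, targets)
    loss_t = F.cross_entropy(logits.t(), targets)
    return (loss_i + loss_t) / 2
```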
Demo Visualization File:
This file showcases image generation from text prompts, utilizing encoder,
decoder, diffusion, attention, and CLIP models.
Example prompts and outputs (shown as images in the demo): "A dog wearing glasses", "a black camera", "A cat stretching on the floor", "A boy playing football".


Thank you
