APS360: Applied Fundamentals of Deep Learning
Week 9: Generative Adversarial Networks
Content
● Generative Models
● Generative Adversarial Networks
● PyTorch Implementation
● Problems of Training GANs
● Applications of GANs
● Adversarial Attacks
Generative Models
Generative Model vs. Discriminative Model
Suppose we have two tasks on a dataset of tweets:
1. Identify if a tweet is real or fake
○ This task is supervised and requires a discriminative model.
○ The model learns to approximate p(y|x)
2. Generate a new tweet
○ This task is unsupervised and requires a generative model.
○ The model learns to approximate p(x)
Generative Model vs. Discriminative Model
[Figure: a discriminative model (e.g., a classifier) maps an input to a label; a generative model (e.g., a variational autoencoder) maps an encoding to an output.]
Generative Learning
Generative learning is an Unsupervised Learning task:
● There is a loss function → an auxiliary task that we know the answer to
● There is no ground truth with respect to the actual task that we want to
accomplish.
● We are learning the structure & distribution of data, rather than labels for
data!
Generative Models
A generative model is used to generate new data, using some input encoding:
Unconditional Generative Models
● Only get random noise as input
● No control over what category they generate
Conditional Generative Models
● One-hot encoding of the target category + random noise, or
● An embedding generated by another model (e.g., from CNN)
● Users have high-level control over what the model will generate
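The first conditional option above (one-hot category + random noise) can be sketched as follows. This is only an illustration: the helper name `make_conditional_input`, the 10 classes, and the 100-dimensional noise are assumptions, not something the slides specify.

```python
import torch
import torch.nn.functional as F

def make_conditional_input(labels, num_classes=10, noise_dim=100):
    """Build a conditional generator input: [one-hot category | random noise].

    Hypothetical helper; the sizes are assumptions chosen for MNIST-like data.
    """
    one_hot = F.one_hot(labels, num_classes=num_classes).float()
    noise = torch.randn(labels.size(0), noise_dim)
    return torch.cat([one_hot, noise], dim=1)

# Ask for a "3" and a "7"; each row has 10 + 100 = 110 entries.
labels = torch.tensor([3, 7])
z = make_conditional_input(labels)
```

An unconditional generator would receive only the noise part, which is why it offers no control over the category.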
Generative Models
There are different families of deep generative models:
● Autoregressive Models (we already covered these)
● Variational AutoEncoders (VAEs) (we already covered these)
● Generative Adversarial Networks (GANs) (we are covering these today)
● Flow-Based Generative Models (we won’t cover them in this course)
● Diffusion Models (we won’t cover them in this course)
Problem with Autoencoders
Vanilla autoencoders generate blurry images
with blurry backgrounds
To minimize the MSE loss, autoencoders predict
the average pixel
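A toy illustration of why MSE pushes predictions toward the average: suppose the same pixel is black (0.0) in one plausible image and white (1.0) in another. The two target values and the candidate grid are made up for this sketch.

```python
def mse(prediction, targets):
    """Mean squared error of one predicted pixel value against several targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)

# The same pixel in two equally plausible images (hypothetical values).
targets = [0.0, 1.0]

# Try predictions 0.00, 0.01, ..., 1.00 and keep the one with the lowest MSE.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=lambda p: mse(p, targets))
```

The best prediction is the grey value halfway between the two targets, which matches neither image. Applied to every pixel, this averaging is what shows up as blur.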
Can we use a better loss function?
Generative Adversarial Networks
Generative Adversarial Networks
Idea → Train two models
Generator model: tries to fool
the discriminator by
generating real-looking images
Discriminator model: tries to
distinguish between real and
fake images
The loss function of the generator is defined by the discriminator!
Learning to Forge Art
You want to learn to forge paintings!*
Only problem: you know nothing about
art….
You sell your paintings to your friend, who
also knows nothing about art… profit!
However…your friend starts becoming
suspicious, and compares your Mona Lisa to
an image on the internet. It's a fake!
*not a recommended career path, stick to engineering!
Learning to Forge Art
The next week...
You learn more about painting, challenged
by your friend who only pays for paintings
he thinks are real!
Your friend learns more about art and
specifically your improved techniques that
initially fool him. He starts to catch many of
your fakes, but not all...
Learning to Forge Art
A few decades later...
You become an expert artist, able to
masterfully create art forgeries that are
near identical to the originals
Your friend becomes an art expert who can
distinguish all but the absolute best of
forgeries.
Real-World Example
Generative Adversarial Networks
Generator network
● Input → A noise vector
● Output → A generated image
Discriminator network
● Input → An image
● Output → A binary label (real vs fake)
Generative Adversarial Networks
Play a minmax game:
The discriminator will try to do the best job it can
The generator is set to make the discriminator as wrong as possible
Loss Function for MinMax Game
Learn discriminator weights to maximize the probability that it labels a real
image as real and a generated image as fake
What loss function should we use? Binary Cross-Entropy
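A minimal sketch of binary cross-entropy for a single prediction, to show how it rewards a correct discriminator. The label convention (0 = real, 1 = fake) follows the PyTorch code in these slides; the probabilities are made-up values.

```python
import math

def binary_cross_entropy(p, y):
    """p: predicted probability of label 1 (fake); y: true label (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A real image (label 0) scored by two hypothetical discriminators.
loss_good = binary_cross_entropy(0.1, 0)  # says "10% fake": small loss
loss_bad = binary_cross_entropy(0.9, 0)   # says "90% fake": large loss
```

Minimizing this loss over real and generated images is the same as maximizing the probability that the discriminator labels each one correctly.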
Loss Function for MinMax Game
Learn generator weights to maximize the probability that the discriminator
labels a generated image as real
What loss function should we use? The discriminator itself: the generator's loss is computed from the discriminator's output.
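The two objectives described on these slides can be written out as below. The notation is an assumption added for illustration: D(x) is the probability the discriminator assigns to "real", G(z) is the image generated from noise z, and p_data and p_z are the data and noise distributions. The PyTorch code in these slides uses the opposite labelling (0 = real, 1 = fake), which gives the same losses up to relabelling.

```latex
\begin{align*}
\text{Discriminator:}\quad
  & \max_{D}\; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
    + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big] \\
\text{Generator:}\quad
  & \max_{G}\; \mathbb{E}_{z \sim p_{z}}\big[\log D(G(z))\big]
\end{align*}
```

The first line rewards labelling real images as real and generated images as fake; the second rewards generated images that the discriminator labels as real.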
Training
Alternate between training the discriminator and training the generator
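A minimal sketch of the alternation, assuming tiny stand-in networks, Adam optimizers, and random tensors in place of MNIST batches (all hypothetical choices made so the example runs on its own). Labels follow the slides' code: 0 = real, 1 = fake.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in networks (hypothetical sizes, not the lecture's models).
generator = nn.Sequential(nn.Linear(100, 64), nn.LeakyReLU(0.2),
                          nn.Linear(64, 28 * 28), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(28 * 28, 64), nn.LeakyReLU(0.2),
                              nn.Linear(64, 1))

criterion = nn.BCEWithLogitsLoss()  # the discriminator outputs raw logits
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)

def train_step(real_images):
    """One alternation: a discriminator update, then a generator update."""
    batch_size = real_images.size(0)
    real_flat = real_images.view(batch_size, -1)

    # 1) Discriminator update: real images -> 0, generated images -> 1.
    noise = torch.randn(batch_size, 100)
    fake = generator(noise).detach()  # no gradient into the generator here
    inputs = torch.cat([real_flat, fake])
    labels = torch.cat([torch.zeros(batch_size), torch.ones(batch_size)])
    d_loss = criterion(discriminator(inputs).view(-1), labels)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Generator update: rewarded when its images are labelled real (0).
    noise = torch.randn(batch_size, 100)
    outputs = discriminator(generator(noise)).view(-1)
    g_loss = criterion(outputs, torch.zeros(batch_size))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()  # only the generator's weights change in this step
    return d_loss.item(), g_loss.item()

# Random tensors stand in for a batch of MNIST images.
batch = torch.rand(16, 1, 28, 28)
history = [train_step(batch) for _ in range(3)]
```

Each network has its own optimizer, so a step on one never changes the weights of the other.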
PyTorch Implementation
PyTorch: Discriminator
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Fully connected classifier over flattened 28x28 images
        self.model = nn.Sequential(
            nn.Linear(28*28, 300),
            nn.LeakyReLU(0.2),
            nn.Linear(300, 100),
            nn.LeakyReLU(0.2),
            nn.Linear(100, 1))

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten each image to a 784-vector
        out = self.model(x)
        return out.view(x.size(0))  # one raw logit per image
PyTorch: Generator
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Maps a 100-dimensional noise vector to a flattened 28x28 image
        self.model = nn.Sequential(
            nn.Linear(100, 300),
            nn.LeakyReLU(0.2),
            nn.Linear(300, 28*28),
            nn.Sigmoid())

    def forward(self, x):
        # Image batch of shape (batch, 1, 28, 28) with pixels in [0, 1]
        out = self.model(x).view(x.size(0), 1, 28, 28)
        return out
PyTorch: Training the Discriminator
# Assumed here: the discriminator returns raw logits, so the loss is
# binary cross-entropy computed on logits.
criterion = nn.BCEWithLogitsLoss()

def train_discriminator(discriminator, generator, images):
    batch_size = images.size(0)
    noise = torch.randn(batch_size, 100)
    fake_images = generator(noise)
    inputs = torch.cat([images, fake_images])
    labels = torch.cat([torch.zeros(batch_size),  # Real
                        torch.ones(batch_size)])  # Fake
    outputs = discriminator(inputs)
    loss = criterion(outputs, labels)
    return outputs, loss
PyTorch: Training the Generator
def train_generator(discriminator, generator, batch_size):
    noise = torch.randn(batch_size, 100)
    fake_images = generator(noise)
    outputs = discriminator(fake_images)
    # Only looks at fake outputs
    # gets rewarded if we fool the discriminator!
    labels = torch.zeros(batch_size)  # "Real" labels for generated images
    loss = criterion(outputs, labels)
    return fake_images, loss
Problems of Training GANs
Vanishing Gradients
If the discriminator is too good, then the generator will not learn
Remember that we are using the discriminator as a loss function for the
generator
If the discriminator is too good, small changes in the generator weights won’t
change the discriminator output
If small changes in generator weights make no difference, then we can’t
incrementally improve the generator (no gradients!)
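One way to see why the output stops responding: if the discriminator ends in a sigmoid, its probability output is almost flat once the logit is far from zero. The sketch below shows only this saturation effect, with made-up logit values; how much gradient actually reaches the generator also depends on the loss that is used.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_slope(a):
    """Derivative of the sigmoid: how much the probability moves per unit of logit."""
    s = sigmoid(a)
    return s * (1.0 - s)

uncertain = sigmoid_slope(0.0)    # discriminator unsure: slope is 0.25
confident = sigmoid_slope(-10.0)  # discriminator very sure: slope is nearly zero
```

With a very confident discriminator, a small change in the generated image moves the output probability by a negligible amount, so the generator gets almost no signal about which direction is better.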
Mode Collapse
We want the generator to produce a variety of outputs
(e.g., all digits within MNIST).
If the generator starts producing the same output (or a
small set of outputs), the best strategy for the
discriminator is to reject that output.
However, if the discriminator is trapped in a local
optimum, it cannot adapt to the generator, and the
generator can fool it by generating only one type of
data (e.g., only the digit 1)
Failing to Converge
Due to the MinMax optimization process, training Vanilla GANs is very difficult.
It is difficult to numerically see whether there is progress → Plotting the “training
curve” doesn’t help much!
Takes a long time to train (a long time before we see progress)
To train GANs faster, we’ll use:
● LeakyReLU Activations instead of ReLU
● Batch Normalization
● Regularizing discriminator weights & adding noise to discriminator inputs
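The three techniques above might look like this in PyTorch. Everything specific here is an assumption for illustration: the class name `BNGenerator`, the helper `add_input_noise`, the layer sizes, the noise standard deviation, and the use of `weight_decay` as the regularizer.

```python
import torch
import torch.nn as nn

class BNGenerator(nn.Module):
    """Hypothetical generator using LeakyReLU and batch normalization."""
    def __init__(self, noise_dim=100):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(noise_dim, 300),
            nn.BatchNorm1d(300),   # normalize hidden activations per batch
            nn.LeakyReLU(0.2),     # small slope for negative inputs instead of zero
            nn.Linear(300, 28 * 28),
            nn.Sigmoid())

    def forward(self, z):
        return self.model(z).view(z.size(0), 1, 28, 28)

def add_input_noise(images, std=0.1):
    """Add Gaussian noise to discriminator inputs (std is an assumed value)."""
    return images + std * torch.randn_like(images)

# Stand-in discriminator; weight_decay regularizes its weights.
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 100),
                              nn.LeakyReLU(0.2), nn.Linear(100, 1))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4, weight_decay=1e-4)

fake = BNGenerator()(torch.randn(4, 100))       # batch of 4 generated images
logits = discriminator(add_input_noise(fake))   # noisy inputs to the discriminator
```

Batch normalization in training mode needs more than one sample per batch, which is why the example uses a batch of 4.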
Applications of GANs
GANs in 2018 ...
https://arxiv.org/pdf/1812.04948.pdf
Grayscale to Color
[Figure: a colour image is converted to grayscale and fed to a conditional generator; the discriminator D labels the colourized result as real or fake.]
Conditional Generation
How could we have a GAN trained on MNIST output only specific digits?
[Figure: conditional GAN diagram with inputs labelled Noise and C (the class condition).]
Style Transfer
CycleGAN: the cycle loss is a reconstruction loss between the input to CycleGAN and
the image recovered after translating it to the other style and back, which enforces consistency
Style Transfer
Adversarial Attacks
Adversarial Examples
Adversarial Attacks
Goal: Choose a small perturbation ε on an image x so that a neural network f
misclassifies x + ε.
Approach: Use the same optimization process to choose ε to minimize the
probability that
f (x + ε) = correct class
We are treating ε as the parameters.
Targeted vs. Non-Targeted Attack
Non-targeted attack
● Minimize the probability that
f (x + ε) = correct class
Targeted attack
● Maximize the probability that
f (x + ε) = target class
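A minimal sketch of both attack types, treating ε as the only trainable parameter while the network's weights stay fixed. The function name `attack`, the linear stand-in classifier, the clamp bound `max_eps`, and the step settings are assumptions for illustration; a real attack would also keep x + ε inside the valid pixel range.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attack(model, x, label, target=None, max_eps=0.05, steps=50, lr=0.1):
    """Optimize a perturbation eps by gradient descent.

    Non-targeted (target=None): minimize the log-probability of the correct class.
    Targeted: maximize the log-probability of `target`.
    """
    eps = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.SGD([eps], lr=lr)  # eps is the only parameter
    for _ in range(steps):
        log_probs = F.log_softmax(model(x + eps), dim=1)
        if target is None:
            loss = log_probs[:, label].mean()
        else:
            loss = -log_probs[:, target].mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            eps.clamp_(-max_eps, max_eps)  # keep the perturbation small
    return eps.detach()

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
for p in model.parameters():
    p.requires_grad_(False)  # the model is fixed; only eps is optimized

x = torch.rand(1, 1, 28, 28)
label = model(x).argmax(dim=1).item()  # treat the current prediction as correct

eps = attack(model, x, label)
p_before = F.softmax(model(x), dim=1)[0, label].item()
p_after = F.softmax(model(x + eps), dim=1)[0, label].item()
```

Passing `target=k` switches to the targeted version, which raises the probability of class `k` instead of only lowering the probability of the correct class.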
White-Box vs. Black-Box Attacks
White-box attacks
● Assumes that the model is known
● We need to know the architectures and weights of f to optimize ε
Black-box attacks
● The architecture and weights of f are unknown, so ε cannot be optimized against f directly
● Train a substitute model (a known, differentiable function) that mimics the target model, and optimize ε against it
● Adversarial attacks often transfer across models!
3D Objects
Printed Pictures
Adversarial T-Shirts
https://arxiv.org/pdf/1910.11099v3.pdf
Defence Against Adversarial Attack
This is a very active area of research, and we still don’t know how to defend against these attacks.
Failed Defenses:
● Adding noise at test time
● Averaging many models
● Weight decay
● Adding noise at training time
● Adding adversarial noise at training time
● Dropout