Module 2 - Variational Autoencoders
Basic Components of VAEs
● Encoder Network
● Converts input data into a probability distribution in latent space
● Maps high-dimensional data to lower-dimensional representations
● Produces the mean (μ) and variance (σ²) parameters of the latent distribution (see the toy sketch after this list)
● Latent Space
● Compressed, continuous representation of input data
● Usually follows a normal distribution
● Enables meaningful interpolation between data points
● Dimensionality is much lower than input space
● Decoder Network
● Reconstructs input data from latent space samples
● Maps latent representations back to original data space
● Learns to generate new data samples
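To make the "probability distribution in latent space" idea concrete, here is a toy NumPy sketch (made-up numbers, a 3-dimensional latent space, no real network): the encoder's job is to emit a mean and a variance per latent dimension, and a latent code is one draw from that Gaussian.

```python
# Toy illustration only: the values below stand in for encoder outputs.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.2, 0.0])           # per-dimension means from the encoder
log_var = np.array([-1.0, -0.5, 0.2])     # per-dimension log-variances
sigma = np.exp(0.5 * log_var)             # standard deviations

z = mu + sigma * rng.standard_normal(3)   # one draw = one point in latent space
print(z)
```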
Architecture and Training of VAEs
1. Network Structure
● Encoder: Input → Hidden Layers → Latent Parameters (μ, σ²)
● Sampling Layer: Uses the reparameterization trick (see the sketch below)
● Decoder: Latent Sample → Hidden Layers → Reconstructed Output
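A minimal PyTorch sketch of this structure follows. The 784-dimensional input (a flattened 28×28 image), 256 hidden units, and 32 latent dimensions are illustrative choices, not values fixed by these notes.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=784, hidden=256, latent=32):
        super().__init__()
        # Encoder: Input -> Hidden -> latent parameters (mu, log sigma^2)
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Decoder: latent sample -> Hidden -> reconstructed output
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def reparameterize(self, mu, logvar):
        # Sampling layer: z = mu + sigma * eps keeps sampling differentiable
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

x = torch.rand(8, 784)
x_hat, mu, logvar = VAE()(x)
print(x_hat.shape, mu.shape)   # torch.Size([8, 784]) torch.Size([8, 32])
```

Predicting log σ² rather than σ directly keeps the variance strictly positive without extra constraints and is the usual convention.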
The basic architecture of a VAE is depicted in the diagram below:
2. Training Process
● Forward pass through encoder
● Sampling from latent distribution
● Reconstruction through decoder
● Backpropagation using the loss function (one training step is sketched below)
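The four steps above, written out as a single training step on a random stand-in batch. Layer sizes and the learning rate are illustrative, and the model here uses PyTorch `nn.Sequential` pieces purely as a sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
to_mu, to_logvar = nn.Linear(256, 32), nn.Linear(256, 32)
dec = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                    nn.Linear(256, 784), nn.Sigmoid())

params = (list(enc.parameters()) + list(to_mu.parameters())
          + list(to_logvar.parameters()) + list(dec.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(64, 784)                    # stand-in batch of inputs in [0, 1]

# 1. Forward pass through the encoder
h = enc(x)
mu, logvar = to_mu(h), to_logvar(h)

# 2. Sampling from the latent distribution (reparameterization trick)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# 3. Reconstruction through the decoder
x_hat = dec(z)

# 4. Backpropagation using the loss function (reconstruction + KL)
recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
opt.zero_grad()
(recon + kl).backward()
opt.step()
```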
The training process is depicted in the diagram below:
Loss Function Components
1. Reconstruction Loss
● Measures how well the decoder reconstructs input
● Usually binary cross-entropy or mean squared error
2. KL Divergence Loss
● Ensures latent space follows desired distribution
● Regularizes the latent space
● Prevents overfitting and enables generation (both loss terms are sketched in the code below)
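A sketch of both terms as a reusable function, assuming a Bernoulli decoder (binary cross-entropy) and a diagonal-Gaussian posterior regularized toward N(0, I). The function name `vae_loss` and the `beta` weight are illustrative choices, not names from these notes.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar, beta=1.0):
    """Reconstruction term + beta-weighted KL term, summed over the batch."""
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")  # or F.mse_loss for real-valued data
    # Closed-form KL between N(mu, sigma^2) and N(0, 1), per dimension:
    #   KL = -0.5 * (1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Quick check on dummy tensors (KL is exactly 0 when mu = 0 and logvar = 0)
x, x_hat = torch.rand(8, 784), torch.rand(8, 784)
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)
print(vae_loss(x_hat, x, mu, logvar))
```

Raising or lowering `beta` is one common way to handle the reconstruction-vs-KL balance mentioned later under Challenges and Limitations.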
The loss function components are depicted in the diagram below:
Latent Space Representation and Inference
● Properties
● Continuous and smooth
● Semantically meaningful
● Supports interpolation
● Enables feature disentanglement
The latent space representation is depicted in the diagram below:
● Inference Process
● Encode input to get latent parameters
● Sample from latent distribution
● Generate new samples via the decoder (see the sketch below)
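A minimal sketch of the generation path, assuming a trained decoder with a 32-dimensional latent space (the decoder below is untrained and only shows the mechanics): to generate, skip the encoder entirely and decode samples drawn from the prior.

```python
import torch
import torch.nn as nn

# Stand-in decoder; in practice this is the decoder of a trained VAE.
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())

z = torch.randn(16, 32)      # 1. sample 16 codes from the prior N(0, I)
samples = decoder(z)         # 2. decode them into 16 new 784-pixel "images"
print(samples.shape)         # torch.Size([16, 784])
```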
Applications in Image Generation
1. Image Synthesis
● Generate new, realistic images
● Interpolate between existing images (sketched after this list)
● Style transfer and modification
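A sketch of latent-space interpolation between two inputs, assuming a trained encoder/decoder pair; here random vectors stand in for the latent codes of two real images.

```python
import torch

# Stand-ins for the latent codes of image A and image B
# (in practice these come from the trained encoder).
z1, z2 = torch.randn(32), torch.randn(32)

steps = torch.linspace(0.0, 1.0, 8)
# Linear interpolation in latent space; decoding each row (not shown)
# gives a smooth visual transition from image A to image B.
path = torch.stack([(1 - t) * z1 + t * z2 for t in steps])
print(path.shape)   # torch.Size([8, 32])
```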
2. Image Manipulation
● Feature manipulation in latent space
● Attribute editing (see the latent-arithmetic sketch below)
● Image completion/inpainting
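One simple way attribute editing can be done in latent space, assuming latent codes from a trained encoder; the "attribute direction" is estimated as the difference of mean codes between images with and without the attribute, and all names and numbers below are illustrative.

```python
import torch

# Stand-ins for encoder outputs, grouped by whether the images
# show the attribute (e.g. "smiling").
z_with = torch.randn(100, 32)                    # codes of images with the attribute
z_without = torch.randn(100, 32)                 # codes of images without it
direction = z_with.mean(0) - z_without.mean(0)   # estimated attribute direction

z = torch.randn(32)                # code of the image to edit
alpha = 1.5                        # editing strength (illustrative)
z_edited = z + alpha * direction   # decoding z_edited yields the modified image
```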
The image manipulation workflow is depicted in the diagram below:
3. Domain Adaptation
● Transfer learning between domains
● Style transfer applications
● Cross-domain translation
Key Advantages
● Probabilistic approach to generative modeling
● Learns meaningful latent representations
● Enables both generation and inference
● Supports various data types (images, text, audio)
Challenges and Limitations
● Blurry reconstructions in image applications
● Difficulty with complex, high-dimensional data
● Balancing reconstruction vs. KL divergence
● Posterior collapse, where the decoder learns to ignore the latent code
Module 2.2: Types of Autoencoders
1. Undercomplete Autoencoders
● Definition: The simplest form, where the hidden layers have fewer dimensions than the input layer
● Key Characteristics:
● Forces compressed data representation
● Acts as dimensionality reduction technique
● More powerful than PCA due to non-linear transformation capabilities (see the sketch below)
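A minimal undercomplete autoencoder sketch in PyTorch: the 32-unit bottleneck is much smaller than the 784-dimensional input, and the non-linearities are what let it go beyond PCA. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                        nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(64, 784)       # stand-in batch
x_hat = decoder(encoder(x))
loss = F.mse_loss(x_hat, x)   # trained purely to minimize reconstruction error
```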
2. Sparse Autoencoders
● Core Concept: Similar to undercomplete autoencoders but uses a different regularization approach
● Distinctive Features:
● Doesn't require dimension reduction
● Uses sparsity penalty in loss function
● Penalizes activation of hidden layer neurons
● Benefits:
● Different nodes specialize for different input types
● Better at preventing overfitting
● Uses L1 regularization or a KL-divergence sparsity penalty (the L1 version is sketched below)
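A sketch of the L1 sparsity penalty added to the usual reconstruction loss; the weight `lam` and the layer sizes are illustrative hyperparameters, not values from these notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The hidden layer can even be wider than the input: sparsity, not a
# bottleneck, is what forces a compact representation here.
encoder = nn.Sequential(nn.Linear(784, 1024), nn.ReLU())
decoder = nn.Sequential(nn.Linear(1024, 784), nn.Sigmoid())

x = torch.rand(64, 784)
h = encoder(x)                 # hidden activations
x_hat = decoder(h)

lam = 1e-3                     # illustrative sparsity weight
loss = F.binary_cross_entropy(x_hat, x) + lam * h.abs().mean()   # reconstruction + L1 penalty
```

The KL-divergence variant instead penalizes the gap between each hidden unit's average activation and a small target value (e.g. 0.05).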
3. Contractive Autoencoders
● Primary Principle: Similar inputs should have similar encodings
● Key Features:
● Keeps the derivatives of the hidden-layer activations with respect to the input small
● Enforces robustness in learned representations
● Contracts a neighborhood of inputs to a small neighborhood of outputs (see the penalty sketch below)
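A sketch of the contractive penalty for a single sigmoid encoder layer: it penalizes the squared Frobenius norm of the Jacobian of the hidden activations with respect to the input, which keeps the encoding nearly constant under small input perturbations. The layer sizes are illustrative and the decoder/reconstruction term is omitted.

```python
import torch
import torch.nn as nn

W = nn.Linear(784, 64)         # a single sigmoid encoder layer
x = torch.rand(32, 784)
h = torch.sigmoid(W(x))        # hidden activations, shape (32, 64)

# For h = sigmoid(Wx + b): dh_j/dx_i = h_j * (1 - h_j) * W_ji, so the
# squared Frobenius norm of the Jacobian has the closed form below.
dh_sq = (h * (1 - h)) ** 2                 # (32, 64)
w_sq = (W.weight ** 2).sum(dim=1)          # (64,) row-wise sum over input weights
contractive_penalty = (dh_sq * w_sq).sum(dim=1).mean()
# Total loss = reconstruction loss + lambda * contractive_penalty (decoder not shown)
```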
4. Denoising Autoencoders
● Main Purpose: Remove noise from input data
● Operational Method:
● Input and output are intentionally different
● Fed with corrupted/noisy versions of input
● Trained to recover the clean versions (see the sketch after this list)
● Applications:
● Image restoration
● Signal denoising
● Data cleaning
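A sketch of the denoising setup: the input is corrupted with Gaussian noise, but the reconstruction target is the clean original. The noise level and layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

x_clean = torch.rand(64, 784)                                        # clean targets
x_noisy = (x_clean + 0.3 * torch.randn_like(x_clean)).clamp(0, 1)    # corrupted inputs

x_hat = decoder(encoder(x_noisy))               # the model only ever sees the noisy input
loss = F.binary_cross_entropy(x_hat, x_clean)   # ...but is scored against the clean version
```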
5. Variational Autoencoders (VAEs)
● Unique Approach: Creates a probability distribution for each latent dimension
● Key Characteristics:
● Encoder outputs probability distribution parameters
● Samples drawn from these distributions are passed to the decoder
● Enables generative capabilities
● Applications:
● Data generation
● Image synthesis
● Feature interpolation
The different types of autoencoders are shown in the diagram below:
```mermaid
graph TD
    A[Input Data] --> B{Autoencoder Type}
    B --> C[Undercomplete]
    B --> D[Sparse]
    B --> E[Contractive]
    B --> F[Denoising]
    B --> G[Variational]
    C --> H["Encoder (compressed mapping) → Bottleneck → Decoder"]
    D --> I["Encoder (with sparsity penalty) → Bottleneck → Decoder"]
    E --> J["Encoder (with contractive penalty, keeping activations locally invariant) → Bottleneck → Decoder"]
    F --> K["Encoder (processes noisy input) → Bottleneck → Decoder (reconstructs clean output)"]
    G --> L["Probabilistic Encoder (outputs μ and σ) → Sampling → Decoder (generative model)"]
```
Comparison of Architectures
Training Objectives:
1. Undercomplete: Minimize reconstruction error with bottleneck constraint
2. Sparse: Balance reconstruction with activation sparsity
3. Contractive: Maintain similar encodings for similar inputs
4. Denoising: Reconstruct clean data from noisy input
5. Variational: Balance reconstruction with distribution matching
Use Cases:
● Dimensionality Reduction: Undercomplete, Sparse
● Feature Learning: Sparse, Contractive
● Noise Removal: Denoising
● Data Generation: Variational
Benefits and Limitations:
● Undercomplete: Simple but may lose important information
● Sparse: Better feature extraction but complex training
● Contractive: Robust features but computationally intensive
● Denoising: Effective noise removal but requires corrupt-clean pairs
● Variational: Powerful generation but complex optimization