Complete Image Processing Notes - BTech 6th Semester
Your Ultimate Exam Guide to Mastering Digital Image Processing
📋 Table of Contents
1. Introduction to Image Processing
2. Digital Image Formation
3. Mathematical Preliminaries
4. Image Enhancement
5. Image Restoration
6. Image Segmentation
Module 1: Introduction [3L]
🎯 Learning Objectives
Understanding the fundamentals of digital image processing
Learning the basic components and steps involved
Grasping the practical applications and importance
1.1 Background
What is Image Processing?
Definition: Image processing is the analysis and manipulation of a digitized image, especially to improve
its quality or extract meaningful information from it.
Harvey Specter's Analogy: Think of image processing like Harvey preparing for a case. Just as Harvey
takes raw evidence (noisy, unclear information) and processes it to present the clearest, most compelling
argument in court, image processing takes raw digital images and enhances them to reveal the truth
hidden within the pixels.
Why Does Image Processing Matter?
Medical Imaging: X-rays, MRI, CT scans
Satellite Imagery: Weather forecasting, agriculture
Security: Face recognition, fingerprint analysis
Entertainment: Movies, gaming, social media filters
1.2 Digital Image Representation
Fundamental Concepts
Digital Image Definition: A digital image is a 2D function f(x,y) where x and y are spatial coordinates,
and the amplitude of f at any pair of coordinates (x,y) is called the intensity or gray level of the image at
that point.
Mathematical Representation:
f(x,y) = intensity value at coordinates (x,y)
Key Parameters:
Resolution: Number of pixels (e.g., 1920×1080)
Bit Depth: Number of bits per pixel (8-bit = 256 gray levels)
Dynamic Range: Ratio between brightest and darkest pixel
Types of Images
1. Binary Images: 1-bit (black and white only)
2. Grayscale Images: 8-bit (256 shades of gray)
3. Color Images: 24-bit RGB (16.7 million colors)
Tony Stark's Workshop Analogy: Just like Tony's holographic displays can show different levels of detail
and color depth, digital images have different bit depths that determine how much information they can
store and display.
1.3 Fundamental Steps in Image Processing
The Six Essential Steps
1. Image Acquisition
Capturing the image using sensors (cameras, scanners)
Converting analog signals to digital
2. Preprocessing
Noise reduction
Contrast enhancement
Geometric corrections
3. Segmentation
Partitioning image into meaningful regions
Separating objects from background
4. Representation & Description
Converting segmented regions into forms for computer processing
Feature extraction
5. Recognition & Interpretation
Assigning meaning to recognized objects
Pattern classification
6. Knowledge Base
Domain-specific information
Guides all processing steps
Process Flow Diagram:
Image Acquisition → Preprocessing → Segmentation → Representation & Description → Recognition & Interpretation → Output
(The Knowledge Base supplies domain-specific information to, and receives results from, every stage.)
1.4 Elements of Digital Image Processing
1.4.1 Image Acquisition
Definition: The process of converting a continuous image into a digital representation.
Components:
Illumination Source: Provides energy (light, X-rays, etc.)
Scene Element: Object being imaged
Imaging System: Camera, sensor array
Digitization: Sampling and quantization
Mathematical Model:
f(x,y) = i(x,y) × r(x,y)
where:
i(x,y) = illumination component
r(x,y) = reflectance component
1.4.2 Image Storage
File Formats:
Uncompressed: BMP, TIFF
Lossless Compression: PNG, GIF
Lossy Compression: JPEG
Storage Requirements:
Size (bytes) = Width × Height × Bit_depth / 8
Example Calculation: For a 1024×768 color image (24-bit): Size = 1024 × 768 × 24 / 8 = 2,359,296 bytes
≈ 2.25 MB
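The same arithmetic is easy to script. A minimal sketch in plain Python (the function name is just illustrative):

```python
def image_size_bytes(width, height, bit_depth):
    """Uncompressed image size: width x height x bits-per-pixel / 8."""
    return width * height * bit_depth // 8

# The 1024x768, 24-bit example from above
size = image_size_bytes(1024, 768, 24)
print(size)                    # 2359296 bytes
print(size / (1024 * 1024))    # 2.25 MB
```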
1.4.3 Image Processing
Categories:
1. Low-level: Noise reduction, contrast enhancement
2. Mid-level: Segmentation, object recognition
3. High-level: Scene understanding, cognitive functions
1.4.4 Image Communication
Transmission Methods:
Wired: Ethernet, fiber optic
Wireless: WiFi, cellular networks
Broadcast: Television, satellite
Compression Importance:
Reduces bandwidth requirements
Faster transmission times
Lower storage costs
1.4.5 Image Display
Display Technologies:
LCD: Liquid Crystal Display
OLED: Organic Light Emitting Diode
Projectors: DLP, LCD projection
🧠 Module 1 Quick Quiz
Q1: What are the main components of the image acquisition process?
Q2: Calculate the storage size of a 512×512 grayscale image (8-bit).
Q3: Name three applications of image processing in real life.
Answers:
1. Illumination source, scene element, imaging system, digitization
2. 512 × 512 × 8 / 8 = 262,144 bytes = 256 KB
3. Medical imaging, satellite imagery, security systems
🎯 Exam Tips for Module 1
Remember the 6 fundamental steps - often asked in short questions
Storage calculation formula - numerical problems are common
Difference between analog and digital images - conceptual question
Applications in different domains - essay-type questions
Likely Exam Questions:
1. Explain the fundamental steps in digital image processing with a neat diagram. (10 marks)
2. Calculate the storage requirement for different image types. (5 marks)
3. Compare analog vs digital image processing. (5 marks)
Module 2: Digital Image Formation [4L]
🎯 Learning Objectives
Understanding image formation models
Learning geometric transformations
Mastering sampling and quantization concepts
2.1 A Simple Image Model
Mathematical Foundation
Image Formation Equation:
f(x,y) = i(x,y) × r(x,y)
Where:
f(x,y): Observed image intensity
i(x,y): Illumination component (0 < i(x,y) < ∞)
r(x,y): Reflectance component (0 < r(x,y) < 1)
Physical Interpretation:
Illumination: Amount of light falling on the scene
Reflectance: Fraction of light reflected by objects
Dr. Strange's Mirror Dimension Analogy: Just like Dr. Strange sees the real world (reflectance) modified
by the magical energy (illumination) in the Mirror Dimension, our cameras capture the combination of
how much light hits an object and how much it reflects back.
Typical Ranges
Sunny day: i(x,y) ≈ 9000 foot-candles
Full moon: i(x,y) ≈ 0.01 foot-candles
Fresh snow: r(x,y) ≈ 0.93
Black velvet: r(x,y) ≈ 0.01
2.2 Geometric Model - Basic Transformations
2.2.1 Translation
Definition: Moving an image from one location to another without changing its orientation or size.
Mathematical Formula:
x' = x + tx
y' = y + ty
Matrix Form:
[x'] [1 0 tx] [x]
[y'] = [0 1 ty] [y]
[1 ] [0 0 1 ] [1]
Example Problem: Translate point (3,4) by (2,1):
x' = 3 + 2 = 5
y' = 4 + 1 = 5
New point: (5,5)
2.2.2 Scaling
Definition: Changing the size of an image by multiplying coordinates by scaling factors.
Mathematical Formula:
x' = Sx × x
y' = Sy × y
Matrix Form:
[x'] [Sx 0 0] [x]
[y'] = [0 Sy 0] [y]
[1 ] [0 0 1] [1]
Types:
Uniform Scaling: Sx = Sy (maintains aspect ratio)
Non-uniform Scaling: Sx ≠ Sy (changes aspect ratio)
Example Problem: Scale point (4,6) by factors (2,0.5):
x' = 2 × 4 = 8
y' = 0.5 × 6 = 3
New point: (8,3)
2.2.3 Rotation
Definition: Rotating an image about a reference point (usually origin) by angle θ.
Mathematical Formula:
x' = x cos(θ) - y sin(θ)
y' = x sin(θ) + y cos(θ)
Matrix Form:
[x'] [cos(θ) -sin(θ) 0] [x]
[y'] = [sin(θ) cos(θ) 0] [y]
[1 ] [0 0 1] [1]
Example Problem: Rotate point (1,0) by 90° counterclockwise:
cos(90°) = 0, sin(90°) = 1
x' = 1×0 - 0×1 = 0
y' = 1×1 + 0×0 = 1
New point: (0,1)
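All three transformations can be expressed as 3×3 homogeneous matrices and chained by matrix multiplication. A minimal sketch, assuming NumPy is available (the helper names are illustrative only), reproduces the three worked examples:

```python
import numpy as np

def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotate(theta_deg):
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0,          0,         1]])

def apply(T, x, y):
    # Multiply the homogeneous coordinate vector [x, y, 1] by the matrix
    xp, yp, _ = T @ np.array([x, y, 1.0])
    return xp, yp

print(apply(translate(2, 1), 3, 4))   # (5.0, 5.0)
print(apply(scale(2, 0.5), 4, 6))     # (8.0, 3.0)
print(apply(rotate(90), 1, 0))        # (~0.0, 1.0)
```

Composite transformations (e.g., rotation about an arbitrary point) are obtained by multiplying these matrices together before applying them.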
Suits Analogy: Just like Mike Ross can transform his approach to a case (translate his strategy, scale his
efforts, or rotate his perspective), we can transform images using these geometric operations to get the
best view of our data.
2.3 Perspective Projection
Definition
Perspective Projection: The process of projecting 3D world coordinates onto a 2D image plane,
simulating how human vision works.
Mathematical Model
Basic Perspective Equation:
xp = f × X/Z
yp = f × Y/Z
Where:
(X,Y,Z): 3D world coordinates
(xp,yp): 2D image coordinates
f: Focal length of camera
Key Properties
1. Parallel lines converge at vanishing points
2. Objects appear smaller as distance increases
3. Foreshortening effect occurs
Camera Parameters:
Intrinsic: Focal length, principal point, lens distortion
Extrinsic: Camera position and orientation in world
Iron Man's HUD Analogy: Tony Stark's heads-up display projects 3D world information onto his 2D visor
screen, just like perspective projection maps our 3D world onto 2D images.
2.4 Sampling & Quantization
2.4.1 Sampling
Definition: Converting continuous image coordinates into discrete coordinates.
Nyquist Sampling Theorem: To avoid aliasing, sampling frequency must be at least twice the highest
frequency component in the image.
fs ≥ 2 × fmax
Types of Sampling:
Uniform Sampling
Equal spacing between sample points
Regular grid pattern
Most common in digital cameras
Mathematical Representation:
f(x,y) → f(iΔx, jΔy)
where i,j are integers and Δx,Δy are sampling intervals
Non-uniform Sampling
Variable spacing between sample points
Adaptive to image content
Used in specialized applications
Aliasing Effects:
Spatial Aliasing: Jagged edges, moiré patterns
Temporal Aliasing: Wagon wheel effect in movies
Example Calculation: For an image with maximum frequency of 100 cycles/mm:
Minimum sampling rate = 2 × 100 = 200 samples/mm
Maximum sampling interval = 1/200 = 0.005 mm
2.4.2 Quantization
Definition: Converting continuous range of intensities into discrete levels.
Mathematical Process:
Quantized_value = round(Original_value × (2^b - 1) / Max_value)
Where b is the number of bits per pixel.
Types of Quantization:
Uniform Quantization
Equal-sized quantization intervals
Simple implementation
May not be optimal for all images
Step Size Calculation:
Δ = (Max_value - Min_value) / (2^b)
Non-uniform Quantization
Variable-sized intervals
Optimized for human visual system
Better quality at same bit rate
Quantization Effects:
Contouring: False edges in smooth regions
Loss of detail: Fine variations lost
Noise introduction: Quantization noise
Example Problem: Quantize intensity value 150 to 3 bits (8 levels):
Range: 0-255
Spacing between the 8 reconstruction levels: 255/(2³ - 1) = 255/7 ≈ 36.4
Level: round(150/36.4) = round(4.12) = 4
Quantized value: 4 × 36.4 ≈ 146
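A short sketch of uniform quantization, assuming NumPy and the 255/(2^b - 1) level spacing used in the worked example:

```python
import numpy as np

def quantize(values, bits, max_value=255):
    """Uniform quantization to 2**bits reconstruction levels over [0, max_value]."""
    levels = 2 ** bits
    step = max_value / (levels - 1)                 # spacing between reconstruction levels
    level_index = np.round(np.asarray(values, dtype=float) / step)
    return level_index * step                       # reconstructed intensity

print(quantize(150, 3))   # ~145.7, i.e. level 4 of 8, matching the worked example (≈146)
```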
Quality Metrics:
SNR = 10 log10(Signal_Power/Noise_Power) dB
PSNR = 20 log10(Max_Value/RMSE) dB
🧠 Module 2 Quick Quiz
Q1: What is the result of rotating point (2,3) by 180°?
Q2: Calculate the minimum sampling rate for an image with maximum frequency 50 Hz.
Q3: How many quantization levels are possible with 4 bits?
Answers:
1. (-2,-3) [cos(180°)=-1, sin(180°)=0]
2. 100 Hz (2 × 50 Hz)
3. 16 levels (2^4 = 16)
🎯 Exam Tips for Module 2
Master transformation matrices - numerical problems are frequent
Understand sampling theorem - conceptual questions common
Quantization calculations - step-by-step approach important
Perspective projection - diagrams help explain concepts
Likely Exam Questions:
1. Derive and apply geometric transformation matrices. (10 marks)
2. Explain sampling and quantization with examples. (8 marks)
3. Calculate quantization levels and step sizes. (5 marks)
Module 3: Mathematical Preliminaries [9L]
🎯 Learning Objectives
Understanding pixel relationships and connectivity
Learning distance measures and operations
Mastering Fourier transforms and their properties
3.1 Neighbors of Pixels
Pixel Adjacency Definitions
4-Adjacency (4-Connected)
Definition: Two pixels p and q are 4-adjacent if q is in the set N4(p).
N4(p) = {(x±1,y), (x,y±1)}
Visual Representation:
    [q]
[q] [p] [q]
    [q]
8-Adjacency (8-Connected)
Definition: Two pixels p and q are 8-adjacent if q is in the set N8(p).
N8(p) = N4(p) ∪ {(x±1,y±1), (x±1,y∓1)}
Visual Representation:
[q] [q] [q]
[q] [p] [q]
[q] [q] [q]
m-Adjacency (Mixed Connectivity)
Definition: Two pixels p and q are m-adjacent if:
1. q ∈ N4(p), OR
2. q ∈ ND(p) (the four diagonal neighbors of p) AND N4(p) ∩ N4(q) contains no pixels with values from V
Spider-Man's Web Analogy: Just like Spider-Man's web connects buildings in different ways (direct lines,
diagonal swings, or complex paths), pixels can be connected through 4-adjacency (cardinal directions), 8-
adjacency (including diagonals), or m-adjacency (smart mixed connections).
3.2 Connectivity
Path Definition
Definition: A path from pixel p to pixel q is a sequence of distinct pixels p = p0, p1, p2, ..., pn = q such
that pi is adjacent to pi+1 for 0 ≤ i < n.
Connected Components
Definition: A subset S of pixels is connected if there exists a path between any two pixels in S using pixels
only from S.
Algorithm for Finding Connected Components:
1. Initialize label counter = 1
2. Scan image pixel by pixel
3. If unlabeled pixel with desired value found:
a. Assign current label
b. Find all connected pixels using BFS/DFS
c. Assign same label to all connected pixels
d. Increment label counter
4. Continue until entire image scanned
Boundaries
Definition: The boundary of region R is the set of pixels in R that have at least one neighbor not in R.
Types of Boundaries:
4-boundary: Using 4-adjacency
8-boundary: Using 8-adjacency
3.3 Relations, Equivalence & Transitive Closure
Equivalence Relations
Definition: A relation R on set S is an equivalence relation if it satisfies:
1. Reflexive: aRa for all a ∈ S
2. Symmetric: If aRb, then bRa
3. Transitive: If aRb and bRc, then aRc
Transitive Closure
Definition: The transitive closure R* of relation R is the smallest transitive relation containing R.
Algorithm for Transitive Closure (Floyd-Warshall):
for k = 1 to n:
for i = 1 to n:
for j = 1 to n:
R*[i,j] = R*[i,j] OR (R*[i,k] AND R*[k,j])
Harvey's Case Network Analogy: Just like Harvey builds connections between different pieces of
evidence (if Evidence A connects to B, and B connects to C, then A is transitively connected to C),
transitive closure finds all indirect connections in pixel relationships.
3.4 Distance Measures
Euclidean Distance (L2)
Formula:
D_E(p,q) = √[(x₁-x₂)² + (y₁-y₂)²]
Properties:
Most intuitive distance measure
Forms circular distance contours
Computationally expensive (square root)
Manhattan Distance (L1)
Formula:
D_M(p,q) = |x₁-x₂| + |y₁-y₂|
Properties:
Also called "City Block" distance
Forms diamond-shaped distance contours
Computationally efficient
Chessboard Distance (L∞)
Formula:
D_C(p,q) = max(|x₁-x₂|, |y₁-y₂|)
Properties:
Maximum of coordinate differences
Forms square distance contours
Useful for 8-connected paths
Example Calculation: For points p(2,3) and q(5,7):
Euclidean: √[(5-2)² + (7-3)²] = √[9+16] = 5
Manhattan: |5-2| + |7-3| = 3 + 4 = 7
Chessboard: max(|5-2|, |7-3|) = max(3,4) = 4
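All three measures are one-liners in code. A minimal NumPy sketch reproducing the example above:

```python
import numpy as np

def distances(p, q):
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    euclidean  = np.hypot(dx, dy)     # L2
    manhattan  = dx + dy              # L1 (city block)
    chessboard = max(dx, dy)          # L-infinity
    return euclidean, manhattan, chessboard

print(distances((2, 3), (5, 7)))      # (5.0, 7, 4)
```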
3.5 Arithmetic/Logic Operations
Arithmetic Operations
Addition
s(x,y) = f(x,y) + g(x,y)
Applications: Noise addition, image averaging
Subtraction
s(x,y) = f(x,y) - g(x,y)
Applications: Change detection, background subtraction
Multiplication
s(x,y) = f(x,y) × g(x,y)
Applications: Masking, region of interest selection
Division
s(x,y) = f(x,y) / g(x,y)
Applications: Shading correction, normalization
Logic Operations
AND Operation
s(x,y) = f(x,y) AND g(x,y)
Truth Table:
f g f AND g
0 0 0
0 1 0
1 0 0
1 1 1
OR Operation
s(x,y) = f(x,y) OR g(x,y)
XOR Operation
s(x,y) = f(x,y) XOR g(x,y)
NOT Operation
s(x,y) = NOT f(x,y)
Applications:
Binary image operations
Morphological processing
Region extraction
3.6 Fourier Transformation
Continuous Fourier Transform
Forward Transform:
F(u,v) = ∫∫ f(x,y) e^(-j2π(ux+vy)) dx dy
Inverse Transform:
f(x,y) = ∫∫ F(u,v) e^(j2π(ux+vy)) du dv
Physical Interpretation
f(x,y): Spatial domain representation
F(u,v): Frequency domain representation
u,v: Spatial frequencies
Tony Stark's Arc Reactor Analogy: Just like Tony's arc reactor converts different forms of energy (spatial
energy to frequency energy), Fourier transform converts images from spatial domain (what we see) to
frequency domain (how fast things change).
3.7 Properties of 2D Fourier Transform
1. Separability
F(u,v) = ∫ [∫ f(x,y) e^(-j2πvy) dy] e^(-j2πux) dx
Can be computed as two 1D transforms.
2. Translation
f(x-a, y-b) ↔ F(u,v) e^(-j2π(ua+vb))
Translation in spatial domain = phase shift in frequency domain.
3. Rotation
If f(x,y) is rotated by θ, then F(u,v) is also rotated by θ.
4. Scaling
f(ax, by) ↔ (1/|ab|) F(u/a, v/b)
5. Convolution Theorem
f(x,y) * g(x,y) ↔ F(u,v) × G(u,v)
f(x,y) × g(x,y) ↔ F(u,v) * G(u,v)
6. Parseval's Theorem
∫∫ |f(x,y)|² dx dy = ∫∫ |F(u,v)|² du dv
3.8 Discrete Fourier Transform (DFT)
2D DFT Definition
Forward DFT:
F(u,v) = (1/MN) ∑(x=0 to M-1) ∑(y=0 to N-1) f(x,y) e^(-j2π(ux/M + vy/N))
Inverse DFT:
f(x,y) = ∑(u=0 to M-1) ∑(v=0 to N-1) F(u,v) e^(j2π(ux/M + vy/N))
Where M×N is the image size.
Computational Complexity
Direct computation: O(N⁴) for N×N image
FFT algorithm: O(N²log N) for N×N image
FFT Algorithm Benefits:
Dramatically reduces computation time
Makes real-time processing possible
Foundation for many image processing techniques
Example Problem: 4-point DFT
Given sequence f = [1, 2, 3, 4], compute F(u):
Solution:
F(0) = (1/4)[1×e^0 + 2×e^0 + 3×e^0 + 4×e^0] = (1/4)[10] = 2.5
F(1) = (1/4)[1×e^0 + 2×e^(-jπ/2) + 3×e^(-jπ) + 4×e^(-j3π/2)]
= (1/4)[1 + 2(-j) + 3(-1) + 4(j)] = (1/4)[-2+2j] = -0.5+0.5j
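The result can be checked with NumPy's FFT. Note that np.fft.fft applies no 1/N factor on the forward transform, so we divide by N to match the convention used in these notes:

```python
import numpy as np

f = np.array([1, 2, 3, 4], dtype=float)
F = np.fft.fft(f) / len(f)   # divide by N to match the (1/N) forward convention here
print(F[0])                  # (2.5+0j)
print(F[1])                  # (-0.5+0.5j)
```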
3.9 Discrete Cosine Transform (DCT)
Definition
2D DCT:
F(u,v) = α(u)α(v) ∑(x=0 to M-1) ∑(y=0 to N-1) f(x,y) cos[(2x+1)uπ/2M] cos[(2y+1)vπ/2N]
Where:
α(u) = √(1/M) for u = 0
α(u) = √(2/M) for u = 1,2,...,M-1
Properties
Real-valued (unlike DFT which is complex)
Excellent energy compaction
Used in image compression (JPEG standard)
No phase information needed
Applications
JPEG compression: 8×8 DCT blocks
Video compression: MPEG standards
Feature extraction: Pattern recognition
3.10 Discrete Sine Transform (DST)
Definition
2D DST:
F(u,v) = β(u)β(v) ∑(x=0 to M-1) ∑(y=0 to N-1) f(x,y) sin[(x+1)(u+1)π/(M+1)] sin[(y+1)(v+1)π/(N+1)]
Where:
β(u) = √(2/(M+1))
Properties
Real-valued transform
Good for signals with zero boundary conditions
Less common than DCT
Useful for specific applications
🧠 Module 3 Quick Quiz
Q1: What is the Manhattan distance between points (1,1) and (4,5)? Q2: Which property allows FFT to be
computed efficiently? Q3: What is the main advantage of DCT over DFT for image compression?
Answers:
1. |4-1| + |5-1| = 3 + 4 = 7
2. Separability property
3. DCT is real-valued and has better energy compaction
🎯 Exam Tips for Module 3
Distance measure calculations - frequently asked numerical problems
Fourier transform properties - understand conceptually and mathematically
Connectivity concepts - draw diagrams to illustrate
DCT vs DFT differences - comparison questions are common
Likely Exam Questions:
1. Explain different types of pixel connectivity with examples. (8 marks)
2. Derive and apply properties of 2D Fourier transform. (10 marks)
3. Compare DCT and DFT for image processing applications. (6 marks)
Module 4: Image Enhancement [8L]
🎯 Learning Objectives
Master spatial and frequency domain enhancement techniques
Understand contrast enhancement methods
Learn various filtering approaches for smoothing and sharpening
4.1 Introduction to Image Enhancement
Definition
Image Enhancement: The process of improving image quality to make it more suitable for specific
applications or human visual perception.
Key Principle: Enhancement is subjective - what looks better depends on the application and viewer.
Categories:
1. Spatial Domain Methods: Direct manipulation of pixels
2. Frequency Domain Methods: Manipulation in Fourier domain
Spider-Verse Analogy: Just like how different Spider-People see the same multiverse differently (Miles
sees electric effects, Gwen sees everything in her art style), image enhancement lets us see the same
image in different ways that reveal different aspects of the information.
4.2 Spatial Domain Methods
General Form
g(x,y) = T[f(x,y)]
Where:
f(x,y): Input image
g(x,y): Enhanced output image
T: Transformation operator
Point Processing
Single pixel transformations:
s = T(r)
Where r is input gray level and s is output gray level.
4.3 Contrast Enhancement
4.3.1 Linear Stretching
Basic Linear Transformation
s = ar + b
Where:
a: Controls contrast (slope)
b: Controls brightness (y-intercept)
Effects:
a > 1: Increases contrast
0 < a < 1: Decreases contrast
b > 0: Increases brightness
b < 0: Decreases brightness
Contrast Stretching Formula
s = (s_max - s_min)/(r_max - r_min) × (r - r_min) + s_min
Example Problem: Stretch contrast of image with range [50, 200] to full range [0, 255]:
Solution:
s = (255 - 0)/(200 - 50) × (r - 50) + 0
s = 255/150 × (r - 50) = 1.7(r - 50)
For pixel value r = 100:
s = 1.7(100 - 50) = 1.7 × 50 = 85
4.3.2 Nonlinear Stretching
Logarithmic Transformation
s = c × log(1 + r)
Properties:
Expands dark regions of the image
Compresses bright regions
Useful for displaying Fourier spectra
Example Calculation: For r = 100, c = 255/log(256):
s = (255/log(256)) × log(1 + 100)
s = (255/5.545) × log(101)
s = 45.98 × 4.615 ≈ 212
Power-Law (Gamma) Transformation
s = c × r^γ
Effects of γ values:
γ > 1: Darkens the image (compresses dark regions)
γ < 1: Brightens the image (expands dark regions)
γ = 1: Linear transformation (no change)
Applications:
Gamma correction: Compensating for display characteristics
Medical imaging: Enhancing X-ray images
Photography: Artistic effects
Example Problem: Apply gamma correction with γ = 0.5, c = 255^0.5 to pixel value r = 64:
Solution (with c = 255^(1-γ) = 255^0.5, so that the output stays within [0, 255]):
s = 255^0.5 × 64^0.5
s = 15.97 × 8 ≈ 128
Harvey's Gamma Strategy Analogy: Just like Harvey adjusts his courtroom strategy's intensity (gamma)
based on the jury - sometimes going softer (γ < 1) to appeal to emotions, sometimes being more
aggressive (γ > 1) to make strong points - gamma correction adjusts image intensity to reveal the most
important details.
4.4 Histogram Processing
4.4.1 Histogram Definition
Definition: A histogram of a digital image is a discrete function:
h(rk) = nk
Where:
rk: kth gray level
nk: Number of pixels with gray level rk
Normalized Histogram:
p(rk) = nk/MN
Where MN is total number of pixels.
4.4.2 Histogram Equalization
Objective: Transform image so that its histogram is approximately uniform.
Mathematical Foundation: For continuous case:
s = T(r) = ∫[0 to r] pr(w)dw
Discrete Implementation:
sk = T(rk) = (L-1) ∑[j=0 to k] pr(rj) = (L-1) ∑[j=0 to k] nj/MN
Algorithm Steps:
1. Compute histogram of input image
2. Compute cumulative distribution function (CDF)
3. Transform using CDF as mapping function
Example Problem: Given 4×4 image with gray levels [0,1,2,3]:
Original histogram: h(0)=6, h(1)=4, h(2)=3, h(3)=3
Total pixels: 16
Solution:
s0 = (3/16) × 6 = 1.125 ≈ 1
s1 = (3/16) × (6+4) = 1.875 ≈ 2
s2 = (3/16) × (6+4+3) = 2.4375 ≈ 2
s3 = (3/16) × (6+4+3+3) = 3
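A compact sketch of the same mapping, assuming NumPy; it takes a histogram and returns the equalized output level for each input gray level:

```python
import numpy as np

def equalize_levels(hist, L):
    """Map gray level k to s_k = (L-1) * CDF(k), rounded to the nearest level."""
    hist = np.asarray(hist, dtype=float)
    cdf = np.cumsum(hist) / hist.sum()
    return np.round((L - 1) * cdf).astype(int)

# Histogram from the example: h(0)=6, h(1)=4, h(2)=3, h(3)=3, with L = 4
print(equalize_levels([6, 4, 3, 3], L=4))   # [1 2 2 3]
```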
4.4.3 Histogram Specification
Goal: Transform image to have a specified histogram shape.
Algorithm:
1. Equalize input image: r → s
2. Equalize desired histogram: z → v
3. Find inverse mapping: v → z
4. Apply composite transformation: r → s → z
4.5 Smoothing Filters
4.5.1 Image Averaging
Purpose: Noise reduction by averaging pixel values in local neighborhoods.
Simple Averaging Filter:
ĝ(x,y) = (1/mn) ∑∑ f(x+s, y+t)
(s,t)∈Sxy
Where Sxy is the neighborhood around (x,y).
4.5.2 Mean Filters
Arithmetic Mean Filter
ĝ(x,y) = (1/mn) ∑∑ f(s,t)
(s,t)∈Sxy
3×3 Arithmetic Mean Kernel:
(1/9) × [1 1 1]
[1 1 1]
[1 1 1]
Geometric Mean Filter
ĝ(x,y) = [∏ f(s,t)]^(1/mn)
(s,t)∈Sxy
Properties:
Less blurring than arithmetic mean
Better edge preservation
Good for Gaussian noise
Harmonic Mean Filter
ĝ(x,y) = mn / ∑∑ 1/f(s,t)
(s,t)∈Sxy
Applications:
Salt noise removal
Not suitable for pepper noise
Contraharmonic Mean Filter
ĝ(x,y) = ∑∑ f(s,t)^(Q+1) / ∑∑ f(s,t)^Q
(s,t)∈Sxy (s,t)∈Sxy
Order Q Effects:
Q > 0: Eliminates pepper noise
Q < 0: Eliminates salt noise
Q = 0: Arithmetic mean
Q = -1: Harmonic mean
4.5.3 Order-Statistic Filters
Median Filter
ĝ(x,y) = median{f(s,t), (s,t) ∈ Sxy}
Algorithm:
1. Sort pixel values in neighborhood
2. Select middle value (median)
3. Replace center pixel with median
Example: For 3×3 neighborhood: [10, 15, 20, 25, 100, 30, 35, 40, 45]
Sorted: [10, 15, 20, 25, 30, 35, 40, 45, 100]
Median = 30
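A brute-force 3×3 median filter takes only a few lines with NumPy. The sketch below leaves boundary pixels unchanged for simplicity and also checks the neighborhood median from the example:

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i-1:i+2, j-1:j+2])
    return out

# The neighborhood from the example: its median is 30
patch = np.array([[10, 15, 20], [25, 100, 30], [35, 40, 45]])
print(np.median(patch))   # 30.0
```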
Advantages:
Excellent for impulse noise removal
Preserves edges better than linear filters
Removes salt-and-pepper noise effectively
Max and Min Filters
Max Filter: ĝ(x,y) = max{f(s,t), (s,t) ∈ Sxy}
Min Filter: ĝ(x,y) = min{f(s,t), (s,t) ∈ Sxy}
Applications:
Max filter: Removes pepper noise (dark spots)
Min filter: Removes salt noise (bright spots)
4.5.4 Low-pass Filtering
Ideal Low-pass Filter:
H(u,v) = {1, if D(u,v) ≤ D0
{0, if D(u,v) > D0
Where D(u,v) is distance from origin and D0 is cutoff frequency.
Butterworth Low-pass Filter:
H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)]
Gaussian Low-pass Filter:
H(u,v) = e^(-D²(u,v)/2σ²)
Tony Stark's Filter Analogy: Just like FRIDAY filters out irrelevant information and only shows Tony the
most important data in his HUD, low-pass filters remove high-frequency noise and keep the essential
smooth information in images.
4.6 Image Sharpening
4.6.1 High-pass Filtering
Purpose: Enhance edges and fine details by emphasizing high-frequency components.
Ideal High-pass Filter:
H(u,v) = {0, if D(u,v) ≤ D0
{1, if D(u,v) > D0
Butterworth High-pass Filter:
H(u,v) = 1 / [1 + (D0/D(u,v))^(2n)]
Gaussian High-pass Filter:
H(u,v) = 1 - e^(-D²(u,v)/2σ²)
4.6.2 High-boost Filtering
Definition: Combines original image with high-pass filtered version.
Formula:
g(x,y) = A×f(x,y) - f_lp(x,y)
g(x,y) = (A-1)×f(x,y) + f_hp(x,y)
Where:
A: Amplification factor (A ≥ 1)
f_lp: Low-pass filtered image
f_hp: High-pass filtered image
Special Cases:
A = 1: Standard high-pass filtering
A > 1: High-boost filtering (preserves some low frequencies)
4.6.3 Derivative Filtering
First-Order Derivatives
Gradient Magnitude:
∇f = mag(∇f) = √[(∂f/∂x)² + (∂f/∂y)²]
Simplified Version:
∇f ≈ |∂f/∂x| + |∂f/∂y|
Roberts Cross-Gradient Operators:
Gx = [-1 0] Gy = [ 0 -1]
[ 0 1] [ 1 0]
Sobel Operators:
Gx = [-1 0 1] Gy = [-1 -2 -1]
[-2 0 2] [ 0 0 0]
[-1 0 1] [ 1 2 1]
Prewitt Operators:
Gx = [-1 0 1] Gy = [-1 -1 -1]
[-1 0 1] [ 0 0 0]
[-1 0 1] [ 1 1 1]
Second-Order Derivatives
Laplacian Operator:
∇²f = ∂²f/∂x² + ∂²f/∂y²
Discrete Laplacian Masks:
Basic: [0 -1 0]
[-1 4 -1]
[0 -1 0]
Extended: [-1 -1 -1]
[-1 8 -1]
[-1 -1 -1]
Laplacian Sharpening:
g(x,y) = f(x,y) + c×∇²f(x,y)
Where c = -1 if center coefficient is negative, +1 if positive.
Example Problem: Apply basic Laplacian to 3×3 region:
Input: [1 2 1] Laplacian: [0 -1 0]
[2 3 2] [-1 4 -1]
[1 2 1] [0 -1 0]
Solution: Center pixel calculation:
Result = 0×1 + (-1)×2 + 0×1 + (-1)×2 + 4×3 + (-1)×2 + 0×1 + (-1)×2 + 0×1
Result = 0 - 2 + 0 - 2 + 12 - 2 + 0 - 2 + 0 = 4
Sharpened value = 3 + (+1)×4 = 7 (the mask's center coefficient is positive, so c = +1)
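The same calculation in NumPy, treating the mask application as a correlation over the 3×3 region:

```python
import numpy as np

laplacian = np.array([[0, -1,  0],
                      [-1, 4, -1],
                      [0, -1,  0]], dtype=float)

region = np.array([[1, 2, 1],
                   [2, 3, 2],
                   [1, 2, 1]], dtype=float)

response = np.sum(laplacian * region)   # Laplacian response at the center pixel
print(response)                         # 4.0

# Positive center coefficient, so c = +1 in g = f + c * Laplacian(f)
print(region[1, 1] + 1 * response)      # 7.0
```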
4.7 Homomorphic Filtering
Concept
Problem: Image illumination varies across the scene, making enhancement difficult.
Solution: Separate illumination and reflectance components for independent processing.
Mathematical Foundation
Image Model:
f(x,y) = i(x,y) × r(x,y)
Logarithmic Transformation:
ln f(x,y) = ln i(x,y) + ln r(x,y)
Fourier Transform:
F(u,v) = I(u,v) + R(u,v)
Homomorphic Filter Design
H(u,v) = (γH - γL)[1 - e^(-c[D²(u,v)/D0²])] + γL
Where:
γL: Gain for low frequencies (illumination)
γH: Gain for high frequencies (reflectance)
c: Controls filter sharpness
D0: Cutoff frequency
Typical Values:
γL < 1: Suppress illumination variations
γH > 1: Enhance reflectance (details)
Algorithm Steps
1. Take natural logarithm of image
2. Apply FFT
3. Multiply by homomorphic filter
4. Apply inverse FFT
5. Take exponential
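A minimal sketch of these five steps, assuming NumPy; the Gaussian-shaped transfer function and the default parameter values are illustrative choices rather than fixed parts of the method:

```python
import numpy as np

def homomorphic_filter(img, gamma_L=0.5, gamma_H=2.0, c=1.0, D0=30.0):
    """ln -> FFT -> multiply by H(u,v) -> inverse FFT -> exp (sketch)."""
    img = np.asarray(img, dtype=float)
    z = np.log1p(img)                               # step 1: ln(1 + f) avoids log(0)
    Z = np.fft.fftshift(np.fft.fft2(z))             # step 2: FFT with origin centered
    M, N = img.shape
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2          # squared distance from the center
    H = (gamma_H - gamma_L) * (1 - np.exp(-c * D2 / D0 ** 2)) + gamma_L
    S = np.fft.ifft2(np.fft.ifftshift(H * Z))       # steps 3-4: filter, inverse FFT
    return np.expm1(np.real(S))                     # step 5: exponential (inverse of log1p)
```

With γL < 1 and γH > 1, this attenuates the slowly varying illumination while boosting the reflectance detail, as described above.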
Doctor Strange's Reality Manipulation Analogy: Just like Doctor Strange can separate and manipulate
different layers of reality independently, homomorphic filtering separates the illumination layer from the
detail layer, allowing us to enhance each component optimally before recombining them.
4.8 Enhancement in Frequency Domain
Frequency Domain vs Spatial Domain
Advantages of Frequency Domain:
Insight into image structure
Efficient implementation using FFT
Better understanding of filtering effects
Easier design of complex filters
Convolution Theorem Application:
Spatial: g(x,y) = h(x,y) * f(x,y)
Frequency: G(u,v) = H(u,v) × F(u,v)
Filter Design Considerations
Ringing Effects
Cause: Sharp cutoffs in frequency domain
Solution: Use smooth transitions (Butterworth, Gaussian)
Filter Selection Criteria
1. Smoothness: Gaussian > Butterworth > Ideal
2. Ringing: Ideal (most) > Butterworth > Gaussian (least)
3. Computation: All similar with FFT
Practical Implementation
Steps:
1. Pad image to avoid wraparound effects
2. Center FFT by multiplying f(x,y) by (-1)^(x+y)
3. Apply filter in frequency domain
4. Take inverse FFT
5. Take real part and extract original size region
🧠 Module 4 Quick Quiz
Q1: What gamma value would you use to brighten a dark image?
Q2: Which filter is best for removing salt-and-pepper noise?
Q3: What is the main advantage of homomorphic filtering?
Answers:
1. γ < 1 (values like 0.4, 0.5, 0.6)
2. Median filter
3. Simultaneous illumination correction and contrast enhancement
🎯 Exam Tips for Module 4
Histogram equalization calculations - step-by-step approach crucial
Gamma transformation effects - understand conceptually
Filter mask applications - practice convolution calculations
Frequency vs spatial domain - know when to use which approach
Likely Exam Questions:
1. Perform histogram equalization on a given image. (10 marks)
2. Compare different mean filters for noise removal. (8 marks)
3. Design and apply Laplacian sharpening filter. (7 marks)
4. Explain homomorphic filtering with block diagram. (8 marks)
Module 5: Image Restoration [7L]
🎯 Learning Objectives
Understand image degradation models
Learn restoration techniques vs enhancement
Master algebraic and constrained restoration approaches
Apply geometric transformations and interpolation methods
5.1 Introduction to Image Restoration
Restoration vs Enhancement
Image Enhancement:
Subjective process
Improves visual appearance
No prior knowledge of degradation
Trial and error approach
Image Restoration:
Objective process
Recovers original image from degraded version
Uses mathematical models of degradation
More systematic approach
Miles Morales' Glitch Analogy: Just like Miles had to restore his unstable molecular structure using
specific scientific knowledge about his glitching (not just trying random fixes), image restoration uses
mathematical models to systematically undo specific types of degradation.
5.2 Degradation Model
Mathematical Model
Spatial Domain:
g(x,y) = h(x,y) * f(x,y) + η(x,y)
Frequency Domain:
G(u,v) = H(u,v) × F(u,v) + N(u,v)
Where:
f(x,y): Original image
g(x,y): Degraded/observed image
h(x,y): Degradation function (PSF - Point Spread Function)
η(x,y): Additive noise
H(u,v): Degradation transfer function
N(u,v): Noise in frequency domain
Types of Degradation
Linear Degradation
Motion blur: Camera/object movement
Out-of-focus blur: Improper focusing
Atmospheric turbulence: Distortion due to air movement
Nonlinear Degradation
Geometric distortion: Lens aberrations
Intensity-dependent effects: Saturation, clipping
Common Point Spread Functions
Motion Blur PSF
For linear motion of length L at angle θ:
H(u,v) = [sin(π(u cos θ + v sin θ)L) / π(u cos θ + v sin θ)L] × e^(-jπ(u cos θ + v sin θ)L)
Out-of-focus Blur (Pillbox)
h(x,y) = {1/πR², if x² + y² ≤ R²
{0, otherwise
Where R is the radius of the blur circle.
Atmospheric Turbulence
H(u,v) = e^(-k(u² + v²)^(5/6))
Where k is a constant related to turbulence strength.
5.3 Discrete Formulation
Matrix-Vector Representation
For an M×N image, the degradation model becomes:
g = Hf + η
Where:
g: MN×1 degraded image vector
H: MN×MN degradation matrix
f: MN×1 original image vector
η: MN×1 noise vector
Circulant Matrix Properties
For spatially invariant degradation:
H is a block-circulant matrix
Diagonalized by 2D DFT
Enables efficient computation
Example: 4×4 Image Degradation Matrix: For simple horizontal averaging (h = [0.5, 0.5]):
H = [0.5 0.5 0 0 ]
[0 0.5 0.5 0 ]
[0 0 0.5 0.5]
[0.5 0 0 0.5]
5.4 Algebraic Approach to Restoration
5.4.1 Unconstrained Restoration
Direct Matrix Inversion
Ideal Solution:
f̂ = H⁻¹g = H⁻¹(Hf + η) = f + H⁻¹η
Problems:
H may not be invertible
Noise amplification: H⁻¹η can be very large
Computationally expensive for large images
Least Squares Solution
Minimize:
||Hf - g||²
Solution:
f̂ = (H^T H)⁻¹H^T g
Where H^T is the transpose of H.
5.4.2 Constrained Restoration
Problem with Unconstrained Approach
Noise amplification
Unrealistic solutions
Instability
Adding Constraints
Minimize:
||Hf - g||² + λ||Cf||²
Where:
C: Constraint operator (usually smoothness)
λ: Regularization parameter
Solution:
f̂ = (H^T H + λC^T C)⁻¹H^T g
5.5 Constrained Least Square Restoration
Wiener Filter Approach
Frequency Domain Solution:
F̂(u,v) = [H*(u,v) / (|H(u,v)|² + γ|C(u,v)|²)] G(u,v)
Where:
H*(u,v): Complex conjugate of H(u,v)
γ: Regularization parameter
C(u,v): Constraint function in frequency domain
Parameter Selection
Method 1: Noise-to-Signal Ratio
γ = |N(u,v)|² / |F(u,v)|²
Method 2: Generalized Cross-Validation
Automatically selects optimal γ value.
Method 3: L-curve Method
Plot ||Hf̂ - g||² vs ||Cf̂||² and select corner point.
Example Problem: Wiener Filtering
Given degraded image with:
H(u,v) = 1/(1 + 0.1(u² + v²))
Noise-to-signal ratio γ = 0.01
Solution:
F̂(u,v) = [H*(u,v) / (|H(u,v)|² + 0.01)] G(u,v)
For u=1, v=1:
H(1,1) = 1/(1 + 0.1×2) = 1/1.2 = 0.833
|H(1,1)|² = 0.694
F̂(1,1) = [0.833 / (0.694 + 0.01)] G(1,1) = 1.18 G(1,1)
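The frequency-domain formula maps directly onto array operations. A sketch assuming NumPy, with the constraint term reduced to a constant γ as in the example:

```python
import numpy as np

def wiener_restore(G, H, gamma=0.01):
    """F_hat = conj(H) / (|H|^2 + gamma) * G, element-wise in the frequency domain."""
    H = np.asarray(H, dtype=complex)
    return np.conj(H) / (np.abs(H) ** 2 + gamma) * np.asarray(G, dtype=complex)

# The single frequency bin from the worked example: H(1,1) = 1/1.2, gamma = 0.01
H11 = 1 / (1 + 0.1 * (1 + 1))
print(np.conj(H11) / (abs(H11) ** 2 + 0.01))   # ~1.18, the gain applied to G(1,1)
```

In practice G and H would be the full 2D FFTs of the degraded image and the PSF, and the restored image is the inverse FFT of the returned array.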
5.6 Restoration by Homomorphic Filtering
Application to Multiplicative Noise
Degradation Model:
g(x,y) = f(x,y) × n(x,y)
Logarithmic Transformation:
ln g(x,y) = ln f(x,y) + ln n(x,y)
Process Steps
1. Take logarithm of degraded image
2. Apply appropriate filter in frequency domain
3. Take inverse transform
4. Take exponential to get restored image
Filter Design:
Low-pass for multiplicative noise in low frequencies
High-pass for multiplicative noise in high frequencies
Gwen Stacy's Art Style Analogy: Just like Gwen's comic book art style separates and processes different
visual elements (colors, lines, shading) independently before combining them into her unique aesthetic,
homomorphic restoration separates multiplicative components for independent processing.
5.7 Geometric Transformation
5.7.1 Spatial Transformation
Forward Mapping
(x', y') = T(x, y)
Problems:
Holes in output image
Overlapping pixels
Irregular sampling
Inverse Mapping (Preferred)
(x, y) = T⁻¹(x', y')
Advantages:
No holes in output
One-to-one correspondence
Better control over output
Common Geometric Distortions
Barrel Distortion
r' = r(1 + k₁r² + k₂r⁴)
Pincushion Distortion
r' = r(1 - k₁r² - k₂r⁴)
Perspective Distortion
x' = (a₁x + a₂y + a₃)/(c₁x + c₂y + 1)
y' = (b₁x + b₂y + b₃)/(c₁x + c₂y + 1)
Parameter Estimation
Using Control Points:
Identify corresponding points in distorted and reference images
Set up system of equations
Solve for transformation parameters using least squares
Example: Affine Transformation Given control points:
(x₁,y₁) → (x₁',y₁')
(x₂,y₂) → (x₂',y₂')
(x₃,y₃) → (x₃',y₃')
Solve system:
[x₁' y₁' 1] [a₁] = [x₁]
[x₂' y₂' 1] [a₂] [x₂]
[x₃' y₃' 1] [a₃] [x₃]
5.8 Gray Level Interpolation
Need for Interpolation
After geometric transformation, mapped coordinates are usually non-integer, requiring interpolation to
determine pixel values.
5.8.1 Nearest Neighbor Interpolation
f(x,y) = f(round(x), round(y))
Advantages:
Simple and fast
Preserves original pixel values
No new gray levels introduced
Disadvantages:
Blocky appearance
Poor quality for large transformations
5.8.2 Bilinear Interpolation
Formula:
f(x,y) = f(0,0)(1-x)(1-y) + f(1,0)x(1-y) + f(0,1)(1-x)y + f(1,1)xy
Where (x,y) are fractional parts of coordinates.
Example Problem: Interpolate at (1.3, 2.7) given:
f(1,2) = 100
f(2,2) = 120
f(1,3) = 110
f(2,3) = 130
Solution:
x = 0.3, y = 0.7
f(1.3,2.7) = 100×(1-0.3)×(1-0.7) + 120×0.3×(1-0.7) + 110×(1-0.3)×0.7 + 130×0.3×0.7
f(1.3,2.7) = 100×0.7×0.3 + 120×0.3×0.3 + 110×0.7×0.7 + 130×0.3×0.7
f(1.3,2.7) = 21 + 10.8 + 53.9 + 27.3 = 113
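A direct implementation of the bilinear formula, assuming NumPy and the (row, column) indexing used in the example:

```python
import numpy as np

def bilinear(img, x, y):
    """Interpolate img at non-integer (x=row, y=column) from the 4 surrounding pixels."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (img[x0,     y0]     * (1 - dx) * (1 - dy) +
            img[x0 + 1, y0]     * dx       * (1 - dy) +
            img[x0,     y0 + 1] * (1 - dx) * dy +
            img[x0 + 1, y0 + 1] * dx       * dy)

# Reproduce the worked example: f(1,2)=100, f(2,2)=120, f(1,3)=110, f(2,3)=130
img = np.zeros((4, 4))
img[1, 2], img[2, 2], img[1, 3], img[2, 3] = 100, 120, 110, 130
print(bilinear(img, 1.3, 2.7))   # 113.0
```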
5.8.3 Bicubic Interpolation
Uses 4×4 neighborhood:
f(x,y) = ∑(i=0 to 3) ∑(j=0 to 3) aᵢⱼ xⁱ yʲ
Advantages:
Smoother results than bilinear
Better edge preservation
Higher computational cost justified for quality
Disadvantages:
More complex computation
May introduce slight overshoot
Interpolation Comparison
Method   | Neighborhood | Quality | Speed   | Applications
Nearest  | 1×1          | Poor    | Fastest | Binary images, quick preview
Bilinear | 2×2          | Good    | Fast    | General purpose
Bicubic  | 4×4          | Best    | Slow    | High-quality applications
5.9 Advanced Restoration Techniques
Iterative Restoration
Van Cittert Algorithm:
f^(k+1) = f^(k) + β[g - Hf^(k)]
Landweber Algorithm:
f^(k+1) = f^(k) + βH^T[g - Hf^(k)]
Blind Deconvolution
Problem: Both f(x,y) and h(x,y) are unknown.
Approaches:
Maximum likelihood estimation
Bayesian methods
Neural network approaches
🧠 Module 5 Quick Quiz
Q1: What is the main difference between image enhancement and restoration?
Q2: Why is inverse filtering often problematic?
Q3: Which interpolation method provides the best quality?
Answers:
1. Enhancement is subjective, restoration is objective with mathematical models
2. Noise amplification and possible non-invertible degradation matrix
3. Bicubic interpolation (uses 4×4 neighborhood)
🎯 Exam Tips for Module 5
Degradation model equations - fundamental for all restoration methods
Wiener filter derivation - understand the mathematical foundation
Interpolation calculations - practice with numerical examples
Geometric transformation matrices - know the standard forms
Likely Exam Questions:
1. Derive the Wiener filter and explain its significance. (10 marks)
2. Compare different interpolation methods with examples. (8 marks)
3. Explain geometric distortion correction process. (7 marks)
4. Solve a bilinear interpolation numerical problem. (5 marks)
Module 6: Image Segmentation [7L]
Image Processing Module 6: Image Segmentation
Comprehensive BTech Notes
🎯 MODULE OVERVIEW
Image Segmentation is the process of partitioning a digital image into multiple segments or regions to
simplify and/or change the representation of an image into something more meaningful and easier to
analyze.
Harvey Specter Analogy: Think of image segmentation like Harvey dividing a complex legal case into
smaller, manageable parts - each segment represents a distinct region with similar characteristics, making
the overall analysis much more efficient!
📚 DETAILED CONTENT
1. POINT DETECTION
1.1 Definition and Concept
Point Detection is the process of identifying isolated points in an image that differ significantly from
their surroundings. These points are characterized by having intensity values that are distinctly different
from their neighbors.
1.2 Mathematical Foundation
The point detection mask (Laplacian operator) is typically:
-1 -1 -1
-1 8 -1
-1 -1 -1
Formula:
R = |∑∑ w(s,t) × f(x+s, y+t)|
Where:
R = Response at point (x,y)
w(s,t) = Mask coefficient
f(x+s, y+t) = Image intensity
Threshold Condition: Point is detected if R ≥ T (threshold)
1.3 Numerical Example
Problem: Detect points in a 3×3 image region:
Image Region:
10 10 10
10 50 10
10 10 10
Solution:
Response R = |-1(10) + -1(10) + -1(10) + -1(10) + 8(50) + -1(10) + -1(10) + -1(10) + -1(10)|
R = |-10 - 10 - 10 - 10 + 400 - 10 - 10 - 10 - 10|
R = |320| = 320
If threshold T = 100, then R > T, so point is detected!
1.4 Applications
Medical imaging (tumor detection)
Quality control in manufacturing
Astronomical image analysis
Defect detection in materials
1.5 Marvel Analogy
Point detection is like Nick Fury's ability to spot anomalies in S.H.I.E.L.D. operations - identifying single
unusual events that stand out from normal patterns!
2. LINE DETECTION
2.1 Definition and Types
Line Detection identifies linear features in images using directional masks that respond maximally to
lines in specific orientations.
2.2 Standard Line Detection Masks
Horizontal Lines:
-1 -1 -1
2 2 2
-1 -1 -1
Vertical Lines:
-1 2 -1
-1 2 -1
-1 2 -1
+45° Diagonal:
-1 -1 2
-1 2 -1
2 -1 -1
-45° Diagonal:
2 -1 -1
-1 2 -1
-1 -1 2
2.3 Mathematical Approach
For each mask Mi (i = 1,2,3,4):
Ri = |∑∑ Mi(s,t) × f(x+s, y+t)|
Line Direction: Direction corresponding to max(R1, R2, R3, R4)
2.4 Numerical Example
Problem: Detect line orientation in a 3×3 region:
Image:
5 5 5
20 20 20
5 5 5
Solution using Horizontal Mask:
R1 = |-1(5) + -1(5) + -1(5) + 2(20) + 2(20) + 2(20) + -1(5) + -1(5) + -1(5)|
R1 = |-15 + 120 - 15| = |90| = 90
This gives maximum response for horizontal mask, indicating a horizontal line!
2.5 Applications
Road detection in satellite imagery
Document analysis (text line detection)
Industrial inspection
Medical imaging (blood vessel detection)
2.6 Suits Reference
Line detection is like Mike Ross analyzing legal documents - he can quickly identify different types of
arguments (horizontal = facts, vertical = precedents, diagonal = connections) in complex legal texts!
3. EDGE DETECTION
3.1 Fundamental Concepts
Edge is a significant local change in image intensity, usually associated with:
Discontinuities in depth
Surface orientation changes
Material property variations
Illumination changes
3.2 Types of Edges
Step Edge: Abrupt intensity change
Intensity Profile: _____|‾‾‾‾‾
Ramp Edge: Gradual intensity change
Intensity Profile: ____/‾‾‾‾‾
Roof Edge: Peak or valley
Intensity Profile: ___/\___
3.3 Edge Detection Operators
3.3.1 Gradient-Based Operators
First-Order Derivatives:
Sobel Operator:
Gx = [-1 0 1] Gy = [-1 -2 -1]
[-2 0 2] [ 0 0 0]
[-1 0 1] [ 1 2 1]
Prewitt Operator:
Gx = [-1 0 1] Gy = [-1 -1 -1]
[-1 0 1] [ 0 0 0]
[-1 0 1] [ 1 1 1]
Roberts Cross-Gradient:
Gx = [ 1 0] Gy = [ 0 1]
[ 0 -1] [-1 0]
Gradient Magnitude:
|∇f| = √(Gx² + Gy²) ≈ |Gx| + |Gy|
Gradient Direction:
θ = arctan(Gy/Gx)
3.3.2 Second-Order Derivatives
Laplacian Operator:
∇²f = ∂²f/∂x² + ∂²f/∂y²
Discrete Laplacian Masks:
Standard: Diagonal:
0 -1 0 -1 -1 -1
-1 4 -1 -1 8 -1
0 -1 0 -1 -1 -1
3.4 Advanced Edge Detectors
3.4.1 Canny Edge Detector
Steps:
1. Noise Reduction: Apply Gaussian filter
2. Gradient Calculation: Find intensity gradients
3. Non-Maximum Suppression: Thin edges to single pixels
4. Double Thresholding: Use high and low thresholds
5. Edge Tracking: Connect edge pixels
Mathematical Foundation:
Gaussian Filter: G(x,y) = (1/2πσ²)e^(-(x²+y²)/2σ²)
Gradient: ∇f = [Gx, Gy]
Non-max suppression along gradient direction
3.4.2 Marr-Hildreth (LoG) Detector
Concept: Uses Laplacian of Gaussian
LoG(x,y) = -(1/πσ⁴)[1 - (x²+y²)/2σ²]e^(-(x²+y²)/2σ²)
3.5 Comprehensive Numerical Example
Problem: Apply Sobel operator to detect edges:
Image Region:
100 100 100
100 100 100
200 200 200
Solution: Gx Calculation:
Gx = -1(100) + 0(100) + 1(100) + -2(100) + 0(100) + 2(100) + -1(200) + 0(200) + 1(200)
= -100 + 0 + 100 - 200 + 0 + 200 - 200 + 0 + 200
= 0
Gy Calculation:
Gy = -1(100) + -2(100) + -1(100) + 0(100) + 0(100) + 0(100) + 1(200) + 2(200) + 1(200)
= -100 - 200 - 100 + 0 + 0 + 0 + 200 + 400 + 200
= 400
Edge Magnitude: |∇f| = √(0² + 400²) = 400
Gradient Direction: θ = arctan(Gy/Gx) = arctan(400/0) = 90°. The gradient points vertically, so the edge itself is horizontal (the intensity jump lies between the second and third rows).
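The same computation in NumPy; using arctan2 avoids the division by zero when Gx = 0:

```python
import numpy as np

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

region = np.array([[100, 100, 100],
                   [100, 100, 100],
                   [200, 200, 200]], dtype=float)

gx = np.sum(sobel_x * region)                  # horizontal-gradient response
gy = np.sum(sobel_y * region)                  # vertical-gradient response
magnitude = np.hypot(gx, gy)
direction = np.degrees(np.arctan2(gy, gx))     # safe even when gx = 0
print(gx, gy, magnitude, direction)            # 0.0 400.0 400.0 90.0
```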
3.6 Applications
Object recognition and tracking
Medical image analysis
Autonomous vehicle navigation
Quality control in manufacturing
Fingerprint analysis
4. COMBINED DETECTION
4.1 Definition and Approach
Combined Detection integrates multiple detection methods to achieve more robust and accurate results
than individual techniques.
4.2 Combination Strategies
4.2.1 Logical Combinations
AND Operation: Edge detected only if ALL methods agree
OR Operation: Edge detected if ANY method agrees
Majority Voting: Edge detected if majority of methods agree
4.2.2 Weighted Combinations
Combined_Response = w1×R1 + w2×R2 + w3×R3
Where wi are weights and Ri are individual responses.
4.3 Multi-Scale Detection
Combines responses at different scales using various filter sizes:
Final_Edge = α×Edge_small + β×Edge_medium + γ×Edge_large
4.4 Iron Man Analogy
Combined detection is like Tony Stark's FRIDAY AI system - it processes information from multiple sensors
(visual, thermal, electromagnetic) to get a complete picture of threats, just like combining different edge
detectors gives us better edge information!
5. EDGE LINKING & BOUNDARY DETECTION
5.1 Introduction
After edge detection, individual edge pixels need to be connected to form meaningful boundaries. This
process is called Edge Linking.
5.2 Local Processing
5.2.1 Similarity Criteria
Two edge pixels are linked if they satisfy:
1. Magnitude Similarity: |M(p) - M(q)| < T_M
2. Direction Similarity: |α(p) - α(q)| < T_α
Where:
M(p), M(q) = Edge magnitudes at pixels p and q
α(p), α(q) = Edge directions at pixels p and q
T_M, T_α = Thresholds
5.2.2 Local Linking Algorithm
For each edge pixel p:
1. Examine its 8-neighbors
2. Link to neighbors satisfying similarity criteria
3. Mark linked pixels to avoid re-processing
4. Continue until no more linking possible
5.3 Global Processing via Hough Transform
5.3.1 Basic Hough Transform (Lines)
Principle: Transform image space (x,y) to parameter space (ρ,θ)
Line Equation: ρ = x cos θ + y sin θ
Where:
ρ = perpendicular distance from origin to line
θ = angle of perpendicular with x-axis
Algorithm:
1. For each edge pixel (xi, yi)
2. For θ = 0° to 180° (in steps)
3. Calculate ρ = xi cos θ + yi sin θ
4. Increment accumulator A(ρ, θ)
5. Find peaks in accumulator → lines
5.3.2 Generalized Hough Transform
For arbitrary shapes defined by parameters (a1, a2, ..., an):
1. Create accumulator array for all parameters
2. For each edge pixel and orientation
3. Increment all possible parameter combinations
4. Detect peaks in n-dimensional parameter space
5.4 Detailed Numerical Example
Problem: Use Hough Transform to detect a line through points (0,0), (1,1), (2,2)
Solution: For point (0,0):
θ = 0°: ρ = 0×cos(0°) + 0×sin(0°) = 0
θ = 45°: ρ = 0×cos(45°) + 0×sin(45°) = 0
θ = 90°: ρ = 0×cos(90°) + 0×sin(90°) = 0
For point (1,1):
θ = 0°: ρ = 1×cos(0°) + 1×sin(0°) = 1
θ = 45°: ρ = 1×cos(45°) + 1×sin(45°) = √2
θ = 90°: ρ = 1×cos(90°) + 1×sin(90°) = 1
For point (2,2):
θ = 0°: ρ = 2×cos(0°) + 2×sin(0°) = 2
θ = 45°: ρ = 2×cos(45°) + 2×sin(45°) = 2√2
θ = 90°: ρ = 2×cos(90°) + 2×sin(90°) = 2
At θ = 135°, every point gives ρ = 0 (e.g., for (1,1): ρ = cos 135° + sin 135° = 0), so the accumulator peak occurs at (ρ, θ) = (0, 135°), representing the line y = x, which passes through the origin.
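A toy accumulator over a coarse set of angles shows the voting. This sketch assumes NumPy and quantizes ρ by simple rounding, which is enough for these three points:

```python
import numpy as np

def hough_lines(points, thetas_deg=(0, 45, 90, 135)):
    """Accumulate rho = x*cos(theta) + y*sin(theta) votes for each edge point."""
    votes = {}
    for x, y in points:
        for t in thetas_deg:
            rho = round(x * np.cos(np.radians(t)) + y * np.sin(np.radians(t)), 3)
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return votes

votes = hough_lines([(0, 0), (1, 1), (2, 2)])
print(max(votes, key=votes.get))   # (0.0, 135): all three points vote for rho=0, theta=135°
```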
5.5 Applications
Lane detection in autonomous vehicles
Architectural structure analysis
Document analysis and text line detection
Circle and ellipse detection in medical imaging
6. THRESHOLDING
6.1 Foundation of Thresholding
6.1.1 Basic Concept
Thresholding is the simplest method of image segmentation that creates binary images from grayscale
images.
Mathematical Definition:
g(x,y) = { 1 if f(x,y) ≥ T
{ 0 if f(x,y) < T
Where:
f(x,y) = input grayscale image
g(x,y) = output binary image
T = threshold value
6.1.2 Histogram Analysis
The effectiveness of thresholding depends on the histogram characteristics:
Bimodal Histogram: Two distinct peaks → Easy thresholding
Multimodal Histogram: Multiple peaks → Multiple thresholds needed
Unimodal Histogram: Single peak → Difficult thresholding
6.2 Simple Global Thresholding
6.2.1 Basic Global Thresholding Algorithm
1. Select initial threshold T
2. Segment image using T: G1 (pixels ≤ T), G2 (pixels > T)
3. Calculate mean intensities: μ1 = mean(G1), μ2 = mean(G2)
4. Update threshold: T_new = (μ1 + μ2)/2
5. Repeat steps 2-4 until |T_new - T| < ΔT
6.2.2 Numerical Example
Problem: Find optimal threshold for histogram with pixels: Region 1: 50 pixels with intensity 20-40 (mean
= 30) Region 2: 100 pixels with intensity 80-120 (mean = 100)
Solution:
Initial T = (30 + 100)/2 = 65
Iteration 1:
G1: pixels ≤ 65 (Region 1), μ1 = 30
G2: pixels > 65 (Region 2), μ2 = 100
T_new = (30 + 100)/2 = 65
Since T_new = T, optimal threshold = 65
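A sketch of the iterative algorithm, assuming NumPy; the stopping tolerance ΔT = 0.5 is an arbitrary illustrative choice:

```python
import numpy as np

def global_threshold(img, eps=0.5):
    """Iterative mean-of-means thresholding (the algorithm above)."""
    img = np.asarray(img, dtype=float)
    T = img.mean()                       # initial threshold: overall mean intensity
    while True:
        g1, g2 = img[img <= T], img[img > T]
        T_new = 0.5 * (g1.mean() + g2.mean())
        if abs(T_new - T) < eps:
            return T_new
        T = T_new

# Two populations as in the worked example: means of about 30 and 100
pixels = np.concatenate([np.full(50, 30.0), np.full(100, 100.0)])
print(global_threshold(pixels))          # 65.0
```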
6.3 Optimal Thresholding (Otsu's Method)
6.3.1 Otsu's Algorithm
Objective: Minimize within-class variance or maximize between-class variance
Mathematical Framework:
σ_w²(T) = ω1(T)σ1²(T) + ω2(T)σ2²(T)
σ_b²(T) = ω1(T)ω2(T)[μ1(T) - μ2(T)]²
Where:
ω1, ω2 = probabilities of two classes
σ1², σ2² = variances of two classes
μ1, μ2 = means of two classes
Optimal Threshold: T* = argmax{σ_b²(T)}
6.3.2 Detailed Otsu's Implementation
For each possible threshold T (0 to L-1):
1. Calculate ω1(T) = Σ(i=0 to T) p(i)
2. Calculate ω2(T) = Σ(i=T+1 to L-1) p(i)
3. Calculate μ1(T) = Σ(i=0 to T) i×p(i)/ω1(T)
4. Calculate μ2(T) = Σ(i=T+1 to L-1) i×p(i)/ω2(T)
5. Calculate σ_b²(T) = ω1(T)ω2(T)[μ1(T) - μ2(T)]²
6. Find T* = argmax{σ_b²(T)}
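A direct translation of these steps into NumPy, operating on a gray-level histogram; the toy bimodal histogram at the end is only for illustration:

```python
import numpy as np

def otsu_threshold(hist):
    """Return the threshold T that maximizes the between-class variance."""
    hist = np.asarray(hist, dtype=float)
    p = hist / hist.sum()
    levels = np.arange(len(p))
    best_T, best_var = 0, -1.0
    for T in range(len(p) - 1):
        w1, w2 = p[:T + 1].sum(), p[T + 1:].sum()
        if w1 == 0 or w2 == 0:
            continue
        mu1 = (levels[:T + 1] * p[:T + 1]).sum() / w1
        mu2 = (levels[T + 1:] * p[T + 1:]).sum() / w2
        var_b = w1 * w2 * (mu1 - mu2) ** 2          # between-class variance
        if var_b > best_var:
            best_T, best_var = T, var_b
    return best_T

# Toy bimodal histogram over 8 gray levels with peaks at levels 1 and 6
print(otsu_threshold([5, 20, 5, 0, 0, 5, 20, 5]))   # 2: levels 0-2 vs. levels 3-7
```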
6.4 Advanced Thresholding Techniques
6.4.1 Adaptive Thresholding
Threshold varies spatially based on local image characteristics:
T(x,y) = mean(neighborhood) - C
or
T(x,y) = weighted_mean(neighborhood) - C
6.4.2 Multi-Level Thresholding
For images with multiple regions:
g(x,y) = { 0 if f(x,y) ≤ T1
{ 1 if T1 < f(x,y) ≤ T2
{ 2 if T2 < f(x,y) ≤ T3
{ ...
6.5 Doctor Strange Analogy
Thresholding is like Doctor Strange using the Time Stone to separate different timelines - he sets a
threshold (decision point) to distinguish between realities where different outcomes occur, just like we set
intensity thresholds to separate objects from background!
7. REGION-ORIENTED SEGMENTATION
7.1 Basic Formulation
7.1.1 Region Definition
A region R in image segmentation must satisfy:
1. Connectivity: All pixels in R are connected
2. Homogeneity: P(R) = TRUE (uniform property)
3. Maximality: P(R ∪ R') = FALSE for any adjacent region R'
7.1.2 Mathematical Formulation
Complete Segmentation: ⋃(i=1 to n) Ri = R (entire image)
Non-overlapping: Ri ∩ Rj = ∅ for i ≠ j
Connected: Ri is connected ∀i
Uniform: P(Ri) = TRUE ∀i
Distinct: P(Ri ∪ Rj) = FALSE for adjacent regions
7.2 Region Growing by Pixel Aggregation
7.2.1 Basic Algorithm
1. Select seed points (manually or automatically)
2. For each seed point:
a. Examine neighbors of current region
b. Add pixels satisfying similarity criteria
c. Update region properties
d. Repeat until no more pixels can be added
3. Merge or split regions based on criteria
7.2.2 Similarity Criteria
Intensity-based:
|f(x,y) - μR| < T
Where μR is the mean intensity of region R.
Statistical-based:
|f(x,y) - μR| < k×σR
Where σR is the standard deviation of region R.
Gradient-based:
|∇f(x,y)| < T_gradient
7.2.3 Detailed Numerical Example
Problem: Apply region growing to a 5×5 image with seed at (2,2):
Image:
10 12 45 47 46
11 10 44 48 45
12 11 43 46 47
45 44 43 44 45
46 45 44 43 44
Solution (using threshold T = 5, coordinates given as (row, column), 0-based):
Seed: (2,2) with intensity 43
Step 1: Check the 4-neighbors of (2,2)
- (1,2): |44-43| = 1 ≤ 5 ✓ → Add to region
- (3,2): |43-43| = 0 ≤ 5 ✓ → Add to region
- (2,1): |11-43| = 32 > 5 ✗
- (2,3): |46-43| = 3 ≤ 5 ✓ → Add to region
Step 2: Check the neighbors of the newly added pixels
Continue until no more pixels satisfy the criterion...
Final Region: the connected block of intensities 43-48, i.e., rows 0-2 in columns 2-4 plus all of rows 3-4 (19 pixels); a sketch of this procedure follows below.
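A breadth-first sketch of 4-connected region growing, assuming NumPy. For simplicity it compares each candidate pixel to the seed intensity rather than to the running region mean:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, T=5):
    """Grow a 4-connected region from seed while |f(p) - f(seed)| <= T."""
    img = np.asarray(img, dtype=float)
    seed_val = img[seed]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and (nr, nc) not in region
                    and abs(img[nr, nc] - seed_val) <= T):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = np.array([[10, 12, 45, 47, 46],
                [11, 10, 44, 48, 45],
                [12, 11, 43, 46, 47],
                [45, 44, 43, 44, 45],
                [46, 45, 44, 43, 44]])
print(len(region_grow(img, (2, 2), T=5)))   # 19 pixels: rows 0-2 in columns 2-4 plus rows 3-4
```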
7.3 Region Splitting and Merging
7.3.1 Quadtree-Based Splitting
Algorithm:
Split(Region R):
1. If P(R) = TRUE, stop
2. Else, divide R into 4 quadrants
3. Apply Split() to each quadrant recursively
7.3.2 Merging Algorithm
Merge():
1. For each pair of adjacent regions Ri, Rj:
2. If P(Ri ∪ Rj) = TRUE, merge them
3. Repeat until no more merging possible
7.3.3 Split-and-Merge Algorithm
1. Start with entire image as single region
2. Apply splitting recursively
3. Apply merging to resulting regions
4. Repeat until stability achieved
7.4 Advanced Region Growing Techniques
7.4.1 Seeded Region Growing (SRG)
Uses multiple seeds simultaneously with priority queue:
1. Initialize all seed points in priority queue
2. While queue not empty:
a. Extract pixel with highest priority
b. Assign to most similar neighboring region
c. Add unassigned neighbors to queue
7.4.2 Watershed Segmentation
Treats image as topographic surface:
Catchment Basin: Region around local minimum
Watershed Lines: Boundaries between basins
Flooding Process: Gradual water level increase
Mathematical Model:
Geodesic Distance: dg(p,q) = min{length of path from p to q}
Watershed Transform based on topological distance
7.5 Performance Evaluation Metrics
7.5.1 Quantitative Measures
Uniformity: U = 1 - (σR²/μR²)
Region Contrast: C = |μR1 - μR2|/(μR1 + μR2)
Edge Correspondence: EC = |detected_edges ∩ true_edges|/|true_edges|
7.5.2 Qualitative Assessment
Over-segmentation: Too many small regions
Under-segmentation: Too few large regions
Boundary Accuracy: How well boundaries match ground truth
7.6 Avengers Analogy
Region-oriented segmentation is like how the Avengers assemble - each hero (pixel) joins the team
(region) they're most similar to in abilities and goals. Sometimes teams split when there are conflicts
(splitting), and sometimes separate teams merge when they have common objectives (merging)!
🎯 EXAM-ORIENTED QUICK REFERENCE
Most Important Formulas
1. Point Detection: R = |∑∑ w(s,t) × f(x+s, y+t)|
2. Sobel Gradient: |∇f| = √(Gx² + Gy²)
3. Hough Transform: ρ = x cos θ + y sin θ
4. Basic Thresholding: g(x,y) = {1 if f(x,y) ≥ T; 0 otherwise}
5. Otsu's Criterion: T* = argmax{σ_b²(T)}
Key Concepts to Remember
1. Edge Detection Order: Gradient methods use 1st derivatives, Laplacian uses 2nd derivatives
2. Canny Steps: Gaussian → Gradient → Non-max suppression → Double threshold → Edge tracking
3. Hough Transform: Image space → Parameter space transformation
4. Region Growing: Seed selection → Similarity criteria → Iterative growth
5. Split-Merge: Top-down splitting + Bottom-up merging
📝 PRACTICE PROBLEMS
Problem Set 1: Edge Detection
1. Apply Sobel operator to a given 3×3 image region
2. Compare Roberts, Prewitt, and Sobel operators on the same image
3. Calculate gradient magnitude and direction for edge pixels
Problem Set 2: Hough Transform
1. Find line parameters using Hough transform for given edge points
2. Detect circles using circular Hough transform
3. Analyze accumulator array for multiple lines
Problem Set 3: Thresholding
1. Implement basic global thresholding algorithm
2. Apply Otsu's method step-by-step
3. Compare global vs. adaptive thresholding results
Problem Set 4: Region Segmentation
1. Perform region growing from given seed points
2. Apply split-and-merge algorithm to quadtree structure
3. Evaluate segmentation quality using given metrics
🧠 MEMORY AIDS
Acronyms
CANNY: Convolve, Angle, Non-max, Noise, Yield edges
HOUGH: Historical Occurrences Using Gradient Hypotheses
OTSU: Optimal Threshold Selection Using variance
Visual Memory
Edge Detection: Think of finding coastlines on a map
Region Growing: Like spreading paint from multiple points
Thresholding: Digital switch - ON or OFF based on intensity
⚡ EXAM TIPS
High-Probability Questions
1. Compare different edge detection operators (15-20 marks)
2. Explain Canny edge detector with steps (10-15 marks)
3. Hough transform for line detection (10-12 marks)
4. Otsu's thresholding method (8-10 marks)
5. Region growing algorithm (8-10 marks)
Common Mistakes to Avoid
1. Confusing gradient direction with edge direction
2. Forgetting normalization in Hough transform
3. Missing the iterative nature of optimal thresholding
4. Confusing splitting criteria with merging criteria
5. Not considering connectivity in region definition
Time Management
Numerical Problems: 20-25 minutes each
Algorithm Explanations: 15-20 minutes each
Comparisons: 10-15 minutes each
Quick Definitions: 2-3 minutes each
🎮 QUICK SELF-TEST
Test Your Understanding (2 minutes each)
1. What is the main difference between point detection and edge detection?
2. Why is non-maximum suppression needed in Canny edge detection?
3. How does Hough transform handle noise in edge detection?
4. What makes Otsu's method "optimal" for thresholding?
5. When would you prefer region splitting over region growing?
Challenge Questions (5 minutes each)
1. Design a combined edge detector using Sobel and Laplacian operators
2. Modify Hough transform to detect ellipses instead of circles
3. Develop an adaptive thresholding method for uneven illumination
4. Create a region growing algorithm that handles multiple seed points simultaneously
Answer Key Available in Discussion Sessions!
"In the game of image segmentation, you either segment perfectly or you get noise." - Harvey Specter (if he
were a computer vision expert!)
📋 CHAPTER SUMMARY
Image Segmentation Module covers six major techniques for dividing images into meaningful regions.
From simple point detection to complex region-based methods, each technique serves specific purposes
in computer vision applications. Master the mathematical foundations, understand the algorithms, and
practice numerical problems to excel in exams!
Next Module Preview: Image Representation and Description - How to describe and represent the
regions we've segmented!
Image Processing: The Complete Journey
From Your Phone Camera to AI Magic ✨
🌟 THE BIG PICTURE: Why Image Processing Matters
The Real-World Story
Imagine you're taking a selfie with your friends at a concert. Here's the incredible journey that happens in
milliseconds:
1. 📸 Your Phone Camera → Captures raw light data (Module 1: Introduction)
2. 🔄 Digital Conversion → Transforms light into pixels (Module 2: Digital Image Formation)
3. 🧮 Mathematical Processing → Applies complex algorithms (Module 3: Mathematical Preliminaries)
4. ✨ Enhancement Magic → Makes you look great with filters (Module 4: Image Enhancement)
5. 🔧 Restoration → Removes blur and noise (Module 5: Image Restoration)
6. 🎯 Smart Recognition → Identifies faces, objects, scenes (Module 6: Image Segmentation)
The Tony Stark Perspective
Think of Image Processing like Tony Stark building the Iron Man suit:
Raw Materials (Digital images) → Smart Processing (Algorithms) → Incredible Applications (AI,
Medical diagnosis, Self-driving cars)
🚀 THE COMPLETE IMAGE PROCESSING PIPELINE
Stage 1: "Houston, We Have Pixels!"
[Modules 1-2: Getting Images into Computers]
Your phone camera → Digital sensor → Array of numbers
Real Example: Instagram stories, Snapchat filters, Camera apps
Stage 2: "The Math Behind the Magic"
[Module 3: Mathematical Foundation]
Fourier transforms, filters, mathematical operations
Real Example: JPEG compression, image transmission over internet
Stage 3: "Making Images Beautiful"
[Module 4: Image Enhancement]
Brightness, contrast, sharpening, smoothing
Real Example: Photo editing apps, HDR photography, Night mode
Stage 4: "Fixing What's Broken"
[Module 5: Image Restoration]
Removing blur, noise, distortions
Real Example: Unblurring old photos, enhancing satellite images, CSI-style enhancement
Stage 5: "Teaching Computers to See"
[Module 6: Image Segmentation - Current Module]
Identifying objects, faces, boundaries
Real Example: Face recognition, medical diagnosis, autonomous vehicles
🎭 THE SHERLOCK HOLMES APPROACH TO IMAGE SEGMENTATION
"Elementary, My Dear Watson!"
Image Segmentation is like Sherlock Holmes analyzing a crime scene:
1. Point Detection = Spotting the smallest clues (fingerprints, bullet holes)
2. Line Detection = Following trails and paths (footprints, weapon trajectories)
3. Edge Detection = Identifying object boundaries (where the victim fell, room layouts)
4. Thresholding = Separating important from unimportant evidence
5. Region Segmentation = Grouping related clues into meaningful patterns
Real-World Applications You Use Daily
🤳 Social Media Magic:
Face Detection in your camera app uses edge detection
Background blur in portrait mode uses segmentation
Beauty filters use point and region detection
🚗 Autonomous Vehicles:
Lane detection uses line detection algorithms
Obstacle identification uses edge detection
Traffic sign recognition uses combined detection
🏥 Medical Miracles:
Tumor detection in MRI scans uses region segmentation
Bone fracture identification uses edge detection
Blood vessel analysis uses line detection
🛒 E-commerce Revolution:
Product search by image uses segmentation
Virtual try-on uses region detection
Quality control in manufacturing uses combined detection
🎯 MODULE OVERVIEW IN THE BIGGER CONTEXT
Image Segmentation is the final detective step in our image processing journey - where we teach
computers to "see" and "understand" what's actually in the image, just like how our brain automatically
separates objects from backgrounds.
The Netflix Analogy: Image segmentation is like Netflix's recommendation algorithm analyzing your
viewing patterns - it groups similar content (pixels) together to understand what you (the computer) are
actually looking at!
Why This Module Matters: This is where all previous modules come together:
Module 1-2 gave us digital images
Module 3 gave us mathematical tools
Module 4-5 cleaned and enhanced our images
Module 6 (THIS MODULE) makes computers intelligent - identifying objects, faces, tumors, obstacles!
🔄 HOW SEGMENTATION CONNECTS TO YOUR DAILY LIFE
Your Morning Routine Through Image Processing Eyes:
7:00 AM - Your phone's face recognition (uses edge detection + region segmentation) unlocks your
device
7:30 AM - Google Photos automatically tags your friends (combined detection + region growing)
8:00 AM - Your car's backup camera highlights obstacles (thresholding + edge linking)
12:00 PM - Instagram stories add perfect filters (point detection for skin smoothing)
8:00 PM - Netflix recommends movies based on poster analysis (Hough transform for text detection)
10:00 PM - Your fitness tracker counts steps using image analysis of your movement patterns
🎬 THE MARVEL CINEMATIC UNIVERSE OF IMAGE PROCESSING
Each Module is Like a Superhero Origin Story:
Module 1 (Basics) = Captain America - The foundation, simple but essential Module 2 (Digital
Formation) = Iron Man - Technology and engineering basics
Module 3 (Math) = Doctor Strange - Complex mathematical magic Module 4 (Enhancement) = Thor
- Power to improve and strengthen Module 5 (Restoration) = Hulk - Fixing and healing damaged
images Module 6 (Segmentation) = Vision - The ability to truly "see" and understand
And Like Avengers: Endgame, all modules work together for the ultimate goal:
Artificial Intelligence that can see and understand the world like humans do!
"The journey from a simple photograph to artificial intelligence isn't magic - it's mathematics, algorithms,
and the incredible science of Image Processing. Every Instagram filter, every medical diagnosis, every
autonomous car decision starts with the concepts you're learning in these six modules!"
🎯 CAREER IMPACT: Where This Knowledge Takes You
💰 High-Paying Career Paths:
1. Computer Vision Engineer ($120k-200k+) - Tesla, Google, Apple
2. Medical Imaging Specialist ($90k-150k+) - Hospitals, Research labs
3. AI/ML Engineer ($130k-250k+) - Meta, Microsoft, Amazon
4. Robotics Engineer ($100k-180k+) - Boston Dynamics, NASA
5. Digital Image Forensics ($80k-140k+) - FBI, CIA, Corporate security
🚀 Startup Opportunities:
Photo/Video Apps (Next TikTok filter?)
Medical AI Solutions (Cancer detection startups)
Autonomous Vehicle Tech (Self-driving car components)
Security Systems (Smart surveillance)
Agricultural AI (Crop monitoring, pest detection)
📱 FROM YOUR PHONE TO MARS ROVERS: The Same Principles!
The Technology Stack You're Actually Learning:
Your Phone Camera App:
Raw Image → Enhancement → Face Detection → Background Blur → Social Media Magic
↓ ↓ ↓ ↓ ↓
Module 2 Module 4 Module 6 Module 6 All Modules!
Tesla Autopilot:
Camera Feed → Noise Removal → Lane Detection → Object Segmentation → Driving Decisions
↓ ↓ ↓ ↓ ↓
Module 2 Module 5 Module 6 Module 6 AI Magic!
Medical CT Scan:
Raw Scan → Image Restoration → Tumor Detection → Region Analysis → Diagnosis
↓ ↓ ↓ ↓ ↓
Module 2 Module 5 Module 6 Module 6 Save Lives!
🧠 STUDY STRATEGY: Connecting the Dots
The "Connect Four" Approach:
1. Understand WHY before HOW - Every algorithm solves a real problem
2. Visual + Mathematical - Draw diagrams while solving equations
3. Real Examples - Always ask "Where do I see this in real life?"
4. Build Up - Each module builds on previous ones
Memory Palace Technique:
Module 1 = Your bedroom (basics, foundation)
Module 2 = Kitchen (transformation, cooking pixels)
Module 3 = Study room (mathematics, formulas)
Module 4 = Bathroom (enhancement, making things better)
Module 5 = Garage (restoration, fixing things)
Module 6 = Living room (segmentation, where everything comes together)