EENG 860 Special Topics: Digital Image Processing
Lecture 10: Image Segmentation
Dr. Ahmadreza Baghaie
Department of Electrical and Computer Engineering
New York Institute of Technology
Spring 2020
Readings: Chapter 10 (sections 10.1-10.3)
How to read: MATH alert! Read and experiment!
Table of Contents
● Fundamentals of Image Segmentation
● Point Detection
● Line Detection
● Edge Detection
● Image Thresholding
Fundamentals of Image Segmentation
● Assume I is the entire spatial region of an image. Image segmentation is the
process of dividing I into partitions R1, R2, …, Rn such that:
1) Union: $\bigcup_{i=1}^{n} R_i = I$
2) Connectivity: Ri is a connected set for i = 1, 2, …, n
3) Disjoint: $R_i \cap R_j = \varnothing$ for $i \neq j$
4) Intra-region similarity: Q(Ri) = TRUE for i = 1, 2, …, n
5) Inter-region dissimilarity: Q(Ri ∪ Rj) = FALSE for $i \neq j$
● Here Q(Rk) is a logical predicate defined over the points in set Rk.
Fundamentals of Image Segmentation
● Segmentation algorithms generally fall into two basic categories:
– Discontinuity: boundaries of regions are sufficiently different from each
other and from background → Edge-based segmentation
– Similarity: partitioning an image into regions that are similar according
to some criteria → Region-based segmentation.
Finite Differences
● We have seen that local averaging smooths an image, and that intensity
variations can be detected by first- and second-order derivatives.
● Generally speaking, derivatives in images are computed in terms of finite
differences.
● First-order derivative:
– Must be zero in areas of constant intensity;
– Must be non-zero at the onset of an intensity step or ramp;
– Must be non-zero along intensity ramps.
● Second-order derivative:
– Must be zero in areas of constant intensity;
– Must be non-zero at the onset and end of an intensity step or ramp;
– Must be zero along intensity ramps.
Finite Differences
● First-order derivative (see the sketch below):
– Forward difference: $\frac{\partial f}{\partial x} \approx f(x+1) - f(x)$
– Backward difference: $\frac{\partial f}{\partial x} \approx f(x) - f(x-1)$
– Central difference: $\frac{\partial f}{\partial x} \approx \frac{f(x+1) - f(x-1)}{2}$
● Second-order central difference: $\frac{\partial^2 f}{\partial x^2} \approx f(x+1) - 2f(x) + f(x-1)$
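A minimal NumPy sketch of these stencils on a made-up 1-D profile (constant, ramp, constant); it reproduces the behavior listed on the previous slide, with the second difference nonzero only at the ramp's onset and end:

```python
import numpy as np

# A made-up 1-D intensity profile: constant, ramp, constant.
f = np.array([3., 3., 3., 5., 7., 9., 9., 9.])
x = np.arange(1, len(f) - 1)        # interior points where all stencils fit

forward  = f[x + 1] - f[x]                    # f(x+1) - f(x)
backward = f[x] - f[x - 1]                    # f(x) - f(x-1)
central  = (f[x + 1] - f[x - 1]) / 2          # [f(x+1) - f(x-1)] / 2
second   = f[x + 1] - 2 * f[x] + f[x - 1]     # f(x+1) - 2f(x) + f(x-1)

print(second)  # zero on constant and ramp parts, nonzero at ramp onset/end
# [ 0.  2.  0.  0. -2.  0.]
```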
Finite Differences
● First derivative produces thick edges
in images.
● Second derivative has a stronger
response to fine details, such as thin
lines, isolated points and noise.
● Second derivative produces a double-
edge response at ramp and step
transitions, which is useful for locating edges.
● The sign of the second derivative can
be used to see if a transition into an
edge is from dark to light or light to
dark.
Point Detection
● As we saw, second-order derivative can be used for detecting isolated
points.
● We have seen Laplacian before:
$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$
● Using a central difference formulation:
$\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y)$
Point Detection
● Using the Laplacian operator:
– Compute the Laplacian response at every pixel;
– Take the absolute value of Laplacian response;
– If the result is greater than a threshold T, set the pixel to 1; otherwise,
set the pixel to 0:
$g(x,y) = \begin{cases} 1 & \text{if } |\nabla^2 f(x,y)| > T \\ 0 & \text{otherwise} \end{cases}$
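A minimal sketch of this procedure using SciPy; the kernel is the central-difference Laplacian from the previous slide, and the threshold T is left as a free parameter:

```python
import numpy as np
from scipy.ndimage import convolve

def detect_points(f, T):
    """Flag pixels where the absolute Laplacian response exceeds T."""
    laplacian = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]], dtype=float)
    response = convolve(f.astype(float), laplacian)   # Laplacian at each pixel
    return (np.abs(response) > T).astype(np.uint8)    # 1 if |response| > T
```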
Line Detection
● Using the second-order derivative results in a stronger filter response and
thinner lines in comparison to first-order derivatives.
Line Detection
● We can define filter kernels to detect lines in specified directions.
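As one concrete realization, the four standard 3×3 line-detection kernels (horizontal, +45°, vertical, −45°) can be written as NumPy arrays; the preferred direction is weighted 2 and everything else −1, so the coefficients sum to zero and constant regions give no response:

```python
import numpy as np

horizontal = np.array([[-1, -1, -1],
                       [ 2,  2,  2],
                       [-1, -1, -1]])
plus45     = np.array([[ 2, -1, -1],
                       [-1,  2, -1],
                       [-1, -1,  2]])
vertical   = np.array([[-1,  2, -1],
                       [-1,  2, -1],
                       [-1,  2, -1]])
minus45    = np.array([[-1, -1,  2],
                       [-1,  2, -1],
                       [ 2, -1, -1]])
```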
Edge Detection
● Edge detection is a common approach for image segmentation based on
local intensity changes and discontinuities.
● Three types of edges:
– Step edge: a transition between two intensity levels occurring ideally
over the distance of one pixel;
– Ramp edge: a gradual transition between two intensity levels;
– Roof edge: intensities on both sides of the roof are similar, with different
intensity in the middle.
Edge Detection
● The first derivative produces a thick edge profile.
● The second derivative produces a double-edge profile with a sign change;
the point of zero-crossing can be used to locate the edge.
Edge Detection
● What happens if the image is noisy?
● Both first and second derivatives are
sensitive to noise; second derivative is
more sensitive!
● For example, consider an 8-bit image
with additive Gaussian noise of zero
mean and standard deviations of 0.0,
0.1, 1.0 and 10 intensity levels.
● To eliminate effects of noise, it is
common to smooth the image first
before edge detection.
Image Gradient
● The gradient of image f(x,y) is defined as a two-dimensional column vector:
$\nabla f(x,y) = \mathrm{grad}[f(x,y)] = \begin{bmatrix} g_x(x,y) \\ g_y(x,y) \end{bmatrix} = \begin{bmatrix} \partial f(x,y)/\partial x \\ \partial f(x,y)/\partial y \end{bmatrix}$
● The magnitude of the gradient vector:
$M(x,y) = \|\nabla f\| = \sqrt{g_x^2(x,y) + g_y^2(x,y)}$
● The direction of the gradient vector:
$\alpha(x,y) = \tan^{-1}\left[\frac{g_y(x,y)}{g_x(x,y)}\right]$
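A short sketch computing all three quantities with Sobel kernels (one common choice of gx and gy; the sign conventions depend on how rows and columns are mapped to x and y):

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradient(f):
    """Gradient magnitude M(x,y) and direction alpha(x,y) via Sobel kernels."""
    kx = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)   # derivative along rows
    ky = kx.T                                    # derivative along columns
    gx = convolve(f.astype(float), kx)
    gy = convolve(f.astype(float), ky)
    M = np.sqrt(gx**2 + gy**2)                   # gradient magnitude
    alpha = np.arctan2(gy, gx)                   # gradient direction (radians)
    return M, alpha
```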
Image Gradient - Example
Image Gradient
● The previous kernels are only able to differentiate between horizontal and
vertical edges.
● It is possible to create directional Prewitt and Sobel masks.
Image Gradient
● Kirsch compass kernels are designed to detect edge magnitude and
direction in all eight compass directions.
● Instead of using the magnitude and direction equations described before,
Kirsch's approach determines the edge magnitude by convolving the image
with all eight kernels and assigning, at each point, the response of the
kernel that gave the strongest convolution value.
● The edge direction at a point is the direction associated with the kernel
with the strongest convolution response.
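A compact sketch of this scheme: the eight kernels are generated by rotating the border coefficients of the north kernel one step at a time, and the per-pixel maximum over the eight responses gives the magnitude, with the index of the winning kernel giving the direction:

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch(f):
    """Edge magnitude/direction from the strongest of 8 compass responses."""
    # Border of the north kernel, clockwise from the top-left corner;
    # the center coefficient is 0. Rotating the ring gives the other 7.
    ring = np.array([5, 5, 5, -3, -3, -3, -3, -3])
    pos = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    responses = []
    for shift in range(8):
        k = np.zeros((3, 3))
        for (r, c), v in zip(pos, np.roll(ring, shift)):
            k[r, c] = v
        responses.append(convolve(f.astype(float), k))
    responses = np.stack(responses)
    magnitude = responses.max(axis=0)      # strongest compass response
    direction = responses.argmax(axis=0)   # index of the winning kernel
    return magnitude, direction
```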
Canny Edge Detector
● The Canny edge detector is based on three objectives:
– Low error rate.
– Edge points should be well localized.
– Single edge point response.
● Canny edge detector steps:
1) Smooth the input image using a Gaussian filter;
2) Compute the gradient magnitude and direction;
3) Apply non-maximal suppression to the gradient magnitude;
4) Use double-thresholding and connectivity analysis to detect and link
edges.
Canny Edge Detector
● In step 3, non-maximal suppression thins the
gradient magnitude image around the edges.
● Let gN(x,y) denote the resulting thinned
gradient image; non-maximal suppression
proceeds as follows:
1) Find the discrete direction (horizontal,
vertical, or ±45°) closest to the gradient
direction α(x,y) at location (x,y);
2) If M(x,y) is less than the gradient of one
of the two neighbors in that direction,
set gN(x,y)=0 (suppression); otherwise
gN(x,y)=M(x,y).
Canny Edge Detector
● In step 4, we apply double-thresholding and connectivity analysis:
– Define two threshold values: TH and TL
– Generate two binary images:
$g_{NH}(x,y) = (g_N(x,y) \ge T_H)$
$g_{NL}(x,y) = (g_N(x,y) \ge T_L)$
$\Rightarrow g_{NL}(x,y) = g_{NL}(x,y) - g_{NH}(x,y)$ (the strong pixels are removed from the weak image)
– gNH(x,y) contains strong edges, and gNL(x,y) contains weak edges.
● Steps:
a) Locate the next un-visited edge pixel p in gNH(x,y);
b) Mark as valid edge pixels all the weak pixels in gNL(x,y) that are
connected to p using 8-connectivity.
c) If all non-zero pixels in gNH(x,y) have been visited go to step d. Else,
return to step a.
d) Set to zero all pixels in gNL(x,y) that were not marked as valid edge
pixels.
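In practice these four steps rarely need to be re-implemented by hand. A hedged sketch using OpenCV, where the filename, blur parameters, and the two thresholds (TL = 50, TH = 150) are illustrative choices rather than values from the slides:

```python
import cv2

img = cv2.imread('building.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
smoothed = cv2.GaussianBlur(img, (5, 5), 1.4)   # step 1: Gaussian smoothing
edges = cv2.Canny(smoothed, 50, 150)            # steps 2-4: gradient, non-
                                                # maximal suppression, double
                                                # thresholding and linking
```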
Canny Edge Detector - Example
Edge Linking – Local Method
● The result of edge detection is affected by noise and by breaks in the
edges caused by non-uniform illumination and other effects.
● Edge detection is commonly followed by edge linking.
● In local edge linking, the gap between two edge points is filled if they have
similar gradient magnitude and direction.
Edge Linking – Global Method
● The most common global method for edge linking uses the Hough Transform.
● Consider a straight line y = ax + b; the parameters (a, b) form a parameter
space.
● A single edge point corresponds to a line in the parameter space.
● An edge segment (a set of collinear points) corresponds to a point in the
parameter space.
● Accumulating all of these lines and points in the parameter space produces
peaks; each peak corresponds to a line in the image space.
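A small sketch with OpenCV on a synthetic image. Practical implementations, including OpenCV, parameterize lines by the normal representation ρ = x cos θ + y sin θ instead of (a, b), which avoids the unbounded slope of vertical lines; peaks in the (ρ, θ) accumulator play the same role:

```python
import cv2
import numpy as np

# Synthetic test image containing one white line.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.line(img, (20, 30), (180, 150), 255, 1)
edges = cv2.Canny(img, 50, 150)

# Accumulate votes; each returned (rho, theta) pair is one peak,
# i.e., one detected line in image space.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)
for rho, theta in lines[:, 0, :]:
    print(f'rho = {rho:.1f}, theta = {np.degrees(theta):.1f} deg')
```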
Edge Linking – Global Method - Example
● In the following example, the two highest peaks in Hough transform generate
two lines in the image space.
Image Thresholding
● The general idea behind image thresholding is very simple.
● Given a threshold T, we want to partition the image into two regions:
foreground (object) and background.
$g(x,y) = \begin{cases} 1 & \text{if } f(x,y) > T \\ 0 & \text{if } f(x,y) \le T \end{cases}$
● When T is a constant applied to the entire image, it is called global
thresholding. When the value of T changes over the image, it is called
variable thresholding.
● When the value of T depends on the spatial coordinates (x,y), it is called
dynamic or adaptive thresholding.
● We can have multiple thresholds for an image:
$g(x,y) = \begin{cases} a & \text{if } f(x,y) > T_2 \\ b & \text{if } T_1 < f(x,y) \le T_2 \\ c & \text{if } f(x,y) \le T_1 \end{cases}$
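Both cases are one-liners in NumPy; in this sketch the output values a, b, c and the thresholds are the symbols from the equations above:

```python
import numpy as np

def threshold(f, T):
    """g = 1 where f > T, else 0."""
    return (f > T).astype(np.uint8)

def threshold_dual(f, T1, T2, a=2, b=1, c=0):
    """Three output values from two thresholds T1 < T2."""
    g = np.full(f.shape, c, dtype=np.uint8)   # f <= T1
    g[f > T1] = b                             # T1 < f <= T2
    g[f > T2] = a                             # f > T2
    return g
```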
Image Thresholding
● By inspecting the histogram of an image, we can have a better idea about
the optimal threshold.
(Figure: two histograms, one suited to a single threshold T and one to dual
thresholds T1 and T2, with the corresponding mappings:)
$g(x,y) = \begin{cases} 1 & \text{if } f(x,y) > T \\ 0 & \text{if } f(x,y) \le T \end{cases} \qquad g(x,y) = \begin{cases} a & \text{if } f(x,y) > T_2 \\ b & \text{if } T_1 < f(x,y) \le T_2 \\ c & \text{if } f(x,y) \le T_1 \end{cases}$
Image Thresholding - Noise
Image Thresholding – Illumination and Reflectance
Global Image Thresholding
● When the intensity distributions of objects and background pixels are
sufficiently distinct, it is possible to use a single threshold for the whole
image.
● The steps for global image thresholding are as follows:
1) Select an initial estimate for the global threshold T;
2) Segment the image using T. This produces two groups of pixels: G1,
consisting of pixels with intensity values greater than T, and G2,
consisting of pixels with intensity values less than or equal to T.
3) Compute the average intensity values m1 and m2 for the pixels in G1
and G2, respectively.
4) Compute a new threshold value midway between m1 and m2:
$T = \frac{1}{2}(m_1 + m_2)$
5) Repeat steps 2 to 4 until the difference between values of T in
successive iterations is smaller than a predefined ΔT.
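A direct sketch of steps 1-5, using the image mean as the initial estimate (as in the example on the next slide):

```python
import numpy as np

def basic_global_threshold(image, delta_T=0.5):
    """Iterative global thresholding (steps 1-5 above)."""
    T = image.mean()                        # 1) initial estimate
    while True:
        G1 = image[image > T]               # 2) pixels with values > T
        G2 = image[image <= T]              #    pixels with values <= T
        m1, m2 = G1.mean(), G2.mean()       # 3) class means
        T_new = 0.5 * (m1 + m2)             # 4) midway threshold
        if abs(T_new - T) < delta_T:        # 5) stop when change < delta_T
            return T_new
        T = T_new
```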
Global Image Thresholding - Example
● The initial threshold is set as the average intensity of the image.
● After three iterations, the global threshold T=125.
Optimum Global Thresholding – Otsu’s Method
● Thresholding can be considered as a statistical-decision theory problem.
● The objective is to minimize the average error in assigning pixels to two or
more groups or classes.
● Otsu’s method tries to maximize the between-class variance, which
measures the statistical discrimination between the classes.
● This is equivalent to minimizing the within-class variance for each group
or class of pixels.
● The derivation of the equations is based on normalized histograms.
● Remember, the normalized histogram of an image of size MxN with intensity
values i=0,1,2,…, L-1:
$p_i = \frac{n_i}{MN}, \qquad \sum_{i=0}^{L-1} p_i = 1, \quad p_i \ge 0$
Optimum Global Thresholding – Otsu’s Method
● Assume we select a threshold T(k)=k for 0<k<L-1 to partition the input
image into two classes c1 and c2.
● c1 consists of pixels with intensity values in the range [0, k]; c2 consists of
pixels with intensity values in the range [k+1, L-1].
● The probability of each class can be calculated by:
$P_1(k) = \sum_{i=0}^{k} p_i, \qquad P_2(k) = \sum_{i=k+1}^{L-1} p_i = 1 - P_1(k)$
● The mean intensity value of each class is calculated as:
$m_1(k) = \frac{1}{P_1(k)} \sum_{i=0}^{k} i\, p_i, \qquad m_2(k) = \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\, p_i$
Optimum Global Thresholding – Otsu’s Method
● We can define the cumulative mean (average intensity) up to level k as:
$m(k) = \sum_{i=0}^{k} i\, p_i$
● Also, the average intensity of the entire image is given by:
$m_G = \sum_{i=0}^{L-1} i\, p_i$
● Based on the previous equations, it is obvious that:
$P_1 m_1 + P_2 m_2 = m_G, \qquad P_1 + P_2 = 1$
Optimum Global Thresholding – Otsu’s Method
● How to measure the effectiveness of threshold k? We define the normalized,
dimensionless measure of effectiveness as:
$\eta = \frac{\sigma_B^2}{\sigma_G^2}$
● Global variance: $\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2 p_i$
● Between-class variance:
$\sigma_B^2 = P_1(m_1 - m_G)^2 + P_2(m_2 - m_G)^2 = P_1 P_2 (m_1 - m_2)^2 = \frac{(m_G P_1 - m)^2}{P_1(1 - P_1)}$
Optimum Global Thresholding – Otsu’s Method
● In the previous equations, some of the parameters are functions of threshold
k. Considering that, the effectiveness measure can be written as:
$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_G^2} \qquad \text{with} \qquad \sigma_B^2(k) = \frac{[m_G P_1(k) - m(k)]^2}{P_1(k)[1 - P_1(k)]}$
● The optimum threshold value is k*, which maximizes the between-class
variance.
● To find k* we can evaluate the above equations for all integer values of k.
● If multiple values of k maximize the between-class variance, k* is calculated
by taking the average of all of those values.
● Finally, the thresholding can be done as seen before.
Optimum Global Thresholding – Otsu’s Method
● The algorithm can be summarized as follows:
1. Compute the normalized histogram of the input image.
2. Compute the cumulative sums, P1(k) for k=0,1,2,...,L-1;
3. Compute the cumulative means, m(k) for k=0,1,2,...,L-1;
4. Compute the global mean and the global variance;
5. Compute the between-class variance for k=0,1,2,...,L-1;
6. Find the optimum k* that maximizes the between-class variance. If the
maximum is not unique, obtain k* by averaging the values that result in
maximum between-class variance.
7. Compute the separability measure as the ratio of the maximum
between-class variance (evaluated at k*) to the global variance.
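A NumPy sketch of steps 1-7, assuming an 8-bit grayscale image (L = 256):

```python
import numpy as np

def otsu_threshold(image, L=256):
    """Return (k*, eta): Otsu's optimum threshold and separability measure."""
    hist, _ = np.histogram(image, bins=L, range=(0, L))
    p = hist / hist.sum()                      # 1) normalized histogram
    P1 = np.cumsum(p)                          # 2) cumulative sums P1(k)
    m = np.cumsum(np.arange(L) * p)            # 3) cumulative means m(k)
    mG = m[-1]                                 # 4) global mean
    sigma_G2 = np.sum((np.arange(L) - mG)**2 * p)  #    global variance
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_B2 = (mG * P1 - m)**2 / (P1 * (1 - P1))  # 5) between-class var.
    sigma_B2 = np.nan_to_num(sigma_B2)
    k_star = np.flatnonzero(sigma_B2 == sigma_B2.max()).mean()  # 6) optimum k*
    eta = sigma_B2.max() / sigma_G2            # 7) separability measure
    return k_star, eta
```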
Otsu’s Method - Example
● Global method’s threshold is 169; Otsu’s method’s threshold is 182.
Otsu’s Method – Example With Noisy Image
Improving Global Thresholding Using Edges
● The chances of finding a good threshold are higher if the histogram peaks
are tall, narrow, symmetric, and separated by deep valleys.
● What if only pixels on or near edges between objects and the background
are used to create histograms?
● If only these pixels are used, the resulting histogram would have peaks of
approximately the same height.
● Also the probability that any of those pixels lies on an object would be
approximately equal to the probability that it lies on the background; this
improves the symmetry of histogram modes.
● Using only pixels that satisfy some simple measure based on gradient or
Laplacian operators tends to deepen the valley between histogram peaks.
Improving Global Thresholding Using Edges
● Given these observations, we can use the following procedure:
1) Compute an edge image using magnitude of gradient or absolute value
of Laplacian of the image f(x,y).
2) Specify a threshold value T.
3) Threshold the image from step 1 using T from step 2 to produce a binary
image gT(x,y).
4) Using gT(x,y) as a mask, compute a histogram using only the pixels in
f(x,y) that correspond to the locations of the 1-valued pixels in gT(x,y).
5) Use the histogram from step 4 to segment f(x,y) globally, using basic
global thresholding method or Otsu’s method.
● The value of T is specified as a high percentile so that only a few pixels
from the gradient/Laplacian image are used in the computation.
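A sketch of steps 1-5, reusing the otsu_threshold function sketched earlier and taking the Sobel gradient magnitude as the edge image (one possible choice):

```python
import numpy as np
from scipy.ndimage import sobel

def edge_guided_threshold(f, percentile=99.7):
    """Steps 1-5: histogram only the pixels near strong edges."""
    f = f.astype(float)
    grad = np.hypot(sobel(f, axis=0), sobel(f, axis=1))  # 1) edge image
    T = np.percentile(grad, percentile)                  # 2) high percentile
    mask = grad > T                                      # 3) binary g_T(x,y)
    k, _ = otsu_threshold(f[mask])                       # 4-5) Otsu on the
    return (f > k).astype(np.uint8)                      #      masked pixels
```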
Improving Global Thresholding Using Edges - Example
● Without using edge information.
Improving Global Thresholding Using Edges - Example
● Using edge information, with the value of T chosen at the 99.7th percentile.
Improving Global Thresholding Using Edges - Example
● The value of T is chosen at the 99.5th percentile.
Multiple Thresholds - Example
● Otsu’s method can be extended to generate more than one threshold.
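scikit-image ships this extension directly; a brief sketch on one of its sample images, where classes=3 yields two thresholds:

```python
import numpy as np
from skimage import data
from skimage.filters import threshold_multiotsu

image = data.camera()                               # sample grayscale image
thresholds = threshold_multiotsu(image, classes=3)  # two optimum thresholds
regions = np.digitize(image, bins=thresholds)       # pixel labels 0, 1, 2
```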
Local (Variable) Thresholding
● In the presence of noise or non-uniform illumination, local methods
perform better than global methods.
(Figure: a noisy, shaded image thresholded by the iterative global method,
global Otsu’s method, and local Otsu’s method.)
Variable Thresholding Using Moving Averages
● The thresholding is done line by line to reduce illumination bias.
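Following the textbook formulation, the rows are scanned in a zigzag and each pixel is thresholded against a fraction c of the moving average of the last n pixels scanned; n and c below are illustrative tuning parameters:

```python
import numpy as np

def moving_average_threshold(f, n=20, c=0.5):
    """Threshold each pixel against c times the moving average of the
    last n pixels visited in a zigzag scan."""
    f = f.astype(float)
    scan = f.copy()
    scan[1::2] = scan[1::2, ::-1]          # reverse every other row (zigzag)
    z = scan.ravel()                       # 1-D scan order
    csum = np.cumsum(np.insert(z, 0, 0.0))
    idx = np.arange(1, z.size + 1)
    lengths = np.minimum(idx, n)           # shorter windows at the start
    m = (csum[idx] - csum[idx - lengths]) / lengths  # moving averages
    g = (z > c * m).astype(np.uint8).reshape(f.shape)
    g[1::2] = g[1::2, ::-1]                # undo the zigzag
    return g
```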
What is Next?
● Fundamentals of Color
● Color Models
● Color Image Processing
Questions?
abaghaie@nyit.edu