Chapter 6
A Color Image
Segmentation Algorithm
6.1 Introduction
Image segmentation is an essential and critical component of low–level vision,
image analysis, pattern recognition, and, more recently, robotic systems. It is
also one of the most difficult and challenging tasks in image processing, and it
determines the quality of the final results of the image analysis. Intuitively, image
segmentation is the process of dividing an image into different regions such that
each region is homogeneous while the union of any two adjacent regions is not.
An additional requirement would be that these regions correspond to
real homogeneous regions belonging to objects in the scene.
The classical broadly–accepted formal definition of image segmentation is
as follows [PP93]. If P(◦) is a homogeneity predicate defined on groups of con-
nected pixels, then the segmentation is a partition of the set I into connected
components or regions {C1 , . . . , Cn } such that
$$ I = \bigcup_{i=1}^{n} C_i \quad \text{with} \quad C_i \cap C_j = \emptyset, \ \forall i \neq j \tag{6.1} $$
The uniformity predicate $P(C_i)$ is true for all regions $C_i$, and $P(C_i \cup C_j)$ is
false when $i \neq j$ and the sets $C_i$ and $C_j$ are neighbors.
Additionally, it is important to remember that the image segmentation
problem is basically one of psychophysical perception, and therefore not
susceptible to a purely analytical solution, according to [FM81]. This may be why
there are literally hundreds of segmentation techniques in the literature.
Nevertheless, to our knowledge, no single method can yet be considered good for all
sorts of images and conditions, since most of them were created ad hoc for a
particular task. Despite the importance of the subject, only a few
surveys deal specifically with image segmentation, and these focus mainly on
monochrome segmentation [FM81, HS85], giving little space to color segmentation
[PP93, LM99]. For more details, Chapter 3 is entirely devoted to reviewing the
state of the art in the segmentation of color images.
Only recently has color image segmentation attracted increasing
attention, mainly for reasons such as the following:
118 CHAPTER 6. A COLOR SEGMENTATION ALGORITHM
• Color images provide far more information than gray–scale images and
segmentations are more reliable.
• The computational power of available computers has increased rapidly in
recent years, so that even PCs are able to process color images.
• The handling of huge image databases, such as the Internet, which consist
mainly of color images.
• The spread of digital cameras, 3G mobile phones, and video sets in everyday
life.
• Improvement in the sensing capabilities of intelligent systems and ma-
chines.
Most of the segmentation techniques for monochrome images – histogram
thresholding, feature clustering, edge detection, region–based methods, fuzzy
techniques, and neural networks – have been extended to segment color images
by using RGB color coordinates or some of their transformations, either linear
or nonlinear. However, comprehensive surveys on color image segmentation are
still scarce in number [SK94, CJSW01].
The work in [SK94] discussed the properties of several color representations and
summarized and analyzed a fairly extensive list of segmentation methods,
splitting them into several categories analogous to those already mentioned for
gray–scale images. The conclusions of that review are worth taking
into account, especially the following:
• General purpose algorithms are neither robust nor always algorithmically
efficient.
• No general advantage in using one specific color space with regard to others
has been found.
• Color constancy is needed to improve effectiveness when combining region
segmentation with color object recognition.
More recently, the review in [CJSW01] provides an up–to–date summary
of color image segmentation techniques available at present, and describes the
properties of different kinds of color representation methods and some of the
problems encountered when applying those models to segment color images.
Some novel approaches such as fuzzy and physics–based methods are discussed
as well in that work. There is an interesting taxonomy of methods and color
spaces with their description, advantages and disadvantages. For more informa-
tion about the issue, the reader should refer to Chapter 3.
In order to propose a useful segmentation algorithm that fits our needs, we
chose from the family of graph–theoretical approaches
because of their sound mathematical foundations and because the segmentation
problem translates straightforwardly into a graph–partitioning problem,
for which many different methods exist. Nonetheless, the main disadvantage
of this type of framework is, as can be seen in [WL93, VC93, XU97], that
these algorithms are highly time–consuming, which precludes their
use in (nearly) real–time applications. For this last reason, we chose
Jaume Vergés–Llahı́ MMV
among the greedy graph–partitioning algorithms, which are faster than any other
method in that family, as observed in [FH98a].
In this Chapter we present our color image segmentation algorithm, which is
capable of working on diverse color spaces and metrics. This approach has a
nearly linear computational complexity and is based on that in [FH98a], along
with a set of improvements, both theoretical and practical, which address the
shortcomings detected in earlier results. This algorithm has been successfully applied
to segmenting both static images and sequences, where some further enhancements
were introduced to achieve more coherent and more stable segmentations of
sequences.
Finally, in this Chapter some results are provided to test the
performance of our segmentation in comparison not only with the results
attained by the original algorithm in [FH98a], which has been improved, but also
with those obtained by the unsupervised clustering Expectation–Maximization
(EM) algorithm by Figueiredo [FJ02]. EM is one of the most successful
clustering methods in recent years (see footnote 1), and Figueiredo's version is completely
unsupervised, which avoids the problem of selecting the number of components and
does not require a careful initialization. Besides, it avoids the possibility
of convergence toward a singular estimate, a serious problem in other EM–like
algorithms. We show that our segmentations are fully comparable to those of
Figueiredo's EM algorithm while, more importantly,
our algorithm is far faster.
6.2 Outline of the Chapter
Next, we summarize the main aspects discussed in each Section of the Chapter.
In Section 6.3 we review the most closely related previous work dealing with the
image segmentation problem using a graph–theoretical approach. Section 6.4
is devoted to an extensive analysis of our color segmentation algorithm. Our
approach is extended to cope with sequences in Section 6.5. Thereafter, in
Section 6.6, we support our previous statements with numerous examples of
segmentations of static images and sequences, comparing them with those
obtained with other image segmentation algorithms. Finally, Section 6.7
presents our conclusions about the work carried out in this Chapter.
6.3 Related Previous Work
An important set of techniques to segment images are those based on graph
theory. The main idea consists in building an image representation employing a
graph and then applying some graph–theoretical techniques to obtain homoge-
neous connected components which represent regions in the segmented image.
An additional advantage of using graphs is that region–based and edge–based
segmentation are dual problems, so that closed contours can be obtained from the
segmentation of regions without any further processing of the image.
1 Another extremely interesting clustering algorithm usually applied to the image segmen-
tation problem is the one based on the mean–shift transformation [CM97, CM99, CM02].
While this one is nonparametric, EM is a parametric method that provides, as a result, a
finite mixture of Gaussian distributions.
Two different groups of methods can be considered depending on the tech-
nique employed. On the one hand, there exist all those methods that partition
a graph describing the whole image into a set of subgraphs, where there is one
component for each image region. Algorithms differ in the particular way of re-
moving superfluous edges. Next, some graph–partitioning approaches are briefly
described.
The most efficient graph–based algorithms use fixed thresholds and purely
local measures to find regions. For instance, the approach in [Zah71] is based on
breaking the larger edges in a minimum spanning tree of the graph. The inadequacy
of simply removing the larger edges is apparent, because edge weights within a
high–variability region tend to be larger than in any other region. That work also
developed several heuristics to address such issues by using models of the
distributions of weights.
A more recent method is that in [WL93], based on the computation of the
minimum cut in the graph representing an image. Originally, algorithms of this
kind were used to solve problems of maximum flow between two points – the
source and the sink – connected by paths with a constrained flow capacity,
e.g., water or electric networks. In the case of images, capacities account for the
similarity between components and node connectivity represents pixel
neighborhoods. Therefore, the cut criterion is designed to minimize the similarity
between the regions that are being split.
This kind of segmentation captures nonlocal properties of the image but
requires more than nearly linear time, in contrast with the more efficient methods
described below, which employ only local information. Other refinements based
on spectral partitioning techniques can be found in [SM97, SBLM98], where a
normalized version of the minimum cut is computed. For a wider review of
this sort of approach, we refer the reader to [Els97, Fja98].
Another algorithm proposed in [Urq97] uses a measure of local variability to
decide which edge to remove from the graph. This measure is based only on the
nearest neighbors for each point. When this criterion is applied to segmentation
problems, it is claimed that the nearest neighbors alone are not enough to get a
reasonable measure of the whole image variability since they only capture local
properties of the image. This issue is tackled in [FH98a], as will be seen later.
The interesting graph–theoretical work in [Wan98] presents a method to segment
images into partitions with connected components by using computationally
inexpensive algorithms for probability simulation and simulated annealing,
such as Hastings's algorithm and the generalized Metropolis algorithm. In order
to reduce the computational burden, a hierarchical approximation is proposed,
minimizing at each step a cost function on the space of all possible partitions
of a graph into connected components.
Finally, there are a number of methods that employ more sophisticated mod-
els, such as those based on Markov Random Fields (e.g., [GG84]). However,
these methods tend to be quite inefficient in terms of time. In our opinion, the
two main goals for an image segmentation algorithm are to capture nonlocal
properties of the image and to be efficient to compute, and those algorithms are
far too time–consuming for our purposes.
On the other hand, there is another set of graph–based algorithms that take
advantage of region–growing methods, in which the growing process is driven by the
attributes of nodes and edges. Thus, edges are aggregated to form a list of
connected nodes, which in turn form an image component. Edges are selected in
such a manner that they provide homogeneous components. The particular strategy
applied to select edges is what differentiates the algorithms from one another.
It is important to state that, in both kinds of methods, numerous works
take advantage of the Minimum Spanning Tree (MST) as a means to
reduce the inherent algorithmic complexity of the graph–partitioning problem,
as well as the complexity that may appear in region–growing if all node
connections are taken into account. The MST captures the minimal structure of an
image and, through its partition or growth, helps to obtain segmentation
algorithms that are efficient in terms of time and memory.
In [VC93], vertexes connected by the smallest edge weight are
merged by an iterative process. At the end of that process, the list
of the smallest edges selected at each step forms a spanning tree, which is
further split by removing the edges with the greatest weights,
generating a hierarchy of image partitions.
In [XU97] an MST is built using Kruskal's algorithm, and a partition that
minimizes a cost function is found afterwards. This task is accomplished by a
dynamic approach and diverse heuristics to further reduce the algorithmic
complexity. The approach in [FH98a] is even more drastic in the use of MSTs, since
it combines both region–growing and Kruskal's routine. Edge aggregation is
driven by a local measure of image variation over arbitrarily large regions in the
image.
Moreover, this approach addresses a major shortcoming of previous graph–based
methods, i.e., the dichotomy between using efficient (nearly linear
time) algorithms that ignore global properties of the image and capturing
global image properties at the cost of efficiency. Although it is argued in [SM97]
that, in order to capture nonlocal properties of an image, any segmentation
algorithm should start with larger regions in the image and then split them
progressively, rather than start with smaller regions and merge them, the work
in [FH98a] argues the contrary, i.e., that a region–merging algorithm
based on nonlocal image properties is equally capable of producing such
segmentations.
They do so by introducing global definitions of what it means for an image to
be subsegmented or oversegmented based on the aggregation of local intensity
differences. An image is defined to be oversegmented when there is some pair of
regions for which the variation between regions is small relative to the variation
within each region. Besides, an image is subsegmented when there is a way to
split some regions into subregions such that the resulting segmentation is not an
oversegmentation. These definitions could be used along with other measures
of similarity between regions.
The algorithm in [FH98a] simultaneously satisfies the two global properties
of neither subsegmenting nor oversegmenting an image, according to the
previous definitions. The algorithm runs in time nearly linear in the number of
pixels, and it is really fast in practice. This efficiency is achieved by a bottom–up
process that successively merges smaller components into larger ones.
6.4 Segmentation of Color Images
Due to the speed of the algorithm in [FH98a], it is a good starting point to
develop a fast algorithm for color segmentation that fits the time constraints
of mobile robotics. Hence, many novelties have been introduced in our new
approach in order to improve the final results attained by the original algorithm.
The first change we have introduced is the use of color differences instead of
independently running an intensity version of the algorithm as many times as
the number of color channels and trying to combine the obtained regions afterwards.
Secondly, we have developed an energy–based approach to control the component
merging process so as to relax the oversegmentation condition and thus obtain
segmentations with fewer regions.
In addition, we have introduced an index to identify all the spurious regions
that appear in segmentation as a result of highly variant regions not corre-
sponding to any actual area in the image. These regions are removed from
segmentation and joined to their closest neighboring component. The overall
coherence of the final segmentation is improved because the remaining regions
correlate better with their counterparts in the real scene.
Finally, the algorithm has also been extended to cope with images coming
from video sequences in order to maintain their segmentations as stable as pos-
sible through time. Part of the results described in this Chapter have already
been reported in the papers [VLCS00] and [SAA+ 02].
6.4.1 Some Definitions
First of all, we give some basic definitions that will help us throughout this Section. In
our graph–based approach to image segmentation, Undirected Weighted Graphs
(UWG) are used to represent color images. Let $V$ be a set of vertexes and $E$ a
set of edges connecting them. A UWG is a graph $G = (V, E)$ defined from the
set of image pixels $P = \{p_i\}$ and the set of their colors $I = \{c_p : \forall p \in P\}$ as
follows.
Each pixel $p \in P$ corresponds to a vertex $v \in V$ to which a neighborhood
$N_\rho(p) = \{q \in P \mid 0 < D_P(p, q) \le \rho\}$ can be assigned, where $D_P : P \times P \to \mathbb{R}^+_0$ is
a distance between pixels, usually Euclidean, in image coordinates. Therefore,
the set of edges is defined as $E = \{e_{pq} = (p, q) : \forall q \in N_\rho(p)\}$, where $\rho$ is the radius of
the neighborhood in number of pixels. Commonly, $\rho = 1$.
Therefore, the weight function $\omega$ on edges gives a measure of dissimilarity
between two vertexes (pixels) as follows
$$ \omega : E \longrightarrow \mathbb{R}^+_0, \qquad e_{pq} \longmapsto \omega(e_{pq}) = D_I(c_p, c_q) = \omega_{pq} \tag{6.2} $$
where $D_I$ is some distance in a color space. We refer to [WS82, SK94, Fai97]
for a wider review of color coordinates and distances, which will be partially
reviewed later in this Section. Finally, $\Omega = \{\omega(e) : \forall e \in E\}$ is the set of all
weights of the edge set in $G$. The following algorithm works on a fixed ordering
$\tilde{E} = (e_1, \ldots, e_n)$ such that $\omega(e_i) \le \omega(e_j)$, $\forall i \le j$, where $n = |E|$.
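As an illustration of the definitions above, the following Python sketch builds the edge list of the graph for a small image and sorts it into a fixed nondecreasing ordering. It is only a sketch under stated assumptions: the image is a nested list of RGB tuples, the color distance is taken to be the Euclidean distance in RGB, pixels are identified by their flat index, and the function name build_sorted_edges is hypothetical.

```python
import math

def build_sorted_edges(image, radius=1):
    """Return a list of (weight, p, q) sorted by nondecreasing weight.

    image  : list of rows, each row a list of (r, g, b) tuples (assumed layout)
    radius : neighborhood radius rho, in pixels
    p, q   : flat pixel indices, p = y * width + x
    """
    h, w = len(image), len(image[0])
    r = int(radius)
    edges = []
    for y in range(h):
        for x in range(w):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Visit each undirected pair only once.
                    if (dy, dx) <= (0, 0):
                        continue
                    # Keep only neighbors with 0 < D_P(p, q) <= rho.
                    if dx * dx + dy * dy > radius * radius:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Assumption: D_I is the Euclidean distance in RGB.
                        weight = math.dist(image[y][x], image[ny][nx])
                        edges.append((weight, y * w + x, ny * w + nx))
    # Ties are broken by pixel indices, which fixes one particular ordering.
    edges.sort()
    return edges
```

With radius 1 each pixel is linked to its four nearest neighbors. Breaking ties by pixel index is just one possible way of fixing the ordering; any other deterministic rule would serve the same purpose.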
A segmentation of $G$ is defined as a subgraph $S = (C, F_C)$ where $C = \{C_i\}$ is
the set of components forming a partition of $V$ (see footnote 2) and $F_C = \{F_{C_i}\}$ is a canonical
forest. A component $C_i$ is a set of vertexes that are connected to one another
by a path of edges of $E$ minimizing the sum of their edge weights. $C_p$ is the
component to which the vertex $p$ belongs.
2 A partition of $X$ is a collection of subsets $\{X_i \subset X\}$ such that $X = \cup_i X_i$ and $X_i \cap X_j = \emptyset$, $\forall i \neq j$.
A canonical forest $F_C$ is a set of trees where each $F_{C_i} \in F_C$ is a Minimum
Spanning Tree (MST) of $C_i \in C$. The ordering $\tilde{E}$ provides a way of selecting a
unique MST from the possible minimum weight spanning trees of $C_i$. We can
now define the set $\Sigma$ of all the segmentations $S$ of a graph $G$ and an order
relation, $\le$, between pairs of elements that is reflexive, antisymmetric, and
transitive
$$ T \le S \iff T \in R(S) \tag{6.3} $$
where $R(S) = \{Q \in \Sigma : \forall C \in Q,\ \exists C' \in S \mid C \subseteq C'\}$ is the set of refinements of a
segmentation $S \in \Sigma$. Put in words, the refinements of a segmentation $S$ are
all the segmentations that have smaller components such that,
once these components are merged, they generate the same components as in
$S$. Moreover, the strict inequality can be defined as $T < S$ if and only if $T \le S$
and $T \neq S$.
The set $(\Sigma, \le)$ is a partially ordered set because $T, T' \le S$
implies neither $T \le T'$ nor $T' \le T$. Nevertheless, for any two segmentations
$T = (C, F_C)$ and $T' = (C', F_{C'})$ it is true that $T \cap T' \le T, T'$ and $T, T' \le T \cup T'$.
Schematically
$$
\begin{array}{ccc}
 & T \cup T' & \\
\nearrow & & \nwarrow \\
T & & T' \\
\nwarrow & & \nearrow \\
 & T \cap T' &
\end{array} \tag{6.4}
$$
where $T \cap T' = (C \cap C', F_{C \cap C'})$ and $T \cup T' = (C \cup C', F_{C \cup C'})$.
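The refinement relation of Eq. (6.3), and the fact that it is only a partial order, can be checked on a toy example. The sketch below is illustrative only: segmentations are reduced to their sets of components (the forests are ignored), components are Python frozensets of vertex indices, and the function names are hypothetical.

```python
def is_refinement(t, s):
    """True if T <= S: every component of T lies inside some component of S."""
    return all(any(c <= c2 for c2 in s) for c in t)

def is_strictly_finer(t, s):
    """True if T < S: T <= S and T differs from S."""
    return is_refinement(t, s) and set(t) != set(s)

# Toy segmentations of the vertex set {0, 1, 2, 3}.
S = [frozenset({0, 1, 2}), frozenset({3})]
T1 = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
T2 = [frozenset({0}), frozenset({1, 2}), frozenset({3})]

# Both T1 and T2 refine S, yet neither refines the other.
print(is_refinement(T1, S), is_refinement(T2, S))
print(is_refinement(T1, T2), is_refinement(T2, T1))
```

Here T1 and T2 are both below S but are not comparable with each other, which is exactly why the order is partial rather than total.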
The maximum element of $(\Sigma, \le)$ is $G = (V, E)$ and the minimum is $G_{\min} = (V, \emptyset)$,
where all components have only one vertex and trees have no edges. If we
follow an algorithm that merges two components $C$ and $C'$ at each step
according to an edge in $\tilde{E}$, the resulting graphs will be in
ascending order in $(\Sigma, \le)$, i.e., from minimum to maximum, forming a chain
$$ \Pi : G_{\min} = S_1 \le \ldots \le S_n = G \tag{6.5} $$
This is the case for greedy algorithms such as Kruskal's minimum spanning
tree algorithm and also that in [FH98a].
6.4.2 Algorithm Analysis
Now in this Section we translate the segmentation of an image I into the problem
of finding a proper segmentation S from a graph G among the set of all possible
segmentations in Σ. As a starting point, we follow the approach in [FH98a],
where a segmentation is sought that fulfills a global property by only carrying
out a local search. As mentioned, this approach takes advantage of a greedy
algorithm that obeys the previous definitions of what is considered to be an
oversegmented and a subsegmented image. The process keeps merging regions
until segmentations which are neither oversegmented nor subsegmented are at-
tained. Ideally, this should occur in an intermediate case corresponding to the
notion of having neither too many nor too few components in a segmentation.
Intuitively, an image is oversegmented when there are still too many
components that could be further merged into bigger regions. Consequently, the
algorithm should grow components until the image is no longer oversegmented,
that is, until merging more components would likely be an error. Hence, an
image is no longer oversegmented if the difference between any two adjacent
components is greater than the differences within them
$$ S \in \Sigma \ \text{is NOT oversegmented if} \quad \forall C_i, C_j \in S \ \text{adjacent},\ C_i \neq C_j \implies \mathrm{Dif}(C_i, C_j) > \mathrm{Hom}(C_i, C_j) \tag{6.6} $$
where Dif(◦, ◦) is a function measuring the difference between two adjacent
components and Hom(◦, ◦) accounts for the internal homogeneity of both
components. Let $\Sigma^c_{OS} \subset \Sigma$ be the set of all graphs satisfying Eq. (6.6) (see footnote 3). If $T_0 \in \Sigma$ is the
greatest oversegmented segmentation in the chain $\Pi$, then we can rewrite
$\Sigma^c_{OS}$ as an interval, $\Sigma^c_{OS} = (T_0, G) = \{T \in \Sigma : T_0 < T < G\}$.
In a similar way, an image is subsegmented whenever region–growing has
gone too far and there are too few components left. This implies that components
that are too different have been erroneously joined. Therefore, an image is not
subsegmented if there exists a proper refinement that is not oversegmented either,
meaning that a smaller segmentation fulfilling Eq. (6.6) can still be found.
Hence, we can take as an interval the set $(G_{\min}, S) = \{T \in \Sigma : G_{\min} < T < S\}$
of all proper segmentations smaller than $S$. So, we get that
$$ S \in \Sigma \ \text{is NOT subsegmented if} \quad (G_{\min}, S) \cap \Sigma^c_{OS} \neq \emptyset \tag{6.7} $$
The algorithm proposed by Felzenszwalb and Huttenlocher in [FH98a] (F&H's
algorithm, from now on), which is a modification of Kruskal's algorithm
for computing minimum spanning trees, uses the two criteria above to control the
segmentation process. Moreover, it was proved that the resulting segmentations
are unique, that is, for a particular image the process always ends at the same
segmentation and follows the same chain of segmentations $\Pi$.
Nevertheless, what is important in this algorithm is the fact that the seg-
mentation process takes decisions based on local properties of the image, such as
pixel differences, and, yet, the resulting segmentation reflects global properties
of the image since both oversegmentation and subsegmentation are global image
features.
However, we are convinced that these constraints are still too restrictive,
which causes aggregation to stop prematurely, resulting in
segmentations with too many components for our purposes. Our approach in the
following Sections addresses these defects both theoretically and
practically, as explained next.
6.4.3 Theoretical Approach
Given that F&H's algorithm yields a resulting segmentation S
as soon as both previous constraints are fulfilled, and that any two successive
3 $\Sigma_{OS}$ is the set of all oversegmented graphs and $\Sigma^c_{OS}$ is its complement.
segmentations $S_i$ and $S_{i+1}$ satisfy $S_i \le S_{i+1}$, we deduce that the
algorithm will stop whenever
$$ (G_{\min}, S) \cap (T_0, G) \neq \emptyset \iff T_0 < S \tag{6.8} $$
This means that F&H's algorithm stops at the first segmentation $S$ that
is not oversegmented, which is somewhat arbitrary and restrictive since
the segmentation $S$ usually has too many components in practice, i.e., it is still
oversegmented for our purposes.
If the nonoversegmentation criterion were relaxed, it would be possible to attain
segmentations $S'$ with fewer components, i.e., $S \le S'$. If $S'$ were still
oversegmented, the algorithm would again continue the aggregation until another
nonoversegmented $S''$ appeared, i.e., $S' \le S''$. Otherwise, we could relax the
constraint again or just stop at that segmentation, which would be effectively
greater than $S$ and nonoversegmented, as expected.
Nevertheless, this relaxation cannot be pushed too far since, as regions
grow, so do their internal dissimilarities, which are more than likely to surpass
their mutual differences. Once a point of no return was crossed, the
nonoversegmentation condition would never be satisfied again, and
aggregation would continue until only one region remained. So, in practice, the
interval $\Sigma^c_{OS}$ would be $(T_0, T_1)$, and a resulting segmentation should be obtained
before $T_1$ gets dangerously close to $G$.
In order to manage this leap over the constraints while avoiding the problem
of going too far, we first reformulated the nonoversegmentation criterion as a
problem controlled by an energy function $U$ in the following way
$$ S \in \Sigma \ \text{is NOT oversegmented if} \quad \forall C_i, C_j \in S \ \text{adjacent},\ C_i \neq C_j \implies \Delta U_{S \to S'} > 0 \tag{6.9} $$
where $S \le S'$ and $\Delta U_{S \to S'}$ stands for the energy of the system involved in the
transition between two consecutive segmentations $S$ and $S'$. If the transition
is done by joining components $C_i$ and $C_j$ together, we denote this by
$\Delta U_{S \to S'} = \Delta U(C_i \cup C_j)$. In the case of F&H's algorithm, we get that
$$ \Delta U(C_i \cup C_j) = \mathrm{Dif}(C_i, C_j) - \mathrm{Hom}(C_i, C_j) \tag{6.10} $$
where Dif(◦, ◦) increases as regions grow while Hom(◦, ◦) tends to fall along
the segmentation because components become more and more different from each
other as they grow. Those functions are based on local information provided by the
edges in $\tilde{E}$, which is not modified once computed at the starting point because
of the greediness of that approach.
The merging step of the algorithm employs the following aggregation
condition. At any step $k$, two components merge if the edge $e_k = e_{ij} \in \tilde{E}$ connecting
them fulfills
$$ C_i^{k-1} \neq C_j^{k-1} \quad \text{and} \quad \Delta U\left(C_i^{k-1} \cup C_j^{k-1}\right) \le 0 \tag{6.11} $$
Then, at step $k$, segmentation $S_k$ has a new component formed by
$$ C_i^{k-1} \cup C_j^{k-1} \quad \text{and} \quad F_{C_i^{k-1}} \cup F_{C_j^{k-1}} \cup e_k \tag{6.12} $$
A condition must now be imposed on any energy difference $\Delta U$ so that
global properties of images can be attained by means of a greedy
algorithm, which is only capable of tracing local features. If for any discarded
edge (see footnote 4) $e_k = e_{ij} \notin S$ such that $C_i \neq C_j$, occurring at position $k$ in the ordering,
with $C_i^{k-1} \subseteq C_i$ and $C_j^{k-1} \subseteq C_j$, it is true that
$$ \Delta U\left(C_i^{r-1} \cup C_j^{r-1}\right) > 0, \quad \forall e_r = e_{ij} \notin S \ \text{with} \ r \ge k \tag{6.13} $$
then the segmentation produced by the conditions in Eq. (6.11) is also
nonoversegmented in terms of Eq. (6.9) because $\Delta U(C_i \cup C_j) > 0$ for any pair of adjacent
components.
That is to say, if at a point $k$ two adjacent components do not merge because
of their mutual differences, these components will never again be similar enough
to be put together in the final segmentation. Otherwise, it would mean that
somewhere in the segmentation process the two regions started to resemble each other.
When Eq. (6.10) is used, where $\Delta U$ rises according to edge values, it is proven
in [FH98a] that $C_i^{k-1} = C_i$ and $C_j^{k-1} = C_j$, which satisfies the above
condition. Hence, any other energy function should behave similarly in order to provide
nonoversegmentations.
The energy–based approach makes it possible to introduce the probability of
an event $S \to S'$, namely, the union of two adjacent components $C_i \cup C_j$, in
a similar way as it is computed in a simulated annealing process using the
Metropolis dynamics [Wan98]
$$ Pr(C_i \cup C_j) = \exp\left(-\frac{\max\{\Delta U(C_i \cup C_j), 0\}}{t}\right) \tag{6.14} $$
If $\Delta U(C_i \cup C_j) \le 0$ then $Pr(C_i \cup C_j) = 1$. Otherwise, $Pr(C_i \cup C_j)$ is
compared to a random number to decide whether or not to join the components.
The probability thus computed is employed as the condition in Eq. (6.11) to
decide whether to merge two components. As a result, it is possible to find other
nonoversegmentations $S'$ such that $S \le S'$. Since it is a probabilistic scheme,
Eq. (6.13) is not guaranteed to be always fulfilled. Nevertheless, at each
step the energy needed to break through the constraint is greater, so the leap
is less likely, becoming practically impossible from a certain point on, beyond
which Eq. (6.13) is satisfied. Besides, the width of the interval $(T_0, T_1)$ can be selected by tuning the
temperature $t$. Consequently, at the end we always get segmentations which are
neither oversegmented nor subsegmented, as desired.
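The acceptance rule of Eq. (6.14) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, and the random number is assumed to be drawn uniformly from [0, 1) with Python's standard random module.

```python
import math
import random

def merge_probability(delta_u, t):
    """Eq. (6.14): probability of merging two adjacent components."""
    return math.exp(-max(delta_u, 0.0) / t)

def accept_merge(delta_u, t, rng=random):
    """Merge when the probability exceeds a uniform random number.

    For delta_u <= 0 the probability is 1, so the merge is always accepted
    (rng.random() returns values strictly below 1).
    """
    return merge_probability(delta_u, t) > rng.random()
```

For a fixed positive energy difference, a higher temperature t gives a larger merging probability, which is how t controls how far the aggregation may leap over the original constraint.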
6.4.4 Practical Approach
It is time to further specify the functions Dif(Ci, Cj) and Hom(Ci, Cj) in terms
of edge weights. As said, Dif(Ci, Cj) accounts for the difference between two
adjacent components and is defined as the lowest edge weight connecting them
$$ \mathrm{Dif}(C, C') = \min_{v_i \in C,\ v_j \in C'} \{\omega(e_{ij}) : e_{ij} = (v_i, v_j) \in E\} \tag{6.15} $$
On the other hand, the function Hom(◦, ◦) measures the internal homogeneity
of the two components as the lowest value of their internal variation, that is,
$$ \mathrm{Hom}(C, C') = \min\{\mathrm{Int}(C), \mathrm{Int}(C')\} \tag{6.16} $$
4 Not fulfilling the conditions in Eq. (6.11).
The inner variation of a component is taken as the highest edge weight in
any minimum spanning tree of that component
$$ \mathrm{Int}(C) = \max_{e \in F_C} \{\omega(e)\} \tag{6.17} $$
The use of such a function Int(C) has, indeed, some problems [FH98a].
Since a component $C$ will not grow along any edge $e$ such that
$\omega(e) > \mathrm{Int}(C)$, and since $\mathrm{Int}(C) \ge \omega(e')$, $\forall e' \in F_C$, it is only possible that all
the edges in $F_C$ have the same weight $\omega(e) = \mathrm{Int}(C)$. Given that the first edge
value is 0, regions cannot grow beyond this value because $\omega(e) > \mathrm{Int}(C) = 0$
for any remaining edge with nonzero weight.
To remedy this defect so that the function Int(C) is greater for small
components and decreases as components grow, a better version of the
function Int(C) is
$$ \mathrm{Int}(C) = \max_{e \in F_C} \{\omega(e)\} + \frac{\tau}{|C|} \tag{6.18} $$
This function overestimates the internal variation of components when they
are small. Although it helps homogeneous regions to grow, it artificially increases
the internal variation of regions that already have a large variation, such as borders
and textured regions. Hence, some spurious regions may appear that do not
correspond to actual regions of homogeneous color but rather to highly
variable and textured regions.
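A small numerical sketch may help to see why the term τ/|C| of Eq. (6.18) is needed. The Python functions below are hypothetical helpers that follow Eqs. (6.10) and (6.16)–(6.18) directly; a component is described only by the list of its MST edge weights and its size, and Dif is passed in as a number.

```python
def internal_variation(mst_weights, size, tau=0.0):
    """Eq. (6.18): largest MST edge weight plus tau / |C|.

    With tau = 0 this reduces to Eq. (6.17). A single-pixel component
    has no MST edges, so its largest edge weight is taken as 0.
    """
    return max(mst_weights, default=0.0) + tau / size

def hom(int_a, int_b):
    """Eq. (6.16): the lower of the two internal variations."""
    return min(int_a, int_b)

def delta_u(dif, int_a, int_b):
    """Eq. (6.10): energy difference for merging two components."""
    return dif - hom(int_a, int_b)

# Two single pixels joined by an edge of weight 5.
plain = internal_variation([], 1)              # 0.0, Eq. (6.17)
print(delta_u(5.0, plain, plain))              # positive: no merge is possible
boosted = internal_variation([], 1, tau=10.0)  # 10.0, Eq. (6.18)
print(delta_u(5.0, boosted, boosted))          # negative: the pixels merge
```

As the component grows, the bonus shrinks: for the same largest edge weight of 5 and τ = 10, a component of 2 pixels has internal variation 10, whereas one of 100 pixels has about 5.1.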
To cope with this pernicious effect, which might needlessly increase the number
of segments, we identify all the pixels belonging to these regions by means
of an index $I_C$ computed for every region $C$. Only spurious border regions are
taken into consideration, since most texture is eliminated using a proper
smoothing filter. The index $I_C$ accounts for the shape of the region, its amount of
variability, and its size. It is directly proportional to the compactness
of the region $K_C$ and to the maximum internal variation $\max\{\omega\}$, and inversely
proportional to its area $|C|$, i.e.,
$$ I_C = \frac{K_C \cdot \max_{e \in F_C} \{\omega(e)\}}{|C|} \tag{6.19} $$
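The index of Eq. (6.19) is straightforward to evaluate once its three ingredients are known. The Python sketch below is illustrative only: the compactness K_C is not specified further in this Section, so it is assumed to be precomputed and passed in as a number; the function names, the layout of the stats dictionary, and the numeric values are all hypothetical.

```python
def spurious_index(compactness, max_weight, area):
    """Eq. (6.19): I_C = K_C * max MST edge weight / |C|."""
    return compactness * max_weight / area

def find_spurious(stats, thr):
    """Return the ids of components whose index exceeds the threshold.

    stats maps a component id to a tuple (K_C, max MST edge weight, |C|);
    this layout is an assumption made for the example.
    """
    return {cid for cid, (k, w, a) in stats.items()
            if spurious_index(k, w, a) > thr}

# Made-up statistics for two components.
stats = {"border": (4.0, 50.0, 10), "sky": (1.2, 5.0, 1000)}
print(find_spurious(stats, thr=1.0))
```

Small regions with a large internal variation obtain a high index, while large and smooth regions obtain a value close to zero.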
Once all those regions have been identified, their pixels are randomly distributed
among the adjacent components with the most neighboring pixels. Hence, if the set
of all components neighboring pixel $p$ is defined as $N_p = \{C_q \in C : (p, q) \in E\}$,
pixel $p$ will be added to component $C'$ if and only if
$$ C' = \underset{C \in N_p}{\operatorname{argmax}} \ \{ |N(p) \cap C| \} \tag{6.20} $$
If the number of spurious pixels is too large, this step can cause some distortion
of region borders. Hence, in order to have as few spurious pixels as possible, it
might be sensible to temporarily suspend the oversegmentation constraint, so
that, at least for $\omega(e_k) \le thr$, the aggregation is done freely. The combination of
these two heuristics makes it possible to grow homogeneous regions while reducing
the population of spurious regions.
Jaume Vergés–Llahı́ MMV
128 CHAPTER 6. A COLOR SEGMENTATION ALGORITHM
6.4.5 Algorithm Sketch
Finally, putting all these considerations together yields an algorithm capable
of segmenting color images. It is a greedy algorithm that computes the minimum
spanning tree of an undirected weighted graph whose edge weights are the
differences between the colors of pairs of neighboring pixels. The
segmentation thus obtained is a subgraph Sn ⊂ G. The whole algorithm is
sketched below.
1. Sort edges in E into an ordering Ẽ = e1 , . . . , en , where n = |E|, by
nondecreasing edge weights ω(ek ).
2. Start segmentation with S0 = Gmin and k = 0.
3. Blind aggregation while ω(ek) ≤ thr1. The non-oversegmentation condition is
suspended and components grow freely.
4. Repeat steps 5 and 6 for thr1 < ω(ek) ≤ ω(en).
5. Select a random number ν ∈ [0, 1]
6. Construct $S_k$ from the previous segmentation $S_{k-1}$. If edge $e_k = e_{ij}$
connects two components such that
$$C_i^{k-1} \neq C_j^{k-1} \quad \text{and} \quad \Pr\left(C_i^{k-1} \cup C_j^{k-1}\right) > \nu \tag{6.21}$$
then $S_k$ is computed using Eq. (6.12). Otherwise, $e_k$ is rejected and kept to
compute the contour image afterwards. The probability is computed with Eq. (6.14).
7. Compute index IC for each component applying Eq. (6.19). Regions with
IC > thr2 are labeled as spurious components.
8. Distribute pixels belonging to spurious components to neighboring regions
applying the heuristic in Eq. (6.20).
Both thr1 and thr2 are thresholds provided by the user controlling blind
aggregation and spurious regions identification, respectively. Two more para-
meters are needed in order to put the routine to work, namely, growing threshold
τ and temperature t. Generally, both thr1 and t are maintained constant, while
the result is controlled by tuning parameters thr2 and τ .
The implementation maintains the segmentation using a disjoint–set forest
with union by rank and path compression, as in the implementation of Kruskal’s
algorithm in [CCLR01]. The running time of the algorithm can be split into
three parts. First, in Step 1 it is necessary to sort the weights into a
nondecreasing ordering. Since the weights are continuous values we used the
bucket sort algorithm, which requires O(n) time, where n = |E| is the number
of edges. Steps 2 to 6 of the algorithm have a time complexity of O(n α(n)),
where α is the very slowly growing inverse Ackermann function [CCLR01]. This
is because the number of edges is O(n), since the neighborhood size δ is
constant. Finally, Steps 7 and 8 are O(m), where m ≤ n is the number of pixels
in spurious components. To determine those pixels, the set of discarded edges
is employed, which is readily available from Step 6. Finally, to speed up the
process, pixel redistribution is done in raster order, simulating a random
assignment.
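Steps 1 to 6 of the sketch can be outlined in Python as follows. This is a minimal sketch under stated assumptions, not the original implementation: Python's built-in `sorted` replaces the bucket sort, edges are `(weight, p, q)` tuples over pixel indices, and the merging probability of Eq. (6.14), which is not reproduced here, is passed in as a callable. The helper `hard_threshold_probability` is a hypothetical stand-in that applies Eq. (6.18) as a deterministic test and returns only 0 or 1.

```python
import random


class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.size = [1] * n       # |C| stored at each root
        self.max_w = [0.0] * n    # largest accepted edge weight, stored at each root

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b, weight):
        a, b = self.find(a), self.find(b)
        if a == b:
            return a
        if self.rank[a] < self.rank[b]:  # union by rank
            a, b = b, a
        self.parent[b] = a
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1
        self.size[a] += self.size[b]
        self.max_w[a] = max(self.max_w[a], self.max_w[b], weight)
        return a


def hard_threshold_probability(tau):
    """Hypothetical stand-in for Eq. (6.14): 1 if the edge passes Eq. (6.18), else 0."""
    def probability(forest, a, b, weight):
        int_a = forest.max_w[a] + tau / forest.size[a]
        int_b = forest.max_w[b] + tau / forest.size[b]
        return 1.0 if weight <= min(int_a, int_b) else 0.0
    return probability


def segment(num_pixels, edges, thr1, merge_probability, rng=None):
    """Return the forest and the list of rejected edges (kept for the contour image)."""
    rng = rng or random.Random(0)
    forest = DisjointSet(num_pixels)
    rejected = []
    for weight, p, q in sorted(edges):            # Step 1: nondecreasing weights
        a, b = forest.find(p), forest.find(q)
        if a == b:
            continue                              # edge inside a single component
        if weight <= thr1:                        # Step 3: blind aggregation
            forest.union(a, b, weight)
        elif merge_probability(forest, a, b, weight) > rng.random():  # Steps 5 and 6
            forest.union(a, b, weight)
        else:
            rejected.append((weight, p, q))
    return forest, rejected
```

Passing the probability as a callable keeps the sketch independent of the exact energy function; the rejected edges returned by `segment` are the ones the text reuses to locate spurious pixels in Steps 7 and 8.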
6.4.6 Color Spaces and Distances
The world of color spaces and metrics is far wider than one might imagine at
first glance. There are literally dozens of them, usually closely tied to
their specific use. Thus, there are color spaces for the fabric industry, the
paper industry, the press, psychology, television, computers, physics, and
even for foods. Despite the numerous efforts to find a definitive one, there
is no single all-purpose color space, nor even a simple way of comparing
colors that is valid for everyone.
Here, we are not going to review them all, nor even most of them. Ars longa,
vita brevis. We just summarize those found essential for our interests and
means, basically digitized color images given in RGB coordinates. For a more
extensive study on color, we suggest Wyszecki and Stiles’ book [WS82]. If this
is too much and only a quick overview is needed, the work in [SK94] should
suffice. For the latest knowledge on color models, see [Fai97].
RGB
These are the color coordinates provided by most capture and imaging devices
nowadays. They consist basically of the sensor response to a set of filters,
as explained in Chapter 4. Those filters are an artificial counterpart of the
human mechanism of color perception, and most colors can be reproduced by
modulating three channels roughly corresponding to the colors red, green, and
blue.
The natural way to compare two colors would be to use the Euclidean
distance. Thus
$$\Delta C = \sqrt{\Delta R^2 + \Delta G^2 + \Delta B^2} \tag{6.22}$$
Nevertheless, some problems arise when trying to emulate the human judgement
of color differences. First, we are more sensitive to some colors than to
others, which means that for them our sense of difference is finer. This is
not the case when using the above distance. Moreover, some color changes are
perceived differently depending on the area of the color space. Since the
Euclidean distance is homogeneous and isotropic in the RGB color space, these
nuances in the differences between colors cannot be reproduced.
Next, we consider three possible alternatives coping with those difficulties,
namely, HSI, Lab, and Luv color spaces. All of them try to translate the human
perception of color into figures. Besides, both Lab and Luv aspire to define a
space where the Euclidean metric can be used straight away to estimate subtler
color differences.
In addition to these approaches, there exist a number of other works on color
representation, the most important among them being those of Smeulders and
Gevers [GS99, GBSG01]. There the authors try to generate a set of color
invariants from all sorts of derivatives of a fundamental color invariant
extracted from a certain reflectance model. We do not consider those endeavors
in our work because their complexity limits practical application and because
the reported results only show their performance on a rather small set of
images of unrealistic and homogeneous objects.
Our greatest objection to this class of invariants, however, has to do with
the way a given color is transformed independently of what happens in the
rest of the color space and of the illuminant conditions that produced the
measurement. As a consequence, the invariant will always produce the same
result for the same input, even when this color comes from two different
surfaces under different light conditions which happen to coincide in this
color. This problem is usually referred to as metamerism and is greatly
reduced if the whole set of colors is considered instead.
HSI
There are many color models based on human color perception or, at least,
trying to be. Such models aim to divide color into a set of coordinates
decorrelating human impressions such as hue, saturation, and intensity. The
following expressions compute those values from raw sensor RGB quantities [SK94]
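$$\begin{aligned}
I &= \tfrac{1}{3}\,(R + G + B) \\
S &= 1 - \frac{\min\{R, G, B\}}{I} \\
H &= \arctan\left(\frac{\sqrt{3}\,(G - B)}{2R - G - B}\right)
\end{aligned} \tag{6.23}$$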
I models the intensity of a color, i.e., its position along the gray diagonal⁵.
Saturation S accounts for the distance to a pure white with the same intensity,
that is, to the closest point on the gray diagonal. H is an angle representing
a single color without any nuance, i.e., stripped of its intensity and
vividness. Some approaches, erroneously in our view, apply the Euclidean
distance directly to compute color differences in HSI coordinates, forgetting
that hue is an angle and not strictly a spatial measure. Hence, as suggested
in [SK94], a better distance would probably be the following expression
$$\Delta C = \sqrt{(I_2 - I_1)^2 + S_2^2 + S_1^2 - 2 S_2 S_1 \cos(H_2 - H_1)} \tag{6.24}$$
At small intensities or saturations, hue is very imprecisely determined with
those expressions and it is a better idea to compare colors by means of their
intensity in that case.
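Equations (6.23) and (6.24) can be sketched in Python as follows. The function names are illustrative. Two choices here are assumptions rather than part of the text: `math.atan2` is used instead of a plain arctangent so that the hue angle is resolved over the full circle and the zero-denominator case is handled, and saturation is set to 0 when the intensity is 0 to avoid a division by zero.

```python
import math


def rgb_to_hsi(r, g, b):
    """Eq. (6.23): return (H, S, I) with H in radians."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i   # guard for black (assumption)
    h = math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)
    return h, s, i


def hsi_distance(c1, c2):
    """Eq. (6.24): intensity difference combined with a law-of-cosines chroma term."""
    h1, s1, i1 = c1
    h2, s2, i2 = c2
    chroma_sq = s1 * s1 + s2 * s2 - 2.0 * s1 * s2 * math.cos(h2 - h1)
    # clamp tiny negative values produced by floating-point rounding
    return math.sqrt((i2 - i1) ** 2 + max(chroma_sq, 0.0))
```

Since S lies in [0, 1] while I has the range of the RGB input, it seems sensible to feed RGB values normalized to [0, 1] so that both terms of the distance are on comparable scales.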
CIELAB
The CIE⁶ 1976 $(L^*, a^*, b^*)$ space is a uniform color space developed for the
specification of color differences. It is defined from the tristimulus
values normalized to the white by the following equations
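$$\begin{aligned}
L^* &= 116 \left(\frac{Y}{Y_w}\right)^{1/3} - 16 \\
a^* &= 500 \left[\left(\frac{X}{X_w}\right)^{1/3} - \left(\frac{Y}{Y_w}\right)^{1/3}\right] \\
b^* &= 200 \left[\left(\frac{Y}{Y_w}\right)^{1/3} - \left(\frac{Z}{Z_w}\right)^{1/3}\right]
\end{aligned} \tag{6.25}$$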
5 The line from (0, 0, 0) to $(R_{max}, G_{max}, B_{max})$, where the maximum
coordinate value is 255, or 1 if normalized coordinates are used.
6 Commission Internationale de l’Éclairage.
In these equations (X, Y, Z) are the tristimulus values of the pixel and
(Xw , Yw , Zw ) are those of the reference white. We approximate these values
from (R, G, B) by the linear transformation in [SK94]
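$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} =
\begin{pmatrix} 0.607 & 0.174 & 0.200 \\ 0.299 & 0.587 & 0.114 \\ 0.000 & 0.066 & 1.116 \end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix} \tag{6.26}$$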
Our reference white is (Rw , Gw , Bw ) = (255, 255, 255). L∗ represents light-
ness, a∗ approximates redness–greenness, and b∗ , yellowness–blueness. These
coordinates are used to construct a Cartesian color space where the Euclidean
distance is used, i.e.,
$$\Delta E^*_{ab} = \sqrt{\Delta L^{*2} + \Delta a^{*2} + \Delta b^{*2}} \tag{6.27}$$
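The chain from RGB to the color difference of Eq. (6.27) can be sketched in Python as follows, using the matrix of Eq. (6.26) and the reference white (255, 255, 255) stated above. The function names are illustrative assumptions. The sketch follows Eq. (6.25) exactly as written in the text, so it uses the cube root for all ratios and does not include the linear segment that the CIE definition prescribes for very small ratios.

```python
import math

# Linear RGB -> XYZ transformation of Eq. (6.26)
RGB_TO_XYZ = (
    (0.607, 0.174, 0.200),
    (0.299, 0.587, 0.114),
    (0.000, 0.066, 1.116),
)


def rgb_to_xyz(rgb):
    """Apply the matrix of Eq. (6.26) to an (R, G, B) triple."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)


# Tristimulus values of the reference white (R_w, G_w, B_w) = (255, 255, 255)
WHITE_XYZ = rgb_to_xyz((255.0, 255.0, 255.0))


def rgb_to_lab(rgb):
    """Eq. (6.25): return (L*, a*, b*) for an (R, G, B) triple in [0, 255]."""
    fx, fy, fz = ((v / w) ** (1.0 / 3.0) for v, w in zip(rgb_to_xyz(rgb), WHITE_XYZ))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def delta_e_ab(rgb1, rgb2):
    """Eq. (6.27): Euclidean distance between two colors in Lab coordinates."""
    return math.dist(rgb_to_lab(rgb1), rgb_to_lab(rgb2))
```

With these definitions the reference white maps to L* = 100 and a* = b* = 0, which is a quick sanity check on the normalization.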
CIELUV
The CIE 1976 $(L^*, u^*, v^*)$ space is also a uniform color space, defined by the equations
$$\begin{aligned}
L^* &= 116 \left(\frac{Y}{Y_w}\right)^{1/3} - 16 \\
u^* &= 13 L^* (u' - u'_w) \\
v^* &= 13 L^* (v' - v'_w)
\end{aligned} \tag{6.28}$$
In these equations $u'$ and $v'$ are the chromaticity coordinates of the stimulus
and $u'_w$ and $v'_w$ are those of the reference white. These values are actually
the CIE 1976 Uniform Chromaticity Scales (UCS), defined by the equations
$$\begin{aligned}
u' &= \frac{4X}{X + 15Y + 3Z} \\
v' &= \frac{9Y}{X + 15Y + 3Z}
\end{aligned} \tag{6.29}$$
As before, (X, Y, Z) are the tristimulus values of a pixel computed from
RGB values with Eq. (6.26). Analogously to the $(L^*, a^*, b^*)$ coordinates, these
coordinates also form a Cartesian color space in which the Euclidean
distance is used
$$\Delta E^*_{uv} = \sqrt{\Delta L^{*2} + \Delta u^{*2} + \Delta v^{*2}} \tag{6.30}$$
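Equations (6.28) to (6.30) can be sketched in Python in the same way. The function names are illustrative assumptions, and so is the convention used when X + 15Y + 3Z = 0 (pure black): the chromaticity of the reference white is returned so that u* and v* vanish. The text does not specify this case.

```python
import math

# Linear RGB -> XYZ transformation of Eq. (6.26)
RGB_TO_XYZ = (
    (0.607, 0.174, 0.200),
    (0.299, 0.587, 0.114),
    (0.000, 0.066, 1.116),
)


def rgb_to_xyz(rgb):
    """Apply the matrix of Eq. (6.26) to an (R, G, B) triple."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)


def chromaticity(xyz, fallback):
    """Eq. (6.29): return (u', v'); `fallback` is used when the denominator is zero."""
    x, y, z = xyz
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        return fallback
    return 4.0 * x / denom, 9.0 * y / denom


WHITE_XYZ = rgb_to_xyz((255.0, 255.0, 255.0))
WHITE_UV = chromaticity(WHITE_XYZ, (0.0, 0.0))


def rgb_to_luv(rgb):
    """Eq. (6.28): return (L*, u*, v*) for an (R, G, B) triple in [0, 255]."""
    xyz = rgb_to_xyz(rgb)
    lightness = 116.0 * (xyz[1] / WHITE_XYZ[1]) ** (1.0 / 3.0) - 16.0
    u, v = chromaticity(xyz, WHITE_UV)
    return (lightness,
            13.0 * lightness * (u - WHITE_UV[0]),
            13.0 * lightness * (v - WHITE_UV[1]))


def delta_e_uv(rgb1, rgb2):
    """Eq. (6.30): Euclidean distance between two colors in Luv coordinates."""
    return math.dist(rgb_to_luv(rgb1), rgb_to_luv(rgb2))
```

Any gray level has the same chromaticity as the reference white under Eq. (6.26), so its u* and v* are zero up to rounding; this illustrates the subtractive shift (u' - u'_w, v' - v'_w).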
It is argued in [Fai97] that $(L^*, a^*, b^*)$ are better coordinates than
$(L^*, u^*, v^*)$, since the adaptation mechanism of the latter, a subtractive
shift in chromaticity coordinates $(u' - u'_w, v' - v'_w)$ rather than a
multiplicative normalization of tristimulus values $(X/X_w, Y/Y_w, Z/Z_w)$, can
result in colors outside the gamut of feasible colors. Besides, the
$(L^*, u^*, v^*)$ adaptation transform is extremely inaccurate with respect to
predicting visual data. However, what is worse for our purposes is its poor
performance at predicting color differences. We consequently prefer to use Lab
coordinates whenever an alternative to the RGB space is needed.
6.5 Segmentation of Sequences
We must now face the problem of segmenting a sequence of images, keeping
in mind that those segmentations should satisfy at least two general
properties, namely, components should correspond to actual regions of
homogeneous color in the image (coherence) and should remain as stable as
possible through the sequence. In other words, we want neither segmentations
with too many small regions nor components which fluctuate too much through
time.
However, the process of reducing the number of components by aggregating
similar adjacent regions may cause unstable segmentations because some of them
may be joined differently in contiguous frames. From some preliminary results
it seems that a more coherent segmentation would be necessary to prevent this
shortcoming.
We suggest an approach which takes advantage of the segmentation of the
immediately previous frame in order to obtain that of the next one. The idea is
pretty simple and, for each new frame, consists in grouping similar regions into
bigger ones in the same way as it was done in the preceding frame. Thus, we kill
two birds with one stone, i.e., we get greater coherence and stability. Obviously,
an intermediate step dedicated to matching regions which seem equal in two
consecutive images is needed.
In general, using a correspondence stage in a segmentation process would be
seen as a drawback because it is time consuming and usually error prone.
Nevertheless, we propose to use the ideas lying behind the IRM distance
between regions [WLW01], which both provides robustness to poor segmentations
and effortlessly integrates features from many regions.
Next, we consider the two steps that are needed in our segmentation of
sequences, namely, the computations of correspondences among components
and the propagation of previous segmentations into the new ones for each frame
in the sequence.
6.5.1 Computation of Component Correspondences
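The correspondence between two components, $C_i^{k-1} \sim C_{i'}^{k}$, in two
consecutive frames $I_{k-1}$ and $I_k$ is defined as
$$C_i^{k-1} \sim C_{i'}^{k} \iff C_i^{k-1} = \operatorname*{argmin}_{\forall C_l^{k-1} \in I_{k-1}} D\left(C_l^{k-1}, C_{i'}^{k}\right), \quad \forall C_{i'}^{k} \in I_k \tag{6.31}$$
where $D(\circ, \circ)$ is a measure of distance between components in $I_{k-1}$ and $I_k$.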
As said, we follow the ideas of the IRM similarity measure⁷ in [WLW01]
to compute a content–based distance between two components from different
images. Our approach combines features of appearance and position at the same
time. We use the mean color as the appearance feature, while the component
center of mass is the position feature.
Then, the difference $D(C_l^{k-1}, C_{i'}^{k})$ between two components in two
successive frames is computed using the simple Euclidean distance over the
features above. In order to compare side by side two features that are
apparently rather heterogeneous, such as color and position, we normalize the
coordinates to fit the interval [0, 1] by dividing each coordinate by the
maximum range of its feature.
7 In Chapter 7 there is a wider explanation about this measure.
This way, quantities which are a priori different and have dissimilar units
can be compared as if they were basically the same. In theory, computations
should be done for all $C_i^{k-1} \in I_{k-1}$ and $C_{i'}^{k} \in I_k$ so that we
finally get all the correspondences between components in two consecutive
frames. Nonetheless, to speed up computations it is advisable to restrict
comparisons to a certain area surrounding the likeliest position of those
components.
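The correspondence of Eq. (6.31) can be sketched in Python as a nearest-neighbor search over normalized feature vectors. All names here are illustrative assumptions, as is the representation of a component by a dictionary entry holding its feature vector. The optional `max_distance` cut-off is a hypothetical parameter that leaves a component unmatched when its closest candidate is too far away. The sketch compares all pairs and does not implement the restriction to a surrounding area.

```python
import math


def feature_vector(mean_color, centroid, color_range, image_size):
    """Mean color and center of mass, each coordinate scaled to [0, 1]."""
    color = [c / color_range for c in mean_color]
    position = [p / extent for p, extent in zip(centroid, image_size)]
    return color + position


def match_components(prev_features, new_features, max_distance=None):
    """Map each new component to its closest previous component, or to None."""
    matches = {}
    for new_idx, f_new in new_features.items():
        best, best_d = None, math.inf
        for prev_idx, f_prev in prev_features.items():
            d = math.dist(f_prev, f_new)        # Euclidean distance D
            if d < best_d:
                best, best_d = prev_idx, d
        if max_distance is not None and best_d > max_distance:
            best = None                         # too far from every previous component
        matches[new_idx] = best
    return matches
```

For instance, with RGB means in [0, 255] and a 100 × 100 image, `feature_vector((255, 0, 0), (10, 10), 255, (100, 100))` yields a five-dimensional vector whose coordinates all lie in [0, 1].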
6.5.2 Propagation of Component Correspondences
For each new frame, once the image has been individually segmented into com-
ponents, we would like to use the previous regrouping of components to reduce
the number of existing regions in the present image while preserving the regions
which have already come up, maintaining the degree of coherence along the
sequence as a consequence of it.
Formally, let us suppose that two consecutive frames $I_{k-1}$ and $I_k$ provide us
with two segmentations $I_{k-1} = \{C_i^{k-1}\}_{i=1,\dots,n_{k-1}}$ and
$I_k = \{C_{i'}^{k}\}_{i'=1,\dots,n_k}$, respectively. Let us also assume that the
segmentation $I_{k-1}$ has been reduced to a new segmentation with bigger
components $\tilde{I}_{k-1} = \{\tilde{C}_j^{k-1}\}_{j=1,\dots,m_{k-1}}$, where
$m_{k-1} \le n_{k-1}$ and for each component $C_i^{k-1} \in I_{k-1}$ there exists a
bigger component $\tilde{C}_j^{k-1} \in \tilde{I}_{k-1}$ such that
$C_i^{k-1} \subseteq \tilde{C}_j^{k-1}$. We define the set of indexes $\mathrm{Ind}_j$
of all components in $I_{k-1}$ that have been put together forming one single
region $\tilde{C}_j^{k-1} \in \tilde{I}_{k-1}$. Hence there exist as many index sets
as components in $\tilde{I}_{k-1}$.
The problem then is to propagate the segmentation in $\tilde{I}_{k-1}$ into the one
in $I_k$, forming as a consequence a new segmentation
$\tilde{I}_k = \{\tilde{C}_{j'}^{k}\}_{j'=1,\dots,m_k}$. This is carried out by
grouping the regions in $I_k$ in such a way that if any component
$C_{i'}^{k} \in I_k$ corresponds to a component $C_i^{k-1} \in I_{k-1}$ in the
previous frame that was joined into a bigger region
$\tilde{C}_j^{k-1} \in \tilde{I}_{k-1}$, then the component $C_{i'}^{k}$ will be
grouped with the others satisfying the same property, creating the bigger
component $\tilde{C}_{j'}^{k}$, which is the propagation of the component
$\tilde{C}_j^{k-1}$ in the $(k-1)$th frame into the $k$th frame, that is,
$\tilde{C}_j^{k-1} \sim \tilde{C}_{j'}^{k}$. Formally, the component
$\tilde{C}_{j'}^{k}$ is built as follows
$$\tilde{C}_{j'}^{k} = \bigcup_{i' \in \mathrm{Ind}_{j'}} C_{i'}^{k}, \quad \forall C_{i'}^{k} \in I_k \mid C_i^{k-1} \sim C_{i'}^{k} \wedge C_i^{k-1} \subseteq \tilde{C}_j^{k-1} \tag{6.32}$$
In other words, components in a given frame will be joined together as their
corresponding components were joined in the previous frame. Finally, a new
segmentation $\tilde{I}_k$ is achieved at the $k$th frame, which is in general less
oversegmented than the original one, $I_k$, while maintaining the stability of
regions with respect to the previous frame.
This scheme does not need to treat in any particular manner the components
that appear or disappear in every new frame. Since component correspondence
is done backwards, disappearing regions simply have no match in the new
frame. On the other hand, new regions will look for the closest region in the
previous frame in terms of color and position. If the resulting distance is
too great, the new region is not joined to the propagation of any component
in $\tilde{I}_{k-1}$ and is considered a new region in the segmentation $\tilde{I}_k$.
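The propagation step of Eq. (6.32), including the treatment of unmatched components, can be sketched in Python as follows. The dictionary-based representation and the names `prev_group_of` and `matches` are illustrative assumptions: `prev_group_of` maps each component index in the previous frame to the index j of the merged region that contains it, and `matches` maps each component of the new frame to its corresponding previous component, or to `None` when the distance was too great.

```python
def propagate_groups(prev_group_of, matches):
    """Return a mapping from new component index to merged-region label."""
    new_group_of = {}
    # fresh labels for new regions start after the largest label in use
    next_label = max(prev_group_of.values(), default=-1) + 1
    for new_idx in sorted(matches):
        prev_idx = matches[new_idx]
        if prev_idx is None or prev_idx not in prev_group_of:
            new_group_of[new_idx] = next_label   # unmatched: becomes a new region
            next_label += 1
        else:
            # joined in the same way as its corresponding component was
            new_group_of[new_idx] = prev_group_of[prev_idx]
    return new_group_of
```

Components of the new frame that receive the same label form one merged region, so disappearing regions need no special handling: their labels are simply never assigned.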
6.6 Experiments and Results
The main concern in this Chapter resides in the segmentation of color images
considered both as static images as well as belonging to a sequence that may
have been obtained, e.g., from an autonomous robot. In order to achieve this
goal, we display in this Section the set of experiments that have been carried
out and the results obtained.
Principally, these experiments consist in segmenting such images with the
algorithm we suggest in this Chapter and comparing the outcomes with those
attained by the two other approaches already mentioned in previous
paragraphs, namely, the original F&H algorithm in [FH98a] and Figueiredo’s EM
clustering method in [FJ02].
The goal of doing so is, first of all, to illustrate the improvements attained
in relation to the results of the original F&H algorithm, while maintaining
its speed at a similar level. Likewise, our algorithm has been compared side
by side with that of Figueiredo, which is known to perform fairly well, to
study the relative quality of our segmentations. Since our segmentations are
definitely far faster than those of Figueiredo’s unsupervised EM, it is
important for us to show that the same range of quality is kept.
6.6.1 Segmentation of Static Images
The images shown in Fig. 6.1 correspond to different stages in the segmentation
of the picture exhibited in Fig. 6.1 (a). First, we display the results obtained
using the original F&H algorithm in Fig. 6.1 (b). It can be seen that this
segmentation is not completely satisfactory, since big homogeneous regions
are split into several components, especially in the background. This is
partially solved in Fig. 6.1 (c), where homogeneous regions are now merged
in a coherent way into bigger components.
Nevertheless, the total number of regions is still high with respect to the
relatively small number of potential real regions in the image. This is
because of the spurious regions generated in highly variable areas such as
borders. These regions are detected using the index defined in Eq. (6.19) and
can be observed in Fig. 6.1 (d). Finally, the resulting segmentation after
removing spurious regions can be seen in Fig. 6.1 (e); it closely fits the
actual homogeneous areas in the scene.
An analogous situation is shown in Fig. 6.2, where the well–known picture of
peppers is segmented. Again, the original image is portrayed in Fig. 6.2 (a).
Fig. 6.2 (b) is the segmentation before removing the spurious regions, which
are pictured in Fig. 6.2 (c). The final result is exhibited in Fig. 6.2 (d).
It must be noted that spurious pixels are eliminated by layers, starting at
the outer layers and ending with the inner pixels. In this manner, regions
tend to absorb any small spurious region within them and to grow outwards
until another region is found. This is not a genuine dilation, since pixels
prefer regions with the highest number of neighbors in common.
For this segmentation to be useful in an object recognition system, it is
important that images of a given object taken from different angles be
segmented in a similar way. We verify that behaviour in Fig. 6.3 and Fig. 6.4,
where two series of images are shown. Fig. 6.3 displays a toy bear under
six views. The upper row shows the original pictures while the lower row offers
Figure 6.1: Comparing our algorithm to that of F&H: (a) Original image. (b)
F&H’s segmentation. (c) Our segmentation before spurious regions elimination.
(d) Spurious regions. (e) Final result after spurious regions elimination.
Figure 6.2: Example of our segmentation: (a) Original image. (b) Our
segmentation before spurious regions elimination. (c) Spurious regions. (d)
Final result after spurious regions elimination.
Figure 6.3: Example of our segmentation. Upper row: Original image. Lower
row: Segmented image.
Figure 6.4: Example of our segmentation. Upper row: Original image. Lower
row: Segmented image.
Figure 6.5: Comparing our algorithm to that of Figueiredo, panels (a) to (e).
Upper row: Figueiredo’s segmentation. Lower row: Our segmentation.
the segmented images. A similar series is exhibited in Fig. 6.4, where a set of
ten different views is supplied. In both series, regions formed in neighboring
views are similarly segmented. Shadows and highlights are collected into
separate regions, which is quite natural since we, as humans, can also
perceive them as separate areas. In our opinion, identifying such regions and
deciding to which component they belong is not a segmentation concern, but a
question that should be addressed by a task at a different level.
Finally, in Fig. 6.5 we check whether our segmentation algorithm is capable of
attaining results comparable to those obtained by Figueiredo’s unsupervised
clustering algorithm [FJ02]. This is an excellent version of the EM technique,
very useful for segmenting images of unknown content since there is no need to
know the exact number of clusters to run the routine. Moreover, this algorithm
provides us with a family of Gaussian distributions as a result. Nevertheless,
it takes quite a lot of time to process an image. For example, a 360 × 288
image takes about 25 sec. to be segmented on an 800 MHz PC. Our algorithm only
takes about 0.10 to 0.20 sec. on the same computer, which is roughly two
orders of magnitude faster.
In these series, the upper rows for each object show the results of
Figueiredo’s segmentation, whereas the lower rows show those obtained by our
algorithm. The aim in placing the images this way is to illustrate mainly two
important questions, i.e., how different views of the same object are
segmented in comparison and whether these segmentations differ too much
depending on the algorithm used. At first sight, it seems that both algorithms
supply very similar segmentations, although the elimination of spurious
regions in our approach can produce slightly differing results wherever
textured areas appear in images, as is the case for the fruit drawings and the
letters in Fig. 6.5(b) and Fig. 6.5(c), respectively.
6.6.2 Segmentation of Sequences
We now move on to the description of some of the results obtained after
segmenting a video sequence captured from a mobile robot in an indoor
environment. Again, our aim is to illustrate the performance of our algorithm
in such a task compared to Figueiredo’s approach. At this point, we must
acknowledge the difficulty of presenting these results on paper. Although
their meaning is grasped at once when the videos are viewed⁸, we try to
provide the same information in the following pages by showing only a set of
images from a short interval of the whole sequence.
This piece of the sequence is shown in Fig. 6.6 and consists of a reduced set
of 16 images from a longer sequence (≈ 1 min.) of 1001 images at a rate of 15
images/sec. This small set spans about 10 sec. and contains only one out of
every 10 images. Images are filtered using a median filter with a
neighborhood of 3 × 3 pixels to remove noise and to get smooth images without
enlarging region contours. Color information is stabilized using the color
constancy algorithm in Chapter 5 together with the Mean heuristic. The first
image in the sequence is employed as the canonical one.
The first step is to examine how Figueiredo’s algorithm performs in segmenting
sequences, in order to later compare its results with those achieved by our
8 These videos will be provided on a CD–ROM for a better appreciation, along
with the rest of the graphics and images used in the preparation of this document.
Figure 6.6: Set of images from the video sequence of a mobile robot moving
about in an indoor environment.
approach. Two sets of images are presented in Fig. 6.7 in groups of two rows.
The upper rows show the same images as in Fig. 6.6, independently segmented,
meaning that each image is segmented using a randomly initialized Gaussian
mixture. As can be seen, this method presents a number of problems, since
clustered colors are not exactly the same in consecutive frames. To minimize
this lack of stability, the initialization routine is changed so that it can
take advantage of previous segmentations.
This is very easily attained by using, at each new frame, the finite mixture
of Gaussian distributions from the previous EM execution. When a certain color
disappears, its corresponding Gaussian simply gets a zero weight and dies out.
Letting spare Gaussian distributions initialize at random allows the algorithm
to incorporate new clusters into the next segmentation step. Results obtained
in that manner are displayed in the lower rows of Fig. 6.7. The improvement
is obvious in both segmentation and computation time, since convergence of
the EM routine is faster due to the smaller number of distributions and their
closeness to the quiescent point.
Afterwards, in order to complete the series of segmentations, we carry out the
same experiment as before, but this time using our algorithm in two cases,
namely, without and with the enforcement of stability based on the computation
and propagation of correspondences between components explained in
Section 6.5. To perform these experiments, we use two color spaces, i.e., RGB
along with the Euclidean distance, and Lab with the ∆Eab metric, both of them
reviewed in Section 6.4.6. Results obtained this way are exhibited in Fig. 6.8
for the Lab space, and in Fig. 6.9 for RGB coordinates.
As explained for Fig. 6.7, segmentations produced as if images were
independently considered are placed in the upper rows. The lower rows are
reserved for segmentations after applying the component correspondence. White
circles have been painted around some areas in the upper rows of Fig. 6.8 and
Fig. 6.9 to focus on the regions that shift back and forth uncertainly
compared to those in the lower rows, which remain far more stable.
Figure 6.7: Images from the video sequence segmented using Figueiredo’s
algorithm. Upper row: independent images. Lower row: using previous
segmentation.
Figure 6.8: Images from the video sequence segmented using our algorithm
and Lab color space. Upper row: independent images. Lower row: component
correspondence.
Figure 6.9: Images from the video sequence segmented using our algorithm and
RGB color space. Upper row: independent images. Lower row: component
correspondence.
Although it is difficult to appreciate this behaviour at a glance on paper,
what we must understand from these results is that some areas in Fig. 6.8 and
Fig. 6.9, such as those corresponding to doors, the floor, and the pair of
black wastepaper baskets, present a swinging segmentation, since some regions
are joined differently in two consecutive frames.
This bad consequence of subsegmenting images mainly occurs in poorly defined
regions and is greatly reduced by component correspondence, as can be seen in
the lower rows of Fig. 6.8 and Fig. 6.9. These results are even better than
those exhibited in the lower row of Fig. 6.7, corresponding to the case of
Figueiredo’s routine being fed with Gaussian distributions from previous
steps. More importantly, images are segmented in far less time.
6.7 Conclusions
As a conclusion to this Chapter, we have addressed the problem of segmenting
color images, whether they are static or come from a video sequence, in such a
way that both coherent and stable segmentations are sought. For us, coherence
means that components in a segmentation must correspond as closely as
possible to actual regions of the segmented scene, whereas stability has to do
with the persistence of components through time in a sequence, meaning that
two consecutive frames must generate similar segmentations where corresponding
components encompass similar areas in the scene.
To that purpose we suggest a greedy algorithm based on the computation of
the minimum spanning tree, which grows components according to local
properties of pixels. The process is fully controlled by an energy function
that estimates the probability that two components should be put together.
Spurious regions that are unavoidably generated during the growing process
are removed according to a quality index identifying this class of regions.
Hence, a fast algorithm is achieved providing image segmentations that are
good enough for identification purposes, as will be seen later in Chapter 7.
The segmentation algorithm is additionally extended to handle sequences
in order to get more stable segmentations through time. For each new frame,
this is done by propagating forward the segmentation of the previous image,
i.e., regions which were joined in a frame to form a bigger component are
matched to segments in the following frame by way of a distance that weights
both position and color appearance, and these segments are then grouped into a
new component. Thus, it is guaranteed that a pair of corresponding components
in two consecutive frames of the sequence look similar.
Results show that segmentations using the Felzenszwalb&Huttenlocher algorithm
[FH98a], which inspired our method, have been improved and are similar in
coherence and stability to those achieved by Figueiredo’s EM in [FJ02], while
being far faster. Furthermore, our segmentation algorithm will be used in the
next Chapter to obtain the segmentations needed to carry out a set of
experiments related to image retrieval and object recognition.