Unsupervised Image Segmentation
using Deep Image Clustering
Monazza Qadeer Khan
206154
Introduction
• Object segmentation is one of the most important operations in image processing, performed prior to image analysis
• Object segmentation is a challenging problem in computer vision and has been widely applied in areas such as object recognition and image classification
• Generally speaking, object segmentation methods can be divided into three categories: unsupervised, semi-supervised and fully supervised
Introduction
• In fully supervised segmentation, an accurately labeled training dataset is used
• In unsupervised segmentation, there are no ground-truth labels
• The focus of this project is on unsupervised image segmentation
• It has two parts: extracting features from a given image and dividing the image into different regions
Supervised vs. Unsupervised
Problem Statement
• Conventional clustering methods such as K-means, active contour, normalized cut, MLSS and SAS can be used for segmentation
• These methods have two principal drawbacks: they are sensitive to segmentation parameters such as the number of clusters, and the whole procedure is complex and cannot be optimized easily
• Therefore, a deep image clustering (DIC) network is designed and implemented
• It consists of a feature transformation subnetwork and a differentiable deep clustering subnetwork, and it divides the image space into different clusters
Objectives
• Encouraged by the flexibility of neural networks and their ability to model intricate patterns, an unsupervised segmentation framework based on a novel deep image clustering (DIC) model is proposed
• The DIC consists of a feature transformation subnetwork (FTS) and a trainable deep clustering subnetwork (DCS) for unsupervised image clustering
• FTS is built on a simple yet capable network architecture
• DCS can assign pixels to different numbers of clusters by iteratively updating cluster associations and cluster centers
Material
• Extensive experiments have been conducted on the Berkeley Segmentation Database
• The experimental results show that DCS is more effective at aggregating features during the clustering procedure
• DIC has also proven to be less sensitive to varying segmentation parameters and to have lower computational cost
• DIC achieves significantly better segmentation performance than state-of-the-art techniques
Material
Berkeley Segmentation Dataset (BSD)
• The dataset consists of 500 natural images, ground-truth human annotations
and benchmarking code
• The data is explicitly separated into disjoint train, validation and test
subsets
• The dataset is an extension of BSDS300, where the original 300 images are used for training / validation and 200 new images, together with human annotations, are added for testing
• Each image was segmented by five different subjects on average (a minimal loading sketch follows)
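For concreteness, here is a minimal Python sketch of how the BSDS500 splits could be enumerated. It is not taken from the DIC repository; the local path is an assumption, and the directory layout follows the official BSDS500 release.

```python
# Minimal sketch (not from the DIC repository): enumerating the BSDS500 splits.
# Assumes the official BSDS500 layout: BSR/BSDS500/data/images/{train,val,test}/*.jpg
from pathlib import Path

from PIL import Image  # pip install pillow

BSDS_ROOT = Path("BSR/BSDS500/data/images")  # assumed local path


def load_split(split: str):
    """Yield (image_id, RGB image) pairs for one BSDS500 split."""
    for jpg in sorted((BSDS_ROOT / split).glob("*.jpg")):
        yield jpg.stem, Image.open(jpg).convert("RGB")


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        count = sum(1 for _ in (BSDS_ROOT / split).glob("*.jpg"))
        print(f"{split}: {count} images")  # expected 200 / 100 / 200
```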
Flow Diagram
Illustration of the proposed DIC framework for unsupervised image segmentation. DIC consists of an FTS and a DCS and is trained with an iterative refinement loss.
Methodology
Unsupervised image segmentation
• Includes technical details such as preprocessing steps, the features and how they are extracted, their visualization, and model training and testing
• The deep image clustering model consists of two modules:
1. a feature extraction subnetwork
2. a deep clustering subnetwork
• A super-pixel-guided iterative refinement loss (a hedged sketch follows this list)
• An over-fitting training protocol that optimizes the network parameters in an end-to-end way
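The slides only name the super-pixel-guided iterative refinement loss, so the sketch below shows one common way such a loss is realized in unsupervised segmentation work: the per-pixel cluster predictions are replaced by the majority label inside each super-pixel, and a cross-entropy term pulls the network toward that refined target. The function name and this exact formulation are assumptions, not necessarily the authors' implementation.

```python
# Hedged sketch of a super-pixel-guided refinement loss (assumed realization,
# not necessarily the DIC authors' exact loss).
import torch
import torch.nn.functional as F


def superpixel_refinement_loss(logits: torch.Tensor, superpixels: torch.Tensor) -> torch.Tensor:
    """
    logits:      (N, M) per-pixel cluster scores, N = H*W pixels, M clusters
    superpixels: (N,) super-pixel id per pixel (e.g. from skimage.segmentation.slic)
    """
    preds = logits.argmax(dim=1)            # current hard cluster assignment per pixel
    target = preds.clone()
    for sp in torch.unique(superpixels):    # enforce one label per super-pixel
        mask = superpixels == sp
        majority = torch.mode(preds[mask]).values
        target[mask] = majority             # refined target: majority label inside the super-pixel
    return F.cross_entropy(logits, target)  # pull predictions toward the refined target
```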
Methodology
1. Network architecture for the Feature Transformation Subnetwork (FTS)
• An autoencoder architecture with skip connections is used to construct the feature transformation subnetwork (FTS)
• The CNN for feature extraction is composed of a series of convolution layers interleaved with batch normalization (BN) and ReLU activations
• FTS consists of six convolution blocks, one max-pooling operation, one deconvolution operation and a simple convolution operation
Methodology
• We use max-pooling, which down-samples the input by a factor of 2, after the 2nd convolution block to increase the receptive field
• The 4th convolution block outputs are then up-sampled by deconvolution and concatenated with the 2nd convolution block outputs before being passed to the 5th convolution block
• After the 6th convolution block and the simple convolution block, a feature map Y with C channels is generated
Methodology
• We use 3×3 convolution filters with the number of output channels set to 64, 128 or 192 in each block, except the last CNN layer, which outputs C channels
• The resulting C-dimensional features Y can be taken as coarse cluster associations
• To aggregate the features more effectively, Y is passed to the following deep clustering module, which iteratively updates the pixel-cluster associations and cluster centers for 𝜏 iterations (a sketch of this architecture follows below)
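A hedged PyTorch sketch of an FTS with the block layout described above: six 3×3 convolution blocks with BN and ReLU, max-pooling after block 2, deconvolution after block 4 with a skip concatenation from block 2, and a final convolution producing C channels. The exact per-block channel widths, the 1×1 kernel of the final convolution and the default value of C are assumptions where the slides leave them open.

```python
# Hedged sketch of the feature transformation subnetwork (FTS) described in the slides.
# Per-block channel widths are assumptions consistent with the 64/128/192 choices mentioned.
import torch
import torch.nn as nn


def conv_block(cin, cout):
    # convolution block (CB): 3x3 convolution + batch normalization + ReLU
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout),
                         nn.ReLU(inplace=True))


class FTS(nn.Module):
    def __init__(self, out_channels_C=100):          # C / SPN value is an assumption
        super().__init__()
        self.cb1 = conv_block(3, 64)
        self.cb2 = conv_block(64, 64)
        self.pool = nn.MaxPool2d(2)                   # down-sample by 2 after block 2
        self.cb3 = conv_block(64, 128)
        self.cb4 = conv_block(128, 128)
        self.up = nn.ConvTranspose2d(128, 128, 2, stride=2)  # deconvolution, up-sample by 2
        self.cb5 = conv_block(128 + 64, 192)          # skip concatenation with block-2 output
        self.cb6 = conv_block(192, 192)
        self.out = nn.Conv2d(192, out_channels_C, 1)  # simple convolution producing C channels

    def forward(self, x):
        x1 = self.cb2(self.cb1(x))                    # blocks 1-2
        x2 = self.cb4(self.cb3(self.pool(x1)))        # blocks 3-4 at half resolution
        x3 = torch.cat([self.up(x2), x1], dim=1)      # up-sample and concatenate skip features
        return self.out(self.cb6(self.cb5(x3)))       # coarse cluster associations Y (B, C, H, W)
```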
Methodology
The flowchart of the feature transformation subnetwork.
• Convolution block (CB): 3×3 convolution
• Batch normalization (BN)
• ReLU
• Max-pooling (MP) with a factor of 2
• Deconvolution (DC), which up-samples features by a factor of 2
Methodology
2. Deep Clustering Subnetwork
• First, the extracted feature Y is flattened to dimension N × C, where N = H × W, H is the height of the image, W is the width of the image, and C is the channel number or super-pixel number (SPN). Then a neural-network-based clustering procedure is designed
• The cluster centers Ω are defined as the initialization for feature clustering. The cluster centers are denoted Ω = {Ω1, Ω2, Ω3, …, ΩM}, where M is the default number of clusters and each Ωi has dimension C × 1
Methodology
The flowchart of the deep clustering subnetwork. DCS contains two iterative steps: calculating cluster associations H and updating cluster centers Ω (a hedged sketch of these steps follows).
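A hedged sketch of what the two iterative DCS steps could look like: soft cluster associations H computed from distances between the features and the current centers, then centers Ω updated as association-weighted means. The softmax over negative squared distances and the random-pixel initialization are assumptions; the slides only state that H and Ω are updated alternately for 𝜏 iterations.

```python
# Hedged sketch of the deep clustering subnetwork (DCS) iteration:
# alternate between computing cluster associations H and updating cluster centers Omega.
import torch


def dcs_cluster(Y: torch.Tensor, M: int = 20, tau: int = 3):
    """
    Y: (N, C) flattened FTS features, N = H*W pixels.
    Returns soft associations H of shape (N, M) and centers Omega of shape (M, C).
    """
    N, C = Y.shape
    omega = Y[torch.randperm(N)[:M]].clone()    # initialize centers from M random pixels (assumption)
    for _ in range(tau):
        dist2 = torch.cdist(Y, omega).pow(2)    # (N, M) squared distances to centers
        H = torch.softmax(-dist2, dim=1)        # step 1: soft cluster associations
        omega = (H.t() @ Y) / (H.sum(dim=0, keepdim=True).t() + 1e-8)  # step 2: weighted-mean centers
    return H, omega


# usage with an FTS output of shape (1, C, H, W):
#   H, omega = dcs_cluster(Y.flatten(2).squeeze(0).t())
#   labels = H.argmax(dim=1)                    # per-pixel cluster labels
```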
Experimentation
• The segmentation results on the two Berkeley Segmentation Databases (BSDS300 and BSDS500) [35], which consist of 300 and 500 natural images respectively, are reported
• To quantitatively evaluate the segmentation results, five criteria are used:
1. Probabilistic Rand Index (PRI)
2. Variation of Information (VoI)
3. Global Consistency Error (GCE)
4. Boundary Displacement Error (BDE)
5. Segmentation Covering (SC)
• Segmentation performance is better when PRI and SC are larger and the other three measures are smaller with respect to the ground truths (a PRI sketch follows below)
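For concreteness, a hedged sketch of how PRI can be computed: BSDS provides several human segmentations per image, and PRI is commonly taken as the Rand Index of the predicted segmentation averaged over those ground truths. This illustrative version uses scikit-learn and is not the official BSDS benchmarking code linked later.

```python
# Hedged sketch: Probabilistic Rand Index as the mean Rand Index over the
# multiple human ground-truth segmentations of one BSDS image.
from typing import List

import numpy as np
from sklearn.metrics import rand_score


def probabilistic_rand_index(pred: np.ndarray, ground_truths: List[np.ndarray]) -> float:
    """pred and each ground truth are (H, W) integer label maps."""
    return float(np.mean([rand_score(gt.ravel(), pred.ravel()) for gt in ground_truths]))
```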
Experimentation
𝜏 is set to 3 according to the cross-validation experiments. The number of training epochs is set to T = 100, the learning rate is set to 2 and the momentum is set to 0.9 (a training-setup sketch follows below)
Illustration of the iterative clustering process
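A hedged sketch of the training setup implied by this slide. The hyperparameter values (𝜏 = 3, T = 100 epochs, learning rate 2, momentum 0.9) follow the slide as stated; the choice of SGD, the model interface and the per-image over-fitting loop are assumptions based on the end-to-end protocol mentioned under Methodology.

```python
# Hedged sketch of the training loop suggested by the slides: per-image, end-to-end
# optimization of FTS + DCS with the super-pixel-guided refinement loss.
import torch

TAU, EPOCHS, LR, MOMENTUM = 3, 100, 2.0, 0.9     # values as stated on the slide


def train_on_image(model, image, superpixels, loss_fn):
    """model: FTS + DCS returning (1, M, H, W) cluster scores; image: (1, 3, H, W);
    superpixels: (H*W,) super-pixel ids; loss_fn: e.g. the refinement loss sketched earlier."""
    opt = torch.optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM)  # optimizer choice assumed
    for _ in range(EPOCHS):                       # over-fit the network to this single image
        opt.zero_grad()
        logits = model(image)                     # cluster scores after TAU DCS iterations
        loss = loss_fn(logits.flatten(2).squeeze(0).t(), superpixels)
        loss.backward()
        opt.step()
    return logits.argmax(dim=1)                   # final per-pixel cluster labels
```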
Results
• To evaluate the proposed DIC method comprehensively, its average scores are compared with sixteen benchmark algorithms, such as Ncut, Mean-shift, gPb-owt-ucm, MLSS and W-Net; the optimal image scale (OIS) is selected for segmenting images in the Berkeley Segmentation Database
• DIC works better at merging similar pixels and separating diverse regions by learning from local image patterns adaptively
Results
Visual comparison between DIC and other state-of-the-art methods, such as MLSS and SAS
Demo
• GitHub link: https://github.com/zmbhou/DIC
• BSD dataset link:
https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/
• Contour Detection and Image Segmentation Resources:
http://web.archive.org/web/20160306133802/http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html#bsds500
Demo
Thank You
Q&A