UNIT-4: Unsupervised Learning
Introduction to Clustering
Unsupervised vs Supervised Learning
Different types of clustering techniques:
Partitioning Methods
Hierarchical Methods
Density-Based Methods
Grid-Based Methods
Model-Based Clustering Methods
K-Means clustering
Apriori algorithm and Association Rules
Hierarchical clustering, K-Medoids
Density-Based Methods: DBSCAN
Overview of Clustering
Introduction to Clustering
Definition 1:
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another and dissimilar to the data points in other groups.
It is basically a grouping of objects on the basis of the similarity and dissimilarity between them.
Definition 2:
"A way of grouping the data points into different clusters, consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group."
What is clustering?
Grouping unlabeled examples is called clustering.
Because the examples are unlabeled, clustering relies on unsupervised machine learning.
If the examples were labeled, the grouping task would be classification instead.
Figure 1: Unlabeled examples grouped into three clusters.
Examples
Differentiate between Classification and Clustering
What are the Uses of Clustering?
Common applications for clustering include the following:
market segmentation
social network analysis
search result grouping
medical imaging
image segmentation
anomaly detection
Examples of clustering (grouping):
Group stars by brightness.
Group organisms by genetic information into a taxonomy.
Group documents by topic.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (data point belongs to only one
group) and Soft Clustering (data points can belong to another group also).
Partitioning Clustering
Hierarchical Clustering
Density-Based Clustering
Model-Based Clustering
Grid-Based Methods
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups.
It is also known as the centroid-based method.
The most common example of partitioning clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of K groups, where K denotes the number of pre-defined groups.
The cluster centers are chosen in such a way that each data point is closer to the centroid of its own cluster than to the centroid of any other cluster.
Working of K-Means Algorithm:
The following stages will help us understand how the K-Means clustering technique works:
Step 1: First, we need to provide the number of clusters, K.
Step 2: Next, choose K data points at random and assign each data point to a cluster.
Step 3: The cluster centroids will now be computed.
Step 4: Iterate the steps below until the ideal centroids are found, that is, until the assignment of data points to clusters no longer changes:
4.1 First, calculate the sum of squared distances between the data points and the centroids.
4.2 Assign each data point to the cluster whose centroid is closest to it.
4.3 Finally, compute the centroid of each cluster by averaging all of that cluster's data points.
The working of the K-Means algorithm is given below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids (these can be points other than those from the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the distance between each data point and each centroid, allocate every data point to the cluster with the closest centroid, and then place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurs, then go to step-4 else go
to FINISH.
Step-7: The model is ready.
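To make these steps concrete, below is a minimal K-Means sketch in Python/NumPy; it is only an illustrative implementation, not the only way to code the algorithm. The dataset X, the function name kmeans, and the choice K=2 are placeholders chosen just for this example.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick K data points at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Steps 4.1/4.2: distance of every point to every centroid, then assign to the closest one
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4.3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Stop when the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Illustrative usage on a tiny 2-D dataset with K=2
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)      # cluster index of each point
print(centroids)   # final cluster centers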
Let's understand the above steps by considering the visual plots:
Step-1: Let us pick the number of clusters, K=2, so the data points will be grouped into two different clusters.
Step-2: We need to select K random data points or centroids to form the clusters.
These points can be either points from the dataset or any other points.
So, here we are selecting the two points below as the K points; they are not part of our dataset.
Consider the image below:
Now we will assign each data point of the scatter plot to its closest K-point or centroid.
We will compute it by applying some mathematics that we have studied to calculate the distance
between two points.
So, we will draw a median line between the two centroids (the perpendicular bisector of the segment joining them).
Consider the below image:
From the above image, it is clear that the points on the left side of the line are nearer to the blue centroid (K1), and the points on the right side of the line are closer to the yellow centroid.
Let's color them blue and yellow for clear visualization.
The cluster on the left thus has the blue centroid, whereas the cluster on the right has the yellow centroid.
We now repeat the procedure by selecting new centroids.
To choose the new centroids, we compute the center of gravity (mean) of the points in each cluster, and we find the new centroids as shown below:
Next, we will reassign each data point to the new closest centroid. For this, we repeat the same process of finding a median line. The median line will look like the image below:
From the above image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points will be reassigned to new clusters.
As reassignment has taken place, we again go to step-4, which is finding new centroids or K-points.
We repeat the process by finding the center of gravity of each cluster, so the new centroids will be as shown in the image below:
As we have the new centroids, we again draw the median line and reassign the data points. The result is shown below:
We can see in the above image that there are no mismatched data points on either side of the line, so no further reassignment is needed and our model is formed. Consider the image below:
As our model is ready, we can now remove the assumed centroids, and the two final clusters are as shown in the figure below:
How to choose the value of "K number of clusters" in K-means Clustering?
Choosing the optimal number of clusters is a big task.
There are many different ways to find the optimal number of clusters, but here we are discussing the
most appropriate method to find the number of clusters or value of K.
The method is Elbow Method:
The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value.
WCSS stands for Within Cluster Sum of Squares, which defines the total variations within a cluster.
The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
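As an illustration of the Elbow method, the sketch below computes the WCSS for K = 1 to 10, assuming scikit-learn and matplotlib are installed; scikit-learn exposes the WCSS of a fitted K-Means model as the inertia_ attribute, and the random data used here is only a placeholder.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(200, 2)     # placeholder data for illustration

wcss = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                   # within-cluster sum of squares for this K

plt.plot(list(k_values), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method: look for the bend (elbow) in the curve")
plt.show()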
Association rule
Association rule learning is a kind of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly, so that the result can be used more profitably.
Association rule learning is an important approach in machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. In market basket analysis, it is used by several big retailers to find the relations between items.
Association rule learning can be divided into three types of algorithms:
1. Apriori algorithm
2. Eclat algorithm
3. F-P Growth algorithm
Associations: Market Basket Analysis
How does Association Rule Learning work?
Association rule learning works on the concept of if-then rules, such as: if A, then B.
To measure the associations between thousands of data items, there are several metrics.
These metrics are given below:
Support
Confidence
Lift
Learning Associations:
Support Count (σ): the frequency of occurrence of an itemset. Here σ({Milk, Diaper, Beer}) = 2 and σ({Bread, Diaper, Beer}) = 2.
Association Rule: an implication expression of the form X -> Y, where X and Y are any two itemsets.
Example: {Milk, Diaper} -> {Beer}
Definition of Support:
Support is how frequently an itemset appears in the dataset: the support count of the itemset divided by the total number of transactions.
From the above table:
Support(s) = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
where |T| is the total number of transactions.
Definition of Confidence:
Confidence indicates how often the items X and Y occur together in the dataset, given that X already occurs.
It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X.
For the example rule {Milk, Diaper} -> {Beer}:
Confidence(c) = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
Definition of Lift (l): The lift of the rule X -> Y is the confidence of the rule divided by the expected confidence, where the expected confidence is the support (frequency) of {Y}, i.e., the confidence we would expect if the itemsets X and Y were independent of each other.
If Lift(l) = 1: X and Y appear together about as often as expected,
If Lift(l) > 1: they appear together more often than expected, and
If Lift(l) < 1: they appear together less often than expected. Greater lift values indicate a stronger association.
Lift(l) = Support(X, Y) / (Support(X) * Support(Y))
l = Supp({Milk, Diaper, Beer}) / (Supp({Milk, Diaper}) * Supp({Beer}))
= 0.4 / (0.6 * 0.6)
= 1.11
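To tie the three metrics together, here is a small Python sketch that reproduces the numbers above for the rule {Milk, Diaper} -> {Beer}. The five transactions are an assumed, illustrative basket chosen to be consistent with the counts quoted in the text (the original transaction table is not reproduced here), and the helper functions support, confidence, and lift are written only for this example.

# Assumed illustrative transactions, consistent with the counts quoted above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    # support of X and Y together, divided by the support of X
    return support(X | Y) / support(X)

def lift(X, Y):
    # confidence of X -> Y divided by the support (expected confidence) of Y
    return support(X | Y) / (support(X) * support(Y))

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(round(support(X | Y), 2))    # 0.4
print(round(confidence(X, Y), 2))  # 0.67
print(round(lift(X, Y), 2))        # 1.11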
Applications of Association Rule Learning
Below are some popular applications of association rule learning:
Market Basket Analysis: This is one of the most popular examples and applications of association rule mining. The technique is commonly used by big retailers to determine the associations between items; by discovering such associations, retailers can design marketing strategies based on which items are frequently purchased together by customers.
Medical Diagnosis: With the help of association rules, patients can be treated more effectively, as the rules help in identifying the probability of illness for a particular disease.
Protein Sequences: Association rules help in determining the synthesis of artificial proteins.
Web Usage Mining: Web usage mining is the extraction of various kinds of interesting patterns from the vast collection of web pages and usage data available on the Internet.
Hierarchical clustering
The hierarchical clustering methods are used to group the data into a hierarchy or tree-like structure.
For example, in a machine learning problem of organizing employees of a university in different
departments, first the employees are grouped under the different departments in the university, and
then within each department, the employees can be grouped according to their roles such as
professors, assistant professors, supervisors, lab assistants, etc.
This creates a hierarchical structure of the employee data and eases visualization and analysis.
Similarly, there may be a data set with an underlying hierarchical structure that we want to discover, and we can use the hierarchical clustering methods to achieve that.
Types of hierarchical clustering
There are two main hierarchical clustering methods: Agglomerative clustering and Divisive
clustering.
Agglomerative clustering: is a bottom-up technique which starts with individual objects as clusters
and then iteratively merges them to form larger clusters.
Divisive clustering: starts with one cluster with all given objects and then splits it iteratively to form
smaller clusters.
Agglomerative Clustering:
It uses a bottom-up approach.
It starts with each object forming its own cluster and then iteratively merges the clusters according to
their similarity to form large clusters.
It terminates either when a certain clustering condition imposed by the user is achieved or when all clusters merge into a single cluster.
Divisive Clustering:
Divisive clustering is the opposite of Agglomerative clustering.
It uses the top-down approach.
The starting point is a single cluster containing all the objects, which is then split recursively to form smaller and smaller clusters.
It terminates when a user-defined condition is achieved or when the final clusters contain only one object each.
A dendrogram, which is a tree-like structure, is used to represent hierarchical clustering.
Individual objects are represented by leaf nodes, the clusters are represented by internal nodes, and the root node represents the single cluster containing all objects. A representation of a dendrogram is shown in this figure:
Hierarchical Clustering
Produce a nested sequence of clusters.
One approach: recursive application of a partitional clustering algorithm.
Types of hierarchical clustering
Agglomerative (bottom-up) clustering: builds the dendrogram (tree) from the bottom level by repeatedly merging the most similar (or nearest) pair of clusters; it stops when all the data points are merged into a single cluster (i.e., the root cluster).
Divisive (top-down) clustering: starts with all data points in one cluster, the root, and splits the root into a set of child clusters; each child cluster is recursively divided further, stopping when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
Dendrogram: Hierarchical Clustering
A dendrogram is built over an input set S; its nodes represent subsets of S.
Features of the tree:
The root is the whole input set S.
The leaves are the individual elements of S.
The internal nodes are defined as the union of their children.
Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
The tree may be cut at any level: each connected component then forms a cluster.
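As an illustration of how such a dendrogram can be produced, here is a minimal sketch assuming SciPy and matplotlib are installed; the six 2-D sample points are placeholders chosen only for this example.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Placeholder 2-D points chosen only for illustration
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])

# Agglomerative (bottom-up) clustering; each row of Z records one merge
Z = linkage(X, method="average", metric="euclidean")

# Leaves are the individual points, internal nodes are the merged clusters;
# cutting the tree at any height yields a flat clustering
dendrogram(Z)
plt.ylabel("Merge distance")
plt.show()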
Hierarchical clustering
Initialization: each individual point is taken as a cluster, and the distance/proximity matrix is constructed.
Intermediate state: after some merging steps, we have a number of clusters (for example C1, C2, C3, C4, C5) together with the current distance/proximity matrix between them.
Merge step: merge the two closest clusters (say C2 and C5) into a new cluster C2 U C5 and update the distance matrix.
After merging: the distances between the new cluster C2 U C5 and the remaining clusters (C1, C3, C4) are recomputed in the distance matrix, then the next closest pair is found and the process repeats.
There are a few ways to measure the distance between two clusters:
Single-link: the similarity of the most similar pair of points, one from each cluster.
Complete-link: the similarity of the least similar pair of points.
Centroid: the distance between the clusters' centroids (centers of gravity).
Average-link: the average similarity (for example, average distance or average cosine) over all pairs of elements, one from each cluster.
Single-Link Example
Single-link clustering can result in straggly (long and thin) clusters due to the chaining effect.
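A short sketch of how these four criteria could be selected with SciPy's linkage function (the random data array X is only a placeholder):

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.default_rng(0).random((20, 2))   # placeholder data

Z_single   = linkage(X, method="single")       # distance of the closest pair (can chain)
Z_complete = linkage(X, method="complete")     # distance of the farthest pair
Z_centroid = linkage(X, method="centroid")     # distance between cluster centroids
Z_average  = linkage(X, method="average")      # average distance over all cross-cluster pairs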
Agglomerative Clustering Example
Consider the points A(1, 1), B(2, 3), C(3, 5), D(4, 5), E(6, 6), and F(7, 5) and try to cluster them.
To perform clustering, we first create a distance matrix consisting of the distance between each pair of points in the dataset. The distance matrix looks as follows.
Consider the following sample data (the numeric walkthrough below uses a sample set of six points p1, ..., p6):
The distance matrix is:
For convenience, we consider only the lower triangular part of the matrix, as shown below, since the distance matrix is symmetric and its diagonal entries are zero.
Step 1: Calculate the Euclidean distances and create the distance matrix.
Step 2: Find the minimum-value entry in the distance matrix.
The minimum entry is at (p3, p6) with value 0.11, so these two points are merged into the cluster (p3, p6).
Step 3: Recalculate (update) the distance matrix for the cluster (p3, p6) using the single-link rule. The same formula is applied for p2, p4, and p5:
dist((p3, p6), p2) = min[dist(p3, p2), dist(p6, p2)] = min(0.15, 0.25) = 0.15
dist((p3, p6), p4) = min[dist(p3, p4), dist(p6, p4)] = min(0.15, 0.22) = 0.15
dist((p3, p6), p5) = min[dist(p3, p5), dist(p6, p5)] = min(0.28, 0.39) = 0.28
Updated distance matrix:
Step 4: Repeat steps 2 & 3.
The minimum entry is at (p2, p5) with value 0.14, so these two points are merged into the cluster (p2, p5).
Recalculate (update) the distance matrix for the cluster (p2, p5). The same formula is applied for (p3, p6) and p4:
dist((p2, p5), (p3, p6)) = min[dist(p2, (p3, p6)), dist(p5, (p3, p6))] = min(0.15, 0.28) = 0.15
dist((p2, p5), p4) = min[dist(p2, p4), dist(p5, p4)] = min(0.20, 0.29) = 0.20
Updated distance matrix:
Step 5: Repeat steps 2 & 3.
The minimum value is 0.15, between (p2, p5) and (p3, p6).
(Two entries are equal to this minimum; in that case the first one is chosen.) Merging them gives the cluster (p2, p5, p3, p6).
Recalculate (update) the distance matrix for the cluster (p2, p5, p3, p6):
dist((p2, p5, p3, p6), p1) = min[dist((p2, p5), p1), dist((p3, p6), p1)] = min(0.23, 0.22) = 0.22
The same formula is applied for p4:
dist((p2, p5, p3, p6), p4) = min[dist((p2, p5), p4), dist((p3, p6), p4)] = min(0.20, 0.15) = 0.15
Updated distance matrix:
Step 6: Repeat steps 2 & 3.
The minimum value is 0.15, between (p2, p5, p3, p6) and p4, so p4 is merged in, giving our 4th cluster (p2, p5, p3, p6, p4).
Recalculate (update) the distance matrix for the cluster (p2, p5, p3, p6, p4).
Updated distance matrix:
Step 7: Repeat steps 2 & 3.
The minimum value is 0.22, between (p2, p5, p3, p6, p4) and p1, giving our 5th cluster (p2, p5, p3, p6, p4, p1).
Only one entry remains in the distance matrix at this step, so this final merge happens by default and all points now belong to a single cluster.
Step 8: Drawing the dendrogram.
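The numeric walkthrough above uses the sample distance matrix for p1, ..., p6, which is not reproduced here. As an illustration, the sketch below shows how the same single-link procedure and the final dendrogram could be produced with SciPy for the coordinate points A(1, 1), ..., F(7, 5) listed at the start of this example (assuming SciPy and matplotlib are installed).

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Coordinate points from the start of the example
points = np.array([[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]], dtype=float)
labels = ["A", "B", "C", "D", "E", "F"]

# Step 1: pairwise Euclidean distances (condensed form of the distance matrix)
D = pdist(points, metric="euclidean")

# Steps 2-7: repeatedly merge the two closest clusters with the single-link rule
# dist(A U B, C) = min(dist(A, C), dist(B, C))
Z = linkage(D, method="single")

# Step 8: draw the dendrogram
dendrogram(Z, labels=labels)
plt.ylabel("Merge distance")
plt.show()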