Unsupervised Learning
Dr. Fehmina Malik
Supervised vs. Unsupervised Learning
Parameter | Supervised machine learning technique | Unsupervised machine learning technique
Input data | Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled.
Computational complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Accuracy | Highly accurate and trustworthy method. | Less accurate and less trustworthy method.
Why Unsupervised Learning?
Here are the main reasons for using unsupervised learning in machine learning:
• Unsupervised machine learning finds many kinds of previously unknown patterns in data.
• Unsupervised methods help you find features that can be useful for categorization.
• It takes place in real time, so all the input data is analyzed and labeled in the presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which
needs manual intervention.
Clustering
Cluster
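The centroid of cluster i is the mean of the n_i points x_1, ..., x_{n_i} assigned to it:
$$C_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j$$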
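A minimal Python sketch of the centroid formula, assuming each point is a tuple of numbers of the same length. The function name `centroid` is hypothetical and not taken from these notes.

```python
def centroid(points):
    """Return C_i = (1/n_i) * (sum of the points in one cluster), coordinate by coordinate."""
    n = len(points)
    if n == 0:
        raise ValueError("a cluster needs at least one point")
    dims = len(points[0])
    # Average each coordinate over all points in the cluster
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Example: a cluster holding the points (2, 10) and (5, 8)
print(centroid([(2, 10), (5, 8)]))  # (3.5, 9.0)
```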
Applications of Clustering in Real-World Scenarios
• Customer Segmentation
We covered this earlier: one of the most common applications of clustering is customer segmentation. It isn't limited to banking; the strategy is used across functions, including telecom, e-commerce, sports, advertising, sales, etc.
• Document Clustering
This is another common application of clustering. Say you have multiple documents and need to group similar documents together. Clustering groups these documents so that similar documents end up in the same cluster.
• Image Segmentation
We can also use clustering to perform image segmentation. Here, we try to group similar pixels in the image together, so that each cluster contains pixels that are similar to one another.
• Recommendation Engines
Clustering can also be used in recommendation engines. Say you want to recommend songs to a friend. You can look at the songs that person likes, use clustering to find similar songs, and finally recommend the most similar ones.
Common Distance Measures
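As an illustration, here are three commonly used distance measures (Euclidean, Manhattan, and their generalization, Minkowski), sketched in Python for points given as equal-length tuples. The function names are hypothetical, and this selection is an assumption rather than a list taken from these notes.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City-block distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # Generalization: r = 1 gives Manhattan, r = 2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

a, b = (5, 8), (2, 10)
print(euclidean(a, b))  # sqrt(13), about 3.606
print(manhattan(a, b))  # 5
```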
Clustering types
K-means clustering
• K-means clustering is a widely used method for cluster analysis where the aim is to
partition a set of objects into K clusters in such a way that the sum of the squared distances
between the objects and their assigned cluster mean is minimized.
• Note that if N is the number of objects, then .
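The quantity that K-means minimizes, the sum of squared distances between each object and its cluster mean, can be computed directly. A minimal sketch, assuming Euclidean distance and clusters given as lists of tuples; `within_cluster_sse` is a hypothetical name.

```python
def within_cluster_sse(clusters):
    """Sum of squared Euclidean distances from each point to its own cluster mean.

    `clusters` is a list of clusters, each a non-empty list of equal-length tuples.
    """
    total = 0.0
    for points in clusters:
        n = len(points)
        # Cluster mean, coordinate by coordinate
        mean = [sum(p[d] for p in points) / n for d in range(len(points[0]))]
        for p in points:
            total += sum((p[d] - mean[d]) ** 2 for d in range(len(p)))
    return total

# One cluster {(1, 1), (3, 1)}: mean (2, 1), each point is 1 away, so SSE = 2
print(within_cluster_sse([[(1, 1), (3, 1)]]))  # 2.0
```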
How does the K-Means Algorithm Work?
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)
• Step-3: Assign each data point to its closest centroid; this forms the predefined K clusters. The closest centroid is found using a distance measure.
• Step-4: Calculate the new centroid of each cluster.
• Step-5: Repeat the third step, i.e., reassign each data point to whichever new centroid is now closest.
• Step-6: Repeat for all points.
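The steps above can be sketched as the standard batch form of K-means (often called Lloyd's algorithm), in which every point is reassigned on each pass and the loop stops once no point changes cluster. This is a sketch under stated assumptions: Euclidean distance via `math.dist` (Python 3.8+), initial centroids supplied by the caller, and an empty cluster keeping its previous centroid. `kmeans` is a hypothetical name.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Batch K-means with Euclidean distance.

    Returns (centroids, labels), where labels[j] is the cluster index of points[j].
    """
    centroids = [tuple(c) for c in initial_centroids]  # Step 2: starting centroids
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so stop
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centroids, labels = kmeans(points, initial_centroids=points[:3])
print(labels)  # [0, 1, 2, 0, 2, 2, 1, 0]
```

On these eight points the batch version stops after a single centroid update, and it produces the same three groups as the step-by-step example in these notes.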
EXAMPLE: Given the following data, make 3 clusters, taking A1, A2 and A3 as the initial centroids.
A1 (2,10)
A2 (2,5)
A3 (8,4)
A4 (5,8)
A5 (7,5)
A6 (6,4)
A7 (1,2)
A8 (4,9)
• Let K1, K2 and K3 be three clusters with centroids A1, A2 and A3.
• Take point A4 and find its distance from the centroids of clusters K1, K2 and K3 (A1, A2 and A3).
• The minimum distance of A4 is from A1, the centroid of K1, so A4 is added to K1.
• New centroid of K1 is the mean of A1 and A4: ((2 + 5)/2, (10 + 8)/2) = (3.5, 9).
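Assuming Euclidean distance (the example does not name the measure), the distances from A4 (5, 8) to the starting centroids K1 = (2, 10), K2 = (2, 5) and K3 = (8, 4) work out as:

```latex
\begin{aligned}
d(A_4, K_1) &= \sqrt{(5-2)^2 + (8-10)^2} = \sqrt{13} \approx 3.61 \\
d(A_4, K_2) &= \sqrt{(5-2)^2 + (8-5)^2} = \sqrt{18} \approx 4.24 \\
d(A_4, K_3) &= \sqrt{(5-8)^2 + (8-4)^2} = \sqrt{25} = 5
\end{aligned}
```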
• Take point A5 and find its distance from all three centroids.
• The minimum distance of A5 is from the centroid of K3, so A5 is added to K3.
• New centroid of K3 is the mean of A3 and A5: ((8 + 7)/2, (4 + 5)/2) = (7.5, 4.5).
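Assuming Euclidean distance, and with the centroids at this point being K1 = (3.5, 9), K2 = (2, 5) and K3 = (8, 4), the distances from A5 (7, 5) are:

```latex
\begin{aligned}
d(A_5, K_1) &= \sqrt{(7-3.5)^2 + (5-9)^2} = \sqrt{28.25} \approx 5.32 \\
d(A_5, K_2) &= \sqrt{(7-2)^2 + (5-5)^2} = \sqrt{25} = 5 \\
d(A_5, K_3) &= \sqrt{(7-8)^2 + (5-4)^2} = \sqrt{2} \approx 1.41
\end{aligned}
```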
• Take point A6 and find its distance from all three centroids.
• The minimum distance of A6 is from the centroid of K3, so A6 is added to K3.
• New centroid of K3 is the mean of A3, A5 and A6: ((8 + 7 + 6)/3, (4 + 5 + 4)/3) = (7, 13/3) ≈ (7, 4.33).
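Assuming Euclidean distance, and with the centroids at this point being K1 = (3.5, 9), K2 = (2, 5) and K3 = (7.5, 4.5), the distances from A6 (6, 4) are:

```latex
\begin{aligned}
d(A_6, K_1) &= \sqrt{(6-3.5)^2 + (4-9)^2} = \sqrt{31.25} \approx 5.59 \\
d(A_6, K_2) &= \sqrt{(6-2)^2 + (4-5)^2} = \sqrt{17} \approx 4.12 \\
d(A_6, K_3) &= \sqrt{(6-7.5)^2 + (4-4.5)^2} = \sqrt{2.5} \approx 1.58
\end{aligned}
```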
• Take point A7 and find its distance from all three centroids.
• The minimum distance of A7 is from the centroid of K2, so A7 is added to K2.
• New centroid of K2 is the mean of A2 and A7: ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5).
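Assuming Euclidean distance, and taking K3 as the mean of A3, A5 and A6, the centroids at this point are K1 = (3.5, 9), K2 = (2, 5) and K3 = (7, 13/3). The distances from A7 (1, 2) are:

```latex
\begin{aligned}
d(A_7, K_1) &= \sqrt{(1-3.5)^2 + (2-9)^2} = \sqrt{55.25} \approx 7.43 \\
d(A_7, K_2) &= \sqrt{(1-2)^2 + (2-5)^2} = \sqrt{10} \approx 3.16 \\
d(A_7, K_3) &= \sqrt{(1-7)^2 + \left(2-\tfrac{13}{3}\right)^2} = \sqrt{\tfrac{373}{9}} \approx 6.44
\end{aligned}
```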
• Take point A8 and find its distance from all three centroids.
• The minimum distance of A8 is from the centroid of K1, so A8 is added to K1.
• New centroid of K1 is the mean of A1, A4 and A8: ((2 + 5 + 4)/3, (10 + 8 + 9)/3) = (11/3, 9) ≈ (3.67, 9).
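Assuming Euclidean distance, and with the centroids at this point being K1 = (3.5, 9), K2 = (1.5, 3.5) and K3 = (7, 13/3) (the mean of A3, A5 and A6), the distances from A8 (4, 9) are:

```latex
\begin{aligned}
d(A_8, K_1) &= \sqrt{(4-3.5)^2 + (9-9)^2} = 0.5 \\
d(A_8, K_2) &= \sqrt{(4-1.5)^2 + (9-3.5)^2} = \sqrt{36.5} \approx 6.04 \\
d(A_8, K_3) &= \sqrt{(4-7)^2 + \left(9-\tfrac{13}{3}\right)^2} = \sqrt{\tfrac{277}{9}} \approx 5.55
\end{aligned}
```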
• So the clusters are: K1 = {A1, A4, A8}, K2 = {A2, A7}, K3 = {A3, A5, A6}.
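The step-by-step procedure in this example can be reproduced with a short single-pass sketch. The first K points seed the clusters, and each remaining point joins the cluster with the nearest current centroid. That centroid is then recomputed as the mean of all its members, following the centroid formula C_i. Euclidean distance is an assumption (the example does not name the measure), and `sequential_kmeans` is a hypothetical name.

```python
import math

def sequential_kmeans(names, coords, k):
    """Single pass: seed with the first k points, then add each remaining point
    to the nearest current centroid and recompute that cluster's mean."""
    clusters = [[n] for n in names[:k]]
    centroids = [coords[n] for n in names[:k]]
    for n in names[k:]:
        p = coords[n]
        # Nearest current centroid (Euclidean distance, assumed)
        nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(n)
        # New centroid = mean of all members of that cluster
        members = [coords[m] for m in clusters[nearest]]
        centroids[nearest] = tuple(sum(c) / len(members) for c in zip(*members))
    return clusters, centroids

data = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
        "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
clusters, centroids = sequential_kmeans(list(data), data, 3)
print(clusters)  # [['A1', 'A4', 'A8'], ['A2', 'A7'], ['A3', 'A5', 'A6']]
```

This single pass never revisits points that were already assigned. The standard iterative K-means would instead keep reassigning all points until no point changes cluster.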
Cluster Quality
Pros & Cons.