Unsupervised Learning

By
Dr. Abhilasha Chaudhuri
Assistant Professor
Department of Computer Science and Engineering,
Sardar Vallabhbhai National Institute of Technology, Surat
What is unsupervised learning?
■ Unsupervised learning: no predefined classes (i.e., learning by
observations vs. learning by examples: supervised)
■ Typical applications
■ As a stand-alone tool to get insight into the data distribution
■ As a preprocessing step for other algorithms
■ Cluster: a collection of data objects
■ similar (or related) to one another within the same group
■ dissimilar (or unrelated) to the objects in other groups
■ Cluster analysis (or clustering, data segmentation, …)
■ Finding similarities between data according to the characteristics found
in the data, and grouping similar data objects into clusters
Applications of cluster analysis
• Outlier detection: Outliers are often viewed as those “far away”
from any cluster
• Hypothesis generation and testing
■ Data reduction
■ Summarization: preprocessing for regression, PCA, classification, and association analysis
■ Compression: image processing (vector quantization)
■ Information retrieval: document clustering


• Land use: Identification of areas of similar land use in an earth
observation database
Applications of cluster analysis
■ Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop
targeted marketing programs
■ City-planning: Identifying groups of houses according to their
house type, value, and geographical location
■ Earth-quake studies: Observed earth quake epicenters should
be clustered along continent faults
■ Climate: understanding the earth's climate; finding patterns in
atmospheric and ocean data
Clustering is an optimization problem
• variability(c) = Σ_{e ∈ c} distance(mean(c), e)²
• dissimilarity(C) = Σ_{c ∈ C} variability(c)
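A minimal NumPy sketch of these two quantities, assuming each cluster is given as an array of points (the function names follow the slide; the example data is illustrative):

import numpy as np

def variability(cluster):
    # sum of squared distances of each point e in the cluster to the cluster mean
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def dissimilarity(clusters):
    # total objective: sum of the variabilities of all clusters in the clustering C
    return sum(variability(c) for c in clusters)

# Example: two small 2-D clusters
C = [np.array([[1.0, 2.0], [2.0, 1.0]]),
     np.array([[8.0, 9.0], [9.0, 8.0], [10.0, 10.0]])]
print(dissimilarity(C))

A "good" clustering is one with low total dissimilarity for a given number of clusters.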
Major Clustering Approaches
■ Partitioning approach:
■ Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
■ Typical methods: k-means, k-medoids, CLARANS
■ Hierarchical approach:
■ Create a hierarchical decomposition of the set of data (or objects) using some criterion
■ Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
■ Density-based approach:
■ Based on connectivity and density functions
■ Typical methods: DBSCAN, OPTICS, DenClue
Hierarchical Clustering
1. Start by assigning each item to a cluster, so that if you have N items,
you now have N clusters, each containing just one item.
2. Find the closest (most similar) pair of clusters and merge them into
a single cluster
3. Continue the process until all items are clustered into a single
cluster of size N.

How will we measure closeness / distance between two clusters?
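A minimal Python sketch of this bottom-up (agglomerative) procedure, assuming a cluster_distance function that implements one of the linkage metrics discussed below (all names here are illustrative):

def agglomerative_clustering(points, cluster_distance, target_k=1):
    # Step 1: every item starts in its own cluster
    clusters = [[p] for p in points]
    # Steps 2-3: repeatedly merge the closest pair until target_k clusters remain
    while len(clusters) > target_k:
        (i, j) = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters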


Dendrogram: Shows How Clusters are Merged

Decomposes the data objects into several levels of nested partitionings
(a tree of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the dendrogram at
the desired level; each connected component then forms a cluster.
Linkage Metrics
• Single-linkage: the distance between two clusters is the shortest distance
from any member of one cluster to any member of the other cluster
• Complete-linkage: the distance between two clusters is the greatest distance
from any member of one cluster to any member of the other cluster
• Average-linkage: the distance between two clusters is the average distance
from any member of one cluster to any member of the other cluster
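Each of these definitions can be written as a small function that plugs into the agglomerative sketch above (assuming points are coordinate tuples and a Euclidean point-to-point distance; illustrative only):

import math

def d(p, q):                   # Euclidean distance between two points
    return math.dist(p, q)

def single_linkage(c1, c2):    # shortest distance over all pairs of members
    return min(d(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):  # greatest distance over all pairs of members
    return max(d(p, q) for p in c1 for q in c2)

def average_linkage(c1, c2):   # average distance over all pairs of members
    return sum(d(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))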
Example of Hierarchical Clustering
• Suppose we have the aerial distance between some pair of cities
SURAT PUNE MUMBAI RAIPUR UDAIPUR NAGPUR
SURAT 0 206 963 1949 3095 2979
PUNE 0 802 1771 2934 2815
MUMBAI 0 966 2142 2013
RAIPUR 0 1235 1307
UDAIPUR 0 808
NAGPUR 0

{SURAT} {PUNE} {MUMBAI} {RAIPUR} {UDAIPUR} {NAGPUR}


{SURAT, PUNE} {MUMBAI} {RAIPUR} {UDAIPUR} {NAGPUR}
{SURAT,PUNE, MUMBAI} {RAIPUR} {UDAIPUR} {NAGPUR}
{SURAT,PUNE, MUMBAI} {RAIPUR} {UDAIPUR, NAGPUR}

{SURAT, PUNE, MUMBAI, RAIPUR} {UDAIPUR, NAGPUR}   (single linkage)
or
{SURAT, PUNE, MUMBAI} {RAIPUR, UDAIPUR, NAGPUR}   (complete linkage)
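The same merge sequence can be checked with SciPy's hierarchical-clustering routines by feeding them the distance table above (a sketch; linkage and dendrogram are standard SciPy functions, the rest is illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

cities = ["SURAT", "PUNE", "MUMBAI", "RAIPUR", "UDAIPUR", "NAGPUR"]
D = np.array([
    [0,    206,  963,  1949, 3095, 2979],
    [206,  0,    802,  1771, 2934, 2815],
    [963,  802,  0,    966,  2142, 2013],
    [1949, 1771, 966,  0,    1235, 1307],
    [3095, 2934, 2142, 1235, 0,    808],
    [2979, 2815, 2013, 1307, 808,  0],
])
# linkage() expects a condensed distance matrix; method can be "single", "complete" or "average"
Z_single = linkage(squareform(D), method="single")
Z_complete = linkage(squareform(D), method="complete")
print(Z_single)                       # each row records one merge and its distance
# dendrogram(Z_single, labels=cities) would draw the tree described earlier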
K-means Algorithm
• Randomly choose k examples as the initial centroids
• While true:
• Create k clusters by assigning each example to the closest centroid
• Compute k new centroids by averaging examples in each cluster
• If the centroids don’t change:
• Break
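A compact NumPy version of this loop (a sketch, not a production implementation; ties and empty clusters are ignored for simplicity):

import numpy as np

def k_means(points, k, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # randomly choose k examples as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # create k clusters by assigning each example to the closest centroid
        labels = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        # compute k new centroids by averaging the examples in each cluster
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # centroids don't change: stop
            return labels, centroids
        centroids = new_centroids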
Complexities
• What is the complexity of the k-means algorithm?
• O(k · n) distance computations per iteration (times the number of iterations and the number of features)

• What is the complexity of hierarchical (agglomerative) clustering?
• O(n²) space for the pairwise distance matrix; a naive implementation does roughly O(n²) work per merge, i.e. O(n³) overall
An Example of k-means clustering
[Figures: K = 4; the initial centroids, followed by the cluster assignments after iterations 1 through 4]
An Example of K-Means Clustering
• Apply K(=2)-Means algorithm over the data (185, 72), (170, 56), (168, 60),
(179,68), (182,72), (188,77) up to two iterations and show the clusters. Initially
choose first two objects as initial centroids.

• Centroid for first cluster c1 = (185, 72)


Centroid for second cluster c2 = (170, 56)
• Iteration 1: calculate the distance from each object to both centroids using the Euclidean distance measure and assign each object to the nearest centroid.
• Iteration 2: recompute the centroid of each cluster, then recalculate the distances and reassign the objects.
• [The distance tables and the clusters after each iteration appeared as figures in the original slides.]
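The two iterations can be reproduced with a few lines of NumPy (a sketch of the same computation; the intermediate distance tables appeared in the original slides):

import numpy as np

data = np.array([[185, 72], [170, 56], [168, 60], [179, 68], [182, 72], [188, 77]], dtype=float)
centroids = data[:2].copy()          # initial centroids: (185, 72) and (170, 56)

for it in range(2):
    # assign each object to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # recompute the centroids as the mean of each cluster
    centroids = np.array([data[labels == j].mean(axis=0) for j in range(2)])
    print(f"iteration {it + 1}: clusters={labels.tolist()}, centroids={centroids.round(2).tolist()}")

With these points, cluster 1 becomes {(185, 72), (179, 68), (182, 72), (188, 77)} and cluster 2 becomes {(170, 56), (168, 60)}, with centroids (183.5, 72.25) and (169, 58) after the first update; the assignment no longer changes in the second iteration.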
Issues with k-means
• Choosing the “wrong” k can lead to strange results
• Consider k=3
• Results depend upon the initial centroids
How to choose the value of k
• A priori knowledge about application domain
• There are five different types of fabrics: k=5
• There are two types of students: k=2
• Search for a good k
• Try different values of k and evaluate quality of results
• Run hierarchical clustering on a subset of the data to get an idea of the underlying
relationships
• High quality clusters have:
■ high intra-class similarity: cohesive within clusters
■ low inter-class similarity: distinctive between clusters
How to choose the value of k
• Empirical method
• # of clusters: k ≈ √(n/2) for a dataset of n points, e.g., n = 200 gives k = 10
• Elbow method
• The elbow method is a graphical method for finding the optimal K
value in a k-means clustering algorithm. The elbow graph shows the
within-cluster-sum-of-square (WCSS) values on the y-axis
corresponding to the different values of K (on the x-axis). The optimal
K value is the point at which the graph forms an elbow.
• Use the turning point in the curve of sum of within cluster variance
w.r.t the # of clusters
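A sketch of the elbow method with scikit-learn, assuming X is a 2-D NumPy array of data points; KMeans.inertia_ is the within-cluster sum of squares (WCSS):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)       # within-cluster sum of squares for this k
# plot range(1, 11) against wcss and pick the k at the "elbow" of the curve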
How to choose the value of k
[Figure: elbow plot of WCSS against the number of clusters k]
How to choose the value of k
• Cross validation method
• Divide a given data set into m parts
• Use m – 1 parts to obtain a clustering model
• Use the remaining part to test the quality of the clustering
• E.g., For each point in the test set, find the closest centroid, and use
the sum of squared distance between all points in the test set and the
closest centroids to measure how well the model fits the test set
• For any k > 0, repeat it m times, compare the overall quality measure
w.r.t. different k’s, and find # of clusters that fits the data the best
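A rough sketch of this m-fold procedure (illustrative only; X is assumed to be a NumPy array, and the quality measure follows the description above):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import KFold

def cv_score(X, k, m=5):
    # average test-set sum of squared distances to the nearest trained centroid
    total = 0.0
    for train_idx, test_idx in KFold(n_splits=m, shuffle=True, random_state=0).split(X):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[train_idx])
        d = np.linalg.norm(X[test_idx][:, None, :] - km.cluster_centers_[None, :, :], axis=2)
        total += (d.min(axis=1) ** 2).sum()
    return total / m

# compare cv_score(X, k) for several k and keep the k that fits the held-out data best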
Unlucky initial centroids
[Figures: an unlucky random initialization and the resulting clustering after convergence]
Mitigating dependence on initial centroids
• Try multiple sets of randomly chosen centroids
• Select “best” results

best = kMeans(points)
for t in range(numTrials):
    C = kMeans(points)
    if dissimilarity(C) < dissimilarity(best):
        best = C
return best
An Example
• Many patients with 4 features each
• Heart rate in beats per minute
• Number of past heart attacks
• Age
• ST elevation (binary)
• Outcome (death) based on features
• Probabilistic, not deterministic
• E.g., older people with multiple heart attacks are at higher risk
• Cluster and examine purity of cluster relative to outcomes
Distance Metrics
• Euclidean
• Manhattan
• Minkowski
• Mahalanobis
• Hamming
Euclidean distance
• For two points X = (x1, x2, x3, ..., xn) and Y = (y1, y2, y3, ..., yn) in an
n-dimensional space, the Euclidean distance is calculated as:

D_euclidean(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )

In vector format, with X and Y written as column vectors:

D_euclidean(X, Y)² = (X − Y) · (X − Y) = (X − Y)ᵀ (X − Y)
Euclidean distance

• When to use it:


• When you have continuous numerical data.
• When the data dimensions are of a similar scale or have been standardized.
• In algorithms like K-Means clustering and as the default for KNN.
• Limitations:
• Scale Variance: Euclidean distance is sensitive to the scale of the features. If one
feature has a much larger range of values than others, it will dominate the distance
calculation. Therefore, feature scaling (e.g., standardization or normalization) is often
a necessary preprocessing step.
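A small check in NumPy that the sum-of-squares form and the vector (norm) form agree (illustrative values):

import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 6.0, 3.0])
d1 = np.sqrt(np.sum((X - Y) ** 2))      # sum-of-squares form
d2 = np.linalg.norm(X - Y)              # vector (norm) form
print(d1, d2)                           # both print 5.0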
Manhattan Distance
• Also known as City Block distance, the Manhattan distance is the sum of the absolute differences
of their Cartesian coordinates. It measures the distance a taxi would have to travel along a grid of
streets to get from point X to point Y.

D_manhattan(X, Y) = Σ_{i=1}^{n} |x_i − y_i|

• When to use it:


• When dealing with high-dimensional data, as it is less affected by the curse of dimensionality compared
to Euclidean distance.
• For datasets with categorical features that have been one-hot encoded.
• When the individual feature differences are more important than their squared differences.
• Limitations:
• It can be less intuitive than Euclidean distance in some contexts as it doesn't represent the shortest
path.
• Like Euclidean distance, it is also sensitive to the scale of the features.
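The same pair of vectors under the Manhattan metric (illustrative):

import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 6.0, 3.0])
print(np.sum(np.abs(X - Y)))    # 3 + 4 + 0 = 7.0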
Minkowski Distance
• Minkowski distance is a generalized metric that can represent both
Euclidean and Manhattan distances by varying a parameter, 'p’.

D_minkowski(X, Y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^(1/p)

• When p = 1, it becomes the Manhattan distance.


• When p = 2, it becomes the Euclidean distance.
• As p approaches infinity, it becomes the Chebyshev distance.
• When to use it:
• When you want to experiment with different distance metrics by tuning the
parameter 'p'.
• It offers flexibility in defining the distance based on the problem's characteristics.
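A small check that the parameter p interpolates between the two previous metrics (illustrative):

import numpy as np

def minkowski(X, Y, p):
    return np.sum(np.abs(X - Y) ** p) ** (1.0 / p)

X, Y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(X, Y, 1))   # 7.0, the Manhattan distance
print(minkowski(X, Y, 2))   # 5.0, the Euclidean distance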
Mahalanobis Distance
• For column vectors X = (x1, x2, ..., xn)ᵀ and Y = (y1, y2, ..., yn)ᵀ:

D_mahalanobis(X, Y) = sqrt( (X − Y)ᵀ S⁻¹ (X − Y) )

Here S is the covariance matrix of the data.
If there is no correlation between any two dimensions in 2-dimensional data
(i.e., the data is spread uniformly in the space) and each dimension has unit
variance (σ² = 1), then S is the identity matrix:

S = [[1, 0], [0, 1]]

In that case the Mahalanobis distance is essentially the Euclidean distance. But
whenever there is some correlation between the dimensions, the covariance term
takes care of it and gives us a better measure of distance.
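A sketch in NumPy, estimating S from a data matrix whose rows are observations (the data here is randomly generated for illustration):

import numpy as np

data = np.random.default_rng(0).multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=500)
S_inv = np.linalg.inv(np.cov(data, rowvar=False))   # inverse covariance matrix of the data

def mahalanobis(X, Y, S_inv):
    diff = X - Y
    return float(np.sqrt(diff @ S_inv @ diff))

print(mahalanobis(data[0], data[1], S_inv))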
Hamming Distance
• The Hamming distance is used for comparing two binary strings of equal length. It
is the number of positions at which the corresponding symbols are different.
• For two strings x and y of equal length n, the Hamming distance is given by:

• D_hamming(x, y) = Σ_{i=1}^{n} [x_i ≠ y_i], i.e., the number of positions at which x and y differ

• When to use it:


• For comparing categorical variables (e.g., one-hot encoded vectors).
• In information theory for error detection and correction in digital communication.
• Limitations:
• It is only defined for strings or vectors of the same length.
• It is most suitable for binary or categorical data, not continuous numerical data.
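A short example on two binary vectors of equal length (illustrative):

import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])
y = np.array([1, 1, 1, 0, 0, 1])
print(int(np.sum(x != y)))    # 2 positions differ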
Density-Based Clustering: DBSCAN Example
[Figure: example of density-based clusters, shown in the original slides]
Density-Based Clustering: Basic Concepts
■ Two parameters:
■ Eps: maximum radius of the neighbourhood
■ MinPts: minimum number of points in an Eps-neighbourhood of that point
■ N_Eps(q): {p belongs to D | dist(p, q) ≤ Eps}
■ Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
■ p belongs to N_Eps(q)
■ core point condition: |N_Eps(q)| ≥ MinPts
[Figure: p directly density-reachable from q, with MinPts = 5 and Eps = 1 cm]
Density-Reachable and Density-Connected
■ Density-reachable:
■ A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi
■ Density-connected:
■ A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts
[Figures: a chain of points from q to p illustrating density-reachability; p and q both density-reachable from a point o]
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
■ Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
■ Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border and outlier points for Eps = 1 cm and MinPts = 5]
DBSCAN: Sensitive to Parameters

DBSCAN online demo:
http://webdocs.cs.ualberta.ca/~yaling/Cluster/Applet/Code/Cluster.html
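A sketch of how the two parameters are passed to scikit-learn's DBSCAN; note that its min_samples parameter counts the point itself, so the MinPts convention used on these slides corresponds roughly to min_samples = MinPts + 1 (the data below is a placeholder):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(200, 2))     # placeholder data
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_      # cluster index per point, -1 marks noise/outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters, "noise points:", int(np.sum(labels == -1)))
# try different eps / min_samples values: the number of clusters and noise points changes noticeably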
DBSCAN: Example
minPts = 3 and epsilon = 1.9
[Figures: step-by-step growth of the clusters on a small example dataset, shown across several slides]
• Example: Given the points A(3, 7), B(4, 6), C(5, 5), D(6, 4), E(7, 3), F(6, 2), G(7, 2) and H(8, 4), Find the core points and outliers using
DBSCAN. Take Eps = 2.5 and MinPts = 3

• Step 1: To find the core points, outliers and clusters using DBSCAN we first need to calculate the distance between all pairs of the given data points. Let us use the Euclidean distance measure for this calculation.
• [The resulting distance matrix was shown as a table in the original slides, with every entry whose distance ≤ Eps (i.e. 2.5) marked in red.]
• Step 2: Now, find all the data points that lie in the Eps-neighborhood of each data point, i.e., put every point whose distance is ≤ 2.5 into the neighborhood set of that data point.
• N(A) = {B}                  (only B is within 2.5 of A)
  N(B) = {A, C}               (A and C are within 2.5 of B)
  N(C) = {B, D}               (B and D are within 2.5 of C)
  N(D) = {C, E, F, G, H}      (C, E, F, G and H are within 2.5 of D)
  N(E) = {D, F, G, H}         (D, F, G and H are within 2.5 of E)
  N(F) = {D, E, G}            (D, E and G are within 2.5 of F)
  N(G) = {D, E, F, H}         (D, E, F and H are within 2.5 of G)
  N(H) = {D, E, G}            (D, E and G are within 2.5 of H)
• Here, data points A, B and C have fewer than MinPts (i.e. 3) neighbors, so they cannot be considered core points. Since they belong to the neighborhood of other data points, there are no outliers in the given set of data points.
• Data points D, E, F, G and H have at least MinPts (i.e. 3) neighbors and hence are the core data points.
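The neighborhood sets and core points above can be reproduced with a few lines of NumPy (a sketch of the same calculation):

import numpy as np

pts = {"A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
       "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4)}
eps, min_pts = 2.5, 3
names = list(pts)
P = np.array([pts[n] for n in names], dtype=float)
dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)   # pairwise Euclidean distances

for i, n in enumerate(names):
    neigh = [names[j] for j in range(len(names)) if j != i and dist[i, j] <= eps]
    print(n, "neighbours:", neigh, "core" if len(neigh) >= min_pts else "not core")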

Measuring Clustering Quality
■ 2 kinds of measures: External, internal
■ External: supervised, employ criteria not inherent to the
dataset
■ Compare a clustering against prior or expert-specified
knowledge (i.e., the ground truth) using certain clustering
quality measure
■ Internal: unsupervised, criteria derived from data itself
■ Evaluate the goodness of a clustering by considering how
well the clusters are separated, and how compact the
clusters are, e.g., Silhouette coefficient

Measuring Clustering Quality
■ Silhouette coefficient:
• For a data set, D, of n objects, suppose D is partitioned into k clusters, C1, : : : ,Ck.
• For each object o ∈ D, we calculate a(o) as the average distance between o and all
other objects in the cluster to which o belongs. (cohesion)
• Similarly, b(o) is the minimum average distance from o to all clusters to which o does
not belong. (separation)
• The silhouette coefficient of o is then defined as
  s(o) = (b(o) − a(o)) / max(a(o), b(o))
• The value of the silhouette coefficient is between -1 and 1. The value of a(o) reflects
the compactness of the cluster to which o belongs. The smaller the value, the more
compact the cluster.
• The value of b(o) captures the degree to which o is separated from other clusters. The
larger b(o) is, the more separated o is from other clusters.
Measuring Clustering Quality

• Therefore, when the silhouette coefficient value of o approaches 1, the cluster containing o is compact and o
is far away from other clusters, which is the preferable case.
• However, when the silhouette coefficient value is negative (i.e., b(o) < a(o)), this means that, in expectation,
o is closer to the objects in another cluster than to the objects in the same cluster as o. This is a bad
situation and should be avoided.
• Silhouette plot: [shown as a figure in the original slides]
Measuring Clustering Quality

• Interpretation of the silhouette plot: [shown as a figure in the original slides]
Example

• Consider a scenario with 3 clusters, where each cluster has 3 two-dimensional points
• Cluster 1:
Point A1: (2, 5)
Point A2: (3, 4)
Point A3: (4, 6)

Cluster 2:
Point B1: (8, 3)
Point B2: (9, 2)
Point B3: (10, 5)

Cluster 3:
Point C1: (6, 10)
Point C2: (7, 8)
Point C3: (8, 9)
• calculate the silhouette coefficient for Point A1

• Step 1: Calculate Cohesion for Point A1
• Distance from Point A1 to Point A2: sqrt((2–3)² + (5–4)²) = sqrt(2)
• Distance from Point A1 to Point A3: sqrt((2–4)² + (5–6)²) = sqrt(5)
• Average cohesion for Point A1 = (sqrt(2) + sqrt(5)) / 2 ≈ 1.825
• Step 2: Calculate Separation for Point A1
• Distance from Point A1 to Point B1: sqrt((2–8)² + (5–3)²) = sqrt(40)
• Distance from Point A1 to Point B2: sqrt((2–9)² + (5–2)²) = sqrt(58)
• Distance from Point A1 to Point B3: sqrt((2–10)² + (5–5)²) = sqrt(64)
• Average separation for Point A1 = (sqrt(40) + sqrt(58) + sqrt(64)) / 3 ≈ 7.313
• Distance from Point A1 to Point C1: sqrt((2-6)² + (5-10)²) = sqrt(41)
• Distance from Point A1 to Point C2: sqrt((2-7)² + (5-8)²) = sqrt(34)
• Distance from Point A1 to Point C3: sqrt((2-8)² + (5-9)²) = sqrt(52)
• Average separation for Point A1 = (sqrt(41) + sqrt(34) + sqrt(52)) / 3 ≈ 6.482
• b(A1) is the smaller of the two averages, i.e. ≈ 6.482, since cluster 3 is the nearest other cluster

• Step 3: Calculate Silhouette Coefficient for Point A1
• silhouette coefficient = (separation − cohesion) / max(separation, cohesion), where the separation is b(A1), the average distance to the nearest other cluster
• silhouette coefficient = (6.482 − 1.825) / max(6.482, 1.825) ≈ 0.718
• Step 4: Average Silhouette Coefficient
• Calculate the average silhouette coefficient across all data points to
obtain the overall silhouette score for the clustering result.
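The same calculation can be cross-checked with scikit-learn, which computes a(o) and b(o) for every point (a sketch using the nine points above):

import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[2, 5], [3, 4], [4, 6],      # cluster 1: A1, A2, A3
              [8, 3], [9, 2], [10, 5],     # cluster 2: B1, B2, B3
              [6, 10], [7, 8], [8, 9]])    # cluster 3: C1, C2, C3
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
print(silhouette_samples(X, labels)[0])    # silhouette coefficient of A1, ≈ 0.72
print(silhouette_score(X, labels))         # average over all points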

Gaussian Mixture Model
Gaussian Mixture Model using Expectation
Maximization technique
• Consider a Gaussian Distribution
• Univariate density
• Multivariate density
• Maximum Likelihood Estimation technique
• Take the log of the Gaussian likelihood, differentiate it, and set the derivative to zero
Gaussian Distribution
• To understand the GMM first we have to understand the Gaussian
distribution

• To describe this bell curve / Gaussian distribution we need two quantities:
the mean (µ) and the variance (σ²)
Gaussian Distribution
• Once you have these values of µ and σ you can pick any point x and get the
likelihood of x coming from this Gaussian distribution.

• You can also have the bell curve in more than one dimension
Gaussian Distribution
• In this case the mean will be represented by a vector, with one mean for each dimension:
  Mean µ ➔ [µ1, µ2]
• And sigma will be represented by a covariance matrix containing the pairwise
variances/covariances of the dimensions:
  Variance σ² ➔ covariance matrix Σ = [[V11, V12], [V21, V22]]
• And simple multiplication will be replaced by matrix multiplication
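A sketch of evaluating the likelihood of a point under a 1-D and a 2-D Gaussian with SciPy (the parameter values are placeholders):

from scipy.stats import norm, multivariate_normal

# univariate: likelihood of x = 1.0 under a Gaussian with mean 0 and variance 1
print(norm(loc=0.0, scale=1.0).pdf(1.0))

# multivariate: a mean vector and a covariance matrix instead of scalar mean and variance
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 2.0]])
print(mvn.pdf([1.0, -0.5]))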
Gaussian Mixture model using Expectation
Maximization
• Just like any clustering model, it is iterative and it has two main steps
• Assignment
• Update
• In the initialization phase we randomly assign our gaussians for these data
points, just like centroids in the k-means
• Then calculate the likelihood of each data point for coming from each of
these gaussians.
• In the update phase, we convert these likelihoods to weights by dividing each
likelihood vector by its sum.
• Then, based on these weights, we adjust the mu and sigma of our Gaussians,
which in turn adjusts the position of the Gaussians.
Assignment and Update
[Figures: alternating assignment and update steps of the EM iterations, shown across several slides]
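In practice this EM loop is available as scikit-learn's GaussianMixture; a sketch on randomly generated placeholder data:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),      # placeholder data from two blobs
               rng.normal([5, 5], 1.5, size=(200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print(gmm.means_)                    # fitted mean vector of each Gaussian
print(gmm.covariances_.shape)        # one covariance matrix per component
print(gmm.predict_proba(X[:3]))      # soft assignment weights (the "responsibilities")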
