Unsupervised Learning
Motivation
The goal of clustering is to group data points that are close (or similar) to each other and to identify such groupings (or clusters) in an unsupervised manner.
How do we define similarity?
How many iterations are needed to check cluster quality?
Introduction
Supervised learning: discover patterns in the data with a known target (class) or label.
These patterns are then utilized to predict the values of the target attribute in future data instances.
Examples?
Unsupervised learning: the data have no target attribute.
We want to explore the data to find some intrinsic structure in it.
Can we perform regression here?
Examples?
Cluster
A cluster is represented by a single point, known as the centroid (or cluster center) of the cluster.
The centroid is computed as the mean of all data points in the cluster:
C_j = (1 / |C_j|) Σ_{x_i ∈ C_j} x_i
The cluster boundary is decided by the farthest data point in the cluster.
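As a quick illustration of the centroid definition, here is a minimal NumPy sketch (the points are made up for illustration):

```python
import numpy as np

# Hypothetical 2-D points assigned to one cluster
cluster_points = np.array([[2.0, 10.0], [4.0, 9.0], [5.0, 8.0]])

# Centroid = mean of all points in the cluster, computed per coordinate
centroid = cluster_points.mean(axis=0)
print(centroid)  # [3.6667 9.0]
```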
Applications
Example 1: group people of similar sizes together to make “small”, “medium”, and “large” T-shirts.
  Tailor-made for each person: too expensive.
  One-size-fits-all: does not fit all.
Example 2: in marketing, segment customers according to their similarities, to do targeted marketing.
Example 3: given a collection of text documents, we want to organize them according to their content similarities, to produce a topic hierarchy.
Types of clustering
Clustering: the task of grouping a set of data points such that data points in the same group are more similar to each other than to data points in other groups (a group is known as a cluster).
It groups data instances that are similar to (near) each other into one cluster and data instances that are very different (far away) from each other into different clusters.
Types:
1. Exclusive Clustering: K-means
2. Overlapping Clustering: Fuzzy C-means
3. Hierarchical Clustering: Agglomerative clustering, divisive
clustering
4. Probabilistic Clustering: Mixture of Gaussian models
1. Exclusive clustering: K-means
Basic idea: randomly initialize the k cluster centers and iterate between the assignment and update steps below.
1. Randomly initialize the cluster centers c1, ..., cK.
2. Given the cluster centers, determine the points in each cluster: for each point p, find the closest ci and put p into cluster i.
3. Given the points in each cluster, solve for ci: set ci to be the mean of the points in cluster i.
4. If the ci have changed, repeat from Step 2.
K-means contd..
Algorithm
Begin
  initialize n, c, μ1, μ2, …, μc (randomly selected)
  do
    classify the n samples according to the nearest μi
    recompute μi
  until no change in μi
  return μ1, μ2, …, μc
End
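The pseudocode above can be turned into a minimal NumPy sketch; the function and variable names below are my own, and Euclidean distance is assumed for the "nearest" test:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means on an (n, d) array X with k clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```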
K-means example
(Step-by-step example figures omitted; these slides are taken from Andrew Moore.)
Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
• Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
• The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
• Ρ(a, b) = |x2 – x1| + |y2 – y1|
• Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution
Iteration-01:
We calculate the distance of each point from each of the three cluster centers, using the given distance function.

Calculating the distance between A1(2, 10) and C1(2, 10):
Ρ(A1, C1) = |2 – 2| + |10 – 10| = 0

Calculating the distance between A1(2, 10) and C2(5, 8):
Ρ(A1, C2) = |5 – 2| + |8 – 10| = 3 + 2 = 5

Calculating the distance between A1(2, 10) and C3(1, 2):
Ρ(A1, C3) = |1 – 2| + |2 – 10| = 1 + 8 = 9
In a similar manner, we calculate the distance of the other points from each of the three cluster centers.
From here, the new clusters are obtained as follows.

Given Point   Distance from C1(2, 10)   Distance from C2(5, 8)   Distance from C3(1, 2)   Belongs to
A1(2, 10)     0                         5                        9                        C1
A2(2, 5)      5                         6                        4                        C3
A3(8, 4)      12                        7                        9                        C2
A4(5, 8)      5                         0                        10                       C2
A5(7, 5)      10                        5                        9                        C2
A6(6, 4)      10                        5                        7                        C2
A7(1, 2)      9                         10                       0                        C3
A8(4, 9)      3                         2                        10                       C2

Cluster-01 contains: A1(2, 10)
Cluster-02 contains: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)
Cluster-03 contains: A2(2, 5), A7(1, 2)
Now,
• We re-compute the cluster centers.
• The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
We have only one point A1(2, 10) in Cluster-01.
So, cluster center remains the same.
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-01.
Iteration-02:
We calculate the distance of each point from each of the center of the three clusters.
The distance is calculated by using the given distance function.
The following shows the calculation of the distance between point A1(2, 10) and each of the three cluster centers.

Calculating the distance between A1(2, 10) and C1(2, 10):
Ρ(A1, C1) = |2 – 2| + |10 – 10| = 0

Calculating the distance between A1(2, 10) and C2(6, 6):
Ρ(A1, C2) = |6 – 2| + |6 – 10| = 4 + 4 = 8

Calculating the distance between A1(2, 10) and C3(1.5, 3.5):
Ρ(A1, C3) = |1.5 – 2| + |3.5 – 10| = 0.5 + 6.5 = 7
In a similar manner, we calculate the distance of the other points from each of the three cluster centers.
The new clusters are obtained as follows.

Given Point   Distance from C1(2, 10)   Distance from C2(6, 6)   Distance from C3(1.5, 3.5)   Belongs to
A1(2, 10)     0                         8                        7                            C1
A2(2, 5)      5                         5                        2                            C3
A3(8, 4)      12                        4                        7                            C2
A4(5, 8)      5                         3                        8                            C2
A5(7, 5)      10                        2                        7                            C2
A6(6, 4)      10                        2                        5                            C2
A7(1, 2)      9                         9                        2                            C3
A8(4, 9)      3                         5                        8                            C1

Cluster-01 contains: A1(2, 10), A8(4, 9)
Cluster-02 contains: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)
Cluster-03 contains: A2(2, 5), A7(1, 2)
Now,
We re-compute the cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
Center of Cluster-01 = ((2 + 4)/2, (10 + 9)/2) = (3, 9.5)

For Cluster-02:
Center of Cluster-02 = ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4) = (6.5, 5.25)

For Cluster-03:
Center of Cluster-03 = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)

After the second iteration, the centers of the three clusters are:
C1(3, 9.5), C2(6.5, 5.25), C3(1.5, 3.5)
This completes Iteration-02.
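The two iterations above can be reproduced with a short script that reuses the Manhattan distance from the problem statement and the initial centers A1, A4, A7:

```python
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centers = [(2, 10), (5, 8), (1, 2)]   # initial centers: A1, A4, A7

def manhattan(a, b):
    # the distance function given in the problem statement
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

for it in range(2):                    # two iterations, as in the example
    clusters = [[], [], []]
    for p in points.values():          # assign each point to its nearest center
        idx = min(range(3), key=lambda j: manhattan(p, centers[j]))
        clusters[idx].append(p)
    # recompute each center as the mean of the points in its cluster
    centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
               for c in clusters]
    print("after iteration", it + 1, ":", centers)

# after iteration 1: (2.0, 10.0), (6.0, 6.0), (1.5, 3.5)
# after iteration 2: (3.0, 9.5), (6.5, 5.25), (1.5, 3.5)
```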
Exercise: use the K-Means algorithm to create two clusters.
Assume A(2, 2) and C(1, 1) are the centers of the two clusters.
Distance formula: sqrt[(x2 – x1)² + (y2 – y1)²]
K-means: pros and cons
Pros
  Simple, fast to compute
  Converges to a local minimum of the within-cluster squared error
Cons
  How to set k?
  Sensitive to the initial centers
  Sensitive to outliers
  Detects only spherical clusters
  Assumes that means can be computed
2. Fuzzy C-Means Clustering
One data point may belong to two or more clusters with different membership degrees.
Objective function:
J_m = Σ_{i=1..n} Σ_{j=1..c} (u_ij)^m ‖x_i − c_j‖² , where 1 ≤ m < ∞
It is an extension of k-means.
Fuzzy c-means algorithm
Let x_i be a vector of values for data point g_i.
1. Initialize the membership matrix U(0) = [u_ij] for data points g_i and clusters cl_j at random.
2. At the k-th step, compute the fuzzy centroids C(k) = [c_j] for j = 1, ..., nc, where nc is the number of clusters, using

   c_j = ( Σ_{i=1..n} (u_ij)^m x_i ) / ( Σ_{i=1..n} (u_ij)^m )

   where m is the fuzzy parameter and n is the number of data points.
Fuzzy c-means algorithm
3. Update the fuzzy membership U(k) = [u_ij], using

   u_ij = 1 / Σ_{l=1..nc} ( ‖x_i − c_j‖ / ‖x_i − c_l‖ )^(2 / (m−1))

4. If ‖U(k) − U(k−1)‖ < ε, then STOP; otherwise return to Step 2.
5. Determine the membership cutoff:
   For each data point g_i, assign g_i to cluster cl_j if u_ij of U(k) exceeds the chosen membership cutoff value.
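A compact NumPy sketch of these update rules (variable names are my own; the membership update is written with the exponent 2/(m−1), the usual form equivalent to the one above):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means: X is (n, d), c the number of clusters, m > 1 the fuzzifier."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)        # random initial memberships, rows sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ij = 1 / sum_l (||x_i - c_j|| / ||x_i - c_l||)^(2/(m-1))
        new_U = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.linalg.norm(new_U - U) < eps:  # stop when memberships no longer change
            return centers, new_U
        U = new_U
    return centers, U
```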
Fuzzy c-means
Pros:
Allows a data point to be in multiple clusters
A more natural representation of the behavior of genes, since genes are usually involved in multiple functions
Cons:
Need to define c (k in K-means), the number of clusters
Need to determine membership cutoff value
Clusters are sensitive to initial assignment of centroids
Fuzzy c-means is not a deterministic algorithm
3. Hierarchical Clustering
Hierarchical clustering is another unsupervised machine learning algorithm used to group unlabeled data points into clusters; it is also known as hierarchical cluster analysis (HCA).
It produces a nested sequence of clusters, a tree, also called a dendrogram.
Types of hierarchical clustering
Agglomerative (bottom up) clustering: builds the dendrogram (tree) from the bottom level, and
• merges the most similar (or nearest) pair of clusters
• stops when all the data points are merged into a single cluster (i.e., the root cluster).
Divisive (top down) clustering: starts with all data points in one cluster, the root.
• Splits the root into a set of child clusters; each child cluster is recursively divided further.
• Stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
(This slide is taken from Bing Liu, UIC.)
Agglomerative clustering
• Agglomerative clustering is a type of data clustering method used in
unsupervised learning. It is an iterative process that groups similar
objects into clusters based on some measure of similarity.
• Agglomerative clustering uses a bottom-up approach for dividing data
points into clusters. It starts with individual data points and merges
them into larger clusters until all of the objects are clustered together.
How does Agglomerative Hierarchical Clustering work?
The agglomerative hierarchical clustering algorithm follows a systematic process to build the cluster hierarchy. Below is a step-by-step explanation (a short SciPy sketch follows the list):
1. Initialize clusters: treat each data point as a separate cluster. For a dataset with n points, there are initially n clusters.
2. Compute distances: calculate the distance (dissimilarity) between all pairs of clusters using a chosen distance metric, such as Euclidean or Manhattan distance.
3. Merge closest clusters: identify the pair of clusters with the smallest distance and merge them into a single cluster, reducing the total number of clusters by one.
4. Update the distance matrix: recalculate distances between the new cluster and all remaining clusters, using a linkage method to define how distances are computed between clusters.
5. Repeat: continue merging clusters and updating the distance matrix until all data points are combined into a single cluster.
6. Construct the dendrogram: represent the merging process as a dendrogram, where the x-axis lists the data points and the y-axis indicates the distance at which clusters are merged.
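A minimal sketch of these steps using SciPy's hierarchical-clustering routines; the toy data points are made up, and the `method` argument selects how inter-cluster distance is measured (single, complete, average, centroid):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical 2-D data: two small groups and one distant point
X = np.array([[1, 2], [2, 2], [8, 8], [8, 9], [25, 80]], dtype=float)

# Steps 1-5: linkage() computes pairwise distances and repeatedly merges
# the closest pair of clusters, recording each merge in Z.
Z = linkage(X, method="single", metric="euclidean")

# Step 6: the dendrogram shows the merge heights
dendrogram(Z)
plt.show()

# Cut the tree to obtain flat cluster labels, e.g. two clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```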
• Step-1: Treat each data point as a single cluster. If there are N data points, the number of clusters will also be N.
• Step-2: Take the two closest data points or clusters and merge them to form one cluster. There will now be N-1 clusters.
• Step-3: Again, take the two closest clusters and merge them together to form one cluster. There will be N-2 clusters.
• Step-4: Repeat Step 3 until only one cluster is left.
• Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as per the problem.
Measure for the distance between two clusters
• Single Linkage: It is the Shortest Distance between the closest points
of the clusters.
• Complete Linkage: It is the farthest distance between the two points
of two different clusters. It is one of the popular linkage methods as it
forms tighter clusters than single-linkage.
• Average Linkage: the linkage method in which the distance between every pair of points (one point from each cluster) is summed and then divided by the total number of such pairs to obtain the average distance between the two clusters. It is also one of the most popular linkage methods.
• Centroid Linkage: It is the linkage method in which the distance
between the centroid of the clusters is calculated.
Solved Example (Single Linkage)
Find the pair of points with the minimum distance, i.e., 42 and 43.
Construct a new matrix for the newly formed cluster and again find the minimum distance between the points.
Keep reconstructing the matrix with each new cluster, finding the minimum distance and re-forming the clusters.
The final clusters are (42, 43) and (25, 27, 22, 18).
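The single-linkage result can be reproduced with SciPy. Since the original distance matrices are not reproduced here, I assume the six items are the one-dimensional values 18, 22, 25, 27, 42, 43:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Assumed 1-D items (only the values 18, 22, 25, 27, 42, 43 appear in the example)
values = np.array([[42], [43], [25], [27], [22], [18]], dtype=float)

Z = linkage(values, method="single")           # single linkage = nearest-point distance
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two clusters
for v, lab in zip(values.ravel(), labels):
    print(int(v), "-> cluster", lab)
# 42 and 43 end up in one cluster; 25, 27, 22, 18 in the other
```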
Average Linkage
(Worked example figure omitted.)
Hierarchical clustering
Pros
  Dendrograms are great for visualization
  Provides hierarchical relations between clusters
  Shown to be able to capture concentric clusters
Cons
  Not easy to define levels for clusters
  Experiments have shown that other clustering techniques can outperform hierarchical clustering
Density-Based Clustering
• Density-based clustering is based on the idea that clusters are regions of high
density separated by regions of low density.
• The algorithm works by first identifying "core" data points, which are data
points that have a minimum number of neighbors within a specified distance.
These core data points form the center of a cluster.
• Next, the algorithm identifies "border" data points, which are data points that
are not core data points but have at least one core data point as a neighbor.
• Finally, the algorithm identifies "noise" data points, which are data points that
are not core data points or border data points.
DBSCAN Clustering
• DBSCAN, which stands for Density-Based Spatial Clustering of Applications with
Noise, is a powerful clustering algorithm that groups points that are closely
packed together in data space.
• DBSCAN is a density-based clustering algorithm that groups data points that
are closely packed together and marks outliers as noise based on their density
in the feature space. It identifies clusters as dense regions in the data space,
separated by areas of lower density.
• Unlike some other clustering algorithms, DBSCAN doesn't require specifying
the number of clusters beforehand, making it particularly useful for
exploratory data analysis.
DBSCAN revolves around three key
concepts:
1.Core Points: These are points that have at
least a minimum number of other points
(MinPts) within a specified distance (ε or
epsilon).
2.Border Points: These are points that are
within the ε distance of a core point but
don't have MinPts neighbors themselves.
3.Noise Points: These are points that are
neither core points nor border points.
They're not close enough to any cluster to
be included.
DBSCAN uses two main parameters:
• ε (epsilon): The maximum distance
between two points for them to be
considered as neighbors.
• MinPts: The minimum number of points
required to form a dense region.
How Does DBSCAN Work?
DBSCAN operates by examining the neighborhood of each point in the dataset. The algorithm follows a step-by-
step process to identify clusters based on the density of data points:
1.Parameter Selection
• Choose ε (epsilon): The maximum distance between two points for them to be considered as neighbors.
• Choose MinPts: The minimum number of points required to form a dense region.
2.Select a Starting Point
• The algorithm starts with an arbitrary unvisited point in the dataset.
3.Examine the Neighborhood
• It retrieves all points within the ε distance of the starting point.
• If the number of neighboring points is less than MinPts, the point is labeled as noise (for now).
• If there are at least MinPts points within ε distance, the point is marked as a core point, and a new cluster is formed.
4.Expand the Cluster
• All the neighbors of the core point are added to the cluster.
• For each of these neighbors:
• If it's a core point, its neighbors are added to the cluster recursively.
• If it's not a core point, it's marked as a border point, and the expansion stops.
5.Repeat the Process
• The algorithm moves to the next unvisited point in the dataset.
• Steps 3-4 are repeated until all points have been visited.
6.Finalize Clusters
• After all points have been processed, the algorithm identifies all clusters.
• Points initially labeled as noise might now be border points if they're within ε distance of a core point.
7. Handling Noise
• Points that remain unassigned to any cluster after all points have been processed stay labeled as noise (outliers).
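scikit-learn provides a DBSCAN implementation that follows these steps; a minimal usage sketch on made-up data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D data: two dense blobs plus one far-away point
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3],
              [8, 8], [8.1, 7.9], [7.9, 8.2],
              [25, 25]], dtype=float)

# eps = neighborhood radius (epsilon), min_samples = MinPts
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

print(db.labels_)                 # cluster index per point; -1 marks noise
print(db.core_sample_indices_)    # indices of the core points
```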
Solved Example
Using the given parameters, form clusters from points P1 – P12, where epsilon ≤ 1.9.
Determine the status of each point as Core, Border, or Noise: for a given point, if there are at least minPts points within epsilon, mark that point as Core; otherwise mark it as Noise. If a point classified as Noise falls within the neighborhood of a Core point, update its status to Border.
The final clusters formed are shown in the accompanying figure (omitted).
What is dimensionality reduction?
• Dimensionality reduction in machine learning is the process of
reducing the number of features or variables in a dataset while
retaining as much of the original information as possible. In other
words, it is a way of simplifying the data by reducing its complexity.
• The need for dimensionality reduction arises when a dataset has a
large number of features or variables. Having too many features can
lead to overfitting and increase the complexity of the model.
Why Dimensionality Reduction is Important
Dimensionality reduction brings many advantages to your machine
learning data, including:
• Fewer features mean less complexity
• You need less storage space because there is less data
• Fewer features require less computation time
• Model accuracy can improve due to less misleading data
• Algorithms train faster thanks to less data
• Reducing the data set’s feature dimensions helps visualize the data faster
• It removes noise and redundant features
Dimensionality Reduction
Techniques
• Dimensionality reduction techniques can be broadly
divided into two categories:
• Feature Selection
• Feature Extraction
Feature Selection
It involves selecting a subset of relevant features from the original feature
set to reduce the feature space while improving the model’s performance
by reducing computational power. When a dataset has too many features,
some may be irrelevant or add noise which can slow down training
process and reduce accuracy but it helps in building simpler, faster and
more accurate models and also helps in reducing overfitting.
There are various algorithms used for feature selection; they are grouped into three main categories (a small example follows the list):
1.Filter Methods
2.Wrapper Methods
3.Embedded Methods
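As a small illustration of a filter method, the sketch below uses scikit-learn's SelectKBest; the dataset and the choice k = 2 are arbitrary examples of mine, not from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature against the target (ANOVA F-test)
# and keep the k best-scoring features, independently of any model.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)       # (150, 4) -> (150, 2)
print(selector.get_support(indices=True))   # indices of the kept features
```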
Feature Extraction
• Feature extraction is a machine learning technique that
reduces the number of resources required for processing
while retaining significant or relevant information.
• In other words, feature extraction entails constructing new features that retain the key information from the original data but in a more efficient form, transforming raw data into a set of numerical features that a computer program can easily understand and use.
• Dimensionality reduction approaches include Principal
Component Analysis (PCA), Linear Discriminant Analysis
(LDA), and t-SNE.
Principal Component
Analysis(PCA)
• PCA is a statistical technique introduced by
mathematician Karl Pearson in 1901. It works by
transforming high-dimensional data into a lower-
dimensional space while maximizing the variance
(or spread) of the data in the new space. This helps
preserve the most important patterns and relationships
in the data.
Steps
• Standardize the data − PCA requires that the data be standardized to have
zero mean and unit variance.
• Compute the covariance matrix − PCA computes the covariance matrix of
the standardized data.
• Compute the eigenvectors and eigenvalues of the covariance
matrix − PCA then computes the eigenvectors and eigenvalues of the
covariance matrix.
• Select the principal components − PCA selects the principal components
based on their corresponding eigenvalues, which indicate the amount of
variation in the data explained by each component.
• Project the data onto the new feature space − PCA projects the data
onto the new feature space defined by the selected principal components.
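A minimal NumPy sketch of the five steps above (function and variable names are my own; scikit-learn's PCA class wraps an equivalent procedure):

```python
import numpy as np

def pca(X, n_components):
    """PCA via the covariance matrix, following the steps above."""
    # 1. Standardize: zero mean, unit variance per feature
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized data
    cov = np.cov(Xs, rowvar=False)
    # 3. Eigenvectors and eigenvalues of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Select the components with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # 5. Project the data onto the new feature space
    return Xs @ components

X = np.random.default_rng(0).normal(size=(100, 5))   # hypothetical data
print(pca(X, 2).shape)                               # (100, 2)
```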
Algorithm and solved example (figures omitted).
Advantages of PCA
Following are the advantages of using Principal Component Analysis −
• Reduces dimensionality − PCA is particularly useful for high-dimensional datasets because it can reduce the number
of features while retaining most of the original variability in the data.
• Removes correlated features − PCA can identify and remove correlated features, which can help improve the
performance of machine learning models.
• Improves interpretability − The reduced number of features can make it easier to interpret and understand the data.
• Reduces overfitting − By reducing the dimensionality of the data, PCA can reduce overfitting and improve the
generalizability of machine learning models.
• Speeds up computation − With fewer features, the computation required to train machine learning models is faster.
Disadvantages of PCA
Following are the disadvantages of using Principal Component Analysis −
• Information loss − PCA reduces the dimensionality of the data by projecting it onto a lower-dimensional space, which
may lead to some loss of information.
• Can be sensitive to outliers − PCA can be sensitive to outliers, which can have a significant impact on the resulting
principal components.
• Interpretability may be reduced − Although PCA can improve interpretability by reducing the number of features, the
resulting principal components may be more difficult to interpret than the original features.
• Assumes linearity − PCA assumes that the relationships between the features are linear, which may not always be the
case.
• Requires standardization − PCA requires that the data be standardized, which may not always be possible or
appropriate.
What is t-SNE ?
• t-SNE (t-distributed Stochastic Neighbor Embedding) is an unsupervised non-linear
dimensionality reduction technique for data exploration and visualizing high-
dimensional data. Non-linear dimensionality reduction means that the algorithm allows
us to separate data that cannot be separated by a straight line.
t-SNE vs PCA
Both t-SNE and PCA are dimensional reduction techniques with different mechanisms
that work best with different types of data.
PCA (Principal Component Analysis) is a linear technique that works best with data that has a linear structure. It seeks to identify the underlying principal components in the data by projecting onto lower dimensions, maximizing variance, and preserving large pairwise distances.
• But, t-SNE is a nonlinear technique that focuses on preserving the pairwise similarities
between data points in a lower-dimensional space. t-SNE is concerned with preserving
small pairwise distances whereas, PCA focuses on maintaining large pairwise distances
to maximize variance.
How t-SNE Works
The t-SNE algorithm finds the similarity measure between pairs of instances in
higher and lower dimensional space. After that, it tries to optimize two
similarity measures. It does all of that in three steps.
1.t-SNE models a point being selected as a neighbor of another point in both
higher and lower dimensions. It starts by calculating a pairwise similarity
between all data points in the high-dimensional space using a Gaussian kernel.
The points far apart have a lower probability of being picked than the points
close together.
2.The algorithm then tries to map higher-dimensional data points onto lower-
dimensional space while preserving the pairwise similarities.
3. This is achieved by minimizing the divergence (the Kullback–Leibler divergence) between the original high-dimensional and the lower-dimensional probability distributions. The algorithm uses gradient descent to minimize the divergence, and the lower-dimensional embedding is optimized to a stable state.
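A minimal usage sketch with scikit-learn's TSNE; the dataset and parameter values such as perplexity=30 are illustrative choices of mine, not prescribed by the text:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 64-dimensional digit images

# Map the 64-D points to 2-D while preserving local pairwise similarities
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print(X.shape, "->", X_2d.shape)      # (1797, 64) -> (1797, 2)
```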