Clustering and Pattern Recognition Unit 5

The document discusses unsupervised learning techniques for clustering. It describes how clustering can be used to automatically group similar data points without labels. It also explains the k-means clustering algorithm, which assigns data points to k clusters based on minimizing distance between points and cluster centers.

Uploaded by

Prerna Singh


UNIT - 5

UNSUPERVISED LEARNING: CLUSTERING AND PATTERN DETECTION
• Have you ever spent time watching a large crowd? If so, you are likely to have seen some recurring personalities. Perhaps a certain type of person, identified by a freshly pressed suit and a briefcase, comes to typify the "fat cat" business executive. A twenty-something wearing skinny jeans, a flannel shirt, and sunglasses might be dubbed a "hipster," while a woman unloading children from a minivan may be labeled a "soccer mom."
• Of course, these types of stereotypes are dangerous to apply to individuals, as no two people are exactly alike. Yet understood as a way to describe a collective, the labels capture some underlying aspect of similarity among the individuals within the group.
Clustering
• Clustering is an unsupervised machine learning task that automatically divides the data into clusters, or groups of similar items. It does this without having been told ahead of time how the groups should look. Because we may not even know what we are looking for, clustering is used for knowledge discovery rather than prediction. It provides insight into the natural groupings found within data.
• Without advance knowledge of what comprises a cluster, how can a computer possibly know where one group ends and another begins? The answer is simple. Clustering is guided by the principle that items inside a cluster should be very similar to each other, but very different from those outside. The definition of similarity might vary across applications, but the basic idea is always the same: group the data so that related elements are placed together.
The resulting clusters can then be used for action. For instance, you might find clustering methods employed in the following applications:
• Segmenting customers into groups with similar demographics or buying patterns for targeted marketing campaigns
• Detecting anomalous behavior, such as unauthorized network intrusions, by identifying patterns of use falling outside the known clusters
• Simplifying extremely large datasets by grouping features with similar values into a smaller number of homogeneous categories
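The anomaly-detection idea in the second application can be sketched as a distance test against cluster centers that were learned from normal data. This is a minimal illustration under stated assumptions: Euclidean distance, two already-scaled features, and hypothetical centers and threshold; the name `is_anomalous` is also hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_anomalous(point, centers, threshold):
    """Flag a point whose nearest known cluster center is farther than threshold."""
    nearest = min(euclidean(point, c) for c in centers)
    return nearest > threshold

# Hypothetical centers of clusters found in normal usage data (two scaled features).
centers = [(1.0, 1.0), (5.0, 5.0)]
THRESHOLD = 2.0  # hypothetical cut-off; in practice it would be tuned on real data

print(is_anomalous((1.5, 0.5), centers, THRESHOLD))   # close to the first center
print(is_anomalous((10.0, 0.0), centers, THRESHOLD))  # far from both centers
```

The threshold choice is the main design decision: too small and ordinary variation gets flagged, too large and genuine outliers slip through.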
Clustering as a Machine Learning Task
• Suppose you were organizing a conference on the topic of data science. To facilitate professional networking and collaboration, you planned to seat people in groups according to one of three research specialties: computer and/or database science, math and statistics, and machine learning. Unfortunately, after sending out the conference invitations, you realize that you forgot to include a survey asking which discipline each attendee would prefer to be seated with.
• If each attendee is plotted by number of math publications (horizontal axis) against number of computer science publications (vertical axis), a pattern seems to emerge. We might guess that the upper-left corner, which represents people with many computer science publications but few articles on math, could be a cluster of computer scientists. Following this logic, the lower-right corner might be a group of mathematicians. Similarly, the upper-right corner, those with both math and computer science experience, may be machine learning experts.
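The corner-by-corner reasoning above can be written as a hand-coded rule. This is a sketch for illustration, not clustering: the function name `guess_specialty` and the cutoff separating "few" from "many" publications are hypothetical, and the boundaries are fixed by hand, whereas a clustering algorithm derives them from the data.

```python
def guess_specialty(math_pubs, cs_pubs, cutoff=10):
    """Label an attendee by the corner of the scatterplot they fall in.

    cutoff is a hypothetical publication count separating "few" from "many".
    """
    many_math = math_pubs >= cutoff
    many_cs = cs_pubs >= cutoff
    if many_cs and not many_math:
        return "computer science"     # upper-left corner
    if many_math and not many_cs:
        return "math and statistics"  # lower-right corner
    if many_math and many_cs:
        return "machine learning"     # upper-right corner
    return "unclear"                  # lower-left: few publications of either kind

# Hypothetical attendees as (math publications, computer science publications).
for attendee in [(2, 25), (30, 1), (20, 20), (1, 1)]:
    print(attendee, guess_specialty(*attendee))
```

The rule breaks down as soon as the groups do not line up with a single hand-picked cutoff, which is the motivation for letting an algorithm such as k-means find the groups.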
K-Means Clustering Algorithm
• The k-means algorithm assigns each of the n examples to one of the k clusters, where k is a number that has been determined ahead of time. The goal is to minimize the differences within each cluster and maximize the differences between the clusters.
• The algorithm essentially involves two phases. First, it assigns examples to an initial set of k clusters. Then, it updates the assignments by adjusting the cluster boundaries according to the examples that currently fall into the cluster. The process of updating and assigning occurs several times until changes no longer improve the cluster fit. At this point, the process stops and the clusters are finalized.
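The goal and the two phases described above can be written as equations. This sketch assumes squared Euclidean distance, which is the usual choice for k-means, although the slides leave the distance function unspecified. Here the x_i are the n examples, C_j is the j-th cluster, μ_j is its center, and x̄ is the mean of all examples.

```latex
% Within-cluster sum of squares that k-means tries to minimize
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

% Assignment phase: example x_i joins the cluster with the nearest center
c(i) = \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert^2

% Update phase: each center moves to the centroid of its current members
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i

% When every \mu_j is the centroid of C_j, total scatter splits into two parts
\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2
  = \underbrace{\sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2}_{\text{within}}
  + \underbrace{\sum_{j=1}^{k} \lvert C_j \rvert \, \lVert \mu_j - \bar{x} \rVert^2}_{\text{between}}
```

The left-hand side of the last identity does not depend on the clustering, so for this objective lowering the within-cluster term raises the between-cluster term by the same amount. Neither phase can increase J, which is why the repeated assign-and-update process eventually stops, although possibly at a local rather than a global minimum.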
Using distance to assign and update clusters
• The k-means algorithm begins by choosing k points in the feature space to serve as the cluster centers. These centers are the catalyst that spurs the remaining examples to fall into place. Because we hope to identify three clusters, k = 3 points are selected at random. These points are indicated by the star, triangle, and diamond in the following diagram:
• After choosing the initial cluster centers, the other examples are assigned to the cluster center that is nearest according to the distance function.
• As shown in the following diagram, the three cluster centers partition the examples into three segments labeled Cluster A, Cluster B, and Cluster C. The dashed lines indicate the boundaries for the Voronoi diagram created by the cluster centers. The Voronoi diagram indicates the areas that are closer to one cluster center than to any other; the vertex where all three boundaries meet is equidistant from all three cluster centers. Using these boundaries, we can easily see the regions claimed by each of the initial k-means seeds:
• Now that the initial assignment phase has been completed, the k-means algorithm proceeds to the update phase. The first step of updating the clusters involves shifting the initial centers to a new location, known as the centroid, which is calculated as the average position of the points currently assigned to that cluster. The following diagram illustrates how, as the cluster centers shift to the new centroids, the boundaries in the Voronoi diagram also shift, and a point that was once in Cluster B (indicated by an arrow) is added to Cluster A:
• As a result of this reassignment, the k-means algorithm will continue through another update phase. After shifting the cluster centroids, updating the cluster boundaries, and reassigning points into new clusters (as indicated by arrows), the figure looks like this:
• Because two more points were reassigned, another update must occur, which moves the centroids and updates the cluster boundaries. However, because these changes result in no reassignments, the k-means algorithm stops. The cluster assignments are now final:
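The assign-and-update walkthrough above can be sketched in plain Python. This is a minimal illustration, not a production implementation, and it rests on several assumptions. Euclidean distance is used as the distance function. The initial centers are k examples drawn at random from the data, which is one common way of choosing "k points in the feature space." A center whose cluster becomes empty is left where it is. The helper names and the attendee data are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centers):
    """Index of the closest center, i.e. the Voronoi cell the point lies in."""
    return min(range(len(centers)), key=lambda j: euclidean(point, centers[j]))

def centroid(members):
    """Average position of the points currently assigned to one cluster."""
    dim = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dim))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, labels) after repeating assign/update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                  # k random examples as seeds
    labels = [nearest(p, centers) for p in points]   # initial assignment phase
    for _ in range(max_iter):
        # Update phase: shift each center to the centroid of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                              # empty cluster: keep old center
                centers[j] = centroid(members)
        # Reassign every point to its nearest (possibly moved) center.
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:                     # no reassignments, so stop
            break
        labels = new_labels
    return centers, labels

# Hypothetical attendees as (math publications, computer science publications).
attendees = [(1, 20), (2, 18), (3, 22), (20, 2), (22, 1), (19, 3),
             (18, 19), (21, 21), (20, 17)]
centers, labels = kmeans(attendees, k=3, seed=42)
print(centers)
print(labels)
```

Because the result depends on the randomly chosen starting centers, k-means can settle into a poorer grouping. A common remedy is to run it with several seeds and keep the run with the lowest within-cluster sum of squares.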

