Clustering in Machine Learning

Clustering, or Cluster Analysis, is an unsupervised machine learning technique used to group unlabelled datasets based on similarities among data points. It can be categorized into various methods, including Hard and Soft Clustering, with popular algorithms such as K-Means, DBSCAN, and Hierarchical Clustering. Applications of clustering span across market segmentation, image segmentation, and identification of cancer cells, showcasing its versatility in data analysis.

Uploaded by

shagunverma2525

Introduction to Clustering
• Clustering, or Cluster Analysis, is a technique used to group unlabelled datasets.
• It can be defined as “A way of grouping the data points into different clusters, consisting of similar data points. The objects with the possible similarities remain in a group that has less or no similarities with another group.”
• It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data according to the presence or absence of those patterns.
• It is a type of unsupervised learning method.
• It is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.
• Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups.
• A cluster is essentially a collection of objects grouped on the basis of the similarity and dissimilarity between them.
• After the clustering technique is applied, each cluster or group is given a cluster-ID. An ML system can use this ID to simplify the processing of large and complex datasets.
• For example, data points that lie close together in a graph can be classified into one single group. In the accompanying figure, three clusters can be distinguished.
[Figure: data points forming 3 distinct clusters]
• Example: Let's understand the clustering technique with the real-world example of a shopping mall.
• When we visit any shopping mall, we can observe that things with similar usage are grouped together.
• For instance, t-shirts are grouped in one section and trousers in another; similarly, in the produce section, apples, bananas, mangoes, etc. are grouped separately, so that we can easily find things.
• The clustering technique is widely used in various tasks. Some of the most common uses of this technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
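To make the idea of a cluster-ID more concrete, here is a minimal sketch in the spirit of market segmentation. The customer records and the cluster-IDs are made up for illustration (they are not from any real dataset or algorithm run), and `summarize_by_cluster` is a hypothetical helper name. It only shows how, once every data point carries a cluster-ID, a large dataset can be processed group by group.

```python
from collections import defaultdict

# Hypothetical customer records: (annual_spend, visits_per_month).
customers = [(120.0, 1), (150.0, 2), (900.0, 8), (950.0, 9), (130.0, 1)]
# Hypothetical cluster-IDs, as some clustering algorithm might have assigned them.
cluster_ids = [0, 0, 1, 1, 0]

def summarize_by_cluster(records, ids):
    """Return {cluster_id: (member_count, mean_spend)} for the labelled records."""
    groups = defaultdict(list)
    for record, cid in zip(records, ids):
        groups[cid].append(record)
    summary = {}
    for cid, members in groups.items():
        mean_spend = sum(spend for spend, _ in members) / len(members)
        summary[cid] = (len(members), mean_spend)
    return summary

summary = summarize_by_cluster(customers, cluster_ids)
for cid in sorted(summary):
    count, mean_spend = summary[cid]
    print(f"cluster {cid}: {count} customers, mean spend {mean_spend:.2f}")
```

Instead of reasoning about every customer individually, later steps can work with one summary per cluster-ID.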
Types of Clustering Methods
• Clustering methods are broadly divided into Hard Clustering (each data point belongs to only one group) and Soft Clustering (a data point can belong to more than one group).
• Below are the main clustering methods used in Machine learning:
• Partitioning Clustering
• Density Based Clustering
• Distribution Model Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
Partitioning Clustering
• It is a type of clustering that divides the data into non-hierarchical groups.
• It is also known as the centroid-based method.
• The most common example of partitioning clustering is the K-Means Clustering algorithm.
• In this type, the dataset is divided into a set of K groups, where K is the pre-defined number of groups.
• Each cluster center (centroid) is placed so that the data points of a cluster are closer to their own centroid than to the centroid of any other cluster.
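The centroid-based idea can be sketched in a few lines of plain Python. This is a minimal illustration of the usual K-Means iteration (often called Lloyd's algorithm), not a production implementation. The sample points, the choice of the initial centroids, and the function name `kmeans` are assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Illustrative data: two visually separate groups, K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the initial centroids; here they are simply taken from the data, which is one common but not the only choice.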
Density-Based Clustering
• The density-based clustering method connects highly dense areas into clusters, and arbitrarily shaped clusters can be formed as long as the dense regions can be connected.
• The algorithm does this by identifying the different clusters in the dataset and connecting the areas of high density into clusters.
• The dense areas in the data space are separated from each other by sparser areas.
• These algorithms can face difficulty in clustering the data points if the dataset has varying densities and high dimensions.
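As a rough illustration of how dense areas get connected, the following is a simplified sketch in the style of DBSCAN. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point), the sample points, and the brute-force neighbour search are assumptions for the example. Points that end up in no dense region are labelled -1 (noise).

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster-ID (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is dense too: keep expanding
        cluster_id += 1
    return labels

# Two dense squares separated by a sparse area, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Note that the number of clusters is not given in advance; it follows from `eps` and `min_pts`, which is also why datasets with varying densities are hard to handle with a single setting.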
Distribution Model-Based Clustering
• In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution.
• The grouping is done by assuming some distribution, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization Clustering algorithm, which uses Gaussian Mixture Models (GMM).
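The Expectation-Maximization idea can be sketched for the simplest case, a one-dimensional mixture of Gaussians. This is a minimal illustration under stated assumptions: the data, the initial means, variances and weights, the fixed number of iterations, and the small variance floor are all choices made for the example, and `em_gmm_1d` is a hypothetical function name.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture; returns (means, variances, weights, resp)."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return means, variances, weights, resp

# Illustrative data drawn around 1.0 and 5.0 (made up for this sketch).
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print([round(m, 3) for m in means])
print([round(r[0], 3) for r in resp])  # membership probability in component 0
```

Unlike a hard assignment, each point ends up with a probability for every component, which is what makes this a soft clustering method.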
Hierarchical Clustering
• Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created.
• In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram.
• Any desired number of clusters can be obtained by cutting the tree at the appropriate level.
• The most common example of this method is the Agglomerative Hierarchical algorithm.
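A bottom-up (agglomerative) procedure can be sketched as follows. The sketch assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several possible linkage choices. The sample points and the function name `agglomerative` are illustrative. Stopping at `n_clusters` plays the role of cutting the tree at a given level, and the recorded merge history is the information a dendrogram would display.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up single-linkage clustering; returns (clusters, merge_history)."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    history = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters))
                    for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        history.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
three, _ = agglomerative(points, n_clusters=3)
two, history = agglomerative(points, n_clusters=2)
print(sorted(sorted(c) for c in three))
print(sorted(sorted(c) for c in two))
print([round(dist, 2) for _, _, dist in history])  # merge distances, low to high
```

The same merge sequence yields both the 3-cluster and the 2-cluster result; only the level at which the merging stops differs.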
Fuzzy Clustering
• Fuzzy clustering is a type of soft clustering in which a data object may belong to more than one group or cluster.
• Each data object has a set of membership coefficients, which express its degree of membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
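The membership coefficients can be illustrated with the membership formula commonly used in Fuzzy C-means, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from the point to center i and m > 1 is the fuzzifier. The sketch below computes only the memberships of a single point for fixed centers; the centers, the point, the default m = 2, and the function name `fuzzy_memberships` are assumptions for the example, and the full algorithm would also re-estimate the centers.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership coefficients of one point in each cluster (they sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0 for d in dists):
        zero_count = sum(1 for d in dists if d == 0)
        return [1.0 / zero_count if d == 0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (4.0, 0.0)]
near_first = fuzzy_memberships((1.0, 0.0), centers)
midway = fuzzy_memberships((2.0, 0.0), centers)
print(near_first)
print(midway)
```

A point close to one center gets a membership near 1 for that cluster, while a point halfway between two centers is shared equally between them.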
Clustering algorithms that are widely used in machine learning
1. K-Means algorithm:
• The k-means algorithm is one of the most popular clustering algorithms.
• It classifies the dataset by dividing the samples into different clusters of equal variances.
• The number of clusters must be specified in this algorithm.
• It is fast and requires relatively few computations: for a fixed number of clusters and dimensions, each iteration is linear in the number of samples, O(n).
2. Mean-shift algorithm:
• The mean-shift algorithm tries to find the dense areas in a smooth density of data points.
• It is an example of a centroid-based model that works by updating the candidate centroids to be the mean of the points within a given region.
3. DBSCAN Algorithm:
• It stands for Density-Based Spatial Clustering of Applications with Noise.
• It is an example of a density-based model, similar to mean-shift but with some notable advantages.
• In this algorithm, areas of high density are separated by areas of low density. Because of this, clusters of any arbitrary shape can be found.
4. Expectation-Maximization Clustering using GMM:
• This algorithm can be used as an alternative to the k-means algorithm, or in cases where K-means may fail.
• In GMM, it is assumed that the data points are generated from a mixture of Gaussian distributions.
5. Agglomerative Hierarchical algorithm:
• The Agglomerative hierarchical algorithm performs the bottom-up hierarchical clustering.
• In this, each data point is treated as a single cluster at the outset and then successively merged.
• The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation:
• It differs from other clustering algorithms in that it does not require the number of clusters to be specified.
• In this algorithm, messages are exchanged between pairs of data points until convergence.
• It has O(N²T) time complexity, where N is the number of samples and T is the number of iterations until convergence; this is the main drawback of the algorithm.
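Of the algorithms listed above, mean-shift is perhaps the easiest to illustrate by hand. The following is a simplified one-dimensional sketch, assuming a flat window of radius `bandwidth` as the "given region": each candidate is repeatedly moved to the mean of the points inside its window, and candidates that settle close together are treated as one cluster. The data, the bandwidth, and the function name `mean_shift_1d` are assumptions for the example.

```python
def mean_shift_1d(data, bandwidth, max_iter=100, tol=1e-6):
    """Shift each candidate to the mean of its window until it stops moving."""
    modes = []
    for x in data:
        for _ in range(max_iter):
            window = [p for p in data if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    # Candidates that converged within one bandwidth share a cluster.
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if abs(mode - center) <= bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers

data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
labels, centers = mean_shift_1d(data, bandwidth=1.0)
print(labels)
print([round(c, 3) for c in centers])
```

As with DBSCAN and Affinity Propagation, the number of clusters is not specified up front; here it follows from the chosen bandwidth.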
Applications of Clustering

In Identification of Cancer Cells:
• Clustering algorithms are widely used for the identification of cancerous cells.
• They divide the cancerous and non-cancerous data into different groups.

In Search Engines:
• Search engines also work on the clustering technique.
• The search results appear based on the objects closest to the search query.
• This is done by grouping similar data objects into one group that is far from the other, dissimilar objects.
• The accuracy of a query's results depends on the quality of the clustering algorithm used.

In Customer Segmentation:
• It is used in market research to segment customers based on their choices and preferences.

In Biology:
• It is used in biology to classify different species of plants and animals using image recognition techniques.

In Land Use:
• The clustering technique is used to identify areas of similar land use in a GIS database.
• This can be very useful for determining the purpose for which a particular piece of land is most suitable.
