Clustering in Machine Learning

Clustering, or Cluster Analysis, is an unsupervised machine learning technique used to group unlabelled datasets based on similarities among data points. Clustering methods can be broadly categorized as Hard or Soft Clustering, and popular algorithms include K-Means, DBSCAN, and Hierarchical Clustering. Applications of clustering span market segmentation, image segmentation, and the identification of cancer cells, showcasing its versatility in data analysis.

Uploaded by shagunverma2525
Copyright © All Rights Reserved
Clustering in Machine Learning
Introduction to Clustering
• Clustering, or Cluster Analysis, is a technique used to group unlabelled datasets.
• It can be defined as “a way of grouping the data points into different clusters, consisting of similar data points. The objects with possible similarities remain in a group that has few or no similarities with another group.”
• It does so by finding similar patterns in the unlabelled dataset, such as shape, size, color, and behavior, and dividing the data points according to the presence or absence of those patterns.
• It is essentially a type of unsupervised learning method.
• It is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.
• Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another than to the data points in other groups.

• In essence, it organizes a collection of objects on the basis of the similarity and dissimilarity between them.
• After clustering, each cluster or group is assigned a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets.
• For example, the data points clustered together in the graph below can be classified into a single group. We can distinguish the clusters and identify that there are 3 clusters in the picture below.
• Example: Let's understand the clustering technique with the real-world example of a shopping mall:
• When we visit any shopping mall, we can observe that items with similar uses are grouped together.
• For instance, t-shirts are grouped in one section and trousers in another; similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are grouped separately so that we can easily find what we need.
• Clustering is widely used in a variety of tasks. Some of the most common uses of this technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
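The cluster-ID idea described above can be illustrated with a short scikit-learn sketch. It assumes synthetic data from `make_blobs`, standing in for the missing three-cluster picture, and it uses `KMeans`, whose `labels_` attribute holds one cluster ID per point. The variable names are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed synthetic 2-D data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Fit k-means with k = 3; labels_ holds one cluster ID (0, 1 or 2) per point.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = model.labels_

# Downstream code can use the IDs to process each group separately,
# e.g. count the members and compute a per-cluster mean.
counts = np.bincount(cluster_ids)
centers = np.array([X[cluster_ids == c].mean(axis=0) for c in range(3)])
print("points per cluster:", counts)
print("cluster means:\n", centers)
```

Because `labels_` is an ordinary integer array, the IDs can index, group, or summarise the data without further reference to the clustering model.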
Types of Clustering Methods
• Clustering methods are broadly divided into Hard Clustering (each data point belongs to exactly one group) and Soft Clustering (a data point can belong to more than one group).
• Below are the main clustering methods used in machine learning:
• Partitioning Clustering
• Density Based Clustering
• Distribution Model Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
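Here is a minimal sketch of the hard/soft distinction, using scikit-learn's `GaussianMixture` on assumed synthetic data. `predict` returns a single (hard) cluster ID per point. `predict_proba` returns a (soft) membership probability for every cluster, and each row sums to 1.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumed synthetic data: two overlapping groups.
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

hard = gmm.predict(X)        # hard clustering: exactly one cluster ID per point
soft = gmm.predict_proba(X)  # soft clustering: one membership probability per cluster

print(hard[:5])
print(np.round(soft[:5], 3))
```

Taking the arg-max of each probability row turns a soft assignment back into a hard one.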
Partitioning Clustering
• It is a type of clustering that divides the data into non-hierarchical groups.
• It is also known as the centroid-based method.
• The most common example of partitioning clustering is the K-Means Clustering algorithm.
• In this type, the dataset is divided into a set of k groups, where k is the number of pre-defined groups.
• The cluster centers are created in such a way that each data point is closer to the centroid of its own cluster than to any other cluster centroid.
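The centroid-based procedure above can be sketched with NumPy as Lloyd's algorithm, the standard iteration behind k-means. It repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its points. The function names, the random initialisation, and the stopping rule are illustrative assumptions, not a particular library's API.

```python
import numpy as np


def nearest_centroid(X, centroids):
    """Index of the closest centroid for every row of X (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = nearest_centroid(X, centroids)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels consistent with the returned centroids.
    return centroids, nearest_centroid(X, centroids)


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

Each iteration computes the n × k point-to-centroid distances. For a fixed k and number of features, the work per iteration therefore grows linearly with n.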
Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, so arbitrarily shaped distributions can be formed as long as the dense regions can be connected.
These algorithms do this by identifying the different clusters in the dataset and connecting the areas of high density into clusters.
The dense areas in the data space are separated from each other by sparser areas.
These algorithms can have difficulty clustering the data points if the dataset has varying densities and high dimensionality.
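The density-based idea above can be sketched in a few lines of NumPy. This is a simplified version of the DBSCAN procedure named later in this document. `eps` (the neighbourhood radius) and `min_samples` (the number of points, including the point itself, needed for a region to count as dense) follow the conventional DBSCAN parameter names. The function `dbscan` is a hypothetical illustration, not a library API, and its O(n²) distance matrix only suits small datasets.

```python
import numpy as np

NOISE = -1


def dbscan(X, eps, min_samples):
    """Simplified DBSCAN sketch: one label per point, -1 for noise."""
    n = len(X)
    # Pairwise distances (quadratic memory; fine for small n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # A core point has at least min_samples points (itself included) within eps.
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, NOISE)
    cluster = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Grow a new cluster from an unlabelled core point by following
        # chains of neighbouring points (density-connected points).
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


X = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [0.5, 0.5],
              [5.0, 5.0], [5.0, 5.5], [5.5, 5.0], [5.5, 5.5],
              [10.0, 0.0]])
labels = dbscan(X, eps=0.8, min_samples=3)
print(labels)
```

The two dense squares become clusters 0 and 1, and the isolated point is left as noise (-1). scikit-learn's `sklearn.cluster.DBSCAN` implements the full algorithm efficiently.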
Distribution Model-Based Clustering
• In the distribution model-based clustering method, the data is divided based on the probability that each data point belongs to a particular distribution.
• The grouping is done by assuming some distribution, most commonly the Gaussian distribution.
• An example of this type is the Expectation-Maximization (EM) clustering algorithm, which uses Gaussian Mixture Models (GMM).
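The Expectation-Maximization idea behind GMM clustering can be sketched for one-dimensional data with NumPy. The E-step computes, for each point, the probability that it belongs to each Gaussian. The M-step re-estimates each Gaussian's weight, mean, and variance from those probabilities. The function names, the quantile-based initialisation, and the small variance floor are illustrative assumptions; scikit-learn's `GaussianMixture` is a full implementation.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, k=2, n_iter=200):
    """EM for a 1-D Gaussian mixture with k components.

    Returns (weights, means, variances, resp), where resp[i, j] is the
    probability that point i belongs to component j.
    """
    # Illustrative initialisation: equal weights, means at evenly spaced
    # quantiles of the data, every variance equal to the data variance.
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities from the current parameters, shape (n, k).
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        # Small floor (an assumption) guards against a collapsing component.
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, resp


rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 300)])
weights, means, variances, resp = em_gmm_1d(x, k=2)
print(np.round(means, 2), np.round(weights, 2))
```

The recovered means should land close to the -2 and 3 used to generate the data. The final `resp` matrix is the soft cluster assignment.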
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created.
In this technique, the dataset is divided into clusters to create a tree-like structure, which is also called a dendrogram.
The observations, or any number of clusters, can be selected by cutting the tree at the appropriate level.
The most common example of this method is the Agglomerative Hierarchical algorithm.
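Cutting the dendrogram at different levels can be tried with SciPy's `scipy.cluster.hierarchy` module. In this sketch the tiny dataset is made up for illustration, and Ward linkage is one assumed choice among several (single, complete, average). `fcluster` then cuts the tree either into a requested number of clusters or at a merge-distance threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up 2-D points forming three loose groups.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.2, 0.4],
              [5.0, 5.0], [5.2, 4.8],
              [9.0, 0.0], [9.1, 0.3]])

# Agglomerative (bottom-up) merging; Z records every merge, i.e. the dendrogram.
Z = linkage(X, method="ward")

# Cutting the tree: request a number of flat clusters...
labels_3 = fcluster(Z, t=3, criterion="maxclust")
labels_2 = fcluster(Z, t=2, criterion="maxclust")
# ...or cut at a fixed merge distance instead.
labels_d = fcluster(Z, t=2.0, criterion="distance")

print(labels_3, labels_2, labels_d)
```

Cutting higher in the tree yields fewer, larger clusters. When matplotlib is available, `scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself.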
Fuzzy Clustering
• Fuzzy clustering is a type of soft method in which a data object may belong to more than
one group or cluster.
• Each data point has a set of membership coefficients, which depend on its degree of membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
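Here is a minimal NumPy sketch of fuzzy c-means, assuming the usual fuzzifier m = 2. Centres are membership-weighted means. Each point's membership in cluster j is u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m − 1)), where d_ij is the distance from point i to centre j. The function names, random initialisation, and stopping tolerance are illustrative assumptions.

```python
import numpy as np


def fcm_centers(X, U, m):
    """Membership-weighted means: c_j = sum_i u_ij^m x_i / sum_i u_ij^m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centers, U), where U[i, j] is the
    membership of point i in cluster j and every row of U sums to 1."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised row-wise.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        centers = fcm_centers(X, U, m)
        # Distance from every point to every centre (floored to avoid 0-division).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return fcm_centers(X, U, m), U


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0],
              [4.75, 4.75]])  # a point roughly midway between the two groups
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))
```

The point placed between the two groups gets a membership of roughly one half in each cluster. This is the kind of soft assignment that a hard method such as k-means cannot express.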
Clustering algorithms that are widely used in machine learning
1. K-Means algorithm:
• The k-means algorithm is one of the most popular clustering algorithms.
• It clusters the dataset by trying to divide the samples into clusters of equal variance.
• The number of clusters must be specified in this algorithm.
• It is fast and requires relatively few computations: each iteration is linear in the number of samples n, i.e. O(n) for a fixed number of clusters and features.
2. Mean-shift algorithm:
• The mean-shift algorithm tries to find the dense areas (modes) in a smoothed density of the data points.
• It is an example of a centroid-based model, which works by updating candidate centroids to be the mean of the points within a given region.
3. DBSCAN Algorithm:
• It stands for Density-Based Spatial Clustering of Applications with Noise.
• It is an example of a density-based model similar to mean-shift, but with some notable advantages, such as explicitly labelling points in low-density regions as noise.
• In this algorithm, areas of high density are separated by areas of low density. Because of this, clusters of any arbitrary shape can be found.
4. Expectation-Maximization Clustering using GMM:
• This algorithm can be used as an alternative to the k-means algorithm, or in cases where k-means may fail.
• In GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm:
• The Agglomerative hierarchical algorithm performs bottom-up hierarchical clustering.
• In this algorithm, each data point is treated as a single cluster at the outset, and clusters are then successively merged.
• The cluster hierarchy can be represented as a tree structure.
6. Affinity Propagation:
• Unlike algorithms such as k-means, it does not require the number of clusters to be specified.
• In this algorithm, pairs of data points exchange messages until convergence.
• It has O(N²T) time complexity, where N is the number of samples and T is the number of iterations until convergence; this is the main drawback of the algorithm.
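The six algorithms listed above are all available in scikit-learn, so one way to compare them is to run each on the same synthetic data. The dataset and the parameter values (`eps`, `min_samples`, the default mean-shift bandwidth, and so on) are assumptions chosen for illustration. The number of clusters that mean-shift, DBSCAN, and Affinity Propagation find depends on these settings, whereas k-means, GMM, and agglomerative clustering are told to find 3.

```python
from sklearn.cluster import (DBSCAN, AffinityPropagation, AgglomerativeClustering,
                             KMeans, MeanShift)
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumed synthetic data with three groups.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

models = {
    # k-means, GMM and agglomerative clustering need the number of clusters up front.
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Mean-shift": MeanShift(),                 # bandwidth estimated from the data
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),  # noise points get label -1
    "GMM (EM)": GaussianMixture(n_components=3, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "Affinity Propagation": AffinityPropagation(random_state=0),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Count clusters, ignoring the noise label -1 used by DBSCAN.
    n_clusters = len(set(labels.tolist()) - {-1})
    results[name] = n_clusters
    print(f"{name:22s} -> {n_clusters} clusters")
```

Affinity Propagation may take noticeably longer than the others on larger datasets because of the O(N²T) cost noted above.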
Applications of Clustering
In Identification of Cancer Cells:

• Clustering algorithms are widely used for the identification of cancerous cells.
• They divide the cancerous and non-cancerous data sets into different groups.

In Search Engines:

• Search engines also make use of clustering.
• Search results appear based on the objects closest to the search query.
• This is done by grouping similar data objects into one group that is far from other, dissimilar objects.
• The accuracy of a query's results depends on the quality of the clustering algorithm used.

Customer Segmentation:

• It is used in market research to segment customers based on their choices and preferences.

In Biology:

• It is used in biology to classify different species of plants and animals using image recognition techniques.

In Land Use:

• Clustering is used to identify areas of similar land use in a GIS database.
• This can be very useful for determining the purpose for which a particular piece of land is most suitable.
