Clustering in Machine Learning
In the real world, not every dataset we work with has a target variable. Have you ever wondered how
Netflix groups similar movies together or how Amazon organizes its vast product catalog? These
are real-world applications of clustering. Such data cannot be analyzed using supervised
learning algorithms.
When the goal is to group similar data points in a dataset, we use cluster analysis.
What is Clustering?
The task of grouping data points based on their similarity with each other is called Clustering
or Cluster Analysis. It falls under the branch of unsupervised learning, which
aims to gain insights from unlabeled data points.
Suppose you have a dataset of customers' shopping habits. Clustering can help you group
customers with similar purchasing behaviors, which can then be used for targeted
marketing, product recommendations, or customer segmentation.
For example, in the graph given below, we can clearly see three circular clusters forming
on the basis of distance.
The clusters formed need not be circular in shape; their shape can be arbitrary, and there
are many algorithms that work well at detecting arbitrarily shaped clusters.
For example, in the graph given below, we can see that the clusters formed are not circular in shape.
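One well-known algorithm of this kind is DBSCAN, which grows clusters outward from dense "core" points, so a cluster's shape follows the data's density rather than a fixed geometry. Below is a minimal pure-Python sketch of the idea (the parameter names and toy data are illustrative assumptions, not a production implementation):

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: points with at least min_pts neighbors
    within radius eps are 'core' points; clusters grow through them."""
    labels = [None] * len(points)   # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # noise (may later become a border point)
            continue
        labels[i] = cid
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid     # border point absorbed into the cluster
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is also a core point: expand through it
                queue.extend(jn)
        cid += 1
    return labels

# Toy data: a long thin chain of points plus a compact blob far away.
curve = [(float(x), 0.0) for x in range(10)]
blob = [(20.0, 20.0), (20.0, 21.0), (21.0, 20.0)]
labels = dbscan(curve + blob, eps=1.5, min_pts=2)
```

Here the elongated chain stays one cluster because density, not distance to a single center, decides membership.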
Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data points:
Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely
or not at all. For example, let's say there are 4 data points and we have to cluster them into 2
clusters. Each data point will belong to either cluster 1 or cluster 2.
Data Point | Cluster
A | C1
B | C2
C | C2
D | C1
Soft Clustering: In this type of clustering, instead of assigning each data point to exactly one
cluster, a probability (or likelihood) of the point belonging to each cluster is evaluated. For
example, let's say there are 4 data points and we have to cluster them into 2 clusters. We
evaluate the probability of each data point belonging to both clusters, and this probability
is calculated for all data points.
Data Point | Probability of C1 | Probability of C2
A | 0.91 | 0.09
B | 0.3 | 0.7
C | 0.17 | 0.83
D | 1 | 0
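The two assignment styles can be sketched in code. In the sketch below the 2-D point coordinates, cluster centers, and inverse-distance weighting are illustrative assumptions (soft-clustering algorithms such as Gaussian mixtures use model likelihoods instead):

```python
import math

# Hypothetical 2-D coordinates for points A..D and two cluster centers.
points = {"A": (1.0, 1.0), "B": (4.0, 5.0), "C": (5.0, 5.0), "D": (0.0, 1.0)}
centers = {"C1": (0.5, 1.0), "C2": (4.5, 5.0)}

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Hard clustering: each point goes entirely to its nearest center.
hard = {name: min(centers, key=lambda c: dist(p, centers[c]))
        for name, p in points.items()}

# Soft clustering: turn distances into membership probabilities that sum to 1
# (inverse-distance weighting here, purely for illustration).
soft = {}
for name, p in points.items():
    w = {c: 1.0 / (dist(p, centers[c]) + 1e-9) for c in centers}
    total = sum(w.values())
    soft[name] = {c: w[c] / total for c in centers}

print(hard)       # {'A': 'C1', 'B': 'C2', 'C': 'C2', 'D': 'C1'}
print(soft["A"])  # A's membership is split across C1 and C2
```

Note that the hard assignment collapses each row of the soft table to whichever cluster has the higher probability.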
Uses of Clustering
Before we cover the types of clustering algorithms, let's go through their main use cases.
Clustering algorithms are mainly used for:
Market Segmentation: Businesses use clustering to group their customers and use
targeted advertisements to attract a larger audience.
Market Basket Analysis: Shop owners analyze their sales and figure out which items are
frequently bought together by customers.
Social Network Analysis: Social media sites use your data to understand your browsing
behavior and provide you with targeted friend recommendations or content
recommendations.
Medical Imaging: Doctors use Clustering to find out diseased areas in diagnostic images
like X-rays.
Anomaly Detection: Clustering can be used to find outliers in a real-time data stream or
to flag fraudulent transactions.
Simplify working with large datasets: Each cluster is given a cluster ID after clustering
is complete, so an entire feature set can be reduced to its cluster ID.
Clustering is effective when it can represent a complicated case with a simple
cluster ID; by the same principle, clustering can make complex datasets simpler.
Centroid-based Clustering (Partitioning methods)
Centroid-based clustering organizes data points around central vectors (centroids) that represent
clusters. Each data point belongs to the cluster with the nearest centroid. Generally, the similarity
measures chosen for these algorithms are Euclidean distance, Manhattan distance or Minkowski
distance.
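All three of these measures are special cases of the Minkowski distance of order r, where r = 1 gives Manhattan distance and r = 2 gives Euclidean distance. A quick sketch:

```python
def minkowski(p, q, r):
    """Minkowski distance of order r between two points;
    r=1 is Manhattan distance, r=2 is Euclidean distance."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p, q = (1.0, 2.0), (4.0, 6.0)
manhattan = minkowski(p, q, 1)  # |1-4| + |2-6| = 7.0
euclidean = minkowski(p, q, 2)  # sqrt(3**2 + 4**2) = 5.0
```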
The dataset is separated into a predetermined number of clusters, each represented by a vector
of values (the centroid). Each input data point is compared against these centroid vectors and
joins the cluster whose centroid it is closest to.
The major drawback of centroid-based algorithms is that the number of clusters, "k," must be
established, either intuitively or scientifically (for example, using the Elbow Method), before the
algorithm starts allocating data points. Despite this limitation, it remains the
most popular type of clustering due to its simplicity and efficiency. Popular algorithms
of centroid-based clustering are:
K-means clustering
K-medoids clustering
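The "assign to nearest centroid, then recompute centroids" loop at the heart of K-means can be sketched in a few lines of pure Python. The toy data and random initialization below are illustrative assumptions; real libraries add smarter initialization (e.g. k-means++) and vectorized math:

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means sketch: alternate nearest-centroid assignment
    and centroid-mean updates until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:                   # converged
            break
        centroids = new
    return centroids, clusters

# Two well-separated toy groups; K-means should recover their means.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
```

K-medoids follows the same loop but restricts each center to be an actual data point (a medoid), which makes it more robust to outliers.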