Machine Learning

Clustering is an unsupervised machine learning technique used to group data points based on similarities, with applications in customer segmentation, image segmentation, and anomaly detection. The K-Means algorithm is a popular method within centroid-based clustering, which involves selecting initial centroids, assigning data points to clusters, and iterating until convergence. While K-Means is easy to implement and efficient, it requires prior specification of the number of clusters and struggles with outliers and large datasets.

Uploaded by Angelo Vita
Copyright © All Rights Reserved

Clustering

Clustering is an unsupervised machine learning technique used to group data points based on similarities or patterns, without relying on predefined labels. It can be used in data mining during the initial data exploration stage as well as in the data processing stage.

Applications of Clustering

Some of the most common applications of clustering include:

• Customer Segmentation
• Image Segmentation
• Anomaly Detection
• Document Processing and Classification

Categories of Clustering Techniques

There is no universally agreed number of categories of clustering methods or techniques in the literature, but the most common include:

• Centroid-Based Clustering
o K-Means Clustering
• Hierarchical Clustering
o Agglomerative Clustering
o Divisive Clustering
• Density-Based Clustering
o DBSCAN

K-Means Clustering

The K-Means clustering algorithm is one of the most popular centroid-based clustering algorithms. Its goal is to partition the dataset into K clusters so that the sum of squared distances between each data point and the centroid of the cluster it is assigned to (the within-cluster sum of squares) is as small as possible.
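Written as an equation, the quantity K-Means tries to minimize can be stated as below. The notation is introduced here for illustration and does not come from the handout: $C_k$ is the set of points assigned to cluster $k$, $\mu_k$ is its centroid, and $x_i$ is a data point.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

The assignment step (nearest centroid) and the update step (mean of the assigned points) each leave $J$ unchanged or lower it. This is why the iteration converges. It converges only to a local minimum, however, and that minimum depends on the initial centroids.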

The K-Means Clustering Algorithm

1. Randomly select K points from the dataset to act as the initial cluster centroids.

2. For each data point in the dataset, calculate the distance between that point and each of the
K centroids.

3. Assign each data point to the cluster whose centroid is closest to it.

4. After the data points have been assigned to clusters, recalculate the centroids of the clusters
by taking the mean (average) of all data points assigned to each cluster.

5. Repeat steps 2 to 4 until the centroids no longer change significantly or a specified maximum number of iterations is reached.

6. Once convergence is achieved, the algorithm outputs the final cluster centroids and the
assignment of each data point to a cluster.
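The six steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation:

- Points are tuples of numbers, and distance is Euclidean (`math.dist`).
- Initial centroids are drawn with a seeded `random.Random` so runs are repeatable.
- An empty cluster keeps its previous centroid.
- The names `kmeans` and `nearest_centroid` and the `max_iters`, `tol` and `seed` parameters are hypothetical.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iters=100, tol=1e-9, seed=0):
    """Cluster a list of equal-length numeric tuples into k clusters.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points at random as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]

    for _ in range(max_iters):
        # Steps 2-3: measure the distance to every centroid and assign the nearest one.
        labels = [nearest_centroid(p, centroids) for p in points]

        # Step 4: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # Assumption: an empty cluster simply keeps its previous centroid.
                new_centroids.append(centroids[c])

        # Step 5: stop when no centroid moved by more than `tol`.
        shift = max(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    # Step 6: final assignment against the final centroids.
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of three points.
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    centroids, labels = kmeans(data, k=2)
    print("centroids:", centroids)
    print("labels:", labels)
```

The initial centroids are random, so on less well-separated data different seeds can converge to different local solutions. A common remedy is to run the algorithm several times and keep the result with the lowest within-cluster sum of squares.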

You are granted access to this material for your personal use only. Unauthorized
distribution, reproduction, modification, transmission, or exploitation of this material in any way
without the written permission of the author is strictly prohibited. – Harold L. Costales, April 2024.
Illustrative Example: Not yet available

Elbow Method

The elbow method is a graphical heuristic for choosing the number of clusters K in K-Means clustering. The elbow graph plots the within-cluster sum of squares (WCSS) on the y-axis against different values of K on the x-axis. WCSS generally decreases as K grows, so the aim is not the smallest WCSS. Instead, the suggested K is the point at which the graph forms an elbow, that is, where adding further clusters stops reducing WCSS substantially.
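A rough sketch of how the values for an elbow graph could be produced: run K-Means for K = 1, 2, 3, ... and record the WCSS of each result. The sketch makes these assumptions:

- Points are plain-Python tuples.
- A compact K-Means is repeated inside the block so that it runs on its own.
- It keeps the best of `n_init` randomly initialized runs per K, a choice borrowed from common practice.

The function names and the three-group sample data are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Compact K-Means: random initial centroids, then assign/update until stable."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Mean of the members; an empty cluster keeps its old centroid (assumption).
            new_centroids.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squares: squared distance of each point to its own centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def elbow_curve(points, k_values, n_init=25):
    """WCSS for each K, keeping the lowest value over `n_init` random restarts."""
    curve = {}
    for k in k_values:
        curve[k] = min(wcss(points, *kmeans(points, k, seed=s)) for s in range(n_init))
    return curve


if __name__ == "__main__":
    # Hypothetical data: three well-separated groups of three points each.
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
            (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
            (1.0, 8.0), (1.5, 9.0), (2.0, 8.5)]
    for k, value in elbow_curve(data, range(1, 6)).items():
        print(f"K={k}: WCSS={value:.3f}")
```

On data like this, WCSS falls steeply from K = 1 to K = 3 and only slightly afterwards, so the elbow sits at K = 3. Plotting the printed values against K gives the elbow graph itself.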

Silhouette Score

The silhouette score and plot are used to evaluate the quality of a clustering solution, such as one produced by the K-Means algorithm. The silhouette value of a sample measures how similar it is to its own cluster compared with the nearest other cluster, on a scale from -1 to +1. The silhouette score averages these values over all samples, and the silhouette plot visualizes the value of each sample. A score close to +1 indicates that the clusters are well separated and that each sample is more similar to the samples in its own cluster than to samples in other clusters. A score close to 0 suggests overlapping clusters. A negative score suggests a poor clustering solution in which samples may have been assigned to the wrong cluster.
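The per-sample silhouette value is commonly defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from sample i to the other samples in its own cluster, and b(i) is the smallest mean distance from sample i to the samples of any other cluster. The sketch below computes it in plain Python with Euclidean distance. The function names, the sample data, and the convention that a point alone in its cluster gets s(i) = 0 are assumptions of this sketch.

```python
import math


def silhouette_samples(points, labels):
    """Silhouette value s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:
            # Convention assumed here: a point alone in its cluster gets s(i) = 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        # Mean distance to the nearest other cluster.
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        scores.append((b - a) / max(a, b))
    return scores


def silhouette_score(points, labels):
    """Average silhouette value over all points."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    good = [0, 0, 0, 1, 1, 1]
    bad = [0, 0, 1, 1, 1, 1]  # (1.0, 1.5) deliberately placed in the wrong cluster
    print(silhouette_score(data, good))      # close to +1
    print(silhouette_samples(data, bad)[2])  # negative for the misassigned point
```

The per-sample list returned by `silhouette_samples` is what a silhouette plot draws, typically as one bar per sample grouped by cluster.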

Advantages

• This algorithm is very easy to understand and implement.
• This algorithm is efficient, robust, and flexible.
• It gives the best results when the dataset contains distinct, roughly spherical clusters.

Disadvantages

• This algorithm needs the number of clusters, that is, the value of K, to be specified in advance.
• It is sensitive to outliers and noisy data, because a few extreme points can pull (deflect) a centroid away from the rest of its cluster.
• It can be slow on very large datasets, because every iteration computes the distance between each data point and each of the K centroids.
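The sketch below makes the outlier problem concrete. It computes a centroid the way step 4 of the algorithm does (coordinate-wise mean) for a small hypothetical cluster, first without and then with one extreme point. The helper name `centroid` and the data are illustrative assumptions.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


# Hypothetical cluster of three nearby points, then the same cluster plus one outlier.
cluster = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
with_outlier = cluster + [(20.0, 20.0)]

print(centroid(cluster))       # about (1.17, 1.5): inside the group
print(centroid(with_outlier))  # (5.875, 6.125): far from every original point
```

A single outlier among four points moves the centroid out of the region occupied by the other three. Nearby points may then be assigned to a different cluster in the next iteration.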
