Chapter 3
Unsupervised Learning
Introduction to Unsupervised Learning
Types of Unsupervised Learning
Clustering Algorithms Such as K-Means Clustering
Evaluation Metrics in Unsupervised Learning
Challenges in Unsupervised Learning
Applications of Unsupervised Learning
Compiled by: Wogayehu A.
Introduction to Unsupervised Learning
• Unsupervised learning is a branch of machine learning that deals with
unlabeled data.
• Unlike supervised learning, where the data is labeled with a specific
category or outcome, unsupervised learning algorithms are tasked
with finding patterns and relationships within the data without any
prior knowledge of the data's meaning.
• Unsupervised machine learning algorithms find hidden patterns in
data without any human intervention, i.e., we don't give the expected
output to our model.
• The model is trained on input values only and discovers the
groups or patterns on its own.
• Unsupervised Learning finds patterns or structures in data without
labeled outputs.
• Goal: Discover underlying structure (e.g., clusters, associations, low-
dimensional representations).
Introduction to Unsupervised Learning
• How unsupervised learning works, in general:
Unsupervised learning works by analyzing unlabeled data to identify
patterns and relationships.
The data is not labeled with any predefined categories or outcomes,
so the algorithm must find these patterns and relationships on its
own.
This can be a challenging task, but it can also be very rewarding, as it
can reveal insights into the data that would not be apparent from a
labeled dataset.
• The input to unsupervised learning models is as follows:
Unstructured data: May contain noisy (meaningless) data, missing values, or
unknown data. Unstructured data is often more powerful for unsupervised learning
because it carries more hidden patterns (e.g., in text, images, audio), but it also
requires more preprocessing and computational resources.
Unlabeled data: Contains values only for the input parameters; there is no
target value (output). It is easier to collect than the labeled data required by the
supervised approach.
Introduction to Unsupervised Learning
Key Characteristics of Unsupervised Learning:
o No Labeled Data: This is the most defining feature. The input data
doesn't have predefined categories or target outputs.
o Discovery of Hidden Structures: The primary goal is to uncover inherent
patterns, groupings, and relationships within the data that might not be
immediately obvious.
o No Feedback: The algorithm doesn't receive feedback on the correctness
of its "predictions" during training. It learns by exploring the data and
discovering patterns on its own.
o Data Exploration and Insight Generation: It's incredibly useful for
exploring new datasets, understanding their underlying organization,
and generating insights when you don't have a clear idea of what you're
looking for.
o Feature Learning: Algorithms can automatically learn relevant features
or representations from raw data, which can be useful for further
analysis or modeling.
Types of Unsupervised Learning
• Unsupervised learning primarily involves three types of tasks:-
Clustering---This involves grouping similar data points together into
"clusters" based on similarities or patterns. Data points within the same
cluster are more similar to each other than to those in other clusters.
Association Rule Learning---This aims to discover interesting relationships or
"rules" between variables in large datasets; it identifies items that
frequently occur together.
Dimensionality Reduction---This technique reduces the number of features
(or dimensions) in a dataset while retaining as much relevant information
as possible. This is useful for:
Simplifying Data: Making high-dimensional data easier to visualize
and understand.
Improving Model Performance: Reducing noise and preventing
overfitting in other machine learning models.
Feature Extraction: Creating new, more meaningful features from
existing ones.
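As a concrete illustration of dimensionality reduction, here is a minimal sketch using PCA from scikit-learn; the 5-feature synthetic dataset is purely illustrative and not from the slides.

```python
# Minimal dimensionality-reduction sketch using PCA (scikit-learn).
# The 5-feature synthetic data below is purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features

pca = PCA(n_components=2)              # keep 2 dimensions
X_reduced = pca.fit_transform(X)       # project onto the top-2 components

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # information retained per component
```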
Types of Unsupervised Learning
• Unsupervised learning primarily involves three types of tasks:-
• Clustering:- Similar to classification but without predefined classes.
Clustering aims to group similar data points based on their features.
• Association Rule Learning: Identifies relationships or associations
between variables in large datasets.
• Dimensionality reduction:- Simplifies datasets by reducing the
number of features while retaining essential information (autoencoders
are one neural-network approach to this task).
Clustering
• Clustering is a technique used to group similar items or data points
together based on certain characteristics or features.
• Clustering can help to identify data points that lie far away from the
rest of the dataset (outliers), as well as variations within a dataset.
• It is an unsupervised machine learning algorithm that organizes and
classifies different objects, data points, or observations into groups
or clusters based on similarities or patterns.
• Examples of clusters can include genres of music, different groups of
users, key segments of a market segmentation, types of network
traffic on a server cluster, friend groups in a social network, or many
other kinds of categories.
• The process of clustering can use just one feature of the data or it
can use all of the features present in the data.
Clustering
• Clustering is a tool in data science for data analysis and machine
learning to group similar data points together into “clusters.”
• The goal of clustering is to find patterns in data and group similar
data points together, while separating dissimilar data points into
different groups.
• For example, let’s say you have a customer dataset containing their
age, income, and location. You could use clustering to group together
customers who are similar in terms of age and income, and separate
out customers who are very different in terms of these
characteristics. This might be useful for a business that wants to
target marketing campaigns to specific customer segments.
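As an illustration of this customer example, below is a minimal sketch using scikit-learn's KMeans on made-up (age, income) values; the scaling step is an added assumption, since the two features live on very different numeric ranges.

```python
# Illustrative customer-segmentation sketch with K-Means (scikit-learn).
# The (age, income) values are made up for demonstration.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = np.array([
    [25, 30_000], [27, 32_000], [45, 90_000],
    [48, 95_000], [33, 60_000], [35, 58_000],
])

X = StandardScaler().fit_transform(customers)   # put features on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)        # segment assigned to each customer
```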
Clustering
• Example: Patient Segmentation for Lung Cancer Diagnosis
• Objective: To group patients with similar clinical and diagnostic
patterns related to lung cancer, helping doctors in early detection,
risk profiling, and treatment planning, even when clear labels
(diagnosed vs not) are not available.
• Input Data (Patient Features): Collected from CT scans, medical history, and tests:
o Age
o Smoking history (pack-years)
o Cough frequency
o Presence of chest pain
o Nodule size from CT scan
o Shortness of breath score
o Lung function test results (e.g., FEV1)
o Family history of cancer
o Blood oxygen level
Clustering
• Task: Clustering with K-Means or Hierarchical Clustering
• Group patients into clusters based on similarity in features.
• Possible Clustering Output:
Cluster | Characteristics | Interpretation
Cluster 1 | Older patients, heavy smokers, large nodules, low oxygen levels | High risk – likely lung cancer
Cluster 2 | Moderate age, moderate smoking, medium-size nodules, some symptoms | Medium risk – needs monitoring
Cluster 3 | Younger, non-smokers, no nodules, high lung function | Low risk – routine check-up only
• How It Helps:
o Doctors can focus diagnostic tests on high-risk clusters.
o Hospitals can prioritize resources (CT scans, specialist referrals) for critical groups.
o Early intervention may increase survival rates.
Types of Clustering
There are many different algorithms that can be used for clustering, such as
k-means clustering and hierarchical clustering. The most common clustering
types are:-
Hierarchical clustering, sometimes called connectivity-based clustering,
groups data points together based on the proximity and connectivity of their
attributes.
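A minimal sketch of connectivity-based clustering, using scikit-learn's AgglomerativeClustering on toy 2-D points; the data and parameter choices are illustrative only.

```python
# Illustrative hierarchical (connectivity-based) clustering sketch
# using scikit-learn's AgglomerativeClustering on toy 2-D points.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# linkage="ward" merges the pair of clusters that least increases variance
agg = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(agg.labels_)   # two well-separated groups, e.g. [1 1 1 0 0 0]
```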
K-Means Clustering-Partition Clustering
• What is K-Means?
• K-Means is one of the most popular clustering algorithms in unsupervised
machine learning. It is used to group similar data points into clusters based
on their features.
• It is a partition-based clustering algorithm that divides a dataset into K
distinct non-overlapping clusters, where each data point belongs to the
cluster with the nearest mean (called the centroid).
• K-Means works iteratively to minimize the distance between data points
and their respective cluster centers (centroids).
• Objects are classified into a predefined number of groups.
o K = the number of clusters you want the algorithm to find in your data.
o You choose the value of K before the algorithm runs, based on domain knowledge or heuristics such as the elbow method (described later).
o For example, if K = 3, the algorithm will try to group the data into 3 clusters.
o "Means" refers to the centroid (average position) of a cluster.
K-Means Clustering-Distance Measure
• K-Means clustering uses distance to determine the similarity
between data points and cluster centroids. The most common
distance measure used is Euclidean Distance.
• Distance Measure will determine the similarity between two
elements and it will influence the shape of the clusters.
• Euclidean distance is the straight-line distance between two points in space.
• Types of distance Measure
o Euclidean distance measure
the ordinary straight-line distance between two
points in Euclidean space. For two points
$A(x_1, y_1)$ and $B(x_2, y_2)$:
$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
K-Means Clustering-Distance Measure
• Euclidean Distance is the default and most widely used distance
metric in K-Means.
• Euclidean distance is used to assign points to the nearest centroid.
• New centroids are computed using the mean of all points in a cluster.
• The process repeats until the cluster assignments don’t change.
k-Means Clustering
• This algorithm partitions the dataset into a set number (k) of clusters.
• It randomly initializes k centroids and assigns each data point to the
nearest centroid. For example: If you want 3 clusters (k = 3), you
randomly place 3 centroids.
• The centroids are updated iteratively until the clusters are stable.
K-Means Clustering-Distance Measure
• Manhattan Distance measure
• Manhattan Distance measures the distance between two points
along axes at right angles:
$d(A, B) = |x_2 - x_1| + |y_2 - y_1|$
• Squared Euclidean distance measure
• Squared Euclidean Distance measures the square of the straight-line (Euclidean)
distance between two points:
$d^2(A, B) = (x_2 - x_1)^2 + (y_2 - y_1)^2$
It avoids taking the square root, making it faster to
compute (especially in machine learning algorithms like K-Means).
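The three measures can be sketched in a few lines of Python; the helper names below are ours, not from the slides, and the points generalize to any dimension.

```python
# Sketches of the three distance measures described above, for 2-D points.
import math

def euclidean(a, b):
    """Straight-line distance between points a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Distance along axes at right angles (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Euclidean distance without the square root: cheaper to compute."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

A, B = (1, 2), (4, 6)
print(euclidean(A, B))          # 5.0
print(manhattan(A, B))          # 7
print(squared_euclidean(A, B))  # 25
```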
How does the K-Means algorithm work?
K-Means Clustering: Steps
1. Start: The algorithm begins.
2. Number of Clusters K: The first step is to define the
number of clusters, denoted by 'K', that the data
will be grouped into. This value is determined by
the user or through some heuristic methods.
3. Centroid: Initial K centroids are chosen. These are
essentially the initial central points for each of the
K clusters. The choice of initial centroids can be
random or based on some specific strategy.
4. Distance Objects to Centroids: Each data point
(object) in the dataset is assigned to the nearest
centroid. The distance is typically calculated using
metrics like Euclidean distance.
5. Grouping based on minimum Distance: After
calculating distances, each data point is assigned to
the cluster whose centroid is closest to it. This
forms initial groupings.
K-Means Clustering: Steps
6. Centroid has Converged? (Decision Point): This is the
core of the iterative process.
False: If the centroids have not converged (meaning their
positions are still changing significantly from one iteration to the
next), the process loops back to the "Centroid" step. In this
iteration, new centroids are calculated as the mean of all data
points currently assigned to that cluster. Then, steps 4 and 5 are
repeated with these new centroids.
True: If the centroids have converged (meaning their positions are
no longer changing significantly, or the assignments of data points
to clusters have stabilized), the algorithm proceeds to the "End"
step.
7. End: The algorithm terminates, and the final clusters
and their centroids are produced. In essence, the K-Means
algorithm works by iteratively performing two main steps:
• Assignment Step: Assigning each data point to its closest
centroid.
• Update Step: Recalculating the centroids based on the mean
of the data points assigned to each cluster.
This process continues until the cluster assignments no longer change
significantly, or a predefined number of iterations is reached, indicating
that the algorithm has converged to a stable clustering solution.
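Below is a from-scratch NumPy sketch of this assignment/update loop. The initialization strategy (random points drawn from the data) and the toy dataset are illustrative assumptions, and empty-cluster handling is omitted for brevity.

```python
# From-scratch sketch of the K-Means assignment/update loop.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 3: init
    for _ in range(max_iters):
        # Assignment step (steps 4-5): each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # step 6: converged?
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0],
              [9.0, 9.5], [1.0, 0.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)      # two clear groups
print(centroids)
```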
How to decide the number of clusters?
Elbow Method (most common): Run K-Means clustering
on the dataset for a range of values of K (the number of clusters).
• Plot how the total error (within-cluster sum of squares, or WSS) decreases as K
increases.
• Pick the K at the "elbow" point, where adding more clusters gives diminishing
returns.
• The sum of squared error is defined as the sum of the squared distances between
each member of a cluster and its centroid:
$WSS = \sum_{i=1}^{K} \sum_{x \in C_i} \|x - c_i\|^2$
where $c_i$ is the centroid of cluster $C_i$.
In the example plot (shown in the original slide), the WSS values change very slowly
after K = 2, so that elbow-point value is taken as the final number of clusters.
• Silhouette Score, Gap Statistic, and Domain Knowledge are other methods to decide K. A sketch of the elbow method follows.
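A minimal elbow-method sketch, assuming scikit-learn's KMeans (its inertia_ attribute is the WSS) and synthetic data with three natural groups.

```python
# Elbow-method sketch: run K-Means for several values of K and record
# the within-cluster sum of squares (exposed by scikit-learn as inertia_).
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

ks = range(1, 9)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), wss, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("WSS (inertia)")
plt.show()   # look for the 'elbow' where the curve flattens
```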
Example Problem using Euclidean Distance
[The worked example applies K-Means with K = 3 to eight data points (A1, A2, A3, B1, B2, B3, C1, C2); the given coordinates, initial centroids, and distance calculations appear as figures in the original slides.]
Once the distance calculation is finished, we need to assign each of the
data points to one of the 3 clusters.
We assign based on the minimum distance to the centroids.
Example Problem
Now each data point is assigned to its respective cluster. Once you initially assign data
points to clusters in algorithms like K-Means, it is mandatory to compute new centroids
and reassign points until convergence. This is a core part of the K-Means algorithm.
Now compute the mean of each cluster to get the new centroids (the resulting values
are shown in the original slide), and make these centroids the current centroids.
Example Problem
Now we need to consider the new centroids as the current centroids and compute the
distances again, then make the assignments based on the minimum distance from
each centroid. Since data point C2 was grouped once in cluster 2 and then later in
cluster 1, the clustering has not converged (C2 moved from one cluster to another);
therefore we need to compute new centroids again, and these become the current
centroids.
Example Problem
Again, data point B1 was grouped once in cluster 2 and then later in cluster 1, so
the clustering has still not converged (B1 moved from one cluster to another);
we need to compute new centroids again because it has not yet converged, and the
new clusters become the current clusters.
Example Problem
When we look at the previous assignment and the current assignment, both are
the same. This means all the data points have converged to these new clusters,
and this is the final step (based on Euclidean distance) because all the data points
have converged.
Finally, we write down the final clusters: data points A1, B1, and C2
are in cluster 1; A3, B2, and B3 are in cluster 2; and A2 and C1 are in cluster 3.
Exercise with 1D clustering
Assign each of the 1D data points (given in the original slide) to one of two clusters, C1 and C2.
Exercise with 1D clustering
Euclidean distance in 1D reduces to the absolute difference: $d(x, c) = |x - c|$.
Exercise with 1D clustering
After the 1st iteration we get these groups;
then find the new centroids of the new
clusters for comparison purposes.
Exercise with 1D clustering
Once we get the new centroids, they become the current centroids for the 2nd
iteration, and we compute the distances again.
Exercise with 1D clustering
After we group the data points based on the minimum distances, we have
to compare the previous cluster assignment with the current one.
If there are variations between the two, the clustering has not converged yet, and we
need to compute new centroids until it converges.
Exercise with 1D clustering
Checking the clusters again, they have still not converged, so we need
to compute new centroids. Move the new cluster assignment to the current
cluster assignment and compute the distances again.
Exercise with 1D clustering
Now all the data points have converged because there is no variation between the
previous assignments and the current assignments, and we write down the
final clusters as in the example above. A runnable 1D sketch follows.
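Since the original data points appear only in the slide figures, here is a comparable 1D sketch with made-up values, taking the first two points as initial centroids.

```python
# 1-D K-Means with two clusters; the data values here are illustrative
# because the original exercise points appear only in the slide figures.
import numpy as np

points = np.array([2.0, 4.0, 10.0, 12.0, 3.0, 20.0, 30.0, 11.0, 25.0])
c1, c2 = points[0], points[1]          # initial centroids (first two points)

while True:
    # In 1-D, Euclidean distance is just the absolute difference.
    labels = np.where(np.abs(points - c1) <= np.abs(points - c2), 1, 2)
    new_c1 = points[labels == 1].mean()
    new_c2 = points[labels == 2].mean()
    if new_c1 == c1 and new_c2 == c2:  # converged: assignments are stable
        break
    c1, c2 = new_c1, new_c2

print("Cluster C1:", points[labels == 1], "centroid:", c1)
print("Cluster C2:", points[labels == 2], "centroid:", c2)
```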
Pros and Cons: K-Means Clustering
Pros:
o Simple and understandable
o Items are automatically assigned to clusters
Cons:
o Must define the number of clusters in advance
o Hard clustering: each point belongs to exactly one cluster
o Need to select the initial centroids for each of the clusters
o Unable to handle noisy data and outliers
Fuzzy C-Means Clustering
• Fuzzy C-Means is an extension of K-Means, the popular
simple clustering technique.
• Fuzzy clustering (also referred to as soft clustering) is a form
of clustering in which each point can belong to more than one
cluster.
• Fuzzy C-Means is an unsupervised clustering algorithm like K-
Means, but with one big difference:
Instead of assigning each data point to one single cluster, Fuzzy C-Means
allows each point to belong to multiple clusters with varying degrees of
membership.
o "Fuzzy" means soft or partial membership.
o C is the number of clusters, just like K in K-Means.
o So “Fuzzy C-Means” = Fuzzy Clustering with C Clusters.
Fuzzy C-Means Clustering
• Fuzzy C-Means (FCM) is a soft clustering algorithm that allows data
points to belong to multiple clusters with varying degrees of
membership, unlike "hard" clustering methods like K-means, where
each data point belongs exclusively to one cluster.
• The core idea of FCM is to minimize an objective function that
represents the sum of squared errors between data points and
cluster centers, weighted by their membership degrees.
• This iterative optimization process moves the cluster centers to the
"right" location within the dataset.
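A minimal from-scratch sketch of the standard FCM update rules for the objective $J = \sum_i \sum_j u_{ij}^m \|x_i - v_j\|^2$ (fuzzifier m = 2 here); the toy data and function name are illustrative assumptions, not from the slides.

```python
# Minimal Fuzzy C-Means sketch: alternate between updating cluster
# centers (membership-weighted means) and memberships u (soft labels).
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, max_iters=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(max_iters):
        um = u ** m                              # fuzzified memberships
        centers = (um.T @ X) / um.sum(axis=0)[:, None]   # weighted means
        # distances from every point to every center (epsilon avoids /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)     # standard FCM update
        if np.abs(new_u - u).max() < tol:        # memberships stabilized
            break
        u = new_u
    return u, centers

# Toy usage: two tight groups; memberships should be near 0/1 per group.
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
u, centers = fuzzy_c_means(X, c=2)
print(np.round(u, 2))    # soft memberships: each row sums to 1
```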
Evaluation Metrics in Unsupervised Learning
• For Clustering:
Silhouette Coefficient
Davies-Bouldin Index
Adjusted Rand Index
Confusion Matrix (when ground truth is available)
• For Dimensionality Reduction:
Reconstruction Error
Visualization
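A sketch of the clustering metrics above using scikit-learn; the toy data and the "true" labels used for the Adjusted Rand Index are hypothetical.

```python
# Clustering-metric sketch via scikit-learn on toy 2-D data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             adjusted_rand_score)

X = np.array([[1, 1], [1.2, 0.8], [8, 8], [8.2, 7.9], [0.9, 1.1], [7.8, 8.1]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))        # closer to 1 is better
print(davies_bouldin_score(X, labels))    # lower is better

true_labels = [0, 0, 1, 1, 0, 1]          # hypothetical ground truth
print(adjusted_rand_score(true_labels, labels))  # 1.0 = perfect agreement
```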
Challenges of Unsupervised Learning
• No Ground Truth Labels
• Interpretability of Clusters
• Choice of Distance Metrics
• Scalability
Applications of Unsupervised Learning
• Customer Segmentation
• Recommender Systems
• Anomaly Detection (e.g., Fraud Detection)
• Document and Text Clustering
Applications ……
Recommendation Systems: Suggesting products or content
based on the behaviors of similar users.
Anomaly/Outlier Detection: Identifying unusual data points
that deviate significantly from the norm (e.g., fraud
detection, network intrusion).
Natural Language Processing (NLP): Categorizing text, topic
modeling, and understanding relationships between
words.
Bioinformatics: Grouping genes or proteins with similar
functions.
Data Preprocessing: Cleaning and preparing data for other
machine learning tasks.
Thank You !!!