ASSIGNMENT 4: Unsupervised Learning
Made by: Preyanshi
Enrollment No.: 226140307031
Supervised learning vs.
unsupervised learning
• Supervised learning: discover patterns in the data that relate data
attributes with a target (class) attribute.
These patterns are then utilized to predict the values of the target
attribute in future data instances.
• Unsupervised learning: The data have no target attribute.
We want to explore the data to find some intrinsic structures in them.
Clustering
• Clustering is a technique for finding similarity groups
in data, called clusters. I.e.,
it groups data instances that are similar to (near) each
other in one cluster and data instances that are very
different (far away) from each other into different clusters.
• Clustering is often called an unsupervised
learning task because no class values denoting an a
priori grouping of the data instances are given,
as they are in supervised learning.
• Due to historical reasons, clustering is often
considered synonymous with unsupervised learning.
In fact, association rule mining is also unsupervised.
• This chapter focuses on clustering.
An illustration
• The data set has three natural groups of data
points, i.e., 3 natural clusters.
CS583, Bing Liu, UIC
What is clustering for?
• Let us see some real-life examples
• Example 1: group people of similar sizes together to make “small”,
“medium” and “large” T-shirts.
Tailor-made for each person: too expensive
One-size-fits-all: does not fit all.
• Example 2: In marketing, segment customers according to their
similarities
To do targeted marketing.
What is clustering for?
(cont…)
• Example 3: Given a collection of text documents, we want to
organize them according to their content similarities,
to produce a topic hierarchy.
• In fact, clustering is one of the most utilized data mining
techniques.
It has a long history and is used in almost every field, e.g., medicine,
psychology, botany, sociology, biology, archeology, marketing, insurance,
libraries, etc.
In recent years, due to the rapid increase of online documents, text
clustering has become important.
K-means clustering
• K-means is a partitional clustering algorithm
• Let the set of data points (or instances) D be
{x1, x2, …, xn},
where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X ⊆ R^r, and r is
the number of attributes (dimensions) in the data.
• The k-means algorithm partitions the given data into k clusters.
Each cluster has a cluster center, called the centroid.
k is specified by the user.
K-means algorithm
• Given k, the k-means algorithm works as follows:
1) Randomly choose k data points (seeds) to be the initial centroids
(cluster centers).
2) Assign each data point to the closest centroid.
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to 2).
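The four steps above can be sketched in plain Python. This is a minimal illustration, not the slides' own code: the toy data, the iteration cap, and seeding via `random.sample` are assumed choices.

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                      # step 1: random seeds
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                   # step 2: closest centroid
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        new_centroids = [                                  # step 3: recompute means
            tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:                     # step 4: stop when stable
            break
        centroids = new_centroids
    return centroids, clusters

# Usage on two well-separated groups of 2-D points:
pts = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
cents, cls = kmeans(pts, 2)
```

Convergence is checked here as "centroids did not move"; real implementations often also stop when the change falls below a tolerance.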
K-means summary
• Despite its weaknesses, k-means is still the most popular algorithm
due to its simplicity and efficiency;
other clustering algorithms have their own lists of weaknesses.
• No clear evidence that any other clustering algorithm performs
better in general
although they may be more suitable for some specific types of data or
applications.
• Comparing different clustering algorithms is a difficult task. No one
knows the correct clusters!
Common ways to represent
clusters
• Use the centroid of each cluster to represent the cluster.
Compute the radius and
standard deviation of the cluster to determine its spread in each
dimension.
The centroid representation alone works well if the clusters are of
hyper-spherical shape.
If clusters are elongated or of other shapes, centroids alone are not
sufficient.
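As a sketch of this representation, the centroid, the radius (distance to the farthest member), and the per-dimension standard deviation of a cluster can be computed as follows; the sample cluster is made up for illustration.

```python
import math

def cluster_summary(points):
    n = len(points)
    centroid = tuple(sum(dim) / n for dim in zip(*points))
    dists = [math.dist(p, centroid) for p in points]
    radius = max(dists)                                  # farthest member from centroid
    stdevs = tuple(                                      # spread along each dimension
        math.sqrt(sum((x - c) ** 2 for x in dim) / n)
        for dim, c in zip(zip(*points), centroid)
    )
    return centroid, radius, stdevs

# A square of four points centered at (1, 1):
c, r, s = cluster_summary([(0, 0), (2, 0), (0, 2), (2, 2)])
```

For this symmetric cluster the centroid is (1.0, 1.0) and both per-dimension standard deviations are equal, which is the hyper-spherical case where the centroid representation works well.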
Hierarchical Clustering
• Produce a nested sequence of clusters, a
tree, also called a dendrogram.
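A minimal bottom-up (agglomerative) sketch of how such a nested sequence is built: start from singleton clusters and repeatedly merge the closest pair, recording each merge as the dendrogram would. Single-linkage distance is an assumed choice here, since the slide does not fix a linkage criterion.

```python
import math

def single_linkage(ca, cb):
    # distance between clusters = closest pair of points (single linkage)
    return min(math.dist(a, b) for a in ca for b in cb)

def agglomerative(points):
    clusters = [[p] for p in points]
    merges = []                                   # nested merge sequence
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))  # record this level of the tree
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Two nearby points merge first, then the far one joins:
merges = agglomerative([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)])
```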
Using classification model
• All the data points in a cluster are regarded as having the same
class label, e.g., the cluster ID.
Run a supervised learning algorithm on the data to find a
classification model.
DBSCAN Application
• Real-Time Problem: Anomaly Detection in
Credit Card Transactions
• Objective: Detect fraudulent credit card
transactions.
• Dataset: Transaction records including amount,
location, and time.
• Process:
• Apply DBSCAN to cluster normal transactions while
identifying outliers.
• DBSCAN is effective because it does not assume
spherical clusters and can detect outliers.
• Result: Detect anomalies that may indicate
fraudulent activity.
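To make the process concrete, here is a compact self-contained DBSCAN sketch in plain Python. The 2-D toy points and the eps / min_pts values are illustrative stand-ins for real transaction features (amount, location, time), not from the slides; points labelled -1 are the density outliers that would be flagged as candidate anomalies.

```python
import math

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)                 # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:                   # not a core point:
            labels[i] = -1                        # tentatively noise (outlier)
            continue
        labels[i] = cluster                       # grow a new cluster from this core
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:                   # noise reachable from a core point
                labels[j] = cluster               # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:                # expand only through core points
                queue.extend(jn)
        cluster += 1
    return labels

# Two dense groups of "normal" transactions plus one isolated outlier:
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)          # the isolated point gets -1
```

Because cluster membership depends only on local density, the two groups need not be spherical, and the isolated point is reported as noise rather than forced into a cluster.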
Apriori Algorithm
Application
• Real-Time Problem: Optimizing Product
Placement in Retail
• Objective: Identify frequently purchased items
together to improve store layout and product
recommendations.
• Dataset: Transaction data from a large retail store.
• Process:
• Apply the Apriori algorithm to find association rules
between products (e.g., milk and bread are often bought
together).
• Set a minimum support and confidence to filter the rules.
• Result: Store layouts are redesigned to place
frequently bought-together items closer, boosting sales
by cross-promoting products.
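A small Apriori-style sketch of the process above: find itemsets meeting a minimum support level-by-level (using the downward-closure property to build candidates), then derive association rules meeting a minimum confidence. The toy basket transactions and thresholds are illustrative, not from the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items if support(s) >= min_support}
    while level:
        frequent.update({s: support(s) for s in level})
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets
        level = {a | b for a in level for b in level
                 if len(a | b) == len(a) + 1 and support(a | b) >= min_support}
    return frequent

def rules(frequent, min_conf):
    out = []
    for s, sup in frequent.items():
        for r in range(1, len(s)):
            for lhs in map(frozenset, combinations(s, r)):
                conf = sup / frequent[lhs]       # support(s) / support(lhs)
                if conf >= min_conf:
                    out.append((set(lhs), set(s - lhs), conf))
    return out

# Toy baskets: milk and bread are often bought together.
tx = [{"milk", "bread"}, {"milk", "bread", "butter"},
      {"bread", "butter"}, {"milk", "bread"}]
freq = apriori(tx, min_support=0.5)
assoc = rules(freq, min_conf=0.8)                # e.g. {milk} -> {bread}
```

Every subset of a frequent itemset is itself frequent, which is why `frequent[lhs]` in the rule step is always defined.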
Conclusion and Key
Takeaways
• Unsupervised Learning is powerful for uncovering
hidden patterns in unlabeled data.
• Real-Time Applications:
• Customer segmentation (K-Means)
• Anomaly detection (DBSCAN)
• Market basket analysis (Apriori)
• Case Study: Retail industry benefits from association
rule mining to improve sales and customer
experience.
Summary
• Clustering has a long history and is still an active area.
There are a huge number of clustering algorithms,
and more are still coming every year.
• We only introduced several main algorithms. There
are many others, e.g.,
density based algorithm, sub-space clustering, scale-up
methods, neural networks based methods, fuzzy clustering,
co-clustering, etc.
• Clustering is hard to evaluate, but very useful in
practice. This partially explains why there are still a
large number of clustering algorithms being devised
every year.
• Clustering is highly application dependent and to
some extent subjective.
•Thank You!