Unsupervised Machine Learning Techniques

Uploaded by

banadawithunde
Unsupervised Machine Learning Techniques
Outline
• Unsupervised learning
• Association rule mining
Unsupervised Learning
• Group unstructured data according to its similarities and distinct patterns in the dataset
• No labels are given to the learning algorithms
• Can be a goal in itself or a means toward an end
• Unsupervised learning is harder than supervised learning tasks.
• How do we know whether the results are meaningful when no answer labels are available?
Unsupervised Learning
• Some applications of unsupervised machine learning techniques include:
– Clustering: automatically splits the dataset into groups according to similarity.
– Anomaly detection: discovers unusual data points in the dataset.
– Association mining: identifies sets of items that frequently occur together in the dataset.
– Latent variable models: commonly used for data preprocessing (dimensionality reduction).
What Is a Good Cluster?
• The quality of a clustering result depends on:
– The similarity measure used by the method and its implementation.
– Its ability to discover some or all of the hidden patterns.
• A good clustering method will produce high-quality clusters in which:
– The intra-class similarity is high (objects within a cluster are alike).
– The inter-class similarity is low (objects in different clusters are dissimilar).
Basic Steps in Clustering
• Feature selection – minimal information redundancy
• Proximity measure
– Similarity of two feature vectors
• Clustering criterion
– Expressed via a cost function or some rules
• Clustering algorithms – choice of algorithm
• Validation of the result
• Interpretation of the result – integration with the application
Types of Clustering
• The major clustering methods can be classified into:
– Partitioning methods: given a set of n objects, a partitioning method constructs k partitions of the data
• Each partition represents a cluster and k <= n.
• Typical methods: K-means, K-medoids, CLARANS
– Hierarchical methods: create a hierarchical decomposition of the given set of data objects
• Can be classified as either agglomerative (bottom-up) or divisive (top-down).
Types of Clustering
– Density-based approach: based on connectivity and density functions
• Typical methods: DBSCAN, OPTICS, DENCLUE
– Grid-based approach: based on a multiple-level granularity structure
• Typical methods: STING, WaveCluster, CLIQUE
Distance Measures
• Assume a k-dimensional Euclidean space; a distance measure gives the distance between two points, x = [x1, x2, …, xk] and y = [y1, y2, …, yk]
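For the Euclidean case, the distance is the square root of the summed squared coordinate differences. The function below is a minimal sketch of that definition; the name `euclidean_distance` is illustrative and does not come from the slides.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Example with two 2-dimensional points.
d = euclidean_distance([1, 1], [2, 4])
print(round(d, 2))
```

Other distance measures can be swapped in by replacing the body of the function, as long as both points have the same dimensionality.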
Distance Between Clusters
• Single link: smallest distance between an element in one cluster and an element in the other
• Complete link: largest distance between an element in one cluster and an element in the other
• Average: average distance between an element in one cluster and an element in the other cluster
• Centroid: distance between the centroids of two clusters
• Medoid: distance between the medoids of two clusters
– Medoid: the most centrally located point within a cluster
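The cluster-to-cluster distances listed above can be sketched in a few lines of Python. This is an illustrative sketch that assumes Euclidean distance between points; the function names and the two example clusters are assumptions, not part of the slides.

```python
import math  # math.dist requires Python 3.8+
from itertools import product


def single_link(c1, c2):
    # Smallest pairwise distance between the two clusters.
    return min(math.dist(p, q) for p, q in product(c1, c2))


def complete_link(c1, c2):
    # Largest pairwise distance between the two clusters.
    return max(math.dist(p, q) for p, q in product(c1, c2))


def average_link(c1, c2):
    # Mean of all pairwise distances between the two clusters.
    return sum(math.dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))


def centroid(cluster):
    # Coordinate-wise mean of the points in the cluster.
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def medoid(cluster):
    # Actual cluster member with the smallest total distance to the others.
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


def centroid_link(c1, c2):
    return math.dist(centroid(c1), centroid(c2))


def medoid_link(c1, c2):
    return math.dist(medoid(c1), medoid(c2))


# Hypothetical example clusters.
cluster_a = [(1, 1), (1, 0), (0, 2)]
cluster_b = [(2, 4), (3, 5)]
print(single_link(cluster_a, cluster_b), complete_link(cluster_a, cluster_b))
```

Note that a centroid is generally not one of the data points, while a medoid always is.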
Centroid-based Clustering Techniques
• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion.
• Uses the centroid (center point) of a cluster to represent that cluster.
– k-means or k-medoids
How Does the k-means Algorithm Work?
• Randomly selects k of the objects in D, each of which initially represents a cluster mean or center
• Each remaining object is assigned to the closest center, where ‘closeness’ is measured by Euclidean distance
• The k-means algorithm then iteratively improves the within-cluster variation
• Most of the convergence happens in the first few iterations.
• What is the complexity of the k-means clustering algorithm?
K-means Clustering Variants
• Handling categorical data: k-modes
– Replace means of clusters with modes
– Use new dissimilarity measures to deal with categorical data
– Use a frequency-based method to update the modes of clusters
– For a mixture of continuous and categorical data: k-prototype
K-means Clustering – Example
- Assume Euclidean distance
- Start by picking k, the number of clusters
- Initialize clusters by picking one point per cluster
- Let k = 2, and let us choose observations A & C as the two cluster means (mean centroids)

i   x1  x2
A   1   1
B   1   0
C   0   2
D   2   4
E   3   5

Calculate the Euclidean distance between observations. For two points $(x_1, y_1)$ and $(x_2, y_2)$ it is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$:
$dis(A,B) = \sqrt{(1 - 1)^2 + (1 - 0)^2}$
$dis(A,C) = \sqrt{(1 - 0)^2 + (1 - 2)^2}$
$dis(A,D) = \sqrt{(1 - 2)^2 + (1 - 4)^2}$
$dis(A,E) = \sqrt{(1 - 3)^2 + (1 - 5)^2}$
$dis(B,C) = \sqrt{(1 - 0)^2 + (0 - 2)^2}$
$dis(B,D) = \sqrt{(1 - 2)^2 + (0 - 4)^2}$
$dis(B,E) = \sqrt{(1 - 3)^2 + (0 - 5)^2}$
$dis(C,D) = \sqrt{(0 - 2)^2 + (2 - 4)^2}$
$dis(C,E) = \sqrt{(0 - 3)^2 + (2 - 5)^2}$
K-means Clustering – Example
Distance of each point from the initial centroids (C1 = A, C2 = C):

i   C1   C2
A   0    1.4
B   1    2.2
C   1.4  0
D   3.2  2.8
E   4.5  4.2

- B is grouped in cluster 1 since 1 < 2.2
- C, D & E are grouped in cluster 2
- Therefore C1 = {A, B} and C2 = {C, D, E}
- Then update the centroid locations: mean_C1 = (1, 0.5) and mean_C2 = (1.7, 3.7)
- Next, recalculate the distance of each point from the cluster means

i   C1   C2
A   0.5  2.7
B   0.5  3.7
C   1.8  2.4
D   3.6  0.5
E   4.9  1.9

- A, B & C are in cluster 1
- D & E are in cluster 2
- Then calculate the cluster means: mean_C1 = (0.7, 1) and mean_C2 = (2.5, 4.5)
- Then recalculate the distance of each point from the cluster means
- The assignments no longer change, so for k = 2 the clustering has converged
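The worked example can be replayed with a short k-means sketch in plain Python. It uses the five observations A to E from the example and starts from A and C as the initial means; the function name `kmeans` and its parameters are illustrative assumptions, not an implementation taken from the slides.

```python
import math  # math.dist requires Python 3.8+


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, then update the means."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


data = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}
labels, centers = kmeans(list(data.values()), [data["A"], data["C"]])
for name, label in zip(data, labels):
    print(name, "-> cluster", label + 1)
print(centers)
```

With this starting point the sketch ends with A, B and C in one cluster and D and E in the other. A different choice of initial means can lead to a different result, which is why k-means is often run several times.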
The K-medoid Clustering Method
• K-medoids clustering: find representative objects (medoids) in clusters
– PAM (Partitioning Around Medoids)
• Starts from an initial set of medoids and iteratively replaces one of the medoids with one of the non-medoids if doing so improves the total distance of the resulting clustering
• Works effectively for small data sets but, due to its computational complexity, does not scale to large data sets
– Efficiency improvements on PAM
• CLARA and CLARANS
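The swap idea behind PAM can be sketched as follows. This is a simplified, greedy variant that accepts the first improving swap rather than searching for the best swap in each round, so it should be read as an illustration of the principle and not as the exact PAM procedure; the function names and sample points are assumptions.

```python
import math  # math.dist requires Python 3.8+
from itertools import product


def total_cost(points, medoids):
    # Sum over all points of the distance to the nearest medoid.
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam_sketch(points, initial_medoids):
    medoids = list(initial_medoids)
    improved = True
    while improved:
        improved = False
        best_cost = total_cost(points, medoids)
        # Try replacing each medoid with each non-medoid point.
        for i, candidate in product(range(len(medoids)), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best_cost - 1e-12:  # keep the swap only if it strictly helps
                medoids, best_cost, improved = trial, cost, True
    return medoids, best_cost


points = [(1, 1), (1, 0), (0, 2), (2, 4), (3, 5)]
medoids, cost = pam_sketch(points, [(1, 1), (0, 2)])
print(medoids, round(cost, 3))
```

Every round evaluates each (medoid, non-medoid) pair against all points, which illustrates why the method becomes expensive on large data sets.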
Advantages vs. Disadvantages
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
– A tree-like diagram that records the sequence of merges
• Hierarchical clustering starts with k = N clusters and proceeds by merging the two closest objects into one cluster, obtaining k = N-1 clusters.
• This process is repeated until we reach the desired number of clusters K.
• The cluster of all objects is the root of the tree
Strengths of HC
• Does not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
• The clusters may correspond to meaningful taxonomies
– Examples in the biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...)
Hierarchical Clustering
• Two main types of hierarchical clustering
– Agglomerative:
• Starts with the points as individual clusters – uses a bottom-up strategy
• At each step, merges the closest pair of clusters until only one cluster is left
– Divisive:
• Starts by placing all objects in one cluster – employs a top-down strategy
• At each step, splits a cluster until each cluster contains a single point
Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique
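A minimal sketch of the bottom-up procedure is shown below: every point starts as its own cluster, and the closest pair of clusters is merged until the requested number of clusters remains. Single-link distance is assumed here as the cluster distance; the function names and sample points are illustrative.

```python
import math  # math.dist requires Python 3.8+
from itertools import combinations, product


def single_link(c1, c2):
    # Smallest pairwise distance between two clusters.
    return min(math.dist(p, q) for p, q in product(c1, c2))


def agglomerative(points, num_clusters=1, linkage=single_link):
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []                       # merge history, i.e. the dendrogram content
    while len(clusters) > num_clusters:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


points = [(1, 1), (1, 0), (0, 2), (2, 4), (3, 5)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)
```

Passing `num_clusters=1` continues merging up to the root of the tree; the recorded `merges` list holds the information a dendrogram would display.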
Association Rule Mining
• Association analysis is useful for discovering interesting relationships hidden in large data sets, namely the frequent co-occurrence of items in a dataset
• The uncovered relationships can be represented in the form of association rules or sets of frequent items.
• Due to their good scalability characteristics, association rules are an essential data mining tool for extracting knowledge from data.
Association Rule Mining – Application Areas
• Market-basket data analysis
• Catalog design
• Customizing store layout
• Data preprocessing
• Personalization and recommendation systems, e.g. for browsing web pages
• Analysis of genomic data
Association – Example
• The following rule can be extracted from the data set shown in the previous table: {Diapers} → {Beer}
• Diapers is referred to as the antecedent and Beer is the consequent
• The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer.
Item Set and Support Counts
Support and Confidence
Support and confidence are used to measure the quality of a given rule:
- Support (absolute frequency) tells us how many examples (transactions) in the data set used to generate the rule include the rule's items.
- Confidence (correlative frequency) expresses how many of the examples (transactions) that include the items from the LHS also include the items from the RHS.
Why Support and Confidence
• Support is often used to eliminate uninteresting rules.
• Confidence, on the other hand, measures the reliability of the inference made by a rule.
• For a given rule X → Y, the higher the confidence, the more likely it is for Y to be present in transactions that contain X.
• Confidence also provides an estimate of the conditional probability of Y given X.
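Support and confidence for a rule X → Y can be computed directly from a list of transactions, as in the sketch below. The five market-basket transactions are a made-up illustration (the table used in the slides is not reproduced here), and the function names are assumptions.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of transactions containing lhs that also contain rhs."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        raise ValueError("the antecedent never occurs in the transactions")
    return support_count(set(lhs) | set(rhs), transactions) / lhs_count


# Hypothetical transactions, for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

print(support({"Diapers", "Beer"}, transactions))       # support of the rule's items
print(confidence({"Diapers"}, {"Beer"}, transactions))  # confidence of Diapers -> Beer
```

In this toy data, three of the five transactions contain both Diapers and Beer, and three of the four transactions with Diapers also contain Beer.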
Definition: Association Rule
Association Rule Mining Tasks
Mining Association Rules
• Steps in the ARM approach:
– Frequent itemset generation: generate all itemsets whose support ≥ minsup
– Rule generation: these rules must satisfy minimum support and minimum confidence
• Frequent itemset generation is still computationally expensive.
Frequent Itemset Generation
Brute Force Approach
Brute Force Approach – Example
Computational Complexity
Frequent Itemset Generation
Reduce the Number of Candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:
– The support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
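The pruning effect of the Apriori principle can be illustrated with a compact level-wise sketch: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is discarded before its support is counted. The transactions, the threshold, and all names below are illustrative assumptions; this is a teaching sketch, not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data to count every candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions, for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
frequent_itemsets = apriori(transactions, min_support_count=3)
for itemset, n in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this toy run, {Bread, Diapers, Beer} is never counted against the data because its subset {Bread, Beer} is already known to be infrequent.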
Apriori Algorithm
Apriori Algorithm – How It Works
Illustrating Apriori Principle
Rules Generation in Apriori Algorithm
Apriori Summary
• Apriori is one of the earliest algorithms to have successfully addressed the combinatorial explosion of frequent itemset generation.
• It applies the Apriori principle to prune the exponential search space.
• It incurs considerable I/O overhead since it requires making several passes over the transaction data set.
• Performance degrades significantly for dense data sets
• Alternative methods have been developed to overcome these limitations
Improving the Efficiency of Apriori
• How can we further improve the efficiency of Apriori-based mining?
• Many variations of the Apriori algorithm have been proposed:
– Hash-based technique: hashing itemsets into corresponding buckets
– Transaction reduction: reducing the number of transactions scanned in future iterations
– Partitioning: partitioning the data to find candidate itemsets
– Sampling: mining on a subset of the given data
– Dynamic itemset counting: adding candidate itemsets
FP-Growth Algorithm
• An alternative algorithm that takes a radically different approach to discovering frequent itemsets.
• Encodes the data set using a compact data structure called an FP-tree and extracts frequent itemsets directly from this structure.
• Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets.
FP-Growth Algorithm
• As different transactions can have several items in common, their paths may overlap.
• The more the paths overlap with one another, the more compression we can achieve using the FP-tree structure.
Apriori vs. FP-Growth
FP-Tree Data Structure
FP-Tree Size
Evaluation of ASRs
• Interestingness measures can be used to prune or rank the derived patterns (a contingency table is used when applying an interestingness measure).
Support and Confidence
Other Interestingness Measuring Techniques
Unsupervised Deep Learning
• Unsupervised learning has also been extended to neural nets and deep learning.
– Restricted Boltzmann machines
– Deep belief networks
– Deep Boltzmann machines
– Nonlinear autoencoders
Unsupervised Deep Learning
• One popular application of deep learning in an unsupervised fashion is the autoencoder.
- An autoencoder tries to figure out how best to represent the input data as itself, using a smaller amount of data than the original.
Unsupervised Deep Learning
• Existing unsupervised deep learning methods generally fall into four different categories:
– Clustering analysis: jointly optimises clustering analysis and representation learning
– Sample specificity learning: goes to the other extreme by considering every single sample as an independent class
– Self-supervised learning: exploits some information intrinsically available in the unlabelled training data
– Generative models: a way of learning the true data distribution of the training set in an unsupervised manner
Unsupervised Deep Learning
Figure 1. Illustration of three unsupervised learning strategies (a) Clustering analysis (b) Sample specificity learning
Unsolved Problems in Data Science
• Machine learning leads mathematicians to an unsolvable problem