Lecture 4 – Classification: Alternative Techniques
Underfitting – when model is too simple, both training and test errors are large
Insufficient Examples:
o Lack of data points in the lower half of the diagram makes it difficult to
predict correctly the class labels of that region
o Insufficient number of training records in the region causes the decision tree
to predict the test examples using other training records that are irrelevant
to the classification task
Overfitting – decision trees that are more complex than necessary
The learning algorithm has access only to the training set during model building. It
has no knowledge of the test set
Solution 1: Pre-Pruning
o Stop the algorithm before it becomes a fully-grown tree
o Typical stop conditions:
All instances belong to the same class
Attribute values are the same
o Restrictive conditions:
Number of instances is less than some user-specified threshold
Class distribution of instances is independent of the available features
Expanding the current node does not improve impurity measures
Solution 2: Post-pruning
o Grow decision tree to its entirety
o Trim the nodes of decision tree in a bottom-up fashion
o If generalization error improves after trimming, replace sub-tree by a leaf
node
o Class label of a leaf node is determined from majority class of instances in the
sub-tree
o Can use Minimum Description Length (MDL) for post pruning
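A rough sketch of the post-pruning rule above, using a hypothetical dict-based tree and a validation set; error_on and the tree layout are illustrative assumptions, not a specific library or the course implementation.
```python
from collections import Counter

# Toy tree: a leaf is {'label': c}; an internal node is
# {'attr': a, 'children': {value: subtree}, 'instances': [(record, label), ...]}

def error_on(tree, data):
    """Fraction of (record, label) pairs misclassified by the tree."""
    def predict(node, x):
        while 'label' not in node:
            node = node['children'][x[node['attr']]]
        return node['label']
    return sum(predict(tree, x) != y for x, y in data) / len(data)

def post_prune(node, validation):
    if 'label' in node:                      # already a leaf
        return node
    # Recurse first so pruning proceeds bottom-up
    node['children'] = {v: post_prune(c, validation)
                        for v, c in node['children'].items()}
    # Candidate leaf: majority class of instances in the sub-tree
    majority = Counter(y for _, y in node['instances']).most_common(1)[0][0]
    leaf = {'label': majority}
    # Replace the sub-tree by a leaf if the estimated error does not get worse
    return leaf if error_on(leaf, validation) <= error_on(node, validation) else node
```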
Estimating Generalization Errors
Occam’s Razor: Given two models of similar generalization errors, one should prefer the
simpler model over the more complex model
For complex models, there is a greater chance that the model was fitted accidentally to errors
in the data
Minimum Description Length:
o Cost (Model, Data) = Cost (Data|Model) + Cost (Model)
Cost: number of bits needed for encoding
Search for the least costly model
o Cost (Data|Model): misclassification errors
o Cost (Model): node encoding (number of children) + splitting condition
encoding
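A hypothetical worked example of the MDL cost above, assuming one common encoding: log2(number of records) bits to identify each misclassified record, log2(number of attributes) bits per internal node for its splitting condition, and log2(number of classes) bits per leaf. The encoding choices are illustrative assumptions.
```python
import math

def mdl_cost(errors, n_records, internal_nodes, leaves, n_attributes, n_classes):
    # Cost(Data|Model): identify each misclassified record among the n_records
    cost_data_given_model = errors * math.log2(n_records)
    # Cost(Model): splitting condition per internal node + class label per leaf
    cost_model = (internal_nodes * math.log2(n_attributes)
                  + leaves * math.log2(n_classes))
    return cost_data_given_model + cost_model

# Compare a small tree with more errors against a larger tree with fewer errors
print(mdl_cost(errors=7, n_records=100, internal_nodes=3, leaves=4,
               n_attributes=16, n_classes=2))
print(mdl_cost(errors=4, n_records=100, internal_nodes=7, leaves=8,
               n_attributes=16, n_classes=2))
```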
Instance Based Classifiers:
Store the training records (without training explicit models)
Use training records directly to predict the class label of unseen cases
Rote Learner: memorizes entire training data and performs classification only if
attributes of record match one of the training examples exactly
K-Nearest Neighbours (k-NN): Uses k “closest” points (nearest neighbours) for
performing classification
Nearest-Neighbour Classifier:
Things Required:
o Set of stored training records
o Distance metric to compute distance between records
o Value of k, the number of nearest neighbours to retrieve
If k is too small, sensitive to noise points
If k is too large, neighbourhood may include points from other classes
To classify an unknown record:
o Compute distance to other training records
o Identify k nearest neighbours
o Use the class labels of the nearest neighbours to determine the class label of the
unknown record (e.g. by taking a majority vote)
Computation:
o Euclidean distance: d(p, q) = √(Σi (pi − qi)²)
o Determine the class from nearest neighbour list:
Take majority vote of class labels among the k-nearest neighbours
Weigh the vote according to distance (weight factor w = 1/d²)
1 nearest-neighbour:
o Voronoi Diagram: boundaries denote 1-neighborhoods
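A minimal distance-weighted k-NN sketch in Python/NumPy following the steps above (Euclidean distance, find the k nearest neighbours, weighted majority vote with w = 1/d²); the function name and the zero-distance guard are illustrative assumptions.
```python
import numpy as np
from collections import defaultdict

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by a distance-weighted vote among its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)

    # 1. Compute Euclidean distance to every training record
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))

    # 2. Identify the k nearest neighbours
    nearest = np.argsort(dists)[:k]

    # 3. Weighted vote: w = 1/d^2 (small epsilon guards against d = 0)
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] ** 2 + 1e-12)
    return max(votes, key=votes.get)

# Toy usage
X = [[1, 1], [1, 2], [5, 5], [6, 5]]
y = ['A', 'A', 'B', 'B']
print(knn_predict(X, y, [1.5, 1.5], k=3))   # expected 'A'
```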
Bayesian Classifier: A probabilistic framework for classification problems
Naïve Bayes Classifier
If one of the conditional probabilities is zero, then the entire expression becomes
zero
Estimate Probabilities from Data:
For continuous attributes:
o Discretize the range into bins
One ordinal attribute per bin
Violates independence assumption
o Two-way split: (A<v) or (A>v)
Choose only one of the two splits as new attribute
o Probability density estimation:
Assume attribute follows a normal distribution
Use data to estimate parameters of distribution (e.g. mean and
standard deviation)
Once probability distribution is known, can use it to estimate the
conditional probability P(Ai|c)
Normal Distribution: P(Ai | cj) = (1 / √(2πσij²)) · exp(−(Ai − μij)² / (2σij²)),
where μij and σij² are the mean and variance of attribute Ai for class cj
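A small sketch of estimating P(Ai | c) for a continuous attribute via the normal-density approach above: estimate the class-conditional mean and variance from the training data, then plug them into the density. The sample income values are illustrative only.
```python
import math

def gaussian_prob(x, mean, var):
    """P(Ai = x | c) under a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Estimate parameters of 'Income' for one class from (illustrative) training values
incomes = [125, 100, 70, 120, 60, 220, 75]
mean = sum(incomes) / len(incomes)
var = sum((v - mean) ** 2 for v in incomes) / (len(incomes) - 1)  # sample variance

print(gaussian_prob(120, mean, var))  # conditional density of Income = 120 for this class
```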
Support Vector Machines
To decide which separating line is better, find the hyperplane that maximizes the
margin between the two classes
If the problem is not linearly separable, introduce slack variables that allow some
misclassified training examples
Nonlinear Support Vector Machines
Transform data into higher dimensional space
Using “Kernel Trick”, actual transformation need not be known
Just compute similarity between two vectors in original space
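A small illustration of the kernel-trick idea, assuming an RBF (Gaussian) kernel: the similarity K(x, y) is computed directly in the original space, standing in for a dot product in a higher-dimensional feature space that is never constructed explicitly.
```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2): similarity computed in the original
    space, equivalent to a dot product in an implicit high-dimensional space."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Example: two points in the original 2-D space
print(rbf_kernel([1.0, 2.0], [2.0, 0.0]))   # exp(-0.5 * 5) ≈ 0.082
```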
Artificial Neural Networks (ANN)
Perceptron Model is an assembly of inter-connected nodes and weighted links
Output node sums up its input values, weighted by its links
Compare output node against some threshold t
Algorithm for learning ANN:
o Initialize the weights (w0, w1, w2, …, wk)
o Adjust the weights in such a way that the output of ANN is consistent with
class labels of training examples
Objective function: E = Σi [yi − f(w, xi)]²
Find the weights wi that minimize the above objective function
e.g. backpropagation algorithm
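A minimal perceptron weight-update sketch for the learning loop described above (initialize the weights, then adjust them so the outputs agree with the training labels); the learning rate, epoch count and threshold at zero are illustrative choices, and backpropagation for multi-layer networks is not shown.
```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Learn weights w (including bias w0) so that sign(w·x) matches labels y in {-1, +1}."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # prepend bias input x0 = 1
    w = np.zeros(X.shape[1])                                  # initialize weights w0..wk
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            output = 1 if xi @ w > 0 else -1                  # compare against threshold 0
            w += lr * (yi - output) * xi                      # adjust weights toward consistency
    return w

# Toy linearly separable data (AND-like labels)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]
print(train_perceptron(X, y))
```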
Ensemble Methods:
Construct a set of classifiers from the training data
Predict the class label of previously unseen records by aggregating the predictions made by
multiple classifiers
Assumption: Individual classifiers (voters) could be lousy (stupid), but the aggregate
(electorate) can usually classify (decide) correctly
Bagging:
o Sampling with replacement
o Build classifier on each bootstrap sample
o Each training record has probability 1 − (1 − 1/N)^N of being selected in a bootstrap sample
Approaches 1 − 1/e ≈ 0.632 for large N
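A quick numerical check of the bootstrap-selection probability stated above.
```python
# Probability that a given record appears in a bootstrap sample of size N
for N in (10, 100, 1000, 10000):
    p = 1 - (1 - 1 / N) ** N
    print(N, round(p, 4))   # approaches 1 - 1/e ≈ 0.6321 as N grows
```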
Boosting: An iterative procedure to adaptively change distribution of training data by
focusing more on previously misclassified records
o Initially, all N records are assigned equal weights
o Unlike bagging, sampling weights may change at the end of each boosting round
o Records that are wrongly classified will have their weights increased
o Records that are correctly classified will have their weights decreased
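An AdaBoost-style weight-update sketch showing the reweighting described above (misclassified records up-weighted, correctly classified records down-weighted). The exact update rule is the standard AdaBoost formula and is an assumption about which boosting variant is intended.
```python
import numpy as np

def update_weights(w, y_true, y_pred):
    """One boosting round of weight updates (AdaBoost-style).

    w      : current record weights (summing to 1)
    y_true : true labels in {-1, +1}
    y_pred : labels predicted by the current base classifier
    """
    w, y_true, y_pred = map(np.asarray, (w, y_true, y_pred))
    miss = y_true != y_pred
    eps = np.sum(w[miss])                           # weighted error of this round
    alpha = 0.5 * np.log((1 - eps) / eps)           # importance of this classifier
    w = w * np.exp(np.where(miss, alpha, -alpha))   # raise wrong, lower right
    return w / w.sum()                              # renormalize

w = np.full(4, 0.25)
print(update_weights(w, [1, 1, -1, -1], [1, -1, -1, -1]))
```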
Lecture 6 – Cluster Analysis
Cluster Analysis: Finding groups of objects such that the objects in a group will be similar to
one another and different from the objects in other groups.
Types of Clustering:
o Partitional Clustering: A division of data objects into non-overlapping subsets
(clusters) such that each data object is in exactly one subset
o Hierarchical Clustering: A set of nested clusters organized as a hierarchical
tree
o Contiguous Cluster (Nearest neighbour or Transitive): A cluster is a set of
points such that a point in a cluster is closer to one or more other points in
the cluster than to any point not in the cluster
o Density-based: A cluster is a dense region of points, which is separated by
low-density regions from other regions of high density
Used when clusters are irregular or intertwined, and when noise and
outliers are present
o Conceptual Cluster: Finds clusters that share some common property or
represent a particular concept
Other Distinctions:
o Exclusive versus non-exclusive
o Fuzzy vs non-fuzzy
o Partial versus complete
o Heterogeneous versus homogeneous
[Figure: examples of partitional clustering vs. hierarchical clustering]
K-means Clustering:
o Partitional clustering approach
o Each cluster is associated with a centroid
o Each point is assigned to the cluster with the closest centroid (Euclidean
distance, cosine similarity, etc.)
o Number of clusters, K, must be specified
o Complexity: O(n * K * I * d), where n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
o Choice of initial centroids is extremely important
o Sum of Squared Error (SSE): For every point, the error is the distance to the
nearest cluster
o One easy way to reduce SSE is to increase K, the number of clusters
A good clustering with smaller K can have a lower SSE than a poor
clustering with higher K
o Problem with Selecting Initial Points:
If there are K ‘real’ clusters then the chance of selecting one centroid
from each cluster is small
Chance is relatively small when K is large
If clusters are the same size, n, then the probability of selecting one centroid
from each cluster is K! n^K / (Kn)^K = K!/K^K (e.g. for K = 10, 10!/10^10 ≈ 0.00036)
o Solution:
Multiple runs
Sample and use hierarchical clustering to determine initial centroids
Select more than K initial centroids and then choose the most widely separated
ones among these initial centroids
Postprocessing
Bisecting K-means: Not as susceptible to initialization issues
o Handling Empty Clusters:
Choose the point that contributes most to SSE
Choose a point from the cluster with the highest SSE
If there are several empty clusters, the above can be repeated several
times
o Updating Centres Incrementally after each assignment:
Each assignment updates zero or two centroids
More expensive
Introduces an order dependency
Never get an empty cluster
Can use “weights” to change the impact
o Pre-Processing:
Normalize the data
Eliminate outliers
o Post-processing:
Eliminate small clusters that may represent outliers
Split ‘loose’ clusters, i.e. clusters with relatively high SSE
Merge clusters that are ‘close’ and that have relatively low SSE
Can use these steps during the clustering process: ISODATA (Iterative
Self-Organizing Data Analysis)
Bisecting K-means algorithm: Variant of K-means that can produce a partitional or a
hierarchical clustering
Limitations:
o Problems when clusters are of differing sizes, densities, non-globular shapes
o Data contains outliers
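A compact K-means sketch in NumPy following the steps above (assign each point to the closest centroid, recompute centroids, repeat) together with the SSE measure; the random initialization illustrates why results depend on the initial centroids.
```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iter):
        # Assign each point to the cluster with the closest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and Sum of Squared Error
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    sse = np.sum((X - centroids[labels]) ** 2)
    return labels, centroids, sse

X = [[1, 1], [1.5, 2], [5, 7], [6, 8], [1, 0.5], [6, 7]]
labels, centroids, sse = kmeans(X, k=2)
print(labels, sse)
```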
Hierarchical Clustering
Produces a set of nested clusters organized as a hierarchical tree
Space: O(N²), where N is the number of points; Time: O(N³), since there are N steps and
at each step the N²-sized proximity matrix must be updated and searched
Can be visualized as a dendrogram
o A tree-like diagram that records the sequences of merges or splits
Strengths:
o Do not have to assume any particular number of clusters: Any desired
number of clusters can be obtained by ‘cutting’ the dendrogram at the
proper level
o Correspond to meaningful taxonomies
Types of hierarchical clustering:
o Agglomerative:
Start with points as individual clusters
At each step, merge the closest pair of clusters until only one cluster
(or k clusters) left
o Divisive:
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a single point (or there are k clusters)
Inter-Cluster Similarity
MIN (single link)
- Strength: Can handle non-elliptical shapes
- Limitation: Sensitive to noise and outliers
MAX (complete link)
- Strength: Less susceptible to noise and outliers
- Limitations: Tends to break large clusters; biased towards globular clusters
Group Average
- Strength: Less susceptible to noise and outliers
- Limitation: Biased towards globular clusters
Distance Between Centroids
Cluster Similarity: Ward’s Method
o Based on the increase in squared error when two clusters are merged:
o Less susceptible to noise and outliers
o Biased towards globular clusters
o Hierarchical analogue of K-means
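A short sketch using SciPy's hierarchical clustering to compare the inter-cluster similarity definitions above (single = MIN, complete = MAX, average = group average, plus Ward's method); the toy data points are made up.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0]])

for method in ("single", "complete", "average", "ward"):   # MIN, MAX, group average, Ward
    Z = linkage(X, method=method)                     # sequence of merges (dendrogram data)
    labels = fcluster(Z, t=3, criterion="maxclust")   # 'cut' the dendrogram into 3 clusters
    print(method, labels)
```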
Minimum Spanning Tree (MST): Divisive Hierarchical Clustering
Build MST
o Start with a tree that consists of any point
o In successive steps, look for the closest pair of points (p, q) such that one
point (p) is in the current tree but the other (q) is not
o Add q to the tree and put an edge between p and q
Use MST for constructing hierarchy of clusters
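A small Prim-style sketch of the MST construction described above: start from an arbitrary point, then repeatedly attach the closest point not yet in the tree. Distances are Euclidean and the edge-list representation is an illustrative choice; for divisive clustering the hierarchy is then obtained by repeatedly breaking the longest remaining MST edge.
```python
import numpy as np

def build_mst(X):
    """Return MST edges (p, q, distance) over points X, grown Prim-style."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    in_tree = {0}                           # start the tree with an arbitrary point
    edges = []
    while len(in_tree) < n:
        best = None
        for p in in_tree:                   # closest pair (p in the tree, q outside)
            for q in range(n):
                if q not in in_tree:
                    d = np.linalg.norm(X[p] - X[q])
                    if best is None or d < best[2]:
                        best = (p, q, d)
        edges.append(best)                  # add q to the tree with edge (p, q)
        in_tree.add(best[1])
    return edges

print(build_mst([[0, 0], [0, 1], [5, 5], [5, 6]]))
```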
Lecture 7 – Cluster Analysis: Alternative Methods
DBSCAN: density-based algorithm
Density = number of points within a specified radius (Eps)
A point is a core point if it has more than a specified number of points (MinPts)
within Eps (including itself)
o These are points in the interior of a cluster
A border point is not a core point but is in the neighbourhood of a core point
A noise point is any point that is not a core point or a border point
DBSCAN Algorithm
o Eliminate noise points
o Perform clustering on the remaining points:
Two core points within Eps of each other are put into the same cluster
Border points are assigned to the cluster of an associated core point
To determine Eps and MinPts:
o For points in a cluster, their kth nearest neighbours are at roughly the same
distance
o Noise points have their kth nearest neighbour at a farther distance
o So, plot the sorted distance of every point to its kth nearest neighbour
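A sketch of the k-distance heuristic above for choosing Eps: sort every point's distance to its kth nearest neighbour and look for the "knee" where the curve rises sharply. The choice k = 4 is a common illustrative default, not a rule from the lecture.
```python
import numpy as np

def k_distance(X, k=4):
    """Sorted distance of every point to its kth nearest neighbour."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    d.sort(axis=1)
    return np.sort(d[:, k])   # column k skips the zero self-distance at column 0

X = np.random.default_rng(1).normal(size=(50, 2))
kd = k_distance(X, k=4)
print(kd[:5], kd[-5:])   # noise points show up as the sharply larger values at the end
```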
Clustering Using REpresentatives (CURE)
Representative points are found by selecting a constant number of points from a
cluster and then “shrinking” them toward the centre of the cluster
Cluster similarity is the similarity of the closest pair of representative points from
different clusters
Shrinking representative points toward the centre helps avoid problems with noise
and outliers
CURE is better able to handle clusters of arbitrary shapes and sizes
Graph-Based Clustering
Graph-Based clustering uses the proximity graph
o Start with the proximity matrix
o Consider each point as a node in a graph
o Each edge between two nodes has a weight which is the proximity between
the two points
A graph, G = (V, E) representing a set of objects (V) and their relations
(E)
Initially the proximity graph is fully connected
MIN (single-link) and MAX (complete-link) can be viewed as starting
with this graph
o Clusters are connected components in the graph
Clustering may work better after sparsifying the proximity graph
o Sparsification techniques keep the connections to the most similar (nearest)
neighbour of a point while breaking the connections to less similar points
o The nearest neighbours of a point tend to belong to the same class as the
point itself
o This reduces the impact of noise and outliers and sharpens the distinction
between clusters
Sparsification facilitates the use of graph partitioning algorithms (Chameleon &
Hypergraph-based Clustering)
o Chameleon: Clustering using Dynamic Modelling
Adapt to the characteristics of the data set to find the natural clusters
Use a dynamic model to measure the similarity between clusters
Relative closeness & relative inter-connectivity of the clusters
Two clusters are combined if the resulting cluster shares
certain properties with the constituent clusters
Merging scheme preserves self-similarity
Used in spatial data:
Clusters are defined as densely populated regions of
space
Clusters have arbitrary shapes, orientation, and non-
uniform sizes
Difference in densities across clusters and variation in
density within clusters
Existence of special artefacts and noise
Steps:
Pre-processing Step: Represent the Data by a Graph
Given a set of points, construct the k-nearest-
neighbour graph to capture the relationship between a
point and its k nearest neighbours
Concept of neighbourhood is captured dynamically
Phase 1: Use a multilevel graph partitioning algorithm on the
graph to find a large number of clusters of well-connected
vertices
Each cluster should contain mostly points from one
“true” cluster
Phase 2: Use Hierarchical Agglomerative Clustering to merge
sub-clusters: Two clusters are combined if the resulting cluster
shares certain properties with the constituent clusters
Relative Interconnectivity: Absolute interconnectivity
of two clusters normalized by the internal connectivity
of the clusters
Relative Closeness: Absolute closeness of two clusters
normalized by the internal closeness of the clusters
Shared Nearest Neighbour (SNN) graph: the weight of an edge is the number of shared
neighbours between the vertices, given that the vertices are connected
Jarvis Patrick Clustering:
o K-nearest neighbours of all points are found (breaking all but the k strongest
links from a point to other points in the proximity graph)
o Pair of points is put in the same cluster if:
Two points share more than T neighbours
Two points are in each other’s k nearest neighbour list
SNN Density-based Clustering Algorithm:
o Compute the similarity matrix
o Sparsify the similarity matrix by keeping only the k most similar neighbours
o Construct the shared nearest neighbour graph from the sparsified similarity
matrix
o Find the core points
o Form clusters from the core points
o Discard all the noise points
o Assign all non-noise, non-core points to clusters
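A small sketch of building the SNN graph used by Jarvis-Patrick and SNN density-based clustering: find each point's k nearest neighbours, then weight an edge by the number of shared neighbours, keeping it only when the two points are in each other's k-nearest-neighbour lists. The value of k and the data are illustrative; Jarvis-Patrick would then place two points in the same cluster whenever their edge weight exceeds a threshold T.
```python
import numpy as np

def snn_graph(X, k=3):
    """Shared-nearest-neighbour weights for pairs that are in each other's k-NN lists."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # k nearest neighbours of each point (excluding the point itself)
    knn = [set(np.argsort(d[i])[1:k + 1]) for i in range(len(X))]
    weights = {}
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if j in knn[i] and i in knn[j]:              # mutual k-NN requirement
                weights[(i, j)] = len(knn[i] & knn[j])   # number of shared neighbours
    return weights

X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10]]
print(snn_graph(X, k=3))
```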
Cluster Validation:
Determining the clustering tendency of a set of data, i.e. distinguishing whether non-
random structure actually exists in the data
Compare the results of a cluster analysis to externally known results
Evaluate how well the results of a cluster analysis fit the data without reference to
external information
Comparing the results of two different sets of cluster analyses to determine which is
better
Determining the ‘correct’ number of clusters
Measures of Cluster Validity:
o External Index: extent that cluster labels match externally supplied class
labels (Entropy)
o Internal Index: goodness of a clustering structure without respect to external
information (Sum of Squared Error)
Cluster Cohesion: Measures how closely related are objects in a
cluster (e.g. SSE) – sum of weight of all links within a cluster
Cluster Separation: Measures how distinct or well-separated a cluster
is from other clusters (e.g. Squared Error) – sum of weights between
nodes in the cluster and nodes outside the cluster
Silhouette Coefficient: combine ideas of both cohesion and separation
o Relative Index: Used to compare two different clusterings or clusters
o Correlation:
Proximity Matrix
Incidence Matrix: entry = 1 if associated pair of points belong to the
same cluster; else entry = 0
Compute the correlation between the two matrices
High correlation indicates that points that belong to the same cluster
are close to each other
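A small sketch of the correlation measure above: build the incidence matrix from cluster labels, then correlate its entries with the proximity matrix; the data and labels are toy values.
```python
import numpy as np

def clustering_correlation(X, labels):
    """Correlation between the proximity (distance) matrix and the incidence matrix."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    prox = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)     # proximity matrix (distances)
    incidence = (labels[:, None] == labels[None, :]).astype(float)   # 1 if same cluster, else 0
    iu = np.triu_indices(len(X), k=1)                                # use each pair of points once
    return np.corrcoef(prox[iu], incidence[iu])[0, 1]

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
print(clustering_correlation(X, [0, 0, 1, 1]))
```
Because proximity is measured here as distance, a good clustering gives a strongly negative correlation (same-cluster pairs have small distances); with a similarity matrix the correlation would instead be large and positive.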
Lecture 8 – Association Rule Mining
Association Rule Mining: Given a set of transactions, find rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction
Frequent Itemset: An itemset whose support is greater than or equal to a minsup threshold
Itemset: A collection of one or more items (e.g. {Milk, Bread, Diaper})
K-itemset: An itemset that contains k items
Support count: Frequency of occurrence of an itemset
Support: Fraction of transactions that contain an itemset
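A small sketch of the support definitions above: count the transactions containing an itemset (support count) and divide by the total number of transactions (support); the toy transactions echo the Milk/Bread/Diaper example.
```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    count = sum(itemset <= set(t) for t in transactions)   # support count
    return count / len(transactions)                       # support

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
print(support(transactions, {"Milk", "Bread", "Diaper"}))   # 2/5 = 0.4
```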