Clustering

Clustering is an unsupervised learning technique used to group similar data points into clusters based on their
features.

1. Mixture Densities

• Definition: A statistical model that represents a distribution as a combination of multiple component distributions, each associated with a different cluster.
• Example: A Gaussian mixture model (GMM), where data points can belong to multiple clusters with different probabilities (see the sketch below).
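
A minimal sketch of fitting a mixture density with scikit-learn's GaussianMixture (the two-component 1-D data, seed, and n_components=2 are illustrative assumptions):

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two Gaussian components (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 0.5, 200),
                    rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture model.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft assignments: each point gets a probability of belonging to each component.
print(gmm.means_.ravel())        # estimated component means
print(gmm.predict_proba(X[:5]))  # per-point membership probabilities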

2. k-Means Clustering

• Definition: A popular clustering algorithm that partitions data into k clusters by minimizing the variance within each cluster.
• Algorithm Steps:
1. Initialization: Choose k initial centroids randomly.
2. Assignment: Assign each data point to the nearest centroid.
3. Update: Recalculate each centroid as the mean of all points in its cluster.
4. Repeat: Continue until the assignments no longer change or a maximum number of iterations is reached.
• Advantages: Simple, fast, and easy to implement.
• Disadvantages: Requires pre-specification of k, is sensitive to initial centroid positions, and can converge to a local minimum (see the sketch below for one common mitigation).
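
A minimal sketch of these steps using scikit-learn's KMeans (the two synthetic blobs and k = 2 are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

# Two illustrative Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])

# n_init restarts the algorithm from several random initializations, which
# mitigates the sensitivity to initial centroid positions noted above.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # final centroids
print(km.labels_[:10])      # cluster assignments for the first 10 points
print(km.inertia_)          # within-cluster sum of squares being minimized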

3. Expectation-Maximization (EM)

• Definition: A statistical technique for finding maximum likelihood estimates in models with latent (hidden) variables.
• Process:
1. Expectation Step (E-step): Given the current parameters, compute the posterior probabilities (responsibilities) of the latent variables, which define the expected complete-data log-likelihood.
2. Maximization Step (M-step): Update the parameters to maximize this expectation.
• Application: Often used with Gaussian mixture models to cluster data where clusters have different shapes and orientations (a minimal sketch follows below).
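
A minimal, hand-rolled EM sketch for a two-component 1-D Gaussian mixture (the crude initialization, fixed iteration count, and synthetic data are illustrative assumptions; a library implementation such as GaussianMixture above is preferable in practice):

import numpy as np

def em_gmm_1d(x, n_iter=50):
    # Crude initialization of mixing weights, means, and variances.
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities, i.e. posterior probability that each
        # point was generated by each component under current parameters.
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted data.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1.5, 200)])
print(em_gmm_1d(x))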

4. Mixtures of Latent Variable Models

• Definition: Models that assume the data is generated from a mixture of several latent (unobserved) variables.
• Use Case: Useful for modeling complex data distributions; such mixtures can represent hierarchies or interactions between observed variables.

5. Supervised Learning after Clustering

• Definition: Applying supervised learning techniques to clustered data to enhance prediction models.
• Process: After clustering, labels can be assigned to clusters, and these labels can be used as features in supervised learning algorithms, which can improve model performance (see the sketch below).
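
One common pattern, sketched on invented synthetic data (the choice of 3 clusters, the logistic regression model, and the use of the raw cluster label as a feature are all illustrative assumptions; one-hot encoding the label is more usual in practice):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Illustrative features X and target y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Step 1: cluster the features without looking at y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Step 2: append each point's cluster label as an extra feature for the
# supervised model (kept as a raw integer here for brevity).
X_aug = np.hstack([X, km.labels_.reshape(-1, 1)])
clf = LogisticRegression().fit(X_aug, y)
print(clf.score(X_aug, y))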
6. Hierarchical Clustering

• Definition: A clustering method that builds a hierarchy of clusters, either agglomeratively (bottom-up) or divisively (top-down).
• Types:
o Agglomerative: Starts with each data point as its own cluster and repeatedly merges the closest clusters.
o Divisive: Starts with one cluster and recursively splits it into smaller clusters.
• Dendrogram: A tree-like diagram that shows the arrangement of clusters by distance or similarity (see the sketch below).
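
A minimal agglomerative sketch using SciPy (the two synthetic blobs, Ward linkage, and the two-cluster cut are illustrative assumptions):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two illustrative blobs in 2-D.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# Bottom-up clustering: 'ward' merges the pair of clusters whose merge
# gives the smallest increase in within-cluster variance.
Z = linkage(X, method="ward")

# Cut the hierarchy into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the merge tree with matplotlib.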

Decision Trees

Decision trees are a supervised learning technique used for classification and regression tasks. They model
decisions and their possible consequences in a tree-like structure.

1. Univariate Trees

• Definition: Trees that make decisions based on a single feature at each node.
• Construction: Each internal node tests one feature, and the branches represent the possible outcomes of that test.

2. Classification Tree

• Purpose: Used for classifying data into distinct categories.
• Process (see the sketch below):
1. Choose the best feature to split the data based on a criterion (e.g., Gini impurity, information gain).
2. Repeat recursively until stopping criteria are met (e.g., maximum depth, minimum samples per leaf).
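
A minimal sketch with scikit-learn (the iris dataset and the specific stopping parameters are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion='gini' selects splits by Gini impurity; max_depth and
# min_samples_leaf are the stopping criteria mentioned above.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3,
                             min_samples_leaf=5, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy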

3. Regression Trees

• Purpose: Used for predicting continuous outcomes.
• Structure: Similar to classification trees, but splits the data to minimize the variance in the target variable (see the sketch below).
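
A minimal sketch with scikit-learn (the noisy sine data and max_depth=4 are illustrative assumptions):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative 1-D regression problem: a noisy sine curve.
rng = np.random.default_rng(5)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# Splits are chosen to minimize the squared error (variance) of the target.
reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
print(reg.predict([[1.5], [4.0]]))  # piecewise-constant predictions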

4. Pruning

• Definition: The process of removing sections of the tree that provide little predictive power, in order to avoid overfitting.
• Methods:
o Cost Complexity Pruning: Balances the tree's size against its accuracy via a complexity penalty (see the sketch below).
o Reduced Error Pruning: Removes a branch when doing so does not reduce accuracy on validation data.
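
A minimal cost complexity pruning sketch with scikit-learn (the iris data and the single train/test split, standing in for proper cross-validation, are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compute the sequence of effective alphas along the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Larger ccp_alpha prunes more aggressively; keep the alpha whose pruned
# tree scores best on the held-out data.
best = max(path.ccp_alphas,
           key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
                         .fit(X_tr, y_tr).score(X_te, y_te))
print(best)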
5. Rule Extraction from Trees

• Definition: Deriving human-readable rules from decision trees.
• Process: Each path from the root to a leaf can be expressed as a rule that describes the conditions under which a particular class is predicted (see the sketch below).
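
A minimal sketch using scikit-learn's export_text, which prints each root-to-leaf path as nested if-then conditions (the iris data and max_depth=2 are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each printed root-to-leaf path reads as one human-readable rule.
print(export_text(clf, feature_names=list(iris.feature_names)))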

6. Learning Rules from Data

• Definition: Extracting general rules or patterns from data, often using algorithms like Apriori or FP-Growth for association rule mining (a minimal Apriori sketch follows below).
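
A minimal, hand-rolled Apriori sketch for finding frequent itemsets (the toy baskets and min_support threshold are invented for illustration; libraries such as mlxtend provide full association-rule mining):

from itertools import combinations

def apriori(transactions, min_support=0.5):
    # Breadth-first candidate growth: (k+1)-item candidates are built only
    # from itemsets that were frequent at level k (simplified Apriori pruning).
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        # Support = fraction of transactions containing the candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        survivors = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets to form (k+1)-item candidates.
        keys = list(survivors)
        k_sets = list({a | b for a, b in combinations(keys, 2)
                       if len(a | b) == len(a) + 1})
    return frequent

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"}, {"milk", "eggs"}]]
print(apriori(baskets, min_support=0.5))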

7. Multivariate Trees

• Definition: Decision trees whose internal nodes test several features at once, typically via a linear combination of features (an oblique split), rather than the single-feature splits used by standard CART (Classification and Regression Trees).
• Advantages: Can capture more complex relationships between features than univariate trees (see the sketch below).
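
A small sketch of what a single multivariate (oblique) split looks like, using logistic regression as a stand-in split finder for the linear combination; the data, seed, and choice of split finder are illustrative assumptions, not a full multivariate-tree implementation:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data whose class boundary is diagonal, so no single
# axis-aligned (univariate) split can separate it in one test.
rng = np.random.default_rng(6)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# One multivariate node: test a linear combination w.x + b > 0 of both
# features, with w and b learned here by logistic regression.
node = LogisticRegression().fit(X, y)
w, b = node.coef_.ravel(), node.intercept_[0]
split = X @ w + b > 0
print(f"oblique split matches the labels on {np.mean(split == y):.0%} of points")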

Summary

• Clustering techniques like k-means, EM, and hierarchical clustering identify patterns in data without prior labels, which is useful for exploratory analysis and as preprocessing for supervised learning.
• Decision Trees are powerful tools for both classification and regression, using simple, interpretable structures to model complex relationships in data. Techniques such as pruning and rule extraction enhance their usability.
