Data Mining – 5

Requirements for Cluster Analysis

Cluster analysis is a crucial task in data mining that helps in grouping a set of objects in such a
way that objects in the same group (called a cluster) are more similar to each other than to those
in other groups (clusters). Below are the key requirements for effective clustering:

1. Scalability:
o Issue: Many clustering algorithms perform well on small datasets but struggle
with large datasets containing millions or billions of objects.
o Requirement: Algorithms need to be highly scalable to handle large databases
effectively.
2. Ability to Deal with Different Types of Attributes:
o Issue: Traditional algorithms are often designed for numeric data.
o Requirement: Algorithms should be capable of clustering different data types
such as binary, nominal (categorical), ordinal data, and complex data types like
graphs, sequences, images, and documents.
3. Discovery of Clusters with Arbitrary Shape:
o Issue: Many algorithms assume clusters are spherical and of similar size and
density.
o Requirement: Algorithms should detect clusters of arbitrary shapes, useful in
applications like environmental surveillance where phenomena may have
irregular boundaries.
4. Requirements for Domain Knowledge to Determine Input Parameters:
o Issue: Many algorithms need predefined parameters like the number of clusters,
which can be hard to determine.
o Requirement: Algorithms should minimize the need for user-specified input
parameters to reduce user burden and improve clustering quality.
5. Ability to Deal with Noisy Data:
o Issue: Real-world data often contains noise and errors, which can degrade
clustering quality.
o Requirement: Robust clustering methods that can handle noisy data and outliers
are needed.
6. Incremental Clustering and Insensitivity to Input Order:
o Issue: Some algorithms can't update clusters incrementally and are sensitive to the
order of data input.
o Requirement: Algorithms should support incremental updates and be insensitive
to the order in which data is presented.
7. Capability of Clustering High-Dimensional Data:
o Issue: High-dimensional data, such as documents with thousands of keywords, is
challenging to cluster.
o Requirement: Algorithms should effectively handle high-dimensional data,
considering the sparsity and skewness of such data.
8. Constraint-Based Clustering:
o Issue: Real-world applications may have specific constraints, such as
geographical barriers or customer types.
o Requirement: Algorithms should perform clustering under various constraints to
meet real-world needs.
9. Interpretability and Usability:
o Issue: Users need clustering results that are understandable and practical for their
specific applications.
o Requirement: Clustering results should be interpretable and usable, with clear
semantic meaning and relevance to the application goals.

Overview of Basic Clustering Methods

Clustering methods are essential tools in data mining, used to group a set of objects into clusters,
such that objects in the same cluster are more similar to each other than to those in other clusters.
The main categories of clustering methods are:

1. Partitioning Methods:
o Concept: Partitioning methods divide a dataset into k groups (partitions), where
each group represents a cluster, and each cluster must contain at least one object.
o Process: These methods typically use an iterative relocation technique to improve
partitioning by moving objects between groups.
o Criteria: The quality of partitioning is judged based on how close objects in the
same cluster are to each other and how far apart objects in different clusters are.
o Techniques: Common techniques include k-means and k-medoids, which are
heuristic methods aimed at finding local optima for clustering.
o Extensions: These methods can be extended for subspace clustering to handle
sparse data in high-dimensional spaces.
o Limitations: Achieving global optimality is often computationally prohibitive,
and these methods generally find spherical-shaped clusters.
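The closeness criterion mentioned above is commonly formalized as the within-cluster sum of squared errors (SSE): the total squared distance of every object to its cluster's mean. A minimal sketch in Python, using small made-up 2-D clusters for illustration:

```python
# Within-cluster sum of squared errors (SSE): for each cluster,
# sum the squared distance of every point to that cluster's mean.
# Lower SSE means tighter (higher-quality) partitions.
def sse(clusters):
    total = 0.0
    for points in clusters:
        # Component-wise mean of the points in this cluster.
        mean = [sum(c) / len(points) for c in zip(*points)]
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, mean))
    return total

# Two toy 2-D clusters: a tight one and a looser one.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],    # mean (0, 1), contributes 1 + 1 = 2
    [(10.0, 0.0), (14.0, 0.0)],  # mean (12, 0), contributes 4 + 4 = 8
]
print(sse(clusters))  # 10.0
```

k-means can be seen as a heuristic that iteratively tries to reduce exactly this quantity, which is why it tends to settle on a local (not global) optimum.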
2. Hierarchical Methods:
o Concept: Hierarchical methods create a tree-like (hierarchical) decomposition of
the dataset, which can be either agglomerative (bottom-up) or divisive (top-
down).
o Agglomerative Approach: Starts with each object as a separate cluster and
merges the closest clusters iteratively until all objects are in one cluster or a
termination condition is met.
o Divisive Approach: Starts with all objects in one cluster and splits them
iteratively until each object is in its own cluster or a termination condition is met.
o Techniques: These methods can be based on distance, density, or continuity.
o Limitations: Once a merge or split is done, it cannot be undone, leading to
potential errors that cannot be corrected. However, methods to improve the
quality of hierarchical clustering exist.
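The agglomerative (bottom-up) approach described above can be sketched in a few lines of Python. This is a toy single-linkage version on hypothetical 2-D points, not a production implementation (libraries such as SciPy provide efficient versions):

```python
# Minimal agglomerative clustering with single linkage: start with each
# point as its own cluster, then repeatedly merge the two closest
# clusters until the requested number of clusters remains.
def agglomerative(points, num_clusters):
    clusters = [[p] for p in points]

    def dist(a, b):
        # Single linkage: distance between the closest pair of points
        # drawn from the two clusters.
        return min(sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
                   for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # merge: this cannot be undone
    return clusters

pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
print(agglomerative(pts, 2))  # two clusters: the two left points, the two right
```

Note how the `extend`/`pop` merge is irreversible, which is exactly the limitation noted above: an early bad merge propagates to the final result.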
3. Density-Based Methods:
o Concept: These methods form clusters based on the density of data points in a
region, continuing to grow a cluster as long as the density in the neighborhood
exceeds a certain threshold.
o Process: For each data point in a cluster, the neighborhood within a given radius
must contain at least a minimum number of points.
o Techniques: These methods are effective in filtering out noise and discovering
clusters of arbitrary shape.
o Applications: Density-based methods can create multiple exclusive clusters or a
hierarchy of clusters, and they can be extended to subspace clustering.
o Limitations: Typically, these methods consider only exclusive clusters and may
not handle fuzzy clusters well.
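The neighborhood rule above (at least a minimum number of points within a given radius) is the core of DBSCAN-style clustering. A simplified sketch on made-up data, with `eps` as the radius and `min_pts` as the density threshold:

```python
# Simplified density-based clustering (DBSCAN-style sketch): a point
# whose eps-neighborhood holds at least min_pts points is a core point;
# clusters grow outward from core points, and points reachable from no
# core point are labeled noise (-1).
def dbscan(points, eps, min_pts):
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)   # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # not dense enough: tentatively noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:     # former noise can join as a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(js)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point (50, 50) ends up labeled -1 (noise), illustrating why density-based methods are effective at filtering outliers, and each point receives exactly one label, illustrating the exclusive-cluster limitation.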

Partitioning Methods

Partitioning methods are a fundamental approach to clustering in data mining. They organize a
set of objects into k clusters, where each cluster is represented by a single object or a summary of
the objects (such as the mean). The key idea is to partition the data in such a way that objects
within the same cluster are more similar to each other than to those in other clusters.

Key Characteristics

 Exclusive Clusters: Each object belongs to exactly one cluster.
 Fixed Number of Clusters: The number of clusters (k) is predefined.
 Iterative Optimization: These methods typically involve iterative refinement to improve
clustering quality.

Algorithm: k-means

Input:

k: the number of clusters

D: a dataset containing n objects

Output:

A set of k clusters

Method:
1. Arbitrarily choose k objects from D as the initial cluster centers.

2. Repeat

a. (Re)assign each object to the cluster to which the object is the most similar, based on the mean
value of the objects in the cluster.

b. Update the cluster means, that is, calculate the mean value of the objects for each cluster.

3. Until no change in cluster assignments.
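The pseudocode above translates directly into Python. This is a toy sketch on invented 2-D data (real applications would typically use a library implementation such as scikit-learn's `KMeans`):

```python
import random

# Toy k-means following the pseudocode above: choose k initial centers,
# then alternate (a) reassigning each object to the nearest cluster mean
# and (b) recomputing each cluster's mean, until assignments stop changing.
def k_means(D, k, seed=0):
    random.seed(seed)
    means = random.sample(D, k)           # step 1: arbitrary initial centers
    assignment = None
    while True:
        # Step 2a: (re)assign each object to the most similar cluster,
        # i.e. the one whose mean is closest in squared distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2
                                            for x, m in zip(p, means[c])))
            for p in D
        ]
        if new_assignment == assignment:  # step 3: no change -> converged
            return means, assignment
        assignment = new_assignment
        # Step 2b: update each cluster mean from its assigned objects.
        for c in range(k):
            members = [p for p, a in zip(D, assignment) if a == c]
            if members:                   # guard against an empty cluster
                means[c] = tuple(sum(xs) / len(members)
                                 for xs in zip(*members))

D = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
means, labels = k_means(D, k=2)
print(means, labels)
```

Because the initial centers are chosen arbitrarily, different seeds can converge to different local optima, which is the local-optimality limitation of partitioning methods noted earlier.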
