
Module 5: Syllabus

Association Rules Mining: Concepts, Apriori and FP-Growth Algorithm.


Cluster Analysis: Introduction, Concepts, Types of data in cluster analysis, Categorization of clustering
methods.
Partitioning method: K-Means and K-Medoid Clustering

Association Rules Mining: Concepts


Association rule mining finds interesting association or correlation relationships among a
large set of data items. With massive amounts of data continuously being collected and stored in
databases, many industries are becoming interested in mining association rules from their databases. For
example, the discovery of interesting association relationships among huge amounts of business
transaction records can help catalog design, cross-marketing, loss-leader analysis, and other business
decision-making processes.

A typical example of association rule mining is market basket analysis. This process analyzes
customer buying habits by finding associations between the different items that customers place in their
“shopping baskets" (Figure 6.1). The discovery of such associations can help retailers develop marketing
strategies by gaining insight into which items are frequently purchased together by customers. For
instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on
the same trip to the supermarket? Such information can lead to increased sales by helping retailers to do
selective marketing and plan their shelf space. For instance, placing milk and bread within close proximity
may further encourage the sale of these items together within single visits to the store.

Basic concepts
1. Transaction database
 Let I = {I1, I2, ..., Im} be the set of all items.
 Let D, the task-relevant data, be a set of database transactions, where each transaction ti is a
nonempty itemset such that ti ⊆ I.
 Transaction database, T = {t1, t2, ..., tn}
 A transaction database T contains "n" transactions.
 Each transaction is associated with an identifier, called a TID.

Fig: Sample Transaction database
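As a concrete illustration, a transaction database of this form can be written directly in code. The sketch below (in Python; the TIDs and items reproduce the nine-transaction AllElectronics example used in the worked problems later in this module, reconstructed from the support counts given there) represents D as a mapping from TIDs to itemsets:

# A transaction database T = {t1, ..., tn}: each TID maps to a nonempty
# itemset drawn from I. These nine transactions reproduce the
# AllElectronics example used in the worked problems below.
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}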

2. Association rule
An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X∩Y = φ.

 Meaning that whenever X appears in a transaction, Y also tends to appear.


 X and Y may each be a single item or a set of items.
 Example: {Milk} ⇒ {Bread}, {Computer, CD Drive} ⇒ {CD}
 Usually we prefer association rules that satisfy the required support and confidence.

3. Support, Confidence and Lift


 Support and confidence are used to measure the interestingness of a rule.
 Support(X) is the number of transactions in which X appears divided by the total number of
transactions.

Support(X) = (No. of times X appears) / (total number of transactions) = P(X)

Support(X,Y) = (No. of times X and Y appear together) / (total number of transactions) = P(X∩Y)

 Confidence of the association rule X ⇒ Y is defined as the ratio of support of X and Y together to the
support of X

Confidence ( X ⇒ Y)= ( support of X and Y together)/( support of X)= P(X∩Y)/ P(X)= P(Y|X)

 Lift is used to measure the power of association between items that are purchased together.

Lift(X ⇒ Y) = P(X∩Y) / (P(X) · P(Y)) = P(Y|X) / P(Y)

4. Frequent Items
Items that frequently appear in transactions are called frequent items. Usually we select
frequent items based on a minimum support count.
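The three measures translate directly into code. A minimal sketch (the function names are illustrative; it reuses the transaction dictionary D sketched earlier):

def support(db, X):
    # P(X): fraction of transactions containing every item of X.
    return sum(1 for t in db.values() if X <= t) / len(db)

def confidence(db, X, Y):
    # P(Y|X) = support(X and Y together) / support(X).
    return support(db, X | Y) / support(db, X)

def lift(db, X, Y):
    # confidence(X => Y) / P(Y); lift > 1 suggests positive association.
    return confidence(db, X, Y) / support(db, Y)

# Example: confidence(D, {"I1", "I5"}, {"I2"}) returns 1.0, the
# 100%-confidence rule found in the worked example below.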

Association Rules Mining: Algorithms


In general, association rule mining can be viewed as a two-step process:

Step 1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count.

Step 2. Generate strong association rules from the frequent itemsets: By definition, these rules
must satisfy minimum support and minimum confidence.

1. Apriori Algorithm
The Apriori algorithm consists of two parts:

Part 1: Find all frequent itemsets: itemsets that meet the minimum support.

Part 2: Generate strong association rules from the frequent itemsets (association rules that meet
the minimum confidence).

PART 1: Finding frequent itemsets


To improve the efficiency of the level-wise generation of frequent itemsets, an important property
called the Apriori property is used to reduce the search space.

The Apriori property. All nonempty subsets of a frequent itemset must also be frequent.
This property belongs to a special category of properties called anti-monotone in the sense that if a
set cannot pass a test, all of its supersets will fail the same test as well.

Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to
explore (k+1)-itemsets.

First, the set of frequent 1-itemsets is found. This set is denoted L1. L1 is used to find L2, the
frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
The finding of each Lk requires one full scan of the database.

A two-step process is followed to find frequent itemsets, consisting of join and prune actions.
1. The join step: To find Lk, a set of candidate k-itemsets, denoted Ck, is generated by joining Lk−1
with itself.
2. The prune step: Ck is a superset of Lk , that is, its members may or may not be frequent, but all of the
frequent k-itemsets are included in Ck . A database scan to determine the count of each candidate in
Ck would result in the determination of Lk (i.e., all candidates having a count no less than the
minimum support count are frequent by definition, and therefore belong to Lk). Ck , however, can be
huge, and so this could involve heavy computation. To reduce the size of Ck , the Apriori property is
used as follows. Any (k − 1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Hence, if any (k − 1)-subset of a candidate k-itemset is not in Lk−1, then the candidate cannot be
frequent either and so can be removed from Ck .
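Putting the join and prune steps together, the level-wise search can be sketched as follows (a minimal, unoptimized illustration, assuming transactions are given as Python sets and min_sup is an absolute support count; the names are illustrative, not the textbook's pseudocode):

from itertools import combinations

def apriori(transactions, min_sup):
    # L1: frequent 1-itemsets, found with one scan of the database.
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= min_sup}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: Lk-1 joined with itself gives candidate k-itemsets Ck.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate with an
        # infrequent (k-1)-subset.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full database scan to count the surviving candidates.
        Lk = {c for c in Ck
              if sum(c <= t for t in transactions) >= min_sup}
        frequent |= Lk
        k += 1
    return frequent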

PART 2: Generating Association Rules from Frequent Itemsets


Once the frequent itemsets from the transactions in a database D have been found, it is straightforward to
generate strong association rules from them (where strong association rules satisfy both minimum
support and minimum confidence). This can be done using the equation for confidence.

Confidence ( X ⇒ Y)= ( support of X and Y together)/( support of X)= P(X∩Y)/ P(X)= P(Y|X)
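In code, this amounts to enumerating every nonempty proper subset s of each frequent itemset l and keeping the rules s ⇒ (l − s) whose confidence meets the threshold. A minimal sketch (support_count is assumed to map itemsets to the counts gathered during Part 1; the names are illustrative):

from itertools import combinations

def rules_from_itemset(l, support_count, min_conf):
    # l is a frequent itemset (frozenset); support_count maps each
    # frequent itemset to its count. Every nonempty proper subset s of
    # l is frequent (Apriori property), so its count is available.
    rules = []
    for r in range(1, len(l)):
        for s in combinations(l, r):
            s = frozenset(s)
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules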

Example Problem 1:

Based on the AllElectronics transaction database, D, of Table 6.1, find the association rules present in
the database, considering a minimum support of 20% and a minimum confidence of 75%.

Solution:

There are nine transactions in this database, that is, n = 9.

Set of Items, I={I1,I2,I3,I4,I5}

Part 1: Frequent Item Set generation


Step 1: Each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the
transactions to count the number of occurrences of each item.

Step 2: The set of frequent 1-itemsets, L1, can then be determined. The minimum support count required is
2, that is, min_sup = 2 (here, the support is 2/9 ≈ 22%). All items in C1 qualify, so L1 = C1.
Step 3: To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a
candidate set of 2-itemsets, C2.

Step 4: Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is
calculated.

Step 5: The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in
C2 having minimum support.

Step 6: Generation of the set of candidate 3-itemsets, C3. From the join step, we first get C3 = L2 ⋈
L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}

C3
Items
{I1, I2, I3}
{I1, I2, I5}
{I1, I3, I5}
{I2, I3, I4}
{I2, I3, I5}
{I2, I4, I5}

Step 7: The transactions in D are scanned to determine the support count of each candidate in C3.


C3
Items            Sup. count
{I1, I2, I3}     2
{I1, I2, I5}     2
{I1, I3, I5}     1
{I2, I3, I4}     0
{I2, I3, I5}     1
{I2, I4, I5}     0
Step 8: Find the frequent 3-itemsets, L3 (the candidate 3-itemsets in C3 having minimum support):
L3 = {{I1, I2, I3}, {I1, I2, I5}}.

Step 9: Generation of the set of candidate 4-itemsets, C4. From the join step, we first get C4 = L3 ⋈ L3.

C4
Items              Sup. count
{I1, I2, I3, I5}   1

Step 10: Find the frequent 4-itemsets, L4 (the candidate 4-itemsets in C4 having minimum support).

L4 is empty (a NULL list), so frequent itemset generation stops.
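For a database this small, the whole level-wise result can be checked by brute force. The following self-contained sketch enumerates every k-itemset directly rather than joining and pruning (suitable only for tiny examples; the transactions are reconstructed from the support counts above):

from itertools import combinations

# The nine transactions, reconstructed from the support counts above.
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
items = sorted({i for t in D for i in t})
min_sup = 2  # 20% of 9 transactions, rounded up

for k in range(1, len(items) + 1):
    Lk = {}
    for c in combinations(items, k):
        count = sum(set(c) <= t for t in D)
        if count >= min_sup:
            Lk[c] = count
    if not Lk:
        break  # L4 is empty, so the search stops after L3
    print(f"L{k}:", Lk)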

PART 2: Generating Association Rules from Frequent Item sets

 Minimum confidence required: 75%


 Confidence ( X ⇒ Y)= ( support of X and Y together)/( support of X)= P(X∩Y)/ P(X)= P(Y|X)

Consider L3={{I1, I2, I3},{I1, I2, I5}}

Association rules formed from the frequent itemset {I1, I2, I3}:

I1^I2⇒ I3, confidence = 2/4 = 50%


I1^I3⇒ I2, confidence = 2/4 = 50%
I2^I3⇒ I1, confidence = 2/4 = 50%
I1⇒ I2^I3, confidence = 2/6 = 33%
I2⇒ I1^I3, confidence = 2/7 = 29%
I3⇒ I1^I2, confidence = 2/6 = 33%
Association rules formed from the frequent itemset {I1, I2, I5}:

I1^I5⇒ I2, confidence = 2/2 = 100%
I2^I5⇒ I1, confidence = 2/2 = 100%
I5⇒ I1^I2, confidence = 2/2 = 100%
I1^I2⇒ I5, confidence = 2/4 = 50%
I1⇒ I2^I5, confidence = 2/6 = 33%
I2⇒ I1^I5, confidence = 2/7 = 29%

The association rules that satisfy the minimum confidence of 75% are I1^I5⇒ I2, I2^I5⇒ I1, and
I5⇒ I1^I2; these are the strong rules output for this database.
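These confidence checks can be reproduced with a few lines of code. A sketch using the support counts found in Part 1 (hard-coded here for illustration):

from itertools import combinations

# Support counts found in Part 1 (hard-coded for illustration).
sup = {frozenset(s): c for s, c in [
    ({"I1"}, 6), ({"I2"}, 7), ({"I5"}, 2), ({"I1", "I2"}, 4),
    ({"I1", "I5"}, 2), ({"I2", "I5"}, 2), ({"I1", "I2", "I5"}, 2)]}
l = frozenset({"I1", "I2", "I5"})

for r in range(1, len(l)):
    for s in combinations(l, r):
        s = frozenset(s)
        conf = sup[l] / sup[s]
        if conf >= 0.75:  # minimum confidence of 75%
            print(set(s), "=>", set(l - s), f"confidence = {conf:.0%}")
# Prints the three strong rules: {I1,I5}=>{I2}, {I2,I5}=>{I1}, {I5}=>{I1,I2}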

Major drawbacks of Apriori Algorithm

1. The number of candidate itemsets grows quickly, resulting in a large candidate set. For example, if
there are 10^4 frequent 1-itemsets (L1), the Apriori algorithm will need to generate more than 10^7
candidate 2-itemsets (C2).
2. The Apriori algorithm requires many scans of the database.

FP-growth Association Rule Mining Algorithm


“Can we design a method that mines the complete set of frequent itemsets without such a costly
candidate generation process?”

An interesting method in this attempt is called frequent pattern growth, or simply FP-growth, which
adopts a divide-and-conquer strategy as follows.

First, it compresses the database representing frequent items into a frequent pattern tree, or FP-tree,
which retains the itemset association information. It then divides the compressed database into a set of
conditional databases (a special kind of projected database), each associated with one frequent item or
“pattern fragment,” and mines each database separately. For each “pattern fragment,” only its associated
data sets need to be examined. Therefore, this approach may substantially reduce the size of the data sets
to be searched, along with the “growth” of patterns being examined.

Note: The FP-growth algorithm finds the set of frequent itemsets without a candidate generation process.
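The compression step can be sketched in code. The following is a minimal illustration of FP-tree construction only (mining the conditional pattern bases is omitted); the class and variable names are illustrative, not the textbook's pseudocode. Note that it makes exactly two passes over the dataset, one to count item frequencies and one to insert the reordered transactions:

from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: count item frequencies; keep only frequent items,
    # ordered by descending support.
    counts = defaultdict(int)
    for t in transactions:
        for i in t:
            counts[i] += 1
    order = {i: c for i, c in sorted(counts.items(), key=lambda x: -x[1])
             if c >= min_sup}
    # Pass 2: insert each transaction, reordered by descending support,
    # so that shared prefixes compress into shared tree paths.
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes, used for mining
    for t in transactions:
        node = root
        for i in sorted((j for j in t if j in order), key=lambda j: -order[j]):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header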

Advantages of the FP-growth algorithm:

1. Faster than the Apriori algorithm

2. No candidate generation

3. Only two passes over the dataset

Disadvantages of the FP-growth algorithm:

1. The FP-tree may not fit in memory

2. The FP-tree is expensive to build



Algorithm: FP-growth. Mine frequent itemsets using an FP-tree by pattern fragment growth.

Input: D, a transaction database; min_sup, the minimum support count threshold.

Output: The complete set of frequent patterns.


Example Problem:

Based on the AllElectronics transaction database, D, of Table 6.1, find the frequent patterns present in
the database using the FP-growth algorithm, considering a minimum support of 20% and a confidence of
75%.
Solution:

Fig: FP-tree corresponding to the given dataset

Fig: Frequent patterns generated from the given dataset
What is Cluster Analysis?

 Cluster: A collection of data objects


o similar (or related) to one another within the same group
o dissimilar (or unrelated) to the objects in other groups
 Cluster analysis (or clustering, data segmentation, …)
o Finding similarities between data according to the characteristics found in the data and
grouping similar data objects into clusters
 Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by
examples: supervised)

 Typical applications of Cluster analysis


o As a stand-alone tool to get insight into data distribution
o As a preprocessing step for other algorithms

Clustering for Data Understanding and Applications

 Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
 Information retrieval: document clustering
 Land use: Identification of areas of similar land use in an earth observation database
 Marketing: Help marketers discover distinct groups in their customer bases, and then use this
knowledge to develop targeted marketing programs
 City-planning: Identifying groups of houses according to their house type, value, and
geographical location
 Earthquake studies: Observed earthquake epicenters should be clustered along continent
faults
 Climate: understanding the Earth's climate; finding patterns in atmospheric and ocean data
 Economic science: market research

Clustering as a Preprocessing Tool (Utility)

 Summarization:
o Preprocessing for regression, PCA, classification, and association analysis
 Compression:
o Image processing: vector quantization
 Finding K-nearest Neighbors
o Localizing search to one or a small number of clusters
 Outlier detection
o Outliers are often viewed as those “far away” from any cluster

Quality: What Is Good Clustering?

 A good clustering method will produce high quality clusters


o high intra-class similarity: cohesive within clusters
o low inter-class similarity: distinctive between clusters

 The quality of a clustering method depends on


o the similarity measure used by the method
o its implementation, and
o its ability to discover some or all of the hidden patterns

Major Clustering Approaches


In general, the major clustering methods can be classified into the following categories.
 Partitioning Methods
 Hierarchical Methods
 Density-Based Methods
 Grid-Based Methods
 Model-Based Methods
 Frequent Pattern-Based Methods
 User-Guided or Constraint-Based Methods

 Partitioning approach:
 Construct various partitions and then evaluate them by some criterion, e.g., minimizing the
sum of square errors
 Given a database of n objects, a partitioning method constructs k partitions of the data objects,
where each partition represents a cluster.
 That is, it clusters the data into k groups, which together satisfy the following requirements:
(1) each group must contain at least one object, and
(2) each object must belong to exactly one group
 Typical methods: k-means, k-medoids, CLARANS
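A minimal k-means sketch follows (illustrative, not an optimized implementation): points are assigned to their nearest center, centers move to the mean of their cluster, and the two steps repeat until the centers stop changing.

import random

def kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)  # initial centers: random points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: sum((a - b) ** 2
                    for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl
               else centers[j] for j, cl in enumerate(clusters)]
        if new == centers:
            break  # converged: no center moved
        centers = new
    return centers, clusters

# Example: kmeans([(1, 1), (1.5, 2), (8, 8), (9, 8.5)], k=2)
# recovers the two obvious groups.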

 Hierarchical approach:

o A hierarchical method creates a hierarchical decomposition of the given set of data objects.

o A hierarchical method can be classified as being either agglomerative or divisive

o The agglomerative approach, also called the bottom-up approach, starts with each object
forming a separate group. It successively merges the objects or groups that are close to one
another, until all of the groups are merged into one (the topmost level of the hierarchy), or
until a termination condition holds.

o The divisive approach, also called the top-down approach, starts with all of the objects in the
same cluster. In each successive iteration, a cluster is split up into smaller clusters, until
eventually each object is in one cluster, or until a termination condition holds.

o Typical methods: Diana, Agnes, BIRCH, CAMELEON
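As a usage illustration, agglomerative clustering is available in SciPy. The snippet below (a sketch assuming SciPy is installed; the sample points are made up) builds the full merge hierarchy with Ward linkage and then cuts it into at most three clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 2], [8, 8], [9, 8.5], [25, 30]])
Z = linkage(X, method="ward")  # bottom-up merge hierarchy (agglomerative)
labels = fcluster(Z, t=3, criterion="maxclust")  # cut into at most 3 clusters
print(labels)  # e.g. [1 1 2 2 3]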

 Density-based approach:

o Most partitioning methods cluster objects based on the distance between objects. Such
methods can find only spherical-shaped clusters and encounter difficulty at discovering
clusters of arbitrary shapes.

o Density-based clustering methods have been developed based on the notion of density.

o Based on connectivity and density functions

o Typical methods: DBSCAN, OPTICS, DenClue
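As a usage illustration, DBSCAN is available in scikit-learn. The snippet below (a sketch assuming scikit-learn is installed; the sample points are made up) finds two dense groups and labels the isolated point as noise:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 1], [1.2, 1.1], [1.1, 0.9],   # dense group 1
              [8, 8], [8.1, 8.2],               # dense group 2
              [50, 50]])                        # isolated point
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)  # e.g. [ 0  0  0  1  1 -1]; -1 marks noise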

 Grid-based approach:
o Grid-based methods quantize the object space into a finite number of cells that form a
grid structure.

o All of the clustering operations are performed on the grid structure (i.e., on the quantized
space).

o The main advantage of this approach is its fast processing time, which is typically
independent of the number of data objects and dependent only on the number of cells in
each dimension in the quantized space.

o Typical methods: STING, WaveCluster, CLIQUE

 Model-based:

o Model-based methods hypothesize a model for each of the clusters and find the best
fit of the data to the given model.

o Typical methods: Expectation-Maximization (EM), SOM, COBWEB

 Frequent pattern-based:

o Based on the analysis of frequent patterns

o Typical methods: p-Cluster

 User-guided or constraint-based:

o Clustering by considering user-specified or application-specific constraints

o Typical methods: COD (obstacles), constrained clustering
