Unit 7
Association Rule Mining
Association rules are if/then statements that help uncover relationships
between seemingly unrelated data in a relational database or other
information repository. An example of an association rule would be "If a
customer buys a dozen eggs, he is 80% likely to also purchase milk." An
association rule has two parts, an antecedent (if) and a consequent (then).
An antecedent is an item found in the data. A consequent is an item that is
found in combination with the antecedent.
Association rule mining is a method for discovering interesting relations
between variables in large databases. It is intended to identify strong rules
discovered in databases using different measures of interestingness. For
example, the rule found in the sales data of a supermarket would indicate
that if a customer buys onions and potatoes together, they are likely to also
buy hamburger meat. Such information can be used as the basis for
decisions about marketing. In addition to the above example from market basket analysis, association rules are employed today in many application areas, including web usage mining, intrusion detection, and bioinformatics.
The problem of association rule mining is defined as follows. Let I = {I1, I2, ..., In} be a set of binary attributes called items. Let D = {T1, T2, ..., Tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form:
X => Y, where X, Y ⊆ I and X ∩ Y = ∅
Every rule is composed of two different sets of items, also known as itemsets X and Y, where X is called the antecedent or left-hand side (LHS) and Y the consequent or right-hand side (RHS).
Transaction ID   Milk   Bread   Butter   Beer   Diaper
      1           1       1       0        0       0
      2           0       0       1        0       0
      3           0       0       0        1       1
      4           1       1       1        0       0
      5           0       1       0        0       0
To illustrate the concepts, we use a small example from the supermarket domain. The table above shows a small database containing the items, where, in each entry, the value 1 means the presence of the item in the corresponding transaction and the value 0 means the absence of the item in that transaction. An example rule for the supermarket could be {butter, bread} => {milk}, meaning that if butter and bread are bought, customers also buy milk.
Support and Confidence
In order to select interesting rules from the set of all possible rules,
constraints on various measures of significance and interest are used. The
best-known constraints are minimum thresholds on support and confidence.
The support of an association rule A => B is the percentage of transactions in the dataset that contain both A and B. In formula:
Support(A => B) = P(A ∪ B)
For example, in the above dataset, the association rule bread => milk has a support of 2/5, since both items occur together in 40% of all transactions (2 out of 5 transactions).
The confidence of an association rule A => B with respect to a set of transactions D is the percentage of transactions in D containing A that also contain B. In formula:
Confidence(A => B) = P(B | A) = P(A ∪ B) / P(A) = Support(A => B) / P(A)
For example, in the above dataset, the association rule bread => milk has a confidence of 2/3, since 66.67% of all transactions containing bread also contain milk.
Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong. By convention, support and confidence values are written as percentages between 0% and 100% rather than as fractions between 0 and 1.0.
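To make these definitions concrete, the short Python sketch below (our own illustrative code, not part of any library) encodes the five supermarket transactions from the table above and computes the support and confidence of the rule bread => milk.

# Toy supermarket database from the table above: one set of purchased items per transaction.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diaper"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in 'itemset'.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Support(A => B) / Support(A) for the rule A => B.
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))        # 0.4    -> support of bread => milk (2/5)
print(confidence({"bread"}, {"milk"}, transactions))   # 0.666... -> confidence of bread => milk (2/3)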
Why Association Mining
In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design, and store layout. Programmers also use association rules to build programs capable of machine learning.
Apriori Algorithm
The Apriori algorithm is a classic algorithm used in data mining for learning association rules. It is very simple. Learning association rules basically means finding the items that are purchased together more frequently than others. The name of the algorithm comes from the fact that it uses prior knowledge of frequent itemset properties.
Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property, presented below, is used to reduce the search space. The Apriori property states that every subset of a frequent itemset must also be frequent.
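The level-wise search described above can be sketched in a few lines of Python. The following is a minimal illustration written for this unit (the function and variable names are ours, not a standard library API): it scans the database to build L1, then repeatedly joins, prunes using the Apriori property, and counts candidates until no frequent k-itemset remains.

from itertools import combinations

def apriori(transactions, min_support_count):
    # Return a dict mapping each frequent itemset (frozenset) to its support count.
    transactions = [frozenset(t) for t in transactions]

    # L1: count individual items and keep those meeting minimum support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that give a k-itemset.
        prev = list(current)
        candidates = {prev[i] | prev[j]
                      for i in range(len(prev)) for j in range(i + 1, len(prev))
                      if len(prev[i] | prev[j]) == k}
        # Prune step (Apriori property): every (k-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # One full scan of the database to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent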
Example
Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%) and the minimum confidence required is 70%. We first have to find the frequent itemsets using the Apriori algorithm; association rules are then generated using the minimum support and minimum confidence.
Solution
Step 1: Generating 1-itemset Frequent Pattern
The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1; the database is scanned to count the occurrences of each item, and the items that satisfy minimum support form L1.
Step 2: Generating 2-itemset Frequent Pattern
To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 join L1 to generate a candidate set of 2-itemsets, C2. Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
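In code, the join step for C2 simply pairs up the frequent 1-itemsets. A minimal sketch (assuming, as in this example, that all five items I1 to I5 are frequent) is:

from itertools import combinations

L1 = ["I1", "I2", "I3", "I4", "I5"]                 # frequent 1-itemsets from Step 1
C2 = [frozenset(p) for p in combinations(L1, 2)]    # L1 join L1: all 10 candidate pairs
# One scan over D then accumulates a support count for each pair in C2;
# the pairs meeting minimum support form L2.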
Step 3: Generating 3-itemset Frequent Pattern
The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property. To find C3, we compute L2 join L2:
C3 = L2 join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
Now the join step is complete and the prune step is used to reduce the size of C3. The prune step helps to avoid heavy computation due to a large Ck.
Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent. For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3. Now take {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}. But {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property, so we must remove {I2, I3, I5} from C3. After checking all members of the join result in this way, C3 = {{I1, I2, I3}, {I1, I2, I5}}. Now the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
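The prune step for C3 can be verified directly. The snippet below (our own code) checks each 2-item subset of the six join candidates against L2 and keeps exactly {I1, I2, I3} and {I1, I2, I5}.

from itertools import combinations

L2 = [frozenset(s) for s in ({"I1","I2"}, {"I1","I3"}, {"I1","I5"},
                             {"I2","I3"}, {"I2","I4"}, {"I2","I5"})]
C3_joined = [{"I1","I2","I3"}, {"I1","I2","I5"}, {"I1","I3","I5"},
             {"I2","I3","I4"}, {"I2","I3","I5"}, {"I2","I4","I5"}]

# Keep a candidate only if every 2-item subset of it is a frequent 2-itemset.
C3 = [c for c in C3_joined
      if all(frozenset(s) in L2 for s in combinations(c, 2))]
print(C3)   # the two surviving candidates: {I1, I2, I3} and {I1, I2, I5}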
Step 4: Generating 4-itemset Frequent Pattern
The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm. These frequent itemsets will now be used to generate strong association rules (where strong association rules satisfy both minimum support and minimum confidence).
Step 5: Generating Association Rules from Frequent Itemsets
Procedure:
For each frequent itemset l, generate all nonempty proper subsets of l. For every nonempty proper subset s of l, output the rule s => (l − s) if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
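A minimal Python sketch of this procedure is shown below. It is our own helper, and it assumes a dictionary mapping every frequent itemset (as a frozenset) to its support count, as produced by an Apriori pass such as the sketch given earlier.

from itertools import combinations

def generate_rules(support_counts, min_conf):
    # 'support_counts' maps each frequent itemset (frozenset) to its support count;
    # by the Apriori property, every subset of a frequent itemset is also present.
    for l, l_count in support_counts.items():
        if len(l) < 2:
            continue
        for size in range(1, len(l)):                 # all nonempty proper subsets s of l
            for s in combinations(l, size):
                s = frozenset(s)
                conf = l_count / support_counts[s]
                if conf >= min_conf:
                    yield s, l - s, conf              # the rule s => (l - s)

Calling generate_rules with the support counts produced by the Apriori sketch above and min_conf = 0.7 reproduces the strong rules derived in the example below.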
Back To Example
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3},
{I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
Let us take l = {I1, I2, I5}. Its nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.
Let the minimum confidence threshold be, say, 70%. The resulting association rules are shown below, each listed with its confidence.
R1: I1 ^ I2 => I5
    Confidence = support_count{I1,I2,I5} / support_count{I1,I2} = 2/4 = 50%. R1 is rejected.
R2: I1 ^ I5 => I2
    Confidence = support_count{I1,I2,I5} / support_count{I1,I5} = 2/2 = 100%. R2 is selected.
R3: I2 ^ I5 => I1
    Confidence = support_count{I1,I2,I5} / support_count{I2,I5} = 2/2 = 100%. R3 is selected.
R4: I1 => I2 ^ I5
    Confidence = support_count{I1,I2,I5} / support_count{I1} = 2/6 = 33%. R4 is rejected.
R5: I2 => I1 ^ I5
    Confidence = support_count{I1,I2,I5} / support_count{I2} = 2/7 = 29%. R5 is rejected.
R6: I5 => I1 ^ I2
    Confidence = support_count{I1,I2,I5} / support_count{I5} = 2/2 = 100%. R6 is selected.
In this way, we have found three strong association rules.
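The six rules above can be reproduced in code. Since the 9-transaction table itself is not reproduced in this text, the sketch below uses a transaction list that is consistent with the support counts quoted above (an assumption on our part); with it, the same three rules come out as selected.

from itertools import combinations

# Nine transactions consistent with the support counts quoted above
# (assumed here, since the original transaction table is not reproduced in this text).
D = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"}, {"I1","I3"},
     {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"}, {"I1","I2","I3"}]

def support_count(itemset):
    return sum(1 for t in D if set(itemset) <= t)

l = {"I1", "I2", "I5"}
for size in (1, 2):                                   # all nonempty proper subsets of l
    for s in combinations(sorted(l), size):
        conf = support_count(l) / support_count(s)
        status = "Selected" if conf >= 0.7 else "Rejected"
        print(sorted(s), "=>", sorted(l - set(s)), f"{conf:.0%}", status)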
Improving Efficiency of Apriori Algorithm
Many variations of the Apriori algorithm have been proposed that focus on
improving the efficiency of the original algorithm. Several of these variations
are summarized as follows:
Hash-based technique
A hash-based technique can be used to reduce the size of the candidate k-
itemsets, Ck, for k > 1. For example, when scanning each transaction in the
database to generate the frequent 1-itemsets, L1, from the candidate 1-
itemsets in C1, we can generate all of the 2-itemsets for each transaction,
hash them into the different buckets of a hash table structure, and increase
the corresponding bucket counts. A 2-itemset whose corresponding bucket
count in the hash table is below the support threshold cannot be frequent
and thus should be removed from the candidate set. Such a hash-based
technique may substantially reduce the number of the candidate k-itemsets
examined.
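A minimal sketch of the idea follows; the bucket count and the hash function are illustrative choices of our own, not part of any standard implementation.

from itertools import combinations

transactions = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"}]
NUM_BUCKETS = 7                      # illustrative table size
bucket_counts = [0] * NUM_BUCKETS

def bucket(pair):
    # Any hash of the unordered pair works, as long as it is consistent within a run.
    return hash(frozenset(pair)) % NUM_BUCKETS

# While scanning for 1-itemsets, we also hash every 2-itemset of each transaction.
for t in transactions:
    for pair in combinations(sorted(t), 2):
        bucket_counts[bucket(pair)] += 1

def may_be_frequent(pair, min_support_count=2):
    # A 2-itemset whose bucket count is below minimum support cannot be frequent.
    return bucket_counts[bucket(pair)] >= min_support_count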
Transaction Reduction
This technique reduces the number of transactions scanned in future iterations. A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets. Therefore, such a transaction can be marked or removed from further consideration, because subsequent scans of the database for j-itemsets, where j > k, will not require it.
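As a sketch (our own helper), the reduction can be expressed as a simple filter over the transaction list:

from itertools import combinations

def reduce_transactions(transactions, frequent_k_itemsets, k):
    # Keep only transactions containing at least one frequent k-itemset;
    # the rest cannot contribute to any frequent (k+1)-itemset and need not be rescanned.
    frequent = {frozenset(s) for s in frequent_k_itemsets}
    return [t for t in transactions
            if any(frozenset(c) in frequent for c in combinations(t, k))]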
Partitioning
The set of transactions may be divided into a number of disjoint subsets.
Then each partition is searched for frequent itemsets. These frequent
itemsets are called local frequent itemsets. Any itemset that is potentially
frequent with respect to D must occur as a frequent itemset in at least one of
the partitions. Therefore, all local frequent itemsets are candidate itemsets
with respect to D. The collection of frequent itemsets from all partitions
forms the global candidate itemsets with respect to D.
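A rough sketch of the partitioning idea is given below; local_miner is a placeholder for any frequent-itemset miner (for example, the Apriori sketch above), and the final verification scan over D is omitted.

def global_candidates(transactions, num_partitions, min_support_fraction, local_miner):
    # Mine each partition locally with a proportionally scaled support count and take
    # the union of the local frequent itemsets; these are the only possible global
    # frequent itemsets, and a final full scan of D (not shown here) verifies their support.
    size = max(1, (len(transactions) + num_partitions - 1) // num_partitions)
    candidates = set()
    for start in range(0, len(transactions), size):
        partition = transactions[start:start + size]
        local_min_count = max(1, int(min_support_fraction * len(partition)))
        candidates |= set(local_miner(partition, local_min_count))
    return candidates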
Sampling
A random sample S (usually large enough to fit in main memory) may be obtained from the overall set of transactions D, and the sample is searched for frequent itemsets. These are called sample frequent itemsets. Because we are searching for frequent itemsets in S rather than in D, it is possible that we will miss some of the global frequent itemsets. To lessen this possibility, we use a lower support threshold than the minimum support to find the frequent itemsets local to S.
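A rough sketch of the sampling idea follows; local_miner is again a placeholder for any frequent-itemset miner, and the slack factor used to lower the threshold is an illustrative choice.

import random

def sample_frequent_itemsets(transactions, sample_size, min_support_fraction,
                             local_miner, slack=0.75):
    # Draw a random sample S and mine it with a lowered support threshold
    # (minimum support scaled by 'slack') to reduce the chance of missing itemsets
    # that are frequent in the full database D; candidates found in S are then
    # verified against D with one full scan (not shown here).
    S = random.sample(transactions, min(sample_size, len(transactions)))
    lowered_count = max(1, int(min_support_fraction * slack * len(S)))
    return local_miner(S, lowered_count)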