
9. Frequent Pattern Mining with Associations

Lesson Introduction

Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear in a data
set frequently. For example, a set of items, such as milk and bread, that appear frequently together in a
transaction data set is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera,
and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential
pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices,
which may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Finding such frequent patterns plays an essential role in mining associations,
correlations, and many other interesting relationships among data. Moreover, it helps in data
classification, clustering, and other data mining tasks as well. Thus, frequent pattern mining has become
an important data mining task and a focused theme in data mining research. In this lesson, we introduce
the concepts of frequent patterns and associations and study how they can be mined efficiently. This
lesson is dedicated to methods of frequent itemset mining.

Learning Outcomes:
After successful completion of this lesson, students will be able to find frequent itemsets in large
datasets using candidate generation.

Lesson Outline:
• What is a frequent itemset?
• How can we find frequent itemsets from large amounts of data?
• What is frequent pattern mining with candidate generation?

9.1 Motivating Scenario: Market Basket Analysis

Frequent itemset mining leads to the discovery of associations and correlations among items in large
transactional or relational data sets. With massive amounts of data continuously being collected and
stored, many industries are becoming interested in mining such patterns from their databases. The
discovery of interesting correlation relationships among huge amounts of business transaction records
can help in many business decision-making processes, such as catalog design, cross-marketing, and
customer shopping behavior analysis.

A typical example of frequent itemset mining is market basket analysis. This process analyzes customer
buying habits by finding associations between the different items that customers place in their “shopping
baskets” (Figure 9-1). The discovery of such associations can help retailers develop marketing strategies
by gaining insight into which items are frequently purchased together by customers. For instance, if
customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same
trip to the supermarket? Such information can lead to increased sales by helping retailers do selective
marketing and plan their shelf space.

Figure 9-1: Market basket analysis.

Let’s look at an example of how market basket analysis can be useful.

Suppose, as manager of a shopping mall, you would like to learn more about the buying habits of your
customers. Specifically, you wonder, “Which groups or sets of items are customers likely to purchase on
a given trip to the store?” To answer your question, market basket analysis may be performed on the
retail data of customer transactions at your store. You can then use the results to plan marketing or

advertising strategies, or in the design of a new catalog. For instance, market basket analysis may help
you design different store layouts. In one strategy, items that are frequently purchased together can be
placed in proximity in order to further encourage the sale of such items together. If customers who
purchase computers also tend to buy antivirus software at the same time, then placing the hardware
display close to the software display may help increase the sales of both items. Market basket analysis
can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase
computers and printers together, then having a sale on printers may encourage the sale of printers as well
as computers.
These patterns can be represented in the form of association rules. For example, the information that
customers who purchase computers also tend to buy antivirus software at the same time is represented
in the association rule below:

computer ⇒ antivirus software [support = 2%; confidence = 60%]

Rule support and confidence are two measures of rule interestingness. They respectively reflect the
usefulness and certainty of discovered rules. A support of 2% for the rule above means that
2% of all the transactions under analysis show that computer and antivirus software are purchased
together. A confidence of 60% means that 60% of the customers who purchased a computer also bought
the software. Typically, association rules are considered interesting if they satisfy both a minimum support
threshold and a minimum confidence threshold. Such thresholds can be set by users or domain experts.
Additional analysis can be performed to uncover interesting statistical correlations between associated
items.
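To make these two measures concrete, here is a minimal Python sketch (not part of the lesson itself; the transactions and item names are illustrative) that computes support and confidence directly from a list of transactions:

# Illustrative toy data, not from the lesson.
transactions = [
    {"computer", "antivirus software"},
    {"computer", "printer"},
    {"computer", "antivirus software", "printer"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(A ∪ B) / support(A) for the rule A ⇒ B.
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

a, b = {"computer"}, {"antivirus software"}
print(support(a | b, transactions))    # 0.5   -> rule support = 50%
print(confidence(a, b, transactions))  # 0.667 -> confidence ≈ 67%

Note that support is measured against all transactions, while confidence is conditional on the antecedent appearing.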

In general, association rule mining can be viewed as a two-step process:

• Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as
a predetermined minimum support count, min_sup.
• Generate strong association rules from the frequent itemsets: By definition, these rules must
satisfy minimum support and minimum confidence.

A major challenge in mining frequent itemsets from a large data set is the fact that such mining often
generates a huge number of itemsets satisfying the minimum support (min_sup) threshold, especially
when min_sup is set low. This is because if an itemset is frequent, each of its subsets is frequent as well.

9.2 Finding Frequent Itemsets Using Candidate Generation: The Apriori Algorithm

Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets
for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior
knowledge of frequent itemset properties, as we shall see in the following. Apriori employs an iterative
approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set
of frequent 1-itemsets is found by scanning the database to accumulate the count for each item, and
collecting those items that satisfy minimum support. The resulting set is denoted L1. Next, L1 is used to find
L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets
can be found. The finding of each Lk requires one full scan of the database. To improve the efficiency of
the level-wise generation of frequent itemsets, a valuable property called the Apriori property, presented
below, is used to reduce the search space. We will first describe this property, and then show an example
illustrating its use.
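Before turning to the property itself, here is an illustrative Python sketch of the whole level-wise search (a sketch, not the lesson's own pseudocode). Transactions are assumed to be sets of items and min_sup an absolute support count; the prune step anticipates the Apriori property stated next:

from itertools import combinations

def apriori(transactions, min_sup):
    # First scan: count 1-itemsets and keep those meeting min_sup (this is L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # (The classical algorithm joins only itemsets sharing a (k-1)-prefix;
        # this sketch simply unions all pairs, which is correct but slower.)
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step (Apriori property): drop any candidate having an
        # infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # One full scan of the database per level to count the survivors.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        Lk = {s: n for s, n in counts.items() if n >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent

# Illustrative call (e.g., on the Table 9-1 data with min_sup = 2):
# freq = apriori([{"I1","I2","I5"}, {"I2","I4"}, ...], min_sup=2)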

Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

The Apriori property is based on the following observation. By definition, if an itemset I does not satisfy
the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup. If an item A is
added to the itemset I, then the resulting itemset I ∪ A cannot occur more frequently than I. Therefore,
I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup.
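To see this anti-monotonicity numerically, here is a tiny illustrative check (the transactions are made up for this sketch): adding an item to an itemset can never increase its support count.

transactions = [{"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}]

def count(itemset):
    # Number of transactions containing every item in itemset.
    return sum(itemset <= t for t in transactions)

assert count({"milk", "bread"}) <= count({"milk"})                   # 2 <= 3
assert count({"milk", "bread", "eggs"}) <= count({"milk", "bread"})  # 1 <= 2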

Let us look at a simple example, shown in Table 9-1. There are nine transactions in this database; that is,
|D| = 9. We use Figure 9-2 to illustrate the Apriori algorithm for finding frequent itemsets in D.

Table 9-1: Transactional dataset

TID     Items
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

• In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets,
C1. The algorithm scans all of the transactions in order to count the number of occurrences of
each item.
• Suppose that the minimum support count required is 2, that is, min_sup = 2. (Here, we are
referring to absolute support because we are using a support count. The corresponding relative
support is 2/9 = 22%.) The set of frequent 1-itemsets, L1, can then be determined. It consists of
the candidate 1-itemsets satisfying minimum support. In the example, all of the candidates in C1
satisfy minimum support.
• To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 × L1 to generate a
candidate set of 2-itemsets, C2. Note that no candidates are removed from C2 during the prune
step because each subset of the candidates is also frequent.
• Next, the transactions in D are scanned, and the support count of each candidate itemset in C2 is
accumulated, as shown in the middle table of the second row in Figure 9-2.
• The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets
in C2 having minimum support.
• The generation of the set of candidate 3-itemsets, C3, is detailed in Figure 9-2 and sketched in
code after this list. From the join step, we first get C3 = L2 × L2 = { {I1, I2, I3}, {I1, I2, I5},
{I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5} }. Based on the Apriori property that all subsets of a
frequent itemset must also be frequent, we can determine that the latter four candidates cannot
possibly be frequent. We therefore remove them from C3, thereby saving the effort of
unnecessarily obtaining their counts during the subsequent scan of D to determine L3. Note that,
given a candidate k-itemset, we only need to check whether its (k-1)-subsets are frequent, since
the Apriori algorithm uses a level-wise search strategy. The resulting pruned version of C3 is shown
in the first table of the bottom row of Figure 9-2.
• The transactions in D are scanned in order to determine L3, consisting of those candidate 3-
itemsets in C3 having minimum support (Figure 9-2).
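The join-and-prune step just traced can be sketched in Python as follows. The L2 itemsets are taken from this worked example; the helper name joinable is illustrative, not a standard API:

from itertools import combinations

# L2 from the worked example (support counts omitted for brevity).
L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}

def joinable(a, b):
    # Classical Apriori join: sorted itemsets sharing all but their last item.
    a, b = sorted(a), sorted(b)
    return a[:-1] == b[:-1] and a[-1] < b[-1]

# Join step: produces the six candidates listed above.
C3 = {a | b for a in L2 for b in L2 if joinable(a, b)}

# Prune step (Apriori property): keep only candidates whose 2-subsets
# are all themselves in L2.
C3 = {c for c in C3 if all(frozenset(s) in L2 for s in combinations(c, 2))}

print(sorted(sorted(c) for c in C3))
# [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]

Only {I1, I2, I3} and {I1, I2, I5} survive the prune, so only these two candidates need their counts accumulated in the final scan.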

Figure 9-2: Example for the Apriori algorithm.



The algorithm uses L3 × L3 to generate a candidate set of 4-itemsets, C4. Although the join results in
{ {I1, I2, I3, I5} }, this itemset is pruned because its subset {I2, I3, I5} is not frequent. Thus, C4 = ∅, and the
algorithm terminates, having found all of the frequent itemsets.

Suppose the data contain the frequent itemset l = {I1, I2, I5}. What are the association rules that can be
generated from l? The nonempty proper subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The
resulting association rules are as shown below, each listed with its confidence:

I1 ∧ I2 ⇒ I5, confidence = 2/4 = 50%

I1 ∧ I5 ⇒ I2, confidence = 2/2 = 100%

I2 ∧ I5 ⇒ I1, confidence = 2/2 = 100%

I1 ⇒ I2 ∧ I5, confidence = 2/6 = 33%

I2 ⇒ I1 ∧ I5, confidence = 2/7 = 29%

I5 ⇒ I1 ∧ I2, confidence = 2/2 = 100%

If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are
output, because these are the only ones generated that are strong. Note that, unlike conventional
classification rules, association rules can contain more than one conjunct in the right-hand side of the rule.
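This rule-generation step can be sketched in Python. The support counts below are the ones from this worked example (|D| = 9); the dictionary encoding and variable names are illustrative:

from itertools import combinations

# Support counts for l = {I1, I2, I5} and its nonempty subsets.
count = {frozenset(k): v for k, v in [
    (("I1",), 6), (("I2",), 7), (("I5",), 2),
    (("I1", "I2"), 4), (("I1", "I5"), 2), (("I2", "I5"), 2),
    (("I1", "I2", "I5"), 2),
]}

l = frozenset(("I1", "I2", "I5"))
min_conf = 0.70

# For each nonempty proper subset s of l, form the rule s => (l - s);
# its confidence is support_count(l) / support_count(s).
for r in range(1, len(l)):
    for s in map(frozenset, combinations(sorted(l), r)):
        conf = count[l] / count[s]
        verdict = "strong" if conf >= min_conf else "weak"
        print(sorted(s), "=>", sorted(l - s), f"confidence = {conf:.0%} ({verdict})")

Running this prints all six rules above and marks exactly the three 100%-confidence rules as strong under the 70% threshold.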
