
Fundamentals of Data Science

UNIT-3
Mining Frequent Patterns
Introduction:
Frequent pattern mining in data mining is the process of identifying patterns or associations within a
dataset that occur frequently. This is typically done by analysing large datasets to find items or sets of
items that appear together frequently. It encompasses recognising collections of components that occur
together frequently in a transactional or relational database.
Frequent patterns are patterns (e.g., itemsets, subsequences, or substructures) that appear frequently in a
data set. For example, a set of items, such as milk and bread, that appear frequently together in a
transaction data set is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera,
and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential
pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices,
which may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Finding frequent patterns plays an essential role in mining associations,
correlations, and many other interesting relationships among data. Moreover, it helps in data
classification, clustering, and other data mining tasks. Thus, frequent pattern mining has become an
important data mining task and a focused theme in data mining research.

Example of market basket analysis, the earliest form of frequent pattern mining for association rules.

Definition of Frequent Patterns: Frequent patterns refer to combinations of items, sequences, or
substructures that occur frequently in a dataset. For example, in a retail dataset, a frequent pattern
could be the association between certain products that are often purchased together, like bread and
butter.

Mining frequent patterns in data science involves identifying recurring associations or relationships
within a dataset.

Basic Concepts in Frequent Pattern Mining

The technique of frequent pattern mining is built upon a number of fundamental ideas. The analysis is
based on transaction databases, which contain records or transactions that represent collections of items.
Items within these transactions are grouped together as itemsets.

An itemset is a collection of one or more items that are considered as a single entity. Each item within an
itemset is typically an element or attribute associated with data. Itemsets play a crucial role in the analysis
of datasets to identify patterns, associations, or relationships among items.

There are two main types of Itemsets:

1. Frequent Itemset: A frequent itemset is an itemset that appears in a dataset with a frequency greater
than or equal to a specified minimum support threshold. The support of an itemset is the proportion of
transactions in which the itemset occurs.

2. Association Rule: An association rule is a relationship between two itemsets, often represented in the
form of "if X, then Y." The two parts of the rule are called the antecedent (X) and the consequent (Y).
The strength of the association rule is measured by metrics such as confidence and lift.

Support

Support is calculated as the number of transactions in which an item occurs divided by the total number
of transactions:

support(A) = (number of transactions in which A occurs) / (total number of transactions)

EX:- support(pen) = transactions related to pen / total transactions = 500/5000 = 10 percent

* Confidence

Confidence indicates whether products sell on their own or through combined sales. It is calculated as
the combined transactions divided by the antecedent's individual transactions:

Confidence(A => B) = P(B|A) = sup(A ∪ B) / sup(A)

EX:- confidence = combined transactions / individual transactions = 100/500 = 20 percent

* Lift

Lift gives the ratio between the observed co-occurrence of two items and what would be expected if the
items sold independently:

Lift(A -> B) = support(A ∪ B) / (support(A) × support(B))

EX:- Lift = confidence percent / support percent = 20/10 = 2

When the lift value is below 1, the combination is bought together less often than expected by chance.
In this case, the lift of 2 shows that the probability of buying both items together is high when compared
to the transactions for the individual items sold.
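The three measures can be sketched in plain Python. The toy transaction data below is an illustrative assumption (it does not reproduce the pen example's exact counts):

```python
# Hypothetical toy transaction data, for illustration only.
transactions = [
    {"pen", "book"}, {"pen", "pencil"}, {"pen", "book"},
    {"book", "eraser"}, {"pen", "book", "eraser"},
]

def support(itemset, txs):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in txs) / len(txs)

def confidence(antecedent, consequent, txs):
    """sup(A ∪ B) / sup(A): how often B appears when A does."""
    return support(set(antecedent) | set(consequent), txs) / support(antecedent, txs)

def lift(antecedent, consequent, txs):
    """sup(A ∪ B) / (sup(A) × sup(B)); values > 1 indicate positive correlation."""
    both = support(set(antecedent) | set(consequent), txs)
    return both / (support(antecedent, txs) * support(consequent, txs))

print(support({"pen"}, transactions))               # 4/5 = 0.8
print(confidence({"pen"}, {"book"}, transactions))  # (3/5)/(4/5) = 0.75
```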

Market Basket Analysis (MBA)

Frequent itemset mining leads to the discovery of associations and correlations among items in large
transactional or relational data sets. With massive amounts of data continuously being collected and
stored, many industries are becoming interested in mining such patterns from their databases.

The discovery of interesting correlation relationships among huge amounts of business transaction
records can help in many business decision-making processes such as catalog design, cross-
marketing, and customer shopping behaviour analysis.

A typical example of frequent itemset mining is market basket analysis. This process analyzes
customer buying habits by finding associations between the different items that customers place in
their “shopping baskets” (Figure). The discovery of these associations can help retailers develop
marketing strategies by gaining insight into which items are frequently purchased together by
customers.

For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of
bread) on the same trip to the supermarket? This information can lead to increased sales by helping
retailers do selective marketing and plan their shelf space.

Examples of Market Basket Analysis applications include Retail, Telecom, BFSI, Medicine, etc.

How does Market Basket Analysis Work?

Market Basket Analysis is modeled on association rule mining, i.e., the IF {}, THEN {} construct. For
example, IF a customer buys bread, THEN he is likely to buy butter as well.

Association rules are usually represented as:

Example:- {Bread} -> {Butter}

Some terminologies to familiarize yourself with Market Basket Analysis are:

◆Antecedent:- Items or 'itemsets' found within the data are antecedents. In simpler words, it's the IF
component, written on the left-hand side. In the above example, bread is the antecedent.

◆Consequent:- A consequent is an item or set of items found in combination with the antecedent. It's the
THEN component, written on the right-hand side. In the above example, butter is the consequent.

Working of Market Basket Analysis


The steps involved are:
(i) Transaction Data Collection: Data on customer transactions, such as receipts or online order
histories, are gathered. Each transaction should contain a list of items purchased by a customer.

(ii) Creation of a Transaction Database: The data is organized into a transactional database, where
each row represents a unique transaction, and the columns represent the items purchased.

(iii) Generation of Itemsets: All possible combinations of items that appear together in transactions are
identified. These combinations are known as itemsets.

(iv) Calculation of Support: The support for each itemset, which is the proportion of transactions that
contain the itemset, is calculated. Support is a measure of how frequently a particular combination of
items occurs.
(v) Setting a Minimum Support Threshold: A minimum support threshold is defined to filter out itemsets
with low occurrence, focusing on the most relevant associations.

(vi) Generation of Association Rules: Based on the frequent itemsets, association rules that express
relationships between items are generated. These rules typically have a format like "If (antecedent)
Then (consequent)" with a certain confidence.

(vii) Calculation of Confidence: Confidence measures how often the rule is correct. It is calculated as
the support for the combined itemset divided by the support for the antecedent.

(viii) Setting a Minimum Confidence Threshold: A minimum confidence threshold is established to
select the most meaningful and actionable rules.

(ix) Interpretation and Action: The generated rules are analyzed to understand the associations between
products. Strategies such as product placement, bundling, or targeted marketing are implemented based
on the discovered patterns.
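The nine steps above can be sketched end to end in Python. All item names and thresholds below are illustrative assumptions, and a brute-force enumeration of itemsets stands in for the more efficient Apriori search covered later in this unit:

```python
from itertools import combinations

# Step (i)-(ii): hypothetical transaction database (one set per transaction).
transactions = [
    {"bread", "butter", "milk"}, {"bread", "butter"},
    {"bread", "milk"}, {"butter", "jam"}, {"bread", "butter", "jam"},
]
min_support, min_confidence = 0.4, 0.6  # steps (v) and (viii)

def support(items):
    return sum(set(items) <= t for t in transactions) / len(transactions)

# Steps (iii)-(v): enumerate itemsets and keep those meeting minimum support.
all_items = sorted(set().union(*transactions))
frequent = [set(c)
            for k in range(1, len(all_items) + 1)
            for c in combinations(all_items, k)
            if support(c) >= min_support]

# Steps (vi)-(viii): split each frequent itemset into antecedent -> consequent
# and keep rules whose confidence meets the threshold.
rules = []
for itemset in frequent:
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for antecedent in map(set, combinations(sorted(itemset), r)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))

# Step (ix): inspect the surviving rules.
for a, c, conf in rules:
    print(f"{sorted(a)} -> {sorted(c)} (confidence {conf:.2f})")
```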

Types of Market Basket Analysis

Market Basket Analysis techniques can be categorised based on how the available data is utilised. The
following are the types of market basket analysis in data mining:

1) Descriptive market basket analysis:-

This type only derives insights from past data and is the most frequently used approach. The analysis here
does not make any predictions but rates the association between products using statistical techniques. For
those familiar with the basics of Data Analysis, this type of modeling is known as unsupervised learning.

2) Predictive market basket analysis:-

This type uses supervised learning models like classification and regression. It essentially aims to mimic
the market to analyse what causes what to happen. Essentially, it considers items purchased in a sequence
to determine cross-selling. For example, buying an extended warranty is more likely to follow the
purchase of an iPhone. While it isn't as widely used as a descriptive MBA, it is still a very valuable tool
for marketers.

3) Differential market basket analysis:-

This type of analysis is beneficial for competitor analysis. It compares purchase history between stores,
between seasons, between two time periods, between different days of the week, etc., to find interesting
patterns in consumer behaviour. For example, it can help determine why some users prefer to purchase
the same product at the same price on Amazon vs Flipkart. The answer can be that the Amazon reseller
has more warehouses and can deliver faster, or maybe something more profound like user experience.

Benefits / Advantages of (MBA)

1) Increasing market share:- Once a company hits peak growth, it becomes challenging to determine
new ways of increasing market share. Market Basket Analysis can be used to put together demographic
and gentrification data to determine the location of a new store or geo-targeted ads.

2) Behaviour analysis:- Understanding customer behaviour patterns is a cornerstone of marketing.
MBA can be used anywhere from a simple catalogue design to UI/UX.
3) Optimisation of in-store operations:- MBA is not only helpful in determining what goes on the
shelves but also behind the store. Geographical patterns play a key role in determining the popularity
or strength of certain products, and therefore, MBA has been increasingly used to optimize inventory
for each store or warehouse.

4) Campaigns and promotions:- MBA is used not only to determine which products sell together but
also which products form keystones in the product line.

5) Recommendations:- OTT platforms like Netflix and Amazon Prime benefit from MBA by
understanding what kinds of movies people tend to watch together frequently.

6) Increases sales and return on investment.
7) Boosts consumer engagement.
8) Increases client satisfaction.
9) Improves marketing initiatives and strategies.

Disadvantages of MBA:-

1) It identifies hypotheses which then need to be tested.

2) The impact of any action taken must still be measured.

3) It is difficult to identify product groupings.

4) A large number of real transactions is needed to do an effective basket analysis.

5) The analysis can capture results that were due to the success of a previous marketing campaign.

6) Possible measure of information.

7) It depends on the accuracy of the data.

Apriori Algorithm:

Finding Frequent Itemsets Using Candidate Generation: The Apriori Algorithm

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules.
• It mines single-dimensional Boolean association rules from transaction databases.
• The name of the algorithm is based on the fact that the algorithm uses prior knowledge
of frequent itemset properties.
• Apriori employs an iterative approach known as a level-wise search, where k-itemsets
are used to explore (k+1)-itemsets.
• First, the set of frequent 1-itemsets is found by scanning the database to accumulate the
count for each item, and collecting those items that satisfy minimum support. The resulting set
is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find
L3, and so on, until no more frequent k-itemsets can be found.

• The finding of each Lk requires one full scan of the database.


• A two-step process is followed in Apriori, consisting of join and prune actions.

Steps to solve Apriori Algorithm

1. Define the minimum support threshold.
2. Generate candidate itemsets C1, C2, ....
3. Count the support of each candidate itemset.
4. Prune the candidate itemsets: remove the itemsets that do not meet the minimum support threshold.
5. Generate the lists of frequent itemsets L1, L2, ....
6. Repeat steps 3-5 until no more frequent itemsets can be generated.
7. Generate association rules.
8. Evaluate the strong association rules: those with confidence >= the minimum confidence threshold.
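Steps 1-6 can be sketched as a minimal level-wise Apriori in Python. The grocery data below is a hypothetical example, and rule generation (steps 7-8) is omitted for brevity:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise search: frequent k-itemsets (Lk) generate (k+1)-candidates."""
    # Find L1 with one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_count}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-candidates.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        # (the Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # Count support with one full scan; keep candidates meeting min support.
        Lk = {c for c in candidates
              if sum(c <= t for t in transactions) >= min_count}
        frequent |= Lk
        k += 1
    return frequent

# Hypothetical transactions with a minimum support count of 3.
txs = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"},
       {"milk", "butter"}, {"milk", "bread", "butter"}]
result = apriori(txs, 3)
print(sorted(sorted(s) for s in result))
```

With this data, all three single items and all three pairs meet the threshold, but the triple occurs only twice and is pruned.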

Problem:

The minimum support threshold is 2 and the minimum confidence threshold is 60% (c = 60%). Find the
frequent itemsets and generate association rules on this.

Frequent Itemset (I) = {Hot Dogs, Coke, Chips}

Association rules,

• [Hot Dogs^Coke]=>[Chips] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Hot Dogs^Coke) = 2/2*100=100% //Selected

• [Hot Dogs^Chips]=>[Coke] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Hot Dogs^Chips) = 2/2*100=100% //Selected

• [Coke^Chips]=>[Hot Dogs] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Coke^Chips) = 2/3*100=66.67% //Selected

• [Hot Dogs]=>[Coke^Chips] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Hot Dogs) = 2/4*100=50% //Rejected

• [Coke]=>[Hot Dogs^Chips] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Coke) = 2/3*100=66.67% //Selected

• [Chips]=>[Hot Dogs^Coke] //confidence = sup(Hot Dogs^Coke^Chips)/sup(Chips) = 2/4*100=50% //Rejected

There are four strong results (minimum confidence greater than 60%)

Algorithm:

Advantages of Apriori algorithm


1. Efficient discovery of patterns: Association rule mining algorithms are efficient at discovering patterns in large
datasets, making them useful for tasks such as market basket analysis and recommendation systems.

2. Easy to interpret: The results of association rule mining are easy to understand and interpret, making it possible
to explain the patterns found in the data.

3. Can be used in a wide range of applications: Association rule mining can be used in a wide range of
applications such as retail, finance, and healthcare, which can help to improve decision-making and increase
revenue.

4. Handling large datasets: These algorithms can handle large datasets with many items and transactions, which
makes them suitable for big-data scenarios.

Disadvantages of Apriori algorithm


1. Large number of generated rules: Association rule mining can generate a large number of rules, many of which
may be irrelevant or uninteresting, which can make it difficult to identify the most important patterns.

2. Limited in detecting complex relationships: Association rule mining is limited in its ability to detect complex
relationships between items, and it only considers the co-occurrence of items in the same transaction.

3. Can be computationally expensive: As the number of items and transactions increases, the number of candidate
item sets also increases, which can make the algorithm computationally expensive.

4. Need to define the minimum support and confidence threshold: The minimum support and confidence
threshold must be set before the association rule mining process, which can be difficult and requires a good
understanding of the data.

Improving the Efficiency of Apriori :

Improving the efficiency of the Apriori algorithm is essential for handling large datasets and reducing computational
complexity. Here are several techniques and strategies to enhance the efficiency of the Apriori algorithm:

1. Transaction Reduction or Eliminate infrequent items from transactions: Before applying the Apriori
algorithm, remove items that do not meet the minimum support threshold. This reduces the size of transactions and
speeds up the subsequent steps.

2. Use of Data Structures: Utilize efficient data structures like hash tables or trees to store and manipulate itemsets.
This can significantly speed up the process of checking subset relationships and support counting.

3. Transaction Pruning: Discard transactions that do not contain any frequent items. If a transaction doesn't have a
single item that meets the minimum support, it cannot contribute to the discovery of frequent itemsets.

4. Caching Intermediate Results: Cache and reuse intermediate results during the candidate generation phase. If
the support of an itemset is calculated multiple times, store the result for reuse instead of recalculating it.

5. Dynamic Itemset Counting: Keep track of the count of each candidate itemset dynamically during the pass
through the dataset. This avoids the need for a separate pass to count support, reducing the number of scans through
the data.

6. Efficient Candidate Generation: Optimize the generation of candidate itemsets. Techniques such as pruning
based on frequent (k-1)-itemsets and avoiding duplicate generation can reduce the number of candidates considered.

7. Parallelisation: Parallelize the computation of support for different itemsets or transactions. This is particularly
effective when dealing with large datasets and multiple processors or machines are available.

8. Apriori Property: Leverage the Apriori property, which states that if an itemset is infrequent, all its supersets
will also be infrequent. This property can be used to prune the search space and avoid unnecessary calculations.
9. Bitwise Operations: Represent itemsets and transactions as bit vectors and use bitwise operations for set
intersection and union. This can lead to more efficient support counting.
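Point 9 can be sketched as follows (toy data assumed): each item maps to an integer whose bit i is set when transaction i contains the item, so support counting becomes a bitwise AND plus a bit count:

```python
# Hypothetical transactions, indexed 0..3.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]

# One bit vector per item: bit i is set iff transaction i contains the item.
bitvec = {}
for i, t in enumerate(transactions):
    for item in t:
        bitvec[item] = bitvec.get(item, 0) | (1 << i)

def support_count(itemset):
    """Intersect the item bit vectors with AND, then count the set bits."""
    v = (1 << len(transactions)) - 1  # start with all transactions
    for item in itemset:
        v &= bitvec[item]
    return bin(v).count("1")

print(support_count({"a", "b"}))  # transactions 0 and 2 -> 2
```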

10. Sampling: Use sampling techniques to analyze a subset of the dataset instead of the entire dataset. While this
may not guarantee the discovery of all frequent itemsets, it can provide approximate results with significantly less
computational cost.

11. Memory Efficiency: Optimize memory usage by using data structures that consume less memory, especially for
large datasets.

Frequent Pattern Growth Algorithm


The two primary drawbacks of the Apriori Algorithm are:
1. At each step, candidate sets have to be built.
2. To build the candidate sets, the algorithm has to repeatedly scan the database.

These two properties inevitably make the algorithm slower. To overcome these redundant steps, a new
association-rule mining algorithm was developed, named the Frequent Pattern Growth (FP-Growth) Algorithm. It
overcomes the disadvantages of the Apriori algorithm by storing all the transactions in a trie data structure. The
FP-Growth algorithm was proposed by Han et al. in 2000.
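As a sketch of the trie idea (not the full algorithm, which also maintains a header table of node links for the mining phase), the FP-tree can be built with exactly two passes over the data; the example data is hypothetical:

```python
from collections import Counter

class FPNode:
    """A node in the FP-tree: an item, its count, and links to children."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count item frequencies; keep only frequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    # Pass 2: insert each transaction with items sorted by descending frequency,
    # so common prefixes share a single path and the tree stays compact.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

tree = build_fp_tree([{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"b"}], 2)
print(tree.children["b"].count)  # all four transactions share the "b" prefix -> 4
```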

Advantages of FP Growth Algorithm


o This algorithm needs to scan the database only twice (two passes over the dataset).
o The pairing of items is not done in this algorithm, which makes it faster than Apriori.
o No candidate generation is required.
o It is efficient and scalable for mining both long and short frequent patterns.

Disadvantages of FP-Growth Algorithm


o The FP-tree is more difficult to build than Apriori's candidate sets.
o The FP-tree is expensive to build.
o The algorithm may not fit in main memory when the database is large.

FP Growth vs Apriori algorithm

Problem: Consider the following data; let the minimum support be 3.

Mining Multilevel Association Rules from Transaction Databases:

Types of Multilevel Association Rule: There are two types of association rules in multilevel, namely:
Intra-level Association Rule: These rules recognize patterns or relationships within a hierarchy level in data items.
The rule involves data items that are at the same level or belong to the same category in the hierarchy.

Inter-level Association Rule: In contrast to the intra-level association rule, the inter-level association rule finds
patterns across different levels of hierarchies. The rule involves data items from different levels or belonging to
different categories in the hierarchy.

Approaches For Mining Multilevel Association Rules

1. Uniform Minimum Support:
• The same minimum support threshold is used when mining at each level of abstraction. When a
uniform minimum support threshold is used, the search procedure is simplified. The method is also
simple in that users are required to specify only one minimum support threshold.
• The uniform support approach, however, has some difficulties. It is unlikely that items at lower
levels of abstraction will occur as frequently as those at higher levels of abstraction.
• If the minimum support threshold is set too high, it could miss some meaningful associations
occurring at low abstraction levels. If the threshold is set too low, it may generate many uninteresting
associations occurring at high abstraction levels.
2. Reduced Minimum Support:
• Each level of abstraction has its own minimum support threshold.
• The deeper the level of abstraction, the smaller the corresponding threshold. For
example, if the minimum support thresholds for levels 1 and 2 are 5% and 3%,
respectively, then "computer," "laptop computer," and "desktop computer"
are all considered frequent.

3. Group-Based Minimum Support:


• Because users or experts often have insight as to which groups are more important than others,
it is sometimes more desirable to set up user-specific, item-based, or group-based minimum support
thresholds when mining multilevel rules.
• For example, a user could set up the minimum support thresholds based on product
price, or on items of interest, such as by setting particularly low support thresholds for
laptop computers and flash drives in order to pay particular attention to the association
patterns containing items in these categories.
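The reduced-support idea can be sketched as follows; the hierarchy, thresholds, and transactions are all illustrative assumptions:

```python
# Leaf item -> level-1 category (hypothetical hierarchy).
hierarchy = {
    "laptop computer": "computer", "desktop computer": "computer",
    "flash drive": "accessory", "mouse": "accessory",
}
# Deeper levels get smaller thresholds (reduced minimum support).
min_support = {1: 0.05, 2: 0.03}

transactions = [
    {"laptop computer"}, {"desktop computer", "mouse"},
    {"laptop computer", "flash drive"}, {"mouse"},
]

def level_support(concept, level):
    """Support of a category (level 1) or a leaf item (level 2)."""
    if level == 1:  # roll each transaction up to its categories first
        txs = [{hierarchy[i] for i in t} for t in transactions]
    else:
        txs = transactions
    return sum(concept in t for t in txs) / len(txs)

def is_frequent(concept, level):
    """Apply the threshold that belongs to the concept's own level."""
    return level_support(concept, level) >= min_support[level]

print(level_support("computer", 1))         # 3/4 = 0.75
print(level_support("laptop computer", 2))  # 2/4 = 0.5
```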

*********************************************************************
