Frequent pattern mining
Frequent pattern mining is the process of identifying recurring
relationships or patterns in a set of data. Imagine you work at a store and want to know what
products customers tend to buy together. For example, if a customer buys a PC and then a digital
camera, they might frequently also buy a memory card.
In this scenario:
Frequent patterns are combinations of items, sequences, or structures that appear repeatedly in the
data. For instance, a common purchase pattern might be "PC → digital camera → memory card."
Frequent itemsets are groups of items that appear together often, like "milk and bread" in a grocery
store.
Frequent sequential patterns refer to items bought in a specific order over time, like a sequence of
"PC, then digital camera, then memory card."
Frequent structured patterns involve more complex relationships, like parts of a larger structure or
network.
These patterns are useful for making recommendations, identifying trends, and improving marketing
strategies.
TWO KEY POINTS
1. Market Basket Analysis: a common technique in data mining. The idea is to analyze
shopping patterns by finding items that people frequently buy together.
For example, imagine customers at a grocery store. When one customer buys milk, bread, and
cereal, another buys milk, sugar, and bread, and a third buys milk, bread, and butter. Market basket
analysis would look at these "shopping baskets" and find patterns, like noticing that people who buy
milk often also buy bread.
Businesses use this information to make decisions about things like product placement, promotions,
or recommendations. If a store knows that milk and bread are often bought together, it might place
them closer to each other or offer a discount on one when the other is purchased.
In short, market basket analysis helps stores understand and predict customer buying behavior by
identifying items that are often bought together.
A well-known example of market basket analysis is how Amazon and other e-commerce
websites make product recommendations. When a customer views or adds an item to their
shopping cart, Amazon shows a
"Frequently Bought Together" or "Customers Who Bought This Also Bought" section. This feature
uses market basket analysis to suggest related products based on patterns observed in other
customers' purchases.
For instance, if many customers who buy a laptop also purchase a laptop bag and a wireless mouse,
Amazon will suggest these items to anyone buying a laptop. This recommendation system is
designed to increase the chance of customers buying additional products, ultimately boosting sales.
Another example is in grocery stores. Market basket analysis might reveal that customers who buy
diapers often buy baby wipes and maybe even a soft drink. This insight could lead the store to place
these items closer together, run joint promotions, or bundle them as a "baby essentials" pack,
making it easier and more appealing for shoppers to buy multiple related items in one trip.
2. Association Rules: association rules in data mining help us find relationships between items
that frequently appear together in a set of transactions, like shopping carts.
Here are the key ideas in simple terms:
Itemset: This is a group of items that appear together in a transaction. For example, {milk, bread,
butter} could be an itemset if these items are often bought together.
Association Rule: This is a rule that shows a relationship between two itemsets. For
example, a rule might be “If a customer buys milk, they are also likely to buy bread.” We write this as
milk → bread, meaning milk "implies" bread.
Support: Support measures how often an item or a set of items appears in the entire set of
transactions. For example, if 20% of all transactions include both milk and bread, then the support of
{milk, bread} is 20%. This tells us how common an itemset is.
Confidence: Confidence measures how often the rule holds: of the transactions that contain
the left-hand side, it is the fraction that also contain the right-hand side. For example, if 80% of the
transactions that include milk also include bread, the confidence of the rule milk → bread is 80%.
This tells us how reliable the rule is (see the short sketch after this list).
Minimum Support and Confidence: To decide which rules are useful, we often set
minimum thresholds for support and confidence. Only rules that meet or exceed these thresholds
are considered strong or interesting enough to use.
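To make support and confidence concrete, here is a minimal Python sketch; the transaction data is invented for illustration:

```python
# Toy transaction database (invented for illustration).
transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "sugar", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))      # 3 of 5 baskets -> 0.6
print(confidence({"milk"}, {"bread"}, transactions)) # 3 of 4 milk baskets -> ~0.75
```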
In summary, association rules help businesses see which products are often bought together and
how strong those relationships are, which they can use for recommendations, promotions, and
improving customer experience.
Efficient and Scalable Frequent Itemset Mining
Efficient and Scalable Frequent Itemset Mining refers to methods used to quickly and effectively find
sets of items that frequently appear together in large datasets. One of the most well-known
algorithms for this is the Apriori algorithm.
What is Frequent Itemset Mining?
Frequent itemset mining is the process of discovering sets of items that appear together often in a
dataset. This technique is commonly used in market basket analysis, where we try to find
combinations of products that are frequently bought together. For example, in a grocery store, we
might want to know if customers who buy bread are also likely to buy milk.
1. Apriori Algorithm:
The Apriori algorithm is a popular method for finding these frequent itemsets.
It was one of the first algorithms developed for this purpose and is relatively
straightforward to understand. Here’s how it works in simple terms:
Apriori Principle: The core idea behind Apriori is that if an itemset (a set of
items) is frequent, then all smaller subsets of that itemset must also be
frequent. For example, if the combination of {bread, milk, eggs} is frequent in
transactions, then {bread, milk} and {milk, eggs} should also be frequent. This
principle helps reduce the number of combinations we need to check.
How Apriori Works:
Step 1 - Find Frequent 1-Itemsets: The algorithm first scans the
database to find individual items that are frequent (appear in
transactions at least a certain number of times). Let’s say we set a
threshold of 20% for an item to be considered frequent. If bread
appears in 25% of transactions, it qualifies as a frequent item.
Step 2 - Generate Candidate 2-Itemsets: The algorithm then pairs
these frequent 1-itemsets to create candidate 2-itemsets (sets of
two items).
Step 3 - Filter 2-Itemsets: The algorithm scans the database again
to count each candidate's support. Any 2-itemset that meets the
frequency threshold is kept, while those that don't are discarded.
Step 4 - Generate Candidate 3-Itemsets: The algorithm continues to
combine items to create larger itemsets (e.g., 3-itemsets, 4-itemsets)
and filters them by frequency, stopping when no more frequent
itemsets can be generated.
This process of building up larger sets from smaller ones and
discarding those that don't meet the threshold is what makes the
Apriori algorithm efficient.
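As a rough illustration of this level-wise loop, here is a small self-contained Python sketch (the basket data is invented, and a production implementation would use more careful candidate generation and pruning):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: frequent k-itemsets are grown from frequent (k-1)-itemsets."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Scan the database and keep candidates meeting the support threshold.
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) / n >= min_support}

    # Step 1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([i]) for i in items})
    result = set(level)

    k = 2
    while level:
        # Steps 2-4: join (k-1)-itemsets into k-candidates, prune, filter, repeat.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori principle: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = keep_frequent(candidates)
        result |= level
        k += 1
    return result

baskets = [{"milk", "bread", "cereal"},
           {"milk", "sugar", "bread"},
           {"milk", "bread", "butter"}]
print(apriori(baskets, min_support=0.6))  # includes frozenset({'milk', 'bread'})
```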
Why is Apriori Efficient and Scalable?
The Apriori algorithm is efficient because it reduces the number of
itemsets it needs to check. By using the Apriori principle, it doesn’t
have to look at every possible combination of items in the data
(which would be computationally expensive). Instead, it only focuses
on combinations that could potentially be frequent based on the
results of previous steps.
Additionally, Apriori can handle large datasets, making it scalable.
However, for very large datasets, there are more advanced
algorithms (like FP-Growth) that can be even more efficient.
Other Real-Life Applications of Apriori
E-commerce Recommendations: Online stores like Amazon use
frequent itemset mining to suggest "Frequently Bought Together"
items. For example, if customers often buy a phone case with a new
smartphone, the site may recommend the case when someone views
the phone.
Fraud Detection in Banking: Banks use frequent itemset mining to
detect common patterns in fraudulent transactions. If certain
transaction sequences are often linked with fraud, the bank can set
up alerts for similar patterns.
Healthcare Analysis: Hospitals can analyze frequent itemsets of
symptoms or medications that are often prescribed together. This
helps them understand common treatment patterns and could aid in
diagnosis.
Summary
The Apriori algorithm is an efficient method for finding frequently
occurring item combinations in large datasets by building up from
individual items to larger sets. Its ability to reduce the number of
checks needed makes it practical for large-scale applications, helping
businesses in retail and e-commerce optimize their operations and
make personalized recommendations.
2. FP-Growth Algorithm
The FP-Growth algorithm (Frequent Pattern Growth) is an efficient algorithm for finding frequent
itemsets in large datasets. It’s considered a faster and more efficient alternative to the Apriori
algorithm because it doesn’t require multiple scans of the database or generation of many candidate
itemsets. Let’s go through how it works and a real-life example to make it easier to understand.
What is the FP-Growth Algorithm?
FP-Growth, short for Frequent Pattern Growth, is used in frequent itemset mining to find patterns or
sets of items that appear frequently together. Instead of generating all possible item combinations
(like Apriori does), FP-Growth uses a more compact data structure called an FP-tree (Frequent
Pattern Tree) to store the database, which allows it to find frequent itemsets more efficiently.
How Does the FP-Growth Algorithm Work?
Build the FP-Tree:
The algorithm first scans the database to find the frequency of each item (called support).
It discards items that don’t meet the minimum support threshold (a set percentage that determines
whether an item is “frequent”).
Next, it arranges the remaining frequent items in a specific order based on their frequency.
The algorithm then builds an FP-tree, which is a compressed representation of the transactions, by
grouping similar items and sharing branches where possible.
Extract Frequent Patterns:
Once the FP-tree is built, the algorithm extracts frequent itemsets from it.
It starts from the bottom of the tree and looks for patterns by identifying paths that share common
items.
Using these paths, it generates frequent itemsets without needing to scan the entire database
multiple times.
By using this method, FP-Growth avoids the time-consuming process of generating candidate
itemsets like Apriori, making it faster and more suitable for large datasets.
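In practice the FP-tree is rarely coded by hand. As a sketch, assuming the third-party mlxtend library is installed, frequent itemsets can be mined with its fpgrowth function (the basket data below is invented):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Invented transaction data for illustration.
baskets = [["bread", "milk", "eggs"],
           ["bread", "milk"],
           ["milk", "eggs"],
           ["bread", "milk", "eggs", "butter"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(baskets), columns=te.columns_)

# Mine all itemsets appearing in at least 50% of transactions.
print(fpgrowth(df, min_support=0.5, use_colnames=True))
```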
Example: Retail Market Basket Analysis
Let’s look at how a grocery store might use FP-Growth for market basket analysis:
Scenario: A large grocery store wants to identify frequent item combinations in their transactions to
improve product placement and create targeted promotions. For example, they want to know if
items like "bread," "milk," and "eggs" are often purchased together.
Using FP-Growth:
Step 1 - Build the FP-Tree: The store’s transaction data is fed into the FP-Growth algorithm, which
first counts how often each item appears (e.g., bread appears in 50% of transactions, milk in 40%,
eggs in 30%). Items that don’t meet a certain threshold (e.g., 10%) are ignored.
The frequent items are then arranged by frequency, and an FP-tree is created. For example, if
"bread" and "milk" are frequently bought together, they might share a path in the tree, representing
this combination in a compressed way.
Step 2 - Extract Patterns: The algorithm then extracts patterns from the FP-tree. It finds that "bread"
and "milk" are frequently bought together, as are "milk" and "eggs." It might also find a 3-itemset
like {bread, milk, eggs} if customers often buy all three items together.
Results and Actions:
Product Placement: The store could place bread, milk, and eggs closer together to encourage
customers to pick up all three.
Bundling and Promotions: The store could create a "breakfast bundle" promotion where customers
get a discount if they buy bread, milk, and eggs together.
Personalized Offers: The store could send coupons for eggs to customers who frequently buy bread
and milk, increasing the likelihood that they’ll buy all three items on their next visit.
Why is FP-Growth Better Than Apriori?
FP-Growth is often faster and more efficient than Apriori because:
It avoids generating candidate itemsets. Instead of trying every possible combination, FP-Growth
uses the FP-tree to directly find frequent itemsets.
It reduces the number of database scans. FP-Growth needs only two passes over the database
to build the FP-tree, after which it works with this compressed version of the data, whereas
Apriori re-scans the entire database for every candidate level, which is time-consuming for
large datasets.
Other Real-Life Applications of FP-Growth
E-commerce Recommendations: Online stores can use FP-Growth to find combinations of items that
are frequently bought together, like a laptop with a laptop bag and mouse. The site can then
recommend these items as a bundle to customers browsing laptops.
Healthcare and Diagnosis Patterns: Hospitals can use FP-Growth to identify common combinations
of symptoms or treatments. For instance, if certain symptoms often occur together in patients with
a specific condition, doctors can be alerted to consider that diagnosis when those symptoms are
present.
Telecommunications: Telecom companies can use FP-Growth to identify frequently used service
packages. If customers often purchase a specific combination of internet, mobile, and streaming
services, the company can create bundled packages for these services.
Summary
The FP-Growth algorithm is a fast, efficient method for finding frequently occurring patterns in large
datasets by using an FP-tree structure to compress data and avoid unnecessary calculations. It’s
particularly useful in areas like retail, healthcare, and e-commerce, where understanding common
item combinations helps improve sales, personalize recommendations, and make better
business decisions.
Mining frequent itemsets using the vertical data format in data mining
In data mining, we analyze transactions (like shopping carts) to find frequent itemsets (groups of
items that are often bought together).
Horizontal format: Data is stored where each row represents a transaction, and the items in the
transaction are listed together (e.g., Transaction 1: {milk, bread, butter}).
Vertical format: Data is stored where each item is associated with a list of transaction IDs (TIDs) that
contain it. For example:
Milk: {T1, T3, T4}
Bread: {T1, T2, T4}
Butter: {T1, T4}
The vertical format is useful because it directly tells us where an item appears and makes it easier to
intersect transaction IDs to find frequent itemsets.
Real-Life Example
Imagine a grocery store's data:
In the horizontal format:
T1: {milk, bread, butter}
T2: {bread, eggs}
T3: {milk, eggs}
T4: {milk, bread, butter, eggs}
In the vertical format:
Milk: {T1, T3, T4}
Bread: {T1, T2, T4}
Butter: {T1, T4}
Eggs: {T2, T3, T4}
Using the vertical format, you can quickly find intersections:
To see how often {milk, bread} appears, intersect their TID lists:
Milk: {T1, T3, T4}
Bread: {T1, T2, T4}
Intersection: {T1, T4} → {milk, bread} appears in 2 transactions.
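This intersection step translates directly into code. Here is a minimal Eclat-style sketch in Python using the same toy database:

```python
# Vertical format of the example database: item -> set of TIDs.
tidlists = {
    "milk":   {"T1", "T3", "T4"},
    "bread":  {"T1", "T2", "T4"},
    "butter": {"T1", "T4"},
    "eggs":   {"T2", "T3", "T4"},
}

def support_count(items, tidlists):
    """Intersect the TID lists of all items; the size of the result is the support."""
    tids = set.intersection(*(tidlists[i] for i in items))
    return len(tids), tids

count, tids = support_count(["milk", "bread"], tidlists)
print(count, sorted(tids))  # 2 ['T1', 'T4']
```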
Why Use Vertical Format?
It’s efficient for mining frequent patterns because finding intersections is faster than scanning all
rows repeatedly (as in horizontal format).
Vertical format simplifies the process when working with algorithms like Eclat, which focuses on TID
intersections.
In real life, this approach helps businesses quickly analyze customer transactions and find
combinations of products that sell well together, enabling better product placement
or bundle offers.
Mining Closed and Max Patterns
Closed and max patterns refine frequent itemset mining; the concepts are explained below in
simple terms, with a real-life example.
Key Terms:
Frequent Itemset:
A set of items (like products) that appear together in transactions with enough frequency, meeting a
minimum support threshold.
Closed Frequent Itemset:
A frequent itemset is closed if there is no superset (larger set containing it) that has the same
support count.
It’s a way to reduce redundancy while preserving all the necessary information about frequent
patterns.
Max Frequent Itemset:
A frequent itemset is maximal if it has no frequent superset (no larger frequent itemset contains it).
Max patterns are the most compact representation but may lose some details about subsets.
Why Use Closed and Max Patterns?
When the number of frequent itemsets grows very large (e.g., when there are many items), it can
become overwhelming and computationally expensive to analyze all frequent patterns. By focusing
on closed or max patterns, we can reduce the number of patterns without losing too much critical
information.
Real-Life Example:
Scenario: Analyzing shopping behavior in a grocery store.
Transactions:
T1: {milk, bread, butter}
T2: {milk, bread}
T3: {bread, butter}
T4: {milk, bread, butter}
Minimum Support = 2.
Frequent Itemsets:
{milk}: Appears in 3 transactions.
{bread}: Appears in 4 transactions.
{butter}: Appears in 3 transactions.
{milk, bread}: Appears in 3 transactions.
{bread, butter}: Appears in 3 transactions.
{milk, butter}: Appears in 2 transactions.
{milk, bread, butter}: Appears in 2 transactions.
Closed Frequent Itemsets:
{milk, bread, butter}: Closed because it has no frequent superset at all.
{milk, bread}: Closed because its only superset, {milk, bread, butter}, has lower support (2 vs. 3).
{bread, butter}: Closed for the same reason as above.
{bread}: Closed because both of its supersets, {milk, bread} and {bread, butter}, have lower support (3 vs. 4).
(In contrast, {milk}, {butter}, and {milk, butter} are not closed, because each has a superset with exactly the same support.)
Max Frequent Itemset:
{milk, bread, butter}: Maximal because no larger frequent itemsets exist.
Application:
Imagine a store owner analyzing buying patterns. They don't need all frequent itemsets—just the
closed ones to see meaningful groupings (e.g., "milk and bread are bought together frequently, but
adding butter doesn't happen in all such cases"). If they only want the broadest patterns, they can
use the max itemsets (e.g., "the largest group of items people often buy together is milk, bread, and
butter").
This helps in reducing data complexity while still allowing for decisions like bundling
items for discounts.
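The closed and maximal sets above can be double-checked with a short brute-force sketch in Python; this enumerates every itemset, so it is only workable for toy data like this:

```python
from itertools import combinations

transactions = [{"milk", "bread", "butter"},
                {"milk", "bread"},
                {"bread", "butter"},
                {"milk", "bread", "butter"}]
min_support = 2

# Enumerate every itemset and keep the frequent ones with their counts.
items = sorted({i for t in transactions for i in t})
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        count = sum(1 for t in transactions if s <= t)
        if count >= min_support:
            frequent[s] = count

# Closed: no proper superset has the same support count.
closed = {s for s, c in frequent.items()
          if not any(s < t and frequent[t] == c for t in frequent)}
# Maximal: no proper superset is frequent at all.
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print(closed)   # {bread}, {milk, bread}, {bread, butter}, {milk, bread, butter}
print(maximal)  # {milk, bread, butter}
```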
Key Terms
Frequent Itemset: A group of items that occur together in transactions with a frequency above a
minimum threshold.
Closed Frequent Itemset: A frequent itemset where no proper superset has the same support
(occurrence count).
Pruning: Techniques to eliminate unnecessary itemsets, speeding up the mining process.
Pruning Strategies Explained
1. Item Merging
If every transaction containing a frequent itemset X also contains an itemset Y, and the union of X
and Y forms a closed frequent itemset, then:
There’s no need to search for itemsets that include X but not Y.
Example:
Consider the transactions:
T1: {1, 2, 11}
T2: {1, 2, 11, 13}
Frequent itemsets include {1, 2} and {1, 2, 11}. Every transaction containing {1, 2} also contains
{11}, so both itemsets have the same support (2).
{1, 2, 11} is a closed itemset: its only superset, {1, 2, 11, 13}, appears in just one transaction, so
adding item 13 would reduce the support.
By item merging, there is no need to search for itemsets that contain {1, 2} but not {11}; see the
small check below.
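As a small illustration, the item merging test itself is a one-line check; the helper name below is invented:

```python
def can_merge(x, y, transactions):
    """Item merging test: every transaction containing X also contains Y."""
    return all(y <= t for t in transactions if x <= t)

db = [{1, 2, 11}, {1, 2, 11, 13}]
# True: item 11 can be merged into {1, 2}; no need to search for
# itemsets that contain {1, 2} but not 11.
print(can_merge({1, 2}, {11}, db))
```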
2. Sub-Itemset Pruning
If a frequent itemset X is a proper subset of an already-found closed frequent itemset Y, and both
have the same support, then:
X and all of its descendants in the search tree (larger itemsets grown from X) can be ignored,
because none of them can be closed.
Example:
Transactions:
T1: {a1, a2, ..., a100}
T2: {a1, a2, ..., a50}
Minimum support = 2.
Frequent itemsets:
{a1}: Appears in both transactions (support = 2).
{a1, a2}: Appears in both transactions (support = 2).
Pruning:
Once the closed itemset {a1, a2, ..., a50} (support = 2) has been found, {a2} is a proper subset of
it with the same support (2). So {a2} and its descendants (itemsets grown from {a2}) do not need
further exploration.
3. Item Skipping
During depth-first search, if a locally frequent item has the same support in the header tables at
several different levels, it can be pruned from the higher-level header tables.
Example:
Continuing the two transactions above, a2 has support 2 in the global header table and also
support 2 in the header table of a1's projected database. Because the supports match, a2 can be
pruned from the higher-level header table once a1's branch has been mined.
4. Superset and Subset Checking
To confirm whether a frequent itemset is closed, check:
Superset Checking: whether any already-found closed itemset with the same support contains the
current itemset; if so, the current itemset is not closed.
Subset Checking: whether the current itemset is contained in any already-found closed itemset,
which avoids recording redundant itemsets.
Example:
Found itemsets:
{a, b} (support = 3)
{a, b, c} (support = 3)
Superset Checking: {a, b, c} is a superset of {a, b} with the same support. So, {a, b} is not closed.
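A minimal sketch of superset checking against the closed itemsets found so far (the function name and data are invented for illustration):

```python
def is_closed_so_far(itemset, count, found_closed):
    """Superset checking: `itemset` cannot be closed if some already-found
    closed itemset is a proper superset with the same support count.
    `found_closed` maps frozenset -> support count."""
    return not any(itemset < other and c == count
                   for other, c in found_closed.items())

found_closed = {frozenset({"a", "b", "c"}): 3}
print(is_closed_so_far(frozenset({"a", "b"}), 3, found_closed))  # False: absorbed
print(is_closed_so_far(frozenset({"a", "b"}), 4, found_closed))  # True: supports differ
```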
Simplified Real-Life Analogy
Imagine you're grouping customers by shared product purchases:
Item Merging: If everyone buying "milk and bread" also buys "butter," focus on the group "milk,
bread, and butter."
Sub-Itemset Pruning: If the group "milk and bread" behaves the same as "milk, bread, and butter,"
there's no need to analyze "milk and bread" separately.
Item Skipping: If "milk" and "bread" show the same trends, skip redundant checks.
Superset Checking: Ensure you're only tracking unique complete combinations, not overlapping
subsets.
These optimizations save time by eliminating unnecessary checks while preserving the
important patterns.
Advanced Pattern Mining: Pattern Mining in Multilevel,
Multidimensional space
Pattern mining in multilevel, multidimensional space refers to
analyzing data at various levels of abstraction (from general to
specific) and across multiple attributes or dimensions (such as
age, location, or product type).
1. What is Multilevel Pattern Mining?
In many real-world scenarios, data is organized in a hierarchical
manner.
Multilevel mining allows us to discover patterns at different levels of
detail.
High-level patterns are broader and more general (e.g., "Electronics
are popular").
Low-level patterns are more detailed and specific (e.g., "Dell laptops
are frequently purchased").
Example
Imagine a retail store selling electronics. The data about purchased
items is arranged in a hierarchical structure (concept hierarchy), as
shown below:
Hierarchy Levels (based on Figure 7.2):
Level 0 (Root Level): General category for all products, e.g.,
Electronics.
Level 1: Broad categories like Computers, Software, and Printers.
Level 2: More specific subcategories like Laptops, Desktops, Office
Software.
Level 3: Brand-specific items, e.g., Dell Desktop Computers,
Microsoft Office Software.
Level 4: Most detailed level, including raw data (e.g., specific product
IDs).
Why Multilevel Mining?
General patterns at higher levels (e.g., "Electronics are popular")
might seem obvious.
Specific patterns at lower levels (e.g., "Dell laptops with Microsoft
Office are often bought together") can uncover hidden insights.
By analyzing patterns across levels, businesses can make better
decisions, like customizing promotions.
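One way to picture multilevel mining is to "roll up" raw items through the concept hierarchy and re-count support at each level. A minimal Python sketch, with an invented hierarchy and invented baskets:

```python
# Hypothetical concept hierarchy: specific item -> parent category.
to_subcategory = {
    "Dell laptop":      "laptop",
    "HP laptop":        "laptop",
    "Microsoft Office": "office software",
}
to_category = {"laptop": "computer", "office software": "software"}

baskets = [
    {"Dell laptop", "Microsoft Office"},
    {"HP laptop", "Microsoft Office"},
    {"Dell laptop"},
]

def roll_up(basket, mapping):
    """Replace each item with its parent in the hierarchy (if it has one)."""
    return {mapping.get(item, item) for item in basket}

# Brand level: {Dell laptop, Microsoft Office} is in only 1 of 3 baskets.
level2 = [roll_up(b, to_subcategory) for b in baskets]
print(sum({"laptop", "office software"} <= b for b in level2))  # 2 of 3 baskets

level1 = [roll_up(b, to_category) for b in level2]
print(sum({"computer", "software"} <= b for b in level1))       # also 2 of 3
```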
2. What is Multidimensional Pattern Mining?
This looks at multiple dimensions of data, such as:
What a customer buys (products).
Who the customer is (age, gender, etc.).
Where the purchase happened (location).
Real-Life Example:
Consider an electronics store analyzing sales data:
At a high level: "Electronics are frequently purchased by all
customers."
At a medium level: "Laptops and office software are popular among
working professionals."
At a detailed level: "Customers aged 25-35 often buy Dell laptops
bundled with Microsoft Office from New York stores."
By mining patterns across these levels and dimensions, businesses
can:
Develop tailored marketing strategies.
Improve inventory management.
Identify customer preferences at different levels of detail.
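As a rough sketch, mining across dimensions amounts to counting patterns over tuples of attribute values rather than items alone; the records below are invented:

```python
from collections import Counter

# Each record combines several dimensions: (age_group, city, product).
records = [
    ("25-35", "New York", "Dell laptop"),
    ("25-35", "New York", "Dell laptop"),
    ("25-35", "Boston",   "HP laptop"),
    ("36-50", "New York", "printer"),
]

# Frequent tuples become candidate multidimensional rules, e.g.
# age(25-35) AND location(New York) => buys(Dell laptop).
for pattern, count in Counter(records).most_common():
    print(pattern, count / len(records))
```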
Key Takeaway
Multilevel and multidimensional pattern mining helps uncover
general trends as well as specific insights from complex datasets by
analyzing different levels of abstraction and multiple dimensions.
This flexibility is particularly useful in industries like retail,
finance, and healthcare.