KEMBAR78
L8 - Support Count Using Hash Tree | PDF | Algorithms And Data Structures | Algorithms
0% found this document useful (0 votes)
840 views14 pages

L8 - Support Count Using Hash Tree

This document discusses generating a hash tree for candidate item sets in association rule mining. It explains how a hash tree can be used to store candidate item sets to reduce the number of comparisons needed when scanning a transaction database. An example hash tree is shown for 15 candidate item sets of length 3. It also describes how subset operations can be performed by matching a transaction against candidate item sets in the relevant branches of the hash tree. Finally, it discusses some factors that can affect the performance of association rule mining algorithms, such as minimum support threshold, data dimensionality, database size, and average transaction width.

Uploaded by

Veena Tella
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
840 views14 pages

L8 - Support Count Using Hash Tree

This document discusses generating a hash tree for candidate item sets in association rule mining. It explains how a hash tree can be used to store candidate item sets to reduce the number of comparisons needed when scanning a transaction database. An example hash tree is shown for 15 candidate item sets of length 3. It also describes how subset operations can be performed by matching a transaction against candidate item sets in the relevant branches of the hash tree. Finally, it discusses some factors that can affect the performance of association rule mining algorithms, such as minimum support threshold, data dimensionality, database size, and average transaction width.

Uploaded by

Veena Tella
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 14

BITS Pilani

BITS Pilani Dr.Aruna Malapati


Asst Professor
Hyderabad Campus Department of CSIS
BITS Pilani
Hyderabad Campus

Association Rule Mining


Today’s Learning objective

• Generate hash tree for K-candidate item sets

BITS Pilani, Hyderabad Campus


Reducing Number of
Comparisons
• Candidate counting:
– Scan the database of transactions to determine the support of
each candidate itemset
– To reduce the number of comparisons, store the candidates in
a hash structure
• Instead of matching each transaction against every
candidate, match it against candidates contained in the
hashed buckets
Transactions Hash Structure
Ck
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
N 3 Milk, Diaper, Beer, Coke k
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Buckets
BITS Pilani, Hyderabad Campus
Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5},
{3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
Hash function
• Hash function
3,6,9
1,4,7
2,5,8

•Max leaf size: max number of itemsets stored in a leaf node (if number of
candidate itemsets exceeds max leaf size, split the node)

BITS Pilani, Hyderabad Campus


Association Rule
Discovery: Hash tree
Hash Function Candidate Hash Tree {1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5
{1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5
{6 8 9}, {3 6 7}, {3 6 8}
1,4,7 3,6,9
2,5,8
234
567

145 136
345 356 367
Hash on
357 368
1, 4 or 7
124 159 689
125
457 458

BITS Pilani, Hyderabad Campus


Association Rule
Discovery: Hash tree
Hash Function Candidate Hash Tree

1,4,7 3,6,9
2,5,8
234
567

145 136
345 356 367
Hash on
357 368
2, 5 or 8
124 159 689
125
457 458

BITS Pilani, Hyderabad Campus


Association Rule
Discovery: Hash tree
Hash Function Candidate Hash Tree

1,4,7 3,6,9
2,5,8
234
567

145 136
345 356 367
Hash on
357 368
3, 6 or 9
124 159 689
125
457 458

BITS Pilani, Hyderabad Campus


Subset Operation
Given a transaction t, what are Transaction, t
the possible subsets of size 3?
1 2 3 5 6

Level 1
1 2 3 5 6 2 3 5 6 3 5 6

Level 2

12 3 5 6 13 5 6 15 6 23 5 6 25 6 35 6

123
135 235
125 156 256 356
136 236
126

Level 3 Subsets of 3 items


BITS Pilani, Hyderabad Campus
Subset Operation Using
Hash Tree
Hash Function
1 2 3 5 6 transaction

1+ 2356
2+ 356 1,4,7 3,6,9
2,5,8
3+ 56

234
567

145 136
345 356 367
357 368
124 159 689
125
457 458

BITS Pilani, Hyderabad Campus


Subset Operation Using
Hash Tree
Hash Function
1 2 3 5 6 transaction

1+ 2356
2+ 356 1,4,7 3,6,9
12+ 356
2,5,8
3+ 56
13+ 56
234
15+ 6 567

145 136
345 356 367
357 368
124 159 689
125
457 458

BITS Pilani, Hyderabad Campus


Subset Operation Using
Hash Tree
Hash Function
1 2 3 5 6 transaction

1+ 2356
2+ 356 1,4,7 3,6,9
12+ 356
2,5,8
3+ 56
13+ 56
234
15+ 6 567

145 136
345 356 367
357 368
124 159 689
125
457 458
Match transaction against 11 out of 15 candidates
BITS Pilani, Hyderabad Campus
Factors Affecting
Performance
• Choice of minimum support threshold
– lowering support threshold results in more frequent itemsets
– this may increase number of candidates and max length of
frequent itemsets
• Dimensionality (number of items) of the data set
– more space is needed to store support count of each item
– if number of frequent items also increases, both computation and
I/O costs may also increase
• Size of database
– since Apriori makes multiple passes, run time of algorithm may
increase with number of transactions
• Average transaction width
– number of subsets in a transaction increases with its width
– this may increase max length of frequent itemsets and traversals
of hash tree

BITS Pilani, Hyderabad Campus


Take home message

• Association rule mining is traditionally called Market Basket

analysis.

• Support and confidence are used to find interesting rules.

• Generating Association Rules is a combinatorial problem

and hence need heretics.

BITS Pilani, Hyderabad Campus

You might also like