Algorithms for Association Rule Mining - A General
Survey and Comparison

Jochen Hipp
Wilhelm-Schickard-Institute, University of Tübingen
72076 Tübingen, Germany
jochen.hipp@informatik.uni-tuebingen.de

Ulrich Güntzer
Wilhelm-Schickard-Institute, University of Tübingen
72076 Tübingen, Germany
guentzer@informatik.uni-tuebingen.de

Gholamreza Nakhaeizadeh
DaimlerChrysler AG, Research & Technology FT3/AD
89081 Ulm, Germany
rheza.nakhaeizadeh@daimlerchrysler.com
ABSTRACT

Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper we explain the fundamentals of association rule mining and moreover derive a general framework. Based on this we describe today's approaches in context by pointing out common aspects and differences. After that we thoroughly investigate their strengths and weaknesses and carry out several runtime experiments. It turns out that the runtime behavior of the algorithms is much more similar than one would expect.

1. INTRODUCTION

1.1 Association Rules

Since its introduction in 1993 [1] the task of association rule mining has received a great deal of attention. Today the mining of such rules is still one of the most popular pattern-discovery methods in KDD.

In brief, an association rule is an expression X ⇒ Y, where X and Y are sets of items. The meaning of such rules is quite intuitive: Given a database D of transactions, where each transaction T ∈ D is a set of items, X ⇒ Y expresses that whenever a transaction T contains X then T probably contains Y also. The probability, or rule confidence, is defined as the percentage of transactions containing Y in addition to X with regard to the overall number of transactions containing X. That is, the rule confidence can be understood as the conditional probability p(Y ⊆ T | X ⊆ T).

The idea of mining association rules originates from the analysis of market-basket data where rules like "A customer who buys products x1 and x2 will also buy product y with probability c%." are found. Their direct applicability to business problems together with their inherent understandability, even for non-experts in data mining, made association rules a popular mining method. Moreover it became clear that association rules are not restricted to dependency analysis in the context of retail applications, but are successfully applicable to a wide range of business problems.

When mining association rules there are mainly two problems to deal with: First of all there is the algorithmic complexity. The number of rules grows exponentially with the number of items. Fortunately, today's algorithms are able to efficiently prune this immense search space based on minimal thresholds for quality measures on the rules. Second, interesting rules must be picked from the set of generated rules. This might be quite costly because the generated rule sets normally are quite large, e.g. more than 100,000 rules are not uncommon, while the percentage of useful rules is typically only a very small fraction. The work concerning this second problem mainly focuses on supporting the user when browsing the rule set, e.g. [14], and on the development of further useful quality measures on the rules, e.g. [7; 6; 22].

1.2 Outline of the Paper

In this paper we deal with the algorithmic aspects of association rule mining. In fact, a broad variety of efficient algorithms to mine association rules have been developed during the last years. These approaches are more or less described separately in the corresponding literature. To overcome this situation we give a general survey of the basic ideas behind association rule mining. In Section 2 we identify the basic strategies and describe them in detail. The resulting framework is used in Section 3 to systematize and present today's most common approaches in context. Furthermore we show the common principles and differences between the algorithms. Finally, in Section 4 we complete our overview with a comparison of the algorithms concerning efficiency. This comparison is based on theoretic considerations and concrete runtime experiments. In Section 5 we conclude with a short summary of our results.

1.3 Related Work

In our work we mainly restrict ourselves to what we call the "classic association rule problem", that is, the mining of all rules existing in a database D with respect to minimal thresholds on certain quality measures. D in this case consists of market-basket like data, that is, transactions containing 10-20 items on average out of a total set of 1,000-100,000 items.

SIGKDD Explorations. Volume 2, Issue 1 - page 58

Although the "classic problem" is still a topic of further research, during recent years many algorithms for specialized tasks have been developed: First of all, there are the approaches that enhance the association rules themselves, e.g. quantitative association rules [24], generalized association rules [23; 12], and to some extent the work on sequential patterns [3; 15]. Moreover there are several generalizations of the rule problem, e.g. [16; 27]. In addition, algorithms were developed that mine well-defined subsets of the rule set according to specified items, quality measures, etc., e.g. general constraints [17; 25], optimized rules [8; 20], maximal frequent itemsets [28], and frequent closed itemsets [18; 19]. Moreover there are algorithms to mine dense databases [5]. These approaches are supplemented by algorithms for online mining of association rules, e.g. [10], and incremental algorithms, e.g. [26; 4].

2. BASIC PRINCIPLES

2.1 Formal Problem Description

Let I = {x1, ..., xn} be a set of distinct literals, called items. A set X ⊆ I with k = |X| is called a k-itemset or simply an itemset. Let a database D be a multi-set of subsets of I. Each T ∈ D is called a transaction. We say that a transaction T ∈ D supports an itemset X ⊆ I if X ⊆ T holds. An association rule is an expression X ⇒ Y, where X, Y are itemsets and X ∩ Y = ∅ holds. The fraction of transactions T supporting an itemset X with respect to database D is called the support of X, supp(X) = |{T ∈ D | X ⊆ T}| / |D|. The support of a rule X ⇒ Y is defined as supp(X ⇒ Y) = supp(X ∪ Y). The confidence of this rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X), c.f. [2].

As mentioned before, the main challenge when mining association rules is the immense number of rules that theoretically must be considered. In fact the number of rules grows exponentially with |I|. Since it is neither practical nor desirable to mine such a huge set of rules, the rule sets are typically restricted by minimal thresholds for the quality measures support and confidence, minsupp and minconf respectively. This restriction allows us to split the problem into two separate parts [2]: An itemset X is frequent if supp(X) ≥ minsupp. Once F = {X ⊆ I | X frequent}, the set of all frequent itemsets together with their support values, is known, deriving the desired association rules is straightforward (see [2] for minor enhancements): For every X ∈ F check the confidence of all rules X \ Y ⇒ Y with Y ⊂ X, ∅ ≠ Y ≠ X, and drop those that do not achieve minconf. By the definition above, it suffices to know the support values of all subsets of X to determine the confidence of each such rule. The knowledge of the support values of all subsets of X is ensured by the downward closure property of itemset support: All subsets of a frequent itemset must also be frequent, c.f. [2].

With that in mind, the task of association rule mining can be reduced to the problem of finding all itemsets that are frequent with respect to a given minimal threshold minsupp. The rest of this paper, and most of the literature on association rule mining, addresses exactly this topic.

2.2 Traversing the Search Space

As explained, we need to find all itemsets that satisfy minsupp. For practical applications, looking at all subsets of I is doomed to failure by the huge search space. In fact, a linearly growing number of items still implies an exponentially growing number of itemsets that need to be considered. For the special case I = {1, 2, 3, 4} we visualize the search space, which forms a lattice, in Figure 1, c.f. [28].

Figure 1: Lattice for I = {1, 2, 3, 4}

The frequent itemsets are located in the upper part of the figure whereas the infrequent ones are located in the lower part. Although we do not explicitly specify support values for each of the itemsets, we assume that the bold border separates the frequent from the infrequent itemsets. The existence of such a border is independent of any particular database D and minsupp. Its existence is solely guaranteed by the downward closure property of itemset support.

The basic principle of the common algorithms is to employ this border to efficiently prune the search space. As soon as the border is found, we are able to restrict ourselves to determining the support values of the itemsets above the border and to ignore the itemsets below.

Let map: I → {1, ..., |I|} be a mapping that maps all items x ∈ I one-to-one onto natural numbers. The items can now be seen as totally ordered by the relation "<" between natural numbers. In addition, for X ⊆ I let X.item: {1, ..., |X|} → I, n ↦ X.item_n, be a mapping with X.item_n denoting the n-th item of the items x ∈ X increasingly sorted by "<". The n-prefix of an itemset X with n ≤ |X| is then defined by P = {X.item_m | 1 ≤ m ≤ n}, c.f. [12]. Let the classes E(P), P ⊆ I, with E(P) = {X ⊆ I | |X| = |P| + 1 and P is a prefix of X} be the nodes of a tree. Two nodes are connected by an edge if all itemsets of a class E can be generated by joining two itemsets of the parent class E', e.g. Figure 2.

Together with the downward closure property of itemset support this implies the following: If the parent class E' of a class E does not contain at least two frequent itemsets, then E cannot contain any frequent itemset either. If we encounter such a class E' on our way down the tree, then we have reached the border separating the infrequent from the frequent itemsets. We do not need to go beyond this border, so we prune E and all descendants of E from the search space. The latter procedure allows us to efficiently restrict the number of itemsets to investigate.
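To make the two-phase split of Section 2.1 concrete, the following sketch first mines all frequent itemsets by brute force and then derives the rules from their support values. It is purely illustrative: the function names and the toy database are ours, and the exhaustive enumeration over all subsets of I is exactly what the algorithms discussed below avoid.

```python
from itertools import combinations

def frequent_itemsets(db, minsupp):
    """Phase 1 (brute force): return {itemset: support} for all frequent
    itemsets. Exponential in |I|; real algorithms prune via downward closure."""
    items = sorted(set().union(*db))
    supp = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in db if set(cand) <= t) / len(db)
            if s >= minsupp:
                supp[frozenset(cand)] = s
    return supp

def rules(freq, minconf):
    """Phase 2: derive all rules X \\ Y => Y with confidence >= minconf."""
    out = []
    for x, s_x in freq.items():
        for r in range(1, len(x)):
            for y in combinations(x, r):
                y = frozenset(y)
                conf = s_x / freq[x - y]   # conf = supp(X) / supp(X \ Y)
                if conf >= minconf:
                    out.append((x - y, y, conf))
    return out

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
freq = frequent_itemsets(db, minsupp=0.5)
print(rules(freq, minconf=0.6))
```

On the four toy transactions, e.g. the rule {a} ⇒ {b} reaches confidence supp({a, b}) / supp({a}) = 0.5 / 0.75 ≈ 0.67.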
We simply determine the support values only of those itemsets that we "visit" on our search for the border between frequent and infrequent itemsets. Finally, the actual strategy to search for the border is at our own choice. Today's common approaches employ either breadth-first search (BFS) or depth-first search (DFS). With BFS the support values of all (k-1)-itemsets are determined before counting the support values of the k-itemsets. In contrast, DFS recursively descends following the tree structure defined above.

Figure 2: Tree for I = {1, 2, 3, 4}

2.3 Determine Itemset Supports

In the following, an itemset that is potentially frequent and for which we decide to determine its support during lattice traversal is called a candidate itemset, or simply a candidate. One common approach to determine the support value of an itemset is to directly count its occurrences in the database. For that purpose a counter is set up and initialized to zero for each itemset that is currently under investigation. Then all transactions are scanned, and whenever one of the candidates is recognized as a subset of a transaction, its counter is incremented. Typically, subset generation and candidate lookup are integrated and implemented on a hashtree or a similar data structure. In brief, not all subsets of each transaction are generated but only those that are contained in the candidates or that have a prefix in common with at least one of the candidates, c.f. [2] for further details.

Another approach is to determine the support values of candidates by set intersections. A tid is a unique transaction identifier. For a single item, the tidlist is the set of identifiers that correspond to the transactions containing this item. Accordingly, tidlists also exist for every itemset X and are denoted by X.tidlist. The tidlist of a candidate C = X ∪ Y is obtained by C.tidlist = X.tidlist ∩ Y.tidlist. The tidlists are sorted in ascending order to allow efficient intersections. Note that by buffering the tidlists of frequent candidates as intermediate results, we remarkably speed up the generation of the tidlists of the following candidates. Finally, the actual support of a candidate is obtained by determining |C.tidlist|.

3. COMMON ALGORITHMS

In this section we briefly describe and systematize the most common algorithms. We do this by referring to the fundamentals of frequent itemset generation that we identified in the previous section. Our goal is not to go too much into detail but to show the basic principles of and differences between the approaches.

3.1 Systematization

The algorithms that we consider in this paper are systematized in Figure 3. We characterize each of the algorithms a) by its strategy to traverse the search space and b) by its strategy to determine the support values of the itemsets. In addition, an algorithm may employ specific optimizations for further speedup.

Figure 3: Systematization of the Algorithms (BFS: Apriori, AprioriTID, DIC, Partition; DFS: FP-growth, Eclat)

3.2 BFS and Counting Occurrences

The most popular algorithm of this type is Apriori [2], where also the downward closure property of itemset support was introduced. Apriori makes additional use of this property by pruning those candidates that have an infrequent subset before counting their supports. This optimization becomes possible because BFS ensures that the support values of all subsets of a candidate are known in advance. Apriori counts all candidates of cardinality k together in one scan over the database. The critical part is looking up the candidates in each of the transactions. For this purpose [2] introduces a so-called hashtree structure. The items in each transaction are used to descend in the hashtree. Whenever we reach one of its leafs, we find a set of candidates having a common prefix that is contained in the transaction. Then these candidates are searched in the transaction, which has been encoded as a bitmap before. In the case of success, the counter of the candidate in the tree is incremented.

AprioriTID [2] is an extension of the basic Apriori approach. Instead of relying on the raw database, AprioriTID internally represents each transaction by the current candidates it contains. With AprioriHybrid both approaches are combined, c.f. [2]. To some extent also SETM [13] is an Apriori(TID)-like algorithm, which is intended to be implemented directly in SQL.

DIC is a further variation of the Apriori algorithm [7]. DIC softens the strict separation between counting and generating candidates: whenever a candidate reaches minsupp, that is, even when this candidate has not yet "seen" all transactions, DIC starts generating additional candidates based on it.
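The level-wise interplay of candidate generation, pruning by downward closure, and counting that underlies Apriori can be sketched as follows. This is a simplified illustration, not the implementation of [2]: plain Python dictionaries stand in for the hashtree, and the function name and toy database are ours.

```python
from itertools import combinations

def apriori_sketch(db, minsupp):
    """Level-wise (BFS) frequent-itemset mining: one counting pass per level k.
    Plain dict counting stands in for the hashtree of [2]."""
    n = len(db)
    freq = {}
    cands = {frozenset([i]) for t in db for i in t}   # level-1 candidates
    k = 1
    while cands:
        counts = dict.fromkeys(cands, 0)
        for t in db:                                  # one database scan per level
            for c in cands:
                if c <= t:
                    counts[c] += 1
        level = {c: s / n for c, s in counts.items() if s / n >= minsupp}
        freq.update(level)
        # Generate (k+1)-candidates by joining frequent k-itemsets and
        # prune those with an infrequent k-subset (downward closure).
        cands = set()
        for a in level:
            for b in level:
                u = a | b
                if len(u) == k + 1 and all(frozenset(s) in level
                                           for s in combinations(u, k)):
                    cands.add(u)
        k += 1
    return freq

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
print(apriori_sketch(db, minsupp=0.5))
```

On the toy database only the six itemsets up to size two survive the threshold; {a, b, c} is generated as a candidate but discarded after counting.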
For that purpose DIC employs a prefix-tree. In contrast to the hashtree, each node of the prefix-tree (leaf node or inner node) is assigned to exactly one candidate or frequent itemset, respectively. In contrast to the usage of a hashtree, this means that whenever we reach a node we can be sure that the itemset associated with this node is contained in the transaction. Furthermore, interlocking support determination and candidate generation decreases the number of database scans.

3.3 BFS and TID-List Intersections

The Partition algorithm [21] is an Apriori-like algorithm that uses set intersections to determine support values. As described above, Apriori determines the support values of all (k-1)-candidates before counting the k-candidates. The problem is that Partition of course wants to use the tidlists of the frequent (k-1)-itemsets to generate the tidlists of the k-candidates. Obviously the size of those intermediate results easily grows beyond the physical memory limitations of common machines. To overcome this, Partition splits the database into several chunks that are treated independently. The size of each chunk is chosen in such a way that all intermediate tidlists fit into main memory. After determining the frequent itemsets for each database chunk, an extra scan is necessary to ensure that the locally frequent itemsets are also globally frequent.

3.4 DFS and Counting Occurrences

Counting occurrences assumes candidate sets of a reasonable size. For each of those candidate sets a database scan is performed. E.g. Apriori, which relies on BFS, scans the database once for every candidate size k. When using DFS, the candidate sets consist only of the itemsets of one of the nodes of the tree from Section 2.2. Obviously, scanning the database for every node results in tremendous overhead. The simple combination of DFS and counting occurrences is therefore of no practical relevance, c.f. [11].

Recently in [9] a fundamentally new approach called FP-growth was introduced. In a preprocessing step, FP-growth derives a highly condensed representation of the transaction data, the so-called FP-tree. The generation of the FP-tree is done by counting occurrences and DFS. In contrast to former DFS approaches, FP-growth does not follow the nodes of the tree from Subsection 2.2, but directly descends to some part of the itemsets in the search space. In a second step, FP-growth uses the FP-tree to derive the support values of all frequent itemsets.

3.5 DFS and TID-List Intersections

In [28] the algorithm Eclat is introduced, which combines DFS with tidlist intersections. When using DFS, it suffices to keep the tidlists on the path from the root down to the class currently investigated in memory. That is, splitting the database as done by Partition is no longer needed. Eclat employs an optimization called "fast intersections": whenever we intersect two tidlists, we are only interested in the resulting tidlist if its cardinality reaches minsupp. In other words, we should break off each intersection as soon as it is sure that it will not achieve this threshold. Eclat originally generates only frequent itemsets of size ≥ 3. We modified Eclat to mine also the frequent 1- and 2-itemsets by calling it on the class that contains the 1-itemsets together with their tidlists.

In addition, in [28] algorithms that mine only the maximal frequent itemsets are introduced, e.g. MaxEclat. An itemset X is maximal frequent if for every frequent itemset Y, X ⊆ Y implies Y = X. We do not consider these algorithms because, although it is straightforward to derive the set of all frequent itemsets from the maximal frequent itemsets, this does not hold for the corresponding support values. Without those, we are not able to derive rule confidences and therefore cannot generate association rules.

4. COMPARISON OF THE ALGORITHMS

In this section we compare the algorithms and explain the observed differences in performance behavior.

4.1 Experiments

To carry out performance studies we implemented the most common algorithms to mine frequent itemsets, namely Apriori, DIC, Partition, and Eclat, in C++. Actually, we had to leave out DIC in the charts for reasons explained later. In addition, we did not consider AprioriTID and FP-growth because these algorithms were designed to mine data that is not typical for retail environments, that is, data containing quite long patterns.

The experiments were performed on a SUN UltraSPARC-II workstation clocked at 248 MHz. The experiments in Figures 4-11 were carried out on synthetic datasets from [2; 21]. These datasets were generated with a data generator [2] that simulates the buying behavior of customers in retail business. Dataset "T10.I4.D100K" means an average transaction size of 10, an average size of the maximal potentially frequent itemsets of 4, and 100,000 generated transactions. The number of patterns was set to 2,000 and the number of items to 1,000.

In addition to the experiments from [2; 21], we restricted the maximal length of generated itemsets from 1 up to 9 on the dataset "T20.I4.D100K" at minsupp = 0.33%, c.f. Figure 9. Figures 10 and 11 show the behavior of the algorithms on real-world applications. The basket data consists of about 70,000 customer transactions with approximately 60,000 different items. The average transaction size is ≈ 10.5 items. The car equipment data contains information about 700,000 cars with about 6,000 items. On average, ≈ 20 items are assigned to each car.

It is important to say that all algorithms scale linearly with the database size.

4.2 Counting Occurrences vs. Intersecting Sets

The basic question concerning the runtime of the algorithms is whether counting occurrences or intersecting tidlists shows better performance. The advantage of counting is that only candidates that actually occur in the transactions cause any effort. In contrast, an intersection means at least passing through all tids of the smaller of the two tidlists, even if the candidate is not contained in the database at all. ("Fast intersections" save some costs but we still need to pass a substantial number of tids.) But intersections also have their benefits. Counting implies looking up the candidates in the transactions. Of course this can get quite expensive for candidates of higher cardinality. On the contrary, when using intersections the size of the candidate under investigation does not have any influence. In practice, both effects seem to balance out on basket-like data.
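The intersection-based strategy of Eclat, including the early break of the "fast intersections" (Section 3.5), can be sketched as follows. This is our own simplified illustration rather than the implementation of [28]: support is kept as an absolute tid count (mincount), and all names and the toy database are ours.

```python
def intersect_fast(xs, ys, minlen):
    """Intersect two ascending tidlists; break off ("fast intersection")
    as soon as the result can no longer reach minlen tids."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if len(out) + min(len(xs) - i, len(ys) - j) < minlen:
            return None                       # early break: threshold unreachable
        if xs[i] == ys[j]:
            out.append(xs[i]); i += 1; j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out if len(out) >= minlen else None

def eclat_sketch(class_items, mincount, freq):
    """DFS over the tree of Section 2.2; class_items is a list of
    (itemset, tidlist) pairs forming one class of the tree."""
    for idx, (x, tx) in enumerate(class_items):
        freq[frozenset(x)] = len(tx)
        child = []
        for y, ty in class_items[idx + 1:]:
            t = intersect_fast(tx, ty, mincount)
            if t is not None:                 # frequent: keep for the child class
                child.append((x | y, t))
        if child:
            eclat_sketch(child, mincount, freq)

# Build tidlists for single items from a toy database.
db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
tidlists = {}
for tid, t in enumerate(db):
    for item in t:
        tidlists.setdefault(item, []).append(tid)

freq = {}
seed = [(frozenset([i]), tl) for i, tl in sorted(tidlists.items())
        if len(tl) >= 2]                      # frequent 1-itemsets
eclat_sketch(seed, mincount=2, freq=freq)
print(freq)
```

Each recursion keeps only the tidlists of the current class in memory, which mirrors the observation above that Partition-style database splitting becomes unnecessary under DFS.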
[Runtime plots; the x-axis is minsupp in % (Figures 4-8, 10, 11) or the maximal frequent itemset size (Figure 9); curves not reproduced.]

Figure 4: T10.I2.D100K
Figure 5: T10.I4.D100K
Figure 6: T20.I2.D100K
Figure 7: T20.I4.D100K
Figure 8: T20.I6.D100K
Figure 9: Maximal Frequent Itemset Size
Figure 10: Basket Data
Figure 11: Car Equipment
The runtime behavior in Figures 4-8 does not show any substantial differences between the algorithm Apriori, which counts occurrences, and the tidlist-intersecting algorithms Partition and Eclat. Only at quite low average size of the maximal potentially frequent itemsets, e.g. "T10.I2.D100K" in Figure 4, is Apriori somewhat superior, whereas at larger average size of the maximal potentially frequent itemsets, e.g. "T20.I6.D100K" in Figure 8, Partition and Eclat perform better. The same explanation holds for the real-world experiments. With an average size of ≈ 2.2 items at minsupp = 0.06%, the frequent itemsets found in the basket data were rather short compared to the frequent itemsets from the car equipment database, which contained 4.1 items on average at minsupp = 3%.

In Figure 9 it becomes clear what happens behind the scenes: Eclat and Partition spend most of their time determining the support values of the 2- and 3-candidates, whereas Apriori handles such small itemsets efficiently. In contrast, for itemsets with size ≥ 4 the additional effort caused for Eclat and Partition is negligible, whereas this does not hold for Apriori.

4.3 Relaxing the Separation between Candidate Generation and Support Counting

On the one hand, the introduction of the prefix-tree with DIC allows relaxing the separation between candidate generation and support counting and therefore reduces the number of database scans. Moreover, each node is assigned to precisely one itemset. Consequently, looking up candidates in bitmap-encoded transactions is no longer necessary. On the other hand, the memory usage of the prefix-tree caused problems when we experimented with our own implementation of DIC: The prefix-tree is already set up before all frequent 1-itemsets are known. That means a mapping to frequent items as described in [2], which keeps the memory usage of the hash tables in each node of the tree small, is not possible. Moreover, this effect is strengthened by the fact that every candidate is stored in its own node and that in addition a separate node for each prefix of the candidate exists. Another drawback of the prefix-tree is that the frequent itemsets are stored in the same tree as the candidates. Each frequent k-itemset that is not a prefix of any k'-candidate with k' > k imposes overhead when counting that is avoided by the hashtree approach of Apriori.

Actually, we did not come to a final result concerning the efficiency of DIC. But what we want to say is that DIC's advantage of reducing the number of database scans should not be overestimated in a retail environment. In [7] a performance gain of only about 30% compared with basic Apriori is detected for data with quite small average size of the maximal potentially frequent itemsets.

4.4 Additional Candidate Pruning

Typically, the main task of the algorithms is determining support values. That is, the time spent with candidate generation and candidate pruning can be neglected. Consequently, Apriori's candidate pruning step helps to reduce the candidates to be counted but does not add any substantial overhead. It is important to note that basic DFS, as employed by Eclat, does not allow proper subset pruning. Only right-most DFS as introduced in [12] for the mining of generalized association rules allows transferring the additional prune step of Apriori to algorithms using DFS.

Our experiments suggest that the effect of additional candidate pruning is rather small for the datasets we took into consideration. Additional pruning does not help Partition to compensate Eclat's advantage based on the "fast intersections". This impression was also supported by further studies with an enhanced version of Eclat that incorporates additional candidate pruning by using right-most DFS.

4.5 Splitting the Database

Partition needs to split the database. Whereas this optimization helps to cope with large databases, it adds the additional overhead of an extra pass to determine the globally frequent itemsets. In our experiments the sizes of the transaction sets were always small enough that we could employ Partition without splitting. In [21], especially at lower values for minsupp, Partition with splitting suffers strongly. The reason is the increasing number of locally frequent itemsets that finally turn out to be globally infrequent.

4.6 "Fast Intersections"

Eclat's fast intersections are obviously an advantage. The overhead caused by checking whether minsupp is still reachable is clearly outweighed by breaking off unnecessary intersections. As a result, in all our experiments Eclat beats Partition by a nearly constant factor.

5. CONCLUSION

In this paper we dealt with the algorithmic aspects of association rule mining. We restricted ourselves to the "classic" association rule problem, that is, the generation of all association rules that exist in market basket-like data with respect to minimal thresholds for support and confidence.

From the broad variety of efficient algorithms that have been developed we compared the most important ones. We systematized the algorithms and analyzed their performance based on both runtime experiments and theoretic considerations. The results were quite surprising: Although we identified fundamental differences concerning the employed strategies, the algorithms show quite similar runtime behavior in our experiments. At least, no algorithm fundamentally outperforms the others. In fact, our experiments showed that the advantages and disadvantages we identified concerning the strategy to determine the support values of the frequent itemsets nearly balance out on market basket-like data.

In a forthcoming paper we pursue the development of a hybrid approach that efficiently combines counting occurrences and tidlist intersections [11].

6. REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '93), Washington, USA, May 1993.

[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the 20th Int'l Conf. on Very Large Databases (VLDB '94), Santiago, Chile, June 1994.
SIGKDD Explorations. Volume 2, Issue 1 - page 63
[3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the Int'l Conf. on Data Engineering (ICDE), Taipei, Taiwan, March 1995.

[4] N. F. Ayan, A. U. Tansel, and E. Arkun. An efficient algorithm to update large itemsets with early pruning. In Proc. of the 5th Int'l Conf. on Knowledge Discovery and Data Mining (KDD '99), San Diego, California, USA, August 1999.

[5] R. J. Bayardo Jr., R. Agrawal, and D. Gunopulos. Constraint-based rule mining in large, dense databases. In Proc. of the 15th Int'l Conf. on Data Engineering, Sydney, Australia, March 1999.

[6] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), 1997.

[7] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, 1997.

[8] T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Mining optimized association rules for numeric attributes. In Proc. of the 15th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS '96), Montreal, Canada, June 1996.

[9] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. of the 2000 ACM-SIGMOD Int'l Conf. on Management of Data, Dallas, Texas, USA, May 2000.

[10] C. Hidber. Online association rule mining. In Proc. of the 1999 ACM SIGMOD Conf. on Management of Data, 1999.

[11] J. Hipp, U. Güntzer, and G. Nakhaeizadeh. Mining association rules: Deriving a superior algorithm by analysing today's approaches. In Proc. of the 4th European Conf. on Principles and Practice of Knowledge Discovery, Lyon, France, September 2000. To appear.

[12] J. Hipp, A. Myka, R. Wirth, and U. Güntzer. A new algorithm for faster mining of generalized association rules. In Proc. of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD '98), Nantes, France, September 1998.

[13] M. Houtsma and A. Swami. Set-oriented mining for association rules in relational databases. Technical Report RJ 9567, IBM Almaden Research Center, San Jose, California, October 1993.

[14] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. of the 3rd Int'l Conf. on Information and Knowledge Management, Gaithersburg, Maryland, November 29 - December 2, 1994.

[15] H. Mannila, H. Toivonen, and I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3), November 1997.

[16] R. Motwani, E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. In Proc. of the 16th Int'l Conf. on Data Engineering (ICDE). IEEE, 2000.

[17] R. Ng, L. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained association rules. In Proc. of the 1998 ACM SIGMOD Int'l Conf. on Management of Data, Seattle, Washington, USA, June 1998.

[18] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. of the 7th Int'l Conf. on Database Theory (ICDT '99), Jerusalem, Israel, January 1999.

[19] J. Pei, J. Han, and R. Mao. An efficient algorithm for mining frequent closed itemsets. In Proc. of the 2000 ACM-SIGMOD Int'l Conf. on Management of Data, Dallas, Texas, USA, May 2000.

[20] R. Rastogi and K. Shim. Mining optimized support rules for numeric attributes. In Proc. of the 15th Int'l Conf. on Data Engineering, Sydney, Australia, March 1999. IEEE Computer Society Press.

[21] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. of the 21st Conf. on Very Large Databases (VLDB '95), Zürich, Switzerland, September 1995.

[22] C. Silverstein, S. Brin, R. Motwani, and J. D. Ullman. Scalable techniques for mining causal structures. In Proc. of the 1998 ACM SIGMOD Int'l Conf. on Management of Data, Seattle, Washington, USA, June 1998.

[23] R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. of the 21st Conf. on Very Large Databases (VLDB '95), Zürich, Switzerland, September 1995.

[24] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proc. of the 1996 ACM SIGMOD Conf. on Management of Data, Montreal, Canada, June 1996.

[25] R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. of the 3rd Int'l Conf. on KDD and Data Mining (KDD '97), Newport Beach, California, August 1997.

[26] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka. An efficient algorithm for the incremental updation of association rules in large databases. In Proc. of the 3rd Int'l Conf. on KDD and Data Mining (KDD '97), Newport Beach, California, August 1997.

[27] D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov, and A. Rosenthal. Query flocks: A generalization of association-rule mining. In Proc. of the 1998 ACM SIGMOD Int'l Conf. on Management of Data, Seattle, Washington, USA, June 1998.

[28] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. of the 3rd Int'l Conf. on KDD and Data Mining (KDD '97), Newport Beach, California, August 1997.