Chapter 8 Covering (Rules-based) Algorithm Data Mining Technology
Chapter 8 Covering (Rules-based) Algorithm. Written by Shakhina Pulatova; presented by Zhao Xinyou [email_address], 2007.11.13. Data Mining Technology. Some materials (examples) are taken from the Web.
Contents
1. What is the Covering (Rule-based) algorithm?
2. Classification rules, the straightforward way: if-then rules; generating rules from a decision tree
3. Rule-based algorithms: the 1R algorithm / Learn One Rule; the PRISM algorithm; other algorithms
4. Applications of the covering algorithm
5. Discussion of the e/m-learning application
Introduction-App-1 PP87-88 (Figure: training data, given as records of attributes, yields rules; rules may be given by people or generated by a computer.) Example height rules: 1. (0, 1.75) -> short; 2. [1.75, 1.95) -> medium; 3. [1.95, ∞) -> tall.
Introduction-App-2 PP87-88 (Figure: how to get all tall people from dataset B based on rules learned from training data A.)
What is a Rule-based Algorithm? Definition: each classification method uses an algorithm to generate rules from the sample data; these rules are then applied to new data. Rule-based algorithms provide mechanisms that generate rules by 1. concentrating on a specific class at a time, and 2. maximizing the probability of the desired classification. Rules should be compact, easy to interpret, and accurate. PP87-88
Classification Rules, the Straightforward Way: 1. if-then rules; 2. generating rules from a decision tree. PP88-89
Formal Specification of Rule-based Algorithms. A classification rule r = <a, c> consists of: a (antecedent/precondition), a series of tests that evaluate to true or false; and c (consequent/conclusion), the class or classes that apply to instances covered by rule r. PP88 (Figure: a decision tree that splits on a, then on b; the four paths a=0,b=0 / a=0,b=1 / a=1,b=0 / a=1,b=1 lead to classes X, Y, X, Y.)
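As an illustration (not from the original slides; the class and field names are hypothetical), the pair r = <a, c> maps directly onto a small data structure:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A rule r = <a, c>: the antecedent a is a list of tests that must all
# evaluate to true on a tuple; the consequent c is the class assigned.
@dataclass
class Rule:
    antecedent: List[Callable[[Dict], bool]]  # tests over a tuple (row)
    consequent: str                           # class label

    def covers(self, row: Dict) -> bool:
        return all(test(row) for test in self.antecedent)

# The rule read off the path a=0, b=0 of the tree in the figure:
# "if a = 0 and b = 0 then class = X"
r = Rule([lambda t: t["a"] == 0, lambda t: t["b"] == 0], "X")
print(r.covers({"a": 0, "b": 0}))  # True  -> class X applies
print(r.covers({"a": 1, "b": 0}))  # False -> rule does not cover this tuple
```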
Remarks on the straightforward classification: The antecedent contains a predicate that can be evaluated as true or false against each tuple in the database. These rules relate directly to the corresponding decision tree (DT) that could be created. A DT can always be used to generate rules, but the two are not equivalent. Differences: the tree has an implied order in which the splitting is performed, while rules have no order; and a tree is created by looking at all classes, while a rule examines only one class at a time. PP88-89
If-Then Rules: the straightforward way to perform classification is to generate if-then rules that cover all cases. PP88
Generating rules from Decision Tree -1/-2/-3-Con' (Figures: an example decision tree with tests on attributes a, b, c, d and leaf classes x and y, together with the if-then rules read off its root-to-leaf paths.)
Remarks: Rules derived from a DT may be more complex and less comprehensible than necessary; a rule such as "if a=1 and c=0 then Y" can require duplicate subtrees in the corresponding tree, and adding a new test or rule means reshaping the whole tree. Rules obtained without decision trees are more compact and accurate, so many other covering algorithms have been proposed. PP89-90 (Figure: a tree whose subtree testing c and d is duplicated under several branches, alongside the a/b tree with classes X and Y from the earlier slide.)
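A minimal sketch (the tuple-based tree encoding is an assumption, not from the slides) of reading one rule per root-to-leaf path out of a decision tree:

```python
# Each internal node is (attribute, {value: subtree}); each leaf is a class
# label. Every root-to-leaf path becomes one rule: the tests along the path
# form the antecedent, the leaf's class is the consequent.
def tree_to_rules(node, path=()):
    if isinstance(node, str):            # leaf: emit the accumulated path
        return [(list(path), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((attribute, value),))
    return rules

# The two-attribute tree from the earlier slide: split on a, then on b.
tree = ("a", {0: ("b", {0: "X", 1: "Y"}),
              1: ("b", {0: "X", 1: "Y"})})
for antecedent, consequent in tree_to_rules(tree):
    print(antecedent, "->", consequent)
# Note the duplicate subtree under a=0 and a=1: as rules, the four paths
# could be collapsed to "if b=0 then X; if b=1 then Y".
```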
Rule-based Classification: generating rules. 1. The 1R Algorithm / Learn One Rule; 2. The PRISM Algorithm; 3. other algorithms. PP90
Generating rules without Decision Trees-1-con'. Goal: find rules that identify the instances of a specific class. Generate the "best" rule possible by optimizing the desired classification probability; usually, the "best" attribute-value pair is chosen. Remark: these techniques are also called covering algorithms because they attempt to generate rules that exactly cover a specific class.
Generate Rules-Example-2-Con'. Example 3. Question: we want to generate a rule to classify persons as tall. Basic format of the rule: if ? then class = tall. Goal: replace "?" with predicates that can be used to obtain the "best" probability of being tall. PP90
Generate Rules-Algorithms-3-Con' (sequential covering): 1. generate a rule R on the training data S; 2. remove the training data covered by rule R; 3. repeat the process. PP90
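A minimal Python sketch of that loop, assuming a hypothetical learn_one_rule helper and rule objects exposing covers(row) as in the earlier sketch:

```python
# Sequential covering: repeatedly learn the best single rule for the target
# class, remove the training tuples it covers, and continue until the class
# is covered (or no useful rule can be found).
def sequential_covering(data, target_class, learn_one_rule):
    rules = []
    remaining = list(data)
    while any(row["class"] == target_class for row in remaining):
        rule = learn_one_rule(remaining, target_class)  # assumed helper
        if rule is None:                 # no rule improves matters; stop
            break
        rules.append(rule)
        remaining = [row for row in remaining if not rule.covers(row)]
    return rules  # R = R1 U R2 U ... as in the figure on the next slide
```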
Generate Rules-Example-4-Con'. Sequential covering: (i) original data, r = NULL; (ii) step 1: learn R1, r = R1; (iii) step 2: learn R2, r = R1 ∪ R2; (iv) step 3: learn R3, r = R1 ∪ R2 ∪ R3. (Figure: at each step the learned rule region excludes instances of the wrong class.)
1R Algorithm / Learn One Rule-Con'. A simple and cheap method: it generates only a one-level decision tree and classifies an object on the basis of a single attribute. Idea: rules are constructed to test a single attribute, with one branch for every value of that attribute; for each branch, the class assigned is the one occurring most often in the training data. PP91
1R Algorithm / Learn One Rule-Con'. Steps: 1. construct rules that test a single attribute, branching on every value of that attribute; 2. for each branch, count the class occurrences; 3. take the most frequent class of each branch as its rule; 4. evaluate the error rate of each attribute's rule set; 5. choose the attribute with the minimum error rate. PP91 (Example: for Gender, the S/M/T counts are F: 2/5/1 and M: 1/4/10, giving rules F->M with error 3 and M->T with error 5, total error 8; other attributes A2...An give their own totals, e.g. 3, and the smallest total wins.)
1R Algorithm
Input:  D  // training data
        T  // attributes to consider for rules
        C  // classes
Output: R  // rules
Algorithm:
R = ∅;
for all A in T do
    R_A = ∅;
    for all possible values, v, of A do
        for all C_j ∈ C do
            find count(C_j);
        end for
        let C_m be the class with the largest count;
        R_A = R_A ∪ {(A = v) -> (class = C_m)};
    end for
    ERR_A = number of tuples incorrectly classified by R_A;
end for
R = R_A where ERR_A is minimum.
(Example: T = {Gender, Height}, with domains Gender ∈ {F, M} and Height ∈ (0, ∞). Training data for Gender: F -> 3 short, 6 medium, 0 tall; M -> 1 short, 2 medium, 3 tall. Resulting rules: R1 = F -> medium, R2 = M -> tall.)
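A runnable Python version of this pseudocode (a sketch; the dict-based data representation is an assumption), checked against the Gender counts above:

```python
from collections import Counter, defaultdict

def one_r(data, attributes, class_attr="class"):
    """1R: for each attribute build one rule per value (its majority class),
    then keep the attribute whose rule set makes the fewest errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)          # value -> class counts
        for row in data:
            counts[row[attr]][row[class_attr]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Each branch errs on every tuple not in its majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Gender counts from the example: F -> 3 short / 6 medium / 0 tall,
# M -> 1 short / 2 medium / 3 tall (15 tuples in all).
data = ([{"gender": "F", "class": "short"}] * 3 +
        [{"gender": "F", "class": "medium"}] * 6 +
        [{"gender": "M", "class": "short"}] * 1 +
        [{"gender": "M", "class": "medium"}] * 2 +
        [{"gender": "M", "class": "tall"}] * 3)
print(one_r(data, ["gender"]))
# ('gender', {'F': 'medium', 'M': 'tall'}, 6) -> total error 6/15, as in Example 5
```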
Example 5 - 1R-3-Con'
Option  Attribute            Rules                   Error  Total Error
1       Gender               F -> medium             3/9    6/15
                             M -> tall               3/6
2       Height (step = 0.1)  (0, 1.6] -> short       0/2    1/15
                             (1.6, 1.7] -> short     0/2
                             (1.7, 1.8] -> medium    0/3
                             (1.8, 1.9] -> medium    0/4
                             (1.9, 2.0] -> medium    1/2
                             (2.0, ∞) -> tall        0/2
The rules based on Height are chosen (total error 1/15 vs. 6/15).
Example 6 - 1R PP92-93
Num  Attribute    Rules                                       Error          Total Error
1    outlook      Sunny -> no; Overcast -> yes; Rainy -> yes  2/5, 0/4, 2/5  4/14
2    temperature  Hot -> no; Mild -> yes; Cool -> yes         2/4, 2/6, 1/4  5/14
3    humidity     High -> no; Normal -> yes                   3/7, 1/7       4/14
4    windy        False -> yes; True -> no                    2/8, 3/6       5/14
Best (tied at 4/14): rules based on outlook (Sunny -> no, Overcast -> yes, Rainy -> yes) OR rules based on humidity (High -> no, Normal -> yes).
PRISM Algorithm-Con'. PRISM generates rules for each class by looking at the training data and adding rules that completely describe all tuples in that class. It generates only correct or "perfect" rules: the accuracy of the rules so constructed is 100%. The success of a rule is measured by the ratio p/t, where p is the number of positive instances and t is the total number of instances covered by the rule. (Example: Gender=Male covers p=10 of t=10 tall tuples, while Gender=Female covers p=1 of t=8, so R = Gender=Male is kept; the S/M/T counts are F: 2/5/1 and M: 0/0/10.)
PRISM Algorithm Steps. Input: D (training data) and C (classes). Output: R (rules). 1. Compute p/t for every (attribute -> value) pair; 2. find one or more pairs with p/t = 100%; 3. select such an (attribute -> value) pair as a rule; 4. repeat 1-3 until no data remain in D.
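A simplified Python sketch of this loop (the representation is an assumption, not the textbook pseudocode): when no single pair reaches 100%, the rule is refined by conjoining best-ratio tests until it is perfect, then the covered tuples are removed:

```python
def prism(data, target_class, class_attr="class"):
    """Simplified PRISM: grow one rule at a time for target_class by greedily
    adding the (attribute, value) test with the best p/t until the rule is
    perfect (p/t = 1), then remove the tuples it covers and repeat.
    Assumes categorical attributes and consistent (noise-free) data."""
    rules, remaining = [], list(data)
    while any(r[class_attr] == target_class for r in remaining):
        conditions, covered = [], remaining
        while any(r[class_attr] != target_class for r in covered):
            used = {a for a, _ in conditions}
            best, best_ratio = None, -1.0
            for attr in covered[0]:
                if attr == class_attr or attr in used:
                    continue
                for value in {r[attr] for r in covered}:
                    subset = [r for r in covered if r[attr] == value]
                    p = sum(r[class_attr] == target_class for r in subset)
                    if p / len(subset) > best_ratio:
                        best, best_ratio = (attr, value), p / len(subset)
            if best is None:          # no test left to add; give up on this rule
                return rules
            conditions.append(best)
            covered = [r for r in covered if r[best[0]] == best[1]]
        rules.append(conditions)      # a perfect rule: tests -> target_class
        remaining = [r for r in remaining
                     if not all(r[a] == v for a, v in conditions)]
    return rules
```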
Example 8-Con': which class may be tall? Compute the value p/t for each (attribute, value) pair and look for 100%. PP94-95
Num  (Attribute, value)     p/t
1    Gender = F             0/9
2    Gender = M             3/6
3    Height <= 1.6          0/2
4    1.6 < Height <= 1.7    0/2
5    1.7 < Height <= 1.8    0/3
6    1.8 < Height <= 1.9    0/4
7    1.9 < Height <= 2.0    1/2
8    2.0 < Height           2/2
R1 = 2.0 < Height (p/t = 2/2 = 100%).
Refining the remaining range 1.9 < Height <= 2.0: PP94-96
(Attribute, value)       p/t
1.9 < Height <= 1.95     0/1
1.95 < Height <= 2.0     1/1
R2 = 1.95 < Height <= 2.0; R = R1 ∪ R2.
Example 9-Con': which days may play? Compute the value p/t for each pair. The predicate outlook=overcast correctly implies play=yes on all four rows (p/t = 4/4), so R1 = if outlook=overcast, then play=yes.
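As a check, p/t can be computed for every (attribute, value) pair; the 14-row weather data below is the standard example dataset, assumed here since the slides only reference it:

```python
# Rows: (outlook, temperature, humidity, windy, play)
weather = [
    ("sunny","hot","high",False,"no"), ("sunny","hot","high",True,"no"),
    ("overcast","hot","high",False,"yes"), ("rainy","mild","high",False,"yes"),
    ("rainy","cool","normal",False,"yes"), ("rainy","cool","normal",True,"no"),
    ("overcast","cool","normal",True,"yes"), ("sunny","mild","high",False,"no"),
    ("sunny","cool","normal",False,"yes"), ("rainy","mild","normal",False,"yes"),
    ("sunny","mild","normal",True,"yes"), ("overcast","mild","high",True,"yes"),
    ("overcast","hot","normal",False,"yes"), ("rainy","mild","high",True,"no"),
]
attrs = ["outlook", "temperature", "humidity", "windy"]
for i, attr in enumerate(attrs):
    for value in sorted({row[i] for row in weather}, key=str):
        subset = [row for row in weather if row[i] == value]
        p = sum(row[-1] == "yes" for row in subset)   # positives for play=yes
        print(f"{attr}={value}: {p}/{len(subset)}")
# outlook=overcast scores 4/4, the only 100% pair, giving
# R1 = "if outlook=overcast then play=yes".
```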
Example 9-Con'. R2 = if humidity=normal and windy=false, then play=yes.
Example 9-Con'. R3 = ...; R = R1 ∪ R2 ∪ R3 ∪ ...
Applications of the Covering Algorithm: deriving classification rules for diagnosing illness, business planning, banking, and government; machine learning; text classification (for photos, however, it is difficult); and so on.
Application on E-learning/M-learning: adaptive and personalized learning materials. (Flow: collect the learner's initial information -> classify learning styles or similar traits, using similarity, Bayesian, or rule-based algorithms (Chapter 2 or 3) -> form virtual groups -> provide adaptive and personalized materials -> collect learning-style feedback.)
Discussion