Classification: Basic Concepts, Decision Trees, and Decision Rules
Data Classification and Prediction
Data classification has two aspects:
  classification
  prediction
Methods of classification:
  decision tree induction
  forward and back-propagation (neural networks)
  Bayesian classification
  association rule mining
Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class:
Find a model that expresses the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it.
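As a concrete illustration of this split-and-validate protocol, here is a minimal sketch in Python. The toy records, the 80/20 split ratio, and the stand-in predict rule are assumptions for illustration, not part of the slides:

```python
# A minimal sketch of the train/test protocol described above.
import random

records = [
    # ({attributes}, class label) -- toy data
    ({"refund": "Yes", "income": 125}, "No"),
    ({"refund": "No",  "income": 100}, "No"),
    ({"refund": "No",  "income": 95},  "Yes"),
    ({"refund": "No",  "income": 90},  "Yes"),
    ({"refund": "Yes", "income": 120}, "No"),
]

random.shuffle(records)
cut = int(0.8 * len(records))              # 80% train, 20% test (illustrative)
train, test = records[:cut], records[cut:]

def predict(attrs):
    """Stand-in for a learned model (hypothetical rule, not a real learner)."""
    return "No" if attrs["refund"] == "Yes" else "Yes"

# Accuracy is measured on the held-out test set, never on the training set
correct = sum(predict(attrs) == label for attrs, label in test)
print(f"accuracy = {correct}/{len(test)}")
```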
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

A learning algorithm performs induction on the training set to learn a model.

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learned model is then applied to the test set (deduction) to assign a class to each unlabeled record.
Examples of Classification Task
Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
Decision Tree based Methods
Rule-based Methods / Decision Rules
Neural Networks (NN)
  Artificial Neural Network (ANN)
  Recurrent Neural Network (RNN), e.g., Long Short-Term Memory (LSTM)
Association Rules (Apriori Algorithm)
Naïve Bayes and Bayesian Belief Networks
Memory-based Reasoning
Support Vector Machines (SVM)
Description of Decision Rules and Trees
Intuitive appeal for users.
Presentation forms:
  graphically, as decision trees (e.g., ID3: Iterative Dichotomiser 3, pronounced "die-KOT-oh-my-zer")
  as "if, then" statements (decision rules)
What They Look Like
A decision tree works like a flow chart and looks like an upside-down tree.
Nodes
  appear as rectangles or circles
  represent a test or decision
Lines or branches represent the outcomes of a test.
Circles are terminal (leaf) nodes.
The top or starting node is the root node.
Internal nodes are drawn as rectangles.
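To make the structure concrete, here is one minimal way to represent such nodes in code (a sketch; the field names are illustrative, not taken from any particular library):

```python
# A minimal sketch of a decision-tree node: internal nodes carry a test
# attribute and branches; leaf nodes carry a class label.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None    # test at an internal (rectangle) node
    branches: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> child
    label: Optional[str] = None        # class at a terminal (leaf/circle) node

# Root node testing "Refund"; its Yes branch ends at a leaf labeled "No"
root = Node(attribute="Refund", branches={"Yes": Node(label="No")})
print(root.branches["Yes"].label)      # -> No
```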
Example of a Decision Tree

Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree. The splitting attributes are Refund, MarSt, and TaxInc:

Refund = Yes                 -> NO
Refund = No:
  MarSt = Married            -> NO
  MarSt = Single or Divorced:
    TaxInc < 80K             -> NO
    TaxInc >= 80K            -> YES
Another Example of a Decision Tree

The same training data admits a different tree, this time with MarSt at the root:

MarSt = Married              -> NO
MarSt = Single or Divorced:
  Refund = Yes               -> NO
  Refund = No:
    TaxInc < 80K             -> NO
    TaxInc >= 80K            -> YES

There could be more than one tree that fits the same data!
Decision Tree Classification Task

The same two-phase workflow, specialized to trees: a tree induction algorithm learns a decision tree from the training set (induction, "learn model"), and the decision tree is then applied to the test set to deduce the missing class labels (deduction, "apply model"). The training and test sets are the ones shown above.
How They Work
Decision rules partition the sample of data.
A terminal node (leaf) indicates the class assignment.
The tree partitions the samples into mutually exclusive groups, one group for each terminal node.
All paths
  start at the root node
  end at a leaf
Each path represents a decision rule:
  the joining (AND) of all the tests along that path
  separate paths that result in the same class are disjunctions (ORs)
All paths are mutually exclusive:
  for any one case, only one path will be followed
  false decisions on the left branch, true decisions on the right branch (see the sketch below)
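A sketch of how each root-to-leaf path becomes a conjunctive ("AND") rule. The nested-tuple encoding of the example tree is an illustrative assumption:

```python
# Each internal node is (attribute, {outcome: child}); each leaf is a label.
tree = ("Refund", {
    "Yes": "NO",
    "No": ("MarSt", {
        "Married": "NO",
        "Single/Divorced": ("TaxInc", {
            "< 80K": "NO",
            ">= 80K": "YES",
        }),
    }),
})

def rules(node, tests=()):
    """Each root-to-leaf path is the AND of its tests: one rule per leaf."""
    if isinstance(node, str):                        # leaf: class label
        print("IF " + " AND ".join(tests) + f" THEN Cheat = {node}")
        return
    attribute, branches = node
    for outcome, child in branches.items():
        rules(child, tests + (f"{attribute} = {outcome}",))

rules(tree)  # prints one conjunctive decision rule per leaf
```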
Apply Model to Test Data

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and, at each node, follow the branch that matches the test record:
Step 1: the root tests Refund. The record has Refund = No, so take the No branch.
Step 2: the next node tests MarSt. The record has Marital Status = Married, so take the Married branch.
Step 3: the Married branch ends in the leaf NO.

Assign Cheat = "No" to the test record. (The TaxInc test is never reached for this record.)
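The walkthrough above can be written directly as nested tests; a minimal sketch (the record keys and the numeric encoding of the 80K threshold are illustrative assumptions):

```python
# The example tree (Refund at the root) as a classification function.
def classify(record):
    if record["Refund"] == "Yes":
        return "No"                                   # leaf under Refund = Yes
    if record["Marital Status"] == "Married":
        return "No"                                   # leaf under MarSt = Married
    # Single or Divorced: fall through to the income test
    return "No" if record["Taxable Income"] < 80 else "Yes"

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80}
print(classify(test_record))  # -> "No", matching the walkthrough above
```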
Entropy
S is a sample of training examples;
p is the number of positive examples in S and n is the number of negative examples.

For a node containing p positive and n negative examples, calculate
I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

When an attribute splits S into subsets indexed by i, calculate the weighted average
Entropy(S) = Σ_i ((p_i + n_i)/(p + n)) · I(p_i, n_i)
Splitting Based on INFO...
Information Gain:
GAIN_split = Entropy(p) - Σ_{i=1}^{k} (n_i/n) · Entropy(i)
where parent node p is split into k partitions, n_i is the number of records in partition i, and n is the total number of records in p.
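A sketch of both formulas as code, assuming binary class counts (p, n) as in the slides:

```python
# Entropy I(p, n) and information gain of a split, per the formulas above.
from math import log2

def I(p, n):
    """Entropy of a node with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                          # 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

def gain(parent_counts, partitions):
    """Information gain of a split; partitions is a list of (p_i, n_i)."""
    n_total = sum(parent_counts)
    weighted = sum((p + n) / n_total * I(p, n) for p, n in partitions)
    return I(*parent_counts) - weighted

print(round(I(3, 7), 2))                          # 0.88, as computed below
print(round(gain((3, 7), [(0, 3), (3, 4)]), 2))   # Gain(Refund) = 0.19
```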
Building Decision Tree
Step 1: Calculate the entropy of the "Class" attribute. Here the class attribute is "Cheat".

Refund  Marital Status  Taxable Inc.  Cheat
Yes     Single          125K          No
No      Married         100K          No
No      Single          70K           No
Yes     Married         120K          No
No      Divorced        95K           Yes
No      Married         60K           No
Yes     Divorced        220K          No
No      Single          85K           Yes
No      Married         75K           No
No      Single          90K           Yes

p = 3, n = 7, p + n = 10
Entropy(Cheat) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(3/10) log2(3/10) - (7/10) log2(7/10)
               = 0.88
(using log2 x = log x / log 2 on a base-10 calculator)
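A quick numeric check of this arithmetic:

```python
# Step 1 check: entropy of the Cheat attribute (p = 3, n = 7).
from math import log2
entropy_cheat = -(3/10) * log2(3/10) - (7/10) * log2(7/10)
print(round(entropy_cheat, 2))  # 0.88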
Building Decision Tree
Step 2: Calculate the entropy of each of the other attributes: Refund, Marital Status, and Taxable Income.

Entropy (Refund):
Value  p_i  n_i  I(p_i, n_i)
Yes    0    3    0
No     3    4    0.98

I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0
I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98

E(Refund) = (3/10) × 0 + (7/10) × 0.98 = 0.686
Gain(Refund) = Entropy(Cheat) - E(Refund) = 0.88 - 0.686 = 0.19
Building Decision Tree (continued)

Entropy (Marital Status):
Value     p_i  n_i  I(p_i, n_i)
Single    2    2    1
Married   0    4    0
Divorced  1    1    1

I(2,2) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1
I(0,4) = -(0/4) log2(0/4) - (4/4) log2(4/4) = 0
I(1,1) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

E(Marital Status) = (4/10) × 1 + (4/10) × 0 + (2/10) × 1 = 0.6
Gain(Marital Status) = Entropy(Cheat) - E(Marital Status) = 0.88 - 0.6 = 0.28
Building Decision Tree (continued)

Entropy (Taxable Income), discretized at 80K:
Value    p_i  n_i  I(p_i, n_i)
>= 80K   3    4    0.98
<  80K   0    3    0

I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98
I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

E(Taxable Inc.) = (7/10) × 0.98 + (3/10) × 0 = 0.686
Gain(Taxable Inc.) = Entropy(Cheat) - E(Taxable Inc.) = 0.88 - 0.686 = 0.19
Building Decision Tree (continued)

Which attribute has the largest Gain?
Gain(Refund)         = 0.19
Gain(Marital Status) = 0.28 (largest)
Gain(Taxable Inc.)   = 0.19
Marital Status is therefore chosen as the splitting attribute at the root.
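A sketch that recomputes all three gains from the 10-record table, to confirm the comparison (the tuple encoding of the records is an assumption for illustration):

```python
# Recompute Gain(Refund), Gain(Marital Status), Gain(Taxable Income).
from math import log2

data = [  # (Refund, Marital Status, Taxable Income, Cheat)
    ("Yes", "Single",   125, "No"),  ("No", "Married", 100, "No"),
    ("No",  "Single",    70, "No"),  ("Yes", "Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No", "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"),  ("No", "Single",   85, "Yes"),
    ("No",  "Married",   75, "No"),  ("No", "Single",   90, "Yes"),
]

def I(p, n):
    total = p + n
    return -sum(c/total * log2(c/total) for c in (p, n) if c)

def gain(key):
    """Information gain of splitting the table on the given column."""
    parent = I(sum(r[3] == "Yes" for r in data),
               sum(r[3] == "No" for r in data))
    expected = 0.0
    for value in {key(r) for r in data}:
        subset = [r for r in data if key(r) == value]
        p = sum(r[3] == "Yes" for r in subset)
        expected += len(subset)/len(data) * I(p, len(subset) - p)
    return parent - expected

print(round(gain(lambda r: r[0]), 2))        # Refund:          0.19
print(round(gain(lambda r: r[1]), 2))        # Marital Status:  0.28
print(round(gain(lambda r: r[2] >= 80), 2))  # Taxable Income:  0.19
```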
Building Decision Tree (continued)

Step 2 (continued): Split the records according to Marital Status.

The root node tests MarSt. The Married branch is pure (every Married record has Cheat = No), so it becomes a leaf labeled NO:

Marital Status  Refund  Taxable Inc.  Cheat
Married         No      100K          No
Married         Yes     120K          No
Married         No      60K           No
Married         No      75K           No

The Single, Divorced branch still mixes both classes and must be split further:

Marital Status  Refund  Taxable Inc.  Cheat
Single          Yes     125K          No
Single          No      70K           No
Single          No      85K           Yes
Single          No      90K           Yes
Divorced        No      95K           Yes
Divorced        Yes     220K          No
Building Decision Tree (continued)

Step 3: Calculate entropies within the Marital Status = Single, Divorced branch. This branch holds 6 records with p = 3, n = 3, so the entropy at this node is I(3,3) = 1.

Entropy (Refund) within this branch:
Value  p_i  n_i  I(p_i, n_i)
Yes    0    2    0
No     3    1    0.81

I(0,2) = -(0/2) log2(0/2) - (2/2) log2(2/2) = 0
I(3,1) = -(3/4) log2(3/4) - (1/4) log2(1/4) = 0.81

E(Refund) = (2/6) × 0 + (4/6) × 0.81 = 0.54
Gain(Refund) = 1 - 0.54 = 0.46
Building Decision Tree (continued)

Step 3 (continued): Entropy (Taxable Income) within the same branch, discretized at 80K:
Value    p_i  n_i  I(p_i, n_i)
<  80K   0    1    0
>= 80K   3    2    0.97

I(0,1) = -(0/1) log2(0/1) - (1/1) log2(1/1) = 0
I(3,2) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.97

E(Taxable Inc.) = (1/6) × 0 + (5/6) × 0.97 = 0.81
Gain(Taxable Inc.) = 1 - 0.81 = 0.19

Gain(Refund) is larger, so Refund is chosen as the test at the Single, Divorced node.
Building Decision Tree (final result)

MarSt = Married              -> NO
MarSt = Single or Divorced:
  Refund = Yes               -> NO
  Refund = No:
    TaxInc < 80K             -> NO
    TaxInc >= 80K            -> YES

Records at the Refund = Yes leaf (all Cheat = No):
Marital Status  Refund  Taxable Inc.  Cheat
Single          Yes     125K          No
Divorced        Yes     220K          No

Records at the TaxInc node:
Marital Status  Refund  Taxable Inc.       Cheat
Single          No      70K   (< 80K)      No
Single          No      85K   (>= 80K)     Yes
Single          No      90K   (>= 80K)     Yes
Divorced        No      95K   (>= 80K)     Yes

Decision Rules:
IF Marital St. = Married THEN Cheat = No
IF Marital St. = (Single OR Divorced) AND Refund = Yes THEN Cheat = No
IF Marital St. = (Single OR Divorced) AND Refund = No AND TaxInc < 80K THEN Cheat = No
IF Marital St. = (Single OR Divorced) AND Refund = No AND TaxInc >= 80K THEN Cheat = Yes
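These four rules translate directly into an ordered rule list; a sketch with illustrative record keys (income in thousands, e.g. 80 = 80K):

```python
# The decision rules as (condition, class) pairs, tried in order.
# Later rules implicitly assume earlier conditions failed, which
# encodes the AND of the negated earlier tests along each path.
RULES = [
    (lambda r: r["marital"] == "Married", "No"),
    (lambda r: r["refund"] == "Yes",      "No"),   # Single or Divorced here on
    (lambda r: r["income"] < 80,          "No"),
    (lambda r: True,                      "Yes"),  # Refund = No, TaxInc >= 80K
]

def cheat(record):
    for condition, label in RULES:
        if condition(record):
            return label

print(cheat({"marital": "Single",  "refund": "No", "income": 90}))  # Yes
print(cheat({"marital": "Married", "refund": "No", "income": 90}))  # No
```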
Building Decision Tree: Second Example

Age      Income  Cartype  Class
Adult    low     Family   High
Adult    high    Sports   High
Younger  low     Sports   Low
Old      low     Family   Low
Younger  high    Truck    High
Old      high    Truck    High

Step 1: Calculate the entropy of the "Class" attribute.
p = 4, n = 2, p + n = 6
Entropy(Class) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(4/6) log2(4/6) - (2/6) log2(2/6)
               = 0.92
Building Decision Tree (continued)

Step 2:
Entropy of Age? Gain of Age?
Entropy of Income? Gain of Income?
Entropy of Cartype? Gain of Cartype?
Which attribute's Gain is larger?
Split the table according to the larger Gain, calculate Entropy and Gain again if needed, and build the tree.
Building Decision Tree (continued)

Step 2 results:
Entropy of Age     = 0.667, Gain of Age     = 0.25
Entropy of Income  = 0.459, Gain of Income  = 0.46
Entropy of Cartype = 0.667, Gain of Cartype = 0.25

Income has the largest Gain, so the table is split on Income:

Income  Age      Cartype  Class
high    Adult    Sports   High
high    Younger  Truck    High
high    Old      Truck    High
low     Adult    Family   High
low     Younger  Sports   Low
low     Old      Family   Low

The Income = High branch is pure (all Class = High) and becomes a leaf; the Income = Low branch needs a further split.
Building Decision Tree (final result)

Income = High            -> High
Income = Low:
  Age = Adult            -> High
  Age = Younger or Old   -> Low

Decision Rules:
IF Income = High THEN Class = High
IF Income = Low AND Age = Adult THEN Class = High
IF Income = Low AND (Age = Younger OR Age = Old) THEN Class = Low
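Putting the whole procedure together, here is a compact ID3-style sketch that reproduces this tree from the six records. The dict-based encodings are illustrative assumptions, not a standard API:

```python
# A compact ID3-style learner: pick the attribute with the largest gain,
# split, and recurse until every subset is pure.
from math import log2
from collections import Counter

rows = [
    {"Age": "Adult",   "Income": "low",  "Cartype": "Family", "Class": "High"},
    {"Age": "Adult",   "Income": "high", "Cartype": "Sports", "Class": "High"},
    {"Age": "Younger", "Income": "low",  "Cartype": "Sports", "Class": "Low"},
    {"Age": "Old",     "Income": "low",  "Cartype": "Family", "Class": "Low"},
    {"Age": "Younger", "Income": "high", "Cartype": "Truck",  "Class": "High"},
    {"Age": "Old",     "Income": "high", "Cartype": "Truck",  "Class": "High"},
]

def entropy(subset):
    counts = Counter(r["Class"] for r in subset)
    total = len(subset)
    return -sum(c/total * log2(c/total) for c in counts.values())

def gain(subset, attr):
    split = 0.0
    for value in {r[attr] for r in subset}:
        part = [r for r in subset if r[attr] == value]
        split += len(part)/len(subset) * entropy(part)
    return entropy(subset) - split

def id3(subset, attrs):
    labels = {r["Class"] for r in subset}
    if len(labels) == 1:                      # pure node -> leaf
        return labels.pop()
    best = max(attrs, key=lambda a: gain(subset, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in subset if r[best] == v], rest)
                   for v in {r[best] for r in subset}}}

print(id3(rows, ["Age", "Income", "Cartype"]))
# -> Income at the root; the low branch splits on Age, matching the tree above
```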
Building Decision Tree: Try Yourself
https://kindsonthegenius.com/blog/2018/04/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain.html
Under-fitting and Over-fitting
Producing a model that does not perform well even on the training data is called under-fitting. Typically, when this happens we decide the model is not good enough and keep looking for a better one.
Producing a model that performs well on the training data but generalizes poorly to new data is called over-fitting. This can involve learning the noise in the data rather than the underlying signal.
Underfitting and Overfitting
Figure: under-fitting vs. just right vs. over-fitting.
Under-fitting corresponds to a very small (too simple) model; over-fitting to a very large (too complex) model relative to the number of training samples.
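One common way to see both effects with decision trees is to vary the tree depth and compare training and test accuracy; a sketch using scikit-learn, with an illustrative synthetic dataset:

```python
# As depth grows, training accuracy keeps rising while test accuracy
# stalls or drops -- the signature of over-fitting. Shallow trees that
# score poorly on both sets are under-fitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for depth in (1, 3, 5, 10, None):       # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(depth,
          round(tree.score(X_tr, y_tr), 2),   # training accuracy
          round(tree.score(X_te, y_te), 2))   # test accuracy
```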