Decision Tree Classification
A decision tree builds classification or regression models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while, at the same time, an associated decision
tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy).
A leaf node (e.g., Play) represents a classification or decision.
The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
Decision trees can handle both categorical and numerical data.
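As a rough sketch of this structure (Python; the attribute and branch names follow the Outlook example above, and the leaf labels here are placeholder classifications, since the actual tree for this data is only derived later in these notes), a decision node can be represented as a nested dict and a leaf as a plain class label:

# A decision node is a dict keyed by its attribute; each branch maps an
# attribute value to another decision node or to a leaf (a classification).
tree = {"Outlook": {"Sunny": "No", "Overcast": "Yes", "Rainy": "Yes"}}

def predict(node, record):
    """Follow the record's attribute values from the root down to a leaf."""
    if not isinstance(node, dict):        # leaf node: return the classification
        return node
    attribute, branches = next(iter(node.items()))
    return predict(branches[record[attribute]], record)

print(predict(tree, {"Outlook": "Overcast"}))   # prints: Yes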
Information theory
Introduced by Claude Shannon, information theory quantifies entropy, a key measure of information
usually expressed as the average number of bits needed to store or communicate one symbol in a
message.
Information theory measures information in bits.
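A minimal sketch of this idea in Python: the self-information -log2(p) of a single symbol is the quantity whose average gives the bit counts above (the function name is just for illustration).

from math import log2

def information_bits(p):
    """Information content, in bits, of observing a symbol of probability p."""
    return -log2(p)

print(information_bits(0.5))    # 1.0 bit, e.g. the outcome of a fair coin toss
print(information_bits(0.125))  # 3.0 bits: rarer symbols carry more information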
ID3 Algorithm:
Based on a greedy search through the space of all possible branches, with no backtracking
Includes all predictors
Entropy
A decision tree is built top-down from a root node and involves partitioning the data into
subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to
calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is
zero; if the sample is equally divided, the entropy is one.
Entropy(p1, p2, …, pn) = -p1 log2(p1) - p2 log2(p2) - … - pn log2(pn)
To build a decision tree, we need to calculate two types of entropy using frequency tables as
follows:
a) Entropy using the frequency table of one attribute: E(S) = Σ -p(i) log2 p(i), where p(i) is the
proportion of records in class i.
b) Entropy using the frequency table of two attributes: E(T, X) = Σ P(c) × E(c), i.e., the entropy of
each branch of attribute X weighted by the proportion of records that fall into that branch.
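A minimal Python sketch of these two calculations, assuming the frequency tables are supplied as plain lists of class counts (the helper names are illustrative):

from math import log2

def entropy(counts):
    """a) Entropy from the class counts of a single frequency table."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def expected_entropy(branch_counts):
    """b) Entropy after a split: each branch's entropy weighted by its share of records."""
    total = sum(sum(branch) for branch in branch_counts)
    return sum((sum(branch) / total) * entropy(branch) for branch in branch_counts)

print(entropy([4, 0]))   # 0.0 -- a completely homogeneous sample
print(entropy([7, 7]))   # 1.0 -- an equally divided sample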
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the attribute that returns the highest information
gain (i.e., the most homogeneous branches).
Step 1: Calculate entropy of the target.
Step 2: The dataset is then split on the different attributes. The entropy for each branch is
calculated and then added proportionally to get the total entropy for the split. The resulting
entropy is subtracted from the entropy before the split; the result is the Information Gain, or
decrease in entropy.
Step 3: Choose attribute with the largest information gain as the decision node, divide the
dataset by its branches and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.
Step 4b: A branch with entropy more than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
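A compact sketch of Steps 1-5 in Python, assuming the records are a list of dicts keyed by attribute name with the class stored under a "Play" column as in the example below; the entropy helper from the earlier sketch is repeated so the code stands alone:

from collections import Counter
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(records, attribute, target):
    """Step 2: entropy of the target minus the expected entropy after the split."""
    before = entropy(Counter(r[target] for r in records).values())
    after = 0.0
    for value in set(r[attribute] for r in records):
        subset = [r for r in records if r[attribute] == value]
        after += len(subset) / len(records) * entropy(Counter(r[target] for r in subset).values())
    return before - after

def id3(records, attributes, target="Play"):
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:                         # Step 4a: entropy 0, make a leaf
        return labels[0]
    if not attributes:                                # no predictors left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(records, a, target))  # Step 3
    return {best: {value: id3([r for r in records if r[best] == value],
                              [a for a in attributes if a != best], target)
                   for value in set(r[best] for r in records)}}   # Steps 4b/5: recurse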
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping each path from the root
node to a leaf node, one rule per leaf.
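Assuming the nested-dict tree representation used in the sketches above, this mapping can be written as a short recursive walk over every root-to-leaf path (the helper name is illustrative):

def tree_to_rules(node, conditions=()):
    """Enumerate each root-to-leaf path of a nested-dict tree as an IF ... THEN rule."""
    if not isinstance(node, dict):                    # reached a leaf: emit one rule
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {lhs} THEN {node}"]
    attribute, branches = next(iter(node.items()))
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules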
Detailed Explanation of the Calculations
Overall Entropy
The target ("Play") has 9 "Yes" and 5 "No" out of 14 records:
E(9,5) = E(9/14, 5/14) = -[(9/14) log2 (9/14) + (5/14) log2 (5/14)] = 0.940
Expected new entropy for each attribute
Outlook
Outlook attribute contains 3 distinct values: Overcast, Rainy, Sunny
Overcast: 4 records; 4 are “Yes”
E(4,0) = -[(4/4) log2 (4/4)] = 0
Rainy: 5 Records; 3 are “Yes”, 2 are “No”
E(3,2) = E(0.6, 0.4)
= -[(0.6) log2 (0.6) + (0.4) log2 (0.4)]
= -[(0.6) (-0.7369) + (0.4) (-1.3219)]
= 0.971
Sunny: 5 records; 2 are "Yes", 3 are "No"
E(2,3) = E(0.4, 0.6)
= -[(0.4) log2 (0.4) + (0.6) log2 (0.6) ]
= 0.971
Thus, Expected New Entropy =
(4/14) * 0 + (5/14) * 0.971 + (5/14) * 0.971
= 0.693
Gain (Outlook): 0.940 – 0.693 = 0.247
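The same numbers can be checked directly from the entropy formula; a small self-contained sketch, where E(yes, no) is simply the entropy of a (Yes, No) count pair:

from math import log2

def E(yes, no):
    """Entropy of a (Yes, No) count pair, as in the hand calculations above."""
    total = yes + no
    return sum(-(c / total) * log2(c / total) for c in (yes, no) if c > 0)

overall = E(9, 5)                                             # 0.940
new = (4/14) * E(4, 0) + (5/14) * E(3, 2) + (5/14) * E(2, 3)  # expected new entropy
print(round(overall - new, 3))                                # 0.247 = Gain(Outlook)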
Temp
Temp attribute contains 3 distinct values: Hot, Mild, Cool
Hot: 4 records; 2 are “Yes”, 2 are “No”
E(2,2) = E(0.5, 0.5) = -[(0.5) log2 (0.5) + (0.5) log2 (0.5)]
= 1.0 (since log2 (0.5) = -1)
Mild: 6 records; 4 are “Yes”, 2 are “No”
E(4,2) = E(0.666, 0.333) = -[(0.666) log2 (0.666) + (0.333) log2 (0.333)]
= -[(0.666) (-0.585) + (0.333) (-1.585)] = 0.918
Cool: 4 records; 3 are "Yes", 1 is "No"
E(3,1) = E(0.75, 0.25)= -[(0.75) log2 (0.75) + (0.25) log2 (0.25)]
= -[(0.75) (-0.4150) + (0.25) (-2)]
= 0.81125
Thus, Expected New Entropy =
(4/14) * (1) + (6/14) * 0.918 + (4/14) * 0.81125
= 0.9109
Gain (Temp): 0.940 – 0.9109 = 0.029
Similarly, you can compute
Gain (Humidity) = .152
Gain (Windy) = .048
The gain from splitting at Outlook is the largest, so we take Outlook as the decision node and divide
the dataset by its branches.
The Overcast branch is already homogeneous (all "Yes"), so it becomes a leaf node (Step 4a); repeat
the steps for the Rainy and Sunny branches.
(Calculation sheets circulated)
Rainy
Overall Entropy
E(Rainy): Yes - 3; No - 2
E(3,2) = E(0.6, 0.4) = -[(0.6) log2 (0.6) + (0.4) log2 (0.4)] = -[(0.6) (-0.737) + (0.4) (-1.322)] = 0.971
Now compute for temp, humidity and windy
Temp
2 distinct values: mild, cool
Mild: yes: 2; no: 1
E(2,1) = E(0.67, 0.33) = -[(0.67) log2 (0.67) + (0.33) log2 (0.33)] = -[(0.67) (-0.577) + (0.33) (-1.599)] = 0.914
Cool: yes: 1; no: 1
E(1,1) = E(0.5, 0.5) = -[(0.5) log2 (0.5) + (0.5) log2 (0.5)] = -[(0.5) (-1) + (0.5) (-1)] = 1.0
Expected New Entropy for Temp= (3/5) (0.914) + (2/5)(1.0)= 0.9484
Gain (Temp) = 0.971 – 0.9484 = .0226
Humidity
2 distinct values: high, normal
high: yes: 1; no: 1
E(1,1) = E(0.5, 0.5) = 1.0
normal: yes: 2; no: 1
E(2,1) = E(0.67, 0.33) = 0.914
Expected New Entropy for humidity= (2/5) (1.0) + (3/5) (0.914)= 0.9484
Gain (humidity) = 0.971 – 0.9484 = .0226
Windy
2 distinct values: false, true
false: yes: 3; no: 0
E(3,0) = -[(3/3) log2 (3/3)] = 0
true: yes: 0; no: 2
E(0,2) = -[(2/2) log2 (2/2)] = 0
Expected New Entropy for Windy = (3/5) (0) + (2/5) (0) = 0
Gain (windy) = 0.971 – 0 = .971 (Maximum)
Thus, split at Windy
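A quick numerical check of this branch from the (Yes, No) counts above; the exact gains for Temp and Humidity come out near 0.020 (the 0.0226 figures above reflect rounding the proportions to 0.67 and 0.33):

from math import log2

def E(yes, no):
    """Entropy of a (Yes, No) count pair."""
    total = yes + no
    return sum(-(c / total) * log2(c / total) for c in (yes, no) if c > 0)

before = E(3, 2)                                        # 0.971
print(before - ((3/5) * E(2, 1) + (2/5) * E(1, 1)))     # Temp:     ~0.020
print(before - ((2/5) * E(1, 1) + (3/5) * E(2, 1)))     # Humidity: ~0.020
print(before - ((3/5) * E(3, 0) + (2/5) * E(0, 2)))     # Windy:     0.971 (maximum)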
Sunny
Overall Entropy
E(Sunny): Yes - 2; No - 3
E(2,3) = E(0.4, 0.6) = 0.971
Now compute expected new entropy for temp, humidity and windy
Temp
3 distinct values: hot, mild, cool
Hot: yes: 0; no: 2
E(0,2) = 0
Mild: yes: 1; no: 1
E(1,1) = 1
Cool: yes: 1; no: 0
E(1,0) = 0
Expected New Entropy for Temp= (2/5) (0) + (2/5)(1.0) + (1/5)(0)= 0.40
Gain (Temp) = 0.971 – 0.400 = .571
Humidity
2 distinct values: high, normal
high: yes: 0; no: 3
E(0,3) = 0
normal: yes: 2; no: 0
E(2,0) = 0
Expected New Entropy for humidity= (3/5) (0) + (2/5) (0)= 0
Gain (humidity) = 0.971 – 0 = .971 (Maximum)
Windy
2 distinct values: false, true
false: yes: 1; no: 2
E(1,2) = 0.914
true: yes: 1; no: 1
E(1,1) = 1
Expected New Entropy for Windy = (3/5) (0.914) + (2/5) (1) = 0.9484
Gain (windy) = 0.971 – 0.9484= .0226
Thus, split at Humidity.
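Putting the three splits together, the finished tree can be written out directly as decision rules; a short sketch in Python, assuming value spellings such as "False" and "Normal" for the record fields:

def play(record):
    """The finished tree from the walkthrough above, written as decision rules."""
    if record["Outlook"] == "Overcast":                        # pure branch: always Yes
        return "Yes"
    if record["Outlook"] == "Rainy":                           # Rainy splits on Windy
        return "Yes" if record["Windy"] == "False" else "No"
    return "Yes" if record["Humidity"] == "Normal" else "No"   # Sunny splits on Humidity

print(play({"Outlook": "Sunny", "Humidity": "Normal", "Windy": "True"}))  # Yes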