Module 5 Notes
INTRODUCTION TO DECISION TREES

Introduction
Decision trees are one of the popular machine learning methods (Rokach &
Maimon, 2010). They are represented hierarchically, so they look like an
inverted tree with the widest portion at the bottom. The tree structure
represents a partition of the attribute space. As an example, consider using
the attribute gender to decide whether a person invited to a wedding will
attend. The division can be expressed as shown in Figure 1.

Figure 1: Very simple decision tree for wedding attendance


A typical decision tree has many more nodes, as shown in Figure 2.
Example application areas for decision trees include medical diagnosis, credit
risk assessment, and diagnosing the cause of equipment malfunction.
Decision makers prefer less complex decision trees because they are
easier to understand. Such trees are also usually less prone to overfitting. To
reduce the complexity of a decision tree, methods such as pruning (removing
some nodes) are employed.
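As a brief illustrative aside (not part of the original notes), scikit-learn, which is used in the implementation section later, exposes complexity controls such as max_depth and cost-complexity pruning via ccp_alpha. The tiny dataset below is made up purely for illustration.

from sklearn.tree import DecisionTreeClassifier

# Made-up dataset (two encoded attributes, two classes), purely for illustration.
X = [[0, 1], [1, 1], [1, 0], [0, 0]]
y = [0, 1, 1, 0]

# max_depth caps how many questions may be asked along any path through the tree.
shallow_tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# ccp_alpha enables cost-complexity pruning: larger values remove more nodes.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)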

Decision Tree Structure


The top node of the tree is known as the root node. The other nodes that have
at least one child are known as internal nodes. The nodes that do not have any
children are known as leaf nodes. A leaf node represents the decision that is
made if a particular path through the tree is followed. The root node and the
internal nodes represent attributes of the data, and the branches represent the
attribute values. Given a particular sample of the data and its attribute values,
you can follow only one path through the tree. You will end up in a leaf node,
which represents the classification or regression value for that particular
sample. The challenge in decision tree learning is to create the decision tree.

Figure 2: A decision tree for deciding whether to play a soccer game based on
the weather status
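As a minimal sketch of this structure (illustrative only; the names Node and classify are my own choices, not from these notes), a tree over categorical attributes can be represented and traversed as follows:

# Minimal illustrative representation of a decision tree over categorical data.
# Internal nodes test an attribute; leaf nodes carry a class label.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at this node (None for leaves)
        self.branches = {}           # maps an attribute value to a child Node
        self.label = label           # class label (set only for leaf nodes)

def classify(node, sample):
    """Follow the single path selected by a sample (a dict of attribute -> value)
    until a leaf is reached, then return that leaf's label."""
    while node.label is None:
        node = node.branches[sample[node.attribute]]
    return node.label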

ID3 Algorithm
The classical algorithm for creating decision trees is the Iterative
Dichotomiser 3 (ID3) algorithm, invented by Ross Quinlan. It starts by selecting
the most appropriate attribute for the root node. Nodes are then added to
the branches of the root node based on a ranking criterion applied to the
remaining attributes, with the highest-ranked attribute being added next. The
ranking criterion used to select an attribute is known as information gain. The
information gain of an attribute is a measure of how well it partitions data into
separate groups (classes). An attribute with high information gain separates
the data into groups with low entropy. This means that the data in each of the
groups have high purity (are mostly of the same class). Information gain is
calculated from the entropy.

Given two attributes A and B, with A having higher information gain than B,
you will select A for addition to the tree.

Figure 3: A group with high entropy (impure)

Figure 4: A group with low entropy (pure)
The formula for the calculation of information entropy is given below:

Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

where S is a group, c is the number of classes in S, and p_i is the proportion of
each class in the group.
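As an illustrative helper (a sketch, not part of the implementation section later in these notes), the formula translates directly into Python:

import math

def entropy(class_counts):
    """Entropy of a group, given a list of per-class counts, e.g. [9, 5]."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)

# A 9-Yes / 5-No group (used in the worked example later) has entropy of about 0.94.
print(round(entropy([9, 5]), 2))   # 0.94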
The information gain is calculated per attribute as given in the formula below:

Gain(A, S) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_{A_i}|}{|S|} \, Entropy(S_{A_i})

where A is the attribute, S is a dataset (group of data points), and S_{A_i},
i = 1..n, are the subgroups created by separating the members of the dataset
based on the possible values of attribute A.
As an example, given a dataset S where one of the attributes is gender, using
the value of gender we can divide S into two subgroups: the subgroup S_{A_1}
where the gender is female and the subgroup S_{A_2} where the gender is male.
In the formula, we subtract the sum of the weighted entropies of the subgroups
(S_{A_1} and S_{A_2}) from the entropy of the main group (S) to get the gain
of attribute A.
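Continuing the illustrative sketch and reusing the entropy helper above (the function name information_gain and the dict-per-row data representation are assumptions for the example), the weighted subtraction can be written as:

from collections import Counter

def information_gain(rows, attribute, target):
    """Gain of splitting `rows` (a list of dicts) on `attribute`,
    where `target` names the class column."""
    parent_counts = list(Counter(row[target] for row in rows).values())
    gain = entropy(parent_counts)
    total = len(rows)
    # Subtract the weighted entropy of each subgroup S_Ai.
    for value in set(row[attribute] for row in rows):
        subgroup = [row for row in rows if row[attribute] == value]
        counts = list(Counter(row[target] for row in subgroup).values())
        gain -= len(subgroup) / total * entropy(counts)
    return gain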
One weakness of the information gain measure is that it tends to favour
attributes that have many possible values. As an example, consider the Sno
attribute in the dataset in Table 1. If we were to split the data using this
attribute, each sample would be in its own group, i.e. we would have fourteen
groups with each group having only one member. The entropy of each of the
subgroups would be zero, so from the gain formula we would be subtracting zero
from Entropy(S). Hence this attribute would appear to be the best to split on.
However, the Sno attribute is not a relevant attribute. To overcome this
problem, another measure called gain ratio is used. This measure penalizes
attributes that split data into too many subgroups. It is implemented by
dividing the gain by a value called the split information of the attribute,
which is calculated as shown in the formula below:

splitInformation(A, S) = \sum_{i=1}^{k} -\frac{count(s_i)}{count(S)} \log_2 \frac{count(s_i)}{count(S)}

where s_i are the subgroups that result from splitting dataset S by attribute A,
and k is the number of subgroups. The higher the number of subgroups that an
attribute creates, the higher its split information value.
The resulting formula for gain ratio is therefore:

GainRatio(A, S) = \frac{Gain(A, S)}{SplitInformation(A, S)}

The attribute with the highest gain ratio is then added to the tree next.
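The same illustrative helpers can be extended with split information and gain ratio (again a sketch, reusing entropy, information_gain, math and Counter from the snippets above):

def split_information(rows, attribute):
    """Split information of `attribute` over the rows of dataset S."""
    total = len(rows)
    counts = Counter(row[attribute] for row in rows).values()
    return sum(-(c / total) * math.log2(c / total) for c in counts)

def gain_ratio(rows, attribute, target):
    """Gain ratio = information gain divided by split information."""
    si = split_information(rows, attribute)
    # A single-valued attribute has split information 0 and carries no information.
    return information_gain(rows, attribute, target) / si if si > 0 else 0.0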
Example 1. In this example, the problem you are going to solve is to calculate
the gain ratio of the Outlook attribute given the dataset below (Table 1).
All the data in this dataset belong to one group.
To get the entropy of the entire group, we get the proportion of each class and
apply the formula for entropy.

Table 1: Weather data
Sno  Outlook   Temperature  Humidity  Windy   Play
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

Total samples = 14, total(Yes) = 9, total(No) = 5, so p(Yes) = 9/14 and
p(No) = 5/14. We will use log to base 2:

Entropy(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.94
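This value can be checked numerically with a couple of lines of Python (illustrative only):

import math
p_yes, p_no = 9 / 14, 5 / 14
print(-p_yes * math.log2(p_yes) - p_no * math.log2(p_no))   # roughly 0.940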
If we use the attribute outlook, we can divide the data into the following
subgroups:

     Outlook   Temperature  Humidity  Windy   Play
1    Overcast  Hot          High      Weak    Yes
2    Overcast  Cool         Normal    Strong  Yes
3    Overcast  Mild         High      Strong  Yes
4    Overcast  Hot          Normal    Weak    Yes
5    Rain      Cool         Normal    Strong  No
6    Rain      Mild         High      Strong  No
7    Rain      Mild         High      Weak    Yes
8    Rain      Cool         Normal    Weak    Yes
9    Rain      Mild         Normal    Weak    Yes
10   Sunny     Hot          High      Weak    No
11   Sunny     Hot          High      Strong  No
12   Sunny     Mild         High      Weak    No
13   Sunny     Cool         Normal    Weak    Yes
14   Sunny     Mild         Normal    Strong  Yes

The gain of the Outlook attribute given the subgroups is therefore:

Gain(Outlook, S) = 0.94 - \left[ \frac{4}{14}(0) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) \right] = 0.246

splitInformation(Outlook, S) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{5}{14}\log_2\frac{5}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.58

GainRatio(Outlook, S) = \frac{0.246}{1.58} = 0.16

GainRatio(Temperature, S) = 0.019
GainRatio(Humidity, S) = 0.15
GainRatio(Windy, S) = 0.045
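As a quick numerical check of the Outlook figures above (an illustrative snippet reusing the entropy helper and the math import from the earlier sketches):

# Overcast: 4 Yes / 0 No, Rain: 3 Yes / 2 No, Sunny: 2 Yes / 3 No
weighted = (4/14) * entropy([4, 0]) + (5/14) * entropy([3, 2]) + (5/14) * entropy([2, 3])
gain = 0.94 - weighted
split_info = -(4/14) * math.log2(4/14) - 2 * (5/14) * math.log2(5/14)
print(round(gain, 3), round(split_info, 2), round(gain / split_info, 2))   # 0.246 1.58 0.16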
Outlook is the attribute with the highest gain ratio, so it is used to make the
root node of the tree as shown below:

Figure 5: Partial decision tree with Outlook as the root node

The Overcast branch will have a value of Yes as a leaf node because all the
examples in that subgroup have a label of Yes. The tree does not grow any
further along that path.
For the Rain and Sunny branches, the process continues. Using the five examples
under the Rain branch, you calculate the gain ratio for the remaining attributes
(Temperature, Humidity and Windy) and select one of them to be the child node
of Outlook under the Rain branch. The same applies to the Sunny branch. You
repeat the process until all examples in a branch are of the same classification,
or until you run out of attributes, in which case the classification is the
class of the majority of the remaining examples.
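Below is a minimal sketch of this recursive procedure (illustrative only), reusing the Node class and the gain_ratio, entropy and Counter helpers from the earlier snippets; it assumes categorical attributes and rows represented as dictionaries:

def id3(rows, attributes, target):
    """Grow a tree recursively: stop when a branch is pure or no attributes
    remain, otherwise split on the attribute with the highest gain ratio."""
    classes = [row[target] for row in rows]
    majority = Counter(classes).most_common(1)[0][0]
    if len(set(classes)) == 1 or not attributes:
        return Node(label=majority)   # leaf: pure group, or majority class if attributes ran out
    best = max(attributes, key=lambda a: gain_ratio(rows, a, target))
    node = Node(attribute=best)
    remaining = [a for a in attributes if a != best]
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        node.branches[value] = id3(subset, remaining, target)
    return node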

IMPLEMENTATION OF DECISION TREE ALGORITHM
USING PYTHON
You can implement decision trees in Python using the scikit-learn library; the
most important class is DecisionTreeClassifier from sklearn.tree. The listing
below loads the weather data from a CSV file, encodes the categorical attributes
with LabelEncoder, trains a decision tree classifier, and evaluates it with a
confusion matrix and an accuracy score.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from google.colab import drive

# CSV version of the labelled weather data should be mounted in Google Drive
drive.mount('/content/drive')
path = 'weather.csv'   # adjust to the location of the file in your Drive
data = pd.read_csv(path)
data.head()   # preview the data (displays in a notebook)

# encode the label/class column (Play)
classEncoder = LabelEncoder()
y = classEncoder.fit_transform(data['Play'])
data.drop(columns=['Play'], inplace=True)
labels = list(classEncoder.classes_)
attributes = list(data.columns)   # feature names, used when plotting the tree

# create encoders for all other columns
labelEncoders = []
for column in data.columns:
    encoder = LabelEncoder()
    data[column] = encoder.fit_transform(data[column])
    labelEncoders.append(encoder)

# encode a single input sample using the per-column encoders
def encodeInput(x):
    values = []
    for i in range(len(labelEncoders)):
        values.append(labelEncoders[i].transform([x[i]])[0])
    return values

trainX, testX, trainY, testY = train_test_split(data, y, test_size=0.3,
                                                random_state=42)
model = DecisionTreeClassifier()
model.fit(trainX, trainY)
predicted = model.predict(testX)

# confusion matrix
cm = confusion_matrix(testY, predicted)
print(cm)

# confusion matrix display with human-readable class names
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot()

# accuracy (multiplied by 100 to report a percentage)
acc = accuracy_score(testY, predicted)
print("accuracy={:0.2f}%".format(acc * 100))

# classify a new, previously unseen sample
import warnings
warnings.filterwarnings('ignore')
x = ["Rain", "Mild", "Normal", "Weak"]
x = encodeInput(x)
print(model.predict([x]))

# visualise the learned tree
plot_tree(model, feature_names=attributes, class_names=labels, fontsize=8)
plt.show()
