High Impact Skills Development Program
in Artificial Intelligence, Data Science, and Blockchain
Module 1: AI Fundamentals
Lecture 6: Decision Trees
Instructor: Dr. Syed Imran Ali
Assistant Professor, SEECS, NUST
Courtesy: Dr. Ahsan Sadaat, Dr. Faisal Shafait and Dr. Adnan ul Hasan
Supervised Learning
- Regression
- Classification
Classification
• Human learning from past experiences
• A computer does not have “experiences”
• A computer system learns from data, which represent some “past
experiences” of an application domain
• Our focus: learn a target function that can be used to predict the
values of a discrete class attribute, e.g., approved or not-approved,
and high-risk or low-risk.
• The task is commonly called: Supervised learning, classification, or
inductive learning.
Classification
• Supervised learning: classification is seen as supervised learning
from examples.
• Supervision: The data (observations, measurements, etc.) are
labeled with pre-defined classes.
• Test data are classified into these classes too.
• Goal: To learn a classification model from the data that can be used
to predict the classes of new (future, or test) cases/instances.
Supervised learning process: two steps
1. Learning (training): learn a model using the training data.
2. Testing: test the model using unseen test data to assess the
model accuracy:
Accuracy = Number of correct classifications / Total number of test cases
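As a minimal sketch of these two steps (illustrative, not from the slides; it assumes scikit-learn and uses placeholder data):

```python
# Minimal sketch (not from the slides): learn a model on training data,
# then test it on unseen data and report accuracy.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# toy labelled data (placeholder values)
X = [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# step 1 - learning (training): fit a model on the training split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# step 2 - testing: accuracy = correct classifications / total test cases
print(accuracy_score(y_test, model.predict(X_test)))
```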
An example: data (loan application)
[Table: 15 past loan application records; the class attribute indicates whether each application was approved or not.]
An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.
• No learning: classify all future applications (test data) to the
majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
Classification: Definition
• Given a collection of records (training set):
• Each record contains a set of attributes; one of the attributes is the
class.
• Find a model for the class attribute as a function of the values of the other
attributes.
• Goal: previously unseen records should be assigned a class as accurately
as possible.
• A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set
used to build the model and test set used to validate it.
Illustrating Classification Task
Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

A learning algorithm performs induction on the training set to learn a model.

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learned model is then applied to the test set (deduction) to predict the class of each record.
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Tree-based Methods
• Naïve Bayes and Bayesian Belief Networks
• Memory-based Reasoning
• Neural Networks
• Support Vector Machines
• Rule-based Methods
Decision Tree Learning
• Decision tree learning is one of the most widely used
techniques for classification.
• Its classification accuracy is competitive with other methods, and it is very efficient.
• The classification model is a tree, called a decision tree.
Example of a Decision Tree
Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes Refund, MarSt, TaxInc)
- Refund = Yes -> NO
- Refund = No, MarSt = Married -> NO
- Refund = No, MarSt = Single or Divorced, TaxInc < 80K -> NO
- Refund = No, MarSt = Single or Divorced, TaxInc > 80K -> YES
Another Example of Decision Tree
Model: Decision Tree built from the same training data (Tid 1-10)
- MarSt = Married -> NO
- MarSt = Single or Divorced, Refund = Yes -> NO
- MarSt = Single or Divorced, Refund = No, TaxInc < 80K -> NO
- MarSt = Single or Divorced, Refund = No, TaxInc > 80K -> YES

There could be more than one tree that fits the same data!
Decision Tree Classification Task
Apply Model to Test Data

Test Data:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches the test record at each node:
- Refund = No, so take the No branch to the MarSt node.
- MarSt = Married, so take the Married branch, which is the leaf NO.
- Assign Cheat to “No”.
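The same traversal can be written as a tiny sketch (illustrative, not from the slides); the rules simply mirror the branches of the example tree:

```python
# Illustrative sketch (not from the slides): classify one record by following
# the splits of the example tree (Refund -> Marital Status -> Taxable Income).
def classify_cheat(record):
    if record["Refund"] == "Yes":
        return "No"                      # left branch of the root
    if record["Marital Status"] == "Married":
        return "No"                      # Married leaf
    # Single or Divorced: split on Taxable Income (80K threshold from the tree)
    return "No" if record["Taxable Income"] < 80_000 else "Yes"

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80_000}
print(classify_cheat(test_record))       # -> "No": assign Cheat to "No"
```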
Decision Tree Classification Task
The same workflow, now with a decision tree as the model: a tree induction algorithm learns a decision tree from the training set (Tid 1-10 above), and the tree is then applied to the test set (Tid 11-15) by deduction to predict the unknown class labels.
Example: Weather Dataset
[Table: the weather dataset used in the following example]
Which attribute to select?
Criterion for attribute/feature selection
• Which is the best attribute?
• Want to get the smallest tree
• Heuristic: choose the attribute that produces the
“purest” nodes
Final decision tree
[Figure: the final decision tree for the weather data]
• Splitting stops when data can’t be split any further
Entropy
• A decision tree is built top-down from a root node and
involves partitioning the data into subsets that contain
instances with similar values (homogeneous)
• The ID3 algorithm uses entropy to calculate the
homogeneity of a sample
• If the sample is completely homogeneous, the entropy
is zero; if the sample is equally divided, it has an
entropy of one
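A minimal sketch of the entropy computation (illustrative, not from the slides):

```python
# Minimal sketch (not from the slides): entropy from class frequency counts.
from math import log2

def entropy(counts):
    """Entropy of a sample, given the frequency count of each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([7, 0]))   # completely homogeneous sample -> 0.0
print(entropy([7, 7]))   # equally divided sample        -> 1.0
```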
Computing Entropy
• To build a decision tree, we need to calculate two types of entropy
values using frequency tables as follows:
• Entropy using the frequency table of one attribute (the entropy of the target/class):
E(S) = - sum_i p_i * log2(p_i), where p_i is the proportion of instances of class i in the sample S
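For example, assuming the classic 14-instance weather data with 9 "Yes" and 5 "No" values of the class attribute Play:

E(Play) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940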
Computing Entropy
• Entropy using the frequency table of two attributes (the target and one candidate splitting attribute):
E(T, X) = sum over the values c of X of P(c) * E(c), i.e., the entropy of each branch weighted by the fraction of instances falling into that branch
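Continuing the same (assumed) weather data, splitting the 14 instances on Outlook gives branches Sunny (2 Yes / 3 No), Overcast (4 Yes / 0 No), and Rainy (3 Yes / 2 No), so:

E(Play, Outlook) = (5/14)(0.971) + (4/14)(0.0) + (5/14)(0.971) ≈ 0.693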
Information Gain
• The information gain is based on the decrease in
entropy after a dataset is split on an attribute:
Gain(T, X) = E(T) - E(T, X)
• Constructing a decision tree is all about finding the
attribute that returns the highest information gain (i.e.,
the most homogeneous branches)
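As a small sketch (not from the slides), the gain can be computed directly from frequency counts; the branch counts below assume the classic weather data split on Outlook:

```python
# Small sketch (not from the slides): gain = entropy before the split minus
# the weighted entropy after the split. Counts assume the classic weather data.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

before = entropy([9, 5])                 # class entropy before splitting
branches = [[2, 3], [4, 0], [3, 2]]      # Sunny, Overcast, Rainy: (Yes, No)
n = sum(sum(b) for b in branches)
after = sum(sum(b) / n * entropy(b) for b in branches)
print(before - after)                    # information gain for Outlook, ~0.247
```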
Example: Weather dataset
• Step 1: Calculate the entropy of the target (class) attribute.
Example: Weather dataset
• Step 2: The dataset is then split on each of the other
attributes, and the entropy of each resulting branch is calculated.
• The branch entropies are added proportionally to give the total entropy of
the split.
• The resulting entropy is subtracted from the entropy
before the split.
• The result is the information gain, or decrease in entropy, for that
attribute (worked values for the weather data follow below).
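Carrying this out for every attribute of the (assumed) classic weather data gives approximately Gain(Outlook) ≈ 0.247, Gain(Humidity) ≈ 0.152, Gain(Windy) ≈ 0.048, and Gain(Temperature) ≈ 0.029, so Outlook yields the largest gain.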
Example: Weather dataset
Example: Weather dataset
• Step 3: Choose the attribute with the largest information
gain as the decision node.
Example: Weather dataset
• Step 4a: A branch with an entropy of 0 is a leaf node.
Example: Weather Dataset
Example: Weather dataset
• Step 4b: A branch with an entropy of more than 0 needs
further splitting.
Example: Weather dataset
• Step 5: The ID3 algorithm is run recursively on the
non-leaf branches until all data is classified.
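A compact, illustrative ID3 sketch (an assumed implementation of the recursion, not the slides' own code):

```python
# Compact, illustrative ID3 sketch (not from the slides): recursively split on
# the attribute with the highest information gain until every branch is pure.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr], []).append(label)
    after = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - after

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:            # entropy 0: pure branch -> leaf node
        return labels[0]
    if not attributes:                   # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = (list(t) for t in zip(*subset))
        tree[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return tree

# tiny usage example with hypothetical records
rows = [{"Outlook": "Sunny", "Windy": "No"},
        {"Outlook": "Overcast", "Windy": "Yes"},
        {"Outlook": "Sunny", "Windy": "Yes"}]
labels = ["No", "Yes", "No"]
print(id3(rows, labels, ["Outlook", "Windy"]))
```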
Note: the quantity being maximized is the difference between two entropies: the entropy before the split minus the weighted entropy after the split.

Happy Learning!