Lecture 4 - Decision Tree
• Dr Victor Lavrenko
– University of Edinburgh
School of Informatics
– http://www.inf.ed.ac.uk/teaching/courses/iaml/2011/slides/dt.pdf
• Hard to guess the label directly
• Divide & conquer:
– Split into subsets
– Are they pure? (all yes or all no)
– If yes: stop
– If no: repeat
• See which subset new data falls into
[Example: splitting on an attribute gives one pure subset (4 yes / 0 no) and two impure subsets (2 yes / 3 no and 3 yes / 2 no) that must be split further]
[Recursing on an impure subset again yields a pure subset (4 yes / 0 no) and a 3 yes / 2 no subset that is split further, until all leaves are pure]
Decision Tree with Labels
ID3 Algorithm
• Split(node, {examples}):
  1. A ← the best attribute for splitting the {examples}
  2. Decision attribute for this node ← A
  3. For each value of A, create a new child node
  4. Partition the training {examples} among the child nodes
  5. If the examples are perfectly classified: STOP
     else: recurse on the new child nodes:
     Split(child_node, {subset of examples})
  (a code sketch follows below)
• Ross Quinlan (ID3: 1986), (C4.5: 1993)
• Breiman et al. (CART: 1984), from statistics
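A minimal Python sketch of the Split procedure above (the dict-based example format, the 'label' key, and the mistake-count attribute scoring are assumptions for illustration; ID3 proper scores attributes with information gain, defined in the next section):

from collections import Counter

def majority_label(examples):
    # Most common class label among the examples
    return Counter(e['label'] for e in examples).most_common(1)[0][0]

def best_attribute(examples, attributes):
    # Score each attribute by the majority-class mistakes its split leaves
    def mistakes(attr):
        groups = {}
        for e in examples:
            groups.setdefault(e[attr], []).append(e)
        return sum(len(g) - Counter(x['label'] for x in g).most_common(1)[0][1]
                   for g in groups.values())
    return min(attributes, key=mistakes)

def split(examples, attributes):
    labels = {e['label'] for e in examples}
    if len(labels) == 1 or not attributes:   # perfectly classified, or nothing left to split on
        return majority_label(examples)
    a = best_attribute(examples, attributes)        # 1. best attribute
    node = {'attribute': a, 'children': {}}         # 2. decision attribute for this node
    groups = {}
    for e in examples:                              # 3-4. one child node per value of A
        groups.setdefault(e[a], []).append(e)
    rest = [x for x in attributes if x != a]
    for value, subset in groups.items():            # 5. recurse on each child
        node['children'][value] = split(subset, rest)
    return node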
Which attribute to split on?
Entropy
• Entropy: H(S) = -p(+) log2 p(+) - p(-) log2 p(-)
  – S … subset of training examples
  – p(+) / p(-) … fraction of positive / negative examples in S
• Impure (3 yes / 3 no): H(S) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit
• Pure (4 yes / 0 no): H(S) = -1 log2(1) - 0 log2(0) = 0 bits
Information Gain
• Gain(S, A) = H(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) H(S_v)
• Mutual information between attribute A and the class labels of S
• Split on the attribute with the highest information gain
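A small sketch of both quantities in Python (the 'yes'/'no' label encoding and the toy data in the usage lines are assumptions for illustration):

from math import log2

def entropy(labels):
    # H(S) = - sum over classes c of p(c) * log2 p(c)
    n, h = len(labels), 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * log2(p)
    return h

def information_gain(examples, attribute, labels):
    # Gain(S, A) = H(S) - sum over values v of A of (|S_v| / |S|) * H(S_v)
    gain = entropy(labels)
    for v in set(e[attribute] for e in examples):
        sub = [l for e, l in zip(examples, labels) if e[attribute] == v]
        gain -= len(sub) / len(labels) * entropy(sub)
    return gain

print(entropy(['yes', 'yes', 'yes', 'no', 'no', 'no']))   # impure 3 yes / 3 no -> 1.0
print(entropy(['yes', 'yes', 'yes', 'yes']))              # pure 4 yes / 0 no  -> 0.0

S = [{'credit': 'excellent'}, {'credit': 'fair'}, {'credit': 'fair'}, {'credit': 'poor'}]
y = ['yes', 'yes', 'no', 'no']
print(information_gain(S, 'credit', y))                   # hypothetical data -> 0.5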
Predicting potential loan defaults
• What makes a loan risky? ("I want to buy a new house!")
• Loan Application features:
  – Credit: Did I pay previous loans on time? Example: excellent, good, or fair
  – Income: What's my income? Example: $80K per year
  – Term: How soon do I need to pay the loan? Example: 3 years, 5 years, …
  – Personal Info: age, reason for the loan, marital status, … Example: home loan for a married couple
Intelligent application
• Input: xi (a loan application) → Classifier → Output: predicted class ŷi
• ŷi = +1: Safe ✓
• ŷi = -1: Risky ✘
What does a decision tree represent?
[Tree diagram: a Term? node with 3-year and 5-year branches ending in Risky / Safe leaves]
• Example: 3-year loans with fair credit history are risky
Decision tree classification task
• Training Set (Tid, Attrib1, Attrib2, Attrib3, Class), e.g.:
    6  | No | Medium | 60K | No
• Learn a Decision Tree from the Training Set, then Apply the Model to the Test Set:
    11 | No | Small  | 55K | ?
    15 | No | Large  | 67K | ?
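A sketch of the "apply model" step, assuming the nested-dict tree format of the ID3 sketch above (this particular tree and its leaf labels are hypothetical, not learned from the slide's data):

# Hypothetical hand-built tree; a real one comes from the learning step
tree = {'attribute': 'Attrib2',
        'children': {'Small': 'No',
                     'Medium': 'No',
                     'Large': {'attribute': 'Attrib1',
                               'children': {'Yes': 'Yes', 'No': 'No'}}}}

def predict(tree, record):
    # Walk from the root, following the branch for the record's attribute value
    while isinstance(tree, dict):
        tree = tree['children'][record[tree['attribute']]]
    return tree

print(predict(tree, {'Tid': 11, 'Attrib1': 'No', 'Attrib2': 'Small'}))   # -> 'No'
print(predict(tree, {'Tid': 15, 'Attrib1': 'No', 'Attrib2': 'Large'}))   # -> 'No'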
Decision Tree learning problem
How do we find the best tree?
Simple (greedy) algorithm finds “good” tree
[Diagram: the first split considered, a Credit? node with excellent / fair / poor branches]
• Step 3: Making predictions; Step 4: Recursion
[Two diagrams of the Credit? split on all the data: each branch predicts Safe or Risky; for pure branches there is nothing more to do, the rest are recursed on]
Greedy decision tree learning
Feature split learning = Decision stump learning
• A decision stump makes a single split, e.g. on Credit?:
    excellent: 9 Safe / 0 Risky
    fair:      9 Safe / 4 Risky
    poor:      4 Safe / 14 Risky
• How do we select the best feature? Credit? OR Term?
• Loan status at the root: 22 Safe / 18 Risky
• Predict ŷ = majority class (Safe): 22 correct, 18 mistakes
• Classification error of the root: 18 / 40 = 0.45
Choice 1: Split on credit history?
• Does a split on Credit reduce the classification error below 0.45?
• Loan status at the root: 22 Safe / 18 Risky
• Split on Credit?: excellent → 9 Safe / 0 Risky; fair → 9 Safe / 4 Risky; poor → 4 Safe / 14 Risky
• Step 1: For each intermediate node, set ŷ = majority value
• Step 2: Count the mistakes: 0 + 4 + 4 = 8, so the error is 8 / 40 = 0.2
Choice 2: Split on Term?
• Split on Term?: 3 years → 16 Safe / 4 Risky; 5 years → 6 Safe / 14 Risky
Evaluating the split on Term
• The 3-years branch predicts Safe (4 mistakes); the 5-years branch predicts Risky (6 mistakes)
• Error = (4 + 6) / 40 = 0.25

  Tree            | Classification error
  (root)          | 0.45
  Split on credit | 0.2
  Split on term   | 0.25
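A short sketch that reproduces the error table above from the (Safe, Risky) counts per branch; since each branch predicts its majority class, its mistakes are its minority count:

def split_error(branches, n=40):
    # Sum the minority count of every branch, divide by the dataset size
    return sum(min(safe, risky) for safe, risky in branches) / n

print(split_error([(22, 18)]))                   # root: 0.45
print(split_error([(9, 0), (9, 4), (4, 14)]))    # split on Credit: 0.2
print(split_error([(16, 4), (6, 14)]))           # split on Term: 0.25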
Choice 1 vs Choice 2
• Credit? OR Term? → split on Credit wins: error 0.2 < 0.25
Decision Tree Learning – Recursion and Stopping Conditions
[Diagram: after the Credit? split (excellent / fair / poor), each impure subset is recursed on, e.g. with a further high / low split]
Second level
Final Decision Tree
Simple greedy decision tree learning
• When do we stop???
Stopping Conditions
• Stop when all the data in a node agree on y: the node is pure and becomes a leaf (Safe or Risky)
• Otherwise keep splitting (e.g. on Term?) until nothing is left to split on
[Two tree diagrams: pure branches (e.g. excellent: 9 Safe / 0 Risky) become Safe leaves; impure branches (e.g. fair: 9 Safe / 4 Risky, high / low) are split further on Term? or become Risky leaves]
Multiclass prediction
• Input: xi (a loan application) → Classifier MODEL → Output: predicted class ŷi
• Three classes: Safe, Risky, Danger
Multiclass decision stump
• N = 40, 1 feature, 3 classes
  Credit    | y
  excellent | safe
  fair      | risky
  fair      | safe
  poor      | danger
  excellent | risky
  fair      | safe
  poor      | danger
  poor      | safe
  fair      | safe
  …         | …
• Loan status at the root: 18 Safe / 12 Risky / 10 Danger
• Split on Credit?:
    excellent: 9 Safe / 2 Risky / 1 Danger
    fair:      6 Safe / 9 Risky / 2 Danger
    poor:      3 Safe / 1 Risky / 7 Danger
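The multiclass version of the error sketch, using the branch counts above (the 0.375 result is computed from those counts, not stated on the slide):

def multiclass_error(branches, n):
    # A branch's mistakes are its size minus its largest class count
    return sum(sum(counts) - max(counts) for counts in branches) / n

print(multiclass_error([(18, 12, 10)], 40))                      # root: 0.55
print(multiclass_error([(9, 2, 1), (6, 9, 2), (3, 1, 7)], 40))   # Credit stump: 0.375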
How do we use real-valued inputs?
  Income | Credit    | Term  | y
  $105K  | excellent | 3 yrs | Safe
  $112K  | good      | 5 yrs | Risky
  $73K   | fair      | 3 yrs | Safe
  $69K   | excellent | 5 yrs | Safe
  $217K  | excellent | 3 yrs | Risky
  $120K  | good      | 5 yrs | Safe
  $64K   | fair      | 3 yrs | Risky
  $340K  | excellent | 5 yrs | Safe
  $60K   | good      | 3 yrs | Risky
• Split on each numeric value? Danger: a node may contain only one data point
[Diagram: the root (22 Safe / 18 Risky) split on Income? with one branch per observed value ($30K, $31.4K, $39.5K, $61.1K, $91.3K, …), each holding almost no data]
Alternative: Threshold split
[Diagram: an Income? node split at a single threshold into two branches]
Threshold splits in 1-D; Visualizing the threshold split
[Plot: data points along the Age axis (0, 10, 20, 30, 40, …) with candidate threshold splits]
Split on Age >= 38; Depth 2: Split on Income >= $60K
[Two scatter plots of Income ($0K–$40K) vs Age (0–40+): first a vertical split at Age = 38 (one region predicts Safe), then the Age >= 38 region is split again at Income = $60K]
Each split partitions the 2-D space
[Plot: the Income vs Age plane divided into three regions: Age < 38; Age >= 38 and Income >= $60K; Age >= 38 and Income < $60K]
Finding the best threshold split
• Infinite possible values of t: split into Income < t* vs Income >= t*
[Plot: Safe / Risky points along the Income axis from $10K to $120K]
Consider a threshold between points
• Same classification error for any threshold split between two consecutive values vA and vB
[Plot: any t between neighbouring points vA and vB produces the same split]
Consider mid-points
• Finite number of splits to consider
Threshold split selection algorithm
• Step 1: Sort the values of a feature h_j(x): let {v1, v2, v3, …, vN} denote the sorted values
• Step 2:
  – For i = 1 … N-1:
    • Consider the split t_i = (v_i + v_{i+1}) / 2
    • Compute the classification error for the threshold split h_j(x) >= t_i
  – Choose the t* with the lowest classification error
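A sketch of this algorithm on the Income column of the table above (the +1 / -1 label encoding follows the earlier slides; the best threshold and its error are computed, not taken from the slide):

def best_threshold(values, labels):
    # Step 1: sort; Step 2: try the midpoint between each consecutive pair
    pairs = sorted(zip(values, labels))
    best_t, best_err = None, float('inf')
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2        # t_i = (v_i + v_{i+1}) / 2
        sides = ([l for v, l in pairs if v < t],       # majority class on each side
                 [l for v, l in pairs if v >= t])
        err = sum(min(s.count(+1), s.count(-1)) for s in sides)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err / len(pairs)

incomes = [105, 112, 73, 69, 217, 120, 64, 340, 60]    # $K, from the table above
labels  = [+1, -1, +1, +1, -1, +1, -1, +1, -1]         # Safe = +1, Risky = -1
print(best_threshold(incomes, labels))                 # -> (66.5, 0.222...)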
Overfitting in Decision Trees
Avoid Overfitting
Trees are Interpretable
Summary