Lecture 4 - Decision Tree

The document outlines the principles and algorithms behind decision trees, focusing on the ID3 algorithm and its application in classification tasks such as predicting loan defaults. It discusses key concepts like entropy, information gain, and the process of recursively splitting data based on features to create a decision tree. Additionally, it highlights the challenges of overfitting and the importance of evaluating the effectiveness of splits in decision tree learning.


References

• Dr Victor Lavrenko
  – University of Edinburgh, School of Informatics
  – http://www.inf.ed.ac.uk/teaching/courses/iaml/2011/slides/dt.pdf

• Emily Fox & Carlos Guestrin
  – University of Washington
  – https://www.coursera.org/learn/ml-classification/supplement/2PWpF/slides-presented-in-this-module
Outline
• Decision Tree Intuition
• ID3 Algorithm
• Decision Stump
• Multiclass Classification & Predicting Probabilities
• Real valued features
• Threshold Split
• Overfitting in Decision Trees
• Problems and advantages of Decision Trees
Predict if John will play tennis

• Hard to guess
• Divide & conquer:
– Split into subsets
– Are they pure?
(all yes or all no)
– If yes: stop
– If no: repeat
• See which subset new data falls into

[Figure: the first split produces three subsets: 4 yes / 0 no (pure subset), 2 yes / 3 no (split further), and 3 yes / 2 no (split further).]

[Figure: splitting an impure subset again yields a pure 4 yes / 0 no subset and a 3 yes / 2 no subset that needs further splitting.]
Decision Tree with Labels

[Figure: the completed decision tree with a Yes/No label at each leaf.]
ID3 Algorithm
• Split(node, {examples}):
  1. A ← the best attribute for splitting the {examples}
  2. Decision attribute for this node ← A
  3. For each value of A, create a new child node
  4. Split the training {examples} among the child nodes
  5. If the examples are perfectly classified: STOP
     else: iterate over the new child nodes:
       Split(child_node, {subset of examples})
• Ross Quinlan (ID3: 1986), (C4.5: 1993)
• Breiman et al. (CART: 1984), from statistics
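A minimal sketch of this recursion in Python, assuming categorical features stored as plain dictionaries; the helper names (entropy, best_attribute, id3) are illustrative, not from the lecture:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p_c * log2(p_c)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, labels, attributes):
    """Attribute with the highest information gain (ID3's splitting criterion)."""
    def gain(a):
        remainder = 0.0
        for v in set(ex[a] for ex in examples):
            subset = [y for ex, y in zip(examples, labels) if ex[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def id3(examples, labels, attributes):
    """Return a nested-dict tree: {attribute: {value: subtree_or_class_label}}."""
    if len(set(labels)) == 1:                  # perfectly classified: STOP
        return labels[0]
    if not attributes:                         # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, labels, attributes)
    children = {}
    for v in set(ex[a] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[a] == v]
        children[v] = id3([examples[i] for i in idx],
                          [labels[i] for i in idx],
                          [b for b in attributes if b != a])
    return {a: children}
```

The nested-dict representation keeps the sketch short: a leaf is just a class label, and an internal node is a one-key dictionary mapping the chosen attribute to its children.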
Which attribute to split on?

• Want to measure “purity” of the split


– More certain about Yes/No after the split
• Pure set (4 yes/ 0 no) => completely certain (100%)
• Impure (3 yes/ 3 no) => completely uncertain (50%)

Entropy

• Entropy:
  – S ….. subset of training examples
  – p(+) / p(-) ….. fraction of positive / negative examples in S
  – H(S) = - p(+) log2 p(+) - p(-) log2 p(-)   (with 0 · log2 0 taken as 0)
• Impure (3 yes / 3 no): H(S) = - 0.5 log2 0.5 - 0.5 log2 0.5 = 1 bit (completely uncertain)
• Pure set (4 yes / 0 no): H(S) = - 1 · log2 1 - 0 · log2 0 = 0 bits (completely certain)
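A quick check of these two values in Python (H is just an illustrative helper for binary entropy):

```python
import math

def H(p_pos, p_neg):
    """Binary entropy, with the convention 0 * log2(0) = 0."""
    return sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(H(3/6, 3/6))  # impure 3 yes / 3 no -> 1.0 bit
print(H(4/4, 0/4))  # pure   4 yes / 0 no -> 0.0 bits
```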
Information Gain

• Want many items in pure sets
• Expected drop in entropy after the split:
  Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) · H(S_v)
  where S_v is the subset of S for which attribute A takes value v
• This is the Mutual Information
  – Between attribute A and the class labels of S
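A small sketch that computes this gain from class counts; the counts plugged in at the bottom are the ones from the earlier play-tennis figure (root 9 yes / 5 no, children 4/0, 2/3, 3/2), and information_gain is an illustrative helper name:

```python
import math

def H(counts):
    """Entropy of a node, given its class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    n = sum(parent_counts)
    remainder = sum(sum(child) / n * H(child) for child in children_counts)
    return H(parent_counts) - remainder

print(information_gain([9, 5], [[4, 0], [2, 3], [3, 2]]))  # ~0.247 bits
```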
Predicting potential loan defaults

• What makes a loan risky? (Example: "I want to buy a new house!")
• A loan application includes features such as:
  – Credit ★★★★: Did I pay previous loans on time? Example: excellent, good, or fair
  – Income ★★★: What's my income? Example: $80K per year
  – Term ★★★★★: How soon do I need to pay the loan? Example: 3 years, 5 years, …
  – Personal Info ★★★: Age, reason for the loan, marital status, … Example: home loan for a married couple
Intelligent loan application review system

[Figure: loan applications flow into a loan classifier MODEL for review; each application xi is the input, and the output is a predicted class, ŷi = +1 (Safe) or ŷi = -1 (Risky).]
What does a decision tree represent? Scoring a loan application

xi = (Credit = poor, Income = high, Term = 5 years)

[Figure: the tree starts at Credit?
  – excellent → Safe
  – fair → Term? (3 years → Risky, 5 years → Safe)
  – poor → Income? (high → Term? (3 years → Risky, 5 years → Safe), low → Risky)]

• 3 year loans with a fair credit history are risky
• 3 year loans with high income & a poor credit history are risky
• Traversing the tree for xi (Credit = poor, Income = high, Term = 5 years) gives ŷi = Safe
Decision tree classification task

Training Set (Tree Induction algorithm → Induction → Learn Model):

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set (Apply Model → Deduction, using the learned Decision Tree):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Decision Tree learning problem

Training data: N observations (xi, yi)

Credit     Term   Income  y
excellent  3 yrs  high    safe
fair       5 yrs  low     risky
fair       3 yrs  high    safe
poor       5 yrs  high    risky
excellent  3 yrs  low     risky
fair       5 yrs  low     safe
poor       3 yrs  high    risky
poor       5 yrs  low     safe
fair       3 yrs  high    safe

Optimize a quality metric on the training data, T(X):
• Error measures the fraction of mistakes
  Error = (# incorrect predictions) / (# examples)
  – Best possible value: 0.0
  – Worst possible value: 1.0
How do we find the best tree?

• Lowest classification error
• Exponentially large number of possible trees makes decision tree learning hard! (NP-hard problem)

[Figure: six candidate trees T1(X), T2(X), T3(X), T4(X), T5(X), T6(X).]
Simple (greedy) algorithm finds a "good" tree

Step 1: Start with an empty tree

[Figure: the root node holds all points in the dataset, shown as a histogram of the y values (Safe vs Risky).]
Simple (greedy) algorithm finds a "good" tree

Step 2: Split on a feature

[Figure: splitting/partitioning all the data on Credit produces three children: the subset with Credit = excellent, the subset with Credit = fair, and the subset with Credit = poor.]
Simple (greedy) algorithm finds a "good" tree

Step 3: Making predictions. For a branch of Credit? where all examples are Safe loans, predict Safe; nothing more to do there.

Step 4: Recursion. For the other branches of Credit?, build a tree from the data points that reach them.
Greedy decision tree learning

• Step 1: Start with an empty tree
• Step 2: Select a feature to split the data    ← Problem 1: Feature selection
• For each split of the tree:
  – Step 3: If there is nothing more to split, make predictions    ← Problem 2: Stopping condition
  – Step 4: Otherwise, go to Step 2 & continue (recurse) on this split    ← Recursion
Feature split learning = Decision stump learning

Start with the data. Assume N = 40 examples and 3 features (Credit, Term, Income); a sample of the data:

Credit     Term   Income  y
excellent  3 yrs  high    safe
fair       5 yrs  low     risky
fair       3 yrs  high    safe
poor       5 yrs  high    risky
excellent  3 yrs  low     risky
fair       5 yrs  low     safe
poor       3 yrs  high    risky
poor       5 yrs  low     safe
fair       3 yrs  high    safe

Loan status over all the data: 22 safe loans / 18 risky loans (N = 40 examples).
Feature split learning = Decision stump learning

Compact visual notation: the root node shows the counts of safe and risky loans over all the data (Safe 22 / Risky 18, N = 40 examples).

Decision stump: a single-level tree. Splitting the root on Credit? gives three subsets:
  – Credit = excellent: 9 safe / 0 risky
  – Credit = fair: 9 safe / 4 risky
  – Credit = poor: 4 safe / 14 risky
Feature split learning = Decision stump learning

Visual notation: making predictions with a decision stump.

Root: Safe 22 / Risky 18. Split on Credit?:
  – excellent: 9 / 0 → predict Safe
  – fair: 9 / 4 → predict Safe
  – poor: 4 / 14 → predict Risky

For each intermediate node, set ŷ = the majority value.
Selecting the best feature to split on

How do we learn a decision stump? Find the "best" feature to split on.

Root: Safe 22 / Risky 18. How do we select the best feature?

Choice 1: Split on Credit            Choice 2: Split on Term
  – excellent: 9 / 0                   – 3 years: 16 / 4
  – fair: 9 / 4                        – 5 years: 6 / 14
  – poor: 4 / 14
How do we measure effectiveness of a split?

Calculating classification error:
• Step 1: ŷ = class of the majority of data in the node
• Step 2: Calculate the classification error of predicting ŷ for this data

Root: Safe 22 / Risky 18. The majority class is Safe, so ŷ = Safe: 22 correct, 18 mistakes.
Error = 18 / 40 = 0.45

Tree     Classification error
(root)   0.45
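The same calculation in a couple of lines of Python (node_error is an illustrative helper name):

```python
from collections import Counter

def node_error(labels):
    """Classification error of predicting the majority class for one node."""
    majority = Counter(labels).most_common(1)[0][1]
    return 1 - majority / len(labels)

# Root node: 22 safe / 18 risky -> predict "safe", error = 18/40 = 0.45
print(node_error(["safe"] * 22 + ["risky"] * 18))
```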
Choice 1: Split on credit history?

Does a split on Credit reduce the classification error below 0.45?

Root: Safe 22 / Risky 18. Split on Credit?:
  – excellent: 9 / 0 → ŷ = Safe, 0 mistakes
  – fair: 9 / 4 → ŷ = Safe, 4 mistakes
  – poor: 4 / 14 → ŷ = Risky, 4 mistakes

Step 1: For each intermediate node, set ŷ = the majority value.
Step 2: Error = (0 + 4 + 4) / 40 = 0.2, so the split on Credit is a good one.
Choice 2: Split on Term?

Root: Safe 22 / Risky 18. Split on Term?:
  – 3 years: 16 / 4 → ŷ = Safe, 4 mistakes
  – 5 years: 6 / 14 → ŷ = Risky, 6 mistakes

Evaluating the split on Term: Error = (4 + 6) / 40 = 0.25

Tree            Classification error
(root)          0.45
Split on Credit 0.2
Split on Term   0.25
Choice 1 vs Choice 2

Tree               Classification error
(root)             0.45
Split on Credit    0.2
Split on loan Term 0.25

Choice 1: Split on Credit (excellent: 9/0, fair: 9/4, poor: 4/14)  ← WINNER
Choice 2: Split on Term (3 years: 16/4, 5 years: 6/14)
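A quick check of these two error numbers; split_error is an illustrative helper, and each branch's mistakes are just its minority-class count:

```python
def split_error(branches):
    """Classification error of a decision stump, given (safe, risky) counts per branch.
    Each branch predicts its majority class, so its mistakes equal the minority count."""
    mistakes = sum(min(safe, risky) for safe, risky in branches)
    total = sum(safe + risky for safe, risky in branches)
    return mistakes / total

credit_split = [(9, 0), (9, 4), (4, 14)]   # excellent, fair, poor
term_split   = [(16, 4), (6, 14)]          # 3 years, 5 years
print(split_error(credit_split))  # 0.2  <- winner
print(split_error(term_split))    # 0.25
```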
Decision Tree Learning – Recursion and Stopping Conditions

Learn a decision tree from the data (the same Credit / Term / Income training table as before).

[Figure: the tree being built. Start → Credit?; excellent → Safe; fair → Term? (3 years → Risky, 5 years → Safe); poor → Income? (high → Term? (3 years → Risky, 5 years → Safe), low → Risky).]

We've learned a decision stump (root: Safe 22 / Risky 18, split on Credit? → excellent 9/0, fair 9/4, poor 4/14). What next?
• Credit = excellent: all data points are Safe, so there is nothing else to do with this subset of data (leaf node, predict Safe).
• Credit = fair and Credit = poor: keep going on these subsets.
Tree Learning = Recursive stump learning

Second level → Final Decision Tree

Second level: split the impure children of the root (build another stump from these data points).
• Credit = excellent (9/0): predict Safe
• Credit = fair (9/4): split on Term?
• Credit = poor (4/14): split on Income?

Final decision tree:
• Credit = excellent (9/0) → Safe
• Credit = fair (9/4) → Term?
  – 3 years (0/4) → Risky
  – 5 years (9/0) → Safe
• Credit = poor (4/14) → Income?
  – high (4/5) → Term?: 3 years (0/2) → Risky, 5 years (4/3) → Safe
  – low (0/9) → Risky
Simple greedy decision tree learning

• Pick the best feature to split on
• Learn a decision stump with this split
• For each leaf of the decision stump, recurse

When do we stop???
Stopping Conditions

• Stopping condition 1: all data in a node have the same y value ⇒ nothing to do.
  [Figure: e.g. Credit = excellent (9/0) → Safe, and the 5-year branch under Credit = fair (9/0) → Safe.]
• Stopping condition 2: already split on all possible features ⇒ nothing to do.
  [Figure: e.g. under Credit = poor and Income = high, Term? is the last remaining feature; its leaves (3 years: 0/2 → Risky, 5 years: 4/3 → Safe) simply predict their majority class.]
Multiclass prediction

[Figure: the loan classifier MODEL takes an application xi as input and outputs a predicted class ŷi from more than two classes, e.g. Safe, Risky, or Danger.]
Multiclass decision stump

N = 40 examples, 1 feature (Credit), 3 classes (Safe, Risky, Danger). Sample of the data:

Credit     y
excellent  safe
fair       risky
fair       safe
poor       danger
excellent  risky
fair       safe
poor       danger
poor       safe
fair       safe
…          …

Root: Safe 18 / Risky 12 / Danger 10. Split on Credit?:
  – excellent: 9 / 2 / 1 → predict Safe
  – fair: 6 / 9 / 2 → predict Risky
  – poor: 3 / 1 / 7 → predict Danger
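The same leaf counts also give predicted class probabilities (the fraction of each class in the node), which is one common way to get the "Predicting Probabilities" mentioned in the outline; a minimal sketch, with leaf_prediction as an illustrative helper:

```python
def leaf_prediction(counts):
    """Majority-class prediction and class probabilities from a node's label counts."""
    total = sum(counts.values())
    prediction = max(counts, key=counts.get)
    probabilities = {c: n / total for c, n in counts.items()}
    return prediction, probabilities

# Counts from the Credit = fair branch of the slide
print(leaf_prediction({"safe": 6, "risky": 9, "danger": 2}))
# ('risky', {'safe': 0.35..., 'risky': 0.52..., 'danger': 0.11...})
```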
How do we use real-valued inputs?

Income  Credit     Term   y
$105K   excellent  3 yrs  Safe
$112K   good       5 yrs  Risky
$73K    fair       3 yrs  Safe
$69K    excellent  5 yrs  Safe
$217K   excellent  3 yrs  Risky
$120K   good       5 yrs  Safe
$64K    fair       3 yrs  Risky
$340K   excellent  5 yrs  Safe
$60K    good       3 yrs  Risky

Split on each numeric value? Danger: a branch for every observed value of Income ($30K, $31.4K, $39.5K, $61.1K, $91.3K, …) may contain only one data point per node, so we can't trust the prediction (overfitting).
Alternative: Threshold split

Root: Safe 22 / Risky 18. Split on the feature Income with a threshold:
  – Income < $60K: 8 / 13
  – Income >= $60K: 14 / 5

Each subset (e.g. the subset of data with Income >= $60K) contains many data points ⇒ lower chance of overfitting.
Threshold splits in 1-D / Visualizing the threshold split

[Figure (1-D): Safe and Risky points plotted along the Income axis from $10K to $120K; the threshold split is the line Income = $60K, separating Income < $60K from Income >= $60K.]

[Figure (2-D): points plotted in the (Age, Income) plane; a threshold split on Age is the vertical line Age = 38.]
Split on Age >= 38, then (depth 2) split on Income >= $60K

[Figure: the first threshold split is the line Age = 38: for Age < 38 predict Risky, for Age >= 38 continue. At depth 2, within Age >= 38, the threshold split is the line Income = $60K: for Income >= $60K predict Safe.]
Each split partitions the 2-D space

[Figure: the (Age, Income) plane is cut into regions: Age < 38; Age >= 38 & Income >= $60K; Age >= 38 & Income < $60K.]
Finding the best threshold split: consider a threshold between points

• There are infinitely many possible values of the threshold t (Income = t*, splitting Income < t* from Income >= t*).
• But the classification error is the same for any threshold split placed between two consecutive data values vA and vB, so only thresholds between neighbouring points need to be considered.
Consider mid-points: Threshold split selection algorithm

Only a finite number of splits need to be considered:
• Step 1: Sort the values of the feature hj(x): let {v1, v2, v3, …, vN} denote the sorted values
• Step 2: For i = 1 … N-1:
  – Consider the split ti = (vi + vi+1) / 2
  – Compute the classification error for the threshold split hj(x) >= ti
• Choose the t* with the lowest classification error
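A minimal sketch of this selection loop in Python; best_threshold is an illustrative helper, and the income/label values at the bottom are the ones from the real-valued-features table above:

```python
def best_threshold(values, labels):
    """Midpoint threshold t* between consecutive sorted values that minimizes
    the classification error of the split x >= t."""
    pairs = sorted(zip(values, labels))
    best_t, best_err = None, float("inf")
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                      # no threshold fits between equal values
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        # each side predicts its majority class; mistakes are the minority counts
        mistakes = sum(len(side) - max(side.count(c) for c in set(side))
                       for side in (left, right) if side)
        if mistakes / len(pairs) < best_err:
            best_t, best_err = t, mistakes / len(pairs)
    return best_t, best_err

incomes = [105, 112, 73, 69, 217, 120, 64, 340, 60]   # in $K
labels  = ["Safe", "Risky", "Safe", "Safe", "Risky", "Safe", "Risky", "Safe", "Risky"]
print(best_threshold(incomes, labels))   # (66.5, 0.222...) for this data
```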
Overfitting in Decision Trees

• Can always classify training examples perfectly


– Keep splitting until each node contains 1 example
– Singleton = pure
• Doesn’t work on new data

Avoid Overfitting

• Stop splitting when a split is not statistically significant
• Grow the full tree, then post-prune
  – Based on a validation set
• Sub-tree replacement pruning
  – For each node:
    • Pretend to remove the node + all its children from the tree
    • Measure the performance on the validation set
  – Remove the node that results in the greatest improvement
  – Repeat until further pruning is harmful
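A simplified sketch of sub-tree replacement pruning over the nested-dict trees from the ID3 sketch above. For brevity every collapsed node predicts one supplied majority_label (a full implementation would instead use the majority class of the training examples reaching that node); all helper names are illustrative:

```python
import copy

def predict(tree, x, default):
    """Walk a nested-dict tree ({attribute: {value: subtree}}); leaves are labels."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(x.get(attribute), default)
    return tree

def accuracy(tree, X_val, y_val, default):
    return sum(predict(tree, x, default) == y for x, y in zip(X_val, y_val)) / len(y_val)

def internal_paths(tree, path=()):
    """Yield the (attribute, value) path to every internal node of the tree."""
    if isinstance(tree, dict):
        yield path
        attribute = next(iter(tree))
        for value, child in tree[attribute].items():
            yield from internal_paths(child, path + ((attribute, value),))

def collapsed(tree, path, label):
    """Copy of the tree with the node at `path` replaced by a leaf predicting `label`."""
    if not path:
        return label
    pruned = copy.deepcopy(tree)
    node = pruned
    for attribute, value in path[:-1]:
        node = node[attribute][value]
    attribute, value = path[-1]
    node[attribute][value] = label
    return pruned

def prune(tree, X_val, y_val, majority_label):
    """Repeatedly apply the single collapse that most improves validation
    accuracy; stop when no collapse helps (sub-tree replacement pruning)."""
    while True:
        base = accuracy(tree, X_val, y_val, majority_label)
        candidates = [collapsed(tree, p, majority_label) for p in internal_paths(tree)]
        best = max(candidates, default=None,
                   key=lambda t: accuracy(t, X_val, y_val, majority_label))
        if best is None or accuracy(best, X_val, y_val, majority_label) <= base:
            return tree
        tree = best
```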
Key Problem with Information Gain

• Biased towards attributes with many values
  – e.g. a date-like attribute: the training example D15 (Rain, High, Weak) would get its own pure, single-example node
• Won't work for new data
Trees are Interpretable

• Read rules off the tree
  – Concise description of what makes an item positive
• No "black box"
  – Important for users
Summary

• ID3: grows the decision tree from the root down
  – Greedily selects the next best attribute (by information Gain)
• Searches a complete hypothesis space
  – Prefers smaller trees, with high gain at the root
• Overfitting addressed by post-pruning
  – Prune nodes while the accuracy on the validation set improves
• Can handle missing data
• Easily handles irrelevant variables
  – Information Gain = 0 ⇒ the attribute will not be selected
