Lecture 3b: Decision Trees (Part 1)
Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm
Decision Trees (Part 1)
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2015
Outline
• Greediness
• Divide and Conquer
• Inductive Bias of the Decision Tree
• Loss function
• Expected loss
• Empirical error
• Induction
Learning: Generalization Ability
• Predicting the future based on the past
Predict whether a student will like a course
Training Data
That is, ...
• Questions = Features
• Answers = Feature Values
• Ratings = Class Labels
• An example is a set of feature values.
• Training data is a set of examples associated
with class labels (a small encoding in code follows below).
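For concreteness, here is one way such training data could be encoded in Python. This is only an illustrative sketch: the feature names (easy, ai, morning) and the ratings are made up, not taken from the slides.

```python
# Each example is a set of feature values (answers to the questions),
# paired with a class label (the rating).
training_data = [
    ({'easy': True,  'ai': True,  'morning': False}, 'like'),
    ({'easy': False, 'ai': True,  'morning': False}, 'like'),
    ({'easy': True,  'ai': False, 'morning': True},  'nah'),
    ({'easy': False, 'ai': False, 'morning': False}, 'nah'),
]
```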
”Greedy model”: the most useful feature
– Histograms (of the class labels, for each value of a feature)
– Root node (the most useful feature becomes the root; see the sketch below)
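A minimal sketch of this greedy first step, reusing the toy encoding idea from above (trimmed to two features; all names are illustrative):

```python
from collections import Counter

training_data = [  # (feature values, rating)
    ({'easy': True,  'ai': True},  'like'),
    ({'easy': True,  'ai': False}, 'nah'),
    ({'easy': False, 'ai': True},  'like'),
    ({'easy': False, 'ai': False}, 'nah'),
]

def usefulness(feature):
    # Histogram the class labels among the YES answers and the NO answers.
    yes = Counter(y for x, y in training_data if x[feature])
    no = Counter(y for x, y in training_data if not x[feature])
    # Predict the majority label on each side; count how many come out right.
    return max(yes.values(), default=0) + max(no.values(), default=0)

# The most useful feature becomes the root node.
root = max(['easy', 'ai'], key=usefulness)
print(root)  # 'ai': splitting on it predicts every rating correctly
```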
Divide & Conquer
• Divide:
– Partition the data into 2 parts:
• YES part vs NO part
• Conquer:
– Recurse and run the Divide routine on each part (see the sketch below)
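A minimal sketch of the whole Divide & Conquer routine. The function names, the greedy score, and the tuple representation of trees are my own illustration, not from the slides:

```python
from collections import Counter

def majority(labels):
    """The most common class label."""
    return Counter(labels).most_common(1)[0][0]

def train(data, features):
    """data: list of (feature-dict, label) pairs; features: names to try."""
    labels = [y for _, y in data]
    # Stop when querying more features is useless: all labels agree,
    # or there is nothing left to ask.
    if len(set(labels)) == 1 or not features:
        return ('leaf', majority(labels))
    def score(f):
        # Labels guessed correctly when predicting the majority on each side.
        yes = Counter(y for x, y in data if x[f])
        no = Counter(y for x, y in data if not x[f])
        return max(yes.values(), default=0) + max(no.values(), default=0)
    best = max(features, key=score)
    # Divide: partition the data into the YES part and the NO part.
    yes_part = [(x, y) for x, y in data if x[best]]
    no_part = [(x, y) for x, y in data if not x[best]]
    if not yes_part or not no_part:  # the split separates nothing
        return ('leaf', majority(labels))
    rest = [f for f in features if f != best]
    # Conquer: recurse on each part.
    return ('node', best, train(yes_part, rest), train(no_part, rest))

def predict(tree, x):
    """Walk the tree, answering each feature test, until a leaf is reached."""
    if tree[0] == 'leaf':
        return tree[1]
    _, f, yes_branch, no_branch = tree
    return predict(yes_branch, x) if x[f] else predict(no_branch, x)
```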
The end of the cycle
• The recursion ends when it becomes useless to query on
additional features
Decision tree: Inductive Bias
• The goal of the decision tree learning model
is:
– to figure out what questions to ask
– in what order
– what answer to predict once you have asked
enough questions
– The inductive bias of decision trees: the things
that we want to learn to predict are more like the
root node and less like the other branch nodes.
Informal Definition
• A decision tree is:
– a flow-chart-like structure, where
• each internal (non-leaf) node denotes a test on an
attribute,
• each branch represents the outcome of a test, and
• each leaf (or terminal) node holds a class label.
• The topmost node in a tree is the root node (a concrete example follows below).
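Using the tuple convention from the training sketch above, the flow-chart structure can be written down directly; this tree is a made-up illustration:

```python
# Internal nodes test an attribute; branches are the YES/NO outcomes;
# leaves hold class labels. The topmost node ('ai') is the root.
tree = ('node', 'ai',           # root: test the feature 'ai'
        ('leaf', 'like'),       # YES branch: a leaf with a class label
        ('node', 'easy',        # NO branch: another attribute test
         ('leaf', 'like'),      #   easy? YES: predict 'like'
         ('leaf', 'nah')))      #   easy? NO: predict 'nah'
```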
Formalising the learning problem:
1) The loss function
• The loss function $\ell(y, \hat{y})$ says how bad it is to predict $\hat{y}$ when the true answer is $y$ (two common choices are sketched below).
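Two standard instantiations, as a sketch (the lecture names the loss function only abstractly; these concrete choices are common defaults, not taken from the slides):

```python
def zero_one_loss(y, y_hat):
    """Classification: 1 if the prediction is wrong, 0 if it is right."""
    return 0 if y == y_hat else 1

def squared_loss(y, y_hat):
    """Regression: penalizes large mistakes much more than small ones."""
    return (y - y_hat) ** 2
```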
Formalising the learning problem:
2) Data Generating Distribution
$\mathcal{D}(x, y)$ — the probability that the (unknown) data generating distribution $\mathcal{D}$ produces the input/output pair $(x, y)$
Expected Loss
• The expected loss combines the two ingredients just defined:
1. The loss function
2. The data generating distribution
Formulae: Expected Value

$$\epsilon \;\triangleq\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(y, f(x))\big] \;=\; \sum_{(x,y)} \mathcal{D}(x, y)\,\ell(y, f(x))$$

How to read:
• $\epsilon$ = epsilon
• $\triangleq$ = equal by definition to (or: is defined as)
• $\mathbb{E}$ = blackboard-bold E
• $(x,y)$ = sub: the pair (x, y)
• $\mathcal{D}$ = over: script D
• $\ell(y, f(x))$ = l of the pair y, f of x
• The right-hand side: sum, over all the pairs (x, y) in script D,
of D of x and y times l of y and f of x
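When the distribution is known and finite, the expected loss is exactly this weighted sum, as in the following sketch (the distribution, the classifier, and the loss are all made up for illustration):

```python
# D(x, y): probability of each (input, true label) pair.
D = {
    (('ai', True),  'like'): 0.4,
    (('ai', True),  'nah'):  0.1,
    (('ai', False), 'like'): 0.1,
    (('ai', False), 'nah'):  0.4,
}

def f(x):
    """A classifier that predicts 'like' exactly for AI courses."""
    return 'like' if x[1] else 'nah'

def loss(y, y_hat):
    return 0 if y == y_hat else 1  # zero-one loss

# epsilon = sum over all pairs of D(x, y) * l(y, f(x))
expected_loss = sum(p * loss(y, f(x)) for (x, y), p in D.items())
print(expected_loss)  # 0.2: f errs exactly on the 0.1 + 0.1 mass
```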
Training Error
• The training error is the average error over the
training data:

$$\hat{\epsilon} \;\triangleq\; \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, f(x_n))$$

• How to read: the training error epsilon-hat is
equal by definition to 1 over N of the sum, from
n = 1 to capital N, of l of y_n and f of x_n.
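A one-function sketch of this average (data is a list of (features, label) pairs as in the earlier examples; the zero-one default is my choice):

```python
def training_error(data, f, loss=lambda y, y_hat: int(y != y_hat)):
    """(1/N) * sum over n of loss(y_n, f(x_n)) on the training data."""
    return sum(loss(y, f(x)) for x, y in data) / len(data)
```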
Empirical Error
• Alpaydin (2010: 24): the empirical error is the
proportion of training instances where the
predictions of h (the hypothesis = the
informed guess) do not match the required
values given in X (the training set). The error
of the hypothesis h given the training set X
is:

$$E(h \mid \mathcal{X}) \;=\; \frac{1}{N} \sum_{t=1}^{N} \mathbf{1}\big(h(x^t) \neq r^t\big)$$

where $\mathbf{1}(\cdot)$ is 1 when its argument is true and 0 otherwise, and $r^t$ is the required value for instance $x^t$.
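Under zero-one loss this coincides with the training error above; written with Alpaydin's h and r^t (hypothesis and required value), as a sketch:

```python
def empirical_error(h, X):
    """X: list of (x, r) pairs. The proportion of instances where h(x) != r."""
    return sum(h(x) != r for x, r in X) / len(X)
```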
Induction
Given:
• a loss function $\ell$
and
• a sample of data drawn from some unknown distribution $\mathcal{D}$
• you must compute a function $f$ that has low
expected error $\epsilon$ over $\mathcal{D}$ with respect to $\ell$.
Quiz 1: Training error
• How would you define the training error on a
dataset?
1. Training error is the average loss over the
training sample
2. Training error is the expected prediction error
over an independent test sample
3. None of the above
Quiz 2: Distributions
What kind of distribution is D
in the formula above?
1. Normal
2. Unknown
3. None of the above
Quiz 3: Loss function
• How would you define a loss function?
1. The loss function L(actual value, predicted value)
characterizes how bad predictions are
2. The loss function is an unknown distribution
3. Both definitions are incorrect.
The End
