Lecture 4: Intro to Machine Learning & Decision Trees
6 February 2024
Admin
Final Assessment Timing Survey
[Bar chart of vote counts for four timing options: Saturday 8pm - Sunday 11:59pm (original); Saturday 8pm - Sunday 9pm (sleep earlier); Sunday 8am - Sunday 11:59pm (avoid pulling an all-nighter); Sunday 8am - Sunday 9pm (avoid pulling an all-nighter and sleep earlier)]
Materials
Overview
[Course overview diagram ("You are here"): builds on CS2040S, CS1231, applied linear algebra, calculus, statistics & probability; tooling: Python, NumPy, Scikit-learn, PyTorch]
Outline
• Machine Learning
• What is ML?
• Types of Feedback
• Supervised Learning
• Performance Measure
• Regression: mean squared error, mean absolute error
• Classification: correctness, accuracy, confusion matrix, precision, recall, F1
• Decision Trees
• Decision Tree Learning (DTL)
• Entropy and Information Gain
• Different types of attributes
• Pruning
• Ensemble Methods
What is Machine Learning? Applications?
[Images of example applications. Credits: Google, VentureBeat, Skyfish, Futuremind]
What is Machine Learning?
“The field of study that gives computers the ability to learn without
being explicitly programmed”
- Arthur Samuel (1959)
Why Machine Learning?
Types of Feedback
[Diagram of learning paradigms: Supervised ("teacher's feedback"), Semi/Weakly Supervised, Unsupervised ("no feedback"), and Reinforcement Learning ("trial and error": the agent takes actions and receives observations and rewards)]
Supervised Learning
Learns from being given the right answers: X → Y

Input (X) | Output (Y) | Application
Email | Spam? | Spam filtering
Audio | Text transcripts | Speech recognition
English | Chinese | Machine translation
Image | Position of cars | Self-driving car

Types:
• Regression: predict continuous output (e.g., temperature: 0.567)
• Classification: predict discrete output (e.g., animal: cat vs dog)
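To make the two types concrete, here is a minimal sketch using scikit-learn decision trees (since trees are this lecture's model); the data, model choices, and hyperparameters below are illustrative assumptions, not from the lecture.

# Regression predicts a continuous value; classification predicts a discrete label.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

rng = np.random.default_rng(0)

# Regression example (continuous output, e.g., a temperature-like value):
X_reg = rng.uniform(0, 10, size=(50, 1))
y_reg = 2.0 * X_reg[:, 0] + rng.normal(0, 0.5, size=50)
reg = DecisionTreeRegressor(max_depth=3).fit(X_reg, y_reg)
print(reg.predict([[5.0]]))        # a continuous value (roughly 10)

# Classification example (discrete output, e.g., cat = 0 vs dog = 1):
X_clf = rng.normal(size=(50, 2))
y_clf = (X_clf[:, 0] > 0).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X_clf, y_clf)
print(clf.predict([[1.0, 0.0]]))   # a discrete label, e.g., array([1])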
Supervised Learning: Regression
Housing price prediction
[Scatter plot of house price ($M) against house size (sqft); task: predict the price of a 1150 sqft house]
Supervised Learning: Classification
• Cancer prediction: benign (0), malignant (1)
[Plot of label (0 or 1) against tumour size (cm); task: predict the label for a 5 cm tumour]
Formalism
• In supervised learning, we assume that y is generated by a true mapping function f: x → y.
• We want to find a hypothesis h: x → ŷ (from a hypothesis class H) s.t. h ≈ f, given a training set {(x_1, f(x_1)), …, (x_N, f(x_N))}.
• We use a learning algorithm to find this hypothesis.
Formalism (Illustrated)
[Diagram: the training set {(x_1, f(x_1)), …, (x_N, f(x_N))} is fed to a learning algorithm, which outputs a hypothesis h]
Performance Measure
How do we know that our hypothesis is good, i.e., h ≈ f?
1. Use theorems from statistical learning theory
2. Try the hypothesis on a new set of examples (test set)
[Diagram: the data is split into a training set (train) and a held-out test set (test)]
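A minimal sketch of option 2: hold out part of the data as a test set using scikit-learn's train_test_split (the synthetic data and the 80/20 split below are assumptions for illustration).

# Split the data into a training set and a held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(500, 2500, size=(100, 1))            # e.g., house sizes (sqft)
y = 0.2 * X[:, 0] + rng.normal(0, 20.0, size=100)    # noisy "true" outputs f(x)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0              # 80% train, 20% test
)
# Fit the hypothesis h on (X_train, y_train); estimate h ≈ f by measuring its
# error on (X_test, y_test), which the learner never saw during training.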
Regression: Error
If the output of the hypothesis is a continuous value, then we can measure its error. For an input x with a true output y, we can compute:

SquaredError = (\hat{y} - y)^2

where \hat{y} = h(x).
Regression: Mean Squared Error
For a set of N examples (x_1, y_1), …, (x_N, y_N), we can compute the average (mean) squared error as follows:

MSE = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
Regression: Mean Absolute Error
For a set of N examples (x_1, y_1), …, (x_N, y_N), we can compute the average (mean) absolute error as follows:

MAE = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|
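A minimal NumPy sketch computing both MSE and MAE exactly as defined above (the toy predictions are made up).

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])    # y_i
y_pred = np.array([2.5,  0.0, 2.0, 8.0])    # ŷ_i = h(x_i)

mse = np.mean((y_pred - y_true) ** 2)       # (1/N) Σ (ŷ_i − y_i)²   -> 0.375
mae = np.mean(np.abs(y_pred - y_true))      # (1/N) Σ |ŷ_i − y_i|    -> 0.5
print(mse, mae)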
Classification: Correctness & Accuracy
A classification is correct when the prediction ŷ equals the true label y.

[Worked example: 10 test cases with their true labels (Cancer / Benign) and predicted labels; a Cancer case predicted as Benign is a false negative (FN), and a Benign case predicted as Cancer is a false positive (FP)]

Confusion matrix (rows: true label, columns: predicted label; positive class = Cancer):
                   Predicted Cancer      Predicted Benign
True Cancer        True Positive (TP)    False Negative (FN)
True Benign        False Positive (FP)   True Negative (TN)

Precision: P = TP / (TP + FP)
• How many selected items are relevant? How precise were the positive predicted instances?
• Maximize this if a false positive (FP) is very costly, e.g., email spam, satellite launch date prediction.

Recall: R = TP / (TP + FN)
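A minimal sketch computing the metrics listed in the outline (accuracy, confusion-matrix counts, precision, recall, F1) from their standard definitions; the toy labels below are made up, with Cancer as the positive class.

import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # 1 = Cancer (positive), 0 = Benign
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 1])   # hypothesis outputs

tp = np.sum((y_pred == 1) & (y_true == 1))          # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))          # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))          # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))          # true negatives

accuracy  = (tp + tn) / len(y_true)                 # fraction of correct predictions
precision = tp / (tp + fp)                          # P = TP / (TP + FP)
recall    = tp / (tp + fn)                          # R = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn, accuracy, precision, recall, f1)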
Life's Difficult Decision: What to Eat?
Decide whether to wait for a table at a restaurant based on the following input attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
…
10. WaitEstimate: estimated waiting time in minutes (0-10, 10-30, 30-60, >60)
Life's Difficult Decision: What to Eat?
[Table of training examples: for each restaurant visit, the attribute values and whether we waited (T/F)]
Decision Trees
One possible representation for hypotheses
[Example decision tree for the restaurant problem]
Expressiveness
• Decision trees can express any function of the input attributes.
• Example: for Boolean functions, each row of the truth table corresponds to a path from root to leaf.

A | B | A xor B
F | F | F
F | T | T
T | F | T
T | T | F

[Equivalent tree: split on A, then on B; each leaf holds the value of A xor B]

• Trivially, there is a consistent decision tree for any training set, but it probably won't generalize to new examples.
• Prefer a more "compact" tree.
The Size of the Hypothesis Class
How many distinct decision trees are there with n Boolean attributes?
= the number of Boolean functions of n attributes
= the number of distinct truth tables with 2^n rows
= 2^(2^n)

A | B | A <op> B
F | F | T/F
F | T | T/F
T | F | T/F
T | T | T/F

For example, with n = 6 Boolean attributes there are 2^64 ≈ 1.8 × 10^19 distinct trees.
Choosing an Attribute
How do we choose an attribute?
[Two candidate splits: Patrons? with branches None / Some / Full, vs. Type? with branches French / Italian / Thai / Burger]
Ideally, we want to select an attribute that splits the examples into "all positive" or "all negative" subsets.
Entropy
• Entropy is a measure of randomness:

  I(P(v_1), \ldots, P(v_n)) = -\sum_{i=1}^{n} P(v_i) \log_2 P(v_i)

• For a data set containing p positive and n negative examples:

  I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

  Entropy is at its maximum (maximum randomness) when p = n.

[Candidate splits to compare: Patrons? (None / Some / Full) vs. Type? (French / Italian / Thai / Burger)]
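A minimal sketch of the entropy formula for a set with p positive and n negative examples (the binary case of I(P(v_1), …, P(v_n))).

import math

def entropy(p, n):
    """I(p/(p+n), n/(p+n)) for p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:                     # treat 0·log2(0) as 0
            q = count / total
            result -= q * math.log2(q)
    return result

print(entropy(6, 6))   # 1.0 -> evenly split: maximum randomness / entropy
print(entropy(4, 0))   # 0.0 -> pure set: no randomness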
Information Gain
• A chosen attribute A divides the training set E into subsets E_1, …, E_v according to their values for A, where A has v distinct values.

remainder(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)

• Information gain = entropy of this node − total (remaining) entropy of the children nodes:

IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)

[Example: splitting on Patrons? divides the 12 examples into branches of size 2 (None), 4 (Some), and 6 (Full), whose entropies are weighted by 2/12, 4/12, and 6/12 in remainder(Patrons)]
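A minimal sketch of remainder(A) and the information gain, following the formulas above; the (p_i, n_i) class counts per branch are assumed toy numbers (loosely modelled on a Patrons?-style split of sizes 2, 4, 6).

import math

def entropy(p, n):
    if p == 0 or n == 0:
        return 0.0
    total = p + n
    return -(p / total) * math.log2(p / total) - (n / total) * math.log2(n / total)

def remainder(branches):
    """branches: list of (p_i, n_i) counts, one pair per value of attribute A."""
    p = sum(pi for pi, ni in branches)
    n = sum(ni for pi, ni in branches)
    return sum((pi + ni) / (p + n) * entropy(pi, ni) for pi, ni in branches)

def information_gain(branches):
    p = sum(pi for pi, ni in branches)
    n = sum(ni for pi, ni in branches)
    return entropy(p, n) - remainder(branches)

# Branches of size 2, 4, 6 with assumed class counts (0+,2-), (4+,0-), (2+,4-):
print(information_gain([(0, 2), (4, 0), (2, 4)]))   # ≈ 0.541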
Decision Tree Learning with Information Gain

def DTL(examples, attributes, default):
    if examples is empty: return default
    if all examples have the same classification: return that classification
    if attributes is empty: return mode(examples)
    best = choose_attribute(attributes, examples)   # attribute with the highest information gain
    tree = a new decision tree with root best
    for each value v_i of best:
        examples_i = {rows in examples with best = v_i}
        subtree = DTL(examples_i, attributes - {best}, mode(examples))
        add a branch to tree with label v_i and subtree subtree
    return tree
Walkthrough on the restaurant data:
• At each node, choose_attribute compares the information gain of the attributes not yet used on that path, e.g., IG(Alt), IG(Bar), IG(Fri), IG(Hun), … excluding IG(Pat) once Patrons has been used (and also excluding IG(Hun) one level deeper, once Hungry has been used).
• If a branch has no examples (e.g., Type = French), the recursive call returns the default, i.e., mode(examples) of the parent. When there is no mode (no majority), we select an arbitrary decision: T.
• The recursion continues until every leaf is pure, the attributes run out (return the majority class), or a branch receives no examples (return the parent's majority).
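Below is a runnable Python sketch of the DTL pseudocode above, using information gain to choose attributes. The example representation (dicts with a 'label' key), the attribute names, and the toy data are assumptions for illustration, not the lecture's dataset; for simplicity, branches are only created for attribute values that actually appear in the examples.

import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    base = entropy([e['label'] for e in examples])
    rem = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        rem += len(subset) / len(examples) * entropy([e['label'] for e in subset])
    return base - rem

def mode(examples):
    return Counter(e['label'] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, default):
    if not examples:
        return default                          # empty branch: parent's majority label
    labels = {e['label'] for e in examples}
    if len(labels) == 1:
        return labels.pop()                     # pure node
    if not attributes:
        return mode(examples)                   # no attributes left: majority label
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = dtl(subset, [a for a in attributes if a != best], mode(examples))
    return tree

# Toy usage (made-up data):
data = [
    {'patrons': 'Some', 'hungry': 'Yes', 'label': 'T'},
    {'patrons': 'Full', 'hungry': 'No',  'label': 'F'},
    {'patrons': 'None', 'hungry': 'No',  'label': 'F'},
    {'patrons': 'Full', 'hungry': 'Yes', 'label': 'T'},
]
print(dtl(data, ['patrons', 'hungry'], default='F'))   # e.g., {'hungry': {'Yes': 'T', 'No': 'F'}}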
Dealing with Attributes with Many Values
Information gain tends to select attributes with many values (e.g., dates, phone numbers) because they split the data "well": in the extreme case, each branch holds a single example, so every branch is trivially "all positive" or "all negative".
Balance the information gain against the number of branches:

GainRatio(A) = \frac{IG(A)}{SplitInformation(A)}

SplitInformation(A) = -\sum_{i=1}^{d} \frac{|E_i|}{|E|} \log_2 \frac{|E_i|}{|E|}

[Examples: splitting E evenly into two subsets E_1, E_2 gives SplitInformation = 1; splitting it evenly into three subsets E_1, E_2, E_3 gives SplitInformation ≈ 1.6]
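A minimal sketch of SplitInformation and GainRatio as defined above; the branch sizes and the IG value fed into gain_ratio are toy numbers (the IG ≈ 0.541 comes from the earlier sketch).

import math

def split_information(sizes):
    """sizes: the branch sizes |E_i| produced by splitting E on attribute A."""
    total = sum(sizes)
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)

def gain_ratio(info_gain, sizes):
    return info_gain / split_information(sizes)

print(split_information([6, 6]))        # 1.0   (two equal subsets)
print(split_information([4, 4, 4]))     # ≈ 1.585, the "1.6" in the example above
print(gain_ratio(0.541, [2, 4, 6]))     # ≈ 0.37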
Dealing with Attributes with Differing Costs
Certain attributes may have costs (e.g., in medical diagnosis). They may vary significantly in cost, both in monetary terms and in patient comfort. Examples: biopsy, blood test, etc.
Weight the information gain by the cost, for example:

\frac{IG^2(A)}{Cost(A)} \quad \text{or} \quad \frac{2^{IG(A)} - 1}{(Cost(A) + 1)^w}, \quad w \in [0, 1]

where w determines the importance of the cost.
Dealing with Continuous-valued Attributes
Define a discrete-valued input attribute by partitioning the values into a discrete set of intervals.
Examples:
• Estimated waiting time (minutes): 0-10, 10-30, 30-60, >60
• Age (years): 0-12, 12-25, 25-40, 40-60, 60-80, >80
• …
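A minimal sketch of this kind of discretization using NumPy (an assumption; any binning utility would do), applied to the waiting-time intervals above.

import numpy as np

wait_minutes = np.array([5, 12, 45, 90])
bins = [10, 30, 60]                            # boundaries for 0-10, 10-30, 30-60, >60
labels = ['0-10', '10-30', '30-60', '>60']

indices = np.digitize(wait_minutes, bins)      # -> array([0, 1, 2, 3])
print([labels[i] for i in indices])            # ['0-10', '10-30', '30-60', '>60']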
Dealing with Missing Values
What if some values are missing?
• Assign the most common value of the attribute
• Assign the most common value of the attribute with the same output
• Assign probability to each possible value and sample
• Drop the attribute
• Drop the rows
•…
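A minimal pandas sketch of two of the strategies above (most-common value vs. dropping rows); the column names and values are made up.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'patrons': ['Some', 'Full', np.nan, 'Full', 'None'],
    'wait':    ['T', 'F', 'T', np.nan, 'F'],
})

# 1. Assign the most common value of the attribute:
df_filled = df.fillna({'patrons': df['patrons'].mode()[0]})

# 2. ... or drop the rows that contain missing values:
df_dropped = df.dropna()

print(df_filled)
print(df_dropped)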
Overfitting
[Decision tree that keeps growing (H? → Y / N → …) until it fits the training data exactly]
The decision tree's performance is perfect on the training data, but worse on test data:
• the tree captures the training data perfectly, including the noise.
Occam's Razor
[Illustration: prefer the simpler tree that explains the data over the larger, overfitted one]
Pruning
Prevent nodes from being split even when they fail to cleanly separate the examples.

Example tree (leaves show class and example count):
Cond1?
  Yes → 25 T
  No → Cond2?
    Yes → Cond3?
      Yes → 9 F
      No → Cond4?
        Yes → 1 T
        No → 2 F
    No → 13 F
Pruning with Min-sample = 5:
• Cond4? has only 3 examples (1 T, 2 F), fewer than the minimum of 5, so it is not split; it becomes a leaf labelled F (the majority).
• The result can be simplified further: Cond3?'s subtree now contains 1 T and 11 F, so it collapses to F (majority), and then Cond2?'s subtree (1 T, 24 F) collapses to F as well.

Pruning with Max-depth = 3:
• Cond4? lies beyond the maximum depth of 3, so it is not split; it becomes a leaf labelled F (majority), and the tree simplifies in the same way.

Final pruned tree (for both criteria):
Cond1?
  Yes → T (25 T)
  No → F (majority of 1 T, 24 F)

The sample that is likely noise (the single T) is ignored!
Results in a smaller tree which may have higher accuracy.
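A minimal scikit-learn sketch of the same pre-pruning idea, using the DecisionTreeClassifier hyperparameters min_samples_split and max_depth (the synthetic data and the specific thresholds are assumptions mirroring the example above).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)   # noisy labels

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(
    min_samples_split=5,   # do not split nodes with fewer than 5 samples
    max_depth=3,           # do not grow the tree beyond depth 3
    random_state=0,
).fit(X, y)

print(unpruned.get_depth(), pruned.get_depth())   # the pruned tree is shallower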
Ensemble Methods
Coming Up Next Week
• Linear Regression
• Multiple Features
• Polynomial Regression
• Optimization Algorithms
• Gradient Descent
• Variants of Gradient Descent
• Normal Equation
To Do
• Lecture Training 4
• +100 Free EXP
• +50 Early bird bonus
• Problem Set 2
• Due Saturday, 10th February
• Problem Set 3
• Out today!
• Contest +500 EXP