Chapter 9
Ensemble Learning
Supplementary slides to Machine Learning Fundamentals
© Hui Jiang 2020, published by Cambridge University Press
August 2020
Outline
1 Formulation of Ensemble Learning
2 Bagging
3 Boosting
Ensemble Learning
ensemble learning: combine multiple base models that are learned separately for the same task
how to choose base models?
◦ neural networks, linear models, decision trees, etc.
how to learn base models so as to ensure diversity?
◦ re-sampling the training set, re-weighting training samples, etc.
how to combine base models optimally?
◦ bagging, boosting, stacking
Decision Trees (I)
a popular non-parametric model for regression or classification tasks
a tree-structured model:
◦ each non-terminal node is associated with a binary question regarding an input feature element x_i and a threshold t_j, e.g. x_i ≤ t_j
◦ each leaf node represents a homogeneous region R_l in the input space
each decision tree represents a particular partition of the input space
decision trees are a highly interpretable machine learning method
Decision Trees (II)
fit a simple model to all y values in each region R_l
◦ regression: use a constant c_l for each R_l
◦ classification: assign all x in each R_l to one particular class
[figure: a machine learning model mapping an input x to an output y = f̄(x)]
approximate the unknown target function by a piecewise constant function
$$ y = f(x) = \sum_{l} c_l \, \mathbb{I}(x \in R_l) $$
where
$$ \mathbb{I}(x \in R_l) = \begin{cases} 1 & \text{if } x \in R_l \\ 0 & \text{otherwise} \end{cases} $$
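To make the piecewise-constant view concrete, here is a minimal Python sketch (illustrative, not from the book) of a tree as nested binary questions; the Node fields and the predict helper are assumed names.

```python
# A minimal sketch of a decision tree as a piecewise-constant function:
# internal nodes ask "x[i] <= t?", and each leaf stores the constant c_l
# for its region R_l.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # index i of the feature tested at this node
    threshold: Optional[float] = None  # threshold t_j for the question x_i <= t_j
    left: Optional["Node"] = None      # subtree for x_i <= t_j
    right: Optional["Node"] = None     # subtree for x_i > t_j
    value: Optional[float] = None      # constant c_l if this node is a leaf

def predict(node: Node, x) -> float:
    """Route x down the tree; the leaf reached determines the region R_l."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Example: a depth-1 tree splitting on x_0 <= 2.5
tree = Node(feature=0, threshold=2.5,
            left=Node(value=1.0), right=Node(value=3.0))
print(predict(tree, [1.7]))  # 1.0  (x falls in the region x_0 <= 2.5)
print(predict(tree, [4.2]))  # 3.0
```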
Decision Trees for Regression
a training set: D = {(x^(n), y^(n)) | n = 1, 2, ..., N}
construct the loss functional using a loss function l(·):
$$ L(f; D) = \frac{1}{N} \sum_{n=1}^{N} l\big(y^{(n)}, f(x^{(n)})\big) = \frac{1}{N} \sum_{n=1}^{N} \big(y^{(n)} - f(x^{(n)})\big)^2 $$
computationally infeasible to find the best partition that minimizes the above loss
use a greedy algorithm to recursively find one optimal split x_i^* ≤ t_j^* at a time:
$$ \{x_i^*, t_j^*\} = \arg\min_{x_i,\, t_j} \Big[ \sum_{x^{(n)} \in D_l} \big(y^{(n)} - c_l^*\big)^2 + \sum_{x^{(n)} \in D_r} \big(y^{(n)} - c_r^*\big)^2 \Big] $$
where
$$ D_l = \big\{(x^{(n)}, y^{(n)}) \,\big|\, x_i^{(n)} \le t_j \big\}, \qquad D_r = \big\{(x^{(n)}, y^{(n)}) \,\big|\, x_i^{(n)} > t_j \big\} $$
and c_l^* and c_r^* are the centroids (i.e. the mean y values) of D_l and D_r
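The greedy split search above can be sketched in a few lines of numpy; the best_split helper, the brute-force threshold scan, and the toy data below are illustrative assumptions, not the book's implementation.

```python
# A minimal numpy sketch of the greedy split search: for every feature i and
# candidate threshold t, score the split by the sum of squared errors around
# the left/right means (the centroids c_l*, c_r*), and keep the best one.
import numpy as np

def best_split(X, y):
    """Return (i*, t*, score) minimizing SSE_left + SSE_right."""
    best = (None, None, np.inf)
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i])[:-1]:          # candidate thresholds
            left, right = y[X[:, i] <= t], y[X[:, i] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (i, t, sse)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = np.where(X[:, 0] > 5, 2.0, -1.0) + 0.1 * rng.standard_normal(50)
print(best_split(X, y))   # should pick feature 0 with a threshold near 5
```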
Decision Trees for Classification
a classification problem involving K classes, i.e. ω_1, ω_2, ..., ω_K
p_lk (k = 1, 2, ..., K): the proportion of class k among all training samples assigned to leaf node l representing R_l
$$ p_{lk} = \frac{1}{N_l} \sum_{x^{(n)} \in R_l} \mathbb{I}\big(y^{(n)} = \omega_k\big) $$
all inputs x in each region R_l are assigned to the majority class
$$ k_l^* = \arg\max_k \; p_{lk} $$
the criteria for the best split {x_i^*, t_j^*}:
◦ misclassification error: $\frac{1}{N_l} \sum_{x^{(n)} \in R_l} \mathbb{I}\big(y^{(n)} \ne \omega_{k_l^*}\big) = 1 - p_{l k_l^*}$
◦ Gini index: $1 - \sum_{k=1}^{K} p_{lk}^2$
◦ entropy: $- \sum_{k=1}^{K} p_{lk} \log(p_{lk})$
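For concreteness, here is a short Python sketch of the three impurity criteria, computed from the class proportions p_lk of the samples reaching a node; the impurities helper is an assumed name, and the natural log is used for the entropy.

```python
# Compute the three node-impurity criteria listed above from the labels of
# the training samples assigned to one leaf node.
import numpy as np

def impurities(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                 # p_lk for k = 1..K
    misclass = 1.0 - p.max()                  # 1 - p_{l k_l*}
    gini = 1.0 - np.sum(p ** 2)               # Gini index
    entropy = -np.sum(p * np.log(p))          # entropy (natural log)
    return misclass, gini, entropy

print(impurities(["a", "a", "a", "b"]))   # fairly pure node
print(impurities(["a", "b", "a", "b"]))   # maximally impure for K = 2
```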
Bagging and Random Forests
bagging stands for bootstrap aggregating
bootstrap (i.e. sample with replacement) the training set into M subsets
use the M bootstrap subsets to independently learn M models
combine the M models by averaging or majority voting
random forests: use decision trees as the base models in bagging
◦ row sampling (each tree is trained on a bootstrap sample of the training data)
◦ column sampling (each split considers only a random subset of the features)
◦ sub-optimal splitting
random forests are typically far more accurate and robust than a single decision tree
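A minimal sketch of bagging with decision trees (roughly a hand-rolled random forest): it assumes scikit-learn is available, and the toy dataset, the number of base models M, and max_features="sqrt" for column sampling are illustrative choices.

```python
# Bootstrap the training set, fit one decision tree per bootstrap sample,
# and combine the trees by majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)

M = 25                                    # number of base models
trees = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap: sample with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# majority vote over the M trees
votes = np.stack([t.predict(X) for t in trees])          # shape (M, N)
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)   # binary labels 0/1
print("training accuracy:", (ensemble_pred == y).mean())
```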
Boosting: Outline
1 Gradient Boosting
2 AdaBoost
3 Gradient Tree Boosting
Boosting
consider an additive model for ensemble learning:
$$ F_m(x) = w_1 f_1(x) + w_2 f_2(x) + \cdots + w_m f_m(x) $$
each base model f_m(x) ∈ H, hence F_m(x) ∈ lin(H) ⊇ H
ensemble learning ⟺ functional minimization:
$$ F_m(x) = \arg\min_{f \in \mathrm{lin}(H)} \sum_{n=1}^{N} l\big(f(x_n), y_n\big) $$
boosting: a sequential learning strategy that adds a new base model to improve the current ensemble:
$$ F_m(x) = F_{m-1}(x) + w_m f_m(x) $$
Gradient Boosting
gradient boosting: estimate the new base model along the direction of the negative gradient of the loss at the current ensemble F_{m-1}:
$$ \nabla l\big(F_{m-1}(x)\big) \triangleq \frac{\partial\, l\big(f(x), y\big)}{\partial f} \bigg|_{f = F_{m-1}} $$
project the negative gradient into H, using the inner product $\langle f, g \rangle \triangleq \frac{1}{N} \sum_{i=1}^{N} f(x_i)\, g(x_i)$:
$$ f_m = \arg\max_{f \in H} \big\langle f, -\nabla l\big(F_{m-1}(x)\big) \big\rangle $$
estimate the optimal weight:
$$ w_m = \arg\min_{w} \sum_{n=1}^{N} l\big(F_{m-1}(x_n) + w\, f_m(x_n), \, y_n\big) $$
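One round of this recipe might look as follows under squared-error loss; the shallow regression tree used as the base model, the toy data, and the 1-D line search via scipy are illustrative assumptions, not the book's implementation.

```python
# One gradient-boosting round: fit the base model to the negative functional
# gradient evaluated at F_{m-1}, then pick the weight w_m by a line search.
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

F_prev = np.zeros(len(y))                 # current ensemble F_{m-1}(x_n)

# squared-error loss l(f, y) = (f - y)^2 / 2, so the gradient is F_{m-1} - y
grad = F_prev - y

# "project into H": fit a shallow regression tree to the negative gradient
f_m = DecisionTreeRegressor(max_depth=2).fit(X, -grad)
pred_m = f_m.predict(X)

# line search for the ensemble weight w_m
loss = lambda w: 0.5 * np.sum((F_prev + w * pred_m - y) ** 2)
w_m = minimize_scalar(loss).x
F_new = F_prev + w_m * pred_m
print(f"w_m = {w_m:.3f}, loss {loss(0):.2f} -> {loss(w_m):.2f}")
```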
AdaBoost (I)
apply gradient boosting to binary classification problems
H: all binary functions, i.e. ∀f ∈ H, f(x) ∈ {−1, +1}
the exponential loss function: $l\big(F(x), y\big) = e^{-y F(x)}$
given a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_n ∈ R^d and y_n ∈ {−1, +1}
the functional gradient:
$$ \nabla l\big(F_{m-1}(x)\big) \triangleq \frac{\partial\, l\big(f(x), y\big)}{\partial f} \bigg|_{f = F_{m-1}} = -y\, e^{-y F_{m-1}(x)} $$
project into H:
$$ f_m = \arg\max_{f \in H} \big\langle f, -\nabla l\big(F_{m-1}(x)\big) \big\rangle = \arg\max_{f \in H} \frac{1}{N} \sum_{n=1}^{N} y_n f(x_n)\, e^{-y_n F_{m-1}(x_n)} $$
AdaBoost (II)
denote $\alpha_n^{(m)} \triangleq \exp\big(-y_n F_{m-1}(x_n)\big)$:
$$ f_m = \arg\max_{f \in H} \Big[ \sum_{y_n = f(x_n)} \alpha_n^{(m)} - \sum_{y_n \ne f(x_n)} \alpha_n^{(m)} \Big] = \arg\max_{f \in H} \Big[ \sum_{n=1}^{N} \alpha_n^{(m)} - 2 \sum_{y_n \ne f(x_n)} \alpha_n^{(m)} \Big] = \arg\min_{f \in H} \sum_{y_n \ne f(x_n)} \alpha_n^{(m)} $$
normalizing all weights as $\bar{\alpha}_n^{(m)} \triangleq \alpha_n^{(m)} \big/ \sum_{n'=1}^{N} \alpha_{n'}^{(m)}$, we have
$$ f_m = \arg\min_{f \in H} \sum_{y_n \ne f(x_n)} \bar{\alpha}_n^{(m)} $$
AdaBoost (III)
estimate f_m to minimize the weighted classification error:
$$ \epsilon_m = \sum_{y_n \ne f_m(x_n)} \bar{\alpha}_n^{(m)} \qquad (0 \le \epsilon_m \le 1) $$
replace the 0-1 loss function with a weighted loss function, where $\bar{\alpha}_n^{(m)}$ is treated as the loss if (x_n, y_n) is misclassified
estimate the optimal weight:
$$ w_m = \arg\min_{w} \sum_{n=1}^{N} e^{-y_n \big(F_{m-1}(x_n) + w\, f_m(x_n)\big)} \;\Longrightarrow\; w_m = \frac{1}{2} \ln \frac{\sum_{y_n = f_m(x_n)} \bar{\alpha}_n^{(m)}}{\sum_{y_n \ne f_m(x_n)} \bar{\alpha}_n^{(m)}} = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m} $$
AdaBoost (IV)
AdaBoost algorithm
input: {(x_1, y_1), ..., (x_N, y_N)}, where x_n ∈ R^d and y_n ∈ {−1, +1}
output: an ensemble model F_m(x)
m = 1 and F_0(x) = 0
initialize $\bar{\alpha}_n^{(1)} = \frac{1}{N}$ for all n = 1, 2, ..., N
while not converged do
    learn a binary classifier f_m(x) to minimize $\epsilon_m = \sum_{y_n \ne f_m(x_n)} \bar{\alpha}_n^{(m)}$
    estimate the ensemble weight: $w_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$
    add to the ensemble: $F_m(x) = F_{m-1}(x) + w_m f_m(x)$
    update $\bar{\alpha}_n^{(m+1)} = \bar{\alpha}_n^{(m)} e^{-y_n w_m f_m(x_n)} \Big/ \sum_{n'=1}^{N} \bar{\alpha}_{n'}^{(m)} e^{-y_{n'} w_m f_m(x_{n'})}$ for all n = 1, 2, ..., N
    m = m + 1
end while
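A compact Python sketch of this loop, using depth-1 scikit-learn trees (decision stumps) as the binary base classifiers; the toy dataset, the number of rounds M, and the early-stopping test are illustrative choices, with sample_weight playing the role of ᾱ_n^(m).

```python
# AdaBoost with decision stumps; labels are in {-1, +1} as in the slides.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=10, random_state=1)
y = 2 * y01 - 1                              # map {0,1} -> {-1,+1}

N, M = len(y), 20
alpha = np.full(N, 1.0 / N)                  # alpha_bar^(1) = 1/N
stumps, weights = [], []

for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=alpha)
    pred = stump.predict(X)
    eps = alpha[pred != y].sum()             # weighted error epsilon_m
    if eps <= 0 or eps >= 0.5:               # stop if perfect or no better than chance
        break
    w = 0.5 * np.log((1 - eps) / eps)        # ensemble weight w_m
    stumps.append(stump); weights.append(w)
    alpha *= np.exp(-y * w * pred)           # re-weight the training samples
    alpha /= alpha.sum()                     # normalize

F = sum(w * s.predict(X) for w, s in zip(weights, stumps))
print("training accuracy:", (np.sign(F) == y).mean())
```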
AdaBoost (V)
Theorem
If AdaBoost generates m base models with errors ε_1, ε_2, ..., ε_m, the error ε of the ensemble model F_m(x) is bounded as:
$$ \epsilon \le 2^m \prod_{t=1}^{m} \sqrt{\epsilon_t (1 - \epsilon_t)} $$
AdaBoost combines many weak classifiers into a strong classifier, i.e. ε → 0 as m → ∞ if all ε_t ≠ 1/2 (i.e. every base classifier is better than random guessing)
AdaBoost tends to generalize well to unseen samples since it improves the margin distribution of the training samples
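A quick numeric check of how the bound behaves: with a constant base error ε_t = 0.3, each factor 2√(ε_t(1−ε_t)) ≈ 0.92, so the bound decays geometrically in m (the value 0.3 is only an illustration).

```python
# Each round multiplies the bound by 2*sqrt(eps_t*(1 - eps_t)) < 1 whenever
# eps_t != 1/2, so the bound on the ensemble error shrinks geometrically.
import numpy as np

eps = 0.3
factor = 2 * np.sqrt(eps * (1 - eps))
for m in (5, 20, 50):
    print(m, factor ** m)   # 5: ~0.65, 20: ~0.17, 50: ~0.013
```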
Gradient Tree Boosting (I)
apply gradient boosting to regression problems
H: all decision trees
use the squared error as the loss function: $l\big(f(x), y\big) = \frac{1}{2}\big(f(x) - y\big)^2$
the functional gradient: $\nabla l\big(F_{m-1}(x)\big) = F_{m-1}(x) - y$
given a training set: D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
project the negative gradient into H to minimize:
$$ f_m = \arg\min_{f \in H} \big\| f + \nabla l\big(F_{m-1}(x)\big) \big\|^2 = \arg\min_{f \in H} \sum_{n=1}^{N} \Big( f(x_n) - \underbrace{\big(y_n - F_{m-1}(x_n)\big)}_{\text{residual}} \Big)^2 $$
Gradient Tree Boosting (II)
gradient tree boosting: build a decision tree f_m to approximate the residuals, i.e. y_n − F_{m-1}(x_n), for all n:
$$ y = f_m(x) = \sum_{l} c_{ml} \, \mathbb{I}(x \in R_{ml}) $$
where c_{ml} is the mean of all residuals in the region R_{ml}
a.k.a. gradient boosting machine (GBM) or gradient boosted regression trees (GBRT)
use a pre-set "shrinkage" parameter ν as the weight:
$$ F_m(x) = F_{m-1}(x) + \nu\, f_m(x) $$
also applicable to multi-class classification problems
Gradient Tree Boosting (III)
Gradient Tree Boosting
input: {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
output: an ensemble model F_m(x)
fit a regression tree f_0(x) to {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
F_0(x) = ν f_0(x)
m = 1
while not converged do
    compute the negative gradients as pseudo outputs: ỹ_n = −∇l(F_{m-1}(x_n)) for all n = 1, 2, ..., N
    fit a regression tree f_m(x) to {(x_1, ỹ_1), ..., (x_N, ỹ_N)}
    F_m(x) = F_{m-1}(x) + ν f_m(x)
    m = m + 1
end while
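A minimal Python sketch of this loop for squared-error regression, where the pseudo outputs are simply the residuals; the depth-2 scikit-learn trees, the shrinkage ν = 0.1, and M = 100 rounds are illustrative choices (roughly what sklearn.ensemble.GradientBoostingRegressor does, written out by hand).

```python
# Gradient tree boosting for squared-error regression with fixed shrinkage nu.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)

nu, M = 0.1, 100                        # shrinkage and number of rounds
trees = [DecisionTreeRegressor(max_depth=2).fit(X, y)]   # f_0(x)
F = nu * trees[0].predict(X)            # F_0(x) = nu * f_0(x)

for m in range(1, M):
    residual = y - F                    # pseudo outputs: negative gradients
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    F += nu * tree.predict(X)           # F_m = F_{m-1} + nu * f_m

print("training MSE:", np.mean((F - y) ** 2))
```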