
Lecture 4

This lecture covers active learning in the context of supervised learning, focusing on the selection of input examples to minimize uncertainty in parameter estimates and predictions. It discusses the formulation of active learning, selection criteria, and the application of these criteria in both batch and sequential methods. The lecture emphasizes the importance of choosing the right function class and selection criteria for effective active learning, particularly in linear regression scenarios.


Machine learning: lecture 4

Tommi S. Jaakkola
MIT AI Lab
Topics
• Active learning and regression
– formulation
– selection criteria
– examples



Active learning: rules of the game
• Supervised learning:
– (input,output) pairs are sampled from an unknown joint
distribution P (x, y)
• Active supervised learning:
– We select the input examples and the corresponding
outputs are sampled from an unknown conditional
distribution P (y|x)



Active learning
• Why active learning?
– we often need dramatically fewer training examples; the
time/cost of getting enough training examples may be
otherwise prohibitive
• Dangers of (this type of) active learning
– since we select the inputs, we may focus on inputs that
are unimportant, rare, or even invalid



Active learning
• We need to decide:
1. the function class (the result will be highly dependent on
what we wish to learn)
2. the selection criterion, i.e., how we decide which inputs are
worth querying
3. how to apply the selection criterion (sequential or batch)
• Function class: we’ll focus on linear/polynomial regression

y = w0 + w1 x + ε,   ε ∼ N(0, σ²)



Active linear regression
• We perform the selection of inputs to uncover the (assumed)
“true” underlying linear relation:
     
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} w_0^* \\ w_1^* \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

i.e., y = Xw∗ + ε, where εᵢ ∼ N(0, σ²).


• We need to first understand how our parameter estimates
relate to w∗ as a function of inputs



Properties of regression models
• The outputs corresponding to the inputs arranged in X are
assumed to be generated according to:

y = Xw∗ + ε,   ε ∼ N(0, σ² I)

• The resulting parameter estimates, ŵ = (XᵀX)⁻¹Xᵀy, based on
the same inputs X and sampled outputs y, are normally
distributed:

ŵ ∼ N( w∗, σ² (XᵀX)⁻¹ )
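As a quick numerical sanity check (not part of the original slides), the following numpy sketch repeatedly samples outputs for a fixed design and verifies that the least-squares estimates scatter around w∗ with covariance close to σ²(XᵀX)⁻¹; the particular inputs, true weights, and noise level below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the lecture): fixed inputs, true weights, noise level
x = np.linspace(-1, 1, 10)
X = np.column_stack([np.ones_like(x), x])      # design matrix with rows (1, x_i)
w_star = np.array([0.5, -2.0])
sigma = 0.3

# Repeatedly sample outputs y = X w* + eps and recompute the least-squares estimate
estimates = []
for _ in range(5000):
    y = X @ w_star + sigma * rng.standard_normal(len(x))
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
    estimates.append(w_hat)
estimates = np.array(estimates)

print("mean of w_hat:        ", estimates.mean(axis=0))          # close to w_star
print("empirical covariance:\n", np.cov(estimates, rowvar=False))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))
```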



Active learning: selection criterion
• Two main types of selection criteria
1. select inputs so as to minimize some measure of uncertainty
in the parameters
2. select inputs to minimize the uncertainty in the predicted
outputs
• Two main ways of applying such criteria
1. batch – all the inputs are chosen prior to seeing any
responses
2. sequential – the next query input is chosen with the full
knowledge of all the responses so far



Batch selection, parameter criterion
We have to select the input examples prior to seeing any
outputs
• We wish to find n inputs x1, . . . , xn (which determine the
matrix X) so as to minimize a measure of uncertainty in the
resulting parameters ŵ
ŵ ∼ N( w∗, σ² (XᵀX)⁻¹ )

• For example, we can find the inputs that minimize the
determinant of the covariance matrix:

det( (XᵀX)⁻¹ )



Determinant as a measure of “volume”
• Any covariance matrix has an eigen-decomposition:

$$C = R \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_m^2 \end{bmatrix} R^T$$

where the orthonormal rotation matrix R specifies the
principal axes of variation and each eigenvalue σᵢ² gives
the variance along one of the principal directions
• The "volume" of a Gaussian distribution is a function of only
σᵢ², i = 1, . . . , m. Specifically,

$$\text{"volume"} \propto \prod_{i=1}^{m} \sigma_i = \sqrt{\det C}$$

[Figure: contour plot of a two-dimensional Gaussian illustrating its principal axes of variation]
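A small illustration (mine, with an arbitrary example covariance matrix): the determinant of C equals the product of its eigenvalues, so the product of the σᵢ is √(det C), as used above.

```python
import numpy as np

# Arbitrary example covariance matrix (symmetric positive definite)
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

eigvals, R = np.linalg.eigh(C)          # C = R diag(eigvals) R^T
print("eigenvalues (variances along principal axes):", eigvals)
print("product of eigenvalues:", np.prod(eigvals))
print("det C:                 ", np.linalg.det(C))
print('"volume" ~ sqrt(det C):', np.sqrt(np.linalg.det(C)))
```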





Determinant criterion: example
• 1-d problem, 2nd order polynomial regression within x ∈
[−1, 1]

f(x; w) = w0 + w1 x + w2 x²

For n = 4, what points would we select?



x1 = −1, x2 = 0, x3 = 0, x4 = 1
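This selection can be checked by brute force. The sketch below (not from the slides; the grid resolution and tolerance are arbitrary choices) enumerates four-point designs on a grid in [−1, 1] for the quadratic model and maximizes det(XᵀX), which is equivalent to minimizing det((XᵀX)⁻¹). The design x = −1, 0, 0, 1 appears among the maximizers, alongside equivalent designs that duplicate −1 or 1 instead of 0.

```python
import numpy as np
from itertools import combinations_with_replacement

# Candidate inputs on a coarse grid over [-1, 1] (resolution is an arbitrary choice)
grid = np.linspace(-1.0, 1.0, 21)

def design_matrix(xs):
    """Design matrix for the quadratic model, with rows (1, x, x^2)."""
    xs = np.asarray(xs, dtype=float)
    return np.column_stack([np.ones_like(xs), xs, xs ** 2])

# Enumerate all 4-point designs (repeats allowed, order irrelevant) and score them by
# det(X^T X); maximizing it is the same as minimizing det((X^T X)^{-1}).
designs = list(combinations_with_replacement(grid, 4))
dets = np.array([np.linalg.det(design_matrix(d).T @ design_matrix(d)) for d in designs])

best = dets.max()
for d, val in zip(designs, dets):
    if val > best - 1e-9:               # print all (near-)optimal designs
        print(np.round(d, 3), " det(X^T X) =", round(val, 4))
```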



Sequential selection, uncertainty in predictions
The next input is chosen on the basis of all the information
available so far
• The prediction at a new point x is

$$\hat{y}(x) = \hat{w}_0 + \hat{w}_1 x = \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} \hat{w}$$

The variance in this prediction (due to the noise in the
outputs observed so far) is

$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} \mathrm{Cov}(\hat{w}) \begin{bmatrix} 1 \\ x \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \end{bmatrix}$$



Sequential selection cont’d

$$\mathrm{Var}\{\hat{y}(x)\} = \sigma^2 \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \end{bmatrix}$$

– the noise variance σ² only affects the overall scale (set to 1 from here on)
– the variance is a function of previously chosen inputs, not
outputs!
• Assuming the input points are contained within, e.g., an
interval X , we can select the new point to reduce the
variance of the most uncertain prediction:
x_new = argmax_{x ∈ X} Var{ ŷ(x) }



Sequential selection: example
• 1-d problem, 2nd order polynomial regression within x ∈
[−1, 1]

ŷ(x) = ŵ0 + ŵ1 x + ŵ2 x²

A priori selected inputs x1 = −1, x2 = 0, x3 = 1.


$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}, \qquad \text{where } X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \end{bmatrix}$$
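To make the sequential rule concrete, here is a rough sketch (my own; the candidate grid and number of selection steps are arbitrary) that starts from the a priori inputs x = −1, 0, 1 and repeatedly queries the candidate x ∈ [−1, 1] with the largest predictive variance, with σ² set to 1 as above. Its behaviour mirrors the variance curves in the figure below.

```python
import numpy as np

def phi(x):
    """Quadratic feature vector(s) (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

def pred_var(design, queries):
    """Var{ y_hat(x) } = (1, x, x^2) (X^T X)^{-1} (1, x, x^2)^T with sigma^2 = 1."""
    X = phi(design)
    A_inv = np.linalg.inv(X.T @ X)
    P = phi(queries)
    return np.einsum("ij,jk,ik->i", P, A_inv, P)

candidates = np.linspace(-1.0, 1.0, 201)    # allowed inputs (grid resolution arbitrary)
design = [-1.0, 0.0, 1.0]                   # the a priori selected inputs

for step in range(4):                       # number of queries is arbitrary
    v = pred_var(design, candidates)
    x_new = candidates[np.argmax(v)]        # most uncertain prediction
    print(f"step {step}: max variance {v.max():.3f} at x = {x_new:+.2f}")
    design.append(float(x_new))

print("design after sequential selection:", design)
```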



Example cont’d
[Figure: four panels plotting the output variance Var( ŷ(x) ) against x ∈ [−1, 1]]



Sequential selection: properties
• In the linear/additive regression context the variance cannot
increase anywhere as new points are added

C = (XᵀX)⁻¹   (covariance of ŵ)
A = (XᵀX)     (inverse covariance)

$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} C \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} A^{-1} \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}$$

The variance never increases for any point x if the eigenvalues
of the inverse covariance matrix A increase (or stay the same)
as we add new points



Brief derivation
New query point x′:

$$A' = \begin{bmatrix} X \\ 1 \;\; x' \;\; x'^2 \end{bmatrix}^{T} \begin{bmatrix} X \\ 1 \;\; x' \;\; x'^2 \end{bmatrix} = X^T X + \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix}^{T} = A + \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix}^{T}$$

In other words, we add to A a matrix whose eigenvalues are all
non-negative ⇒ the eigenvalues of A are non-decreasing
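The eigenvalue argument can also be checked numerically. This sketch (with an initial design and query point of my own choosing) forms A = XᵀX for the quadratic basis, applies the rank-one update for a new point x′, and confirms that every eigenvalue of A is non-decreasing and that the predictive variance does not increase at any x.

```python
import numpy as np

def phi(x):
    """Quadratic feature vector (1, x, x^2)."""
    return np.array([1.0, x, x ** 2])

# A = X^T X for an initial design (here the a priori inputs x = -1, 0, 1)
X = np.vstack([phi(x) for x in (-1.0, 0.0, 1.0)])
A = X.T @ X

# Rank-one update for a new query point x' (value chosen arbitrarily)
x_prime = 0.3
A_new = A + np.outer(phi(x_prime), phi(x_prime))

print("eigenvalues of A :", np.linalg.eigvalsh(A))
print("eigenvalues of A':", np.linalg.eigvalsh(A_new))   # each >= the corresponding old one

# Consequence: the predictive variance phi(x)^T A^{-1} phi(x) never increases
for x in np.linspace(-1.0, 1.0, 5):
    v = phi(x)
    before = v @ np.linalg.inv(A) @ v
    after = v @ np.linalg.inv(A_new) @ v
    print(f"x = {x:+.1f}:  Var before = {before:.3f},  after = {after:.3f}")
```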



Active learning more generally
• To perform active learning we have to evaluate “the value
of new information”, i.e., how much we expect to gain from
querying another response
• Such calculations can be done in the context of almost any
learning task

we will revisit the issue later on in the course ...

