
Lecture 4

This lecture covers active learning in the context of supervised learning, focusing on the selection of input examples to minimize uncertainty in parameter estimates and predictions. It discusses the formulation of active learning, selection criteria, and the application of these criteria in both batch and sequential methods. The lecture emphasizes the importance of choosing the right function class and selection criteria for effective active learning, particularly in linear regression scenarios.


Machine learning: lecture 4

Tommi S. Jaakkola
MIT AI Lab
Topics
• Active learning and regression
– formulation
– selection criteria
– examples



Active learning: rules of the game
• Supervised learning:
– (input,output) pairs are sampled from an unknown joint
distribution P (x, y)
• Active supervised learning:
– We select the input examples and the corresponding
outputs are sampled from an unknown conditional
distribution P (y|x)



Active learning
• Why active learning?
– we often need dramatically fewer training examples; the
time/cost of getting enough training examples may be
otherwise prohibitive
• Dangers of (this type of) active learning
– since we select the inputs, we may focus on inputs that
are unimportant, rare, or even invalid



Active learning
• We need to decide:
1. the function class (the result will be highly dependent on
what we wish to learn)
2. the selection criterion, i.e., how we decide which inputs are
worth querying
3. how to apply the selection criterion (sequential or batch)
• Function class: we’ll focus on linear/polynomial regression

y = w0 + w1 x + ε,   ε ∼ N(0, σ²)



Active linear regression
• We perform the selection of inputs to uncover the (assumed)
“true” underlying linear relation:
     
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} w_0^* \\ w_1^* \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

i.e., y = Xw∗ + ε, where εᵢ ∼ N(0, σ²).


• We need to first understand how our parameter estimates
relate to w∗ as a function of inputs



Properties of regression models
• The outputs corresponding to the inputs arranged in X are
assumed to be generated according to:

y = Xw∗ + ε,   ε ∼ N(0, σ² I)

• The resulting parameter estimates, ŵ = (XᵀX)⁻¹Xᵀy, based on
the same inputs X and sampled outputs y, are normally
distributed:

ŵ ∼ N( w∗, σ² (XᵀX)⁻¹ )
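As a quick numerical sanity check (not part of the original slides), the following numpy sketch repeatedly samples outputs for a fixed design and verifies that the least-squares estimates scatter around w∗ with covariance close to σ²(XᵀX)⁻¹; the particular inputs, true weights, and noise level below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the lecture): fixed inputs, true weights, noise level
x = np.linspace(-1, 1, 10)
X = np.column_stack([np.ones_like(x), x])      # design matrix with rows (1, x_i)
w_star = np.array([0.5, -2.0])
sigma = 0.3

# Repeatedly sample outputs y = X w* + eps and recompute the least-squares estimate
estimates = []
for _ in range(5000):
    y = X @ w_star + sigma * rng.standard_normal(len(x))
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
    estimates.append(w_hat)
estimates = np.array(estimates)

print("mean of w_hat:        ", estimates.mean(axis=0))          # close to w_star
print("empirical covariance:\n", np.cov(estimates, rowvar=False))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))
```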



Active learning: selection criterion
• Two main types of selection criteria
1. select inputs so as to minimize some measure of uncertainty
in the parameters
2. select inputs to minimize the uncertainty in the predicted
outputs
• Two main ways of applying such criteria
1. batch – all the inputs are chosen prior to seeing any
responses
2. sequential – the next query input is chosen with the full
knowledge of all the responses so far



Batch selection, parameter criterion
We have to select the input examples prior to seeing any
outputs
• We wish to find n inputs x1, . . . , xn (which determine the
matrix X) so as to minimize a measure of uncertainty in the
resulting parameters ŵ
ŵ ∼ N( w∗, σ² (XᵀX)⁻¹ )

• For example, we can find the inputs that minimize the
determinant of the covariance matrix:

det( (XᵀX)⁻¹ )



Determinant as a measure of “volume”
• Any covariance matrix has an eigen-decomposition:

$$C = R \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_m^2 \end{bmatrix} R^T$$

where the orthonormal rotation matrix R specifies the
principal axes of variation and each eigenvalue σᵢ² gives
the variance along one of the principal directions
• The "volume" of a Gaussian distribution is a function of only
σᵢ², i = 1, . . . , m. Specifically,

$$\text{"volume"} \propto \prod_{i=1}^{m} \sigma_i = \sqrt{\det C}$$

[Figure: contour plot of a two-dimensional Gaussian illustrating its principal axes of variation]
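A small illustration (mine, with an arbitrary example covariance matrix): the determinant of C equals the product of its eigenvalues, so the product of the σᵢ is √(det C), as used above.

```python
import numpy as np

# Arbitrary example covariance matrix (symmetric positive definite)
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

eigvals, R = np.linalg.eigh(C)          # C = R diag(eigvals) R^T
print("eigenvalues (variances along principal axes):", eigvals)
print("product of eigenvalues:", np.prod(eigvals))
print("det C:                 ", np.linalg.det(C))
print('"volume" ~ sqrt(det C):', np.sqrt(np.linalg.det(C)))
```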





Determinant criterion: example
• 1-d problem, 2nd order polynomial regression within x ∈
[−1, 1]

f(x; w) = w0 + w1 x + w2 x²

For n = 4, what points would we select?



x1 = −1, x2 = 0, x3 = 0, x4 = 1
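This selection can be checked by brute force. The sketch below (not from the slides; the grid resolution and tolerance are arbitrary choices) enumerates four-point designs on a grid in [−1, 1] for the quadratic model and maximizes det(XᵀX), which is equivalent to minimizing det((XᵀX)⁻¹). The design x = −1, 0, 0, 1 appears among the maximizers, alongside equivalent designs that duplicate −1 or 1 instead of 0.

```python
import numpy as np
from itertools import combinations_with_replacement

# Candidate inputs on a coarse grid over [-1, 1] (resolution is an arbitrary choice)
grid = np.linspace(-1.0, 1.0, 21)

def design_matrix(xs):
    """Design matrix for the quadratic model, with rows (1, x, x^2)."""
    xs = np.asarray(xs, dtype=float)
    return np.column_stack([np.ones_like(xs), xs, xs ** 2])

# Enumerate all 4-point designs (repeats allowed, order irrelevant) and score them by
# det(X^T X); maximizing it is the same as minimizing det((X^T X)^{-1}).
designs = list(combinations_with_replacement(grid, 4))
dets = np.array([np.linalg.det(design_matrix(d).T @ design_matrix(d)) for d in designs])

best = dets.max()
for d, val in zip(designs, dets):
    if val > best - 1e-9:               # print all (near-)optimal designs
        print(np.round(d, 3), " det(X^T X) =", round(val, 4))
```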



Sequential selection, uncertainty in predictions
The next input is chosen on the basis of all the information
available so far
• The prediction at a new point x is

$$\hat{y}(x) = \hat{w}_0 + \hat{w}_1 x = \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} \hat{w}$$

The variance in this prediction (due to the noise in the
outputs observed so far) is

$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} \mathrm{Cov}(\hat{w}) \begin{bmatrix} 1 \\ x \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \end{bmatrix}$$



Sequential selection cont’d

$$\mathrm{Var}\{\hat{y}(x)\} = \sigma^2 \begin{bmatrix} 1 \\ x \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \end{bmatrix}$$

– the noise variance σ² only affects the overall scale (set to 1 from here on)
– the variance is a function of previously chosen inputs, not
outputs!
• Assuming the input points are contained within, e.g., an
interval X , we can select the new point to reduce the
variance of the most uncertain prediction:
x_new = argmax_{x ∈ X} Var{ ŷ(x) }



Sequential selection: example
• 1-d problem, 2nd order polynomial regression within x ∈
[−1, 1]

ŷ(x) = ŵ0 + ŵ1 x + ŵ2 x²

A priori selected inputs x1 = −1, x2 = 0, x3 = 1.


$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} (X^T X)^{-1} \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}, \qquad \text{where } X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \end{bmatrix}$$
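To make the sequential rule concrete, here is a rough sketch (my own; the candidate grid and number of selection steps are arbitrary) that starts from the a priori inputs x = −1, 0, 1 and repeatedly queries the candidate x ∈ [−1, 1] with the largest predictive variance, with σ² set to 1 as above. Its behaviour mirrors the variance curves in the figure below.

```python
import numpy as np

def phi(x):
    """Quadratic feature vector(s) (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

def pred_var(design, queries):
    """Var{ y_hat(x) } = (1, x, x^2) (X^T X)^{-1} (1, x, x^2)^T with sigma^2 = 1."""
    X = phi(design)
    A_inv = np.linalg.inv(X.T @ X)
    P = phi(queries)
    return np.einsum("ij,jk,ik->i", P, A_inv, P)

candidates = np.linspace(-1.0, 1.0, 201)    # allowed inputs (grid resolution arbitrary)
design = [-1.0, 0.0, 1.0]                   # the a priori selected inputs

for step in range(4):                       # number of queries is arbitrary
    v = pred_var(design, candidates)
    x_new = candidates[np.argmax(v)]        # most uncertain prediction
    print(f"step {step}: max variance {v.max():.3f} at x = {x_new:+.2f}")
    design.append(float(x_new))

print("design after sequential selection:", design)
```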



Example cont’d
[Figure: four panels plotting the output variance Var( ŷ(x) ) against x ∈ [−1, 1]]



Sequential selection: properties
• In the linear/additive regression context the variance cannot
increase anywhere as new points are added

C = (XᵀX)⁻¹   (covariance of ŵ)
A = (XᵀX)     (inverse covariance)

$$\mathrm{Var}\{\hat{y}(x)\} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} C \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix} = \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}^{T} A^{-1} \begin{bmatrix} 1 \\ x \\ x^2 \end{bmatrix}$$

The variance never increases for any point x if the eigenvalues
of the inverse covariance matrix A increase (or stay the same)
as we add new points



Brief derivation
New query point x′:

$$A' = \begin{bmatrix} X \\ 1 \;\; x' \;\; x'^2 \end{bmatrix}^{T} \begin{bmatrix} X \\ 1 \;\; x' \;\; x'^2 \end{bmatrix} = X^T X + \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix}^{T} = A + \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix} \begin{bmatrix} 1 \\ x' \\ x'^2 \end{bmatrix}^{T}$$

In other words, we add to A a matrix whose eigenvalues are all
non-negative ⇒ the eigenvalues of A are non-decreasing
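The eigenvalue argument can also be checked numerically. This sketch (with an initial design and query point of my own choosing) forms A = XᵀX for the quadratic basis, applies the rank-one update for a new point x′, and confirms that every eigenvalue of A is non-decreasing and that the predictive variance does not increase at any x.

```python
import numpy as np

def phi(x):
    """Quadratic feature vector (1, x, x^2)."""
    return np.array([1.0, x, x ** 2])

# A = X^T X for an initial design (here the a priori inputs x = -1, 0, 1)
X = np.vstack([phi(x) for x in (-1.0, 0.0, 1.0)])
A = X.T @ X

# Rank-one update for a new query point x' (value chosen arbitrarily)
x_prime = 0.3
A_new = A + np.outer(phi(x_prime), phi(x_prime))

print("eigenvalues of A :", np.linalg.eigvalsh(A))
print("eigenvalues of A':", np.linalg.eigvalsh(A_new))   # each >= the corresponding old one

# Consequence: the predictive variance phi(x)^T A^{-1} phi(x) never increases
for x in np.linspace(-1.0, 1.0, 5):
    v = phi(x)
    before = v @ np.linalg.inv(A) @ v
    after = v @ np.linalg.inv(A_new) @ v
    print(f"x = {x:+.1f}:  Var before = {before:.3f},  after = {after:.3f}")
```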



Active learning more generally
• To perform active learning we have to evaluate “the value
of new information”, i.e., how much we expect to gain from
querying another response
• Such calculations can be done in the context of almost any
learning task

we will revisit the issue later on in the course ...

