LINEAR REGRESSION ANALYSIS
MODULE – VII
             Lecture – 25
 Generalized and Weighted
 Least Squares Estimation
                    Dr. Shalabh
      Department of Mathematics and Statistics
        Indian Institute of Technology Kanpur
The usual linear regression model assumes that all the random error components are independently and identically
distributed with constant variance. When this assumption is violated, the ordinary least squares estimator (OLSE) of the
regression coefficients loses its property of minimum variance in the class of linear unbiased estimators. Such a
violation can arise in any one of the following situations:
       1.     The variance of random error components is not constant.
       2.     The random error components are not independent.
       3.     The random error components neither have constant variance nor are independent.
In such cases, the covariance matrix of the random error components is no longer a scalar multiple of the identity matrix
but can be any positive definite matrix. Under this assumption, the OLSE is no longer efficient, as it is when the
covariance matrix is $\sigma^2 I$. The generalized or weighted least squares method is used in such situations to estimate
the parameters of the model.
In this method, the squared deviation between the observed and expected values of $y_i$ is multiplied by a weight
$\omega_i$, where $\omega_i$ is chosen to be inversely proportional to the variance of $y_i$.
For the simple linear regression model, the weighted least squares function is
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \omega_i \, (y_i - \beta_0 - \beta_1 x_i)^2 .$$
The least squares normal equations are obtained by differentiating $S(\beta_0, \beta_1)$ with respect to $\beta_0$ and
$\beta_1$ and equating the derivatives to zero:
$$\hat{\beta}_0 \sum_{i=1}^{n} \omega_i + \hat{\beta}_1 \sum_{i=1}^{n} \omega_i x_i = \sum_{i=1}^{n} \omega_i y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} \omega_i x_i + \hat{\beta}_1 \sum_{i=1}^{n} \omega_i x_i^2 = \sum_{i=1}^{n} \omega_i x_i y_i .$$
The solution of these two normal equations gives the weighted least squares estimates of $\beta_0$ and $\beta_1$.
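The two normal equations above form a 2 x 2 linear system, which can be solved numerically. The following is a minimal sketch, assuming NumPy is available; the function name `wls_simple` and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def wls_simple(x, y, w):
    """Solve the two weighted normal equations for (beta0_hat, beta1_hat)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # Coefficient matrix and right-hand side of the normal equations.
    A = np.array([[w.sum(),       (w * x).sum()],
                  [(w * x).sum(), (w * x**2).sum()]])
    rhs = np.array([(w * y).sum(), (w * x * y).sum()])
    return np.linalg.solve(A, rhs)

# Hypothetical data lying exactly on y = 1 + 2x, with arbitrary positive weights.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
w = np.array([1.0, 0.5, 2.0, 0.25])
beta0_hat, beta1_hat = wls_simple(x, y, w)
```

Because the illustrative data lie exactly on a line, any choice of positive weights recovers the intercept 1 and the slope 2; with noisy data the weights would change the estimates.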
     Generalized least squares estimation
    Suppose that in the usual multiple regression model
    $$y = X\beta + \varepsilon \quad \text{with} \quad E(\varepsilon) = 0, \; V(\varepsilon) = \sigma^2 I,$$
    the assumption $V(\varepsilon) = \sigma^2 I$ is violated and becomes
    $$V(\varepsilon) = \sigma^2 \Omega$$
    where $\Omega$ is a known $n \times n$ nonsingular, positive definite and symmetric matrix.
    This structure of $\Omega$ incorporates both of the following cases:
       1.     $\Omega$ is diagonal but with unequal diagonal elements (unequal variances).
       2.     $\Omega$ is not necessarily diagonal; when the errors are correlated, the off-diagonal elements are
              nonzero.
    The OLSE of $\beta$ is
    $$b = (X'X)^{-1} X'y .$$
    In such cases the OLSE is still unbiased but has more variability, since
    $$E(b) = (X'X)^{-1} X' E(y) = (X'X)^{-1} X'X\beta = \beta$$
    $$V(b) = (X'X)^{-1} X' V(y) X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X' \Omega X (X'X)^{-1} .$$
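The covariance formula for the OLSE under $V(\varepsilon) = \sigma^2 \Omega$ can be evaluated directly. The sketch below assumes NumPy; the helper `ols_covariance`, the design matrix and the diagonal $\Omega$ are hypothetical illustrations, not part of the lecture.

```python
import numpy as np

def ols_covariance(X, Omega, sigma2=1.0):
    """V(b) = sigma^2 (X'X)^{-1} X' Omega X (X'X)^{-1} for the OLSE b."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv

# Hypothetical design: intercept plus one regressor, n = 5.
X = np.column_stack([np.ones(5), np.arange(5.0)])
Omega = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed unequal error variances

V_true = ols_covariance(X, Omega)            # covariance of b under Omega
V_naive = np.linalg.inv(X.T @ X)             # what sigma^2 (X'X)^{-1} would report
```

When $\Omega = I$ the expression collapses to $\sigma^2 (X'X)^{-1}$; for any other $\Omega$ the two matrices generally differ, so standard errors based on $\sigma^2 (X'X)^{-1}$ are not valid.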
    We now look for a better estimator, as follows.
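Since $\Omega$ is symmetric and positive definite, there exists a nonsingular matrix $K$ such that
$$KK' = \Omega .$$
Then, in the model
$$y = X\beta + \varepsilon ,$$
premultiplying by $K^{-1}$ gives
$$K^{-1}y = K^{-1}X\beta + K^{-1}\varepsilon$$
or
$$z = B\beta + g$$
where $z = K^{-1}y$, $B = K^{-1}X$ and $g = K^{-1}\varepsilon$. Now observe that
$$E(g) = K^{-1}E(\varepsilon) = 0$$
$$\begin{aligned}
V(g) &= E\left[\{g - E(g)\}\{g - E(g)\}'\right] \\
     &= E(gg') \\
     &= E\left[K^{-1}\varepsilon\varepsilon'(K')^{-1}\right] \\
     &= K^{-1}E(\varepsilon\varepsilon')(K')^{-1} \\
     &= \sigma^2 K^{-1}\Omega (K')^{-1} \\
     &= \sigma^2 K^{-1}KK'(K')^{-1} \\
     &= \sigma^2 I .
\end{aligned}$$
Thus the elements of $g$ have mean 0 and common variance $\sigma^2$, and they are uncorrelated.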
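The transformation can be checked numerically. The sketch below assumes NumPy and uses the Cholesky factor as one possible choice of $K$ (any nonsingular $K$ with $KK' = \Omega$ would do); the AR(1)-type $\Omega$ with entries $\rho^{|i-j|}$ is a hypothetical example.

```python
import numpy as np

n = 4
# Hypothetical correlation structure: Omega[i, j] = rho^|i-j| (AR(1)-type).
rho = 0.6
idx = np.arange(n)
Omega = rho ** np.abs(idx[:, None] - idx[None, :])

K = np.linalg.cholesky(Omega)        # lower triangular, K @ K.T equals Omega
K_inv = np.linalg.inv(K)

# V(g) / sigma^2 = K^{-1} Omega K'^{-1}, which should be the identity matrix.
V_g = K_inv @ Omega @ K_inv.T
```

Up to rounding error, `V_g` is the identity matrix, which is the statement that the transformed errors $g$ are uncorrelated with common variance.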
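So we can either minimize
$$\begin{aligned}
S(\beta) &= g'g \\
         &= \varepsilon'\Omega^{-1}\varepsilon \\
         &= (y - X\beta)'\Omega^{-1}(y - X\beta)
\end{aligned}$$
and obtain the normal equations
$$(X'\Omega^{-1}X)\hat{\beta} = X'\Omega^{-1}y$$
or
$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y .$$
Alternatively, we can apply OLS to the transformed model and obtain the OLSE of $\beta$ as
$$\begin{aligned}
\hat{\beta} &= (B'B)^{-1}B'z \\
            &= (X'(K')^{-1}K^{-1}X)^{-1}X'(K')^{-1}K^{-1}y \\
            &= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y .
\end{aligned}$$
This is termed the generalized least squares estimator (GLSE) of $\beta$.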
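Both routes to the GLSE can be compared numerically. This is a minimal sketch assuming NumPy; the function names `gls` and `gls_via_transform`, the simulated data and the choice of $\Omega$ are hypothetical, and the Cholesky factor is used as one convenient $K$ with $KK' = \Omega$.

```python
import numpy as np

def gls(X, y, Omega):
    """GLSE via the normal equations (X' Omega^{-1} X) beta = X' Omega^{-1} y."""
    Oinv_X = np.linalg.solve(Omega, X)   # Omega^{-1} X without forming the inverse
    Oinv_y = np.linalg.solve(Omega, y)
    return np.linalg.solve(X.T @ Oinv_X, X.T @ Oinv_y)

def gls_via_transform(X, y, Omega):
    """GLSE as OLS on z = K^{-1} y, B = K^{-1} X with K K' = Omega."""
    K = np.linalg.cholesky(Omega)
    B = np.linalg.solve(K, X)
    z = np.linalg.solve(K, y)
    beta, *_ = np.linalg.lstsq(B, z, rcond=None)
    return beta

rng = np.random.default_rng(0)           # hypothetical simulated data
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
idx = np.arange(n)
Omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Omega) @ rng.normal(size=n)

beta_direct = gls(X, y, Omega)
beta_transformed = gls_via_transform(X, y, Omega)
```

The two estimates agree up to rounding error. Solving linear systems instead of inverting $\Omega$ explicitly is a numerical design choice, not something the derivation requires.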
The estimation error of the GLSE follows from
$$\begin{aligned}
\hat{\beta} &= (B'B)^{-1}B'(B\beta + g) \\
            &= \beta + (B'B)^{-1}B'g
\end{aligned}$$
or
$$\hat{\beta} - \beta = (B'B)^{-1}B'g .$$
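Then
$$E(\hat{\beta} - \beta) = (B'B)^{-1}B'E(g) = 0$$
which shows that the GLSE is an unbiased estimator of $\beta$. The covariance matrix of the GLSE is given by
$$\begin{aligned}
V(\hat{\beta}) &= E\left[\{\hat{\beta} - E(\hat{\beta})\}\{\hat{\beta} - E(\hat{\beta})\}'\right] \\
               &= E\left[(B'B)^{-1}B'gg'B(B'B)^{-1}\right] \\
               &= (B'B)^{-1}B'E(gg')B(B'B)^{-1} .
\end{aligned}$$
Since
$$\begin{aligned}
E(gg') &= K^{-1}E(\varepsilon\varepsilon')(K')^{-1} \\
       &= \sigma^2 K^{-1}\Omega (K')^{-1} \\
       &= \sigma^2 K^{-1}KK'(K')^{-1} \\
       &= \sigma^2 I ,
\end{aligned}$$
we have
$$\begin{aligned}
V(\hat{\beta}) &= \sigma^2 (B'B)^{-1}B'B(B'B)^{-1} \\
               &= \sigma^2 (B'B)^{-1} \\
               &= \sigma^2 (X'(K')^{-1}K^{-1}X)^{-1} \\
               &= \sigma^2 (X'\Omega^{-1}X)^{-1} .
\end{aligned}$$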
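Unbiasedness and the covariance formula $\sigma^2 (X'\Omega^{-1}X)^{-1}$ can be illustrated by simulation. The following is a rough Monte Carlo sketch assuming NumPy; the sample size, number of replications, $\sigma^2$, $\beta$ and $\Omega$ are all hypothetical settings, and normal errors are assumed only for convenience.

```python
import numpy as np

rng = np.random.default_rng(3)               # hypothetical simulation settings
n, reps, sigma2 = 8, 20000, 2.0
beta = np.array([1.0, -0.5])
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
idx = np.arange(n)
Omega = 0.7 ** np.abs(idx[:, None] - idx[None, :])   # assumed known Omega

Omega_inv = np.linalg.inv(Omega)
V_theory = sigma2 * np.linalg.inv(X.T @ Omega_inv @ X)
A = np.linalg.inv(X.T @ Omega_inv @ X) @ X.T @ Omega_inv   # beta_hat = A y

# Draw errors with covariance sigma^2 * Omega; each row is one replication.
K = np.linalg.cholesky(Omega)
errors = np.sqrt(sigma2) * rng.normal(size=(reps, n)) @ K.T
Y = X @ beta + errors                        # shape (reps, n)
estimates = Y @ A.T                          # one GLS estimate per row

mean_est = estimates.mean(axis=0)            # should be close to beta
V_empirical = np.cov(estimates, rowvar=False)  # should be close to V_theory
```

With this many replications the empirical mean and covariance of the estimates should lie close to $\beta$ and $\sigma^2 (X'\Omega^{-1}X)^{-1}$, although the agreement is only approximate because of simulation error.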
We now prove that the GLSE is the best linear unbiased estimator of $\beta$.
  The Gauss-Markov theorem for the case Var (ε ) = Ω
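 The Gauss-Markov theorem establishes that the generalized least squares (GLS) estimator of $\beta$, given by
 $\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$, is BLUE (best linear unbiased estimator). By best, we mean that
 $\hat{\beta}$ minimizes the variance of any linear combination of the estimated coefficients, $\ell'\hat{\beta}$.
 We note that
 $$\begin{aligned}
 E(\hat{\beta}) &= E\left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y\right] \\
                &= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(y) \\
                &= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}X\beta \\
                &= \beta .
 \end{aligned}$$
 Thus $\hat{\beta}$ is an unbiased estimator of $\beta$.
 The covariance matrix of $\hat{\beta}$ is given by
 $$\begin{aligned}
 V(\hat{\beta}) &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\right] V(y) \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\right]' \\
                &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\right] \Omega \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\right]' \\
                &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\right] \Omega \left[\Omega^{-1}X(X'\Omega^{-1}X)^{-1}\right] \\
                &= (X'\Omega^{-1}X)^{-1} .
 \end{aligned}$$
 Thus,
 $$\begin{aligned}
 \mathrm{Var}(\ell'\hat{\beta}) &= \ell'\,\mathrm{Var}(\hat{\beta})\,\ell \\
                                &= \ell'(X'\Omega^{-1}X)^{-1}\ell .
 \end{aligned}$$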
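Let $\tilde{\beta}$ be another unbiased estimator of $\beta$ that is a linear combination of the data. Our goal, then, is
to show that $\mathrm{Var}(\ell'\tilde{\beta}) \ge \ell'(X'\Omega^{-1}X)^{-1}\ell$ for all $\ell$, with at least one $\ell$
such that $\mathrm{Var}(\ell'\tilde{\beta}) > \ell'(X'\Omega^{-1}X)^{-1}\ell$.
We first note that we can write any other estimator of $\beta$ that is a linear combination of the data as
$$\tilde{\beta} = \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] y + b_0^*$$
where $B$ is a $p \times n$ matrix and $b_0^*$ is a $p \times 1$ vector of constants that appropriately adjust the GLS
estimator to form the alternative estimate. Then
$$\begin{aligned}
E(\tilde{\beta}) &= E\left(\left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] y + b_0^*\right) \\
                 &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] E(y) + b_0^* \\
                 &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] X\beta + b_0^* \\
                 &= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}X\beta + BX\beta + b_0^* \\
                 &= \beta + BX\beta + b_0^* .
\end{aligned}$$
Consequently, $\tilde{\beta}$ is unbiased if and only if both $b_0^* = 0$ and $BX = 0$. The covariance matrix of
$\tilde{\beta}$ is
$$\begin{aligned}
V(\tilde{\beta}) &= \mathrm{Var}\left(\left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] y\right) \\
                 &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] V(y) \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right]' \\
                 &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] \Omega \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right]' \\
                 &= \left[(X'\Omega^{-1}X)^{-1}X'\Omega^{-1} + B\right] \Omega \left[\Omega^{-1}X(X'\Omega^{-1}X)^{-1} + B'\right] \\
                 &= (X'\Omega^{-1}X)^{-1} + B\Omega B'
\end{aligned}$$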
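because $BX = 0$, which implies that $(BX)' = X'B' = 0$. Then
$$\begin{aligned}
\mathrm{Var}(\ell'\tilde{\beta}) &= \ell' V(\tilde{\beta}) \ell \\
                                 &= \ell'\left((X'\Omega^{-1}X)^{-1} + B\Omega B'\right)\ell \\
                                 &= \ell'(X'\Omega^{-1}X)^{-1}\ell + \ell' B\Omega B' \ell \\
                                 &= \mathrm{Var}(\ell'\hat{\beta}) + \ell' B\Omega B' \ell .
\end{aligned}$$
We note that $\Omega$ is a positive definite matrix. Consequently, there exists some nonsingular matrix $K$ such that
$\Omega = K'K$. As a result, $B\Omega B' = BK'KB'$ is at least a positive semidefinite matrix; hence,
$\ell' B\Omega B' \ell \ge 0$. Next note that we can define $\ell^* = KB'\ell$, which is an $n \times 1$ vector.
As a result,
$$\ell' B\Omega B' \ell = \ell^{*\prime}\ell^* = \sum_{i=1}^{n} \ell_i^{*2}$$
which must be strictly greater than 0 for some $\ell \ne 0$ unless $B = 0$.
For $\ell = (0, 0, \ldots, 0, 1, 0, \ldots, 0)'$, where 1 occurs at the $i$th place, $\ell'\hat{\beta} = \hat{\beta}_i$ is
the best linear unbiased estimator of $\ell'\beta = \beta_i$ for all $i = 1, 2, \ldots, p$. Thus, the GLS estimate of
$\beta$ is the best linear unbiased estimator.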
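The decomposition in the proof can be illustrated with the OLSE as the competing linear unbiased estimator, taking $\sigma^2 = 1$ so that $V(y) = \Omega$. The sketch below assumes NumPy; the design matrix and the diagonal $\Omega$ are hypothetical, and here the matrix $B$ of the proof is the difference between the OLS and GLS coefficient matrices.

```python
import numpy as np

rng = np.random.default_rng(1)               # hypothetical design and covariance
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Omega = np.diag(rng.uniform(0.5, 4.0, size=n))
Omega_inv = np.linalg.inv(Omega)

A_gls = np.linalg.inv(X.T @ Omega_inv @ X) @ X.T @ Omega_inv   # beta_hat = A_gls y
A_ols = np.linalg.inv(X.T @ X) @ X.T                           # b = A_ols y
B = A_ols - A_gls                            # the matrix B of the proof (p x n)

V_gls = A_gls @ Omega @ A_gls.T              # equals (X' Omega^{-1} X)^{-1}
V_ols = A_ols @ Omega @ A_ols.T
excess = B @ Omega @ B.T                     # B Omega B'

min_eig = np.linalg.eigvalsh(excess).min()   # nonnegative up to rounding error
```

Here `B @ X` is numerically zero, `V_ols` equals `V_gls + excess`, and the smallest eigenvalue of `excess` is nonnegative up to rounding, so every linear combination $\ell' b$ has variance at least that of $\ell'\hat{\beta}$.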
Weighted least squares estimation
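When the $\varepsilon$'s are uncorrelated and have unequal variances, then
$$V(\varepsilon) = \sigma^2 \Omega = \sigma^2
\begin{pmatrix}
\dfrac{1}{\omega_1} & 0 & \cdots & 0 \\
0 & \dfrac{1}{\omega_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \dfrac{1}{\omega_n}
\end{pmatrix} .$$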
The estimation procedure in this case is usually called weighted least squares.
Let $W = \Omega^{-1} = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_n)$. Then the weighted least squares estimator of
$\beta$ is obtained by solving the normal equations $(X'WX)\hat{\beta} = X'Wy$, which gives
$$\hat{\beta} = (X'WX)^{-1}X'Wy ,$$
where $\omega_1, \omega_2, \ldots, \omega_n$ are called the weights.
Observations with large variances usually have smaller weights than observations with small variances.
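The matrix form of the weighted least squares estimator can be sketched as follows, assuming NumPy. The function name `wls`, the simulated data and the variance pattern (error variance proportional to $x^2$) are hypothetical and serve only as an illustration.

```python
import numpy as np

def wls(X, y, weights):
    """beta_hat = (X' W X)^{-1} X' W y with W = diag(weights)."""
    w = np.asarray(weights, dtype=float)
    Xw = X * w[:, None]                  # rows of X scaled by w_i, i.e. W X
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(2)           # hypothetical heteroscedastic data
n = 50
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
var = x**2                               # assumed: error variance grows like x^2
y = X @ np.array([3.0, 0.5]) + rng.normal(scale=np.sqrt(var))

beta_wls = wls(X, y, 1.0 / var)          # weights inversely proportional to variance
```

Scaling the rows of $X$ avoids building the $n \times n$ matrix $W$ explicitly. Multiplying all weights by the same positive constant leaves the estimate unchanged, which is why the weights need only be proportional to the inverse variances.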