Multiple Regression Model and Multicollinearity
The multiple regression model is an extension of the two-variable linear regression model. In the multiple regression model we assume that the dependent variable $Y$ is a linear function of more than one independent variable, such that $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$. Without loss of generality we can describe most aspects of the multiple regression model by considering a model in which the dependent variable is a linear function of two independent variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ .............................(1)
As in the two-variable model, we assume that (1) $Y$ is a linear function of $X_1$, $X_2$ and $u_i$; (2) $X_1$ and $X_2$ are non-stochastic and there is no exact linear relationship between $X_1$ and $X_2$; (3) $E(u_i) = 0$ for every $i$; (4) $E(u_i^2) = \sigma^2$ for every $i$, i.e. $u_i$ is homoscedastic; (5) $E(u_i u_j) = 0$ for every $i \neq j$, i.e. the disturbances are non-autocorrelated.
The fitted model is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$ .............................(3) such that $e_i = Y_i - \hat{Y}_i$ and

$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2$
Minimising $\sum e_i^2$ with respect to $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ gives the first-order conditions

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0$ .........................(3a)

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum X_{1i} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0$ ....................(3b)

$\dfrac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum X_{2i} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0$ ...................(3c)

From (3a) we obtain the first normal equation

$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i}$ ......................................(4)
Dividing (4) by $n$,

$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \hat{\beta}_2 \bar{X}_2$ ..........................................(7)

$\therefore\; Y_i - \bar{Y} = \hat{\beta}_1 (X_{1i} - \bar{X}_1) + \hat{\beta}_2 (X_{2i} - \bar{X}_2) + e_i$

$\Rightarrow\; y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$ (lower-case letters denote deviations from sample means)

$\Rightarrow\; \sum x_{1i} y_i = \hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i} x_{2i} + \sum e_i x_{1i}$
$\hat{\beta}_1 = \dfrac{\begin{vmatrix} \sum x_{1i} y_i & \sum x_{1i} x_{2i} \\ \sum x_{2i} y_i & \sum x_{2i}^2 \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \dfrac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$ .......................(10)
And $\hat{\beta}_2 = \dfrac{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} y_i \\ \sum x_{1i} x_{2i} & \sum x_{2i} y_i \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{vmatrix}} = \dfrac{\sum y_i x_{2i} \sum x_{1i}^2 - \sum y_i x_{1i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$ ......................(11)

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$ ....................................(12)
Therefore, from the means, variances and covariances of the sample observations on $X_1$, $X_2$ and $Y$ we obtain the values of $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. These values vary from sample to sample, which indicates that each estimator has its own sampling distribution.
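The estimators in (10)-(12) can be computed directly from the sample sums of squares and cross-products. The following sketch (illustrative Python with simulated data; the variable names and data-generating values are my own, not from the notes) computes $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_0$ from the deviation formulas and checks them against a matrix least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(5, 2, n)
X2 = 0.4 * X1 + rng.normal(0, 1, n)          # mildly correlated regressors
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + rng.normal(0, 1, n)   # assumed true beta0, beta1, beta2

# Deviations from sample means (lower-case variables in the notes)
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
A = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2   # common denominator in (10) and (11)

b1 = ((y @ x1) * (x2 @ x2) - (y @ x2) * (x1 @ x2)) / A     # equation (10)
b2 = ((y @ x2) * (x1 @ x1) - (y @ x1) * (x1 @ x2)) / A     # equation (11)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()            # equation (12)

# Cross-check with the matrix least-squares solution
Xmat = np.column_stack([np.ones(n), X1, X2])
b_check = np.linalg.lstsq(Xmat, Y, rcond=None)[0]
print(b0, b1, b2)
print(b_check)       # should agree with (b0, b1, b2)
```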
First, we consider

$\hat{\beta}_1 = \dfrac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{A} = \sum a_i y_i$

where $a_i = \dfrac{x_{1i} \sum x_{2i}^2 - x_{2i} \sum x_{1i} x_{2i}}{A}$ and $A = \sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2$.
Substituting $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + (u_i - \bar{u})$,

$\hat{\beta}_1 = \dfrac{\sum x_{1i}\{\beta_1 x_{1i} + \beta_2 x_{2i} + u_i - \bar{u}\}\sum x_{2i}^2 - \sum x_{2i}\{\beta_1 x_{1i} + \beta_2 x_{2i} + u_i - \bar{u}\}\sum x_{1i} x_{2i}}{A}$

$= \beta_1 + \dfrac{\sum (u_i - \bar{u}) x_{1i} \sum x_{2i}^2 - \sum (u_i - \bar{u}) x_{2i} \sum x_{1i} x_{2i}}{A}$

Taking expectations and using $E(u_i) = 0$, the second term vanishes. Thus $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.
And $\hat{\beta}_0 = \beta_0 + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2 + \bar{u} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 = \beta_0 - (\hat{\beta}_1 - \beta_1)\bar{X}_1 - (\hat{\beta}_2 - \beta_2)\bar{X}_2 + \bar{u}$

$\therefore\; \hat{\beta}_1$ is a linear function of $y_i$ and thereby linear in $Y_i$ and $u_i$.
$\mathrm{Var}(\hat{\beta}_1) = E[\hat{\beta}_1 - E(\hat{\beta}_1)]^2 = E(\hat{\beta}_1 - \beta_1)^2$

$= \dfrac{1}{A^2} E\left[\sum x_{1i}(u_i - \bar{u}) \sum x_{2i}^2 - \sum x_{2i}(u_i - \bar{u}) \sum x_{1i} x_{2i}\right]^2$

$= \dfrac{1}{A^2} E\left[\left\{\sum x_{1i} u_i\right\}^2 \left(\sum x_{2i}^2\right)^2 + \left\{\sum x_{2i} u_i\right\}^2 \left(\sum x_{1i} x_{2i}\right)^2 - 2 \sum x_{1i} u_i \sum x_{2i} u_i \sum x_{2i}^2 \sum x_{1i} x_{2i}\right]$

since $\sum x_{1i} = \sum x_{2i} = 0$, so that $\sum x_{1i}(u_i - \bar{u}) = \sum x_{1i} u_i$ and $\sum x_{2i}(u_i - \bar{u}) = \sum x_{2i} u_i$. Using $E(u_i^2) = \sigma^2$ and $E(u_i u_j) = 0$ for every $i \neq j$,

$E\left(\sum x_{1i} u_i\right)^2 = \sigma^2 \sum x_{1i}^2, \quad E\left(\sum x_{2i} u_i\right)^2 = \sigma^2 \sum x_{2i}^2, \quad E\left(\sum x_{1i} u_i \sum x_{2i} u_i\right) = \sigma^2 \sum x_{1i} x_{2i}$

$\therefore\; \mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{A^2}\left[\sum x_{1i}^2 \left(\sum x_{2i}^2\right)^2 + \sum x_{2i}^2 \left(\sum x_{1i} x_{2i}\right)^2 - 2 \sum x_{2i}^2 \left(\sum x_{1i} x_{2i}\right)^2\right]$

$= \dfrac{\sigma^2 \sum x_{2i}^2}{A^2}\left[\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2\right] = \dfrac{\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$

$= \dfrac{\sigma^2}{\sum x_{1i}^2 \left(1 - \dfrac{(\sum x_{1i} x_{2i})^2}{\sum x_{1i}^2 \sum x_{2i}^2}\right)} = \dfrac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)}$

where $r_{12}$ is the sample correlation coefficient between $X_1$ and $X_2$.
Similarly we can derive

$\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{12}^2)}$

Thus if $r_{12} = 0$, i.e. if $X_1$ and $X_2$ are not linearly correlated, then $\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_{1i}^2}$ and $\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$, but this is a rare case. As the degree of linear association between the independent variables increases, $\mathrm{Var}(\hat{\beta}_1)$ and $\mathrm{Var}(\hat{\beta}_2)$ increase. That is why $\dfrac{1}{1 - r_{12}^2}$ is called the variance inflating factor (VIF), which is an increasing function of $r_{12}^2$ $[0 < r_{12}^2 < 1]$.
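As a quick numerical illustration (a Python sketch with simulated data; the names and correlation values are mine), the VIF for $X_1$ in the two-regressor case can be computed as $1/(1 - r_{12}^2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X2 = rng.normal(size=n)
for rho in (0.0, 0.5, 0.9, 0.99):
    # build X1 with (approximately) correlation rho with X2
    X1 = rho * X2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    r12 = np.corrcoef(X1, X2)[0, 1]
    vif = 1.0 / (1.0 - r12**2)
    print(f"rho={rho:4.2f}  sample r12={r12:5.2f}  VIF={vif:7.2f}")
# Var(beta1_hat) = sigma^2 / sum(x1^2) * VIF, so the VIF scales the
# no-collinearity variance upward as r12^2 approaches 1.
```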
Gauss-Markov Theorem:

Consider any other linear estimator of $\beta_1$, say $\hat{\hat{\beta}}_1 = \sum w_i Y_i$, where the $w_i$ are fixed weights. Then

$\hat{\hat{\beta}}_1 = \sum w_i (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i) = \beta_0 \sum w_i + \beta_1 \sum w_i X_{1i} + \beta_2 \sum w_i X_{2i} + \sum w_i u_i$

And $E(\hat{\hat{\beta}}_1) = \beta_1$ if

$\sum w_i = 0$ ....................................(i)
$\sum w_i X_{1i} = 1$ ................................(ii)
$\sum w_i X_{2i} = 0$ ...............................(iii)

Therefore, $\hat{\hat{\beta}}_1$ is a linear and unbiased estimator of $\beta_1$ if conditions (i), (ii) and (iii) are satisfied.

Now $\mathrm{Var}(\hat{\hat{\beta}}_1) = E[\hat{\hat{\beta}}_1 - E(\hat{\hat{\beta}}_1)]^2 = E\left(\sum w_i u_i\right)^2$ [as $\hat{\hat{\beta}}_1$ is unbiased by assumption]
$= \sigma^2 \sum w_i^2 + 0 = \sigma^2 \sum w_i^2$

Now to find the minimum variance unbiased estimator of $\beta_1$ we choose the $w_i$ that minimise $\sum w_i^2$ subject to (i)-(iii), i.e. we minimise

$L = \sum w_i^2 - \lambda_1 \sum w_i - \lambda_2 \left(\sum w_i X_{1i} - 1\right) - \lambda_3 \sum w_i X_{2i}$

$\dfrac{\partial L}{\partial w_i} = 2 w_i - \lambda_1 - \lambda_2 X_{1i} - \lambda_3 X_{2i} = 0$ .............................................(15)

(15) $\Rightarrow\; w_i = \dfrac{\lambda_1 + \lambda_2 X_{1i} + \lambda_3 X_{2i}}{2}$

Substituting this $w_i$ into conditions (i), (ii) and (iii) gives

$n\lambda_1 + \lambda_2 \sum X_{1i} + \lambda_3 \sum X_{2i} = 0$ .................................(16a)
$\lambda_1 \sum X_{1i} + \lambda_2 \sum X_{1i}^2 + \lambda_3 \sum X_{1i} X_{2i} = 2$ ...........................(16b)
$\lambda_1 \sum X_{2i} + \lambda_2 \sum X_{1i} X_{2i} + \lambda_3 \sum X_{2i}^2 = 0$ ...........................(16c)

From (16a), $\lambda_1 = -\lambda_2 \bar{X}_1 - \lambda_3 \bar{X}_2$. Substituting into (16b):

$-\lambda_2 n\bar{X}_1^2 - \lambda_3 n\bar{X}_1 \bar{X}_2 + \lambda_2 \sum X_{1i}^2 + \lambda_3 \sum X_{1i} X_{2i} = 2$
$\Rightarrow\; \lambda_2 \left[\sum X_{1i}^2 - n\bar{X}_1^2\right] + \lambda_3 \left[\sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2\right] = 2$
$\Rightarrow\; \lambda_2 \sum x_{1i}^2 + \lambda_3 \sum x_{1i} x_{2i} = 2$ ..........................................(16b')

Similarly, from (16c):
$\Rightarrow\; \lambda_2 \left[\sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2\right] + \lambda_3 \left[\sum X_{2i}^2 - n\bar{X}_2^2\right] = 0$
$\Rightarrow\; \lambda_2 \sum x_{1i} x_{2i} + \lambda_3 \sum x_{2i}^2 = 0$ ...........................................(16c')

Solving (16b') and (16c') we get $\lambda_2$ and $\lambda_3$; putting these into (16a) we get $\lambda_1$:

$\lambda_1 = \dfrac{2(\bar{X}_2 \sum x_{1i} x_{2i} - \bar{X}_1 \sum x_{2i}^2)}{A}, \quad \lambda_2 = \dfrac{2 \sum x_{2i}^2}{A}, \quad \text{and } \lambda_3 = \dfrac{-2 \sum x_{1i} x_{2i}}{A}$

Hence

$w_i = \dfrac{1}{2}(\lambda_1 + \lambda_2 X_{1i} + \lambda_3 X_{2i}) = \dfrac{X_{1i} \sum x_{2i}^2 + \bar{X}_2 \sum x_{1i} x_{2i} - \bar{X}_1 \sum x_{2i}^2 - X_{2i} \sum x_{1i} x_{2i}}{A} = \dfrac{x_{1i} \sum x_{2i}^2 - x_{2i} \sum x_{1i} x_{2i}}{A}$

$\therefore\; w_i = a_i$

$\Rightarrow\; \hat{\hat{\beta}}_1 = \sum w_i Y_i = \sum a_i Y_i = \sum a_i (y_i + \bar{Y}) = \sum a_i y_i + \bar{Y} \sum a_i = \sum a_i y_i$ [since $\sum a_i = 0$]

$= \hat{\beta}_1$

Therefore the OLS estimator $\hat{\beta}_1$ is the best (minimum variance) linear unbiased estimator of $\beta_1$.
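To see conditions (i)-(iii) at work, the sketch below (illustrative Python continuing the simulated-data setup; the variable names are mine) computes the OLS weights $a_i$ and checks numerically that $\sum a_i = 0$, $\sum a_i X_{1i} = 1$ and $\sum a_i X_{2i} = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)

x1, x2 = X1 - X1.mean(), X2 - X2.mean()
A = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
a = (x1 * (x2 @ x2) - x2 * (x1 @ x2)) / A     # OLS weights a_i for beta1_hat

# Unbiasedness conditions (i), (ii), (iii)
print(np.isclose(a.sum(), 0.0))               # sum a_i = 0
print(np.isclose(a @ X1, 1.0))                # sum a_i X_1i = 1
print(np.isclose(a @ X2, 0.0))                # sum a_i X_2i = 0
```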
-----
$\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2) = E\left[\{\hat{\beta}_1 - E(\hat{\beta}_1)\}\{\hat{\beta}_2 - E(\hat{\beta}_2)\}\right] = E(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2)$

$= \dfrac{1}{A^2} E\left[\left(\sum x_{1i} u_i \sum x_{2i}^2 - \sum x_{2i} u_i \sum x_{1i} x_{2i}\right)\left(\sum x_{2i} u_i \sum x_{1i}^2 - \sum x_{1i} u_i \sum x_{1i} x_{2i}\right)\right]$

$= \dfrac{1}{A^2}\left[\sigma^2 \sum x_{1i} x_{2i} \sum x_{1i}^2 \sum x_{2i}^2 - \sigma^2 \sum x_{1i}^2 \sum x_{2i}^2 \sum x_{1i} x_{2i} - \sigma^2 \sum x_{1i}^2 \sum x_{2i}^2 \sum x_{1i} x_{2i} + \sigma^2 \left(\sum x_{1i} x_{2i}\right)^3\right]$

$= -\dfrac{1}{A^2}\,\sigma^2 \sum x_{1i} x_{2i} \left[\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2\right] = \dfrac{-\sigma^2 \sum x_{1i} x_{2i}}{A}$

$= \dfrac{-\sigma^2 \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 (1 - r_{12}^2)} = \dfrac{-\sigma^2 r_{12}}{\sqrt{\sum x_{1i}^2 \sum x_{2i}^2}\,(1 - r_{12}^2)}$

where we have used $E\left(\sum x_{1i} u_i \sum x_{2i} u_i\right) = \sigma^2 \sum x_{1i} x_{2i}$, $\sum x_{1i} x_{2i} = r_{12}\sqrt{\sum x_{1i}^2 \sum x_{2i}^2}$ and $A = \sum x_{1i}^2 \sum x_{2i}^2 (1 - r_{12}^2)$.
Finally, $\mathrm{Var}(\hat{\beta}_0) = E[\hat{\beta}_0 - E(\hat{\beta}_0)]^2 = E(\hat{\beta}_0 - \beta_0)^2$

$= E\left\{-(\hat{\beta}_1 - \beta_1)\bar{X}_1 - (\hat{\beta}_2 - \beta_2)\bar{X}_2 + \bar{u}\right\}^2$

$= E(\hat{\beta}_1 - \beta_1)^2 \bar{X}_1^2 + E(\hat{\beta}_2 - \beta_2)^2 \bar{X}_2^2 + E(\bar{u})^2 + 2E\left\{(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2)\right\}\bar{X}_1 \bar{X}_2 + 0 + 0$

[since $E(\bar{u}) = 0$ and, as shown below, $E\{\bar{u}(\hat{\beta}_1 - \beta_1)\} = E\{\bar{u}(\hat{\beta}_2 - \beta_2)\} = 0$]

$= \bar{X}_1^2\,\mathrm{Var}(\hat{\beta}_1) + \bar{X}_2^2\,\mathrm{Var}(\hat{\beta}_2) + \dfrac{\sigma^2}{n} + 2\bar{X}_1 \bar{X}_2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)$

Next consider

$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = E\left\{(\hat{\beta}_0 - \beta_0)(\hat{\beta}_1 - \beta_1)\right\} = E\left[\left\{\bar{u} - (\hat{\beta}_1 - \beta_1)\bar{X}_1 - (\hat{\beta}_2 - \beta_2)\bar{X}_2\right\}(\hat{\beta}_1 - \beta_1)\right]$

$= E\left[\bar{u}(\hat{\beta}_1 - \beta_1)\right] - \bar{X}_1\,\mathrm{Var}(\hat{\beta}_1) - \bar{X}_2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)$

Now $E\left[\bar{u}(\hat{\beta}_1 - \beta_1)\right] = \dfrac{1}{nA} E\left[\sum u_j \left(\sum x_{1i} u_i \sum x_{2i}^2 - \sum x_{2i} u_i \sum x_{1i} x_{2i}\right)\right] = \dfrac{\sigma^2}{nA}\left[\sum x_{2i}^2 \sum x_{1i} - \sum x_{1i} x_{2i} \sum x_{2i}\right] = 0$ [since $\sum x_{1i} = 0 = \sum x_{2i}$]

$\therefore\; \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{X}_1\,\mathrm{Var}(\hat{\beta}_1) - \bar{X}_2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2) = -\left[\bar{X}_1 \dfrac{\sigma^2 \sum x_{2i}^2}{A} - \bar{X}_2 \dfrac{\sigma^2 \sum x_{1i} x_{2i}}{A}\right] = \dfrac{\sigma^2}{A}\left[\bar{X}_2 \sum x_{1i} x_{2i} - \bar{X}_1 \sum x_{2i}^2\right]$

Similarly, $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_2) = -\left[\bar{X}_1\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2) + \bar{X}_2\,\mathrm{Var}(\hat{\beta}_2)\right] = \dfrac{\sigma^2}{A}\left[\bar{X}_1 \sum x_{1i} x_{2i} - \bar{X}_2 \sum x_{1i}^2\right]$
Thus the variances and covariances of the estimators are expressed in terms of $\sigma^2$, the true variance of the disturbance, which is an unknown parameter.
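These expressions agree with the usual matrix form $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$. The sketch below (illustrative Python; the simulated data, the assumed value of $\sigma^2$ and the names are mine) compares the formula-based variances with the matrix expression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X1 = rng.normal(size=n)
X2 = 0.7 * X1 + rng.normal(size=n)
sigma2 = 1.5                                  # assumed true disturbance variance

x1, x2 = X1 - X1.mean(), X2 - X2.mean()
A = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
var_b1 = sigma2 * (x2 @ x2) / A
var_b2 = sigma2 * (x1 @ x1) / A
cov_b1b2 = -sigma2 * (x1 @ x2) / A

# Matrix form: sigma^2 * (X'X)^{-1}, with an intercept column
X = np.column_stack([np.ones(n), X1, X2])
V = sigma2 * np.linalg.inv(X.T @ X)
print(var_b1, V[1, 1])        # should match
print(var_b2, V[2, 2])        # should match
print(cov_b1b2, V[1, 2])      # should match
```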
$e_i = y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}$

$\therefore\; \sum e_i^2 = \sum (\beta_1 x_{1i} + \beta_2 x_{2i} + u_i - \bar{u} - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i})^2 = \sum \left\{-(\hat{\beta}_1 - \beta_1) x_{1i} - (\hat{\beta}_2 - \beta_2) x_{2i} + (u_i - \bar{u})\right\}^2$

$= (\hat{\beta}_1 - \beta_1)^2 \sum x_{1i}^2 + (\hat{\beta}_2 - \beta_2)^2 \sum x_{2i}^2 + \sum (u_i - \bar{u})^2 + 2(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2)\sum x_{1i} x_{2i} - 2(\hat{\beta}_1 - \beta_1)\sum x_{1i}(u_i - \bar{u}) - 2(\hat{\beta}_2 - \beta_2)\sum x_{2i}(u_i - \bar{u})$

$\therefore\; E\left(\sum e_i^2\right) = \mathrm{Var}(\hat{\beta}_1)\sum x_{1i}^2 + \mathrm{Var}(\hat{\beta}_2)\sum x_{2i}^2 + 2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)\sum x_{1i} x_{2i} + E\sum (u_i - \bar{u})^2 - 2E\left[(\hat{\beta}_1 - \beta_1)\sum x_{1i}(u_i - \bar{u})\right] - 2E\left[(\hat{\beta}_2 - \beta_2)\sum x_{2i}(u_i - \bar{u})\right]$

Now

$2E\left[(\hat{\beta}_1 - \beta_1)\sum x_{1i}(u_i - \bar{u})\right] = 2E\left[\dfrac{\sum x_{1i} u_i \sum x_{2i}^2 - \sum x_{2i} u_i \sum x_{1i} x_{2i}}{A}\sum x_{1i} u_i\right]$ [since $\bar{u}\sum x_{1i} = 0$]

$= \dfrac{2}{A}\left[\sigma^2 \sum x_{1i}^2 \sum x_{2i}^2 - \sigma^2 \left(\sum x_{1i} x_{2i}\right)^2\right] = \dfrac{2A\sigma^2}{A} = 2\sigma^2$

Similarly, $2E\left[(\hat{\beta}_2 - \beta_2)\sum x_{2i}(u_i - \bar{u})\right] = 2\sigma^2$.

Now, 1st term: $\mathrm{Var}(\hat{\beta}_1)\sum x_{1i}^2 = \dfrac{\sigma^2}{A}\sum x_{2i}^2 \sum x_{1i}^2$

2nd term: $\mathrm{Var}(\hat{\beta}_2)\sum x_{2i}^2 = \dfrac{\sigma^2}{A}\sum x_{1i}^2 \sum x_{2i}^2$

3rd term: $2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)\sum x_{1i} x_{2i} = -\dfrac{2\sigma^2}{A}\left(\sum x_{1i} x_{2i}\right)^2$

Finally, $E\sum (u_i - \bar{u})^2 = (n-1)\sigma^2$.

$\therefore\; E\left(\sum e_i^2\right) = \dfrac{\sigma^2}{A}\sum x_{2i}^2 \sum x_{1i}^2 + \dfrac{\sigma^2}{A}\sum x_{1i}^2 \sum x_{2i}^2 - \dfrac{2\sigma^2}{A}\left(\sum x_{1i} x_{2i}\right)^2 - 2\sigma^2 - 2\sigma^2 + (n-1)\sigma^2$

$= \dfrac{2\sigma^2}{A}\left[\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2\right] - 2\sigma^2 - 2\sigma^2 + (n-1)\sigma^2$

$= \dfrac{2\sigma^2}{A}\cdot A - 2\sigma^2 - 2\sigma^2 + (n-1)\sigma^2 = 2\sigma^2 - 2\sigma^2 - 2\sigma^2 + (n-1)\sigma^2 = (n-3)\sigma^2$

$\therefore\; E\left(\dfrac{\sum e_i^2}{n-3}\right) = \sigma^2$

$\therefore\; \hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-3}$ is the unbiased estimator of $\sigma^2$ in the three-variable model.
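A quick numerical check (illustrative Python with simulated data; the names and true parameter values are mine): estimate the model, form $\hat{\sigma}^2 = \sum e_i^2/(n-3)$ and use it to get estimated standard errors.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 250
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 3.0 * X2 + rng.normal(0, 2, n)   # assumed true sigma = 2

X = np.column_stack([np.ones(n), X1, X2])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ beta_hat
sigma2_hat = (e @ e) / (n - 3)                        # unbiased estimator of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(sigma2_hat)          # should be close to 4
print(se)                  # SE(beta0_hat), SE(beta1_hat), SE(beta2_hat)
```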
$\hat{\beta}_1$ measures the effect of a unit change in $X_1$ on fitted $Y$, holding the effect of $X_2$ constant. It is not the same as the estimated coefficient of $X_1$ when we consider only the single regressor $X_1$.

What, then, does "holding $X_2$ constant" mean when we compute the partial effect of $X_1$ on $Y$? To answer this question we show that the estimated coefficient $\hat{\beta}_1$ in the three-variable model can be obtained by regressing adjusted $Y$ on adjusted $X_1$.

First we compute adjusted $Y$: the part of $Y$ which is not explained by $X_2$. To get adjusted $Y$ we run the regression

$Y_i = \alpha_0 + \alpha_1 X_{2i} + v_i \;\Rightarrow\; y_i = \alpha_1 x_{2i} + v_i - \bar{v}$, with fitted deviation form $y_i = \hat{\alpha}_1 x_{2i} + \hat{v}_i$.

$\hat{v}_i$ is the part of $y_i$ which is not explained by $X_2$. Similarly, for adjusted $X_1$ we run

$X_{1i} = \gamma_0 + \gamma_1 X_{2i} + w_i \;\Rightarrow\; x_{1i} = \gamma_1 x_{2i} + w_i - \bar{w}$

$\therefore\; \hat{w}_i = x_{1i} - \hat{\gamma}_1 x_{2i}$ is the adjusted $X_1$.

Now regress $\hat{v}_i$ on $\hat{w}_i$: $\hat{v}_i = \delta_0 + \delta_1 \hat{w}_i + \varepsilon_i$, where $\hat{\alpha}_1 = \dfrac{\sum x_{2i} y_i}{\sum x_{2i}^2}$ and $\hat{\gamma}_1 = \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}$. Then

$\hat{\delta}_1 = \dfrac{\sum (x_{1i} - \hat{\gamma}_1 x_{2i})(y_i - \hat{\alpha}_1 x_{2i})}{\sum (x_{1i} - \hat{\gamma}_1 x_{2i})^2} = \dfrac{\sum x_{1i} y_i - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\sum x_{2i} y_i - \hat{\alpha}_1 \sum x_{1i} x_{2i} + \hat{\alpha}_1 \hat{\gamma}_1 \sum x_{2i}^2}{\sum x_{1i}^2 + \left(\dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\right)^2 \sum x_{2i}^2 - 2\dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\sum x_{1i} x_{2i}}$

$= \dfrac{\sum x_{1i} y_i - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\sum x_{2i} y_i}{\sum x_{1i}^2 - \dfrac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{2i}^2}}$

$= \dfrac{\sum x_{1i} y_i \sum x_{2i}^2 - \sum x_{2i} y_i \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} = \dfrac{b_{y1} - b_{y2} b_{21}}{1 - b_{12} b_{21}} = \hat{\beta}_1$ [using equation (10)]

where $b_{y1} = \dfrac{\sum x_{1i} y_i}{\sum x_{1i}^2}$, $b_{y2} = \dfrac{\sum x_{2i} y_i}{\sum x_{2i}^2}$, $b_{12} = \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}$ and $b_{21} = \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}$ are the simple regression coefficients.
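This partialling-out result is easy to verify numerically. The sketch below (illustrative Python with simulated data; the function and variable names are mine) regresses the residuals of $Y$ on $X_2$ against the residuals of $X_1$ on $X_2$ and compares the slope with $\hat{\beta}_1$ from the full regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + rng.normal(size=n)

def resid(dep, regressor):
    """Residuals from a simple regression of dep on a constant and regressor."""
    Z = np.column_stack([np.ones(len(regressor)), regressor])
    return dep - Z @ np.linalg.lstsq(Z, dep, rcond=None)[0]

v_hat = resid(Y, X2)          # adjusted Y: part of Y not explained by X2
w_hat = resid(X1, X2)         # adjusted X1: part of X1 not explained by X2
delta1 = (w_hat @ v_hat) / (w_hat @ w_hat)

full = np.linalg.lstsq(np.column_stack([np.ones(n), X1, X2]), Y, rcond=None)[0]
print(delta1, full[1])        # the two estimates of beta1 coincide
```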
We use $R^2$ as a measure of goodness of fit in the multiple regression model. $R^2$ measures the proportion of the variation in $Y$ that is explained by the multiple regression equation. $R^2$ is commonly used to compare the validity of regression results under alternative specifications of the explanatory variables using the same data.

To explain goodness of fit, we decompose the total variation of $Y$ into two parts: the part explained by the equation and the unexplained part.
We have $y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$ and $\hat{y}_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$

$\therefore\; y_i = \hat{y}_i + e_i$

$\sum y_i^2 = \sum (\hat{y}_i + e_i)^2 = \sum \hat{y}_i^2 + \sum e_i^2 + 2\sum \hat{y}_i e_i$

But $\sum \hat{y}_i e_i = \sum (\hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i})(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i})$
$= \hat{\beta}_1 \sum x_{1i} y_i - \hat{\beta}_1^2 \sum x_{1i}^2 - \hat{\beta}_1 \hat{\beta}_2 \sum x_{1i} x_{2i} + \hat{\beta}_2 \sum x_{2i} y_i - \hat{\beta}_1 \hat{\beta}_2 \sum x_{1i} x_{2i} - \hat{\beta}_2^2 \sum x_{2i}^2$

From the normal equation $\sum x_{1i} y_i = \hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{2i} x_{1i}$ we get $\hat{\beta}_1 \sum x_{1i} y_i = \hat{\beta}_1^2 \sum x_{1i}^2 + \hat{\beta}_1 \hat{\beta}_2 \sum x_{1i} x_{2i}$, and similarly $\hat{\beta}_2 \sum x_{2i} y_i = \hat{\beta}_1 \hat{\beta}_2 \sum x_{1i} x_{2i} + \hat{\beta}_2^2 \sum x_{2i}^2$.

$\therefore\; \sum \hat{y}_i e_i = 0$

$\therefore\; \sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 \;\Rightarrow\; \mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$

$\therefore\; \dfrac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \dfrac{\mathrm{RSS}}{\mathrm{TSS}} \;\Rightarrow\; R^2 = 1 - \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$

It measures the proportion of the total variation in $Y$ that is explained by the OLS regression equation; a higher value of $R^2$ indicates a better fit of the model. Note that $R^2$ is actually the square of the multiple correlation coefficient of $Y$ on $X_1$ and $X_2$. The multiple correlation of $Y$ on $X_1$ and $X_2$ is defined as the simple correlation between $Y$ and $\hat{Y}$, where $\hat{Y}$ is the fitted value of $Y$ when it is regressed on $X_1$ and $X_2$. Thus, by definition,

$R = \dfrac{\mathrm{Cov}(Y_i, \hat{Y}_i)}{\sqrt{\mathrm{Var}(Y_i)\,\mathrm{Var}(\hat{Y}_i)}} = \dfrac{\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\sqrt{\sum (Y_i - \bar{Y})^2 \sum (\hat{Y}_i - \bar{\hat{Y}})^2}} = \dfrac{\sum y_i \hat{y}_i}{\sqrt{\sum y_i^2 \sum \hat{y}_i^2}}$

$\therefore\; R^2 = \dfrac{\left(\sum y_i \hat{y}_i\right)^2}{\sum y_i^2 \sum \hat{y}_i^2} = \dfrac{\left\{\sum (\hat{y}_i + e_i)\hat{y}_i\right\}^2}{\sum y_i^2 \sum \hat{y}_i^2} = \dfrac{\sum \hat{y}_i^2}{\sum y_i^2}$ [since $\sum \hat{y}_i e_i = 0$]

$\therefore\;$ by definition, $0 \le R^2 \le 1$.
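A small numerical check (illustrative Python; simulated data and names are mine) that $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$ equals the squared simple correlation between $Y$ and $\hat{Y}$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 0.5 + 1.2 * X1 + 0.7 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)
RSS = np.sum(e ** 2)
R2 = 1 - RSS / TSS
R2_corr = np.corrcoef(Y, Y_hat)[0, 1] ** 2    # squared multiple correlation
print(R2, R2_corr)                            # the two agree
```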
However, there are some problems with the use of $R^2$ as a measure of goodness of fit. (1) $R^2$ is a non-decreasing function of the number of regressors in the model: if the number of explanatory variables increases, the $R^2$ value usually increases, because RSS falls while TSS remains unchanged. But increasing the number of explanatory variables reduces the degrees of freedom, so there is a cost to adding explanatory variables merely to improve the apparent goodness of fit. Finally, the use of $R^2$ is often problematic when the model is estimated with a zero intercept; in that case $R^2$ may not lie within the range 0 to 1.
In order to solve the problem of comparing two models with the same dependent variable using $R^2$, we should use the concept of adjusted $R^2$, written $\bar{R}^2$, which is the $R^2$ adjusted for degrees of freedom. In our three-variable model $n$ is the sample size and 3 is the number of regressors including the intercept; we divide by $(n-3)$ to compute the variance of the error because the residuals satisfy the 3 normal-equation restrictions, so the degrees of freedom for RSS is $(n-3)$, and we divide by $(n-1)$ to compute the variance of $Y$ because of the single restriction imposed by the mean of $Y$. In general,

$\bar{R}^2 = 1 - \dfrac{\sum e_i^2 / (n-k)}{\sum y_i^2 / (n-1)} = 1 - (1 - R^2)\dfrac{n-1}{n-k}$ ....................................(17)

where $k$ is the number of explanatory variables including the intercept term.

Now if we add one extra explanatory variable, it reduces $\sum e_i^2$, but at the same time $(n-k)$ is also reduced, so $\bar{R}^2$ may increase, decrease or remain the same. If $k > 1$ then $R^2 \ge \bar{R}^2$, and in some cases $\bar{R}^2$ may be negative; for example, if $R^2 = 0.2$, $n = 30$ and $k = 8$ then $\bar{R}^2 = -0.05$.
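The sketch below (illustrative Python; the values are those of the example above) computes $\bar{R}^2$ from equation (17).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 as in equation (17); k counts the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# The example from the text: R^2 = 0.2, n = 30, k = 8
print(round(adjusted_r2(0.2, 30, 8), 2))   # about -0.05
```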
F Statistic:
We know that $R^2$ is the ratio of the explained variation to the total variation in the dependent variable. Alternatively, we may consider the ratio of the explained variation to the residual variation in the regression. It is zero when the regression equation explains none of the variation of the dependent variable and infinite when the whole of the variation is explained by the regression. If we adjust the ratio of the explained variation (ESS) to the residual variation (RSS) by their respective degrees of freedom we obtain

$F = \dfrac{\mathrm{ESS}/(k-1)}{\mathrm{RSS}/(n-k)} = \dfrac{\mathrm{ESS}}{\mathrm{RSS}} \times \dfrac{n-k}{k-1}$

where $k$ is the number of explanatory variables including the intercept and $n$ is the number of observations. The F statistic is used in two ways. First, it is used to test the hypothesis that none of the explanatory variables explains the variation in the explained variable about its mean. Technically, the F statistic is used to test the joint hypothesis $H_0: \beta_1 = \beta_2 = 0$ against the alternative that at least one of $\beta_1$, $\beta_2$ is non-zero in our three-variable regression model. Therefore the F test helps us judge the overall significance of the regression. If the computed value of $F_{k-1,\,n-k}$ is greater than the tabulated value at a given level of significance we reject the null hypothesis; otherwise we cannot reject the null. Note that the F test may reject the null hypothesis even though none of the regression coefficients is found significant by the individual t tests. This situation may occur if the explanatory variables are highly correlated, i.e. if there is multicollinearity.
Second, the F statistic may be used to compare different regressions. The equation with the higher value of F can be regarded as the statistically better of the two alternative regressions.
In general, the F test can be used to test linear restrictions in a general linear regression model such as

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + u$ ..................................(1)

Suppose we have $p$ restrictions, namely that the last $p$ explanatory variables have no joint impact on $Y$, so the restricted model is

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_{k-p} X_{k-p} + u$ ...........................(2)

Now we compute the RSS for model (1), called the unrestricted RSS and denoted $RSS_{UR}$, and the RSS for model (2), called the restricted RSS and denoted $RSS_R$. Since $RSS_R \ge RSS_{UR}$, to test the null hypothesis we define the F statistic

$F_{p,\,n-k} = \dfrac{(RSS_R - RSS_{UR})/p}{RSS_{UR}/(n-k)}$, where $p = df_R - df_{UR} = n - (k - p) - (n - k)$

With all the slope coefficients restricted to zero, this is the F statistic used in our three-variable case to judge the overall significance of the regression.
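The sketch below (illustrative Python with simulated data; the names and the choice of $p = 2$ are mine) computes the restricted/unrestricted F statistic for the joint hypothesis that the last $p$ regressors have zero coefficients and compares it with the 5% critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k, p = 200, 4, 2            # k regressors incl. intercept; test the last p = 2
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 0.0 * X2 + 0.0 * X3 + rng.normal(size=n)   # H0 true here

def rss(X, Y):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

X_ur = np.column_stack([np.ones(n), X1, X2, X3])   # unrestricted model
X_r = np.column_stack([np.ones(n), X1])            # restricted: drop the last p regressors
F = ((rss(X_r, Y) - rss(X_ur, Y)) / p) / (rss(X_ur, Y) / (n - k))
print(F, stats.f.ppf(0.95, p, n - k))              # compare with the 5% critical value
```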
We can carry out other tests in the multiple linear regression model in addition to the individual t tests and the overall F test. For example, consider the hypothesis

$H_0: \beta_1 = \beta_2$ against $H_1: \beta_1 \neq \beta_2$

$t_{n-k} = \dfrac{\hat{\beta}_1 - \hat{\beta}_2}{SE(\hat{\beta}_1 - \hat{\beta}_2)}$

The variance of $(\hat{\beta}_1 - \hat{\beta}_2)$ is

$\mathrm{Var}(\hat{\beta}_1 - \hat{\beta}_2) = E\left\{(\hat{\beta}_1 - \beta_1) - (\hat{\beta}_2 - \beta_2)\right\}^2$
$= E(\hat{\beta}_1 - \beta_1)^2 + E(\hat{\beta}_2 - \beta_2)^2 - 2E(\hat{\beta}_1 - \beta_1)(\hat{\beta}_2 - \beta_2)$
$= \mathrm{Var}(\hat{\beta}_1) + \mathrm{Var}(\hat{\beta}_2) - 2\,\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)$
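A sketch of this test in Python (simulated data; names are mine): build the t statistic for $H_0: \beta_1 = \beta_2$ from the estimated variance-covariance matrix.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k = 120, 3
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 1.5 * X1 + 1.5 * X2 + rng.normal(size=n)   # beta1 = beta2, so H0 holds

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
sigma2_hat = (e @ e) / (n - k)
V = sigma2_hat * np.linalg.inv(X.T @ X)              # estimated Var-Cov matrix

se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])   # SE(beta1_hat - beta2_hat)
t = (b[1] - b[2]) / se_diff
p_value = 2 * stats.t.sf(abs(t), df=n - k)
print(t, p_value)
```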
Multicollinearity
One important assumption in the formulation of the multiple linear regression model is that, in the sample and therefore in the population, there is no exact linear relationship among the explanatory variables. We should note that some correlation among the explanatory variables in a sample is common and is not by itself a problem. But if we find an exact linear relation among the explanatory variables, then we say that the multiple linear regression model suffers from the problem of exact (perfect) multicollinearity.

Suppose the standard assumptions on the disturbances hold, including $E(u_i u_j) = 0$ for every $i \neq j$, i.e. $u_i$ is non-autocorrelated (the disturbances are independent), but $X_{2i} = b X_{1i}$. Then we say that the model suffers from exact multicollinearity.
For example, suppose we estimate food expenditure for households in a city as a function of the children population ratio ($CP$) and the adult population ratio ($AP$) in the household, together with per capita income. The children population ratio and the adult population ratio then have an exact linear relation, since $\dfrac{C}{P}\times 100 = 100 - \dfrac{A}{P}\times 100$. Therefore, to avoid the problem of perfect multicollinearity we should not include an explanatory variable which can be expressed as an exact linear function of one or more other explanatory variables.
The presence of some collinearity does not hamper the properties of the OLS estimates; only in the case of exact multicollinearity can we not estimate the coefficients separately. Thus multicollinearity is a matter of degree, not of kind, because the existence of some linear association among the explanatory variables is a common fact.
Note that the multicollinearity problem concerns only linear relations among the explanatory variables; it does not rule out non-linear relations among them. For example, consider the production function $Y_i = \alpha + \beta L_i + \gamma L_i^2 + u_i$. Here $L$ and $L^2$ are non-linearly related, so there is no perfect multicollinearity. Further, the presence of multicollinearity does not mean that the model is mis-specified.
Sources of Multicollinearity:
We know that if the explanatory variables in a multiple regression model are linearly related then we face the problem of multicollinearity. The problem is serious when the degree of correlation is high or near perfect; if the correlation is perfect we cannot estimate the coefficients separately. The problem of multicollinearity is a data problem, i.e. even if the explanatory variables are uncorrelated in the population, they may be correlated in a particular sample. Let us now state the sources of multicollinearity.
(d) Limited range of values of the regressors: the researcher may face the problem of multicollinearity when adding polynomial terms of a regressor that takes only a limited number of values.
(e) Data shortage: in many situations, such as in medical science, the model may have a large number of explanatory variables, possibly greater than the number of observations, and the investigator then faces the problem of multicollinearity. For example, the determinants of a rare disease may be several symptoms whose number exceeds the number of patients suffering from the disease.
If $y_i = a x_i + c$ with $c$ a constant, then $(y_i - \bar{y}) = a(x_i - \bar{x})$ [where $a \neq 0$], and

$r_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \dfrac{a\sum (x_i - \bar{x})^2}{\sqrt{a^2 \sum (x_i - \bar{x})^2 \sum (x_i - \bar{x})^2}} = \dfrac{a\sum (x_i - \bar{x})^2}{|a|\sum (x_i - \bar{x})^2} = \pm 1$

This is the case of exact multicollinearity. If instead $y_i = a x_i + c_i$, where $c_i$ varies across observations (and, for simplicity, is uncorrelated with $x_i$), then $(y_i - \bar{y}) = a(x_i - \bar{x}) + (c_i - \bar{c})$ and

$|r_{xy}| = \dfrac{|a|\sum (x_i - \bar{x})^2}{\sqrt{\sum (x_i - \bar{x})^2\left\{a^2 \sum (x_i - \bar{x})^2 + \sum (c_i - \bar{c})^2\right\}}} \le 1$, with $|r_{xy}| = 1$ only if $\mathrm{Var}(c_i) = 0$, i.e. $c$ is a constant.

$\therefore\; |r_{xy}| < 1$ when $y_i = a x_i + c_i$; this is the case of near-exact multicollinearity. The correlation may be negative or positive depending on the sign of $a$.
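A quick check of the two cases (illustrative Python; the values of $a$ and $c$ are mine): the correlation is exactly $\pm 1$ when the added term is a constant and strictly smaller than 1 in absolute value when it varies.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(size=1000)
a = -2.0

y_exact = a * x + 5.0                       # constant c: exact collinearity
y_near = a * x + rng.normal(size=x.size)    # varying c_i: near-exact collinearity

print(np.corrcoef(x, y_exact)[0, 1])        # -1.0 (the sign of a)
print(np.corrcoef(x, y_near)[0, 1])         # negative, strictly between -1 and 0
```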
Consequences of Multicollinearity:
To explain the consequences of exact multicollinearity we consider our three-variable model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ where $X_{1i} = \alpha X_{2i}$.

$\hat{\beta}_1 = \dfrac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$

Now if $X_{1i} = \alpha X_{2i}$ then $\bar{X}_1 = \alpha \bar{X}_2$, $\therefore\; x_{1i} = \alpha x_{2i}$. So

$\hat{\beta}_1 = \dfrac{\alpha \sum x_{2i} y_i \sum x_{2i}^2 - \alpha \sum x_{2i} y_i \sum x_{2i}^2}{\alpha^2 \left(\sum x_{2i}^2\right)^2 - \alpha^2 \left(\sum x_{2i}^2\right)^2} = \dfrac{0}{0}$

which is indeterminate: under exact multicollinearity we cannot estimate $\beta_1$ and $\beta_2$ separately. Instead, the model can only be written as $Y_i = \beta_0 + \gamma X_{2i} + u_i$ where $\gamma = \alpha \beta_1 + \beta_2$, and

$\hat{\gamma} = \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}$

i.e. we can estimate only the combination $(\alpha \beta_1 + \beta_2) = \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}$, not $\beta_1$ and $\beta_2$ individually.
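This shows up numerically as a singular cross-product matrix. A small sketch (illustrative Python; the names and the values $\alpha = 3$, $\beta_1 = 2$, $\beta_2 = 4$ are mine):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
X2 = rng.normal(size=n)
X1 = 3.0 * X2                               # exact collinearity: X1 = alpha * X2
Y = 1.0 + 2.0 * X1 + 4.0 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
print(np.linalg.matrix_rank(X.T @ X))       # rank 2 < 3: the normal equations are singular

# Only the combination gamma = alpha*beta1 + beta2 = 3*2 + 4 = 10 is estimable:
Z = np.column_stack([np.ones(n), X2])
print(np.linalg.lstsq(Z, Y, rcond=None)[0]) # intercept near 1, slope near 10
```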
Let us now consider the variance of $\hat{\beta}_1$:

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)}$

$\therefore\; SE(\hat{\beta}_1) = \sqrt{\dfrac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)}} \to \infty$ as $r_{12}^2 \to 1$

Similarly, $SE(\hat{\beta}_2) = \sqrt{\dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{12}^2)}} \to \infty$ as $r_{12}^2 \to 1$

Note also that $\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2) = \dfrac{-\sigma^2 r_{12}}{\sqrt{\sum x_{1i}^2 \sum x_{2i}^2}\,(1 - r_{12}^2)}$ becomes unbounded in absolute value as $r_{12}^2 \to 1$.
To explain the point, consider the model with all the standard assumptions:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
$\therefore\; \bar{Y} = \beta_0 + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2 + \bar{u}$
$\Rightarrow\; y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + u_i - \bar{u}$

$\hat{\beta}_1 = \beta_1 + \dfrac{\sum x_{1i} u_i \sum x_{2i}^2 - \sum x_{1i} x_{2i} \sum x_{2i} u_i}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$

Putting $x_{1i} = \alpha x_{2i} + v_i$, with $\sum x_{2i} v_i = 0$ (i.e. $\mathrm{Cov}(v_i, x_{2i}) = 0$), we get

$\hat{\beta}_1 = \beta_1 + \dfrac{\alpha \sum x_{2i} u_i \sum x_{2i}^2 + \sum u_i v_i \sum x_{2i}^2 - \alpha \sum x_{2i}^2 \sum x_{2i} u_i - \sum x_{2i} v_i \sum x_{2i} u_i}{\sum v_i^2 \sum x_{2i}^2}$

[since $\sum x_{1i} x_{2i} = \alpha \sum x_{2i}^2$ and the denominator reduces to $\sum v_i^2 \sum x_{2i}^2$ when $\sum x_{2i} v_i = 0$]

$= \beta_1 + \dfrac{\sum u_i v_i}{\sum v_i^2}$

$E(\hat{\beta}_1) = \beta_1 + \dfrac{\sum v_i E(u_i)}{\sum v_i^2} = \beta_1$ [since the regressors, and hence $v_i$, are non-stochastic and $E(u_i) = 0$, so that $E(u_i v_i) = 0$]

$\therefore\; E(\hat{\beta}_1) = \beta_1$: near-exact multicollinearity does not bias the OLS estimator. The variances remain

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_{1i}^2 (1 - r_{12}^2)}$ and $\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{12}^2)}$
Therefore, the presence of near-exact multicollinearity does not hamper the BLUE properties of the OLS estimators. Indeed, near-exact multicollinearity does not violate any assumption of the classical linear regression model, which is why the OLS estimators remain BLUE. We can easily show that $\dfrac{\sum e_i^2}{n-3}$ is still an unbiased estimator of $\sigma^2$; thus the standard error of the regression remains unbiased in the presence of imperfect multicollinearity.
The problems created by imperfect multicollinearity are therefore practical rather than theoretical. If the strength of multicollinearity is mild we can largely ignore it, but when the strength of multicollinearity is high we face serious practical problems of inference.

Because of the large standard errors, one or more coefficients may not be significantly different from zero in the presence of a high degree of multicollinearity; with imperfect multicollinearity we may find only one or two regression coefficients statistically significant. At the same time, the coefficient of determination (the square of the multiple correlation coefficient) may be very high. This usually happens when the investigator includes extra regressors which create multicollinearity but increase $R^2$. So $R^2$ may be high with some coefficients insignificant due to multicollinearity.

The OLS estimators and their standard errors can also be sensitive to small changes in the data. As multicollinearity is a data problem, the addition of observations may reduce the standard errors and may make significant a coefficient which was initially insignificant.
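To see these consequences concretely, the sketch below (illustrative Python with simulated data; the function name and the correlation values are mine) fits the same model with weakly and highly correlated regressors and compares standard errors, t statistics and $R^2$.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 60

def fit(rho):
    X2 = rng.normal(size=n)
    X1 = rho * X2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    Y = 1.0 + 1.0 * X1 + 1.0 * X2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), X1, X2])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    s2 = (e @ e) / (n - 3)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)
    return b[1:], se[1:], b[1:] / se[1:], r2

for rho in (0.1, 0.99):
    b, se, t, r2 = fit(rho)
    print(f"rho={rho}: slopes={np.round(b, 2)}, SE={np.round(se, 2)}, "
          f"t={np.round(t, 1)}, R2={r2:.2f}")
# With rho = 0.99 the standard errors are much larger and the individual t statistics
# can be insignificant even though R2 stays high.
```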