
CHAPTER TWO

PART 2
MULTIPLE REGRESSION ANALYSIS
Assumptions of the Multiple Linear Regression
Each econometric method used for estimation has its own assumptions. Knowing these assumptions, and the consequences when they are not maintained, is very important for the econometrician. As in the simple case treated in the previous section, there are certain assumptions underlying the multiple regression model under the method of ordinary least squares (OLS). Let us see them one by one.

Assumption 1: Randomness of u_i - the variable u is a real random variable.

Assumption 2: Zero mean of u_i - the random variable u_i has a zero mean for each value of X_i, i.e. E(u_i) = 0.

Assumption 3: Homoscedasticity of the random term - the random term u_i has constant variance. In other words, the variance of each u_i is the same for all the X_i values: E(u_i²) = σ_u² (a constant).

Assumption 4: Normality of u_i - the values of each u_i are normally distributed: u_i ~ N(0, σ_u²).

Assumption 5: No autocorrelation or serial independence of the random terms - the successive values of the random term are not correlated. The values of u_i (corresponding to X_i) are independent of the values of any other u_j (corresponding to X_j): E(u_i u_j) = 0 for i ≠ j.

Assumption 6: Independence of u_i and X_i - every disturbance term u_i is independent of the explanatory variables: E(u_i X_1i) = E(u_i X_2i) = 0.

Assumption 7: No errors of measurement in the X's - the explanatory variables are measured without error.

Assumption 8: No perfect multicollinearity among the X's - the explanatory variables are not perfectly linearly correlated.

Assumption 9: Correct specification of the model - the model has no specification error in that all the important explanatory variables appear explicitly in the function and the mathematical form is correctly defined (linear or non-linear form and the number of equations in the model).



Estimation of Partial Regression Coefficients
The process of estimating the parameters in the multiple regression model is similar to that of the simple linear regression model. The main task is to derive the normal equations using the same procedure as in the simple regression case. As in the simple linear regression model, OLS and Maximum Likelihood (ML) methods can be used to estimate the partial regression coefficients of multiple regression models. Due to its simplicity and popularity, however, the OLS method is used here. The OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares is as small as possible.
Consider the three-variable population regression function

Y_i = β_0 + β_1 X_1i + β_2 X_2i + u_i        (3.2)

Under the assumption of zero mean of the random term, the sample regression function will look like the following:

Ŷ_i = β̂_0 + β̂_1 X_1i + β̂_2 X_2i        (3.3)

We call this equation the fitted equation. The residual for each observation is the difference between the observed and the fitted value:

e_i = Y_i − Ŷ_i        (3.4)
The method of ordinary least squares (OLS), or classical least squares (CLS), involves obtaining the values of β̂_0, β̂_1 and β̂_2 in such a way that Σe_i² is minimized. These values are obtained by differentiating the sum of squared residuals, Σe_i² = Σ(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i)², with respect to each coefficient and equating the derivatives to zero. That is,

∂Σe_i²/∂β̂_0 = ∂Σ(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i)²/∂β̂_0 = 0        (3.5)

∂Σe_i²/∂β̂_1 = ∂Σ(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i)²/∂β̂_1 = 0        (3.6)

∂Σe_i²/∂β̂_2 = ∂Σ(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i)²/∂β̂_2 = 0        (3.7)

Solving equations (3.5), (3.6) and (3.7) simultaneously, we obtain the system of normal equations:

ΣY_i = nβ̂_0 + β̂_1ΣX_1i + β̂_2ΣX_2i        (3.8)

ΣX_1iY_i = β̂_0ΣX_1i + β̂_1ΣX_1i² + β̂_2ΣX_1iX_2i        (3.9)

ΣX_2iY_i = β̂_0ΣX_2i + β̂_1ΣX_1iX_2i + β̂_2ΣX_2i²        (3.10)

Then, letting lower-case letters denote deviations from the sample means,

x_1i = X_1i − X̄_1        (3.11)
x_2i = X_2i − X̄_2        (3.12)
y_i = Y_i − Ȳ        (3.13)

the three equations (3.8), (3.9) and (3.10) can be solved, using matrix operations or by substitution, to obtain the following estimates:

β̂_1 = [Σx_1y · Σx_2² − Σx_2y · Σx_1x_2] / [Σx_1² · Σx_2² − (Σx_1x_2)²]        (3.14)

β̂_2 = [Σx_2y · Σx_1² − Σx_1y · Σx_1x_2] / [Σx_1² · Σx_2² − (Σx_1x_2)²]        (3.15)

β̂_0 = Ȳ − β̂_1X̄_1 − β̂_2X̄_2        (3.16)

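To make the closed-form expressions concrete, here is a minimal Python sketch of equations (3.14)-(3.16). It assumes y, x1 and x2 are NumPy arrays of equal length; the function and variable names are illustrative, not part of the lecture note.

```python
# Minimal sketch (not from the lecture note): OLS estimates for the three-variable
# model via the deviation formulas (3.14)-(3.16). y, x1, x2 are NumPy arrays.
import numpy as np

def ols_two_regressors(y, x1, x2):
    """Return (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 estimated by OLS."""
    yd, x1d, x2d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()  # deviations from means
    s11, s22 = (x1d**2).sum(), (x2d**2).sum()
    s1y, s2y, s12 = (x1d*yd).sum(), (x2d*yd).sum(), (x1d*x2d).sum()
    denom = s11*s22 - s12**2                                     # common denominator
    b1 = (s1y*s22 - s2y*s12) / denom                             # equation (3.14)
    b2 = (s2y*s11 - s1y*s12) / denom                             # equation (3.15)
    b0 = y.mean() - b1*x1.mean() - b2*x2.mean()                  # equation (3.16)
    return b0, b1, b2
```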
Variance and Standard errors of OLS Estimators


Estimating the numerical values of the parameters is not enough in econometrics when the data come from samples. We also need to assess the precision, or statistical reliability, of the estimators, and this precision is measured by the standard error of each estimator. Standard errors serve two main purposes: to establish confidence intervals for the parameters and to test statistical hypotheses about them.

Like in the case of simple linear regression, the standard errors of the coefficients are vital in statistical inference about the coefficients. We use the standard error of a coefficient to construct a confidence interval estimate for the population regression coefficient and to test the significance of the variable to which the coefficient is attached in determining the dependent variable in the model. In this section, we define these standard errors. The standard error of a coefficient is the positive square root of the variance of the coefficient. Thus, we start with defining the variances of the coefficients.

Variance of the intercept (β̂_0)

Var(β̂_0) = σ̂_u² [ 1/n + (X̄_1²Σx_2² + X̄_2²Σx_1² − 2X̄_1X̄_2Σx_1x_2) / (Σx_1²Σx_2² − (Σx_1x_2)²) ]        (3.17)

Variance of β̂_1

Var(β̂_1) = σ̂_u² · Σx_2² / [Σx_1²Σx_2² − (Σx_1x_2)²]        (3.18)

Variance of β̂_2

Var(β̂_2) = σ̂_u² · Σx_1² / [Σx_1²Σx_2² − (Σx_1x_2)²]        (3.19)

Where,

σ̂_u² = Σe_i² / (n − 3)        (3.20)

Equation (3.20) gives the estimate of the variance of the random term. The standard errors are then computed as follows:

Standard error of β̂_0:  SE(β̂_0) = √Var(β̂_0)        (3.21)

Standard error of β̂_1:  SE(β̂_1) = √Var(β̂_1)        (3.22)

Standard error of β̂_2:  SE(β̂_2) = √Var(β̂_2)        (3.23)
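Continuing the illustrative sketch above, the slope variances and standard errors of equations (3.18)-(3.23) can be computed in the same way (names are again placeholders, and only the two slope coefficients are handled):

```python
# Continuation of the sketch above: slope variances and standard errors,
# equations (3.18)-(3.23); the residual variance uses n - 3 degrees of freedom (3.20).
import numpy as np

def ols_slope_standard_errors(y, x1, x2, b0, b1, b2):
    n = len(y)
    resid = y - (b0 + b1*x1 + b2*x2)
    sigma2_hat = (resid**2).sum() / (n - 3)                      # equation (3.20)
    x1d, x2d = x1 - x1.mean(), x2 - x2.mean()
    denom = (x1d**2).sum()*(x2d**2).sum() - (x1d*x2d).sum()**2
    var_b1 = sigma2_hat * (x2d**2).sum() / denom                 # equation (3.18)
    var_b2 = sigma2_hat * (x1d**2).sum() / denom                 # equation (3.19)
    return np.sqrt(var_b1), np.sqrt(var_b2)                      # SE(b1), SE(b2)
```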

Note: The OLS estimators of the multiple regression model have properties which are parallel
to those of the two-variable model.

Coefficient of Multiple Determination


In the simple regression model we discussed the coefficient of determination and its interpretation. In this section, we discuss the coefficient of multiple determination, which plays an equivalent role to that of the simple model. Just as the coefficient of determination is the square of the simple correlation coefficient in the simple model, the coefficient of multiple determination is the square of the multiple correlation coefficient.

The coefficient of multiple determination (R²) is the measure of the proportion of the variation in the dependent variable that is explained jointly by the independent variables in the model. One minus R² is called the coefficient of non-determination; it gives the proportion of the variation in the dependent variable that remains unexplained by the independent variables in the model. As in the case of simple linear regression, R² is the ratio of the explained variation to the total variation. Mathematically:



R² = Σŷ² / Σy²        (3.24)

Or R² can also be given in terms of the slope coefficients β̂_1 and β̂_2 as:

R² = (β̂_1Σx_1y + β̂_2Σx_2y) / Σy²        (3.25)

In simple linear regression, a higher R² means that more of the variation in the dependent variable is explained by the explanatory variable in the model. In multiple linear regression, however, every time we insert an additional explanatory variable into the model, R² increases irrespective of any improvement in the goodness of fit of the model. That means a high R² may not imply that the model is good.
Thus, we adjust the R² as follows:

R²_adj = 1 − (1 − R²) · (n − 1)/(n − k)        (3.26)

Where k = the number of parameters estimated (the intercept plus the slope coefficients).
In multiple linear regression, therefore, we better interpret the adjusted R² than the ordinary or unadjusted R². We have seen that the value of R² is always between zero and one. The adjusted R², however, can lie outside this range and can even be negative.

In the case of simple linear regression, R² is the square of the linear correlation coefficient. Since the correlation coefficient lies between −1 and +1, the coefficient of determination (R²) lies between 0 and 1. The R² of multiple linear regression also lies between 0 and +1. The adjusted R², however, can sometimes be negative when the goodness of fit is poor. When the adjusted R² value is negative, we treat it as zero and interpret it as indicating that no variation in the dependent variable is explained by the regressors.
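A small illustrative sketch of equations (3.24)-(3.26), assuming y and the fitted values are NumPy arrays and k counts the intercept plus the slopes:

```python
# Sketch of equations (3.24)-(3.26): R-squared and adjusted R-squared.
# y and y_fitted are NumPy arrays; k counts the intercept plus the slope coefficients.
import numpy as np

def r_squared(y, y_fitted, k):
    tss = ((y - y.mean())**2).sum()                 # total variation
    rss = ((y - y_fitted)**2).sum()                 # unexplained (residual) variation
    r2 = 1 - rss / tss                              # equivalent to explained/total
    n = len(y)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)       # equation (3.26); can be negative
    return r2, r2_adj
```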

Confidence Interval Estimation


Confidence interval estimation in multiple linear regression follows the same formulae and
procedures that we followed in simple linear regression. You are, therefore, required to
practice finding the confidence interval estimates of the intercept and the slopes in multiple
regression with two explanatory variables.

Please recall that the 100(1 − α)% confidence interval for β_i is given as β̂_i ± t_{α/2, n−k} · SE(β̂_i), where k is the number of parameters to be estimated (equivalently, the number of variables in the model, counting both the dependent and the explanatory variables).
Interpretation of the confidence interval: values of the parameter lying in the interval are plausible with 100(1 − α)% confidence.
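As a hedged illustration of this formula, the interval can be computed with SciPy's t distribution; all inputs are placeholders to be supplied by the user.

```python
# Sketch of the interval beta_hat ± t_{alpha/2, n-k} * SE(beta_hat); inputs are placeholders.
from scipy import stats

def conf_interval(beta_hat, se_beta, n, k, alpha=0.05):
    t_crit = stats.t.ppf(1 - alpha/2, df=n - k)     # two-sided critical value
    return beta_hat - t_crit*se_beta, beta_hat + t_crit*se_beta
```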



Hypothesis Testing in Multiple Regression
Hypothesis testing is important to draw inferences about the estimates and to know how
representative the estimates are to the true population parameter. Once we go beyond the
simple world of the two-variable linear regression model, hypothesis testing assumes several
interesting forms such as the following.
a) Testing hypothesis about an individual partial regression coefficient;
b) Testing the overall significance of the estimated multiple regression model (finding
out if all the partial slope coefficients are simultaneously equal to zero);
c) Testing if two or more coefficients are equal to one another;
d) Testing that the partial regression coefficients satisfy certain restrictions
e) Testing the stability of the estimated regression model over time or in different cross-
sectional units
f) Testing the functional form of regression models.
These and other types of hypothesis tests are treated in various econometrics books. Here, we confine ourselves to the major ones.
Testing individual regression coefficients
The tests concerning the individual coefficients can be done using the standard error test or
the t-test. In all the cases the hypothesis is stated as:

a) H_0: β_1 = 0 vs. H_1: β_1 ≠ 0        b) H_0: β_2 = 0 vs. H_1: β_2 ≠ 0        ...        H_0: β_K = 0 vs. H_1: β_K ≠ 0

In a) we would like to test the hypothesis that X1 has no linear influence on Y holding other variables constant. In b) we test the hypothesis that X2 has no linear relationship with Y holding other factors constant. The above hypotheses lead us to a two-tailed test; however, a one-tailed test might also be appropriate. There are two methods for testing the significance of individual regression coefficients.
a) Standard Error Test: Using the standard error test we can test the above hypotheses. The decision rule is based on the relationship between the numerical value of the estimate and its standard error.
(i) If SE(β̂_i) > ½|β̂_i|, we accept the null hypothesis, i.e. the estimate of β_i is not statistically significant.
Conclusion: The coefficient β̂_i is not statistically significant; in other words, the variable it is attached to does not have a significant influence on the dependent variable.
(ii) If SE(β̂_i) < ½|β̂_i|, we fail to accept H0, i.e. we reject the null hypothesis in favour of the alternative hypothesis, meaning the variable attached to β_i has a significant influence on the dependent variable.



Generalisation: The smaller the standard error, the stronger is the evidence that the estimates
are statistically significant.
(b) t-test
The more appropriate and formal way to test the above hypothesis is to use the t-test. As usual, we compute the t-ratio and compare it with the tabulated t-value to make our decision.

t_cal = β̂_i / SE(β̂_i), which follows the t distribution with (n − k) degrees of freedom.

Decision rule: accept H0 if −t_{α/2} < t_cal < t_{α/2}; otherwise, reject the null hypothesis. Rejecting H0 means the coefficient being tested is significantly different from 0. Not rejecting H0, on the other hand, means we do not have sufficient evidence to conclude that the coefficient is different from 0.
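The following is a minimal sketch of that decision rule, assuming the estimate, its standard error and the degrees of freedom are already available (names are illustrative):

```python
# Sketch of the individual t-test of H0: beta_i = 0 against a two-sided alternative.
from scipy import stats

def t_test_coefficient(beta_hat, se_beta, n, k, alpha=0.05):
    t_cal = beta_hat / se_beta                      # calculated t-ratio
    t_crit = stats.t.ppf(1 - alpha/2, df=n - k)     # tabulated t-value
    return t_cal, t_crit, abs(t_cal) > t_crit       # True means reject H0
```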
Testing the Overall Significance of Regression Model
Here, we are interested to test the overall significance of the observed or estimated regression
line, that is, whether the dependent variable is linearly related to all of the explanatory
variables. Hypotheses of such type are often called joint hypotheses. Testing the overall
significance of the model means testing the null hypothesis that none of the explanatory
variables in the model significantly determine the changes in the dependent variable. Put in
other words, it means testing the null hypothesis that none of the explanatory variables
significantly explain the dependent variable in the model. This can be stated as:
H_0: β_1 = β_2 = 0
H_1: β_i ≠ 0, for at least one i.
The test statistic for this test is given by:

F_cal = [Σŷ² / (k − 1)] / [Σe² / (n − k)]

Where k is the number of parameters estimated (the intercept plus the slope coefficients).
The results of the overall significance test of a model are summarized in the analysis of variance (ANOVA) table as follows.

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Fcal
Regression | SSR = Σŷ² | k − 1 | MSR = Σŷ²/(k − 1) | F = MSR/MSE
Residual | SSE = Σe² | n − k | MSE = Σe²/(n − k) |
Total | SST = Σy² | n − 1 | |



The values in this table are explained as follows:

SSR = Σŷ² = Σ(Ŷ_i − Ȳ)² = Regression (explained) sum of squares
SSE = Σe_i² = Σ(Y_i − Ŷ_i)² = Residual (unexplained) sum of squares
SST = Σy² = Σ(Y_i − Ȳ)² = Total sum of squares

These three sums of squares are related in such a way that

SST = SSR + SSE
This implies that the total sum of squares is the sum of the explained (regression) sum of
squares and the residual (unexplained) sum of squares. In other words, the total variation in
the dependent variable is the sum of the variation in the dependent variable due to the
variation in the independent variables included in the model and the variation that remained
unexplained by the explanatory variables in the model. Analysis of variance (ANOVA) is the
technique of decomposing the total sum of squares into its components. As we can see here,
the technique decomposes the total variation in the dependent variable into the explained and
the unexplained variations. The degrees of freedom of the total variation are also the sum of
the degrees of freedom of the two components. By dividing the sum of squares by the
corresponding degrees of freedom, we obtain what is called the Mean Sum of Squares
(MSS).
The mean sums of squares due to regression and due to the residual (error) are calculated by dividing each sum of squares by its corresponding degrees of freedom (see the ANOVA table above).
The last column of the table shows the test statistic, which is computed as:

F_cal = MSR / MSE ~ F(k − 1, n − k)    [the F statistic follows the F distribution]

The test rule: Reject H0 if F_cal > F_α(k − 1, n − k), where F_α(k − 1, n − k) is the value to be read from the F distribution table at a given α level.
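A compact illustrative sketch of the whole procedure, assuming y and the fitted values are NumPy arrays and k counts all estimated parameters:

```python
# Sketch of the overall F test: decompose the sums of squares, form the mean squares
# and compare F_cal with the F(k-1, n-k) critical value. k counts all estimated parameters.
import numpy as np
from scipy import stats

def overall_f_test(y, y_fitted, k, alpha=0.05):
    n = len(y)
    ssr = ((y_fitted - y.mean())**2).sum()          # regression (explained) sum of squares
    sse = ((y - y_fitted)**2).sum()                 # residual (unexplained) sum of squares
    f_cal = (ssr / (k - 1)) / (sse / (n - k))       # MSR / MSE
    f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)
    return f_cal, f_crit, f_cal > f_crit            # True means reject H0
```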
Relationship between F and R2

You may recall that R² is given by R² = Σŷ²/Σy², so that Σŷ² = R²Σy².

We also know that R² = 1 − Σe²/Σy², hence Σe²/Σy² = 1 − R², which means Σe² = (1 − R²)Σy².

The formula for F is:

F_cal = [Σŷ²/(k − 1)] / [Σe²/(n − k)]

Substituting the expressions above,

F_cal = [R²Σy²/(k − 1)] / [(1 − R²)Σy²/(n − k)] = [(n − k)/(k − 1)] · [R²/(1 − R²)]

That means the calculated F can also be expressed in terms of the coefficient of determination.
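As a quick numerical check of this identity, plugging in the values used in the illustration later in this chapter (R² = 0.087894, n = 12, k = 3) reproduces the F value reported there:

```python
# Numeric check of the F-R² identity with the values used in the illustration below.
r2, n, k = 0.087894, 12, 3
f_from_r2 = ((n - k) / (k - 1)) * (r2 / (1 - r2))
print(round(f_from_r2, 4))   # about 0.4336, the same as MSR/MSE in the ANOVA table
```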
Testing the Equality of two Regression Coefficients
Given the multiple regression equation:

Y_i = β_0 + β_1X_1i + β_2X_2i + β_3X_3i + ... + β_KX_Ki + U_i

we would like to test the hypothesis:

H_0: β_1 = β_2 (i.e. β_1 − β_2 = 0)   vs.   H_1: H_0 is not true

The null hypothesis says that the two slope coefficients are equal.
Example: If Y is the quantity demanded of a commodity, X1 is the price of the commodity and X2 is the income of the consumer, the hypothesis suggests that the price and income elasticities of demand are the same.
Under the classical assumptions,

t = (β̂_2 − β̂_1) / SE(β̂_2 − β̂_1) ~ t distribution with N − K degrees of freedom,

where K is the total number of parameters estimated and

SE(β̂_2 − β̂_1) = √[Var(β̂_2) + Var(β̂_1) − 2Cov(β̂_2, β̂_1)]

Thus the t-statistic is:

t = (β̂_2 − β̂_1) / √[Var(β̂_2) + Var(β̂_1) − 2Cov(β̂_2, β̂_1)]

Decision: Reject H0 if |t_cal| > t_tab.

Note: Using similar procedures one can also test linear equality restrictions, for example β_1 + β_2 = 1, and other restrictions.

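A minimal sketch of this test, assuming the two estimates and the relevant elements of their estimated covariance matrix are already available (a fitted statsmodels model can also produce the same result through its t_test method):

```python
# Sketch of testing H0: beta_1 = beta_2 from the estimates and their covariance matrix.
import numpy as np
from scipy import stats

def test_equal_slopes(b1, b2, var_b1, var_b2, cov_b1_b2, n, K, alpha=0.05):
    se_diff = np.sqrt(var_b2 + var_b1 - 2*cov_b1_b2)   # SE of (b2 - b1)
    t_cal = (b2 - b1) / se_diff
    t_crit = stats.t.ppf(1 - alpha/2, df=n - K)
    return t_cal, abs(t_cal) > t_crit                   # True means reject H0
```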


Illustration: The following table shows a particular country's value of imports (Y), the level of Gross National Product (X1) measured in arbitrary units, and the price index of imported goods (X2), over a 12-year period.
Table 1: Data for multiple regression examples
Year 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
Y 57 43 73 37 64 48 56 50 39 43 69 60
X1 220 215 250 241 305 258 354 321 370 375 385 385
X2 125 147 118 160 128 149 145 150 140 115 155 152

a) Estimate the coefficients of the economic relationship and fit the model.
To estimate the coefficients of the economic relationship, we compute the entries given in
Table 2



Table 2: Computations of the summary statistics for coefficients for data of Table 1
Year Y X1 X2 x1 x2 y x1² x2² x1y x2y x1x2 y²
1960 57 220 125 -86.5833 -15.3333 3.75 7496.668 235.1101 -324.687 -57.4999 1327.608 14.0625
1961 43 215 147 -91.5833 6.6667 -10.25 8387.501 44.44489 938.7288 -68.3337 -610.558 105.0625
1962 73 250 118 -56.5833 -22.3333 19.75 3201.67 498.7763 -1117.52 -441.083 1263.692 390.0625
1963 37 241 160 -65.5833 19.6667 -16.25 4301.169 386.7791 1065.729 -319.584 -1289.81 264.0625
1964 64 305 128 -1.5833 -12.3333 10.75 2.506839 152.1103 -17.0205 -132.583 19.52731 115.5625
1965 48 258 149 -48.5833 8.6667 -5.25 2360.337 75.11169 255.0623 -45.5002 -421.057 27.5625
1966 56 354 145 47.4167 4.6667 2.75 2248.343 21.77809 130.3959 12.83343 221.2795 7.5625
1967 50 321 150 14.4167 9.6667 -3.25 207.8412 93.44509 -46.8543 -31.4168 139.3619 10.5625
1968 39 370 140 63.4167 -0.3333 -14.25 4021.678 0.111089 -903.688 4.749525 -21.1368 203.0625
1969 43 375 115 68.4167 -25.3333 -10.25 4680.845 641.7761 -701.271 259.6663 -1733.22 105.0625
1970 69 385 155 78.4167 14.6667 15.75 6149.179 215.1121 1235.063 231.0005 1150.114 248.0625
1971 60 385 152 78.4167 11.6667 6.75 6149.179 136.1119 529.3127 78.75022 914.8641 45.5625
Sum 639 3679 1684 0.0004 0.0004 0 49206.92 2500.667 1043.25 -509 960.6667 1536.25
Mean 53.25 306.5833 140.3333 0 0 0



From Table 2, we can take the following summary results.

ΣY = 639    ΣX_1 = 3679    ΣX_2 = 1684    n = 12

Ȳ = ΣY/n = 639/12 = 53.25
X̄_1 = ΣX_1/n = 3679/12 = 306.5833
X̄_2 = ΣX_2/n = 1684/12 = 140.3333

The summary results in deviation form are then given by:

Σx_1² = 49206.92    Σx_2² = 2500.667
Σx_1y = 1043.25    Σx_2y = −509
Σx_1x_2 = 960.6667    Σy² = 1536.25

The coefficients are then obtained as follows.

β̂_1 = [Σx_1y·Σx_2² − Σx_2y·Σx_1x_2] / [Σx_1²·Σx_2² − (Σx_1x_2)²]
    = [(1043.25)(2500.667) − (−509)(960.6667)] / [(49206.92)(2500.667) − (960.6667)²]
    = (2608821 + 488979) / (123050121 − 922881)
    = 3097800 / 122127241
    ≈ 0.025365

β̂_2 = [Σx_2y·Σx_1² − Σx_1y·Σx_1x_2] / [Σx_1²·Σx_2² − (Σx_1x_2)²]
    = [(−509)(49206.92) − (1043.25)(960.6667)] / 122127241
    = (−25046322 − 1002216) / 122127241
    = −26048538 / 122127241
    ≈ −0.21329

β̂_0 = Ȳ − β̂_1X̄_1 − β̂_2X̄_2 = 53.25 − (0.025365)(306.5833) − (−0.21329)(140.3333) = 75.40512


The fitted model is then written as: Yˆi = 75.40512 + 0.025365X1 - 0.21329X2
b) Compute the variance and standard errors of the slopes.
First, you need to compute the estimate of the variance of the random term as follows


σ̂_u² = Σe_i² / (n − 3) = 1401.223 / (12 − 3) = 1401.223 / 9 = 155.69143

Variance of β̂_1:
Var(β̂_1) = σ̂_u² · Σx_2² / [Σx_1²Σx_2² − (Σx_1x_2)²] = 155.69143 × (2500.667 / 122127241) = 0.003188

Standard error of β̂_1:
SE(β̂_1) = √Var(β̂_1) = √0.003188 = 0.056462

Variance of β̂_2:
Var(β̂_2) = σ̂_u² · Σx_1² / [Σx_1²Σx_2² − (Σx_1x_2)²] = 155.69143 × (49206.92 / 122127241) = 0.0627

Standard error of β̂_2:
SE(β̂_2) = √Var(β̂_2) = √0.0627 = 0.25046
Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for you
as an exercise.
c) Calculate and interpret the coefficient of determination.
We can use the following summary results to obtain the R²:

Σŷ² = 135.0262
Σe² = 1401.223
Σy² = 1536.25 (the sum of the above two). Then,

R² = (β̂_1Σx_1y + β̂_2Σx_2y) / Σy² = [(0.025365)(1043.25) + (−0.21329)(−509)] / 1536.25 = 0.087894

or, equivalently, R² = 1 − Σe²/Σy² = 1 − 1401.223/1536.25 = 0.087894

d) Compute the adjusted R2.


R²_adj = 1 − (1 − R²)·(n − 1)/(n − k) = 1 − (1 − 0.087894)·(12 − 1)/(12 − 3) = 1 − (0.912106)(11/9) = −0.114796

Note that the adjusted R² is negative here, reflecting the very poor fit of the model.
e) Construct 95% confidence interval for the true population parameters (partial regression
coefficients).[Exercise: Base your work on Simple Linear Regression]
f) Test the significance of X1 and X2 in determining the changes in Y using t-test.
The hypotheses are summarized in the following table.



Coefficient | Hypothesis | Estimate | Std. error | Calculated t | Conclusion
β_1 | H0: β_1 = 0; H1: β_1 ≠ 0 | 0.025365 | 0.056462 | t_cal = 0.025365/0.056462 = 0.449249 | Do not reject H0 since |t_cal| < t_tab
β_2 | H0: β_2 = 0; H1: β_2 ≠ 0 | −0.21329 | 0.25046 | t_cal = −0.21329/0.25046 = −0.85159 | Do not reject H0 since |t_cal| < t_tab

The critical value (t_{0.025, 9}, for a two-tailed test at α = 0.05) is 2.262. Like the standard error test, the t-test reveals that both X1 and X2 are insignificant in determining the change in Y, since the calculated t values are both less than the critical value in absolute terms.

Exercise: Test the significance of X1 and X2 in determining the changes in Y using the
standard error test.
g) Test the overall significance of the model. (Hint: use α = 0.05)
This involves testing whether at least one of the two variables X1 and X2 determines the changes in Y. The hypothesis to be tested is given by:

H_0: β_1 = β_2 = 0
H_1: β_i ≠ 0, for at least one i.

The ANOVA table for the test is given as follows:

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Fcal
Regression | SSR = Σŷ² = 135.0262 | k − 1 = 3 − 1 = 2 | MSR = 135.0262/2 = 67.51309 | F = MSR/MSE = 0.433634
Residual | SSE = Σe² = 1401.223 | n − k = 12 − 3 = 9 | MSE = 1401.223/9 = 155.6914 |
Total | SST = Σy² = 1536.25 | n − 1 = 12 − 1 = 11 | |

The tabulated F value (critical value) is F_0.05(2, 9) ≈ 4.26.

In this case, the calculated F value (0.4336) is less than the tabulated value. Hence, we do not reject the null hypothesis and conclude that there is no significant contribution of the variables X1 and X2 to the changes in Y.
h) Compute the F value using the R2.



F_cal = [(n − k)/(k − 1)] · [R²/(1 − R²)] = [(12 − 3)/(3 − 1)] · [0.087894/(1 − 0.087894)] = 0.433632
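For readers who want to verify the illustration with software, the following sketch re-estimates the model from the Table 1 data using statsmodels; the printed values should match the hand computations above up to rounding.

```python
# Re-estimation of the illustration with statsmodels, using the Table 1 data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Y":  [57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60],
    "X1": [220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385],
    "X2": [125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152],
})
res = smf.ols("Y ~ X1 + X2", data=df).fit()
print(res.params)                        # about 75.41, 0.0254, -0.2133
print(res.bse)                           # about 37.98, 0.0565, 0.2505
print(res.rsquared, res.rsquared_adj)    # about 0.088 and about -0.115
print(res.fvalue)                        # about 0.434
print(res.conf_int())                    # 95% confidence intervals for part (e)
```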

Dummy Variable Regression Models


There are four basic types of variables we generally encounter in empirical analysis. These
are: nominal, ordinal, interval and ratio scale variables. In preceding sections, we have
encountered ratio scale variables. However, regression models do not deal only with ratio
scale variables; they can also involve nominal and ordinal scale variables. In regression
analysis, the dependent variable can be influenced by nominal variables such as sex, race,
color, geographical region, etc. Models in which all regressors are nominal (categorical) variables are called ANOVA (Analysis of Variance) models. If there is a mixture of nominal and ratio scale variables, the models are called ANCOVA (Analysis of Covariance) models. Look at
the following example.
Illustration: The following model represents the relationship between geographical location
and teachers’ average salary in public schools. The data were taken from 50 states for a
single year. The 50 states were classified into three regions: Northeast, South and West. The
regression model looks like the following:
Y_i = β_0 + β_1D_1i + β_2D_2i + u_i
Where Yi = the (average) salary of public school teachers in state i
D1i = 1 if the state is in the Northeast
= 0 otherwise (i.e. in other regions of the country)
D2i = 1 if the state is in the South
= 0 otherwise (i.e. in other regions of the country)
Note that the above regression model is like any multiple regression model considered
previously, except that instead of quantitative regressors, we have only qualitative (dummy)
regressors. Dummy regressors take value of 1 if the observation belongs to that particular
category and 0 otherwise.
Note also that there are 3 states (categories) for which we have created only two dummy
variables (D1 and D2). One of the rules in dummy variable regression is that if there are m
categories, we need only m-1 dummy variables. If we are suppressing the intercept, we can
have m dummies but the interpretation will be a bit different.
The intercept value represents the mean value of the dependent variable for the benchmark category, that is, the category for which we do not assign a dummy (in our case, West is the benchmark category). The coefficients of the dummy variables are called differential intercept coefficients because they tell us by how much the mean value for the category that receives the value of 1 differs from that of the benchmark category.
Ŷ_i = 26,158.62 − 1734.47D_1i − 3264.62D_2i
SE      = (1128.52)   (1435.95)   (1499.62)
t       = (23.18)     (−1.21)     (−2.18)
p-value = (0.000)     (0.233)     (0.0349)        R² = 0.0901

From the above fitted model, we can see that the mean salary of public school teachers in the West is about $26,158.62. The mean salary of teachers in the Northeast is lower by $1,734.47 than that of the West, and that of teachers in the South is lower by $3,264.62. The average salaries in the latter two regions are therefore about $24,424 and $22,894, respectively.
In order to know the statistical significance of the mean salary differences, we can run the
tests we have discussed in previous sections. The other results can also be interpreted the way
we discussed previously.
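A short sketch of how such a dummy-variable regression can be set up in practice; the data frame below is a small hypothetical fragment for illustration, not the 50-state data set used above.

```python
# Sketch of an ANOVA-type model with two regional dummies (West as the benchmark).
# The data frame is a small hypothetical fragment, not the 50-state data set.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "salary": [26000, 24500, 23000, 27000, 22500, 25500],
    "region": ["West", "Northeast", "South", "West", "South", "Northeast"],
})
df["D1"] = (df["region"] == "Northeast").astype(int)   # 1 if Northeast, 0 otherwise
df["D2"] = (df["region"] == "South").astype(int)       # 1 if South, 0 otherwise
res = smf.ols("salary ~ D1 + D2", data=df).fit()
print(res.params)   # intercept = benchmark (West) mean; D1, D2 = differential intercepts
```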

Extensions of Regression Models


As pointed out earlier, non-linearity may be expected in many economic relationships. In other words, the relationship between Y and X can be non-linear rather than linear. Thus, once the independent variables have been identified, the next step is to choose the functional form
of the relationship between the dependent and the independent variables. Specification of the
functional form is important, because a correct explanatory variable may well appear to be
insignificant or to have an unexpected sign if an inappropriate functional form is used. Thus
the choice of a functional form for an equation is a vital part of the specification of that
equation. The choice of a functional form almost always should be based on an examination
of the underlying economic theory. The logical form of the relationship between the
dependent variable and the independent variable in question should be compared with the
properties of various functional forms, and the one that comes closest to that underlying
theory should be chosen for the equation.
Some Commonly Used Functional Forms
a) The Linear Form: It is based on the assumption that the slope of the relationship between
the independent variable and the dependent variable is constant.
ΔY/ΔX_i = β_i,    i = 1, 2, ..., K

[Figure: a straight line relating Y to X, with constant slope β.]

In this case the elasticity is not constant:

η_{Y,X} = (ΔY/Y)/(ΔX/X) = (ΔY/ΔX)·(X/Y) = β_i·(X/Y)
If the hypothesized relationship between Y and X is such that the slope of the relationship can
be expected to be constant and the elasticity can therefore be expected to be variable, then the
linear functional form should be used.
Note: Economic theory frequently predicts only the sign of a relationship and not its
functional form. Under such circumstances, the linear form can be used until strong evidence
that it is inappropriate is found. Thus, unless theory, common sense, or experience justifies
using some other functional form, one should use the linear model.
b) Log-linear, double Log or constant elasticity model
The most common functional form that is non-linear in the variables (but still linear in the coefficients) is the log-linear form. A log-linear form is often used because the elasticities, and not the slopes, are constant, i.e. η = constant.

[Figure: a constant-elasticity (log-linear) curve relating output to input.]

Thus, given the assumption of a constant elasticity, the proper form is the exponential (log-linear) form.

Given: Y_i = β_0 X_i^{β_1} e^{U_i}

The log-linear functional form for the above equation can be obtained by a logarithmic transformation of the equation:

ln Y_i = ln β_0 + β_1 ln X_i + U_i

The model can be estimated by OLS if the basic assumptions are fulfilled.

[Figure: a downward-sloping demand curve that becomes a straight line when both price and quantity are measured in logs.]

The model is also called a constant elasticity model because the coefficient of elasticity between Y and X (β_1) remains constant:

η = (ΔY/ΔX)·(X/Y) = d ln Y / d ln X = β_1
This functional form is used in the estimation of demand and production functions.
Note: We should make sure that there are no negative or zero observations in the data set
before we decide to use the log-linear model. Thus log-linear models should be run only if all
the variables take on positive values.
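An illustrative sketch of estimating a double-log form by OLS; the data are simulated purely for demonstration, with a true elasticity of 0.8.

```python
# Sketch of estimating a double-log (constant elasticity) form by OLS on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)                       # hypothetical positive regressor
y = 2.0 * x**0.8 * np.exp(rng.normal(0, 0.1, 50))     # hypothetical data, elasticity 0.8
res = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(res.params)                                     # slope estimate is the elasticity
```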
c) Semi-log Form
The semi-log functional form is a variant of the log-linear equation in which some but not all
of the variables (dependent and independent) are expressed in terms of their logs. Such
models are expressed as:
(i) Y_i = β_0 + β_1 ln X_1i + U_i   (lin-log model)   and   (ii) ln Y_i = β_0 + β_1 X_1i + U_i   (log-lin model)
and are called semi-log models. The semi-log functional form, in the case of taking the log of one of the independent variables, can be used to depict a situation in which the impact of X on Y is expected to 'tail off' as X gets bigger, as long as β_1 is greater than zero.



[Figure: the lin-log curve Y = β_0 + β_1 ln X_i, rising at a decreasing rate when β_1 > 0 and falling when β_1 < 0.]

Example: The Engel’ s curve tends to flatten out, because as incomes get higher, a smaller
percentage of income goes to consumption and a greater percentage goes to saving.
 Consumption thus increases at a decreasing rate.
 Growth models are examples of semi-log forms
d) Polynomial Form
Polynomial functional forms express Y as a function of independent variables some of which
are raised to powers others than one. For example in a second degree polynomial (quadratic)
equation, at least one independent variable is squared.
Y = β_0 + β_1X_1i + β_2X_1i² + β_3X_2i + U_i

Such models produce slopes that change as the independent variables change. Thus the slopes of Y with respect to the X's are:

∂Y/∂X_1 = β_1 + 2β_2X_1,   and   ∂Y/∂X_2 = β_3
In most cost functions, the slope of the cost curve changes as output changes.



[Figure: A) a typical cost curve; B) the impact of age on earnings – both quadratic shapes.]

A simple transformation of the polynomial enables us to use the OLS method to estimate the parameters of the model. Setting X_3 = X_1², we obtain

Y = β_0 + β_1X_1i + β_2X_3i + β_3X_2i + U_i
e) Reciprocal Transformation (Inverse Functional Forms)
The inverse functional form expresses Y as a function of the reciprocal (or inverse) of one or more of the independent variables (in this case X1):

Y_i = β_0 + β_1(1/X_1i) + β_2X_2i + U_i

The reciprocal form should be used when the impact of a particular independent variable is
expected to approach zero as that independent variable increases and eventually approaches
infinity. Thus as X1 gets larger, its impact on Y decreases.

1 0  0
Y  0 
X 1i 1  0

0

1 0  0
Y  0 
X 1i 1  0



An asymptote, or limiting value, is the value the dependent variable approaches as the X variable increases indefinitely; β_0 provides that value in the case above. The function approaches the asymptote from the top or from the bottom depending on the sign of β_1.
Example: The Phillips curve, a non-linear relationship between the rate of unemployment (U_t) and the percentage change in wages (W_t):

W_t = β_0 + β_1(1/U_t)
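All of these forms remain linear in the parameters, so each can be estimated by OLS after transforming the variables. The sketch below illustrates this with simulated placeholder data.

```python
# Sketch: semi-log, polynomial and reciprocal forms are all estimated by OLS after
# transforming the regressors. X1, X2 and y are simulated placeholder arrays.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X1 = rng.uniform(1, 20, 60)
X2 = rng.uniform(1, 20, 60)
y = 5 + 2*np.log(X1) + 0.5*X2 + rng.normal(0, 1, 60)

lin_log    = sm.OLS(y, sm.add_constant(np.column_stack([np.log(X1), X2]))).fit()  # lin-log form
quadratic  = sm.OLS(y, sm.add_constant(np.column_stack([X1, X1**2, X2]))).fit()   # X3 = X1^2
reciprocal = sm.OLS(y, sm.add_constant(np.column_stack([1/X1, X2]))).fit()        # inverse form
print(lin_log.params, quadratic.params, reciprocal.params, sep="\n")
```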
Summary
The multiple regression model is an extension of the two-variable regression model, with new concepts involved and wider practical applicability. Such models can be used for the purpose of mean and individual prediction. The simplest possible multiple linear regression model is the three-variable regression model. R² and the adjusted R² are overall measures of how well the chosen model fits the data. Hypothesis testing in multiple linear regression models includes testing the individual statistical significance of the partial regression coefficients, testing the overall significance of the model, and others. Dummy variables classify a set of observations into subgroups based on qualities or attributes, and it is common to see regression models with dummy variables as regressors. Their coefficients should be interpreted carefully.

Statistical Software Applications and Practices



CHAPTER THREE

ECONOMETRIC PROBLEMS
Pre-test Questions
1. What are the major CLRM assumptions?
2. What happens to the properties of the OLS estimators if one or more of these
assumptions are violated, i.e. not fulfilled?
3. How can we check whether an assumption is violated or not?
Assumptions Revisited
In many practical cases, two major problems arise in applying the classical linear regression
model.
1) those due to assumptions about the specification of the model and about the
disturbances and
2) those due to assumptions about the data
The following assumptions fall in either of the categories.
 The regression model is linear in parameters.
 The values of the explanatory variables are fixed in repeated sampling (non-
stochastic).
 The mean of the disturbance (ui) is zero for any given value of X i.e. E(ui) = 0
 The variance of ui is constant i.e. homoscedastic
 There is no autocorrelation in the disturbance terms
 The explanatory variables are distributed independently of the ui.
 The number of observations must be greater than the number of explanatory
variables.
 There must be sufficient variability in the values taken by the explanatory
variables.
 There is no linear relationship (multicollinearity) among the explanatory variables.
 The stochastic (disturbance) terms ui are normally distributed, i.e. ui ~ N(0, σ²)
 The regression model is correctly specified i.e., no specification error.

With these assumptions we can show that the OLS estimators are BLUE and normally distributed; hence it is possible to test hypotheses about the parameters. However, if any of these assumptions is relaxed, OLS might not work well. We shall now examine in detail the violation of some of the assumptions.
Violations of Assumptions
The Zero Mean Assumption i.e. E(ui)=0



If this assumption is violated, we obtain a biased estimate of the intercept term. But, since the
intercept term is not very important we can leave it. The slope coefficients remain unaffected
even if the assumption is violated. Moreover, the intercept term often has no direct physical interpretation.

The Normality Assumption


This assumption is not very essential if the objective is estimation only. The OLS estimators
are BLUE regardless of whether the ui are normally distributed or not. In addition, because of
the central limit theorem, we can argue that the test procedures – the t-tests and F-tests - are
still valid asymptotically, i.e. in large sample.

Heteroscedasticity: The Error Variance is not Constant


The error terms in the regression equation are assumed to have a common variance, i.e. to be homoscedastic. If they do not have a common variance, we say they are heteroscedastic. The basic questions to be addressed are:
 What is the nature of the problem?
 What are the consequences of the problem?
 How do we detect (diagnose) the problem?
 What remedies are available for the problem?

The Nature of the Problem


In the case of homoscedastic disturbance terms, the spread around the mean is constant, i.e. Var(u_i) = σ². In the case of heteroscedastic disturbance terms, the variance changes with the explanatory variable. The problem of heteroscedasticity is likely to be more common in cross-sectional than in time-series data.
Causes of Heteroscedasticity
There are several reasons why the variance of the error term may be variable, some of which
are as follows.
 Following the error-learning models, as people learn, their errors of behaviour become smaller over time, so the standard error of the regression model decreases.
 As income grows people have discretionary income and hence more scope for
choice about the disposition of their income. Hence, the variance (standard error)
of the regression is more likely to increase with income.
 Improvement in data collection techniques will reduce errors (variance).
 Existence of outliers might also cause heteroscedasticity.
 Misspecification of a model can also be a cause for heteroscedasticity.
 Skewness in the distribution of one or more explanatory variables included in the
model is another source of heteroscedasticity.
 Incorrect data transformation and an incorrect functional form are other sources.



Note: Heteroscedasticity is likely to be more common in cross-sectional data than in time
series data. In cross-sectional data, we usually deal with units (such as consumers, producers, etc.) sampled from a population at a given point in time, and these units might be of very different sizes. In time series data, however, the variables tend to be of similar orders of magnitude since the data are collected for the same entity over a period of time.

Consequences of Heteroscedasticity
If the error terms of an equation are heteroscedastic, there are three major consequences.
a) The ordinary least squares estimators are still linear, since heteroscedasticity does not cause bias in the coefficient estimates; the least squares estimators remain unbiased.
b) Heteroscedasticity increases the variance of the partial regression coefficients, so the minimum variance property is lost and the OLS estimators are inefficient.
c) As a result, the test statistics – the t-test and the F-test – cannot be relied on in the face of uncorrected heteroscedasticity.
Detection of Heteroscedasticity
There are no hard and fast rules (universally agreed upon methods) for detecting the presence
of heteroscedasticity. But some rules of thumb can be suggested. Most of these methods are
based on the examination of the OLS residuals, ei, since these are the ones we observe and not the disturbance terms ui. There are informal and formal methods of detecting
heteroscedasticity.
a) Nature of the problem
In cross-sectional studies involving heterogeneous units, heteroscedasticity is the rule rather
than the exception.
Example: In a study of input expenditure in relation to sales, the rate of interest, etc. across small, medium and large sized agribusiness firms, heteroscedasticity is expected.

b) Graphical method
If there is no a priori or empirical information about the nature of heteroscedasticity, one
could do an examination of the estimated residual squared, ei² to see if they exhibit any
systematic pattern. The squared residuals can be plotted either against Y or against one of the
explanatory variables. If there appears any systematic pattern, heteroscedasticity might exist.
These two methods are informal methods.
c) Park Test
Park suggested a statistical test for heteroscedasticity based on the assumption that the variance of the disturbance term (σ_i²) is some function of the explanatory variable X_i.

Park suggested the functional form σ_i² = σ²X_i^β e^{v_i}, where v_i is a stochastic disturbance term; this can be transformed into a linear function by taking logarithms:

ln σ_i² = ln σ² + β ln X_i + v_i

Since σ_i² is not known, we use the squared OLS residuals and run

ln e_i² = ln σ² + β ln X_i + v_i

The Park test is therefore a two-stage procedure: first run the OLS regression disregarding the heteroscedasticity question and obtain the residuals e_i, and then run the regression above. If β turns out to be statistically significant, this suggests that heteroscedasticity is present in the data.
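A minimal sketch of the two stages of the Park test, assuming y and x are NumPy arrays with strictly positive x values:

```python
# Sketch of the two stages of the Park test; y and x are NumPy arrays with x > 0.
import numpy as np
import statsmodels.api as sm

def park_test(y, x):
    e = sm.OLS(y, sm.add_constant(x)).fit().resid                   # stage 1: OLS residuals
    aux = sm.OLS(np.log(e**2), sm.add_constant(np.log(x))).fit()    # stage 2: ln e^2 on ln X
    return aux.params[1], aux.pvalues[1]                            # slope and its p-value
```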
d) Spearman’ s Rank Correlation test
  di 2 
Recall that: rS  1  6   d = difference between ranks

 N ( N 2
 1 ) 

Step 1: Regress Y on X and obtain the residuals, ei.
Step 2: Ignoring the significance of ei or taking |ei| rank both ei and Xi and compute the rank
correlation coefficient.
Step 3: Test for the significance of the correlation statistic by the t-test
rS N 2
t ~ t (n  2)
1  rs
2

A high rank correlation suggests the presence of heteroscedasticity. If more than one
explanatory variable, compute the rank correlation coefficient between ei and each
explanatory variable separately.
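A sketch of these steps using SciPy's rank-correlation routine (y and x are placeholder NumPy arrays):

```python
# Sketch of the rank-correlation check; y and x are placeholder NumPy arrays.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def spearman_het_test(y, x):
    e = sm.OLS(y, sm.add_constant(x)).fit().resid
    r_s, _ = stats.spearmanr(np.abs(e), x)                 # rank correlation of |e| and X
    n = len(y)
    t_cal = r_s * np.sqrt(n - 2) / np.sqrt(1 - r_s**2)     # compare with t(n - 2)
    return r_s, t_cal
```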

e) Goldfeld and Quandt Test


This is the most popular test and is usually suitable for large samples. If it is assumed that the variance (σ_i²) is positively related to one of the explanatory variables in the regression model, and if the number of observations is at least twice as many as the parameters to be estimated, the test can be used.
Given the model

Y_i = β_0 + β_1X_i + U_i

suppose σ_i² is positively related to X_i as

σ_i² = σ²X_i²

Goldfeld and Quandt suggest the following steps:
1. Rank the observations according to the values of X_i in ascending order.
2. Omit the central c observations (usually about the middle third of the recorded observations, or where c is specified a priori) and divide the remaining (n − c) observations into two groups, each with (n − c)/2 observations.
3. Fit separate regressions for the two sub-samples and obtain the respective residual sums of squares, RSS1 and RSS2, each with (n − c)/2 − k degrees of freedom.
4. Compute the ratio:

F = (RSS2/df) / (RSS1/df) ~ F(v1, v2),   with v1 = v2 = (n − c − 2k)/2

If the two variances tend to be the same, then F approaches unity. If the variances differ, F will take values different from one. The higher the F-ratio, the stronger the evidence of heteroscedasticity.
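A sketch of these steps in Python (y and x are NumPy arrays and c is chosen by the analyst); statsmodels also offers a ready-made het_goldfeldquandt function.

```python
# Sketch of the Goldfeld-Quandt steps; y and x are NumPy arrays, c is chosen by the analyst.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def goldfeld_quandt(y, x, c):
    order = np.argsort(x)                                   # step 1: rank by X
    y, x = y[order], x[order]
    m = (len(y) - c) // 2                                    # size of each sub-sample
    rss1 = (sm.OLS(y[:m], sm.add_constant(x[:m])).fit().resid**2).sum()   # low-X group
    rss2 = (sm.OLS(y[-m:], sm.add_constant(x[-m:])).fit().resid**2).sum() # high-X group
    df = m - 2                                               # (n - c)/2 - k with k = 2 here
    f_ratio = (rss2/df) / (rss1/df)
    return f_ratio, stats.f.sf(f_ratio, df, df)              # F and its p-value
```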
Note: There are also other methods of testing for the existence of heteroscedasticity in your data. These include the Glejser test, the Breusch-Pagan-Godfrey test, White's general test and the Koenker-Bassett test, the details of which you may refer to in standard econometrics texts.

Remedial Measures
OLS estimators are still unbiased even in the presence of heteroscedasticity. But they are not
efficient, not even asymptotically. This lack of efficiency makes the usual hypothesis testing
procedure a dubious exercise. Remedial measures are, therefore, necessary. Generally the
solution is based on some form of transformation.
a) The Weighted Least Squares (WLS)
Given a regression model of the form

Y_i = β_0 + β_1X_i + U_i

the weighted least squares method requires running an OLS regression on suitably transformed data. The transformation is based on the assumed form of the heteroscedasticity.

Assumption One: Given the model Y_i = β_0 + β_1X_1i + U_i, suppose Var(U_i) = σ_i² = E(U_i²) = σ²X_1i², where σ² is the constant variance of a classical error term. So, if speculation or other tests indicate that the variance is proportional to the square of the explanatory variable X, we may transform the original model as follows:

Y_i/X_1i = β_0(1/X_1i) + β_1(X_1i/X_1i) + U_i/X_1i
         = β_0(1/X_1i) + β_1 + V_i

Now E(V_i²) = E(U_i/X_1i)² = (1/X_1i²)E(U_i²) = σ²

Hence the variance of the transformed disturbance is homoscedastic, and we regress Y_i/X_1i on 1/X_1i.
Assumption Two: Again given the model

Y_i = β_0 + β_1X_1i + U_i

suppose now Var(U_i) = E(U_i²) = σ_Ui² = σ²X_1i

That is, the variance of U_i is believed to be proportional to X_1i, instead of being proportional to the squared X_1i. The original model can then be transformed as follows:

Y_i/√X_1i = β_0/√X_1i + β_1(X_1i/√X_1i) + U_i/√X_1i
          = β_0(1/√X_1i) + β_1√X_1i + V_i

where V_i = U_i/√X_1i and X_1i > 0.

Thus, E(V_i²) = E(U_i/√X_1i)² = (1/X_1i)E(U_i²) = (1/X_1i)σ²X_1i = σ²

Now, since the variance of V_i is constant (homoscedastic), one can apply the OLS technique and regress Y_i/√X_1i on 1/√X_1i and √X_1i.
To go back to the original model, one can simply multiply the transformed model by √X_1i.
Assumption Three: Given the model, let us assume that

E(U_i²) = σ²[E(Y_i)]²

i.e. the variance is proportional to the square of the expected value of Y, where E(Y_i) = β_0 + β_1X_1i.

We can transform the original model as

Y_i/E(Y_i) = β_0/E(Y_i) + β_1X_1i/E(Y_i) + U_i/E(Y_i)
           = β_0(1/E(Y_i)) + β_1(X_1i/E(Y_i)) + V_i

Again it can be verified that V_i = U_i/E(Y_i) gives us a constant variance σ²:

E(V_i²) = E[U_i/E(Y_i)]² = (1/[E(Y_i)]²)E(U_i²) = (1/[E(Y_i)]²)·σ²[E(Y_i)]² = σ²

The disturbance V_i is homoscedastic and the regression can be run.

Assumption Four: Instead of running the regression Y_i = β_0 + β_1X_1i + U_i, one could run

ln Y_i = β_0 + β_1 ln X_1i + U_i

A log transformation of this kind often reduces heteroscedasticity.
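A minimal sketch of weighted least squares under Assumption One, using statsmodels' WLS with weights equal to 1/X1²; the data are simulated placeholders.

```python
# Sketch of WLS under Assumption One: weights equal to 1/X1^2; data are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X1 = rng.uniform(1, 10, 80)
y = 3 + 2*X1 + X1 * rng.normal(0, 1, 80)             # error spread grows with X1
wls = sm.WLS(y, sm.add_constant(X1), weights=1.0/X1**2).fit()
print(wls.params)                                     # intercept and slope estimates
```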



b) Other Remedies for Heteroscedasticity
Two other approaches could be adopted to remove the effect of heteroscedasticity.
 Include a previously omitted variable(s) if heteroscedasticity is suspected due to
omission of variables.
 Redefine the variables in such a way that avoids heteroscedasticity. For example,
instead of total income, we can use Income per capita.

Autocorrelation: Error Terms are correlated


Another assumption of the regression model was the non-existence of serial correlation
(autocorrelation) between the disturbance terms, Ui.
Cov(U_i, U_j) = 0,   i ≠ j
Serial correlation implies that the error term from one time period depends in some systematic
way on error terms from other time periods. Autocorrelation is more a problem of time series
data than cross-sectional data. If by chance, such a correlation is observed in cross-sectional
units, it is called spatial autocorrelation. So, it is important to understand serial correlation
and its consequences for the OLS estimators.
Nature of Autocorrelation
The classical model assumes that the disturbance term relating to any observation is not influenced by the disturbance term relating to any other observation:

E(U_iU_j) = 0,   i ≠ j

But if there is any interdependence between the disturbance terms, then we have autocorrelation:

E(U_iU_j) ≠ 0,   i ≠ j
Causes of Autocorrelation
Serial correlation may occur because of a number of reasons.
 Inertia (built in momentum) – a salient feature of most economic variables time series
(such as GDP, GNP, price indices, production, employment etc) is inertia or
sluggishness. Such variables exhibit (business) cycles.
 Specification bias – exclusion of important variables or incorrect functional forms
 Lags – in a time series regression, value of a variable for a certain period depends on
the variable’ s previous period value.
 Manipulation of data – if the raw data is manipulated (extrapolated or interpolated),
autocorrelation might result.
Autocorrelation can be negative as well as positive. The most common kind of serial
correlation is first order serial correlation. This is the case in which the current period's error term is a function of the previous period's error term.



ε_t = ρε_{t−1} + U_t,    −1 < ρ < 1

This is also called the first order autoregressive scheme. The disturbance term U_t satisfies all the basic assumptions of the classical linear model:

E(U_t) = 0
E(U_tU_s) = 0,   t ≠ s
U_t ~ N(0, σ²)

Consequences of serial correlation


When the disturbance term exhibits serial correlation, the values as well as the standard errors
of the parameters are affected.
1) The estimates of the parameters remain unbiased even in the presence of
autocorrelation but the X’ s and the u’ s must be uncorrelated.
2) Serial correlation increases the variance of the OLS estimators. The minimum
variance property of the OLS parameter estimates is violated. That means the OLS are
no longer efficient.

3) Due to serial correlation the variance of the disturbance term, Ui may be


underestimated. This problem is particularly pronounced when there is positive
autocorrelation.
4) If the Uis are autocorrelated, then prediction based on the ordinary least squares
estimates will be inefficient. This is because of larger variance of the parameters.
Since the variances of the OLS estimators are not minimal as compared with other
estimators, the standard error of the forecast from the OLS will not have the least
value.
Detecting Autocorrelation
Some rough idea about the existence of autocorrelation may be gained by plotting the
residuals either against their own lagged values or against time.



[Figure: scatter plots of the residuals e_t against their lagged values e_{t−1}; points concentrated where e_t and e_{t−1} share the same sign indicate positive autocorrelation, while points where they have opposite signs indicate negative autocorrelation.]
Figure 1: Graphical detection of autocorrelation

There are more accurate tests for the incidence of autocorrelation. The most common test of
autocorrelation is the Durbin-Watson Test.
The Durbin-Watson d Test
The test for serial correlation that is most widely used is the Durbin-Watson d test. This test is appropriate only for the first order autoregressive scheme, ε_t = ρε_{t−1} + U_t. The test may be outlined as:

H_0: ρ = 0
H_1: ρ ≠ 0
This test is, however, applicable where the underlying assumptions are met:
 The regression model includes an intercept term
 The serial correlation is first order in nature
 The regression does not include the lagged dependent variable as an explanatory
variable
 There are no missing observations in the data
The equation for the Durbin-Watson d statistic is

d = Σ_{t=2}^{N}(e_t − e_{t−1})² / Σ_{t=1}^{N} e_t²

which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares. Note that the numerator has one fewer observation than the denominator, because an observation must be used to calculate e_{t−1}. A great advantage of the d statistic is that it is based on the estimated residuals; thus it is often reported together with R², t, etc.
The d-statistic equals zero if there is extreme positive serial correlation, two if there is no
serial correlation, and four if there is extreme negative correlation.



1. Extreme positive serial correlation: d ≈ 0
   e_t ≈ e_{t−1}, so (e_t − e_{t−1}) ≈ 0 and d ≈ 0.

2. Extreme negative serial correlation: d ≈ 4
   e_t ≈ −e_{t−1}, so (e_t − e_{t−1}) ≈ 2e_t, and thus d ≈ Σ(2e_t)²/Σe_t² ≈ 4.

3. No serial correlation: d ≈ 2

   d = Σ(e_t − e_{t−1})²/Σe_t² = [Σe_t² + Σe_{t−1}² − 2Σe_te_{t−1}] / Σe_t² ≈ 2

   since Σe_te_{t−1} ≈ 0 when the residuals are uncorrelated, and Σe_t² and Σe_{t−1}² differ in only one observation and so are approximately equal.
The exact sampling or probability distribution of the d statistic is not known and, therefore, unlike the t, χ² or F tests, there are no unique critical values which will lead to the acceptance or rejection of the null hypothesis.

But Durbin and Watson have successfully derived the upper and lower bound so that if the
computed value d lies outside these critical values, a decision can be made regarding the
presence of a positive or negative serial autocorrelation.
Thus

d = Σ(e_t − e_{t−1})²/Σe_t² = [Σe_t² + Σe_{t−1}² − 2Σe_te_{t−1}] / Σe_t² ≈ 2(1 − Σe_te_{t−1}/Σe_{t−1}²)

so that

d ≈ 2(1 − ρ̂),   since ρ̂ = Σe_te_{t−1}/Σe_{t−1}²

But since −1 ≤ ρ ≤ 1, the above identity implies 0 ≤ d ≤ 4. Therefore, the bounds of d must lie within these limits.



[Figure: the distribution of the d statistic over the range 0 to 4, showing the rejection region for positive autocorrelation (0 to dL), a zone of indecision (dL to dU), the acceptance region of no serial correlation (dU to 4 − dU), a second zone of indecision (4 − dU to 4 − dL), and the rejection region for negative autocorrelation (4 − dL to 4).]

Thus, if ρ̂ = 0, then d = 2: no serial autocorrelation.
If ρ̂ = 1, then d = 0: evidence of positive autocorrelation.
If ρ̂ = −1, then d = 4: evidence of negative autocorrelation.
Decision Rules for the Durbin-Watson d test

Null hypothesis | Decision | If
No positive autocorrelation | Reject | 0 < d < dL
No positive autocorrelation | No decision | dL ≤ d ≤ dU
No negative autocorrelation | Reject | 4 − dL < d < 4
No negative autocorrelation | No decision | 4 − dU ≤ d ≤ 4 − dL
No autocorrelation (positive or negative) | Do not reject | dU < d < 4 − dU
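A sketch of computing d from the OLS residuals, either directly from its definition or with statsmodels' durbin_watson helper (y and x are placeholder arrays):

```python
# Sketch of the d statistic from OLS residuals; y and x are placeholder arrays.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

def dw_statistic(y, x):
    e = sm.OLS(y, sm.add_constant(x)).fit().resid
    d_manual = np.sum(np.diff(e)**2) / np.sum(e**2)   # squared successive differences / RSS
    return d_manual, durbin_watson(e)                 # the two values coincide
```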

Note: Other tests for autocorrelation include the Runs test and the Breusch-Godfrey (BG) test.
There are so many tests of autocorrelation since there is no particular test that has been judged
to be unequivocally best or more powerful in the statistical sense.
Remedial Measures for Autocorrelation
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measures.
1) The solution depends on the source of the problem.



 If the source is omitted variables, the appropriate solution is to include these
variables in the set of explanatory variables.
 If the source is misspecification of the mathematical form the relevant approach
will be to change the form.
2) If these sources are ruled out then the appropriate procedure will be to transform the
original data so as to produce a model whose random variable satisfies the assumptions of
non-autocorrelation. But the transformation depends on the pattern of autoregressive
structure. Here we deal with the first order autoregressive scheme:

ε_t = ρε_{t−1} + U_t

For such a scheme the appropriate transformation is to subtract from the original observations of each period the product of ρ̂ and the value of the variables in the previous period:

Y_t* = b_0* + b_1X_1t* + ... + b_KX_Kt* + V_t

where:
Y_t* = Y_t − ρ̂Y_{t−1}
X_it* = X_it − ρ̂X_{i,t−1}
V_t = U_t − ρU_{t−1}
b_0* = b_0 − ρb_0 = b_0(1 − ρ)

Thus, if the structure of the autocorrelation is known, it is possible to make the above transformation. But often the structure of the autocorrelation is not known, so we need to estimate ρ in order to make the transformation.
When ρ is not known
There are different ways of estimating the autocorrelation coefficient ρ if it is unknown.
1) Estimation of ρ from the d statistic
Recall that d ≈ 2(1 − ρ̂), or ρ̂ ≈ 1 − d/2,
which suggests a simple way of obtaining an estimate of ρ from the estimated d statistic. Once an estimate of ρ is available, one can proceed with the OLS estimation after making the necessary transformation.
2) Durbin’ s two step method
Given the original function as
ˆˆ ˆˆ ˆ
Yt  P Yt 1   0 (1  P )  1 ( X t  ˆX t 1 )  U t *x
let U t  U t 1  Vt
Step 1: start from the transformed model
(Yt  PYt 1 )   0 (1  P)  1 ( X 1t  PX 1t 1 )  ...   K ( X Kt PX t 1 )  U t *
rearranging and setting

Oda Bultum University, Econometrics 33


 0 (1  P)  a 0
 1  a1
 1  a 2
etc.

The above equation may be written as


Yt  a0  PYt 1  a1 X 1t  ...  a K X Kt  Vt
Applying OLS to the equation, we obtain an estimate of, which is the coefficient of the
lagged variable Yt 1 .
Step 2: We use this estimate, P̂, to obtain the transformed variables.
(Yt − P̂ Yt−1) = Yt*
(X1t − P̂ X1,t−1) = X1t*
...
(XKt − P̂ XK,t−1) = XKt*
We then use this transformed model to estimate the parameters of the original relationship.
Yt* = β0* + β1 X1t* + ... + βK XKt* + Vt

The methods discussed above for dealing with serial autocorrelation are basically two-step methods: in step 1 we obtain an estimate of the unknown ρ, and in step 2 we use that estimate to transform the variables and estimate the generalized difference equation.
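A sketch of the two steps in Python with simulated data (a single regressor is used for brevity; all names and numbers are illustrative):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 120
x = rng.normal(size=n).cumsum()
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.8 * x + u

# Step 1: regress Yt on Yt-1, Xt and Xt-1; the coefficient on Yt-1 estimates P
df = pd.DataFrame({"y": y, "x": x})
df["y_lag"] = df["y"].shift(1)
df["x_lag"] = df["x"].shift(1)
df = df.dropna()
step1 = sm.OLS(df["y"], sm.add_constant(df[["y_lag", "x", "x_lag"]])).fit()
rho_hat = step1.params["y_lag"]

# Step 2: quasi-difference with rho_hat and re-estimate the original relation;
# the intercept of this regression estimates b0(1 - rho_hat)
y_star = df["y"] - rho_hat * df["y_lag"]
x_star = df["x"] - rho_hat * df["x_lag"]
step2 = sm.OLS(y_star.values, sm.add_constant(x_star.values)).fit()
print(rho_hat, step2.params)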
Note: The Cochrane-Orcutt iterative method is another widely used procedure of this kind.
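In statsmodels, the GLSAR class fitted with iterative_fit performs an iterative feasible-GLS estimation in this spirit; a minimal sketch with simulated data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 150
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 3.0 + 1.2 * x + u

model = sm.GLSAR(y, sm.add_constant(x), rho=1)   # AR(1) disturbances
res = model.iterative_fit(maxiter=10)            # alternates between rho and the betas
print(model.rho, res.params)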
Multicollinearity: Exact linear correlation between Regressors
One of the classical assumptions of the regression model is that the explanatory variables are uncorrelated. If the assumption that no independent variable is a perfect linear function of one or more other independent variables is violated, we have the problem of multicollinearity. If the explanatory variables are perfectly linearly correlated, the parameters become indeterminate: it is impossible to find numerical values for each parameter, and the method of estimation breaks down.
If the correlation coefficient is 0, the variables are called orthogonal; there is no problem of
multicollinearity. Neither of the above two extreme cases is often met. But some degree of
inter-correlation is expected among the explanatory variables, due to the interdependence of
economic variables.
Multicollinearity is not a condition that either exists or does not exist in economic functions, but rather a phenomenon inherent in most relationships due to the nature of economic magnitudes. There is, however, no conclusive evidence about the degree of multicollinearity at which the parameter estimates are seriously affected.
Reasons for Existence of Multicollinearity



There is a tendency of economic variables to move together over time. For example, Income,
consumption, savings, investment, prices, employment tend to rise in the period of economic
expansion and decrease in a period of recession. The use of lagged values of some
explanatory variables as separate independent factors in the relationship also causes
multicollinearity problems.
Example: Consumption = f(Yt, Yt-1, ...)
Thus, it can be concluded that multicollinearity is to be expected among economic variables. Although multicollinearity is also present in cross-sectional data, it is more of a problem in time-series data.
Consequences of Multicollinearity
Recall that, if the assumptions of the classical linear regression model are satisfied, the OLS
estimators of the regression estimators are BLUE. As stated above if there is perfect
multicollinearity between the explanatory variables, then it is not possible to determine the
regression coefficients and their standard errors. But if collinearity among the X-variables is
high, but not perfect, then the following might be expected.
Nevertheless, the effect of collinearity is controversial and by no means conclusive.
1) The estimates of the coefficients are statistically unbiased. Even if an equation has
significant multicollinearity, the estimates of the parameters will still be centered
around the true population parameters.
2) When multicollinearity is present in a function, the variances and therefore the
standard errors of the estimates will increase, although some econometricians argue
that this is not always the case.

without severe
multicollinearity

with severe
multicollinearity

̂

(3) The computed t-ratios will fall, i.e. insignificant t-ratios will be observed in the presence of multicollinearity. Since t = β̂ / SE(β̂), an increase in SE(β̂) causes t to fall. Thus one may increasingly accept the null hypothesis that the relevant true population value is zero:



H0 : βi = 0
Thus, because of the high variances of the estimates, the null hypothesis tends to be accepted.
(4) A high R² but few significant t-ratios are expected in the presence of multicollinearity. One or more of the partial slope coefficients may be individually statistically insignificant on the basis of the t-test, yet R² may still be very high. Indeed, this is one of the signals of multicollinearity: insignificant t-values but high overall R² and F-values. Because multicollinearity has little effect on the overall fit of the equation, it will also have little effect on the use of that equation for prediction or forecasting.
Detecting Multicollinearity
Having studied the nature of multicollinearity and the consequences of multicollinearity, the
next question is how to detect multicollinearity. The main purpose in doing so is to decide
how much multicollinearity exists in an equation, not whether any multicollinearity exists. So
the important question is the degree of multicollinearity. But there is no one unique test that is
universally accepted. Instead, we have some rules of thumb for assessing the severity and
importance of multicollinearity in an equation. Some of the most commonly used approaches
are the following:
1) High R² but few significant t-ratios
This is the classical test or symptom of multicollinearity. Often if R² is high (R² > 0.8) the F-
test in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t-tests will show that none or very few partial
slope coefficients are statistically different from zero. In other words, multicollinearity that is
severe enough to substantially lower t-scores does very little to decrease R² or the F-statistic.

So the combination of high R² with low calculated t-values for the individual regression
coefficients is an indicator of the possible presence of severe multicollinearity.
Drawback: a non-multicollinear explanatory variable may still have a significant coefficient even if there is multicollinearity between two or more other explanatory variables. Thus, equations with high levels of multicollinearity will often have one or two regression coefficients significantly different from zero, making the “high R², low t” rule a poor indicator in such cases.
2) High pair-wise (simple) correlation coefficients among the regressors (explanatory variables)
If the pair-wise correlation coefficients are high in absolute value, then it is highly probable that the X’s are highly correlated and that multicollinearity is a potential problem. The question is how high r should be to suggest multicollinearity. Some suggest that if r is in excess of 0.80, then multicollinearity could be suspected.
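A quick way to inspect the pair-wise correlations is to compute the correlation matrix of the regressors; the data frame below is purely illustrative:

import pandas as pd

df = pd.DataFrame({
    "income": [10, 12, 15, 18, 22, 25],
    "wealth": [50, 61, 74, 92, 110, 124],   # nearly proportional to income
    "price":  [1.0, 1.1, 0.9, 1.2, 1.0, 1.1],
})
print(df.corr())    # |r| > 0.80 between two X's is a common warning sign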



Another rule of thumb is that multicollinearity is a potential problem when the squared simple correlation coefficient is greater than the unadjusted R²; that is, two X’s are severely multicollinear if (rXiXj)² > R².
A major problem with this approach is that although high zero-order correlations may suggest collinearity, it is not necessary that they be high to have collinearity in any specific case.
3) Variance Inflation Factor (VIF) and Tolerance
VIF shows the speed with which the variances and covariances increase. It also shows how the variance of an estimator is influenced by the presence of multicollinearity. VIF is defined as follows:
VIF = 1 / (1 − r²23)
where r23 is the simple correlation coefficient between the two explanatory variables (more generally, VIF for Xj is 1 / (1 − Rj²), where Rj² comes from the auxiliary regression of Xj on the other X’s). As r²23 approaches 1, VIF approaches infinity; if there is no collinearity, VIF will be 1. As a rule of thumb, a VIF value of 10 or more indicates that multicollinearity is a severe problem. Tolerance is defined as the inverse of VIF.
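A sketch of computing VIF and tolerance with statsmodels' variance_inflation_factor (the variable names and figures below are made up):

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({
    "income": [10, 12, 15, 18, 22, 25, 27, 30],
    "wealth": [50, 61, 74, 92, 110, 124, 133, 149],   # nearly collinear with income
    "price":  [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 1.05],
})
X = sm.add_constant(df)
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", round(vif, 2), "tolerance =", round(1 / vif, 3))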
4) Other more formal tests for multicollinearity
The use of formal tests to give any indications of the severity of the multicollinearity in a
particular sample is controversial. Some econometricians reject even the simple indicators
developed above, mainly because of the limitations cited. Some people tend to use a number
of more formal tests. But none of these is accepted as the best.
Remedies for Multicollinearity
There is no automatic answer to the question “what can be done to minimize the problem of multicollinearity?” The possible solutions that might be adopted if multicollinearity exists in a function vary depending on the severity of the multicollinearity, on the availability of other data sources, on the importance of the multicollinear factors, and on the purpose for which the function is used. However, some alternative remedies can be suggested for reducing the effects of multicollinearity.
1) Do Nothing
Some writers have suggested that if multicollinearity does not seriously affect the estimates of the coefficients, one may tolerate its presence in the function. In a sense, multicollinearity is similar to a non-life-threatening human disease that requires an operation only if it is causing a significant problem. A remedy for multicollinearity should be considered only if and when the consequences cause insignificant t-scores or widely unreliable estimated coefficients.
2) Dropping one or more of the multicollinear variables
When faced with severe multicollinearity, one of the simplest remedies is to drop one or more of the collinear variables. Since multicollinearity is caused by correlation between the



explanatory variables, if the multicollinear variables are dropped the correlation no longer
exists.
Some people argue that dropping a variable from the model may introduce specification error or specification bias. In their view, since OLS estimators are still BLUE despite near collinearity, omitting a variable may seriously mislead us as to the true values of the parameters.
Example: If economic theory says that income and wealth should both be included in the
model explaining the consumption expenditure, dropping the wealth variable would constitute
specification bias.
3) Transformation of the variables
If the variables involved are all extremely important on theoretical grounds, neither doing
nothing nor dropping a variable could be helpful. But it is sometimes possible to transform the
variables in the equation to get rid of at least some of the multicollinearity.
Two common such transformations are:
(i) to form a linear combination of the multicollinear variables
(ii) to transform the equation into first differences (or logs)
The technique of forming a linear combination of two or more of the multicollinear variables consists of (a sketch follows the list below):

 creating a new variable that is a function of the multicollinear variables


 using the new variable to replace the old ones in the regression equation (if X1 and X2
are highly multicollinear, a new variable, X3 = X1 + X2 or K1X1 + K2X2 might be
substituted for both of the multicollinear variables in a re-estimation of the model)
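A minimal illustration of the first transformation, with hypothetical variables x1 and x2:

import pandas as pd

df = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [2.1, 3.9, 6.2, 7.8]})
df["x3"] = df["x1"] + df["x2"]   # or k1*x1 + k2*x2 with chosen weights
# re-estimate the model using x3 in place of x1 and x2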
The second kind of transformation to consider as possible remedy for severe multicollinearity
is to change the functional form of the equation.
A first difference is nothing more than the change in a variable from the previous time-period.
ΔXt = Xt − Xt−1
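In pandas, for example, the first difference of a series is obtained as follows (the values are illustrative):

import pandas as pd

s = pd.Series([100.0, 104.0, 109.0, 115.0, 118.0])
first_diff = s.diff().dropna()   # delta X_t = X_t - X_{t-1}; the first observation is lost
print(first_diff)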
If an equation (or some of the variables in an equation) is switched from its normal
specification to a first difference specification, it is quite likely that the degree of
multicollinearity will be significantly reduced for two reasons.
 Since multicollinearity is a sample phenomenon, any change in the definitions of the
variables in that sample will change the degree of multicollinearity.
 Multicollinearity takes place most frequently in time-series data, in which first
differences are far less likely to move steadily upward than are the aggregates from
which they are calculated.
4) Increase the sample size
Another solution to reduce the degree of multicollinearity is to attempt to increase the size of the sample. A larger data set (often requiring new data collection) will allow more accurate estimates than a small one, since a larger sample will normally reduce somewhat the variance of the estimated coefficients, thereby reducing the impact of multicollinearity. But, for most



economic and business applications, this solution is not feasible, because new data are generally impossible or quite expensive to find. One way to increase the sample is to pool cross-sectional and time-series data.
5) Other Remedies
There are several other methods suggested to reduce the degree of multicollinearity. Multivariate statistical techniques such as factor analysis and principal components analysis, or other techniques such as ridge regression, are often employed to deal with the problem of multicollinearity.
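As an illustration of the last option, a ridge regression can be fitted with scikit-learn; the penalty value and the simulated data below are arbitrary:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)     # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=50)

ridge = Ridge(alpha=1.0).fit(X, y)            # alpha controls the shrinkage penalty
print(ridge.intercept_, ridge.coef_)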
Learning Activities and Software practices:
Fit a multiple linear regression model, check for violations of the assumptions of regression models, identify the causes, and apply remedies to solve these problems.
Summary
The ordinary least squares method works when the assumptions of the classical linear regression model hold. One of the critical assumptions of the classical linear regression model is that the disturbances all have the same variance; its violation leads to heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but it does destroy their efficiency. There are several diagnostic tests available for detecting it, but one cannot tell for sure which will work in a given situation. Even when it is detected, it is not easy to correct; transforming the data might be a possible way out. Another assumption is that there is no multicollinearity (exact or approximately exact linear relationship) among the explanatory variables. If there is perfect collinearity, the regression coefficients are indeterminate. Although there are no sure methods of detecting collinearity, there are several indicators of it; the clearest sign is when R² is very high but none of the regression coefficients is statistically significant. Detection of multicollinearity is half the battle; the other half is concerned with how to get rid of it. Again there are no sure methods, only a few rules of thumb, such as the use of extraneous or a priori information, omitting a highly collinear variable, transforming the data and increasing the sample size. Serial autocorrelation is an econometric problem which arises when the disturbance terms are correlated with each other. It might be caused by sluggishness of economic time series, specification bias, data massaging or data transformation. When it exists, the OLS estimators may remain unbiased, consistent and normally distributed, but they are not efficient. There are formal and informal methods of detecting autocorrelation, of which the Durbin-Watson d test is the most popular. The remedy depends on the nature of the interdependence among the disturbance terms.

