Specification test
Vid Adrison
Outline
Redundant Variable
Omitted Variable
Functional Specification
Selection Criteria
Redundant Variable
Consequences
On the unbiasedness: the OLS estimators remain unbiased
On the variance: the variance of the estimators increases
Proof:
Review the concept of unbiased estimator
Create a simulated demand function
Steps in conducting simulation:
Simulation is useful because we know the true values of the parameters
Assume that Qx is a function of Px and Income only
Generate 200 observations of Px, Py, INC, and Error via random draws (in Excel the syntax is =rand())
Generate log(Qx) = 0.5 - 0.5*log(Px) + 0.5*log(INC) + Error
Run log(Qx) = f[log(Px), log(INC)]
The estimates move closer to the assigned values as the number of draws increases
Repeat the above procedure N times and average the parameter estimates (Monte Carlo simulation)
As a comparison, run log(Qx) = f[log(Px), log(Py), log(INC)] and see how the estimates change
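The simulation steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the draws are normal in logs rather than Excel's =rand(), and log(Py) is deliberately built to be correlated with log(Px) so that the variance increase is visible (with fully independent draws the increase is very small). The function names `ols` and `monte_carlo` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def monte_carlo(n_obs=200, n_reps=1000, seed=0):
    """Estimate the correct and the redundant-variable model n_reps times."""
    rng = np.random.default_rng(seed)
    correct = np.empty((n_reps, 3))
    redundant = np.empty((n_reps, 4))
    for r in range(n_reps):
        # Assumed design: normal draws in logs; log(Py) is correlated with
        # log(Px) but plays no role in the true demand function.
        log_px = rng.normal(1.0, 0.5, n_obs)
        log_py = 0.8 * log_px + rng.normal(0.0, 0.3, n_obs)
        log_inc = rng.normal(2.0, 0.5, n_obs)
        error = rng.normal(0.0, 0.3, n_obs)
        log_qx = 0.5 - 0.5 * log_px + 0.5 * log_inc + error

        const = np.ones(n_obs)
        correct[r] = ols(np.column_stack([const, log_px, log_inc]), log_qx)
        redundant[r] = ols(
            np.column_stack([const, log_px, log_inc, log_py]), log_qx
        )
    return correct, redundant

correct, redundant = monte_carlo()
# Column order: constant, log(Px), log(INC), [log(Py)]
print("mean, correct model:  ", correct.mean(axis=0).round(3))
print("mean, redundant model:", redundant.mean(axis=0).round(3))
print("std of log(Px) slope: ",
      correct[:, 1].std().round(4), "vs", redundant[:, 1].std().round(4))
```

In both models the average slope on log(Px) should sit near the assigned value of -0.5, while its spread across replications is larger when log(Py) is included.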
Redundant Variable
Test procedure in EVIEWS:
View | Coefficient Test | Omitted Variables | (write the variables) | OK
H0: Variables do not belong to the model
H1: Variables belong to the model
This procedure is the same as the omitted variable test; thus, the hypotheses remain the same
Basically, the omitted/redundant variable test is a likelihood ratio test: it compares the log likelihood of the restricted model with that of the unrestricted model
Correct Specification Regression
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/23/10   Time: 17:44
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
LOG(PX)      -0.525034      0.035679      -14.71562      0.0000
LOG(INC)      0.514221      0.045908       11.20119      0.0000
C             0.970042      0.095809       10.12477      0.0000

R-squared              0.828189     Mean dependent var       1.237024
Adjusted R-squared     0.822161     S.D. dependent var       0.723513
S.E. of regression     0.305112     Akaike info criterion    0.512434
Sum squared resid      5.306335     Schwarz criterion        0.617151
Log likelihood        -12.37302     F-statistic              137.3802
Durbin-Watson stat     2.276588     Prob(F-statistic)        0.000000
Redundant Variable case
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/23/10   Time: 17:44
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
LOG(PX)      -0.521201      0.035292      -14.76838      0.0000
LOG(INC)      0.528201      0.046149       11.44567      0.0000
LOG(PY)       0.070505      0.044328       1.590528      0.1173
C             0.890289      0.107022       8.318742      0.0000

R-squared              0.835615     Mean dependent var       1.237024
Adjusted R-squared     0.826809     S.D. dependent var       0.723513
S.E. of regression     0.301099     Akaike info criterion    0.501583
Sum squared resid      5.076984     Schwarz criterion        0.641206
Log likelihood        -11.04750     F-statistic              94.88810
Durbin-Watson stat     2.360587     Prob(F-statistic)        0.000000
Omitted Variable
Consequences
On the unbiasedness: the estimators are generally biased, which is more serious than the redundant variable case
A variable may be omitted because of ignorance or data unavailability
Examples:
Dropping INC from the previous regression
Excluding ability from a wage offer function
For a two-variable model, the sign of the bias depends on the sign of B2 (the coefficient of the excluded variable X2) and on the correlation between the excluded variable and the included variable

                      B2 > 0           B2 < 0
Corr(X1, X2) > 0      Positive bias    Negative bias
Corr(X1, X2) < 0      Negative bias    Positive bias

The direction of the bias can be more complicated with three or more regressors
See Wooldridge, Chapter 3, for the detailed derivation
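The sign rule for the two-variable case can be checked numerically. The sketch below uses made-up parameter values (B1 = 0.5, B2 = 0.8, and a positive link between X1 and X2), so it illustrates the "positive bias" cell only. It also checks the in-sample identity that the slope from the short regression equals the long-regression slope plus B2-hat times the slope of X2 on X1.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                   # Corr(X1, X2) > 0
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)   # B1 = 0.5, B2 = 0.8 > 0

const = np.ones(n)
b_long = ols(np.column_stack([const, x1, x2]), y)    # X2 included
b_short = ols(np.column_stack([const, x1]), y)       # X2 omitted
delta = ols(np.column_stack([const, x1]), x2)[1]     # slope of X2 on X1

print("slope on X1, X2 included:", round(b_long[1], 3))
print("slope on X1, X2 omitted: ", round(b_short[1], 3))
print("B1_hat + B2_hat * delta: ", round(b_long[1] + b_long[2] * delta, 3))
```

Flipping the sign of either B2 or the 0.6 link between X1 and X2 moves the example to a "negative bias" cell of the table.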
Omitted Variable case
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/23/10   Time: 17:45
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
LOG(PX)      -0.420876      0.061096      -6.888789      0.0000
C             1.800707      0.107595       16.73598      0.0000

R-squared              0.450005     Mean dependent var       1.237024
Adjusted R-squared     0.440522     S.D. dependent var       0.723513
S.E. of regression     0.541175     Akaike info criterion    1.642617
Sum squared resid      16.98648     Schwarz criterion        1.712429
Log likelihood        -47.27851     F-statistic              47.45541
Durbin-Watson stat     1.828653     Prob(F-statistic)        0.000000
Omitted Variable Test
Omitted Variables: LOG(INC)

F-statistic             125.4667    Probability    0.000000
Log likelihood ratio    69.81100    Probability    0.000000

Test Equation:
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/23/10   Time: 23:52
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             0.970042      0.095809       10.12477      0.0000
LOG(PX)      -0.525034      0.035679      -14.71562      0.0000
LOG(INC)      0.514221      0.045908       11.20119      0.0000

R-squared              0.828189     Mean dependent var       1.237024
Adjusted R-squared     0.822161     S.D. dependent var       0.723513
S.E. of regression     0.305112     Akaike info criterion    0.512434
Sum squared resid      5.306335     Schwarz criterion        0.617151
Log likelihood        -12.37302     F-statistic              137.3802
Durbin-Watson stat     2.276588     Prob(F-statistic)        0.000000
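As a worked check, the two test statistics can be recomputed by hand from the sums of squared residuals and log likelihoods reported in the restricted regression (without LOG(INC)) and in the test equation. The formulas are the standard F and likelihood ratio forms; the variable names are illustrative.

```python
n = 60                 # included observations
k_unrestricted = 3     # C, LOG(PX), LOG(INC)
q = 1                  # number of variables added back: LOG(INC)

ssr_restricted = 16.98648      # regression without LOG(INC)
ssr_unrestricted = 5.306335    # test equation
loglik_restricted = -47.27851
loglik_unrestricted = -12.37302

# F compares the drop in the sum of squared residuals per restriction
# with the unrestricted residual variance.
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (
    ssr_unrestricted / (n - k_unrestricted)
)
# LR is twice the gain in log likelihood.
lr_stat = 2.0 * (loglik_unrestricted - loglik_restricted)

print("F-statistic:         ", round(f_stat, 4))
print("Log likelihood ratio:", round(lr_stat, 4))
```

Both values agree with the reported 125.4667 and 69.81100 up to rounding.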
Regression through Origin
Recall the interpretation of intercept
For the Keynesian consumption function, it reflects autonomous consumption: the amount one consumes when income is zero
Some intercepts have no (logical) economic interpretation:
E.g., a production function (K=0, L=0 will definitely lead to Y=0) or a demand function (price should be in the positive domain)
In the absence of an economic interpretation, one is tempted to drop the intercept
This is equivalent to dropping the vector of ones from the regressor matrix
Is it the correct treatment?
Regression through Origin
Note that an intercept does not need to have an economic interpretation
One of the several roles of an intercept is to ensure a zero conditional mean of the error
Example of a violation:
True consumption = B0 + B1*Income + error
Suppose consumption is measured incorrectly, e.g., understated, such that
Observed consumption = True consumption - understatement
The regression would be: Observed consumption = B0 + B1*Income + (error - understatement)
If we do not include B0, then E(error - understatement) is different from zero, so B1 is biased
If we include B0, B1 is not biased (provided the understatement is unrelated to income)
Cost of using an intercept if B0 is truly zero: none
Cost of deleting the intercept if B0 is not zero: bias in the slope parameter
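The understatement example can be illustrated with a small simulation. All numbers are assumptions chosen for illustration: B0 = 0, B1 = 0.8, income uniform on [10, 50], and an understatement with mean 5 that is unrelated to income.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
income = rng.uniform(10.0, 50.0, n)
# True model with B0 = 0 and B1 = 0.8 (assumed values).
true_consumption = 0.0 + 0.8 * income + rng.normal(0.0, 1.0, n)
# Understatement has a nonzero mean and does not depend on income.
understatement = rng.normal(5.0, 1.0, n)
observed = true_consumption - understatement

X_with = np.column_stack([np.ones(n), income])   # intercept included
X_without = income.reshape(-1, 1)                # regression through origin

b_with = np.linalg.lstsq(X_with, observed, rcond=None)[0]
b_without = np.linalg.lstsq(X_without, observed, rcond=None)[0]

print("with intercept:    B0 =", round(b_with[0], 3), " B1 =", round(b_with[1], 3))
print("without intercept: B1 =", round(b_without[0], 3))
```

With the intercept, the mean understatement is absorbed into B0 and the slope stays close to 0.8; without it, the slope is pulled away from 0.8.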
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/23/10   Time: 18:18
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
LOG(PX)      -0.429977      0.057083      -7.532469      0.0000
LOG(INC)      0.873992      0.048203       18.13144      0.0000

R-squared              0.519198     Mean dependent var       1.237024
Adjusted R-squared     0.510909     S.D. dependent var       0.723513
S.E. of regression     0.505989     Akaike info criterion    1.508162
Sum squared resid      14.84945     Schwarz criterion        1.577973
Log likelihood        -43.24485     Durbin-Watson stat       1.730641
Functional Specification
What to choose:
Nested models: A vs. B, or C vs. D
A: ln(Qx) = f(Px, INC)
B: ln(Qx) = f(Px, Py, INC)
C: ln(Qx) = f(ln(Px), ln(INC))
D: ln(Qx) = f(ln(Px), ln(Py), ln(INC))
Ramsey RESET
Basically, add powers of the fitted values as regressors, as a proxy for unaccounted variables
If the added terms are jointly significant (F-test or likelihood ratio test), the regression is misspecified
H0: No misspecification error
H1: Model contains specification error
Steps in EViews: View | Stability Test | Ramsey RESET Test | (enter the number of fitted terms) | OK
Ramsey RESET Test:

F-statistic             0.784074    Probability    0.379684
Log likelihood ratio    0.834253    Probability    0.361046

Test Equation:
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/24/10   Time: 00:33
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             1.088893      0.165015       6.598762      0.0000
LOG(PX)      -0.601195      0.093144      -6.454500      0.0000
LOG(INC)      0.550684      0.061735       8.920113      0.0000
FITTED^2     -0.043772      0.049433      -0.885480      0.3797

R-squared              0.830562     Mean dependent var       1.237024
Adjusted R-squared     0.821485     S.D. dependent var       0.723513
S.E. of regression     0.305692     Akaike info criterion    0.531863
Sum squared resid      5.233065     Schwarz criterion        0.671486
Log likelihood        -11.95589     F-statistic              91.50122
Durbin-Watson stat     2.255349     Prob(F-statistic)        0.000000

Ramsey RESET Test:

F-statistic             4.159492    Probability    0.046131
Log likelihood ratio    4.298853    Probability    0.038138

Test Equation:
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/24/10   Time: 00:34
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             0.502144      0.492582       1.019413      0.3124
PX            0.005691      0.080800       0.070429      0.9441
INC          -0.004399      0.042664      -0.103115      0.9182
FITTED^2      0.413044      0.202524       2.039483      0.0461

R-squared              0.543989     Mean dependent var       1.237024
Adjusted R-squared     0.519560     S.D. dependent var       0.723513
S.E. of regression     0.501494     Akaike info criterion    1.521890
Sum squared resid      14.08379     Schwarz criterion        1.661513
Log likelihood        -41.65671     F-statistic              22.26804
Durbin-Watson stat     2.432008     Prob(F-statistic)        0.000000
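The RESET procedure can be sketched with NumPy as follows. This is a minimal illustration, not EViews' implementation: the helper names `fit` and `reset_f_stat` are hypothetical, the data are simulated under assumed uniform draws for Px and INC, and only the F-statistic is computed (it would then be compared with an F critical value with q and n-k-q degrees of freedom).

```python
import numpy as np

def fit(X, y):
    """Return the sum of squared residuals and the fitted values."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    resid = y - fitted
    return float(resid @ resid), fitted

def reset_f_stat(X, y, max_power=2):
    """F-statistic for adding fitted^2 ... fitted^max_power to the regression."""
    n, k = X.shape
    ssr_restricted, fitted = fit(X, y)
    powers = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    q = powers.shape[1]
    ssr_unrestricted, _ = fit(np.column_stack([X, powers]), y)
    return ((ssr_restricted - ssr_unrestricted) / q) / (
        ssr_unrestricted / (n - k - q)
    )

# Simulated data: the true model is log-log (assumed design).
rng = np.random.default_rng(3)
n = 60
px = rng.uniform(1.0, 10.0, n)
inc = rng.uniform(1.0, 10.0, n)
log_qx = 0.5 - 0.5 * np.log(px) + 0.5 * np.log(inc) + rng.normal(0.0, 0.3, n)

const = np.ones(n)
X_log = np.column_stack([const, np.log(px), np.log(inc)])
X_lin = np.column_stack([const, px, inc])
print("RESET F, log-log specification:", round(reset_f_stat(X_log, log_qx), 3))
print("RESET F, linear specification: ", round(reset_f_stat(X_lin, log_qx), 3))
```

With a single added term (fitted^2), the F-statistic equals the square of the t-statistic on that term, which matches the pattern in the EViews output (for example, 2.039483 squared is about 4.159).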
Functional Specification
Non-nested models: A vs. C (from the previous slides)
Mizon and Richard (1986)
Estimate the comprehensive model: ln(Qx) = B0 + B1*Px + B2*INC + B3*ln(Px) + B4*ln(INC) + e
Test using Wald:
H0: B1=B2=0; if the null is rejected, specification A is preferred
H0: B3=B4=0; if the null is rejected, specification C is preferred
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/24/10   Time: 00:55
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             1.047136      0.119279       8.778901      0.0000
LOG(PX)      -0.490712      0.064609      -7.595069      0.0000
LOG(INC)      0.604477      0.083606       7.230085      0.0000
PX           -0.017596      0.025031      -0.702977      0.4850
INC          -0.024155      0.018644      -1.295620      0.2005

R-squared              0.834092     Mean dependent var       1.237024
Adjusted R-squared     0.822026     S.D. dependent var       0.723513
S.E. of regression     0.305228     Akaike info criterion    0.544139
Sum squared resid      5.124023     Schwarz criterion        0.718668
Log likelihood        -11.32417     F-statistic              69.12739
Durbin-Watson stat     2.314130     Prob(F-statistic)        0.000000

Wald Test:
Equation: Untitled
Null Hypothesis: C(4)=0
                 C(5)=0

F-statistic    0.978447    Probability    0.382341
Chi-square     1.956895    Probability    0.375894

Wald Test:
Equation: Untitled
Null Hypothesis: C(2)=0
                 C(3)=0

F-statistic    53.70023    Probability    0.000000
Chi-square     107.4005    Probability    0.000000
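A minimal sketch of the Mizon and Richard approach on simulated data, assuming the true model is the log-log specification. The two joint restrictions are tested with the usual restricted-versus-unrestricted F-statistic, which for linear restrictions is the F form of the Wald test. The helper names and the data design are illustrative assumptions.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def f_exclusion(X_restricted, X_full, y):
    """F-statistic for dropping the columns of X_full absent from X_restricted."""
    n, k_full = X_full.shape
    q = k_full - X_restricted.shape[1]
    ssr_r, ssr_u = ssr(X_restricted, y), ssr(X_full, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_full))

# Simulated data: the true model is Spec C (log-log), by assumption.
rng = np.random.default_rng(4)
n = 1000
px = rng.uniform(1.0, 10.0, n)
inc = rng.uniform(1.0, 10.0, n)
log_qx = 0.5 - 0.5 * np.log(px) + 0.5 * np.log(inc) + rng.normal(0.0, 0.2, n)

const = np.ones(n)
X_a = np.column_stack([const, px, inc])                    # Spec A
X_c = np.column_stack([const, np.log(px), np.log(inc)])    # Spec C
X_full = np.column_stack([const, px, inc, np.log(px), np.log(inc)])

f_drop_levels = f_exclusion(X_c, X_full, log_qx)   # H0: B1 = B2 = 0
f_drop_logs = f_exclusion(X_a, X_full, log_qx)     # H0: B3 = B4 = 0
print("F for B1 = B2 = 0:", round(f_drop_levels, 3))
print("F for B3 = B4 = 0:", round(f_drop_logs, 3))
```

When the log-log model is true, the restriction on the log terms should be strongly rejected, while the restriction on the level terms should not, which is the same pattern as in the Wald output.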
Functional Specification
Davidson-MacKinnon (1981)
Uses a principle similar to Ramsey RESET, but with different predicted values
Recall:
Spec A: ln(Qx) = f(Px, INC)
Spec C: ln(Qx) = f(ln(Px), ln(INC))
Steps to test whether Spec A is correct:
Run Spec C and get the predicted values, say Z1
Run Spec A with Z1 added to the equation
If Z1 is insignificant, then A is correctly specified
We can also perform the test in the other direction:
Run Spec A and get the predicted values, say Z2
Run Spec C with Z2 added to the equation
If Z2 is insignificant, then C is correctly specified
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/24/10   Time: 00:59
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             0.038918      0.176320       0.220726      0.8261
PX            0.000279      0.020148       0.013860      0.9890
INC          -0.008157      0.013027      -0.626168      0.5337
Z1            1.021596      0.099618       10.25511      0.0000

R-squared              0.829783     Mean dependent var       1.237024
Adjusted R-squared     0.820665     S.D. dependent var       0.723513
S.E. of regression     0.306393     Akaike info criterion    0.536446
Sum squared resid      5.257103     Schwarz criterion        0.676069
Log likelihood        -12.09338     F-statistic              90.99747
Durbin-Watson stat     2.239382     Prob(F-statistic)        0.000000
Dependent Variable: LOG(QX)
Method: Least Squares
Date: 02/24/10   Time: 01:01
Sample: 1 60
Included observations: 60

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             1.001958      0.176638       5.672389      0.0000
LOG(PX)      -0.534508      0.056757      -9.417454      0.0000
LOG(INC)      0.522164      0.059141       8.829176      0.0000
Z2           -0.027657      0.128139      -0.215839      0.8299

R-squared              0.828332     Mean dependent var       1.237024
Adjusted R-squared     0.819136     S.D. dependent var       0.723513
S.E. of regression     0.307697     Akaike info criterion    0.544936
Sum squared resid      5.301924     Schwarz criterion        0.684559
Log likelihood        -12.34807     F-statistic              90.07041
Durbin-Watson stat     2.265905     Prob(F-statistic)        0.000000
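The Davidson-MacKinnon steps can be sketched in NumPy as below. The data are simulated with Spec C as the true model (an assumption for illustration), and `ols_fit` is a hypothetical helper that returns coefficients, t-statistics, and fitted values.

```python
import numpy as np

def ols_fit(X, y):
    """Return coefficients, t-statistics, and fitted values of an OLS fit."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # coefficient standard errors
    return beta, beta / se, X @ beta

# Simulated data: Spec C (log-log) is the true model, by assumption.
rng = np.random.default_rng(5)
n = 1000
px = rng.uniform(1.0, 10.0, n)
inc = rng.uniform(1.0, 10.0, n)
log_qx = 0.5 - 0.5 * np.log(px) + 0.5 * np.log(inc) + rng.normal(0.0, 0.2, n)

const = np.ones(n)
X_a = np.column_stack([const, px, inc])                    # Spec A
X_c = np.column_stack([const, np.log(px), np.log(inc)])    # Spec C

# Test Spec A: add the fitted values of Spec C (Z1) to Spec A.
_, _, z1 = ols_fit(X_c, log_qx)
_, t_a, _ = ols_fit(np.column_stack([X_a, z1]), log_qx)
t_z1 = t_a[-1]

# Test Spec C: add the fitted values of Spec A (Z2) to Spec C.
_, _, z2 = ols_fit(X_a, log_qx)
_, t_c, _ = ols_fit(np.column_stack([X_c, z2]), log_qx)
t_z2 = t_c[-1]

print("t-statistic on Z1 in Spec A:", round(t_z1, 3))
print("t-statistic on Z2 in Spec C:", round(t_z2, 3))
```

A large t-statistic on Z1 speaks against Spec A, and a small one on Z2 gives no evidence against Spec C, mirroring the two EViews regressions.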
Selection Criteria
According to Hendry and Richard (1983), a model chosen for empirical analysis should satisfy the following criteria:
Admissible: predictions made from the model must be logically possible
Consistent with theory: it must make good economic sense
Weakly exogenous explanatory variables: regressors are uncorrelated with the error term
Constancy: the parameter values should be stable. In other words, the values obtained from within-sample observations should not deviate significantly from those obtained from out-of-sample observations
Coherency: residuals estimated from the model must be purely random
Encompassing: no other model explains the data better
Selection Criteria
Evaluation of Competing Models
Three statistics for model evaluation available in most econometric software are:
Adjusted R-squared: choose the model with the highest adjusted R-squared
Akaike Information Criterion (AIC): choose the model with the smallest AIC
Schwarz Information Criterion (SIC): choose the model with the smallest SIC
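As a worked example, the three statistics can be recomputed from quantities reported in the regression outputs of the earlier slides. The formulas below are the per-observation forms (AIC = -2l/n + 2k/n, SIC = -2l/n + k*ln(n)/n, with l the Gaussian log likelihood), which reproduce the reported values; other software may scale them differently. The function name `selection_stats` is illustrative.

```python
import math

def selection_stats(ssr, n, k, r_squared):
    """Log likelihood, adjusted R-squared, AIC, and SIC for an OLS fit."""
    loglik = -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(ssr / n))
    return {
        "loglik": loglik,
        "adj_r2": 1.0 - (1.0 - r_squared) * (n - 1) / (n - k),
        "aic": -2.0 * loglik / n + 2.0 * k / n,
        "sic": -2.0 * loglik / n + k * math.log(n) / n,
    }

# Inputs taken from the regression outputs (n = 60 observations).
correct = selection_stats(ssr=5.306335, n=60, k=3, r_squared=0.828189)
omitted = selection_stats(ssr=16.98648, n=60, k=2, r_squared=0.450005)

for name, stats in [("correct specification", correct),
                    ("LOG(INC) omitted", omitted)]:
    print(name, {key: round(value, 6) for key, value in stats.items()})
```

All three criteria favor the correct specification over the model that omits LOG(INC): higher adjusted R-squared, smaller AIC, and smaller SIC.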