Stata implementation Specification tests Panel data models with strictly exogenous instruments
GMM: Stata implementation and tests
Giovanni Bruno
Bocconi University
Econometrics - ESS, 2016-2017
1 Stata implementation
Notation
ivregress 2sls
ivregress gmm
2 Specification tests
Non-robust Hausman test
Robust Hausman test
Hansen-Sargan test of overidentifying restrictions
Testing for weak instruments
Limited Information ML (LIML)
3 Panel data models with strictly exogenous instruments
Notation
$z$ contains all available exogenous variables ==> if some of the $x$ are exogenous, say $x_1$, they have to be part of $z$, and we write $z \equiv (x_1 \; z_1)$, referring to
$x_1$ as the included exogenous variables, with $k_1$ = the number of such variables
$z_1$ as the excluded exogenous variables (strictly speaking, the instrumental variables for many authors), with $L_1$ = the number of such variables
The remaining variables in $x$, say $x_2$, are the endogenous variables in the model, with $k_2$ = the number of such variables
==> $k = k_1 + k_2$ and $L = k_1 + L_1$
==> The necessary order condition $L \geq k$ is satisfied if and only if the number of endogenous variables is no greater than the number of excluded exogenous variables, that is, $L_1 \geq k_2$
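As a worked count, the order condition can be checked on the drug-expenditure model estimated later in these slides. This is only an illustrative tally, and it assumes that the constant is counted among the included exogenous variables:

```latex
\begin{align*}
x_1 &= (\text{totchr},\ \text{age},\ \text{female},\ \text{blhisp},\ \text{linc},\ 1)
      &&\Rightarrow\ k_1 = 6,\\
x_2 &= \text{hi\_empunion} &&\Rightarrow\ k_2 = 1,\\
z_1 &= \text{ssiratio}     &&\Rightarrow\ L_1 = 1,\\
k   &= k_1 + k_2 = 7, \qquad L = k_1 + L_1 = 7
      &&\Rightarrow\ L - k = L_1 - k_2 = 0 .
\end{align*}
```

The model is therefore just identified. Adding multlc to the excluded instruments gives $L_1 = 2$ and $L - k = 1$, which is the single overidentifying restriction tested in the Hansen-Sargan example.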
ivregress 2sls
TSLS is implemented by the command ivregress 2sls
followed by the name of the dependent variable y , the names
of the included exogenous x1 and, within parentheses, the
names of the endogenous variables x2 to the left of the equal
symbol = and the names of the excluded exogenous z1 to the
right of =, as follows
ivregress 2sls depvar indepvars (endog_vars = instruments), options
. * IV estimation of a just-identified model with single endog regressor
.
. ivregress 2sls ldrugexp (hi_empunion = ssiratio) totchr age female blhisp linc, first
First-stage regressions
Number of obs = 10089
F( 6, 10082) = 138.32
Prob > F = 0.0000
R-squared = 0.0761
Adj R-squared = 0.0755
Root MSE = 0.4672
hi_empunion Coef. Std. Err. t P>|t| [95% Conf. Interval]
totchr .0127865 .0036225 3.53 0.000 .0056856 .0198874
age -.0086323 .000713 -12.11 0.000 -.01003 -.0072347
female -.07345 .0094932 -7.74 0.000 -.0920586 -.0548414
blhisp -.06268 .0127687 -4.91 0.000 -.0877091 -.0376509
linc .0483937 .0056768 8.52 0.000 .0372661 .0595212
ssiratio -.1916432 .0141289 -13.56 0.000 -.2193387 -.1639477
_cons 1.028981 .0574094 17.92 0.000 .9164466 1.141514
Instrumental variables (2SLS) regression Number of obs = 10089
Wald chi2(6) = 1919.06
Prob > chi2 = 0.0000
R-squared = 0.0640
Root MSE = 1.3177
ldrugexp Coef. Std. Err. z P>|z| [95% Conf. Interval]
hi_empunion -.8975913 .2079185 -4.32 0.000 -1.305104 -.4900786
totchr .4502655 .0104189 43.22 0.000 .4298449 .4706861
age -.0132176 .0028749 -4.60 0.000 -.0188523 -.0075829
female -.020406 .0315408 -0.65 0.518 -.0822249 .0414129
blhisp -.2174244 .0386745 -5.62 0.000 -.2932249 -.1416238
linc .0870018 .0220144 3.95 0.000 .0438543 .1301493
_cons 6.78717 .2554343 26.57 0.000 6.286528 7.287812
Instrumented: hi_empunion
Instruments: totchr age female blhisp linc ssiratio
.
ivregress gmm
GMM for the linear model is implemented by ivregress gmm, with the same syntax as for TSLS:
ivregress gmm depvar indepvars (endog_vars = instruments), options
The GMM weighting matrix
The weighting matrix in the optimal two-step GMM estimator is
\[ A = \left( \frac{Z' S Z}{n} \right)^{-1}. \tag{1} \]
$A$ is a consistent estimate of the inverse of $\operatorname{Var}\left( \frac{1}{\sqrt{n}} Z' \varepsilon \right)$, the variance-covariance matrix of the sample moments. Choices of $S$ are the following:
If $\varepsilon$ is homoskedastic and independent, $S = I$ (the resulting GMM estimator collapses to TSLS). It is implemented through the ivregress gmm option wmatrix(unadjusted).
If $\varepsilon$ is heteroskedastic and independent, $S$ is diagonal:
\[ S = \begin{pmatrix} e_1^2 & 0 & \cdots & 0 \\ 0 & e_2^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & e_n^2 \end{pmatrix} \]
with $e_i = y_i - x_i' \hat{\beta}_{TSLS}$, $i = 1, \ldots, n$. It is implemented through the ivregress gmm option wmatrix(robust), which is the default.
If errors are clustered, then $S$ is a block-diagonal matrix whose generic block is the outer product of the residuals belonging to the corresponding cluster. Residuals are taken from a one-step consistent regression (TSLS):
\[ S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & S_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & S_N \end{pmatrix} \]
with $S_i = e_i e_i'$, where $e_i = y_i - x_i' \hat{\beta}_{TSLS}$ is the vector of residuals belonging to cluster $i = 1, \ldots, N$. It is implemented through the ivregress gmm option wmatrix(cluster cluster_var).
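The three choices of $S$ listed so far can be tried side by side. The following is a minimal sketch on simulated data, not the slides' dataset: all variable names (y, x1, x2, z1, z2, id) and parameter values are hypothetical.

```stata
* Sketch on simulated data: every variable name and parameter value is hypothetical
clear
set seed 12345
set obs 1000
generate id = ceil(_n/10)                    // hypothetical cluster identifier (100 clusters)
generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // excluded instrument
generate z2 = rnormal()                      // excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // structural error, correlated with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2)
scalar b_tsls = _b[x2]

* S = I: the GMM coefficients should coincide with TSLS
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(unadjusted)
scalar b_unadj = _b[x2]
display "TSLS: " b_tsls "   GMM, wmatrix(unadjusted): " b_unadj

* S diagonal (the default), then S block diagonal by cluster
ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust)
ivregress gmm y x1 (x2 = z1 z2), wmatrix(cluster id)
```

The model is overidentified on purpose (two excluded instruments for one endogenous regressor), since in a just-identified model the choice of weighting matrix does not affect the coefficients.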
With time-series data and $\varepsilon$ heteroskedastic and serially correlated, the optimal weighting matrix $A$ may be assembled by using the Newey-West heteroskedasticity- and autocorrelation-consistent (HAC) estimator. This is implemented by specifying wmatrix(hac kernel #), which requests a weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a kernel is equal to the number of lags plus one.
Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel, with the lag order selected by Newey and West's (1994) optimal lag-selection algorithm. Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and n-2 lags, where n is the sample size.
There are three kernels available for HAC weighting matrices:
bartlett or nwest requests the Bartlett (Newey-West)
kernel;
parzen or gallant requests the Parzen (Gallant 1987) kernel;
and
quadraticspectral or andrews requests the quadratic
spectral (Andrews 1991) kernel.
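A minimal sketch of the HAC options, assuming a simulated time series with an MA(1) structural error. The variable names, the lag length of 4 and the data-generating process are hypothetical choices made only for illustration.

```stata
* Sketch on simulated time-series data: all names and values are hypothetical
clear
set seed 12345
set obs 500
generate t = _n
tsset t                                            // HAC options require tsset data
generate x1 = rnormal()                            // included exogenous regressor
generate z1 = rnormal()                            // excluded instrument
generate z2 = rnormal()                            // excluded instrument
generate e  = rnormal()
generate u  = e + 0.5*cond(_n == 1, 0, e[_n-1])    // MA(1) structural error
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + 0.5*e + rnormal()   // endogenous through e
generate y  = 1 + 0.5*x1 - x2 + u

* Bartlett (Newey-West) kernel with 4 lags
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac bartlett 4)

* Bartlett kernel, lag order chosen by the Newey-West (1994) algorithm
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac bartlett opt)

* Parzen kernel with 4 lags
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac parzen 4)
```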
Iterative GMM
The GMM procedure can be iterated by adding the option igmm to
ivregress gmm. The resulting estimator is asymptotically
equivalent to the two-step estimator. Hall (2005) suggests that it
may have a better finite-sample performance.
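A short sketch of the two-step and iterated estimators on simulated data. The variable names and the data-generating process are hypothetical and serve only to make the example runnable.

```stata
* Sketch on simulated data: all names and values are hypothetical
clear
set seed 12345
set obs 1000
generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // excluded instrument
generate z2 = rnormal()                      // excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // structural error, correlated with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

* Two-step GMM (default)
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust)
scalar b_twostep = _b[x2]

* Iterated GMM: the weighting matrix and coefficients are updated until convergence
ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust) igmm
display "two-step: " b_twostep "   iterated: " _b[x2]
```

In a sample of this size the two sets of estimates should be close, in line with their asymptotic equivalence.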
Robust standard errors
The less efficient, but computationally simpler and still consistent, TSLS estimator is often used in estimation. Its robust variance-covariance matrix $\operatorname{Var}(\hat{\beta}_{TSLS})$ is consistently estimated as
\[ \widehat{\operatorname{Var}}(\hat{\beta}_{TSLS}) = \left( X' P_{[Z]} X \right)^{-1} X' P_{[Z]} S P_{[Z]} X \left( X' P_{[Z]} X \right)^{-1}, \]
where $P_{[Z]} = Z (Z'Z)^{-1} Z'$ is the projection matrix onto the columns of $Z$ and $S$ is chosen according to the various departures from homoskedasticity and independence spelled out above. The Stata implementation of the variance-covariance estimators is through the following ivregress options: vce(unadjusted), vce(robust), vce(cluster cluster_var), vce(hac kernel ...).
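The vce() options change only the standard errors of TSLS, not the coefficients. A minimal sketch on simulated heteroskedastic data illustrates this; all variable names, the cluster identifier id and the parameter values are hypothetical.

```stata
* Sketch on simulated data: all names and values are hypothetical
clear
set seed 12345
set obs 1000
generate id = ceil(_n/10)                        // hypothetical cluster identifier
generate x1 = rnormal()                          // included exogenous regressor
generate z1 = rnormal()                          // excluded instrument
generate z2 = rnormal()                          // excluded instrument
generate v  = rnormal()                          // first-stage error
generate u  = (0.5*v + rnormal())*exp(0.3*x1)    // heteroskedastic structural error
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v       // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

* Default: vce(unadjusted)
quietly ivregress 2sls y x1 (x2 = z1 z2)
scalar b_def  = _b[x2]
scalar se_def = _se[x2]

* Heteroskedasticity-robust standard errors
quietly ivregress 2sls y x1 (x2 = z1 z2), vce(robust)
scalar b_rob  = _b[x2]
scalar se_rob = _se[x2]

* Cluster-robust standard errors
quietly ivregress 2sls y x1 (x2 = z1 z2), vce(cluster id)

display "coef (default, robust): " b_def "  " b_rob
display "s.e. (default, robust): " se_def "  " se_rob
```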
Results for four GMM estimators
Variable TwoSLS GMM_het GMM_clu TwoSLS_~f
hi_empunion -0.98993 -0.99328 -1.03587 -0.98993
0.20459 0.20467 0.20438 0.19221
totchr 0.45121 0.45095 0.44822 0.45121
0.01031 0.01031 0.01325 0.01051
age -0.01414 -0.01415 -0.01185 -0.01414
0.00290 0.00290 0.00626 0.00278
female -0.02784 -0.02817 -0.02451 -0.02784
0.03217 0.03219 0.02919 0.03117
blhisp -0.22371 -0.22310 -0.20907 -0.22371
0.03958 0.03960 0.05018 0.03870
linc 0.09427 0.09446 0.09573 0.09427
0.02188 0.02190 0.01474 0.02123
_cons 6.87519 6.87782 6.72769 6.87519
0.25789 0.25800 0.50588 0.24528
legend: b/se
.
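A table with this layout can be assembled with estimates store and estimates table. The sketch below runs on simulated data rather than the slides' dataset. The stored names mirror the column headers, but TwoSLS_def is only a guess at the abbreviated header, and the specifications (robust TSLS, two GMM weighting matrices, default TSLS), the instrument set and the cluster variable are all assumptions.

```stata
* Sketch on simulated data: names, specifications and values are hypothetical
clear
set seed 12345
set obs 1000
generate id = ceil(_n/10)                    // hypothetical cluster identifier
generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // excluded instrument
generate z2 = rnormal()                      // excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // structural error, correlated with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2), vce(robust)
estimates store TwoSLS
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust)
estimates store GMM_het
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(cluster id)
estimates store GMM_clu
quietly ivregress 2sls y x1 (x2 = z1 z2)
estimates store TwoSLS_def

* Coefficients with standard errors underneath (legend: b/se)
estimates table TwoSLS GMM_het GMM_clu TwoSLS_def, b(%9.5f) se
```

As in the table above, the two TSLS columns share the same coefficients and differ only in their standard errors.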
Non-robust Hausman test
We test exogeneity of $X_2$ maintaining instrument validity: $E(\varepsilon | Z) = 0$, which implies $E(\varepsilon | X_1) = 0$
==> $H_0: E(\varepsilon | X_1, X_2) = E(\varepsilon | X_1)$
A conventional Hausman test ([2]) can be implemented, based on the Hausman statistic measuring the statistical difference between the IV and OLS estimates. It would not be robust to heteroskedastic or clustered errors, though.
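One way to run the conventional test is Stata's hausman command, which compares a consistent estimator (TSLS) with an efficient one (OLS). This is a minimal sketch on simulated data with hypothetical variable names. The sigmamore option, which bases both covariance matrices on the error variance from the efficient estimator, is a choice made here and is not taken from the slides.

```stata
* Sketch on simulated data: all names and values are hypothetical
clear
set seed 12345
set obs 1000
generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // excluded instrument
generate z2 = rnormal()                      // excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // structural error, correlated with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

* Consistent under both hypotheses: TSLS with default (non-robust) standard errors
quietly ivregress 2sls y x1 (x2 = z1 z2)
estimates store iv

* Efficient under the null of exogeneity: OLS
quietly regress y x1 x2
estimates store ols

* Syntax: hausman name_consistent name_efficient
hausman iv ols, sigmamore
```

Because x2 is endogenous by construction in this simulation, the test should tend to reject exogeneity.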
Robust Hausman test
A robust version of the test is implemented through the control-function approach, recasting endogeneity as a misspecification problem in the structural equation (see [5] and [1])
\[ y = X \beta + \varepsilon, \tag{2} \]
where $X = (X_1 \; X_2)$, $\beta = (\beta_1' \; \beta_2')'$, $Z = (X_1 \; Z_1)$ and $\varepsilon = u + \nu \pi$.
$u$: $E(u | X) = 0$, and $\nu$ is the $n \times k_2$ matrix of the errors in the $k_2$ first-stage equations. NB: $\nu$ is what makes $X_2$ endogenous.
Replacing $\nu$ in (2) with the residuals from the first-stage regressions, $\hat{\nu} = M_{[Z]} X_2$ (with $M_{[Z]} \equiv I - P_{[Z]}$), makes the H test a simple test of joint significance for $\pi$ in the auxiliary OLS regression
\[ y = X \beta + M_{[Z]} X_2 \pi + u. \tag{3} \]
The test works since, under the alternative of $\pi \neq 0$, OLS estimation of the auxiliary regression yields the TSLS estimator of $\beta$.
The H test can easily be made robust to heteroskedasticity and/or clustered errors by testing the joint significance of $\pi$ via test after estimating (3) with regress and a suitable robust option:
with heteroskedasticity: vce(robust)
with heteroskedasticity and cluster correlation: vce(cluster
clustervar ).
The above is not necessary, though. The various versions of
the H test can be immediately implemented in Stata through
the ivregress postestimation command estat endogenous.
. * Robust Durbin-Wu-Hausman test of endogeneity implemented by estat endogenous
. ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)
Instrumental variables (2SLS) regression Number of obs = 10089
Wald chi2(6) = 2000.86
Prob > chi2 = 0.0000
R-squared = 0.0640
Root MSE = 1.3177
Robust
ldrugexp Coef. Std. Err. z P>|z| [95% Conf. Interval]
hi_empunion -.8975913 .2211268 -4.06 0.000 -1.330992 -.4641908
totchr .4502655 .0101969 44.16 0.000 .43028 .470251
age -.0132176 .0029977 -4.41 0.000 -.0190931 -.0073421
female -.020406 .0326114 -0.63 0.531 -.0843232 .0435113
blhisp -.2174244 .0394944 -5.51 0.000 -.294832 -.1400167
linc .0870018 .0226356 3.84 0.000 .0426368 .1313668
_cons 6.78717 .2688453 25.25 0.000 6.260243 7.314097
Instrumented: hi_empunion
Instruments: totchr age female blhisp linc ssiratio
.
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = 24.935 (p = 0.0000)
Robust regression F(1,10081) = 26.4333 (p = 0.0000)
.
. estat endogenous,forcenonrobust
Tests of endogeneity
Ho: variables are exogenous
Durbin (score) chi2(1) = 25.2819 (p = 0.0000)
Wu-Hausman F(1,10081) = 25.3253 (p = 0.0000)
Robust score chi2(1) = 24.935 (p = 0.0000)
Robust regression F(1,10081) = 26.4333 (p = 0.0000)
.
. * Robust Durbin-Wu-Hausman test of endogeneity implemented manually
. global xlist totchr age female blhisp linc
. quietly regress hi_empunion ssiratio $xlist
. quietly predict v1hat, resid
. quietly regress ldrugexp hi_empunion v1hat $xlist, vce(robust)
. test v1hat
( 1) v1hat = 0
F( 1, 10081) = 26.43
Prob > F = 0.0000
.
Hansen-Sargan Test
If the population moment conditions are true, then the minimized GMM criterion function $Q(\hat{\beta}_{TSLS})$ should not be significantly different from zero. This provides a test for the validity of the $L - k$ over-identifying moment conditions based on the following statistic (Hansen-Sargan test)
\[ S \equiv n Q(\hat{\beta}_{TSLS}) \sim \chi^2(L - k). \]
. * Test of overidentifying restrictions following ivregress gmm
. quietly ivregress gmm ldrugexp (hi_empunion = ssiratio multlc) ///
> $xlist, wmatrix(robust)
.
. estat overid
Test of overidentifying restriction:
Hansen's J chi2(1) = 1.04754 (p = 0.3061)
.
Testing for weak instruments
Staiger and Stock's rule of thumb: partial F test in the first-stage regression > 10. It is not rigorous, rejects weak instruments too often, and has no obvious implementation when there is more than one endogenous variable.
Stock and Yogo's (2005) two tests overcome all of the above difficulties. Both are based on the minimum eigenvalue of a matrix analog of the partial F test, a statistic introduced by Cragg and Donald (1993) to test nonidentification.
Importantly, the large-sample properties of both tests have been derived under the assumption of homoskedastic and independent errors, so caution is needed when that assumption is in doubt.
Both procedures are implemented by the ivregress postestimation command estat firststage.
. * Weak instrument tests - just-identified model
. quietly ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)
.
. estat firststage, forcenonrobust all /// implements the Stock and
> ///Yogo (2005) weak instrument tests
>
First-stage regression summary statistics
                         Adjusted   Partial    Robust
Variable        R-sq.    R-sq.      R-sq.      F(1,10082)   Prob > F
hi_empunion     0.0761   0.0755     0.0179     65.7602      0.0000

Minimum eigenvalue statistic = 183.98

Critical Values                       # of endogenous regressors: 1
Ho: Instruments are weak              # of excluded instruments:  1
                                         5%     10%     20%     30%
2SLS relative bias                          (not available)
                                        10%     15%     20%     25%
2SLS Size of nominal 5% Wald test     16.38    8.96    6.66    5.53
LIML Size of nominal 5% Wald test     16.38    8.96    6.66    5.53
Montiel Olea and Pflueger (2013) derive a test for weak instruments that extends the Stock and Yogo tests to heteroskedasticity and cluster correlation. It is implemented in Stata by the user-written command weakivtest, run after ivregress.
. weakivtest /// weakivtest (user-written)
> /// implements the weak
> /// instrument test of Montiel Olea and
> /// Pflueger (2013).
> /// extends Stock and Yogo (2005)
> /// to accommodate heteroskedasticity
> /// and cluster correlation
> /// It is a postestimation command for
> /// ivregress.
>
(obs=10089)
Montiel-Pflueger robust weak instrument test
Effective F statistic: 65.760
Confidence level alpha: 5%
Critical Values TSLS LIML
% of Worst Case Bias
tau=5% 37.418 37.418
tau=10% 23.109 23.109
tau=20% 15.062 15.062
tau=30% 12.039 12.039
.
Limited Information ML (LIML)
LIML is an ML estimator that maintains that the errors in the structural and first-stage equations are jointly normal. It is not full-information ML because it is based on reduced-form first-stage equations, rather than fully specified structural equations, for the included endogenous variables. It often has better finite-sample properties than TSLS, but is less robust. It is implemented by ivregress liml.
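A minimal sketch comparing LIML with TSLS on simulated data. The variable names and the data-generating process are hypothetical and are used only to keep the example self-contained.

```stata
* Sketch on simulated data: all names and values are hypothetical
clear
set seed 12345
set obs 1000
generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // excluded instrument
generate z2 = rnormal()                      // excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // structural error, jointly normal with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2)
scalar b_tsls = _b[x2]

* Same syntax as ivregress 2sls, with liml as the estimator
ivregress liml y x1 (x2 = z1 z2)
display "TSLS: " b_tsls "   LIML: " _b[x2]
```

With strong instruments, as simulated here, the two estimates should be close.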
Panel data models with strictly exogenous instruments
The conventional panel-data transformations, group-mean
deviations and partial deviations, can be applied to yield consistent
panel-data IV-GMM estimators only if there is a matrix Z of
strictly exogenous variables.
The FE-TSLS estimator is computed by applying TSLS to the variables y, Z and X transformed into group-mean deviations.
The RE-TSLS estimator is computed by applying TSLS to the variables y, Z and X transformed into partial deviations.
For both estimators to be consistent, it is required that $E(\varepsilon | Z) = 0$.
FE-TSLS and RE-TSLS are implemented in Stata by xtivreg with the options fe and re (the default), respectively. Otherwise, the syntax is the same as that of ivregress 2sls.
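A minimal sketch of both panel estimators on a simulated balanced panel. All names (id, t, a, and the model variables) and parameter values are hypothetical. The individual effect a is made to enter x2, so that only the fixed-effects transformation removes it from the regressors.

```stata
* Sketch on a simulated balanced panel: all names and values are hypothetical
clear
set seed 12345
set obs 100
generate id = _n                             // panel identifier
generate a  = rnormal()                      // time-invariant individual effect
expand 5                                     // 5 periods per unit
bysort id: generate t = _n
xtset id t

generate x1 = rnormal()                      // included exogenous regressor
generate z1 = rnormal()                      // strictly exogenous excluded instrument
generate z2 = rnormal()                      // strictly exogenous excluded instrument
generate v  = rnormal()                      // first-stage error
generate u  = 0.5*v + rnormal()              // idiosyncratic error, correlated with v
generate x2 = 0.8*z1 + 0.5*z2 + 0.3*x1 + 0.5*a + v   // endogenous regressor
generate y  = 1 + 0.5*x1 - x2 + a + u

* FE-TSLS: TSLS on group-mean deviations
xtivreg y x1 (x2 = z1 z2), fe

* RE-TSLS: TSLS on partial deviations (re is the default)
xtivreg y x1 (x2 = z1 z2), re
```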
[1] A. C. Cameron and P. K. Trivedi. Microeconometrics Using Stata, Revised Edition. Stata Press, College Station, TX, 2010.
[2] J. Hausman. Specification tests in econometrics. Econometrica, 46:1251-1271, 1978.
[3] J. L. Montiel Olea and C. Pflueger. A robust test for weak instruments. Journal of Business & Economic Statistics, pages 358-369, 2013.
[4] J. H. Stock and M. Yogo. Testing for weak instruments in linear IV regression. In D. W. Andrews and J. H. Stock, editors, Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, pages 80-108. Cambridge: Cambridge University Press, 2005.
[5] Jeffrey M. Wooldridge. Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, MA, 2nd edition, 2010.