IV Notes1-2

The document discusses Instrumental Variables (IV) regression, which addresses biases in estimating causal relationships, specifically omitted variable bias, simultaneous causality bias, and errors-in-variables bias. It outlines the Two Stage Least Squares (TSLS) method for estimating parameters using a valid instrument that meets relevance and exogeneity conditions. Examples illustrate the application of IV regression in contexts like demand for butter and cigarette consumption, emphasizing the importance of valid instruments for accurate estimation.

Instrumental Variables Regression

Three important threats to internal validity are:


• omitted variable bias from a variable that is correlated
with X but is unobserved, so cannot be included in the
regression;
• simultaneous causality bias (X causes Y, Y causes X);
• errors-in-variables bias (X is measured with error)

Instrumental variables regression can eliminate bias when E(u|X) ≠ 0 by using an instrumental variable, Z.

12-2
IV Regression with One Regressor and One Instrument

Yi = β0 + β1Xi + ui

• IV regression breaks X into two parts: a part that might be


correlated with u, and a part that is not. By isolating the
part that is not correlated with u, it is possible to estimate
β1.
• This is done using an instrumental variable, Zi, which is
uncorrelated with ui.
• The instrumental variable detects movements in Xi that are
uncorrelated with ui, and uses these to estimate β1.

12-3
Terminology: endogeneity and exogeneity

An endogenous variable is one that is correlated with u


An exogenous variable is one that is uncorrelated with u

Historical note: “Endogenous” literally means


“determined within the system,” that is, a variable that is jointly determined with Y and thus subject to simultaneous causality. However, this definition is narrow: IV regression can also be used to address omitted variable bias and errors-in-variables bias, not just simultaneous causality bias.

12-4
Two conditions for a valid instrument

Yi = β0 + β1Xi + ui

For an instrumental variable (an “instrument”) Z to be valid,


it must satisfy two conditions:
1. Instrument relevance: corr(Zi,Xi) ≠ 0
2. Instrument exogeneity: corr(Zi,ui) = 0

Suppose for now that you have such a Zi (we’ll discuss how
to find instrumental variables later).

How can you use Zi to estimate β1?

12-5
The IV Estimator, one X and one Z

Explanation #1: Two Stage Least Squares (TSLS)


As it sounds, TSLS has two stages – two regressions:
(1) First isolates the part of X that is uncorrelated with u:
regress X on Z using OLS

Xi = π0 + π1Zi + vi (1)

• Because Zi is uncorrelated with ui, π0 + π1Zi is uncorrelated with ui. We don’t know π0 or π1, but we can estimate them, so…
• Compute the predicted values of Xi, X̂i, where X̂i = π̂0 + π̂1Zi, i = 1,…,n.
12-6
Two Stage Least Squares

(2) Replace Xi by X̂i in the regression of interest:
regress Y on X̂i using OLS:

Yi = β0 + β1X̂i + ui (2)

• Because X̂i is uncorrelated with ui (if n is large), the first least squares assumption holds (if n is large)
• Thus β1 can be estimated by OLS using regression (2)
• This argument relies on large samples (so π0 and π1 are well estimated using regression (1))
• The resulting estimator is called the Two Stage Least Squares (TSLS) estimator, β̂1TSLS.

12-7
Two Stage Least Squares

Suppose you have a valid instrument, Zi.

Stage 1: Regress Xi on Zi, obtain the predicted values X̂i

Stage 2: Regress Yi on X̂i; the coefficient on X̂i is the TSLS estimator, β̂1TSLS.

β̂1TSLS is a consistent estimator of β1.

12-8
The IV Estimator, one X and one Z
Explanation #2: a little algebra…

Yi = β0 + β1Xi + ui
Thus,
cov(Yi,Zi) = cov(β0 + β1Xi + ui,Zi)
= cov(β0,Zi) + cov(β1Xi,Zi) + cov(ui,Zi)
= 0 + cov(β1Xi,Zi) + 0
= β1cov(Xi,Zi)

where cov(ui,Zi) = 0 (instrument exogeneity); thus

β1 = cov(Yi,Zi) / cov(Xi,Zi)
12-9
The IV Estimator, one X and one Z

β1 = cov(Yi,Zi) / cov(Xi,Zi)

The IV estimator replaces these population covariances with


sample covariances:

β̂1TSLS = sYZ / sXZ,

sYZ and sXZ are the sample covariances. This is the TSLS
estimator – just a different derivation!

12-10
Consistency of the TSLS estimator

β̂1TSLS = sYZ / sXZ

The sample covariances are consistent: sYZ →p cov(Y,Z) and sXZ →p cov(X,Z). Thus,

β̂1TSLS = sYZ / sXZ →p cov(Y,Z) / cov(X,Z) = β1

• The instrument relevance condition, cov(X,Z) ≠ 0, ensures


that you don’t divide by zero.
12-11
Example #1: Supply and demand for butter

IV regression was originally developed to estimate demand


elasticities for agricultural goods, for example butter:

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

• β1 = price elasticity of butter = percent change in quantity


for a 1% change in price (recall log-log specification
discussion)
• Data: observations on price and quantity of butter for
different years
• The OLS regression of ln(Qi^butter) on ln(Pi^butter) suffers from simultaneous causality bias (why?)
12-12
Simultaneous causality bias in the OLS regression of ln(Qi^butter) on ln(Pi^butter) arises because price and quantity are determined by the interaction of demand and supply

12-13
This interaction of demand and supply produces…

Would a regression using these data produce the demand


curve?

12-14
But…what would you get if only supply shifted?

• TSLS estimates the demand curve by isolating shifts in


price and quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.
12-15
TSLS in the supply-demand example:

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

Let Z = rainfall in dairy-producing regions.


Is Z a valid instrument?
(1) Exogenous? corr(raini,ui) = 0?
Plausibly: whether it rains in dairy-producing regions
shouldn’t affect demand
(2) Relevant? corr(raini, ln(Pi^butter)) ≠ 0?
Plausibly: insufficient rainfall means less grazing
means less butter

12-16
TSLS in the supply-demand example, ctd.

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

Zi = raini = rainfall in dairy-producing regions.

Stage 1: regress ln(Pi^butter) on raini, get the predicted values ln(P̂i^butter)
ln(P̂i^butter) isolates changes in log price that arise from supply (part of supply, at least)

Stage 2: regress ln(Qi^butter) on ln(P̂i^butter)

This is the regression counterpart of using shifts in the supply curve to trace out the demand curve.
12-17
Example #2: Test scores and class size

• The California regressions still could have OV bias (e.g.


parental involvement).
• This bias could be eliminated by using IV regression
(TSLS).
• IV regression requires a valid instrument, that is, an
instrument that is:
(1) relevant: corr(Zi,STRi) ≠ 0
(2) exogenous: corr(Zi,ui) = 0

12-18
Example #2: Test scores and class size
Here is a (hypothetical) instrument:
• some districts, randomly hit by an earthquake, “double up”
classrooms:
Zi = Quakei = 1 if hit by quake, = 0 otherwise
• Do the two conditions for a valid instrument hold?
• The earthquake makes it as if the districts were in a random
assignment experiment. Thus the variation in STR arising
from the earthquake is exogenous.
• The first stage of TSLS regresses STR against Quake,
thereby isolating the part of STR that is exogenous (the part
that is “as if” randomly assigned)

12-19
Inference using TSLS

• In large samples, the sampling distribution of the TSLS


estimator is normal
• Inference (hypothesis tests, confidence intervals) proceeds
in the usual way, e.g. ± 1.96SE
• The idea behind the large-sample normal distribution of the
TSLS estimator is that – like all the other estimators we
have considered – it involves an average of mean zero i.i.d.
random variables, to which we can apply the CLT.
• Here is a sketch of the math

12-20
β̂1TSLS = sYZ / sXZ = [Σ(Yi – Ȳ)(Zi – Z̄)] / [Σ(Xi – X̄)(Zi – Z̄)]
       = [Σ Yi(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]   (sums over i = 1,…,n; the 1/(n–1) factors cancel)

Substitute in Yi = β0 + β1Xi + ui and simplify:

β̂1TSLS = [β1 Σ Xi(Zi – Z̄) + Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]

so…

12-21
β̂1TSLS = β1 + [Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)],

so β̂1TSLS – β1 = [Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]

Multiply through by √n:

√n(β̂1TSLS – β1) = [(1/√n) Σ (Zi – Z̄)ui] / [(1/n) Σ Xi(Zi – Z̄)]

12-22
√n(β̂1TSLS – β1) = [(1/√n) Σ (Zi – Z̄)ui] / [(1/n) Σ Xi(Zi – Z̄)]

• (1/n) Σ Xi(Zi – Z̄) = (1/n) Σ (Xi – X̄)(Zi – Z̄) →p cov(X,Z) ≠ 0
• (1/√n) Σ (Zi – Z̄)ui is distributed N(0, var[(Z – µZ)u]) (CLT)

so: β̂1TSLS is approximately distributed N(β1, σ²β̂1TSLS), where

σ²β̂1TSLS = (1/n) × var[(Zi – µZ)ui] / [cov(Zi,Xi)]²

and cov(X,Z) ≠ 0 because the instrument is relevant

12-23
Inference using TSLS
β̂1TSLS is approximately distributed N(β1, σ²β̂1TSLS).

• Statistical inference proceeds in the usual way.


• The justification is (as usual) based on large samples
• This all assumes that the instruments are valid – we’ll
discuss what happens if they aren’t valid shortly.
• Important note on standard errors:
o The OLS standard errors from the second stage regression aren’t right – they don’t take into account the estimation in the first stage (X̂i is estimated).

o Instead, use a single specialized command that computes


the TSLS estimator and the correct SEs.
o as usual, use heteroskedasticity-robust SEs

12-24
Example: Cigarette demand

ln(Qi^cigarettes) = β0 + β1 ln(Pi^cigarettes) + ui

Panel data:
• Annual cigarette consumption and average prices paid
(including tax)
• 48 continental US states, 1985-1995
Proposed instrumental variable:
• Zi = general sales tax per pack in the state = SalesTaxi
• Is this a valid instrument?
(1) Relevant? corr(SalesTaxi, ln(Pi^cigarettes)) ≠ 0?
(2) Exogenous? corr(SalesTaxi,ui) = 0?

12-25
Cigarette demand

For now, use data from 1995 only.

First stage OLS regression:
ln(P̂i^cigarettes) = 4.63 + .031 SalesTaxi, n = 48

Second stage OLS regression:
ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48

Combined (TSLS) regression with correct, heteroskedasticity-robust standard errors:
ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48
(1.53) (0.32)
12-26
STATA Example: Cigarette demand, First stage
Instrument = Z = rtaxso = general sales tax (real $/pack)

X Z
. reg lravgprs rtaxso if year==1995, r;

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 40.39
Prob > F = 0.0000
R-squared = 0.4710
Root MSE = .09394

------------------------------------------------------------------------------
| Robust
lravgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rtaxso | .0307289 .0048354 6.35 0.000 .0209956 .0404621
_cons | 4.616546 .0289177 159.64 0.000 4.558338 4.674755
------------------------------------------------------------------------------

X-hat
. predict lravphat; Now we have the predicted values from the 1st stage

12-27
Second stage
Y X-hat
. reg lpackpc lravphat if year==1995, r;

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 10.54
Prob > F = 0.0022
R-squared = 0.1525
Root MSE = .22645

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------
lravphat | -1.083586 .3336949 -3.25 0.002 -1.755279 -.4118932
_cons | 9.719875 1.597119 6.09 0.000 6.505042 12.93471
------------------------------------------------------------------------------

• These coefficients are the TSLS estimates


• The standard errors are wrong because they ignore the fact
that the first stage was estimated

12-28
Combined into a single command:
Y X Z
. ivreg lpackpc (lravgprs = rtaxso) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 1, 46) = 11.54
Prob > F = 0.0014
R-squared = 0.4011
Root MSE = .19035

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.083587 .3189183 -3.40 0.001 -1.725536 -.4416373
_cons | 9.719876 1.528322 6.36 0.000 6.643525 12.79623
------------------------------------------------------------------------------
Instrumented: lravgprs This is the endogenous regressor
Instruments: rtaxso This is the instrumental variable
------------------------------------------------------------------------------

OK, the change in the SEs was small this time...but not always!

ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48
(1.53) (0.32)
12-29
Summary of IV Regression with a Single X and Z

• A valid instrument Z must satisfy two conditions:


(1) relevance: corr(Zi,Xi) ≠ 0
(2) exogeneity: corr(Zi,ui) = 0
• TSLS proceeds by first regressing X on Z to get X̂ , then
regressing Y on X̂ .
• The key idea is that the first stage isolates part of the
variation in X that is uncorrelated with u
• If the instrument is valid, then the large-sample sampling
distribution of the TSLS estimator is normal, so inference
proceeds as usual

12-30
The General IV Regression Model

• So far we have considered IV regression with a single


endogenous regressor (X) and a single instrument (Z).
• We need to extend this to:
o multiple endogenous regressors (X1,…,Xk)
o multiple included exogenous variables (W1,…,Wr)
These need to be included for the usual OV reason
o multiple instrumental variables (Z1,…,Zm)
More (relevant) instruments can produce a smaller
variance of TSLS: the R2 of the first stage increases,
so you have more variation in X̂ .
• Terminology: identification & overidentification

12-31
Identification

• In general, a parameter is said to be identified if different


values of the parameter would produce different
distributions of the data.
• In IV regression, whether the coefficients are identified
depends on the relation between the number of instruments
(m) and the number of endogenous regressors (k)
• Intuitively, if there are fewer instruments than endogenous
regressors, we can’t estimate β1,…,βk
o For example, suppose k = 1 but m = 0 (no instruments)!

12-32
Identification, ctd.

The coefficients β1,…, βk are said to be:


• exactly identified if m = k.
There are just enough instruments to estimate β1,…,βk.
• overidentified if m > k.
There are more than enough instruments to estimate
β1,…,βk. If so, you can test whether the instruments are
valid (a test of the “overidentifying restrictions”) – we’ll
return to this later
• underidentified if m < k.
There are too few instruments to estimate β1,…,βk. If so,
you need to get more instruments!

12-33
The general IV regression model: Summary of jargon

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui

• Yi is the dependent variable


• X1i,…, Xki are the endogenous regressors (potentially
correlated with ui)
• W1i,…,Wri are the included exogenous variables or
included exogenous regressors (uncorrelated with ui)
• β0, β1,…, βk+r are the unknown regression coefficients
• Z1i,…,Zmi are the m instrumental variables (the excluded
exogenous variables)
• The coefficients are overidentified if m > k; exactly
identified if m = k; and underidentified if m < k.
12-34
TSLS with a single endogenous regressor
Yi = β0 + β1X1i + β2W1i + … + β1+rWri + ui

• m instruments: Z1i,…, Zmi
• First stage
o Regress X1 on all the exogenous regressors: regress X1
on W1,…,Wr, Z1,…, Zm by OLS
o Compute the predicted values X̂1i, i = 1,…,n

• Second stage
o Regress Y on X̂1, W1,…, Wr by OLS
o The coefficients from this second stage regression are
the TSLS estimators, but SEs are wrong
• To get correct SEs, do this in a single step

12-35
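The first-stage/second-stage recipe with an included W can be sketched in pure Python. The small normal-equations solver and all design values here are illustrative assumptions, and, as the slide stresses, in practice you would run a single IV command so the SEs come out right.

```python
# Sketch of TSLS with one endogenous X, one included exogenous W, and
# one instrument Z, done as two explicit OLS stages via the normal
# equations. Pure standard library; all values are illustrative
# assumptions.
import random

def ols(cols, y):
    """OLS coefficients of y on the given columns (include a constant
    column yourself). Solves X'Xb = X'y by Gaussian elimination."""
    k, n = len(cols), len(y)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    for p in range(k):                      # forward elimination w/ pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for cidx in range(p, k):
                A[r][cidx] -= f * A[p][cidx]
            b[r] -= f * b[p]
    coef = [0.0] * k                        # back substitution
    for p in range(k - 1, -1, -1):
        coef[p] = (b[p] - sum(A[p][j] * coef[j] for j in range(p + 1, k))) / A[p][p]
    return coef

random.seed(3)
n = 10_000
ones = [1.0] * n
Z = [random.gauss(0, 1) for _ in range(n)]
W = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
X = [z + 0.5 * w + ci + random.gauss(0, 1) for z, w, ci in zip(Z, W, c)]
u = [ci + random.gauss(0, 1) for ci in c]
Y = [1.0 + 2.0 * x - 1.0 * w + ui for x, w, ui in zip(X, W, u)]  # beta1 = 2

# First stage: regress X on constant, W, Z; form X-hat.
g0, g1, g2 = ols([ones, W, Z], X)
X_hat = [g0 + g1 * w + g2 * z for w, z in zip(W, Z)]

# Second stage: regress Y on constant, X-hat, W; the coefficient on
# X-hat is the TSLS estimate of beta1.
b0, beta1_tsls, b2 = ols([ones, X_hat, W], Y)
print(beta1_tsls)
```

Note that W appears in both stages: it must be in the first stage (it is an exogenous regressor) and in the second (it belongs in the equation of interest).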
Example: Demand for cigarettes

ln(Qi^cigarettes) = β0 + β1 ln(Pi^cigarettes) + β2 ln(Incomei) + ui

Z1i = general sales taxi


Z2i = cigarette-specific taxi

• Endogenous variable: ln(Pi^cigarettes) (“one X”)
• Included exogenous variable: ln(Incomei) (“one W”)
• Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (“two Zs”)
• Is the demand elasticity β1 overidentified, exactly identified,
or underidentified?

12-36
Example: Cigarette demand, one instrument
Y W X Z
. ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 8.19
Prob > F = 0.0009
R-squared = 0.4189
Root MSE = .18957

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.143375 .3723025 -3.07 0.004 -1.893231 -.3935191
lperinc | .214515 .3117467 0.69 0.495 -.413375 .842405
_cons | 9.430658 1.259392 7.49 0.000 6.894112 11.9672
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso STATA lists ALL the exogenous regressors
as instruments – slightly different
terminology than we have been using
------------------------------------------------------------------------------
• Running IV as a single command yields correct SEs
• Use , r for heteroskedasticity-robust SEs
12-37
Example: Cigarette demand, two instruments
Y W X Z1 Z2
. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 16.17
Prob > F = 0.0000
R-squared = 0.4294
Root MSE = .18786

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.277424 .2496099 -5.12 0.000 -1.780164 -.7746837
lperinc | .2804045 .2538894 1.10 0.275 -.230955 .7917641
_cons | 9.894955 .9592169 10.32 0.000 7.962993 11.82692
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso rtax STATA lists ALL the exogenous regressors
as “instruments” – slightly different
terminology than we have been using
------------------------------------------------------------------------------

12-38
TSLS estimates, Z = sales tax (m = 1)
ln(Q̂i^cigarettes) = 9.43 – 1.14 ln(P̂i^cigarettes) + 0.21 ln(Incomei)
(1.26) (0.37) (0.31)

TSLS estimates, Z = sales tax, cig-only tax (m = 2)
ln(Q̂i^cigarettes) = 9.89 – 1.28 ln(P̂i^cigarettes) + 0.28 ln(Incomei)
(0.96) (0.25) (0.25)

• Smaller SEs for m = 2. Using 2 instruments gives more


information – more “as-if random variation”.
• Low income elasticity (not a luxury good); income elasticity
not statistically significantly different from 0
• Surprisingly high price elasticity

12-39
The General Instrument Validity Assumptions

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui


(1) Instrument exogeneity: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0
(2) Instrument relevance: General case, multiple X’s
Suppose the second stage regression could be run using
the predicted values from the population first stage
regression. Then: there is no perfect multicollinearity in
this (infeasible) second stage regression.
• Multicollinearity interpretation…
• Special case of one X: the general assumption is
equivalent to (a) at least one instrument must enter the
population counterpart of the first stage regression, and
(b) the W’s are not perfectly multicollinear.
12-40
The IV Regression Assumptions

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui


1. E(ui|W1i,…,Wri) = 0
• #1 says “the exogenous regressors are exogenous.”
2. (Yi,X1i,…,Xki,W1i,…,Wri,Z1i,…,Zmi) are i.i.d.
• #2 is not new
3. The X’s, W’s, Z’s, and Y have nonzero, finite 4th moments
• #3 is not new
4. The instruments (Z1i,…,Zmi) are valid.
• We have discussed this

• Under 1-4, TSLS and its t-statistic are normally distributed


• The critical requirement is that the instruments be valid…
12-41
Checking Instrument Validity

Recall the two requirements for valid instruments:


1. Relevance (special case of one X)
At least one instrument must enter the population
counterpart of the first stage regression.
2. Exogeneity
All the instruments must be uncorrelated with the error
term: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0

What happens if one of these requirements isn’t satisfied?


How can you check? What do you do?
If you have multiple instruments, which should you use?

12-42
Checking Assumption #1: Instrument Relevance

We will focus on a single included endogenous regressor:


Yi = β0 + β1Xi + β2W1i + … + β1+rWri + ui

First stage regression:


Xi = π0 + π1Z1i + … + πmZmi + πm+1W1i + … + πm+rWri + vi

• The instruments are relevant if at least one of π1,…,πm are


nonzero.
• The instruments are said to be weak if all the π1,…,πm are
either zero or nearly zero.
• Weak instruments explain very little of the variation in X,
beyond that explained by the W’s
12-43
What are the consequences of weak instruments?

If instruments are weak, the sampling distribution of TSLS


and its t-statistic are not (at all) normal, even with n large.
Consider the simplest case:
Yi = β0 + β1Xi + ui
Xi = π0 + π1Zi + vi
• The IV estimator is β̂1TSLS = sYZ / sXZ
• If cov(X,Z) is zero or small, then sXZ will be small: With
weak instruments, the denominator is nearly zero.
• If so, the sampling distribution of β̂1TSLS (and its t-statistic) is not well approximated by its large-n normal approximation…
12-44
An example: the sampling distribution of the TSLS
t-statistic with weak instruments

Dark line = irrelevant instruments


Dashed light line = strong instruments
12-45
Why does our trusty normal approximation fail us?
β̂1TSLS = sYZ / sXZ
• If cov(X,Z) is small, small changes in sXZ (from one sample to the next) can induce big changes in β̂1TSLS

• Suppose in one sample you calculate sXZ = .00001...


• Thus the large-n normal approximation is a poor approximation to the sampling distribution of β̂1TSLS

• A better approximation is that β̂1TSLS is distributed as the


ratio of two correlated normal random variables (see SW
App. 12.4)
• If instruments are weak, the usual methods of inference are
unreliable – potentially very unreliable.

12-46
Measuring the strength of instruments in practice:
The first-stage F-statistic

• The first stage regression (one X):


Regress X on Z1,..,Zm,W1,…,Wk.
• Totally irrelevant instruments ⇔ all the coefficients on
Z1,…,Zm are zero.
• The first-stage F-statistic tests the hypothesis that Z1,…,Zm
do not enter the first stage regression.
• Weak instruments imply a small first stage F-statistic.

12-47
Checking for weak instruments with a single X

• Compute the first-stage F-statistic.


Rule-of-thumb: If the first stage F-statistic is less than
10, then the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical
inferences (standard errors, hypothesis tests, confidence
intervals) can be misleading.
• Note that simply rejecting the null hypothesis that the
coefficients on the Z’s are zero isn’t enough – you actually
need substantial predictive content for the normal
approximation to be a good one.
• There are more sophisticated things to do than just compare
F to 10 but they are beyond this course.
12-48
What to do if you have weak instruments?

• Get better instruments (!)

• If you have many instruments, some are probably weaker


than others and it’s a good idea to drop the weaker ones
(dropping an irrelevant instrument will increase the first-
stage F)

12-49
Estimation with weak instruments

• There are no consistent estimators if instruments are weak


or irrelevant.
• However, some estimators have a distribution more
centered around β1 than does TSLS
• One such estimator is the limited information maximum
likelihood estimator (LIML)
• The LIML estimator
o can be derived as a maximum likelihood estimator

12-50
Checking Assumption #2: Instrument Exogeneity

• Instrument exogeneity: All the instruments are


uncorrelated with the error term: corr(Z1i,ui) = 0,…,
corr(Zmi,ui) = 0
• If the instruments are correlated with the error term, the
first stage of TSLS doesn’t successfully isolate a
component of X that is uncorrelated with the error term, so
X̂ is correlated with u and TSLS is inconsistent.
• If there are more instruments than endogenous regressors,
it is possible to test – partially – for instrument
exogeneity.

12-51
Testing overidentifying restrictions

Consider the simplest case:


Yi = β0 + β1Xi + ui,

• Suppose there are two valid instruments: Z1i, Z2i


• Then you could compute two separate TSLS estimates.
• Intuitively, if these 2 TSLS estimates are very different
from each other, then something must be wrong: one or the
other (or both) of the instruments must be invalid.
• The J-test of overidentifying restrictions makes this
comparison in a statistically precise way.
• This can only be done if #Z’s > #X’s (overidentified).

12-52
Suppose #instruments = m > # X’s = k (overidentified)
Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui

The J-test of overidentifying restrictions


The J-test is the Anderson-Rubin test, using the TSLS
estimator instead of the hypothesized value β1,0. The recipe:
1. First estimate the equation of interest using TSLS and all m instruments; compute the predicted values Ŷi, using the actual X’s (not the X̂’s used to estimate the second stage)
2. Compute the residuals ûi = Yi – Ŷi
3. Regress ûi against Z1i,…,Zmi, W1i,…,Wri
4. Compute the F-statistic testing the hypothesis that the
coefficients on Z1i,…,Zmi are all zero;
5. The J-statistic is J = mF
12-53
J = mF, where F = the F-statistic testing the coefficients
on Z1i,…,Zmi in a regression of the TSLS residuals against
Z1i,…,Zmi, W1i,…,Wri.

Distribution of the J-statistic


• Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with m–k degrees of freedom
• If m = k, J = 0 (does this make sense?)
• If some instruments are exogenous and others are
endogenous, the J statistic will be large, and the null
hypothesis that all instruments are exogenous will be
rejected.

12-54
Checking Instrument Validity: Summary

The two requirements for valid instruments:

1. Relevance (special case of one X)


• At least one instrument must enter the population
counterpart of the first stage regression.
• If instruments are weak, then the TSLS estimator is biased and its t-statistic has a non-normal distribution
• To check for weak instruments with a single included
endogenous regressor, check the first-stage F
o If F>10, instruments are strong – use TSLS
o If F<10, weak instruments – take some action

12-55
2. Exogeneity

• All the instruments must be uncorrelated with the error


term: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0
• We can partially test for exogeneity: if m>1, we can test
the hypothesis that all are exogenous, against the
alternative that as many as m–1 are endogenous
(correlated with u)
• The test is the J-test, constructed using the TSLS
residuals.

12-56
