IV Notes1-2

The document discusses Instrumental Variables (IV) regression, which addresses biases in estimating causal relationships, specifically omitted variable bias, simultaneous causality bias, and errors-in-variables bias. It outlines the Two Stage Least Squares (TSLS) method for estimating parameters using a valid instrument that meets relevance and exogeneity conditions. Examples illustrate the application of IV regression in contexts like demand for butter and cigarette consumption, emphasizing the importance of valid instruments for accurate estimation.

Instrumental Variables Regression

Three important threats to internal validity are:


• omitted variable bias from a variable that is correlated
with X but is unobserved, so cannot be included in the
regression;
• simultaneous causality bias (X causes Y, Y causes X);
• errors-in-variables bias (X is measured with error)

Instrumental variables regression can eliminate bias when E(u|X) ≠ 0 by using an instrumental variable, Z.

12-2
IV Regression with One Regressor and One Instrument

Yi = β0 + β1Xi + ui

• IV regression breaks X into two parts: a part that might be


correlated with u, and a part that is not. By isolating the
part that is not correlated with u, it is possible to estimate
β1.
• This is done using an instrumental variable, Zi, which is
uncorrelated with ui.
• The instrumental variable detects movements in Xi that are
uncorrelated with ui, and uses these to estimate β1.

12-3
Terminology: endogeneity and exogeneity

An endogenous variable is one that is correlated with u


An exogenous variable is one that is uncorrelated with u

Historical note: “Endogenous” literally means


“determined within the system,” that is, a variable that is jointly determined with Y and thus subject to simultaneous causality. However, this definition is narrow: IV regression can also be used to address omitted variable bias and errors-in-variables bias, not just simultaneous causality bias.

12-4
Two conditions for a valid instrument

Yi = β0 + β1Xi + ui

For an instrumental variable (an “instrument”) Z to be valid,


it must satisfy two conditions:
1. Instrument relevance: corr(Zi,Xi) ≠ 0
2. Instrument exogeneity: corr(Zi,ui) = 0

Suppose for now that you have such a Zi (we’ll discuss how
to find instrumental variables later).

How can you use Zi to estimate β1?

12-5
The IV Estimator, one X and one Z

Explanation #1: Two Stage Least Squares (TSLS)


As it sounds, TSLS has two stages – two regressions:
(1) First isolates the part of X that is uncorrelated with u:
regress X on Z using OLS

Xi = π0 + π1Zi + vi (1)

• Because Zi is uncorrelated with ui, π0 + π1Zi is uncorrelated with ui. We don’t know π0 or π1, but we can estimate them, so…
• Compute the predicted values of Xi, X̂i, where X̂i = π̂0 + π̂1Zi, i = 1,…,n.
12-6
Two Stage Least Squares

(2) Replace Xi by X̂i in the regression of interest:
regress Y on X̂i using OLS:

Yi = β0 + β1X̂i + ui (2)

• Because X̂i is uncorrelated with ui (if n is large), the first least squares assumption holds (if n is large)
• Thus β1 can be estimated by OLS using regression (2)
• This argument relies on large samples (so π0 and π1 are well estimated using regression (1))
• The resulting estimator is called the Two Stage Least Squares (TSLS) estimator, β̂1TSLS.

12-7
Two Stage Least Squares

Suppose you have a valid instrument, Zi.

Stage 1: Regress Xi on Zi, obtain the predicted values X̂i

Stage 2: Regress Yi on X̂i; the coefficient on X̂i is the TSLS estimator, β̂1TSLS.

β̂1TSLS is a consistent estimator of β1.

12-8
The IV Estimator, one X and one Z
Explanation #2: a little algebra…

Yi = β0 + β1Xi + ui
Thus,
cov(Yi,Zi) = cov(β0 + β1Xi + ui,Zi)
= cov(β0,Zi) + cov(β1Xi,Zi) + cov(ui,Zi)
= 0 + cov(β1Xi,Zi) + 0
= β1cov(Xi,Zi)

where cov(ui,Zi) = 0 (instrument exogeneity); thus

β1 = cov(Yi,Zi) / cov(Xi,Zi)
12-9
The IV Estimator, one X and one Z

β1 = cov(Yi,Zi) / cov(Xi,Zi)

The IV estimator replaces these population covariances with


sample covariances:

β̂1TSLS = sYZ / sXZ,

sYZ and sXZ are the sample covariances. This is the TSLS
estimator – just a different derivation!

12-10
Consistency of the TSLS estimator

β̂1TSLS = sYZ / sXZ

The sample covariances are consistent: sYZ →p cov(Y,Z) and sXZ →p cov(X,Z). Thus,

β̂1TSLS = sYZ / sXZ →p cov(Y,Z) / cov(X,Z) = β1

• The instrument relevance condition, cov(X,Z) ≠ 0, ensures


that you don’t divide by zero.
12-11
Example #1: Supply and demand for butter

IV regression was originally developed to estimate demand


elasticities for agricultural goods, for example butter:

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

• β1 = price elasticity of butter = percent change in quantity


for a 1% change in price (recall log-log specification
discussion)
• Data: observations on price and quantity of butter for
different years
• The OLS regression of ln(Qi^butter) on ln(Pi^butter) suffers from simultaneous causality bias (why?)
12-12
Simultaneous causality bias in the OLS regression of ln(Qi^butter) on ln(Pi^butter) arises because price and quantity are determined by the interaction of demand and supply

12-13
This interaction of demand and supply produces…

Would a regression using these data produce the demand


curve?

12-14
But…what would you get if only supply shifted?

• TSLS estimates the demand curve by isolating shifts in


price and quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.
12-15
TSLS in the supply-demand example:

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

Let Z = rainfall in dairy-producing regions.


Is Z a valid instrument?
(1) Exogenous? corr(raini,ui) = 0?
Plausibly: whether it rains in dairy-producing regions
shouldn’t affect demand
(2) Relevant? corr(raini, ln(Pi^butter)) ≠ 0?
Plausibly: insufficient rainfall means less grazing
means less butter

12-16
TSLS in the supply-demand example, ctd.

ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui

Zi = raini = rainfall in dairy-producing regions.

Stage 1: regress ln(Pi^butter) on raini, get the predicted values ln(P̂i^butter)
ln(P̂i^butter) isolates changes in log price that arise from supply (part of supply, at least)

Stage 2: regress ln(Qi^butter) on ln(P̂i^butter)

This is the regression counterpart of using shifts in the supply curve to trace out the demand curve.
12-17
Example #2: Test scores and class size

• The California regressions still could have OV bias (e.g.


parental involvement).
• This bias could be eliminated by using IV regression
(TSLS).
• IV regression requires a valid instrument, that is, an
instrument that is:
(1) relevant: corr(Zi,STRi) ≠ 0
(2) exogenous: corr(Zi,ui) = 0

12-18
Example #2: Test scores and class size
Here is a (hypothetical) instrument:
• some districts, randomly hit by an earthquake, “double up”
classrooms:
Zi = Quakei = 1 if hit by quake, = 0 otherwise
• Do the two conditions for a valid instrument hold?
• The earthquake makes it as if the districts were in a random
assignment experiment. Thus the variation in STR arising
from the earthquake is exogenous.
• The first stage of TSLS regresses STR against Quake,
thereby isolating the part of STR that is exogenous (the part
that is “as if” randomly assigned)

12-19
Inference using TSLS

• In large samples, the sampling distribution of the TSLS


estimator is normal
• Inference (hypothesis tests, confidence intervals) proceeds
in the usual way, e.g. ± 1.96SE
• The idea behind the large-sample normal distribution of the
TSLS estimator is that – like all the other estimators we
have considered – it involves an average of mean zero i.i.d.
random variables, to which we can apply the CLT.
• Here is a sketch of the math

12-20
β̂1TSLS = sYZ / sXZ = [Σ(Yi – Ȳ)(Zi – Z̄)] / [Σ(Xi – X̄)(Zi – Z̄)]
       = [Σ Yi(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]   (sums over i = 1,…,n; the 1/(n–1) factors cancel)

Substitute in Yi = β0 + β1Xi + ui and simplify:

β̂1TSLS = [β1 Σ Xi(Zi – Z̄) + Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]

so…

12-21
β̂1TSLS = β1 + [Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)],

so β̂1TSLS – β1 = [Σ ui(Zi – Z̄)] / [Σ Xi(Zi – Z̄)]

Multiply through by √n:

√n(β̂1TSLS – β1) = [(1/√n) Σ (Zi – Z̄)ui] / [(1/n) Σ Xi(Zi – Z̄)]

12-22
√n(β̂1TSLS – β1) = [(1/√n) Σ (Zi – Z̄)ui] / [(1/n) Σ Xi(Zi – Z̄)]

• (1/n) Σ Xi(Zi – Z̄) = (1/n) Σ (Xi – X̄)(Zi – Z̄) →p cov(X,Z) ≠ 0
• (1/√n) Σ (Zi – Z̄)ui is distributed N(0, var[(Z – µZ)u]) (CLT)

so: β̂1TSLS is approximately distributed N(β1, σ²β̂1TSLS), where

σ²β̂1TSLS = (1/n) × var[(Zi – µZ)ui] / [cov(Zi,Xi)]²

and cov(X,Z) ≠ 0 because the instrument is relevant

12-23
Inference using TSLS
β̂1TSLS is approximately distributed N(β1, σ²β̂1TSLS).

• Statistical inference proceeds in the usual way.


• The justification is (as usual) based on large samples
• This all assumes that the instruments are valid – we’ll
discuss what happens if they aren’t valid shortly.
• Important note on standard errors:
o The OLS standard errors from the second stage regression aren’t right – they don’t take into account the estimation in the first stage (X̂i is estimated).

o Instead, use a single specialized command that computes


the TSLS estimator and the correct SEs.
o as usual, use heteroskedasticity-robust SEs

12-24
Example: Cigarette demand

ln(Qi^cigarettes) = β0 + β1 ln(Pi^cigarettes) + ui

Panel data:
• Annual cigarette consumption and average prices paid
(including tax)
• 48 continental US states, 1985-1995
Proposed instrumental variable:
• Zi = general sales tax per pack in the state = SalesTaxi
• Is this a valid instrument?
(1) Relevant? corr(SalesTaxi, ln(Pi^cigarettes)) ≠ 0?
(2) Exogenous? corr(SalesTaxi,ui) = 0?

12-25
Cigarette demand

For now, use data from 1995 only.

First stage OLS regression:
ln(P̂i^cigarettes) = 4.63 + .031 SalesTaxi, n = 48

Second stage OLS regression:
ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48

Combined (TSLS) regression with correct, heteroskedasticity-robust standard errors:
ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48
(1.53) (0.32)
12-26
STATA Example: Cigarette demand, First stage
Instrument = Z = rtaxso = general sales tax (real $/pack)

X Z
. reg lravgprs rtaxso if year==1995, r;

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 40.39
Prob > F = 0.0000
R-squared = 0.4710
Root MSE = .09394

------------------------------------------------------------------------------
| Robust
lravgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rtaxso | .0307289 .0048354 6.35 0.000 .0209956 .0404621
_cons | 4.616546 .0289177 159.64 0.000 4.558338 4.674755
------------------------------------------------------------------------------

X-hat
. predict lravphat; Now we have the predicted values from the 1st stage

12-27
Second stage
Y X-hat
. reg lpackpc lravphat if year==1995, r;

Regression with robust standard errors Number of obs = 48


F( 1, 46) = 10.54
Prob > F = 0.0022
R-squared = 0.1525
Root MSE = .22645

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------
lravphat | -1.083586 .3336949 -3.25 0.002 -1.755279 -.4118932
_cons | 9.719875 1.597119 6.09 0.000 6.505042 12.93471
------------------------------------------------------------------------------

• These coefficients are the TSLS estimates


• The standard errors are wrong because they ignore the fact
that the first stage was estimated

12-28
Combined into a single command:
Y X Z
. ivreg lpackpc (lravgprs = rtaxso) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 1, 46) = 11.54
Prob > F = 0.0014
R-squared = 0.4011
Root MSE = .19035

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.083587 .3189183 -3.40 0.001 -1.725536 -.4416373
_cons | 9.719876 1.528322 6.36 0.000 6.643525 12.79623
------------------------------------------------------------------------------
Instrumented: lravgprs This is the endogenous regressor
Instruments: rtaxso This is the instrumental variable
------------------------------------------------------------------------------

OK, the change in the SEs was small this time...but not always!

ln(Q̂i^cigarettes) = 9.72 – 1.08 ln(P̂i^cigarettes), n = 48
(1.53) (0.32)
12-29
Summary of IV Regression with a Single X and Z

• A valid instrument Z must satisfy two conditions:


(1) relevance: corr(Zi,Xi) ≠ 0
(2) exogeneity: corr(Zi,ui) = 0
• TSLS proceeds by first regressing X on Z to get X̂ , then
regressing Y on X̂ .
• The key idea is that the first stage isolates part of the
variation in X that is uncorrelated with u
• If the instrument is valid, then the large-sample sampling
distribution of the TSLS estimator is normal, so inference
proceeds as usual

12-30
The General IV Regression Model

• So far we have considered IV regression with a single


endogenous regressor (X) and a single instrument (Z).
• We need to extend this to:
o multiple endogenous regressors (X1,…,Xk)
o multiple included exogenous variables (W1,…,Wr)
These need to be included for the usual OV reason
o multiple instrumental variables (Z1,…,Zm)
More (relevant) instruments can produce a smaller
variance of TSLS: the R2 of the first stage increases,
so you have more variation in X̂ .
• Terminology: identification & overidentification

12-31
Identification

• In general, a parameter is said to be identified if different


values of the parameter would produce different
distributions of the data.
• In IV regression, whether the coefficients are identified
depends on the relation between the number of instruments
(m) and the number of endogenous regressors (k)
• Intuitively, if there are fewer instruments than endogenous
regressors, we can’t estimate β1,…,βk
o For example, suppose k = 1 but m = 0 (no instruments)!

12-32
Identification, ctd.

The coefficients β1,…, βk are said to be:


• exactly identified if m = k.
There are just enough instruments to estimate β1,…,βk.
• overidentified if m > k.
There are more than enough instruments to estimate
β1,…,βk. If so, you can test whether the instruments are
valid (a test of the “overidentifying restrictions”) – we’ll
return to this later
• underidentified if m < k.
There are too few instruments to estimate β1,…,βk. If so,
you need to get more instruments!

12-33
The general IV regression model: Summary of jargon

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui

• Yi is the dependent variable


• X1i,…, Xki are the endogenous regressors (potentially
correlated with ui)
• W1i,…,Wri are the included exogenous variables or
included exogenous regressors (uncorrelated with ui)
• β0, β1,…, βk+r are the unknown regression coefficients
• Z1i,…,Zmi are the m instrumental variables (the excluded
exogenous variables)
• The coefficients are overidentified if m > k; exactly
identified if m = k; and underidentified if m < k.
12-34
TSLS with a single endogenous regressor
Yi = β0 + β1X1i + β2W1i + … + β1+rWri + ui

• m instruments: Z1i,…, Zmi
• First stage
o Regress X1 on all the exogenous regressors: regress X1
on W1,…,Wr, Z1,…, Zm by OLS
o Compute the predicted values X̂1i, i = 1,…,n

• Second stage
o Regress Y on X̂1, W1,…, Wr by OLS
o The coefficients from this second stage regression are
the TSLS estimators, but SEs are wrong
• To get correct SEs, do this in a single step

12-35
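The first-stage/second-stage recipe with an included W can be sketched in pure Python. The small normal-equations solver and all design values here are illustrative assumptions, and, as the slide stresses, in practice you would run a single IV command so the SEs come out right.

```python
# Sketch of TSLS with one endogenous X, one included exogenous W, and
# one instrument Z, done as two explicit OLS stages via the normal
# equations. Pure standard library; all values are illustrative
# assumptions.
import random

def ols(cols, y):
    """OLS coefficients of y on the given columns (include a constant
    column yourself). Solves X'Xb = X'y by Gaussian elimination."""
    k, n = len(cols), len(y)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    for p in range(k):                      # forward elimination w/ pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for cidx in range(p, k):
                A[r][cidx] -= f * A[p][cidx]
            b[r] -= f * b[p]
    coef = [0.0] * k                        # back substitution
    for p in range(k - 1, -1, -1):
        coef[p] = (b[p] - sum(A[p][j] * coef[j] for j in range(p + 1, k))) / A[p][p]
    return coef

random.seed(3)
n = 10_000
ones = [1.0] * n
Z = [random.gauss(0, 1) for _ in range(n)]
W = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
X = [z + 0.5 * w + ci + random.gauss(0, 1) for z, w, ci in zip(Z, W, c)]
u = [ci + random.gauss(0, 1) for ci in c]
Y = [1.0 + 2.0 * x - 1.0 * w + ui for x, w, ui in zip(X, W, u)]  # beta1 = 2

# First stage: regress X on constant, W, Z; form X-hat.
g0, g1, g2 = ols([ones, W, Z], X)
X_hat = [g0 + g1 * w + g2 * z for w, z in zip(W, Z)]

# Second stage: regress Y on constant, X-hat, W; the coefficient on
# X-hat is the TSLS estimate of beta1.
b0, beta1_tsls, b2 = ols([ones, X_hat, W], Y)
print(beta1_tsls)
```

Note that W appears in both stages: it must be in the first stage (it is an exogenous regressor) and in the second (it belongs in the equation of interest).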
Example: Demand for cigarettes

ln(Qi^cigarettes) = β0 + β1 ln(Pi^cigarettes) + β2 ln(Incomei) + ui

Z1i = general sales taxi


Z2i = cigarette-specific taxi

• Endogenous variable: ln(Pi^cigarettes) (“one X”)
• Included exogenous variable: ln(Incomei) (“one W”)
• Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (“two Zs”)
• Is the demand elasticity β1 overidentified, exactly identified,
or underidentified?

12-36
Example: Cigarette demand, one instrument
Y W X Z
. ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 8.19
Prob > F = 0.0009
R-squared = 0.4189
Root MSE = .18957

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.143375 .3723025 -3.07 0.004 -1.893231 -.3935191
lperinc | .214515 .3117467 0.69 0.495 -.413375 .842405
_cons | 9.430658 1.259392 7.49 0.000 6.894112 11.9672
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso STATA lists ALL the exogenous regressors
as instruments – slightly different
terminology than we have been using
------------------------------------------------------------------------------
• Running IV as a single command yields correct SEs
• Use , r for heteroskedasticity-robust SEs
12-37
Example: Cigarette demand, two instruments
Y W X Z1 Z2
. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 16.17
Prob > F = 0.0000
R-squared = 0.4294
Root MSE = .18786

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.277424 .2496099 -5.12 0.000 -1.780164 -.7746837
lperinc | .2804045 .2538894 1.10 0.275 -.230955 .7917641
_cons | 9.894955 .9592169 10.32 0.000 7.962993 11.82692
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso rtax STATA lists ALL the exogenous regressors
as “instruments” – slightly different
terminology than we have been using
------------------------------------------------------------------------------

12-38
TSLS estimates, Z = sales tax (m = 1)
ln(Q̂i^cigarettes) = 9.43 – 1.14 ln(P̂i^cigarettes) + 0.21 ln(Incomei)
(1.26) (0.37) (0.31)

TSLS estimates, Z = sales tax, cig-only tax (m = 2)
ln(Q̂i^cigarettes) = 9.89 – 1.28 ln(P̂i^cigarettes) + 0.28 ln(Incomei)
(0.96) (0.25) (0.25)

• Smaller SEs for m = 2. Using 2 instruments gives more


information – more “as-if random variation”.
• Low income elasticity (not a luxury good); income elasticity
not statistically significantly different from 0
• Surprisingly high price elasticity

12-39
The General Instrument Validity Assumptions

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui


(1) Instrument exogeneity: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0
(2) Instrument relevance: General case, multiple X’s
Suppose the second stage regression could be run using
the predicted values from the population first stage
regression. Then: there is no perfect multicollinearity in
this (infeasible) second stage regression.
• Multicollinearity interpretation…
• Special case of one X: the general assumption is
equivalent to (a) at least one instrument must enter the
population counterpart of the first stage regression, and
(b) the W’s are not perfectly multicollinear.
12-40
The IV Regression Assumptions

Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui


1. E(ui|W1i,…,Wri) = 0
• #1 says “the exogenous regressors are exogenous.”
2. (Yi,X1i,…,Xki,W1i,…,Wri,Z1i,…,Zmi) are i.i.d.
• #2 is not new
3. The X’s, W’s, Z’s, and Y have nonzero, finite 4th moments
• #3 is not new
4. The instruments (Z1i,…,Zmi) are valid.
• We have discussed this

• Under 1-4, TSLS and its t-statistic are normally distributed


• The critical requirement is that the instruments be valid…
12-41
Checking Instrument Validity

Recall the two requirements for valid instruments:


1. Relevance (special case of one X)
At least one instrument must enter the population
counterpart of the first stage regression.
2. Exogeneity
All the instruments must be uncorrelated with the error
term: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0

What happens if one of these requirements isn’t satisfied?


How can you check? What do you do?
If you have multiple instruments, which should you use?

12-42
Checking Assumption #1: Instrument Relevance

We will focus on a single included endogenous regressor:


Yi = β0 + β1Xi + β2W1i + … + β1+rWri + ui

First stage regression:


Xi = π0 + π1Z1i + … + πmZmi + πm+1W1i + … + πm+rWri + vi

• The instruments are relevant if at least one of π1,…,πm are


nonzero.
• The instruments are said to be weak if all the π1,…,πm are
either zero or nearly zero.
• Weak instruments explain very little of the variation in X,
beyond that explained by the W’s
12-43
What are the consequences of weak instruments?

If instruments are weak, the sampling distribution of TSLS


and its t-statistic are not (at all) normal, even with n large.
Consider the simplest case:
Yi = β0 + β1Xi + ui
Xi = π0 + π1Zi + vi
• The IV estimator is β̂1TSLS = sYZ / sXZ
• If cov(X,Z) is zero or small, then sXZ will be small: With
weak instruments, the denominator is nearly zero.
• If so, the sampling distribution of β̂1TSLS (and its t-statistic) is not well approximated by its large-n normal approximation…
12-44
An example: the sampling distribution of the TSLS
t-statistic with weak instruments

Dark line = irrelevant instruments


Dashed light line = strong instruments
12-45
Why does our trusty normal approximation fail us?
β̂1TSLS = sYZ / sXZ
• If cov(X,Z) is small, small changes in sXZ (from one sample to the next) can induce big changes in β̂1TSLS

• Suppose in one sample you calculate sXZ = .00001...


• Thus the large-n normal approximation is a poor approximation to the sampling distribution of β̂1TSLS

• A better approximation is that β̂1TSLS is distributed as the


ratio of two correlated normal random variables (see SW
App. 12.4)
• If instruments are weak, the usual methods of inference are
unreliable – potentially very unreliable.

12-46
Measuring the strength of instruments in practice:
The first-stage F-statistic

• The first stage regression (one X):


Regress X on Z1,..,Zm,W1,…,Wk.
• Totally irrelevant instruments ⇔ all the coefficients on
Z1,…,Zm are zero.
• The first-stage F-statistic tests the hypothesis that Z1,…,Zm
do not enter the first stage regression.
• Weak instruments imply a small first stage F-statistic.

12-47
Checking for weak instruments with a single X

• Compute the first-stage F-statistic.


Rule-of-thumb: If the first stage F-statistic is less than
10, then the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical
inferences (standard errors, hypothesis tests, confidence
intervals) can be misleading.
• Note that simply rejecting the null hypothesis that the
coefficients on the Z’s are zero isn’t enough – you actually
need substantial predictive content for the normal
approximation to be a good one.
• There are more sophisticated things to do than just compare
F to 10 but they are beyond this course.
12-48
What to do if you have weak instruments?

• Get better instruments (!)

• If you have many instruments, some are probably weaker


than others and it’s a good idea to drop the weaker ones
(dropping an irrelevant instrument will increase the first-
stage F)

12-49
Estimation with weak instruments

• There are no consistent estimators if instruments are weak


or irrelevant.
• However, some estimators have a distribution more
centered around β1 than does TSLS
• One such estimator is the limited information maximum
likelihood estimator (LIML)
• The LIML estimator
o can be derived as a maximum likelihood estimator

12-50
Checking Assumption #2: Instrument Exogeneity

• Instrument exogeneity: All the instruments are


uncorrelated with the error term: corr(Z1i,ui) = 0,…,
corr(Zmi,ui) = 0
• If the instruments are correlated with the error term, the
first stage of TSLS doesn’t successfully isolate a
component of X that is uncorrelated with the error term, so
X̂ is correlated with u and TSLS is inconsistent.
• If there are more instruments than endogenous regressors,
it is possible to test – partially – for instrument
exogeneity.

12-51
Testing overidentifying restrictions

Consider the simplest case:


Yi = β0 + β1Xi + ui,

• Suppose there are two valid instruments: Z1i, Z2i


• Then you could compute two separate TSLS estimates.
• Intuitively, if these 2 TSLS estimates are very different
from each other, then something must be wrong: one or the
other (or both) of the instruments must be invalid.
• The J-test of overidentifying restrictions makes this
comparison in a statistically precise way.
• This can only be done if #Z’s > #X’s (overidentified).

12-52
Suppose #instruments = m > # X’s = k (overidentified)
Yi = β0 + β1X1i + … + βkXki + βk+1W1i + … + βk+rWri + ui

The J-test of overidentifying restrictions


The J-test is the Anderson-Rubin test, using the TSLS
estimator instead of the hypothesized value β1,0. The recipe:
1. First estimate the equation of interest using TSLS and all m instruments; compute the predicted values Ŷi, using the actual X’s (not the X̂’s used to estimate the second stage)
2. Compute the residuals ûi = Yi – Ŷi
3. Regress ûi against Z1i,…,Zmi, W1i,…,Wri
4. Compute the F-statistic testing the hypothesis that the
coefficients on Z1i,…,Zmi are all zero;
5. The J-statistic is J = mF
12-53
J = mF, where F = the F-statistic testing the coefficients
on Z1i,…,Zmi in a regression of the TSLS residuals against
Z1i,…,Zmi, W1i,…,Wri.

Distribution of the J-statistic


• Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with m–k degrees of freedom
• If m = k, J = 0 (does this make sense?)
• If some instruments are exogenous and others are
endogenous, the J statistic will be large, and the null
hypothesis that all instruments are exogenous will be
rejected.

12-54
Checking Instrument Validity: Summary

The two requirements for valid instruments:

1. Relevance (special case of one X)


• At least one instrument must enter the population
counterpart of the first stage regression.
• If instruments are weak, then the TSLS estimator is biased and its t-statistic has a non-normal distribution
• To check for weak instruments with a single included
endogenous regressor, check the first-stage F
o If F>10, instruments are strong – use TSLS
o If F<10, weak instruments – take some action

12-55
2. Exogeneity

• All the instruments must be uncorrelated with the error


term: corr(Z1i,ui) = 0,…, corr(Zmi,ui) = 0
• We can partially test for exogeneity: if m>1, we can test
the hypothesis that all are exogenous, against the
alternative that as many as m–1 are endogenous
(correlated with u)
• The test is the J-test, constructed using the TSLS
residuals.

12-56
