Chapter 6: Limited Dependent Variable Models
6.1. The Linear Probability Model
It is among the discrete choice, or dichotomous choice, models: the dependent variable takes only two values, 0 and 1. There are several methods for analyzing regression models where the dependent variable is 0 or 1. The simplest method is to use least squares; in this case the model is called the linear probability model.
The other method assumes an underlying or latent variable $y_i^*$ which we do not observe; what we observe is
$y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise.                                    6.1
This is the idea behind the Logit and Probit models.
In this case the variable $y_i$ is an indicator variable that denotes the occurrence or non-occurrence of an event. For instance, in an analysis of the determinants of unemployment, we have data on each person showing whether or not the person is employed, together with some explanatory variables that determine employment.
In regression form that is written as:
$y_i = \beta'x_i + u_i$                                    6.2
where $E(u_i) = 0$, and the conditional expectation $E(y_i \mid x_i) = \beta'x_i$, which is the probability that the event will occur given $x_i$.
Since $y_i$ takes only two values, 0 and 1, the disturbance in the above equation can also take only two values:
$u_i = 1 - \beta'x_i$ (with probability $\beta'x_i$) and $u_i = -\beta'x_i$ (with probability $1 - \beta'x_i$).
The variance of $u_i$ is therefore
$V(u_i) = \beta'x_i\,(1 - \beta'x_i) = E(y_i \mid x_i)\,[1 - E(y_i \mid x_i)]$                                    6.3
Since the variance of $u_i$ depends on $x_i$, using OLS would result in a heteroskedasticity problem. This problem can be overcome by using the following two-step estimation procedure:
1. Estimate $\hat{y}_i = \hat{\beta}'x_i$ using OLS.                                    6.4
2. Compute $\hat{w}_i = \hat{y}_i(1 - \hat{y}_i)$ and use weighted least squares, i.e., regress
$y_i/\sqrt{\hat{w}_i}$ on $x_i/\sqrt{\hat{w}_i}$                                    6.5
However, the problems with this procedure (least squares or weighted least squares) are:
1. $\hat{w}_i = \hat{y}_i(1 - \hat{y}_i)$ may be negative.
2. The errors $u_i$ are not normally distributed, so there is a problem with the application of the usual tests of significance.
3. The conditional expectation $E(y_i \mid x_i)$ is interpreted as the probability that the event will occur, yet in many cases the fitted value $\hat{\beta}'x_i$ can lie outside the limits (0, 1).
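The first and third problems are easy to see numerically. The sketch below fits a linear probability model by OLS to a small hypothetical data set (all numbers invented for illustration) and shows that the fitted "probabilities" can fall below 0 and above 1:

```python
import math

# Hypothetical data: x = income, y = 1 if the household owns a house.
x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [0, 0, 0, 1, 0, 1, 1, 1]

# Simple-regression OLS in closed form.
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Fitted values: the first is negative, the last exceeds 1.
fitted = [b0 + b1 * xi for xi in x]
print([round(p, 3) for p in fitted])
```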
6.2. The Probit and Logit Models
An alternative approach is to assume the following regression model:
$y_i^* = \beta'x_i + u_i$                                    6.6
where $y_i^*$ is not observed. It is commonly called a latent variable. What we observe is a dummy variable $y_i$ defined by:
$y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise.                                    6.7
The Probit and Logit models differ in the specification of the distribution of the error term $u_i$.
For instance, if the observed dummy variable is whether or not a person is employed, $y_i^*$ would be defined as the 'propensity or ability to find employment.'
Thus,
$P(y_i = 1) = P(u_i > -\beta'x_i) = 1 - F(-\beta'x_i) = F(\beta'x_i)$                                    6.8
where $F$ is the cumulative distribution function of $u_i$ (the last equality uses the symmetry of the distribution).
a) The Probit Model:
The standard normal cumulative distribution function gives:
$P(Y = 1) = \displaystyle\int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \Phi(Z)$                                    6.9
where $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k$.
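Equation 6.9 can be evaluated without statistical tables via the error function; the minimal sketch below uses only the standard library:

```python
import math

def std_normal_cdf(z):
    """Phi(z) of equation 6.9, computed via the error function:
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Phi(0) = 0.5 by symmetry; Phi(1.96) is the familiar 0.975.
print(round(std_normal_cdf(0.0), 4), round(std_normal_cdf(1.96), 4))
```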
b) The Logit Model:
The cumulative logistic function for the Logit model is based on the concept of an odds ratio.
Let the log odds that $Y = 1$ be given by:
$\ln\!\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k = Z$                                    6.10
Solving for the probability that $Y = 1$:
$\dfrac{p}{1-p} = e^{Z}$                                    6.11
$p = (1-p)\,e^{Z} = e^{Z} - p\,e^{Z}$
$p + p\,e^{Z} = e^{Z}$
$p\,(1 + e^{Z}) = e^{Z}$
$p = \dfrac{e^{Z}}{1 + e^{Z}} = \dfrac{1}{1 + e^{-Z}}$                                    6.12
The above logistic probability is simply denoted as $\Lambda(Z)$.
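A minimal sketch of the cumulative logistic function, verifying the log-odds identity of equation 6.10 numerically (the chosen value of z is arbitrary):

```python
import math

def logistic_cdf(z):
    """Cumulative logistic function Lambda(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

z = 0.8  # arbitrary index value
p = logistic_cdf(z)

# Equation 6.10 in reverse: the log odds recover z exactly.
assert abs(math.log(p / (1 - p)) - z) < 1e-12
print(round(p, 4))
```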
Both the Probit and Logit distributions are 'S' shaped, but they differ in the relative thickness of their tails: the logistic distribution has somewhat thicker tails than the normal. This difference would, however, tend to disappear as the sample size gets large.
The relationship between $Z$ and $P(Y=1)$ can be represented through a 'latent' underlying index that determines choices. The latent index function, $Z$, is determined in linear fashion by a set of independent variables $X$. In turn, the latent index $Z$ determines $P(Y=1)$.
The Bernoulli trial of the Probit and Logit models conditional on $Z$ is given by:
$f(Y \mid Z) = P^{Y}\,(1-P)^{1-Y}$                                    6.13
Plugging either the standard normal cumulative distribution function (for Probit) or the cumulative logistic function (for Logit) into the above function gives the appropriate probability function:
$f(Y_i \mid Z) = \Phi(z_i)^{Y_i}\,[1-\Phi(z_i)]^{1-Y_i}$                                    6.14
for the Probit model, and
$f(Y_i \mid Z) = \Lambda(z_i)^{Y_i}\,[1-\Lambda(z_i)]^{1-Y_i}$                                    6.15
for the Logit model, where $z_i = \beta_0 + \beta_1 X_{1i} + \ldots + \beta_k X_{ki}$.
The likelihood function for these models is given by:
$L(\beta_0,\ldots,\beta_k \mid Y_i, X_i) = \prod_{i=1}^{n} \Phi(z_i)^{Y_i}\,[1-\Phi(z_i)]^{1-Y_i}$                                    6.16
for the Probit model, and
$L(\beta_0,\ldots,\beta_k \mid Y_i, X_i) = \prod_{i=1}^{n} \Lambda(z_i)^{Y_i}\,[1-\Lambda(z_i)]^{1-Y_i}$                                    6.17
for the Logit model.
The log likelihood functions of these models are given as:
$\ln L(\beta_0,\ldots,\beta_k \mid Y_i, X_i) = \sum_{i=1}^{n}\left\{Y_i \ln \Phi(z_i) + (1-Y_i)\ln[1-\Phi(z_i)]\right\}$                                    6.18
for Probit, and
$\ln L(\beta_0,\ldots,\beta_k \mid Y_i, X_i) = \sum_{i=1}^{n}\left\{Y_i \ln \Lambda(z_i) + (1-Y_i)\ln[1-\Lambda(z_i)]\right\}$                                    6.19
for Logit.
These functions can be optimized using standard numerical methods to obtain the parameter estimates.
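As an illustration of such optimization, the sketch below applies the Newton-Raphson method to the Logit log likelihood for a tiny invented data set (in practice a statistical package would be used; the data and starting values are hypothetical):

```python
import math

def lam(z):
    """Cumulative logistic function Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: constant term plus one regressor.
X = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0), (1.0, 4.0), (1.0, 5.0), (1.0, 6.0)]
y = [0, 0, 1, 0, 1, 1]

b = [0.0, 0.0]  # (intercept, slope) starting values
for _ in range(25):
    # Gradient of the Logit log likelihood: sum_i (y_i - Lambda_i) x_i
    g = [0.0, 0.0]
    # Information matrix (negative Hessian): sum_i Lambda_i(1-Lambda_i) x_i x_i'
    h = [[0.0, 0.0], [0.0, 0.0]]
    for (x0, x1), yi in zip(X, y):
        p = lam(b[0] * x0 + b[1] * x1)
        g[0] += (yi - p) * x0
        g[1] += (yi - p) * x1
        w = p * (1 - p)
        h[0][0] += w * x0 * x0
        h[0][1] += w * x0 * x1
        h[1][0] += w * x1 * x0
        h[1][1] += w * x1 * x1
    # Newton step: b <- b + H^{-1} g, using the closed-form 2x2 inverse.
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    b[0] += (h[1][1] * g[0] - h[0][1] * g[1]) / det
    b[1] += (-h[1][0] * g[0] + h[0][0] * g[1]) / det

print(round(b[0], 3), round(b[1], 3))
```

At convergence the gradient (the first order conditions) is zero, which is how the estimates can be checked.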
In choosing between the Probit and Logit models, there is no strong statistical theory for preferring one to the other. The two models are quite similar in large samples, although in small samples the estimates can differ noticeably. In practice the choice is made on convenience: Probit probabilities are easy to compute from standard normal (z) tables, while the Logit is mathematically simpler.
The probability model in the form of a regression is:
$E(Y \mid X) = 0\cdot[1 - F(\beta'X)] + 1\cdot F(\beta'X) = F(\beta'X)$                                    6.20
Whatever distribution is used, the parameters of the model, like those of any other nonlinear regression model, are not necessarily the marginal effects:
$\dfrac{\partial E(Y \mid X)}{\partial X} = \left[\dfrac{dF(\beta'X)}{d(\beta'X)}\right]\beta = f(\beta'X)\,\beta$                                    6.21
where $f(\cdot)$ is the density function that corresponds to the cumulative distribution function $F(\cdot)$.
a) For the normal distribution, this is:
$\dfrac{\partial E(Y \mid X)}{\partial X} = \phi(\beta'X)\,\beta$                                    6.22
where $\phi(\cdot)$ is the standard normal density.
b) For the logistic distribution:
$\dfrac{d\Lambda(\beta'X)}{d(\beta'X)} = \dfrac{e^{\beta'X}}{(1 + e^{\beta'X})^2} = \Lambda(\beta'X)\,[1 - \Lambda(\beta'X)]$                                    6.23
so that
$\dfrac{\partial E(Y \mid X)}{\partial X} = \Lambda(\beta'X)\,[1 - \Lambda(\beta'X)]\,\beta$                                    6.24
In interpreting the estimated model, the marginal effects are in most cases evaluated at the means of the regressors. In other instances, other pertinent values are used, based on the choice of the researcher.
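A small sketch of equations 6.22 and 6.24 evaluated at the mean of a single regressor; the coefficient values b0, b1 and the mean xbar below are hypothetical:

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def logistic_cdf(z):
    """Cumulative logistic function Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical estimates: index z = b0 + b1*x, evaluated at the mean of x.
b0, b1, xbar = -1.0, 0.05, 10.0
z = b0 + b1 * xbar

probit_me = std_normal_pdf(z) * b1                        # equation 6.22
logit_me = logistic_cdf(z) * (1 - logistic_cdf(z)) * b1   # equation 6.24
print(round(probit_me, 5), round(logit_me, 5))
```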
For a binary independent variable, say $K$, the marginal effect can be computed as the discrete change:
$\text{Prob}(Y = 1 \mid \bar{X}^{*}, K = 1) - \text{Prob}(Y = 1 \mid \bar{X}^{*}, K = 0)$                                    6.25
where $\bar{X}^{*}$ denotes the means of all the other variables in the model.
The marginal effects can therefore be evaluated at the sample means of the data; alternatively, they can be evaluated at every observation and then averaged to represent the marginal effects.
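Equation 6.25 can be sketched as the following discrete difference; the probit coefficients and the regressor mean used here are hypothetical:

```python
import math

def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical probit estimates: index = b0 + b1*x + b2*K, with K binary.
b0, b1, b2, xbar = -1.0, 0.05, 0.6, 10.0

# Equation 6.25: change in P(Y=1) as K moves from 0 to 1,
# holding the other regressor at its mean.
me_k = std_normal_cdf(b0 + b1 * xbar + b2) - std_normal_cdf(b0 + b1 * xbar)
print(round(me_k, 4))
```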
6.3. Estimation of Binary Choice Models
The log likelihood function for the two models is:
$\log L = \sum_i \left\{y_i \log F(\beta'X_i) + (1-y_i)\log[1 - F(\beta'X_i)]\right\}$                                    6.26
The first order condition with respect to the parameters of the model is given by:
$\dfrac{\partial \log L}{\partial \beta} = \sum_i \left[\dfrac{y_i f_i}{F_i} - \dfrac{(1-y_i) f_i}{1-F_i}\right] X_i = 0$                                    6.27
where $f_i$ is the density $\dfrac{dF_i}{d(\beta'X)}$; the subscript $i$ indicates that the function has the argument $\beta'X_i$.
i) For a normal distribution (Probit), the log likelihood is
$\log L = \sum_{y_i=0} \log[1 - \Phi(\beta'X_i)] + \sum_{y_i=1} \log \Phi(\beta'X_i)$                                    6.28
$\dfrac{\partial \log L}{\partial \beta} = -\sum_{y_i=0} \dfrac{\phi_i}{1-\Phi_i}\, X_i + \sum_{y_i=1} \dfrac{\phi_i}{\Phi_i}\, X_i = 0$                                    6.29
ii) For a Logit model, the log likelihood is:
$\ln L = \sum_{i=1}^{n}\left\{Y_i \ln \Lambda(z_i) + (1-Y_i)\ln[1-\Lambda(z_i)]\right\}$                                    6.30
with first order condition
$\dfrac{\partial \log L}{\partial \beta} = \sum_i (y_i - \Lambda_i)\, X_i = 0$
6.4. Measures of Goodness of fit
When the dependent variable is dichotomous, there is a problem with using the conventional $R^2$ as a measure of goodness of fit. Several alternative measures are used instead.
1. Measures based on likelihood ratios
Let $L_{UR}$ be the maximum of the likelihood function when maximized with respect to all the parameters, and let $L_R$ be the maximum when maximized with the restriction that all slope coefficients are zero ($\beta_1 = \beta_2 = \ldots = \beta_k = 0$). One measure is:
$R^2 = 1 - \left(\dfrac{L_R}{L_{UR}}\right)^{2/n}$                                    6.31
Cragg and Uhler (1970) suggested a pseudo $R^2$ that lies between 0 and 1:
$R^2 = \dfrac{1 - (L_R/L_{UR})^{2/n}}{1 - L_R^{2/n}}$                                    6.32
McFadden (1974) defined $R^2$ as
$R^2 = 1 - \dfrac{\ln L_{UR}}{\ln L_R}$                                    6.33
2. $R^2$ can also be defined in terms of the proportion of correct predictions. After computing the fitted probability $\hat{p}_i$, we can classify observation $i$ as belonging to group 1 if $\hat{p}_i > 0.5$ and to group 2 if $\hat{p}_i \leq 0.5$:
$\hat{y}_i = 1$ if $\hat{p}_i > 0.5$ and $\hat{y}_i = 0$ otherwise                                    6.34
We can then count the number of correct predictions:
$\text{Count } R^2 = \dfrac{\text{number of correct predictions}}{\text{total number of observations}}$                                    6.35
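A small numerical sketch of McFadden's $R^2$ (equation 6.33) and the count $R^2$ (equation 6.35). The fitted probabilities below are hypothetical; the restricted likelihood sets all slopes to zero, so the restricted probability is the sample mean of y:

```python
import math

# Hypothetical fitted probabilities and outcomes from a binary choice model.
p_hat = [0.2, 0.7, 0.9, 0.4, 0.6, 0.1]
y = [0, 1, 1, 1, 1, 0]

# Unrestricted log likelihood: equation 6.26 evaluated at the estimates.
llf = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
          for pi, yi in zip(p_hat, y))
# Restricted log likelihood: slopes zero, so p equals the sample mean of y.
ybar = sum(y) / len(y)
ll0 = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)

mcfadden_r2 = 1 - llf / ll0                 # equation 6.33
correct = sum((pi > 0.5) == (yi == 1) for pi, yi in zip(p_hat, y))
count_r2 = correct / len(y)                 # equation 6.35
print(round(mcfadden_r2, 3), round(count_r2, 3))
```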
Example: Regression results of a Probit model of house ownership on income are given below.
                                                                        6.36
We want to measure the effect of a unit change in income on the probability of owning a house:
                                                                        6.37
where $\phi(\cdot)$ is the standard normal probability density function, evaluated at the estimated index.
At the given value of income, the standard normal density equals
0.3066                                                                  6.38
Multiplying this value by the slope coefficient of income, we get a marginal effect of 0.01485.
Logit model of owning a house:
                                                                        6.39
This means that for a unit increase in weighted income, the weighted log of the odds in favour of owning a house goes up by 0.08 units.
Converting this into an odds ratio, we take the antilog:
$e^{0.08} \approx 1.0833$                                               6.40
That is, the odds in favour of owning a house increase by about 8.3 percent.
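The antilog in equation 6.40 is easy to verify:

```python
import math

# A 0.08 increase in the log odds multiplies the odds by e^{0.08}.
odds_factor = math.exp(0.08)
print(round(odds_factor, 4))
```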
6.5. Maximum Likelihood Estimation
For the linear regression model, consider the MLE of a normal variable $y_i$ conditional on $x_i$ with mean $\beta'x_i$ and variance $\sigma^2$. The pdf for an observation is:
$f(y_i \mid x_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\dfrac{(y_i - \beta'x_i)^2}{2\sigma^2}\right]$                                    6.41
The pdf of a normal variable with mean $\beta'x_i$ and variance $\sigma^2$ is often expressed in terms of the pdf of the standardized normal variable $z_i = (y_i - \beta'x_i)/\sigma$, with mean 0 and variance 1:
$\phi(z_i) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z_i^2/2}$                                    6.42
Thus,
$f(y_i \mid x_i) = \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - \beta'x_i}{\sigma}\right)$                                    6.43
The likelihood can be written as
$L(\beta, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - \beta'x_i}{\sigma}\right)$                                    6.44
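A minimal sketch evaluating the log of the likelihood in equation 6.44 at candidate parameter values (the data and parameter values are hypothetical):

```python
import math

def normal_loglik(y, x, b0, b1, sigma):
    """Log of equation 6.44: sum of log[(1/sigma) phi((y_i - b'x_i)/sigma)]."""
    ll = 0.0
    for xi, yi in zip(x, y):
        z = (yi - b0 - b1 * xi) / sigma
        ll += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    return ll

# Hypothetical data and candidate parameter values.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
ll_hat = normal_loglik(y, x, 0.25, 0.93, 0.5)
print(round(ll_hat, 3))
```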
6.6. Limited Dependent Variables
The density function of a normally distributed variable $y$ with mean $\mu$ and variance $\sigma^2$ is given by:
$f(y) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\dfrac{(y-\mu)^2}{2\sigma^2}\right]$                                    6.45
where $y \sim N(\mu, \sigma^2)$.
For a standard normal distribution,
$\mu = 0 \quad \text{and} \quad \sigma^2 = 1$                                    6.46
The density of a standard normal variable is
$\phi(y) = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$                                    6.47
The cumulative distribution function of a normal distribution is
$\Phi(c) = \displaystyle\int_{-\infty}^{c} \phi(t)\, dt$                                    6.48
Due to symmetry, $\Phi(-c) = 1 - \Phi(c)$.
In limited dependent variable models we may encounter some form of truncation. If $y$ has density $f(y)$, the distribution of $y$ truncated from below at a given point $c$ is given by:
$f(y \mid y > c) = \dfrac{f(y)}{P(y > c)}$ if $y > c$, and 0 otherwise                                    6.49
If $y$ is a standard normal variable, the truncated distribution of $y$ has the density:
$f(y \mid y > c) = \dfrac{\phi(y)}{1 - \Phi(c)}$ for $y > c$                                    6.50
If the distribution is truncated from above at $c$, the denominator is $\Phi(c)$ instead.
If $y$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, the truncated distribution $y \mid y > c$ has mean
$E(y \mid y > c) = \mu + \sigma\,\lambda(\alpha)$                                    6.51
where $\alpha = \dfrac{c - \mu}{\sigma}$ and $\lambda(\alpha) = \dfrac{\phi(\alpha)}{1 - \Phi(\alpha)}$.
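Equation 6.51 can be checked numerically: for a standard normal variable truncated from below at zero, the truncated mean is $\phi(0)/[1-\Phi(0)] = \sqrt{2/\pi} \approx 0.798$.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_mean(mu, sigma, c):
    """Equation 6.51: E(y | y > c) = mu + sigma * lambda(alpha),
    with alpha = (c - mu)/sigma and lambda(alpha) = phi(alpha)/(1 - Phi(alpha))."""
    alpha = (c - mu) / sigma
    return mu + sigma * phi(alpha) / (1 - Phi(alpha))

# N(0,1) truncated from below at 0: mean sqrt(2/pi) ~ 0.7979.
print(round(truncated_mean(0.0, 1.0, 0.0), 4))
```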
6.6.1. Tobit (Censored Regression) Model
In certain applications, the dependent variable is continuous, but its range may be constrained. Most commonly this occurs when the dependent variable is zero for a substantial part of the population but positive for the rest of the population. The Tobit model is:
$y_i^* = \beta'x_i + u_i, \qquad y_i = y_i^*$ if $y_i^* > 0$ and $y_i = 0$ otherwise                                    6.52
where $u_i \sim N(0, \sigma^2)$.
In this model all negative values are mapped to zero, i.e., observations are censored (from below) at zero.
The model describes two things:
1. The probability that $y_i = 0$ given $x_i$:
$P(y_i = 0) = P(y_i^* \leq 0) = 1 - \Phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)$                                    6.53
2. The distribution of $y_i$ given that it is positive. This is a truncated normal distribution with expectation
$E(y_i \mid y_i > 0) = \beta'x_i + E(u_i \mid u_i > -\beta'x_i) = \beta'x_i + \sigma\,\dfrac{\phi(\beta'x_i/\sigma)}{\Phi(\beta'x_i/\sigma)}$                                    6.54
The last term shows the conditional expectation of a mean-zero normal variable, given that it is larger than $-\beta'x_i$. The conditional expectation of $y_i$ thus no longer equals $\beta'x_i$ but depends nonlinearly on $x_i$ through $\phi(\cdot)$ and $\Phi(\cdot)$.
Marginal effects of the Tobit Model
1. The probability of a zero outcome is:
$P(y_i = 0) = 1 - \Phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)$                                    6.55
2. The expected value of $y_i$ (accounting for the censoring) is
$E(y_i) = \Phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)\beta'x_i + \sigma\,\phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)$                                    6.56
Thus the marginal effect on the expected value of $y_i$ of a change in $x_{ik}$ is given by
$\dfrac{\partial E(y_i)}{\partial x_{ik}} = \beta_k\,\Phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)$                                    6.57
This means the marginal effect of a change in $x_{ik}$ upon the expected outcome $y_i$ is given by the model's coefficient multiplied by the probability of having a positive outcome.
3. The marginal effect upon the latent variable is simply the coefficient: $\partial E(y_i^*)/\partial x_{ik} = \beta_k$.
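A small sketch of equations 6.55-6.57 at a hypothetical index value $\beta'x$ and error standard deviation $\sigma$:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical Tobit estimates: index b'x, error s.d. sigma, one coefficient.
bx, sigma, beta_k = 0.5, 1.0, 0.3

p_zero = 1 - Phi(bx / sigma)                          # equation 6.55
e_y = Phi(bx / sigma) * bx + sigma * phi(bx / sigma)  # equation 6.56
me = beta_k * Phi(bx / sigma)                         # equation 6.57
print(round(p_zero, 4), round(e_y, 4), round(me, 4))
```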
Maximum Likelihood Estimation of the Tobit Model
The contribution of an observation to the likelihood either equals the probability mass (at the observed point $y_i = 0$) or the conditional density of $y_i$, given that it is positive, times the probability mass of observing $y_i > 0$:
$L = \prod_{y_i=0} P(y_i = 0) \prod_{y_i>0} f(y_i \mid y_i > 0)\, P(y_i > 0)$                                    6.58
Using the appropriate expressions for the normal distribution we obtain:
$L = \prod_{y_i=0} \left[1 - \Phi\!\left(\dfrac{\beta'x_i}{\sigma}\right)\right] \prod_{y_i>0} \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - \beta'x_i}{\sigma}\right)$                                    6.59
Maximizing this function with respect to the parameters gives the maximum likelihood estimates.
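The log of equation 6.59 can be sketched as follows; the data and parameter values are hypothetical, and the zeros in y represent censored observations:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, x, b0, b1, sigma):
    """Log of equation 6.59: censored observations contribute
    log[1 - Phi(b'x/sigma)]; positive ones log[(1/sigma) phi((y - b'x)/sigma)]."""
    ll = 0.0
    for xi, yi in zip(x, y):
        bx = b0 + b1 * xi
        if yi == 0:
            ll += math.log(1 - Phi(bx / sigma))
        else:
            ll += math.log(phi((yi - bx) / sigma) / sigma)
    return ll

# Hypothetical censored data and candidate parameter values.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 0.8, 1.9]
ll = tobit_loglik(y, x, -2.0, 0.9, 1.0)
print(round(ll, 3))
```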
6.6.2. Sample Selection
The Tobit model imposes a structure that is often restrictive: exactly the same variables that affect the probability of a nonzero observation determine the level of a positive observation, and moreover with the same sign. This implies, for example, that those who are more likely to spend a positive amount are, on average, also those who spend more on durable goods.
As another example, we might be interested in explaining wages. Obviously, wages are observed only for people who are actually working, but we might be interested in (potential) wages not conditional on this selection. A change in some variable $x$ may lower someone's wage such that he decides to stop working; consequently, his wage would not be observed and the effect of this variable could be underestimated from the available data.
Because a sample of workers may not be a random sample of the population (of potential workers), one may expect that people with lower (potential) wages are more likely to be unemployed. This problem is often referred to as sample selection.
Consider the following sample selection model of wages:
$w_i = \beta'x_i + \varepsilon_{1i}$                                    6.60
where $x_i$ denotes a vector of exogenous characteristics of the person and $w_i$ denotes the person's wage. The wage $w_i$ is not observed for people who are not working. Thus, to describe whether a person is working or not, a second equation of the binary choice type is specified:
$h_i^* = \gamma'z_i + \varepsilon_{2i}$                                    6.61
where
$h_i = 1$ if $h_i^* > 0$ and $h_i = 0$ otherwise                                    6.62
The binary variable $h_i$ indicates working or not working. The error terms of the two equations have means of zero, variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\sigma_{12}$.
One usually sets the restriction $\sigma_2^2 = 1$ as a normalization of the Probit model. The conditional expected wage, given that a person is working, is given by:
$E(w_i \mid h_i = 1) = \beta'x_i + E(\varepsilon_{1i} \mid \varepsilon_{2i} > -\gamma'z_i) = \beta'x_i + \sigma_{12}\,\dfrac{\phi(\gamma'z_i)}{\Phi(\gamma'z_i)}$                                    6.63
The conditional expected wage equals $\beta'x_i$ only if $\sigma_{12} = 0$. So if the error terms of the two equations are uncorrelated, the wage equation can be consistently estimated by OLS. A sample selection bias of OLS arises if $\sigma_{12} \neq 0$.
The term $\phi(\gamma'z_i)/\Phi(\gamma'z_i)$ is known as the inverse Mills ratio and is denoted $\lambda(\gamma'z_i)$ by Heckman (1979); the resulting estimation approach is referred to as Heckman's model.
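A minimal sketch of the inverse Mills ratio from equation 6.63. In Heckman's two-step procedure, $\lambda(\gamma'z_i)$ estimated from a first-stage probit is added as a regressor in the wage equation; the evaluation point below is arbitrary:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z)/Phi(z) from equation 6.63."""
    return phi(z) / Phi(z)

# The ratio is decreasing in z: selection matters less for people who are
# very likely to work (large gamma'z).
print(round(inverse_mills(0.5), 4))
```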