Faculty of Economics and Political Science
Econometrics (2) – 2021/2022
Fourth year – All sections
Sheet (3): Logit and Probit Models
Question (1):
From the Household Income, Expenditure and Consumption Survey of 2005 of the
CAPMAS, Mohamed obtained the following logit model based on a sample of 2820
households. (The results given here are based on the method of maximum likelihood
and are after the third iteration.) The purpose of the logit model was to determine car
ownership as a function of the logarithm of income. Car ownership was a binary
variable: Y = 1 if a household owns a car, zero otherwise.
$\hat{L}_i = -2.77231 + 0.347582 \ln(\text{Income}_i)$
$t = (-3.35) \quad (4.05)$
$\chi^2(1) = 16.681$ (p-value = 0.0000)
where $\hat{L}_i$ is the estimated logit and ln Income is the logarithm of income. The $\chi^2$
statistic measures the goodness of fit of the model.
a. Interpret the estimated logit model.
b. From the estimated logit model, how would you obtain the expression for
the probability of car ownership?
c. What is the probability that a household with an income of 20000 will own
a car?
d. Comment on the statistical significance of the estimated logit model.
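As a rough illustration of parts (b) and (c), the sketch below inverts the logit, P = 1 / (1 + exp(-L)), and evaluates it at an income of 20000. It assumes that ln denotes the natural logarithm and that income is entered in the same units used in estimation; the function name is illustrative only.

```python
import math

# Estimated coefficients from the logit model in Question (1).
INTERCEPT = -2.77231
SLOPE_LN_INCOME = 0.347582

def car_ownership_probability(income):
    """Return P(Y = 1) implied by the estimated logit at the given income."""
    # Natural logarithm assumed for "ln Income".
    logit = INTERCEPT + SLOPE_LN_INCOME * math.log(income)
    # Invert the logit: P = 1 / (1 + exp(-L)).
    return 1.0 / (1.0 + math.exp(-logit))

p_20000 = car_ownership_probability(20000)
print(f"Implied probability of car ownership at income 20000: {p_20000:.4f}")
```

The same function can be evaluated at other income levels to see how the implied probability rises with income.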
Question (2):
Assume the following regression model:
$Y_i = \alpha + \beta X_i + \epsilon_i$
where $Y_i$ is one if individual i is a smoker, zero otherwise; $X_i$ is one if individual i is
a male, zero otherwise; and $\epsilon_i$ has a logistic distribution.
Consider that you have a random sample of 100 observations of which 5 are smoking
males, 45 are smoking females and 35 are non-smoking males.
a. Write down the log-likelihood function for your sample.
b. How would you test the null hypothesis that β = 0, i.e., that gender is an
irrelevant variable, using the likelihood ratio test at the 5% significance
level ($\chi^2_{0.05,1} = 3.84$)?
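One possible way to carry out the likelihood ratio test in part (b) is sketched below. It relies on the fact that a logit with a single dummy regressor is saturated, so the maximum likelihood fitted probabilities equal the sample proportions within each gender. The count of 15 non-smoking females is an assumption, inferred as the remainder of the 100 observations; all variable names are illustrative.

```python
import math

# Cell counts from Question (2); the 15 non-smoking females are inferred
# as the remainder of the 100 observations (an assumption of this sketch).
smoking_males, nonsmoking_males = 5, 35
smoking_females, nonsmoking_females = 45, 15

def bernoulli_loglik(successes, failures, p):
    """Log-likelihood of independent 0/1 outcomes with success probability p."""
    return successes * math.log(p) + failures * math.log(1.0 - p)

# Unrestricted model: fitted probabilities equal the within-gender proportions.
p_male = smoking_males / (smoking_males + nonsmoking_males)
p_female = smoking_females / (smoking_females + nonsmoking_females)
loglik_unrestricted = (
    bernoulli_loglik(smoking_males, nonsmoking_males, p_male)
    + bernoulli_loglik(smoking_females, nonsmoking_females, p_female)
)

# Restricted model (beta = 0): one common smoking probability for everyone.
smokers = smoking_males + smoking_females
nonsmokers = nonsmoking_males + nonsmoking_females
p_pooled = smokers / (smokers + nonsmokers)
loglik_restricted = bernoulli_loglik(smokers, nonsmokers, p_pooled)

# LR statistic, compared with the chi-squared(1) critical value from the sheet.
lr_statistic = 2.0 * (loglik_unrestricted - loglik_restricted)
CRITICAL_VALUE = 3.84
print(f"LR = {lr_statistic:.3f}; reject H0 at 5%: {lr_statistic > CRITICAL_VALUE}")
```

The implied estimates follow from the same proportions: α is the log-odds of smoking for females and α + β is the log-odds for males.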
Question (3):
Consider the following linear probability model (LPM) of the probability that an
individual purchases supplemental insurance:
$Y_i = B_0 + B_1\,\text{price}_i + B_2\,\text{income}_i + B_3\,\text{age}_i + B_4 C_i + \epsilon_i$
where $Y_i$ = 1 if an individual purchases supplemental insurance and 0 otherwise,
price is the price of supplemental insurance in dollars, income is the individual's
income in thousands of dollars, age is the person's age in years, and C is the amount of
health coverage given by the government to each individual.
The following table shows the means of these variables and the estimated results:
Variable   Mean    LPM coefficient (std. error)   Logit coefficient (std. error)
Price      5       -0.4     (0.001)               -1       (0.03)
Income     40      0.01     (0.0005)              0.025    (0.01)
Age        44      0.04     (0.01)                0.1      (0.01)
C          1500    -0.0002  (0.0001)              -0.0005  (0.00001)
Constant   -       0.6      (0.2)                 -0.2     (0.2)
a) Given the estimated results, what is the probability that Y = 1 in the LPM?
b) What are the problems associated with the LPM?
c) Based on the LPM, what is the probability that a person with average
characteristics will purchase supplemental insurance?
d) How does the probability change when income is increased to $80,000
(income = 80 in the units of the table)?
e) Given the estimated results, what is the probability that Y = 1 in the logit
model?
f) Based on the logit results, what is the probability that a person with average
characteristics will purchase supplemental insurance?
g) How does the probability change when income is increased by $40,000
(that is, by 40 in the units of the table)?
h) Interpret the coefficient of age in the logit model and in the LPM.
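The predicted probabilities asked for in parts (c), (d), (f) and (g) can be computed along the following lines. This is only a sketch: it assumes that income enters in thousands of dollars, so an income of $80,000 corresponds to 80, and the helper names are illustrative.

```python
import math

# Sample means and estimated coefficients from the table in Question (3).
means = {"price": 5, "income": 40, "age": 44, "C": 1500}
lpm = {"const": 0.6, "price": -0.4, "income": 0.01, "age": 0.04, "C": -0.0002}
logit = {"const": -0.2, "price": -1.0, "income": 0.025, "age": 0.1, "C": -0.0005}

def linear_index(coefs, x):
    """Return the constant plus the sum of coefficient * regressor value."""
    return coefs["const"] + sum(coefs[name] * value for name, value in x.items())

def logistic(z):
    """Logistic CDF, which maps the logit index into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Average person, and the same person with income raised from 40 to 80
# (income measured in thousands of dollars, as assumed above).
higher_income = dict(means, income=80)

lpm_at_means = linear_index(lpm, means)
lpm_at_80 = linear_index(lpm, higher_income)
logit_at_means = logistic(linear_index(logit, means))
logit_at_80 = logistic(linear_index(logit, higher_income))

print(f"LPM:   {lpm_at_means:.4f} -> {lpm_at_80:.4f}")
print(f"Logit: {logit_at_means:.4f} -> {logit_at_80:.4f}")
```

In the LPM the change equals the income coefficient times the change in income, whatever the starting point. In the logit model the change depends on where the index starts.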
Question (4):
In studying female labor force participation in 2009, an LPM, a logit model and a probit
model were estimated for 5197 individuals. The following table shows the results
of the three models.
Dependent variable: lfp
Independent variable   Mean        LPM (OLS)       Logit (MLE)     Probit (MLE)
Age                    39.6448     0.0074744       0.0410075       0.0228952
Age^2                  1684.849    -0.0001186**    -0.000647**     -0.0003666**
ed                     13.26381    0.0420481***    0.2332999***    0.1358836***
Black                  0.3440446   -0.0169058      -0.1135549      -0.0589809
Others                 0.0527227   -0.0433624      -0.2062645      -0.1284139
Married                0.6778911   -0.0309536**    -0.1772773**    -0.1124941**
Constant               -           0.1267127       -2.281569***    -1.276322***
*** and ** denote statistical significance at the 1% and 5% levels, respectively.
where:
• lfp is equal to 1 if the individual is working, zero otherwise;
• ed is years of schooling;
• Age is the age of the individual;
• Black is a dummy variable equal to 1 if the woman is Afro-American, zero otherwise;
• Others is a dummy variable equal to 1 if the woman is neither white nor black, zero otherwise;
• Married is a dummy variable equal to 1 if the woman is married, zero otherwise.
a) Which model is the best to use? Why?
b) Compare the coefficients of the three models.
c) Discuss how education affects female labor force participation for the three
models.
d) Do women who are unmarried have a higher probability of working?
e) What is the ratio of partial effects of age to education for the three models?
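For part (e), one possible reading is sketched below. Because Age enters together with Age^2, the partial effect of age is proportional to b_age + 2 * b_age2 * Age, and in the logit and probit models the common scale factor cancels in the ratio to the education effect. Evaluating at the sample mean age is an assumption of this sketch, as are the dictionary and function names.

```python
# Coefficients from the table in Question (4).
coefs = {
    "LPM":    {"age": 0.0074744, "age_sq": -0.0001186, "ed": 0.0420481},
    "Logit":  {"age": 0.0410075, "age_sq": -0.000647,  "ed": 0.2332999},
    "Probit": {"age": 0.0228952, "age_sq": -0.0003666, "ed": 0.1358836},
}
MEAN_AGE = 39.6448  # sample mean of Age, used as the evaluation point (assumption)

def age_to_education_ratio(b, age):
    """Ratio of the partial effect of age to that of education.

    The density scale factor of the logit/probit partial effects is common to
    the numerator and the denominator, so it cancels out of the ratio.
    """
    return (b["age"] + 2.0 * b["age_sq"] * age) / b["ed"]

ratios = {model: age_to_education_ratio(b, MEAN_AGE) for model, b in coefs.items()}
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.4f}")
```

Although the raw coefficients differ in scale across the three models, ratios of partial effects computed this way are directly comparable.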
Question (5):
In studying the probability of a man being arrested in 1986, the following Probit
models were estimated for 2725 individuals. The results are shown in the following
table.
Dependent variable: arr
Independent variable             Model (1)              Model (2)             Model (3)              Mean
Pcnv                             -0.5529248 (-7.67)**   -----                 0.2167615 (0.83)       0.3577872
Avgsen                           0.0127395 (0.60)       -----                 0.0139969 (0.57)       0.6322936
Tottime                          -0.0076486 (-0.45)     -----                 -0.0178158 (-0.89)     0.8387523
Ptime86                          -0.0812017 (-4.52)**   -----                 0.7449712 (5.18)**     0.387156
Inc86                            -0.0046346 (-9.70)**   -----                 -0.0058786 (-5.97)**   54.96705
Black                            0.4666076 (6.48)**     -----                 0.4368131 (5.95)**     0.1611009
Hispan                           0.2911005 (4.45)**     -----                 0.2663945 (3.97)**     0.2176147
Born60                           0.0112074 (0.20)       -----                 -0.0145223 (-0.26)     0.3625688
Pcnv^2                           -----                  -----                 -0.8570512 (-3.16)**   0.284131
Ptime86^2                        -----                  -----                 -0.1035031 (-4.62)**   3.951193
Inc86^2                          -----                  -----                 8.75e-06 (2.04)**      7458.933
Constant                         -0.3138331 (-6.12)     -0.5915851 (-23.11)   -0.337362 (-6)
Log likelihood                   -1483.6406             -1608.1837            -1439.8005
Pseudo R^2                       0.0774                 0.00                  0.1047
Percentage correctly predicted   72.7%                  72.29%                73.8%
** The values in parentheses are the z-values of the coefficients.
where:
arr = 1 if an individual was arrested at least once in 1986 and 0 otherwise;
pcnv is the proportion of prior convictions;
avgsen is the average sentence length since age 18 (in months);
tottime is the time in prison since age 18 (in months);
ptime86 is the number of months in prison during 1986;
inc86 is legal income in 1986, in $100;
black = 1 if an individual is black and 0 otherwise;
hispan =1 if an individual is Hispanic and 0 otherwise;
born60 =1 if born in 1960 and 0 otherwise;
Pcnv^2 is the square of pcnv;
Ptime86^2 is the square of ptime86;
Inc86^2 is the square of inc86.
Classification table for Model 1 (percentage correctly predicted)
              -------- True --------
Classified    D          ~D         Total
+             78         67         145
-             677        1903       2580
Total         755        1970       2725
Classified + if predicted Pr(D) >= .5
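The percentages correctly predicted can be read off a classification table like the one above with a few lines of arithmetic. The sketch below takes D to mean arr = 1, as in the table's own totals; the variable names are illustrative.

```python
# Counts from the Model 1 classification table (cutoff 0.5), with D = arrested.
true_pos, false_pos = 78, 67       # classified "+": predicted Pr(D) >= 0.5
false_neg, true_neg = 677, 1903    # classified "-": predicted Pr(D) < 0.5

# Share of actual arrests (arr = 1) that the model classifies correctly.
pct_correct_arr1 = 100.0 * true_pos / (true_pos + false_neg)

# Share of actual non-arrests (arr = 0) that the model classifies correctly.
pct_correct_arr0 = 100.0 * true_neg / (true_neg + false_pos)

# Overall share of correct classifications.
total = true_pos + false_pos + false_neg + true_neg
pct_correct_all = 100.0 * (true_pos + true_neg) / total

print(f"arr = 1: {pct_correct_arr1:.1f}%")
print(f"arr = 0: {pct_correct_arr0:.1f}%")
print(f"overall: {pct_correct_all:.1f}%")
```

The overall figure reproduces the 72.7% reported for Model 1, while the split by outcome shows how unevenly the two groups are predicted.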
a) Comment on the coefficients of Models 1 and 3.
b) In Model 1, at the sample average values of avgsen, tottime, ptime86 and inc86,
and with black = 1, hispan = 0 and born60 = 1, what is the estimated
effect on the probability of arrest if pcnv goes from 0.25 to 0.75?
c) Are Pcnv^2, Ptime86^2 and Inc86^2 individually or jointly significant?
d) Comment on the overall significance of Models 1 and 3.
e) What is the percentage correctly predicted for Model 1 when arr = 0? When arr = 1?
                   Actual Y = 1   Actual Y = 0   Total
Predicted Y = 1    78             67             145
Predicted Y = 0    677            1903           2580
Total              755            1970           2725
f) Comment on the goodness of fit of Models 1 and 3.
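For part (b), the change in the probit probability can be computed by evaluating the standard normal CDF at the two values of the index. The sketch below uses only the Python standard library, writing the normal CDF in terms of math.erf; the dictionary and function names are illustrative, and the evaluation point follows the wording of part (b).

```python
import math

def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Model 1 probit coefficients from the table in Question (5).
coef = {
    "const": -0.3138331, "pcnv": -0.5529248, "avgsen": 0.0127395,
    "tottime": -0.0076486, "ptime86": -0.0812017, "inc86": -0.0046346,
    "black": 0.4666076, "hispan": 0.2911005, "born60": 0.0112074,
}

# Evaluation point: sample means for the continuous regressors and the
# dummy values stated in part (b).
base = {
    "avgsen": 0.6322936, "tottime": 0.8387523, "ptime86": 0.387156,
    "inc86": 54.96705, "black": 1, "hispan": 0, "born60": 1,
}

def arrest_probability(pcnv):
    """Model 1 probit probability of arrest at the evaluation point."""
    index = coef["const"] + coef["pcnv"] * pcnv
    index += sum(coef[name] * value for name, value in base.items())
    return normal_cdf(index)

change = arrest_probability(0.75) - arrest_probability(0.25)
print(f"P(arr = 1 | pcnv = 0.25) = {arrest_probability(0.25):.4f}")
print(f"P(arr = 1 | pcnv = 0.75) = {arrest_probability(0.75):.4f}")
print(f"Change in probability    = {change:.4f}")
```

Taking the difference of two predicted probabilities, rather than multiplying the coefficient by the change in pcnv, accounts for the nonlinearity of the probit model over a discrete change.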
Some useful critical values
Chi-squared distribution
Df                  1      2      3        7         8        9        10       11
$\chi^2_{0.050}$    3.84   5.99   7.815    14.0671   15.507   16.919   18.307   19.675
$\chi^2_{0.025}$    5.02   7.38   9.348    16.013    17.535   19.023   20.483   21.920
$\chi^2_{0.01}$     6.63   9.21   11.345   18.475    20.090   21.666   23.209   24.725
$\chi^2_{0.005}$    7.89   10.6   12.838   20.278    21.955   23.589   25.188   26.757
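The critical values above can be combined with the log likelihoods reported in Question (5) to form likelihood ratio statistics, LR = 2 (lnL_unrestricted - lnL_restricted). The sketch below assumes that Model 2 contains only a constant, which the table suggests (all slopes are blank and the pseudo R^2 is 0.00) but does not state outright; the names are illustrative.

```python
# Log likelihoods reported for the three probit models in Question (5).
loglik = {"model1": -1483.6406, "model2": -1608.1837, "model3": -1439.8005}

# 5% chi-squared critical values from the sheet, keyed by degrees of freedom.
chi2_5pct = {3: 7.815, 8: 15.507, 11: 19.675}

def lr_test(loglik_unrestricted, loglik_restricted, df):
    """Return the LR statistic and whether H0 is rejected at the 5% level."""
    statistic = 2.0 * (loglik_unrestricted - loglik_restricted)
    return statistic, statistic > chi2_5pct[df]

# Joint significance of the three squared terms: Model 3 against Model 1.
joint_squares = lr_test(loglik["model3"], loglik["model1"], df=3)

# Overall significance: each model against the constant-only Model 2 (assumed).
overall_model1 = lr_test(loglik["model1"], loglik["model2"], df=8)
overall_model3 = lr_test(loglik["model3"], loglik["model2"], df=11)

# McFadden pseudo R^2 for Model 1, as a check against the reported 0.0774.
pseudo_r2_model1 = 1.0 - loglik["model1"] / loglik["model2"]

print(f"Squared terms (df = 3):    LR = {joint_squares[0]:.2f}")
print(f"Model 1 overall (df = 8):  LR = {overall_model1[0]:.2f}")
print(f"Model 3 overall (df = 11): LR = {overall_model3[0]:.2f}")
print(f"Pseudo R^2, Model 1: {pseudo_r2_model1:.4f}")
```

The degrees of freedom equal the number of restrictions in each comparison: 3 squared terms, 8 slopes in Model 1 and 11 slopes in Model 3.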