A Practical Guide To
Using Econometrics
A. H. Studenmund
Chapter 3
Steps in Applied Regression Analysis
Step 1: Review literature and develop
theoretical model.
Step 2: Specify model: Select independent
variables and functional form.
Step 3: Hypothesize expected signs of coefficients.
Step 4: Collect data. Inspect and clean data.
Step 5: Estimate and evaluate equation.
Step 6: Document results.
Step 1: Review the Literature and
Develop the Theoretical Model
• Best data analysts start with theory!
• It’s smart to review scholarly literature before doing
anything else.
• Many approaches, but searching EconLit is helpful.
• When topic has not been studied, two strategies:
1. Transfer theory from a similar topic to your topic.
2. Consult someone who works in the area.
Step 2: Specify the Model: Select the Independent
Variables and Functional Form
• Most important step in applied regression analysis:
specification of theoretical model.
• Specifying a model involves choosing:
1. Independent variables and how they should
be measured.
2. Functional (mathematical) form of variables.
3. Properties of stochastic error term.
Step 2: Specify the Model (continued)
• Any mistake in these three components leads to
specification error, which is disastrous for the validity of the results.
• Choose independent variables based on theory.
• Judgment must often be used and researchers impose
priors.
Example: Estimate demand equation for a good.
Theory suggests including prices of complements
and substitutes.
Which ones do you choose?
Step 3: Hypothesize the Expected
Signs of Coefficients
• Once variables selected, hypothesize expected
signs of coefficients.
• Often, basic theory is general knowledge and
expected coefficient signs need no explanation.
• If there’s uncertainty, opposing theories should be
documented and your hypothesized sign explained.
Step 3: Hypothesize the Expected
Signs of Coefficients (continued)
Example: Impact of class size on student learning.
dependent variable:
Y= student score on grammar test
independent variables:
X1 = income level of student’s family
X2 = students per teacher
Notation with hypothesized signs above coefficients:
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_{1i} + \overset{-}{\beta_2} X_{2i} + \epsilon_i \tag{3.1}$$
Step 4: Collect the Data.
Inspect and Clean the Data
• Obtaining and preparing an original data set is difficult.
• General rule: the more observations the better.
• The reason for wanting as many observations as
possible is tied to the concept of degrees of freedom
(first mentioned in Section 2.4).
• With a large number of degrees of freedom, every positive
error is likely to be balanced by a negative error.
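As a worked illustration of the counting involved, assuming the usual definition in which degrees of freedom equal the number of observations N minus the number of estimated coefficients (K slope coefficients plus the constant):

```latex
$$
\text{degrees of freedom} = N - (K + 1),
\qquad \text{e.g. } N = 33,\ K = 3 \ \Rightarrow\ 33 - 4 = 29
$$
```

The values N = 33 and K = 3 match the Woody's restaurant example later in this chapter.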
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Another question: does unit of measurement of the
variables matter?
• Short answer: no, except when interpreting the scale of the coefficients.
Example: Independent variable is measured in dollars or
thousands of dollars.
• Constant term and measures of fit are unchanged.
• Slope coefficient of the variable changes by the exact
amount to compensate for the change in units.
• Variable measured in “thousands of $”: coefficient is 50
• Variable measured in “$”: coefficient is 0.05
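The rescaling claim can be checked numerically. The sketch below fits a one-variable regression twice, once with X in thousands of dollars and once in dollars. The data values and the helper `ols_simple` are invented for illustration and do not come from the text.

```python
def ols_simple(x, y):
    """Return (intercept, slope, r_squared) for a one-variable OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - rss / tss


# Hypothetical data: X measured in thousands of dollars.
x_thousands = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [520.0, 1010.0, 1490.0, 2030.0, 2480.0]
# The same variable measured in dollars.
x_dollars = [1000.0 * v for v in x_thousands]

b0_k, b1_k, r2_k = ols_simple(x_thousands, y)
b0_d, b1_d, r2_d = ols_simple(x_dollars, y)

print("thousands of $:", b0_k, b1_k, r2_k)
print("dollars:       ", b0_d, b1_d, r2_d)
```

Up to floating-point error, the two fits share the same constant term and R², while the slope in dollars is the slope in thousands of dollars divided by 1,000.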
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Always review data set for errors.
• Approaches:
• Plot the data and look for outliers.
• Look at mean, maximum and minimum of each
variable.
• Typically, data can be “cleaned” by replacing an
incorrect value with the correct value.
• In extremely rare circumstances, drop an observation.
• BE CAREFUL! Mere existence of an outlier is not a
justification for dropping that observation.
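A minimal sketch of the inspection step follows: it prints the mean, minimum and maximum of each variable and flags values that look implausible. The data, the variable names and the `ratio` threshold are all hypothetical. The flagging rule is an arbitrary screening device, so a flagged value should be checked against its source rather than dropped automatically.

```python
from statistics import median

# Hypothetical raw data, invented for this sketch; the fourth income
# looks like a keying error (extra zeros).
data = {
    "family_income": [21500, 19800, 23000, 2250000, 20700],
    "students_per_teacher": [18, 22, 25, 20, 19],
}


def summarize(values):
    """Mean, minimum and maximum of one variable."""
    return {"mean": sum(values) / len(values),
            "min": min(values), "max": max(values)}


def flag_suspects(values, ratio=10.0):
    """Indices of values more than `ratio` times the median, or less than
    the median divided by `ratio`. The threshold is an arbitrary screening
    choice for positive-valued data, not a standard statistical test."""
    mid = median(values)
    return [i for i, v in enumerate(values)
            if v > ratio * mid or v < mid / ratio]


for name, values in data.items():
    print(name, summarize(values), "check rows:", flag_suspects(values))
```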
Step 5: Estimate and Evaluate the Equation
• It can take months to complete steps 1–4!
• Once your equation is estimated, your work is not over.
• Rather, you need to evaluate.
• For example:
• How well did the equation fit the data?
• Were signs and magnitudes of coefficients expected?
• If evaluation indicates a problem, go back to step 1.
Step 6: Document the Results
• A standard format usually used to present results:
$$
\begin{array}{rcc}
\hat{Y}_i = & 103.40 & +\,6.38 X_i \\
 & & (0.88) \\
t = & & 7.22 \\
\end{array}
\tag{3.2}
$$
$$N = 20 \qquad R^2 = 0.73$$
• The number in parentheses is the standard error of the coefficient.
• The t-statistic is used to test the hypothesis that the true
value of the coefficient is different from zero.
• It is also important to explain the model and its assumptions
and to document data manipulations in the written report.
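For a test against zero, the t-statistic is the ratio of the estimated coefficient to its standard error, so it can be reproduced approximately from the reported numbers. Using the rounded values shown in Equation 3.2:

```latex
$$
t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{6.38}{0.88} \approx 7.25
$$
```

This is close to the reported 7.22. The small gap presumably arises because the coefficient and standard error are displayed in rounded form.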
Example: Using Regression
Analysis to Pick Restaurant Locations
• You’re hired to determine the best location for the next
Woody’s restaurant (a moderately priced, 24-hour, family
restaurant chain).
• You decide to build a regression model to explain the
gross sales volume of each of the restaurants.
Step 1: Review the literature and develop the theoretical
model.
• Read about restaurant industry.
• Talk to experts within the firm.
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 2: Specify the model: Select independent
variables and the functional form.
• You decide there are three major determinants of sales:
N = Competition: the number of direct market
competitors within a two-mile radius of the
Woody’s location
P = Population: the number of people living within a
three-mile radius of the Woody’s location
I = Income: the average household income of the
population measured in variable P
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 3: Hypothesize the expected signs of the
coefficients.
• N: The more competition in the area, the fewer customers the
location will have (negative).
• P: The more people in the area, the more customers the location
will have (positive).
• I: Unclear; probably positive but could be negative.
• Thus:
$$Y_i = \beta_0 + \overset{-}{\beta_N} N_i + \overset{+}{\beta_P} P_i + \overset{+?}{\beta_I} I_i + \epsilon_i \tag{3.3}$$
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 4: Collect the data. Inspect and clean the data.
Table 3.1: Data for Woody’s Restaurant Example
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 5: Estimate and evaluate the equation.
• With software and the data set, you estimate:
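$$
\begin{array}{rcccc}
\hat{Y}_i = & 102{,}192 & -\,9075 N_i & +\,0.355 P_i & +\,1.288 I_i \\
 & & (2053) & (0.073) & (0.543) \\
t = & & -4.42 & 4.88 & 2.37 \\
\end{array}
\tag{3.4}
$$
$$N = 33 \qquad R^2 = 0.579$$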
• Estimated coefficients have expected signs.
• Overall fit seems reasonable.
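The evaluation above can be sketched in code: check each estimated sign against the Step 3 hypothesis, then use Equation 3.4 to compute fitted sales for a candidate site. The coefficients are copied from Equation 3.4. The candidate site's values and the function names are hypothetical, chosen only for illustration.

```python
# Coefficient estimates copied from Equation 3.4.
INTERCEPT = 102192.0
COEFS = {"N": -9075.0, "P": 0.355, "I": 1.288}
# Signs hypothesized in Step 3 (the sign on I was the uncertain one).
EXPECTED_POSITIVE = {"N": False, "P": True, "I": True}


def signs_as_expected():
    """Map each variable to True if its estimated sign matches Step 3."""
    return {k: (COEFS[k] > 0) == EXPECTED_POSITIVE[k] for k in COEFS}


def predict_sales(competitors, population, income):
    """Fitted value of Y from Equation 3.4 for one location."""
    return (INTERCEPT + COEFS["N"] * competitors
            + COEFS["P"] * population + COEFS["I"] * income)


# Hypothetical candidate site, invented for this sketch.
site = {"competitors": 3, "population": 100_000, "income": 20_000}

print(signs_as_expected())
print(round(predict_sales(**site)))
```

Comparing fitted values across several candidate sites is one way such an equation could inform the location decision, subject to the usual caution about predicting outside the range of the sample.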
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 6: Document the results.
• Equation 3.4 from Step 5 documents the results, which are taken from
statistical software output (like Table 3.2).
Dummy Variables
• Some variables are inherently “qualitative” (rather than
“quantitative”) and can’t be expressed as a number.
• Some qualitative variables can be quantified by using a
dummy variable.
• A dummy variable takes on value of 0 or 1 depending on
whether a specified condition is met.
Dummy Variables (continued)
Example:
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} D_i + \epsilon_i \tag{3.5}$$
where: Yi = the income of the ith teacher in dollars
Xi = the number of years of teaching experience
of the ith teacher
Di = 1 if the ith teacher has a graduate
degree and 0 otherwise
• Di is a dummy variable.
• β2 indicates the additional income attributed to having a
graduate degree, holding teaching experience constant.
Dummy Variables (continued)
• A dummy variable changes the intercept.
• Slope remains constant.
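A short sketch of the intercept shift, using the form of Equation 3.5 with made-up coefficient values (`B0`, `B1` and `B2` are hypothetical and are not estimates from the text):

```python
# Hypothetical coefficients for the form of Equation 3.5; not estimates.
B0 = 30000.0  # intercept: income with no experience and no graduate degree
B1 = 1500.0   # slope: income per additional year of experience
B2 = 5000.0   # dummy coefficient: shift for holding a graduate degree


def predicted_income(years, has_grad_degree):
    """Predicted income from Y = B0 + B1*X + B2*D, with D equal to 0 or 1."""
    d = 1 if has_grad_degree else 0
    return B0 + B1 * years + B2 * d


for years in (0, 5, 10):
    without = predicted_income(years, False)
    with_degree = predicted_income(years, True)
    print(years, without, with_degree, with_degree - without)
```

At every experience level the gap between the two groups equals `B2`, and both lines rise by `B1` per year: two parallel lines with different intercepts.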
Dummy Variables (continued)
• In teacher salary example, there are two “conditions”:
Condition 1: Teacher has a graduate degree
Condition 2: Teacher does not have a graduate degree
• However, only one dummy variable is used.
• Event not explicitly represented by a dummy variable is
the omitted condition.
• The omitted condition forms the basis against which the
included condition(s) are compared.
Dummy Variables (continued)
Example: relationship between fraternity/sorority (Greek)
membership and GPA
• Use a dummy variable to represent Greek organization
membership.
Di = 1 if the ith student is an active member of a
fraternity or a sorority and 0 otherwise
• Other factors to include:
1. High school GPA
2. SAT score.
• Gather data and estimate.
Dummy Variables (continued)
$$CG_i = 0.37 + 0.81\,HG_i + 0.00001\,S_i - 0.38\,D_i + e_i \tag{3.6}$$
$$N = 25 \qquad R^2 = 0.45$$
where:
CGi = the cumulative college GPA (4-point scale) of the
ith student
HGi = the cumulative high school GPA (4-point scale) of
the ith student
Si = the sum of the highest verbal and mathematics
SAT scores earned by the ith student
• The coefficient on Di means that a fraternity/sorority member’s GPA is estimated
to be 0.38 points lower than a nonmember’s, holding HG and S constant.
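To make the interpretation concrete, take two hypothetical students with the same high school GPA (HG = 3.0) and SAT total (S = 1200), values chosen only for illustration, and compare the fitted values from Equation 3.6:

```latex
$$
\begin{aligned}
\text{Nonmember } (D = 0):\quad & 0.37 + 0.81(3.0) + 0.00001(1200) = 2.812 \\
\text{Member } (D = 1):\quad & 2.812 - 0.38 = 2.432
\end{aligned}
$$
```

The difference is 0.38 whatever values of HG and S are chosen, as long as they are the same for both students.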
Dummy Variables (continued)
• What about a qualitative variable that has three or more
alternatives (or conditions)?
• In teacher salary example, “graduate degree” could
mean M.A. or Ph.D.
• This means there are three alternatives: B.A., M.A., and
Ph.D.
• Dummy variable approach: create one fewer dummy
than there are conditions.
Dummy Variables (continued)
• For revised high school teacher salary example:
PHDi = 1 if the ith teacher’s highest degree is a
Ph.D. and 0 otherwise
MAi = 1 if the ith teacher’s highest degree is an
M.A. and 0 otherwise
• Incorporating these into Equation (3.5):
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} PHD_i + \overset{?}{\beta_3} MA_i + \epsilon_i \tag{3.7}$$
• What’s the expected sign of MA?
Dummy Variables (continued)
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} PHD_i + \overset{?}{\beta_3} MA_i + \epsilon_i \tag{3.7}$$
• The coefficient on each dummy variable is the change
in the dependent variable from the condition being
met compared to the omitted condition.
• β3 measures the impact of having the highest degree be
an M.A. (holding X and PHD constant) compared to the
omitted condition (which is having a B.A.).
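The "one fewer dummy than conditions" rule can be sketched as an encoding function. The degree labels `"BA"`, `"MA"` and `"PHD"` and the function name are assumptions made for this example. B.A. is treated as the omitted condition, as in the text.

```python
def degree_dummies(highest_degree):
    """Return (PHD, MA) for one teacher; B.A. is the omitted condition,
    so a B.A. holder is coded (0, 0)."""
    if highest_degree not in ("BA", "MA", "PHD"):
        raise ValueError(f"unknown degree: {highest_degree!r}")
    phd = 1 if highest_degree == "PHD" else 0
    ma = 1 if highest_degree == "MA" else 0
    return phd, ma


# Hypothetical sample of teachers' highest degrees.
for degree in ["BA", "MA", "PHD", "MA"]:
    print(degree, degree_dummies(degree))
```

Three conditions are captured by two dummies. Adding a third dummy for B.A. would make the three dummies sum to one for every observation, which is perfectly collinear with the constant term.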
CHAPTER 3: the end