LEARNING TO USE REGRESSION ANALYSIS
β 3-1
Steps in Applied Regression Analysis
Step 1: Review literature and develop
theoretical model.
Step 2: Specify model: Select independent
variables and functional form.
Step 3: Hypothesize expected signs of coefficients.
Step 4: Collect data. Inspect and clean data.
Step 5: Estimate and evaluate equation.
Step 6: Document results.
β 3-2
Step 1: Review the Literature and
Develop the Theoretical Model
• Best data analysts start with theory!
• It’s smart to review scholarly literature before doing
anything else.
• Many approaches, but searching EconLit is helpful.
• When topic has not been studied, two strategies:
1. Transfer theory from a similar topic to your topic.
2. Consult someone who works in the area.
β 3-3
Step 2: Specify the Model: Select the Independent
Variables and Functional Form
• Most important step in applied regression analysis:
specification of theoretical model.
• Specifying a model involves choosing:
1. Independent variables and how they should
be measured.
2. Functional (mathematical) form of variables.
3. Properties of stochastic error term.
β 3-4
Step 2: Specify the Model (continued)
• Any mistake in these three components leads to
specification error—a disastrous error to validity.
• Choose independent variables based on theory.
• Judgment must often be used and researchers impose
priors.
Example: Estimate demand equation for a good.
Theory suggests including prices of complements
and substitutes.
Which ones do you choose?
β 3-5
Step 3: Hypothesize the Expected
Signs of Coefficients
• Once variables selected, hypothesize expected
signs of coefficients.
• Often, basic theory is general knowledge and
expected coefficient signs need no explanation.
• If there’s uncertainty, opposing theories should be
documented and your hypothesized sign explained.
β 3-6
Step 3: Hypothesize the Expected
Signs of Coefficients (continued)
Example: Impact of class size on student learning.
dependent variable:
Y= student score on grammar test
independent variables:
X1 = income level of student’s family
X2 = students per teacher
Notation with hypothesized signs above coefficients:
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_{1i} + \overset{-}{\beta_2} X_{2i} + \epsilon_i \tag{3.1}$$
β 3-7
Step 4: Collect the Data.
Inspect and Clean the Data
• Obtaining and preparing an original data set is difficult.
• General rule: the more observations the better.
• Reason there should be as many observations as
possible concerns the concept of degrees of freedom
(first mentioned in Section 2.4).
• With large number of degrees of freedom, every positive
error is likely balanced by a negative error.
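The degrees-of-freedom count can be made concrete with a minimal sketch. It assumes the usual definition: the number of observations minus the number of estimated coefficients (the slopes plus the intercept). The function name is illustrative only, and the sample sizes are the ones reported with the estimated equations later in this chapter.

```python
def degrees_of_freedom(n_obs: int, n_slopes: int) -> int:
    """Observations minus estimated coefficients (slopes plus one intercept)."""
    return n_obs - (n_slopes + 1)


# One independent variable, 20 observations.
print(degrees_of_freedom(20, 1))  # 18

# Three independent variables, 33 observations.
print(degrees_of_freedom(33, 3))  # 29
```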
β 3-8
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Another question: does unit of measurement of the
variables matter?
• Short answer: No, except in interpreting the scale of the coefficients.
Example: Independent variable is measured in dollars or
thousands of dollars.
• Constant term and measures of fit are unchanged.
• Slope coefficient of the variable changes by exactly the
amount needed to offset the change in units.
• Variable measured in “thousands of $”: coefficient is 50
• Variable measured in “$”: coefficient is 0.05
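The rescaling claim can be checked with a small sketch. The data below are hypothetical, chosen only so that the slope comes out to 50 when the variable is in thousands of dollars. The helper names `ols_simple` and `r_squared` are illustrative, not from the text. The sketch fits a one-variable regression by the closed-form least-squares formulas, once per unit of measurement.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y, intercept, slope):
    """Return 1 - RSS/TSS for the fitted line."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


# Hypothetical data: the same variable in two units of measurement.
x_thousands = [20, 35, 50, 65, 80]
x_dollars = [1000 * v for v in x_thousands]
y = [1100, 1800, 2650, 3300, 4100]

b0_k, b1_k = ols_simple(x_thousands, y)
b0_d, b1_d = ols_simple(x_dollars, y)

print("thousands of $:", b0_k, b1_k, r_squared(x_thousands, y, b0_k, b1_k))
print("$:             ", b0_d, b1_d, r_squared(x_dollars, y, b0_d, b1_d))
```

With these numbers the slope is 50 in thousands of dollars and 0.05 in dollars, while the constant term and the R-squared are the same in both fits.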
β 3-9
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Always review data set for errors.
• Approaches:
• Plot the data and look for outliers.
• Look at mean, maximum and minimum of each
variable.
• Typically, data can be “cleaned” by replacing an
incorrect value with correct value.
• In extremely rare circumstances, drop an observation.
• BE CAREFUL! Mere existence of an outlier is not a
justification for dropping that observation.
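One way to carry out the checks above is sketched here, under stated assumptions. The data are hypothetical (one value looks like a typing error), the plausible range is an assumed judgment call, and the function names are illustrative. Values outside the range are only flagged for review, not dropped.

```python
def summarize(values):
    """Return the minimum, mean, and maximum of a variable."""
    return {
        "min": min(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }


def flag_suspect(values, low, high):
    """Return (index, value) pairs outside [low, high] for manual review."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical students-per-teacher data; 240 may be a mistyped 24.
students_per_teacher = [22, 25, 19, 240, 27, 21]

print(summarize(students_per_teacher))

# Assumed plausible range for this variable: 5 to 60 students per teacher.
print(flag_suspect(students_per_teacher, low=5, high=60))  # [(3, 240)]
```

A flagged value should be checked against the original source and corrected if it is wrong. The flag alone is not a reason to drop the observation.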
β 3-10
Step 5: Estimate and Evaluate the Equation
• It can take months to complete steps 1–4!
• Once your equation is estimated, your work is not over.
• Rather, you need to evaluate.
• For example:
• How well did the equation fit the data?
• Were signs and magnitudes of coefficients expected?
• If evaluation indicates a problem, go back to step 1.
β 3-11
Step 6: Document the Results
• A standard format is usually used to present results:
$$\hat{Y}_i = 103.40 + \underset{\substack{(0.88) \\ t = 7.22}}{6.38}\,X_i \tag{3.2}$$
$$N = 20 \qquad R^2 = 0.73$$
• Number in parentheses is the standard error of the coefficient.
• The t-statistic is used to test the hypothesis that the true
value of the coefficient is different from zero.
• It is also important to explain the model and assumptions,
and to document data manipulations, in the written report.
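The link between the reported coefficient, standard error, and t-statistic can be sketched as below. This assumes the usual ratio form for a test against zero: the estimated coefficient divided by its standard error. The function name is illustrative.

```python
def t_statistic(coefficient, standard_error, hypothesized_value=0.0):
    """Return (estimate - hypothesized value) / standard error."""
    return (coefficient - hypothesized_value) / standard_error


# Slope and standard error as printed in Equation 3.2.
t = t_statistic(6.38, 0.88)
print(round(t, 2))  # 7.25
```

The printed figures give about 7.25 rather than the reported 7.22. The small gap presumably arises because the published coefficient and standard error are rounded.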
β 3-12
Example: Using Regression
Analysis to Pick Restaurant Locations
• You’re hired to determine best location of the next
Woody’s restaurant (a moderately priced, 24-hour, family
restaurant chain).
• You decide to build a regression model to explain the
gross sales volume of each of the restaurants.
Step 1: Review the literature and develop theoretical
model.
• Read about restaurant industry.
• Talk to experts within the firm.
β 3-13
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 2: Specify the model: Select independent
variables and the functional form.
• You decide there are three major determinants of sales:
N = Competition: the number of direct market
competitors within a two-mile radius of the
Woody’s location
P = Population: the number of people living within a
three-mile radius of the Woody’s location
I = Income: the average household income of the
population measured in variable P
β 3-14
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 3: Hypothesize the expected signs of the
coefficients.
• N: More competition in area, the fewer customers the
location will have (negative).
• P: More people in area, the more customers the location
will have (positive).
• I: Unclear—probably positive but could be negative.
• Thus:
$$Y_i = \beta_0 + \overset{-}{\beta_N} N_i + \overset{+}{\beta_P} P_i + \overset{+?}{\beta_I} I_i + \epsilon_i \tag{3.3}$$
β 3-15
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 4: Collect the data. Inspect and clean the data.
Table 3.1: Data for Woody’s Restaurant Example
β 3-16
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 5: Estimate and evaluate the equation.
• With software and the data set, you estimate:
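$$\hat{Y}_i = 102{,}192 - \underset{\substack{(2053) \\ t = -4.42}}{9075}\,N_i + \underset{\substack{(0.073) \\ t = 4.88}}{0.355}\,P_i + \underset{\substack{(0.543) \\ t = 2.37}}{1.288}\,I_i \tag{3.4}$$
$$N = 33 \qquad R^2 = 0.579$$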
• Estimated coefficients have expected signs.
• Overall fit seems reasonable.
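A minimal sketch of how the estimated equation might be put to work on the original question of where to locate the next restaurant. The coefficients are copied from Equation 3.4 and the hypothesized signs from Step 3. The two candidate sites and their values are hypothetical, and the function names are illustrative.

```python
# Coefficients copied from Equation 3.4.
INTERCEPT = 102_192.0
COEFFICIENTS = {"N": -9075.0, "P": 0.355, "I": 1.288}

# Hypothesized signs from Step 3 (+1 positive, -1 negative).
EXPECTED_SIGNS = {"N": -1, "P": +1, "I": +1}


def signs_match(coefficients, expected_signs):
    """Map each variable to True if its estimate has the hypothesized sign."""
    return {
        name: (coef > 0) == (expected_signs[name] > 0)
        for name, coef in coefficients.items()
    }


def predicted_sales(competitors, population, income):
    """Fitted value of Equation 3.4 for one location."""
    return (
        INTERCEPT
        + COEFFICIENTS["N"] * competitors
        + COEFFICIENTS["P"] * population
        + COEFFICIENTS["I"] * income
    )


# Hypothetical candidate sites: (competitors, population, income).
candidate_sites = {
    "Site A": (3, 100_000, 20_000),
    "Site B": (5, 150_000, 18_000),
}

predictions = {
    name: predicted_sales(*values) for name, values in candidate_sites.items()
}
best_site = max(predictions, key=predictions.get)

print(signs_match(COEFFICIENTS, EXPECTED_SIGNS))
for name, sales in predictions.items():
    print(name, round(sales))
print("Highest predicted sales:", best_site)
```

For these made-up inputs, Site A has the higher fitted value. Real use would plug in measured values of N, P, and I for each location under consideration.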
β 3-17
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 6: Document the results.
• Equation 3.4 from Step 5 documents results—pulled from
statistical software output (like Table 3.2).
β 3-18
CHAPTER 3: the end