Chapter3 Final

Chapter 3 of 'A Practical Guide To Using Econometrics' outlines the steps in applied regression analysis, including reviewing literature, specifying the model, hypothesizing coefficient signs, collecting and cleaning data, estimating the equation, and documenting results. It emphasizes the importance of theoretical models and careful selection of independent variables to avoid specification errors. The chapter also discusses the use of dummy variables for qualitative data in regression analysis.

A Practical Guide To Using Econometrics

A. H. Studenmund

Chapter 3
Steps in Applied Regression Analysis
Step 1: Review literature and develop theoretical model.
Step 2: Specify model: Select independent variables and functional form.
Step 3: Hypothesize expected signs of coefficients.
Step 4: Collect data. Inspect and clean data.
Step 5: Estimate and evaluate equation.
Step 6: Document results.


Step 1: Review the Literature and Develop the Theoretical Model

• Best data analysts start with theory!
• It’s smart to review scholarly literature before doing anything else.
• Many approaches, but searching EconLit is helpful.
• When topic has not been studied, two strategies:
1. Transfer theory from a similar topic to your topic.
2. Consult someone who works in the area.

Step 2: Specify the Model: Select the Independent Variables and Functional Form

• Most important step in applied regression analysis: specification of theoretical model.
• Specifying a model involves choosing:
1. Independent variables and how they should be measured.
2. Functional (mathematical) form of variables.
3. Properties of stochastic error term.

Step 2: Specify the Model (continued)
• Any mistake in these three components leads to a specification error, which can badly damage the validity of the estimated equation.
• Choose independent variables based on theory.
• Judgment must often be used, and researchers impose their priors.

Example: Estimate a demand equation for a good.
Theory suggests including the prices of complements and substitutes.
Which ones do you choose?

Step 3: Hypothesize the Expected Signs of Coefficients

• Once variables selected, hypothesize expected signs of coefficients.
• Often, basic theory is general knowledge and expected coefficient signs need no explanation.
• If there’s uncertainty, opposing theories should be documented and your hypothesized sign explained.

Step 3: Hypothesize the Expected
Signs of Coefficients (continued)
Example: Impact of class size on student learning.
dependent variable:
Y= student score on grammar test
independent variables:
X1 = income level of student’s family
X2 = students per teacher

Notation with hypothesized signs above coefficients:

$$Y = \beta_0 + \overset{+}{\beta_1} X_1 + \overset{-}{\beta_2} X_2 + \epsilon_i \qquad (3.1)$$

Step 4: Collect the Data.
Inspect and Clean the Data

• Obtaining and preparing an original data set is difficult.
• General rule: the more observations the better.
• The reason there should be as many observations as possible concerns the concept of degrees of freedom (first mentioned in Section 2.4).
• With a large number of degrees of freedom, every positive error is likely balanced by a negative error.
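
As a reminder of what is being counted, here is the usual definition written out. K (the number of independent variables) is notation assumed for this sketch, and the sample size in the illustration is hypothetical:

$$\text{degrees of freedom} = N - (K + 1)$$

For a model with two independent variables, like Equation 3.1, and a hypothetical sample of N = 30 observations, this gives 30 - 3 = 27 degrees of freedom.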

Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Another question: does the unit of measurement of the variables matter?
• Short answer: No, except when interpreting the scale of the coefficients.
Example: An independent variable is measured in dollars or in thousands of dollars.
• Constant term and measures of fit are unchanged.
• Slope coefficient of the variable changes by exactly the amount needed to compensate for the change in units.
• Variable measured in “thousands of $”: coefficient is 50
• Variable measured in “$”: coefficient is 0.05
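
The effect of a change in units can be checked numerically. The sketch below is an illustration only: the data are made up, and `ols_simple` is a hypothetical helper that fits a one-variable regression by the standard least-squares formulas. It estimates the same equation twice, once with X in thousands of dollars and once with X in dollars.

```python
# Hypothetical helper and made-up data, for illustration only.

def ols_simple(x, y):
    """Return (intercept, slope, r_squared) for the fit y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - rss / tss

x_thousands = [10.0, 12.0, 15.0, 18.0, 20.0]   # X in thousands of dollars (made up)
y = [520.0, 610.0, 760.0, 905.0, 1010.0]       # dependent variable (made up)
x_dollars = [1000 * v for v in x_thousands]    # the same X, now in dollars

b0_k, b1_k, r2_k = ols_simple(x_thousands, y)
b0_d, b1_d, r2_d = ols_simple(x_dollars, y)

print(f"thousands of $: intercept={b0_k:.2f} slope={b1_k:.4f} R2={r2_k:.4f}")
print(f"$:              intercept={b0_d:.2f} slope={b1_d:.6f} R2={r2_d:.4f}")
```

The slope in dollars is the slope in thousands of dollars divided by 1,000, while the intercept and the fit are the same in both runs (up to floating-point rounding).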
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Always review data set for errors.
• Approaches:
    • Plot the data and look for outliers.
    • Look at mean, maximum and minimum of each variable.
• Typically, data can be “cleaned” by replacing an incorrect value with correct value.
• In extremely rare circumstances, drop an observation.
• BE CAREFUL! Mere existence of an outlier is not a justification for dropping that observation.
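
One way to carry out the "mean, maximum and minimum" check is sketched below. The variable, its values, and the plausible range are all hypothetical, and `summarize` and `flag_suspects` are illustrative helper names rather than functions from any particular package.

```python
# Hypothetical data and helpers, for illustration only.

def summarize(name, values):
    """Return the minimum, mean and maximum of one variable."""
    return {
        "name": name,
        "min": min(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }

def flag_suspects(values, low, high):
    """Return the positions of values outside a plausible range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]

# Made-up household incomes; the fourth entry looks like a data-entry error.
income = [41000, 38500, 52000, 4700000, 45500]

stats = summarize("income", income)
suspects = flag_suspects(income, low=0, high=500000)

print(stats)
print("observations to check against the original source:", suspects)
```

The flagged positions are candidates for checking against the original source and correcting. In line with the warning above, the sketch does not drop anything automatically.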
Step 5: Estimate and Evaluate the Equation
• It can take months to complete steps 1–4!
• Once your equation is estimated, your work is not over.
• Rather, you need to evaluate.
• For example:
• How well did the equation fit the data?
• Were signs and magnitudes of coefficients expected?

• If evaluation indicates a problem, go back to step 1.

Step 6: Document the Results
• A standard format usually used to present results:
$$\hat{Y}_i = 103.40 + \underset{(0.88)}{6.38}\,X_i \qquad (3.2)$$
$$t = 7.22 \qquad N = 20 \qquad R^2 = 0.73$$

• Number in parentheses is the standard error of the coefficient.
• The t-statistic is the one used to test the hypothesis that the true value of the coefficient is different from zero.
• It is also important to explain the model and its assumptions, and to document data manipulations, in the written report.
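
For a null hypothesis of zero, the t-statistic is the estimated coefficient divided by its standard error. The sketch below applies that to the rounded figures printed in Equation 3.2; `t_stat` is a hypothetical helper name. The result differs slightly from the reported 7.22, presumably because the reported value was computed from unrounded estimates (an assumption, since the slides do not say).

```python
def t_stat(coef, se, null_value=0.0):
    """t-statistic for the hypothesis that the true coefficient equals null_value."""
    return (coef - null_value) / se

# Rounded slope estimate and standard error as printed in Equation 3.2.
t = t_stat(6.38, 0.88)
print(f"t = {t:.2f}")
```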
Example: Using Regression
Analysis to Pick Restaurant Locations
• You’re hired to determine the best location for the next Woody’s restaurant (a moderately priced, 24-hour, family restaurant chain).
• You decide to build a regression model to explain the gross sales volume of each of the restaurants.

Step 1: Review the literature and develop theoretical model.
• Read about restaurant industry.
• Talk to experts within the firm.

Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 2: Specify the model: Select independent
variables and the functional form.
• You decide there are three major determinants of sales:

N = Competition: the number of direct market competitors within a two-mile radius of the Woody’s location
P = Population: the number of people living within a three-mile radius of the Woody’s location
I = Income: the average household income of the population measured in variable P
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 3: Hypothesize the expected signs of the
coefficients.
• N: More competition in area, the fewer customers the
location will have (negative).
• P: More people in area, the more customers the location
will have (positive).
• I: Unclear—probably positive but could be negative.

• Thus:

$$Y_i = \beta_0 + \overset{-}{\beta_N} N_i + \overset{+}{\beta_P} P_i + \overset{+?}{\beta_I} I_i + \epsilon_i \qquad (3.3)$$

Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 4: Collect the data. Inspect and clean the data.
Table 3.1: Data for Woody’s Restaurant Example

Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 5: Estimate and evaluate the equation.
• With software and the data set, you estimate:

$$\hat{Y}_i = 102{,}192 - \underset{(2053)}{9075}\,N_i + \underset{(0.073)}{0.355}\,P_i + \underset{(0.543)}{1.288}\,I_i \qquad (3.4)$$

t = -4.42 (N), 4.88 (P), 2.37 (I)
N = 33 (observations), R^2 = 0.579
• Estimated coefficients have expected signs.
• Overall fit seems reasonable.
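
The sign check, and the use of the estimated equation to compare candidate sites, can be sketched as follows. The coefficients are those printed in Equation 3.4 and the expected signs are those hypothesized in Step 3 (treating the uncertain income sign as positive). The candidate location and the helper names are hypothetical.

```python
# Coefficients as printed in Equation 3.4.
COEF = {"const": 102192.0, "N": -9075.0, "P": 0.355, "I": 1.288}

# Hypothesized signs from Step 3 (the income sign was flagged as uncertain).
EXPECTED_SIGN = {"N": -1, "P": +1, "I": +1}

def signs_match(coef, expected):
    """For each variable, report whether the estimated sign matches the expected one."""
    return {k: (coef[k] > 0) == (s > 0) for k, s in expected.items()}

def predict_sales(n_competitors, population, income):
    """Predicted gross sales volume from Equation 3.4."""
    return (COEF["const"]
            + COEF["N"] * n_competitors
            + COEF["P"] * population
            + COEF["I"] * income)

checks = signs_match(COEF, EXPECTED_SIGN)

# A made-up candidate site: 2 competitors, 100,000 residents, $20,000 average income.
y_hat = predict_sales(n_competitors=2, population=100000, income=20000)

print(checks)
print(f"predicted sales for the hypothetical site: {y_hat:,.0f}")
```

Running the same prediction for each candidate site gives one way to rank locations by predicted sales.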

Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 6: Document the results.
• Equation 3.4 from Step 5 documents the results, which are pulled from statistical software output (like Table 3.2).

Dummy Variables
• Some variables are inherently “qualitative” (rather than
“quantitative”) and can’t be expressed as a number.

• Some qualitative variables can be quantified by using a dummy variable.
• A dummy variable takes on value of 0 or 1 depending on whether a specified condition is met.

Dummy Variables (continued)
Example:

$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} D_i + \epsilon_i \qquad (3.5)$$

where: Yi = the income of the ith teacher in dollars
Xi = the number of years of teaching experience of the ith teacher
Di = 1 if the ith teacher has a graduate degree and 0 otherwise
• Di is a dummy variable.
• β2 indicates the additional income attributed to having a
graduate degree, holding teaching experience constant.
Dummy Variables (continued)
• A dummy variable changes the intercept.
• Slope remains constant.
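
To see why, substitute the two possible values of the dummy into Equation 3.5 and take expected values (a short worked step, assuming the error term has mean zero):

$$D_i = 0: \quad E(Y_i) = \beta_0 + \beta_1 X_i$$
$$D_i = 1: \quad E(Y_i) = (\beta_0 + \beta_2) + \beta_1 X_i$$

The two lines are parallel: both have slope β1, and the intercept shifts from β0 to β0 + β2.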

Dummy Variables (continued)
• In teacher salary example, there are two “conditions”:
Condition 1: Teacher has a graduate degree
Condition 2: Teacher does not have a graduate degree

• However, only one dummy variable is used.

• Event not explicitly represented by a dummy variable is the omitted condition.
• The omitted condition forms the basis against which the included condition(s) are compared.

Dummy Variables (continued)
Example: relationship between fraternity/sorority (Greek)
membership and GPA
• Use a dummy variable to represent Greek organization
membership.
Di = 1 if the ith student is an active member of a
fraternity or a sorority and 0 otherwise
• Other factors to include:
1. High school GPA
2. SAT score.
• Gather data and estimate.
Dummy Variables (continued)
$$CG_i = 0.37 + 0.81\,HG_i + 0.00001\,S_i - 0.38\,D_i + e_i \qquad (3.6)$$
$$N = 25 \qquad R^2 = 0.45$$

where:
CGi = the cumulative college GPA (4-point scale) of the
ith student
HGi = the cumulative high school GPA (4-point scale) of
the ith student
Si = the sum of the highest verbal and mathematics
SAT scores earned by the ith student
• The coefficient on Di means that a fraternity/sorority member’s estimated GPA is 0.38 points lower than a nonmember’s, holding HG and S constant.
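
The "holding HG and S constant" interpretation can be illustrated by evaluating the fitted part of Equation 3.6 for two otherwise identical students. The high school GPA and SAT score below are hypothetical, and `predict_gpa` is an illustrative helper name.

```python
def predict_gpa(hs_gpa, sat, greek):
    """Fitted college GPA from Equation 3.6 (residual term omitted)."""
    return 0.37 + 0.81 * hs_gpa + 0.00001 * sat - 0.38 * greek

# Two hypothetical students with the same high school GPA and SAT score.
member = predict_gpa(hs_gpa=3.5, sat=1200, greek=1)
nonmember = predict_gpa(hs_gpa=3.5, sat=1200, greek=0)
gap = member - nonmember

print(f"member: {member:.3f}  nonmember: {nonmember:.3f}  gap: {gap:.2f}")
```

Whatever values are chosen for HG and S, the gap between the two predictions equals the dummy coefficient.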
Dummy Variables (continued)
• What about a qualitative variable that has three or more
alternatives (or conditions)?

• In teacher salary example, “graduate degree” could mean M.A. or Ph.D.
• This means there are three alternatives: B.A., M.A., and Ph.D.
• Dummy variable approach: create one fewer dummy than there are conditions.

Dummy Variables (continued)
• For revised high school teacher salary example:
PHDi = 1 if the ith teacher’s highest degree is a
Ph.D. and 0 otherwise
MAi = 1 if the ith teacher’s highest degree is an
M.A. and 0 otherwise
• Incorporating these into Equation (3.5):

$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} PHD_i + \overset{?}{\beta_3} MA_i + \epsilon_i \qquad (3.7)$$

• What’s the expected sign of the coefficient on MA?


Dummy Variables (continued)
$$Y_i = \beta_0 + \overset{+}{\beta_1} X_i + \overset{+}{\beta_2} PHD_i + \overset{?}{\beta_3} MA_i + \epsilon_i \qquad (3.7)$$
• The coefficient on each dummy variable is the increase
in the dependent variable caused by the condition being
met compared to the omitted condition.

• β3 measures the impact of having the highest degree be an M.A. (holding X and PHD constant) compared to the omitted condition (which is having a B.A.).
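
The "one fewer dummy than conditions" rule can be sketched as a small encoding step. The degree labels and the `degree_dummies` helper are hypothetical; the point is that a B.A. is represented by both dummies being 0, so it needs no variable of its own.

```python
def degree_dummies(highest_degree):
    """Map a highest degree to the (PHD, MA) dummies of Equation 3.7.

    'BA' is the omitted condition, so it maps to (0, 0).
    """
    if highest_degree not in {"BA", "MA", "PHD"}:
        raise ValueError(f"unknown degree: {highest_degree!r}")
    phd = 1 if highest_degree == "PHD" else 0
    ma = 1 if highest_degree == "MA" else 0
    return phd, ma

# Three hypothetical teachers, one per condition.
rows = [degree_dummies(d) for d in ["BA", "MA", "PHD"]]
print(rows)
```

Adding a third dummy for the B.A. alongside the constant term would make the three dummies sum to 1 for every teacher, which is perfectly collinear with the constant. That is a standard reason for leaving one condition out.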


CHAPTER 3: the end