Learning Objectives for the Multiple
Regression Model
We will discuss:
◼ How to develop a multiple regression model
◼ How to interpret the regression coefficients
◼ How to determine which independent variables to include in the
regression model
◼ How to determine which independent variables are most important in
predicting a dependent variable
Copyright ©2015 Pearson Education, Ltd. All rights reserved.
Three-category approach to
quantitative methods
◼ Quantitative methods have three distinct categories or functions:
◼ Descriptive statistics
◼ Analysis of differences in the same variables measured (or observed) under different conditions: t-test, Z-test, ANOVA, MANOVA.
◼ Analysis of relationships between or among variables: regression models (cross-sectional, time-series, and panel data). "Regression is king of Econometrics!" (a US professor)
◼ Wooldridge, Jeffrey M., 7th edition
◼ Gujarati, Damodar N., Basic Econometrics, 5th edition
◼ Key figures: Ragnar Frisch (father of Econometrics), Jan Tinbergen, Simon Kuznets (GDP), Jacob Marschak (Ukraine).
Regression Models
◼ A regression model can be applied to estimate the
parameters of many relationships in economics, business,
and the social sciences. The term regression was coined by
Francis Galton (1886).
◼ Regression toward the mean (or regression to the mean): "on average, move towards the middle".
◼ Any time you ask how much a change in one variable will affect another variable, regression analysis is a potential tool.
◼ Similarly, any time you wish to predict the value of one variable given the value of another, least squares regression is a tool to consider.
Principles of Econometrics, 5e
The Multiple Regression
Model
Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
where β0 is the Y-intercept, β1, ..., βk are the population slopes, and εi is the random error.
Introduction (2 of 2)
◼ The variable to be predicted is called the dependent variable or response variable
◼ Its value depends on the value of the independent variable(s)
◼ An independent variable is also called an explanatory or predictor variable
Copyright © 2018, 2015, 2012 Pearson Education, Inc. All Rights Reserved
Multiple Regression Equation
Sample Data Example
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$
where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.
We will use software to obtain the regression slope coefficients and other regression summary measures.
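As a rough illustration of what regression software does internally, the sketch below estimates b0, b1, ..., bk by solving the least-squares normal equations in plain Python. The function name `fit_ols` and the data are hypothetical and chosen only for illustration; statistical packages use more numerically robust routines.

```python
def fit_ols(x_rows, y):
    """Estimate [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    # Add a leading 1 to every row so that b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in x_rows]
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    # Assumes X'X is non-singular (no perfectly collinear X variables).
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = fit_ols(x_rows, y)
print([round(v, 6) for v in b])
```

Because the illustrative data contain no random error, the estimates should recover the generating coefficients up to floating-point rounding.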
Multiple Regression Equation
(continued)
Two variable model:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
[Figure: the fitted two-variable model plotted against axes Y, X1, and X2]
Line of best fit
Estimating β1 and β2 manually is very tedious work
◼ Solving the normal equations simultaneously, we obtain
$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$$
◼ where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth we adopt the convention of letting lowercase letters denote deviations from mean values.
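The deviation notation can be turned into a short calculation. The sketch below assumes the standard least-squares solution for the two-variable model (slope = Σxy / Σx², intercept = Ȳ − slope · X̄), with β1 as the intercept and β2 as the slope; the function name `simple_ols` and the data are hypothetical.

```python
def simple_ols(X, Y):
    """Return (intercept, slope) using deviations from the sample means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Lowercase letters denote deviations from mean values.
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
beta1_hat, beta2_hat = simple_ols(X, Y)
print(round(beta1_hat, 4), round(beta2_hat, 4))
```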
Sample Data for Estimating the β1 and β2
Regression model equation
Show the Excel file
• β̂1 = 24.4545, se(β̂1) = 6.4138
• β̂2 = 0.5091, se(β̂2) = 0.0357
• The estimated regression line therefore is
Ŷi = 24.4545 + 0.5091 Xi
• which is shown graphically.
The Economic Model
◼ Let's set up an economic model in which sales revenue depends on one or more explanatory variables
◼ We initially hypothesize that sales revenue is linearly related to price and advertising expenditure
◼ The economic model is:
$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT$$
In general notation, with an error term:
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$
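Data: Sales_Example.xls. Stata commands for exploring the data
◼ sum Y X, d   // detailed summary statistics
◼ graph twoway (lfit Y X) (scatter Y X)   // scatter plot with the fitted line
◼ corr Y X   // correlation between Y and X
◼ reg Y X1 X2   // multiple regression of Y on X1 and X2
◼ predict yhat   // fitted values
◼ predict rvar, res   // residuals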
Multiple Regression Output
Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

ANOVA
              df    SS           MS           F         Significance F
Regression    2     29460.027    14730.013    6.53861   0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100s.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price
Is the Model Significant?
◼ F Test for Overall Significance of the Model
◼ Shows if there is a linear relationship between all
of the X variables considered together and Y
◼ Use F-test statistic
◼ Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F Test for Overall Significance
◼ Test statistic:
$$F_{STAT} = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
where FSTAT has numerator d.f. = k and denominator d.f. = (n - k - 1)
F Test for Overall Significance in Excel (continued)
FSTAT = MSR / MSE = 14730.0 / 2252.8 = 6.5386
With 2 and 12 degrees of freedom; the Significance F column gives the p-value for the F test.

ANOVA
              df    SS           MS           F         Significance F
Regression    2     29460.027    14730.013    6.53861   0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333
F Test for Overall Significance (continued)
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 12
Critical Value: F0.05 = 3.885
Test Statistic: FSTAT = MSR / MSE = 6.5386
Decision: Since the FSTAT test statistic is in the rejection region (p-value < .05), reject H0
Conclusion: There is evidence that at least one independent variable affects Y
[Figure: F distribution with α = .05 in the upper tail; do not reject H0 to the left of F0.05 = 3.885, reject H0 to the right]
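The F test can be reproduced from the sums of squares in the ANOVA table. The sketch below is only an illustration: the function name `f_statistic` is hypothetical, and the critical value 3.885 is taken from the slide rather than computed.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic: MSR / MSE with k and (n - k - 1) degrees of freedom."""
    msr = ssr / k              # mean square due to regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse


# Values from the pie sales ANOVA table: n = 15 observations, k = 2 X variables.
f_stat = f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)
f_crit = 3.885                 # F(0.05) with 2 and 12 d.f., as quoted on the slide
reject_h0 = f_stat > f_crit    # rejection region is the upper tail
print(round(f_stat, 4), reject_h0)
```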
Using The Equation to Make
Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
= 306.526 - 24.975(5.50) + 74.131(3.5)
= 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100s, so $350 means that X2 = 3.5.
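The same prediction can be written as a small function. This is a minimal sketch using the rounded coefficients from the slide; the function name `predict_sales` is hypothetical, and it takes advertising in dollars and converts it to hundreds of dollars, which is the unit the model expects.

```python
def predict_sales(price_dollars, advertising_dollars):
    """Predicted pies per week from the estimated pie sales equation."""
    advertising_hundreds = advertising_dollars / 100   # model uses $100s
    return 306.526 - 24.975 * price_dollars + 74.131 * advertising_hundreds


predicted = predict_sales(5.50, 350)
print(round(predicted, 2))
```

Passing 350 instead of 3.5 directly into the equation is an easy unit mistake, which is why the conversion is done inside the function.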
Multiple Coefficient of
Determination
r² = SSR / SST = 29460.0 / 56493.3 = .52148
52.1% of the variation in pie sales is explained by the variation in price and advertising.

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15
Adjusted r²
◼ r² never decreases when a new X variable is added to the model
◼ This can be a disadvantage when comparing models
◼ What is the net effect of adding a new variable?
◼ We lose a degree of freedom when a new X variable is added
◼ Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted r² (continued)
◼ Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used
$$r^2_{adj} = 1 - \left[ (1 - r^2) \left( \frac{n - 1}{n - k - 1} \right) \right]$$
(where n = sample size, k = number of independent variables)
◼ Penalizes excessive use of unimportant independent variables
◼ Smaller than r²
◼ Useful in comparing among models
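To see the adjustment at work, the sketch below computes r² and adjusted r² from the pie sales sums of squares (SSR = 29460.027, SST = 56493.333, n = 15, k = 2). The function names are hypothetical and used only for illustration.

```python
def r_squared(ssr, sst):
    """Proportion of the variation in Y explained by the X variables."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """r-squared adjusted for sample size n and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(ssr=29460.027, sst=56493.333)
r2_adj = adjusted_r_squared(r2, n=15, k=2)
print(round(r2, 5), round(r2_adj, 5))
```

The adjusted value is lower than r² here because the factor (n - 1) / (n - k - 1) = 14 / 12 inflates the unexplained share of the variation.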
Adjusted r² in Excel
r²adj = .44172
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15
Are Individual Variables Significant? (continued)
H0: βj = 0 (no linear relationship between Xj and Y)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
Test Statistic:
$$t_{STAT} = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1)$$
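As a quick check of the test statistic, the sketch below divides each estimated slope by its standard error, using the coefficient and standard error values from the pie sales output. The function name `t_statistic` is hypothetical.

```python
def t_statistic(b_j, s_bj):
    """t statistic for H0: beta_j = 0, with n - k - 1 degrees of freedom."""
    return (b_j - 0) / s_bj


# Coefficients and standard errors from the pie sales regression output.
t_price = t_statistic(-24.97509, 10.83213)
t_advertising = t_statistic(74.13096, 25.96732)
print(round(t_price, 3), round(t_advertising, 3))
```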
Are Individual Variables Significant? Excel Output (continued)
t Stat for Price is tSTAT = -2.306, with p-value .0398
t Stat for Advertising is tSTAT = 2.855, with p-value .0145

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
Residuals in Multiple Regression
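Two variable model:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
Residual: ei = (Yi - Ŷi), the difference between the sample observation Yi and its predicted value Ŷi
The best fit equation is found by minimizing the sum of squared errors, Σe²
[Figure: a sample observation Yi and its predicted value Ŷi at the point (x1i, x2i), with axes Y, X1, and X2]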
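The residual definition can be illustrated with a few lines of Python. In the sketch below the coefficients come from the pie sales equation, while the two observations and the function name `residuals` are hypothetical and used only to show the arithmetic.

```python
def residuals(b, x_rows, y):
    """Residuals e_i = Y_i - Yhat_i for a two-variable model."""
    b0, b1, b2 = b
    fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in x_rows]
    return [yi - fi for yi, fi in zip(y, fitted)]


b = (306.526, -24.975, 74.131)        # pie sales coefficients from the slides
x_rows = [(5.50, 3.5), (7.00, 3.0)]   # hypothetical (price, advertising in $100s)
y = [430.0, 350.0]                    # hypothetical observed sales
e = residuals(b, x_rows, y)
sse = sum(ei ** 2 for ei in e)        # the quantity least squares minimizes
print([round(ei, 3) for ei in e], round(sse, 3))
```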