Simple Linear Regression
Correlation vs. Regression
A scatter plot can be used to show the
relationship between two variables
Correlation analysis is used to measure the
strength of the association (linear relationship)
between two variables
Correlation is only concerned with strength of the
relationship
No causal effect is implied with correlation
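As a rough illustration of measuring the strength of a linear association, the sketch below computes a sample correlation coefficient. It assumes the Pearson coefficient is the measure intended (the slides do not name one), and the function name `pearson_r` is hypothetical. The data are the ten-house sample used in the example later in these slides.

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Ten-house sample from the house price example in these slides
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

r = pearson_r(square_feet, price_1000s)
print(round(r, 5))  # should agree with "Multiple R" in the Excel output
```

A value near +1 or -1 suggests a strong linear association; by itself it says nothing about cause and effect.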
Types of Relationships
[Figure: scatter plots of Y against X illustrating strong relationships, weak relationships, and no relationship]
Regression Models
[Figure: scatter plots of Y against X illustrating linear relationships and curvilinear relationships]
What is Regression Analysis?
Regression analysis is used to:
Predict the value of a dependent variable based on
the value of at least one independent variable
Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
Simple Linear Regression Model
Only one independent variable, X
Relationship between X and Y is described by
a linear function
Changes in Y are assumed to be related to
changes in X
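Simple Linear Regression Model (continued)
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$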
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line
$\hat{Y}_i = b_0 + b_1 X_i$
where
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
Interpretation of the Slope and the Intercept
b0 is the estimated mean value of Y when
the value of X is zero
b1 is the estimated change in the mean
value of Y as a result of a one-unit increase
in X
Simple Linear Regression Example
A real estate agent wishes to examine the
relationship between the selling price of a home
and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
Simple Linear Regression Example: Data
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
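The intercept and slope for these data can be computed by hand. The sketch below assumes ordinary least squares, which is the method behind Excel's Regression tool. The function name `fit_simple_linear` is hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = fit_simple_linear(square_feet, price_1000s)
print(round(b0, 5), round(b1, 5))
```

The two printed values should match the Intercept and Square Feet coefficients reported by Excel for this sample.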
Simple Linear Regression Example: Scatter Plot
[Figure: scatter plot for the house price model; x-axis: Square Feet (0 to 3000); y-axis: House Price ($1000s) (0 to 450)]
Simple Linear Regression Example: Using Excel Data Analysis Function
1. Choose Data
2. Choose Data Analysis
3. Choose Regression
4. Enter Y range and X range and desired options
Simple Linear Regression Example: Excel Output
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10
ANOVA
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
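The ANOVA sums of squares, R Square, and Standard Error in this output can be reproduced from the raw data. The following is a minimal sketch assuming a least-squares fit. The variable names are illustrative and are not taken from Excel.

```python
import math

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(price_1000s) / n

# Least-squares slope and intercept
sxx = sum((x - mean_x) ** 2 for x in square_feet)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price_1000s))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * x for x in square_feet]

# Total, regression, and residual (error) sums of squares
sst = sum((y - mean_y) ** 2 for y in price_1000s)
ssr = sum((p - mean_y) ** 2 for p in predicted)
sse = sum((y - p) ** 2 for y, p in zip(price_1000s, predicted))

r_square = ssr / sst
standard_error = math.sqrt(sse / (n - 2))  # residual df = n - 2 with one X

print(round(sst, 4), round(ssr, 4), round(sse, 4))
print(round(r_square, 5), round(standard_error, 5))
```

Note that SSR + SSE equals SST, which is the decomposition the ANOVA table reports.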
Simple Linear Regression Example: Graphical Representation
[Figure: scatter plot and prediction line for the house price model; x-axis: Square Feet; y-axis: House Price ($1000s); the line has intercept = 98.248 and slope = 0.10977]
house price = 98.24833 + 0.10977 (square feet)
Simple Linear Regression Example: Interpretation of b0
house price = 98.24833 + 0.10977 (square feet)
b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
Because a house cannot have a square footage of 0, b0 has no practical application
Simple Linear Regression Example: Interpreting b1
house price = 98.24833 + 0.10977 (square feet)
b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size
Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.)
= 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
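The prediction above amounts to evaluating the fitted line at a chosen X. A small sketch, using the rounded coefficients from this slide (the function name `predict_price_1000s` is hypothetical):

```python
B0 = 98.25    # rounded intercept from the slide, in $1000s
B1 = 0.1098   # rounded slope from the slide, in $1000s per square foot

def predict_price_1000s(square_feet):
    """Predicted house price in $1000s for a given size in square feet."""
    return B0 + B1 * square_feet

price = predict_price_1000s(2000)
print(round(price, 2))          # price in $1000s
print(round(price * 1000))      # price in dollars
```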
Inferences About the Slope: t Test Example
H0: Square footage does not impact house prices
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Here b1 = 0.10977 and Sb1 = 0.03297, so
$t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{0.10977 - 0}{0.03297} = 3.32938$
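The standard error of the slope and the t statistic can also be recovered from the raw data. The sketch below assumes the usual formula $S_{b_1} = S_{YX} / \sqrt{SSX}$, where $S_{YX}$ is the standard error of the estimate and SSX is the sum of squared deviations of X; that formula is not shown on these slides and is stated here as background.

```python
import math

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(price_1000s) / n

ssx = sum((x - mean_x) ** 2 for x in square_feet)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price_1000s))
b1 = sxy / ssx
b0 = mean_y - b1 * mean_x

# Standard error of the estimate, with n - 2 residual degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price_1000s))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta1 = 0
s_b1 = s_yx / math.sqrt(ssx)
t_stat = (b1 - 0) / s_b1

print(round(s_b1, 5), round(t_stat, 5))
```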
Inferences About the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
From Minitab output:
Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010
Decision: Reject H0, since p-value < α
There is sufficient evidence that square footage affects house price.
F-Test for Significance: Excel Output
From the ANOVA section of the Excel output:
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000
$F_{STAT} = \dfrac{MSR}{MSE} = \dfrac{18934.9348}{1708.1957} = 11.0848$
With 1 and 8 degrees of freedom; the p-value for the F-Test is Significance F = 0.01039
F Test for Significance (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: F.05 = 5.32
Test Statistic: $F_{STAT} = \dfrac{MSR}{MSE} = 11.08$
[Figure: F distribution with a rejection region of area α = .05 to the right of F.05 = 5.32; do not reject H0 to the left of 5.32, reject H0 to the right]
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price
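The decision rule can be written out as a comparison of the test statistic with the critical value. In this sketch the critical value 5.32 is taken from the slide rather than computed, and the function name `f_test_decision` is hypothetical.

```python
def f_test_decision(msr, mse, critical_value):
    """Return (F statistic, True if H0 is rejected) for an upper-tail F test."""
    f_stat = msr / mse
    return f_stat, f_stat > critical_value

# MSR and MSE from the ANOVA table; F.05 with 1 and 8 df from the slide
f_stat, reject_h0 = f_test_decision(18934.9348, 1708.1957, critical_value=5.32)

print(round(f_stat, 4))
print("Reject H0" if reject_h0 else "Do not reject H0")
```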
The Multiple Regression Model
Idea: Examine the linear relationship between
1 dependent (Y) & 2 or more independent variables (Xi)
Multiple Regression Model with k Independent Variables:
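$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error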
Multiple Regression Equation
The coefficients of the multiple regression model are
estimated using sample data
Multiple regression equation with k independent variables:
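$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients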
In this chapter we will use Excel to obtain the regression
slope coefficients and other regression summary measures.
Example: 2 Independent Variables
A distributor of frozen dessert pies wants to
evaluate factors thought to influence demand
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising (in $100s)
Data are collected for 15 weeks
Pie Sales Example
Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7
Multiple regression equation:
Sales = b0 + b1 (Price) + b2 (Advertising)
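The slides rely on Excel for the coefficients. As a rough cross-check, the sketch below fits the same model by solving the least-squares normal equations $(X^\top X)\,b = X^\top y$ in plain Python. The helper names `solve_linear_system` and `fit_multiple_linear` are hypothetical, and this is one possible approach rather than how Excel computes its output internally.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple_linear(rows, ys):
    """Return [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# (price in $, advertising in $100s) for weeks 1 to 15, and weekly pie sales
predictors = [(5.50, 3.3), (7.50, 3.3), (8.00, 3.0), (8.00, 4.5), (6.80, 3.0),
              (7.50, 4.0), (4.50, 3.0), (6.40, 3.7), (7.00, 3.5), (5.00, 4.0),
              (7.20, 3.5), (7.90, 3.2), (5.90, 4.0), (5.00, 3.5), (7.00, 2.7)]
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]

b0, b1, b2 = fit_multiple_linear(predictors, sales)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The printed values should agree with the Intercept, Price, and Advertising coefficients in the Excel output for this example.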
Excel Multiple Regression Output
Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15
ANOVA
              df   SS          MS          F         Significance F
Regression    2    29460.027   14730.013   6.53861   0.01201
Residual      12   27033.306   2252.776
Total         14   56493.333
              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100s
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price
Using The Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
= 306.526 - 24.975(5.50) + 74.131(3.5)
= 428.62
Predicted sales is 428.62 pies
Note that Advertising is in $100s, so $350 means that X2 = 3.5
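The main pitfall in this prediction is the unit of the advertising variable. The sketch below makes the conversion explicit. The function name `predict_pie_sales` and its dollar-based argument are illustrative choices, not part of the slides.

```python
B0, B1, B2 = 306.526, -24.975, 74.131  # rounded coefficients from the slide

def predict_pie_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales; advertising is converted from $ to $100s."""
    advertising_100s = advertising_dollars / 100
    return B0 + B1 * price_dollars + B2 * advertising_100s

predicted_sales = predict_pie_sales(5.50, 350)
print(round(predicted_sales, 2))
```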
The Coefficient of Multiple Determination, $r^2$
Reports the proportion of total variation in Y explained by all X variables taken together
$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$
Multiple Coefficient of Determination In Excel
From the Excel output, R Square = 0.52148:
$r^2 = \dfrac{SSR}{SST} = \dfrac{29460.0}{56493.3} = .52148$
52.1% of the variation in pie sales is explained by the variation in price and advertising
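The R Square and Adjusted R Square entries can be checked from the ANOVA sums of squares. The slides do not give a formula for the adjusted value; the sketch assumes the standard definition $r^2_{adj} = 1 - (1 - r^2)\,\dfrac{n - 1}{n - k - 1}$.

```python
ssr = 29460.027   # regression sum of squares (ANOVA table)
sst = 56493.333   # total sum of squares (ANOVA table)
n = 15            # observations (weeks)
k = 2             # independent variables (price, advertising)

r_square = ssr / sst
# Assumed standard adjustment for the number of independent variables
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(round(r_square, 5), round(adjusted_r_square, 5))
```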
Is the Model Significant?
F Test for Overall Significance of the Model
Shows if there is a linear relationship between all
of the X variables considered together and Y
Use F-test statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent
variable affects Y)
F Test for Overall Significance
Test statistic:
$F_{STAT} = \dfrac{MSR}{MSE} = \dfrac{SSR / k}{SSE / (n - k - 1)}$
where FSTAT has numerator d.f. = k and denominator d.f. = (n - k - 1)
F Test for Overall Significance In Excel (continued)
From the ANOVA section of the Excel output:
$F_{STAT} = \dfrac{MSR}{MSE} = \dfrac{14730.0}{2252.8} = 6.5386$
With 2 and 12 degrees of freedom; the P-value for the F Test is Significance F = 0.01201
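The overall F statistic follows directly from the sums of squares, the sample size, and the number of independent variables. A minimal sketch, with a hypothetical helper name `overall_f_statistic`:

```python
def overall_f_statistic(ssr, sse, n, k):
    """Return (F statistic, numerator df, denominator df) for the overall test."""
    df_regression = k
    df_residual = n - k - 1
    msr = ssr / df_regression
    mse = sse / df_residual
    return msr / mse, df_regression, df_residual

# Sums of squares from the pie sales ANOVA table
f_stat, df1, df2 = overall_f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)
print(round(f_stat, 4), df1, df2)
```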
Are Individual Variables Significant?
Use t tests of individual variable slopes
Shows if there is a linear relationship between
the variable Xj and Y holding constant the
effects of other X variables
Hypotheses:
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist
between Xj and Y)
Are Individual Variables Significant? (continued)
H0: βj = 0 (no linear relationship between Xj and Y)
H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
Test Statistic:
$t_{STAT} = \dfrac{b_j - 0}{S_{b_j}}$ (df = n - k - 1)
Are Individual Variables Significant? Excel Output (continued)
From the coefficients section of the Excel output:
t Stat for Price is tSTAT = -2.306, with p-value .0398
t Stat for Advertising is tSTAT = 2.855, with p-value .0145
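Each t statistic is simply the coefficient divided by its standard error. The sketch below recomputes both and compares the reported p-values with a significance level. The level α = 0.05 is an assumption carried over from the earlier F test slide, since no level is stated for these t tests.

```python
ALPHA = 0.05  # assumed significance level, not stated on this slide

# (coefficient, standard error, p-value) from the Excel output
excel_rows = {
    "Price": (-24.97509, 10.83213, 0.03979),
    "Advertising": (74.13096, 25.96732, 0.01449),
}

results = {}
for name, (b_j, s_bj, p_value) in excel_rows.items():
    t_stat = (b_j - 0) / s_bj          # test of H0: beta_j = 0
    results[name] = (t_stat, p_value < ALPHA)
    print(name, round(t_stat, 3), "reject H0" if p_value < ALPHA else "do not reject H0")
```

Under that assumed level, both slopes would be judged significant, each with df = n - k - 1 = 12.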