Topic
Simple Linear Regression
Learning Objectives
On completion of this lecture/session students should be able to:
• Evaluate the relationship between two variables
• Predict relationships between two variables using a linear regression model
• Recognise the equation of a simple regression line from a sample of data and interpret
the slope and intercept of the equation
• Realise the usefulness of residual analysis in testing the assumptions underlying
regression analysis, in examining the fit of the regression line to the data, and in
testing model adequacy
• Estimate a prediction equation
• Conduct prediction using statistical packages
Key Terms
Simple Linear Regression Model
Simple Linear Regression Equation
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Simple Linear Regression
Managerial decisions often are based on the
relationship between two or more variables.
Regression analysis can be used to develop an
equation showing how the variables are related.
The variable being predicted is called the dependent
variable and is denoted by y.
The variables being used to predict the value of the
dependent variable are called the independent
variables and are denoted by x.
Simple Linear Regression
Simple linear regression involves one independent
variable and one dependent variable.
The relationship between the two variables is
approximated by a straight line.
Regression analysis involving two or more
independent variables is called multiple regression.
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:
$y = \beta_0 + \beta_1 x + \epsilon$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.
Simple Linear Regression Equation
The simple linear regression equation is:
$E(y) = \beta_0 + \beta_1 x$
• The graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation
[Figure: Positive linear relationship. Regression line of E(y) against x with intercept $\beta_0$; the slope $\beta_1$ is positive.]
[Figure: Negative linear relationship. Regression line of E(y) against x with intercept $\beta_0$; the slope $\beta_1$ is negative.]
[Figure: No relationship. Regression line of E(y) against x with intercept $\beta_0$; the slope $\beta_1$ is 0.]
Types of Linear Relationships
[Figure: Scatter plots of Y against X contrasting strong relationships with weak relationships.]
08/14/2020 1
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$
• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.
Estimation Process
1. Regression model: $y = \beta_0 + \beta_1 x + \epsilon$. Regression equation: $E(y) = \beta_0 + \beta_1 x$. Unknown parameters: $\beta_0$, $\beta_1$.
2. Sample data: n pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$. Sample statistics: b0, b1.
4. b0 and b1 provide estimates of $\beta_0$ and $\beta_1$.
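Least Squares Method
Least Squares Criterion (no need to memorise this formula)
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
It is called "least squares" because the line of best fit is the one that minimizes
the sum of the squares of the errors.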
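The criterion can be checked numerically. The sketch below uses the Reed Auto Sales sample from the example later in this lecture and compares the sum of squared errors for the fitted line (intercept 10, slope 5) with two other candidate lines. The function name sum_squared_errors and the two alternative lines are illustrative assumptions, not part of the lecture.

```python
# Reed Auto Sales sample, taken from the example later in this lecture.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def sum_squared_errors(b0, b1, xs, ys):
    """Return the sum of (y_i - y_hat_i)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Line reported in the lecture: y_hat = 10 + 5x.
sse_fitted = sum_squared_errors(10, 5, tv_ads, cars_sold)

# Two arbitrary alternative lines, chosen only for comparison.
sse_shifted = sum_squared_errors(9, 5, tv_ads, cars_sold)
sse_steeper = sum_squared_errors(10, 6, tv_ads, cars_sold)

print(sse_fitted, sse_shifted, sse_steeper)
```

Both alternative lines give a larger sum of squared errors than the fitted line, which is what the least squares criterion requires.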
Least Squares Method
Slope for the Estimated Regression Equation
b1 = slope of the estimated regression equation
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x}$
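The slope formula itself does not appear in these notes. For reference, the standard least-squares expression for the slope is given below, on the assumption that this is the form the lecture intends; $\bar{x}$ and $\bar{y}$ are the sample means of x and y.

```latex
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```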
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown on the next slide.
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
$\sum x = 10$           $\sum y = 100$
$\bar{x} = 2$           $\bar{y} = 20$
Estimated Regression Equation
Slope for the Estimated Regression Equation
Assume this is computed and given: b1 = 5
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
Estimated Regression Equation
$\hat{y} = 10 + 5x$
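The slope that the slide takes as given can be reproduced from the sample. The sketch below assumes the standard least-squares slope formula (the sum of cross-deviations divided by the sum of squared x deviations), which these notes do not show; the variable names are illustrative.

```python
# Reed Auto Sales sample from the lecture.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n      # mean number of TV ads
y_bar = sum(cars_sold) / n   # mean number of cars sold

# Sum of cross-deviations and sum of squared x deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)

b1 = s_xy / s_xx             # slope (standard least-squares formula, assumed)
b0 = y_bar - b1 * x_bar      # intercept, as in the lecture

print(f"y_hat = {b0:g} + {b1:g}x")
```

The result agrees with the slide's values b1 = 5 and b0 = 10.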
Using Excel’s Chart Tools for
Scatter Diagram & Estimated Regression Equation
[Figure: Reed Auto Sales estimated regression line. Scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4) with the fitted line y = 5x + 10.]
Coefficient of Determination
• The coefficient of determination, also known as R squared, is
interpreted as the goodness of fit of a regression.
• The higher the coefficient of determination, the greater the proportion of the
variance in the dependent variable that is explained by the independent variable.
• The coefficient of determination is the overall measure of the
usefulness of a regression.
In our example, $r^2 = .8772$.
The regression relationship is very strong; 87.72%
of the variability in the number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
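The value of r squared can be recovered from the sample as the explained share of total variation. This is a minimal sketch that takes the fitted line (intercept 10, slope 5) from the lecture and uses the usual sums-of-squares decomposition, which these notes only show indirectly in the Excel output; the names sst, sse and ssr are illustrative.

```python
# Reed Auto Sales sample and the fitted line from the lecture.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(cars_sold) / len(cars_sold)
predicted = [b0 + b1 * x for x in tv_ads]

# Total, unexplained (error) and explained (regression) sums of squares.
sst = sum((y - y_bar) ** 2 for y in cars_sold)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cars_sold, predicted))
ssr = sst - sse

r_squared = ssr / sst
print(round(r_squared, 4))
```

The three sums of squares match the SS column of the Excel ANOVA output for this example (100, 14 and 114).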
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
b1 = the slope of the estimated regression
equation $\hat{y} = b_0 + b_1 x$
The correlation coefficient measures the strength of the linear relationship.
The sign of b1 in the equation $\hat{y} = 10 + 5x$ is "+".
$r_{xy} = +\sqrt{.8772} = +.9366$
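The same number can be obtained in two ways, as sketched below: from the sign of b1 and the square root of r squared, as in the lecture, and directly from the sample deviations. The direct computation uses the standard Pearson formula, which is an addition here and not part of the lecture.

```python
import math

# Values reported in the lecture.
r_squared = 0.8772
b1 = 5

# Lecture's route: attach the sign of the slope to the square root of r squared.
r_xy = math.copysign(math.sqrt(r_squared), b1)

# Cross-check with the standard Pearson formula on the Reed Auto Sales sample.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
x_bar = sum(tv_ads) / len(tv_ads)
y_bar = sum(cars_sold) / len(cars_sold)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
s_yy = sum((y - y_bar) ** 2 for y in cars_sold)
r_direct = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_xy, 4), round(r_direct, 4))
```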
Assumptions About the Error Term $\epsilon$
1. The error $\epsilon$ is a random variable with mean of zero.
2. The variance of $\epsilon$, denoted by $\sigma^2$, is the same for
all values of the independent variable.
3. The values of $\epsilon$ are independent.
4. The error $\epsilon$ is a normally distributed random
variable.
Testing for Significance
To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of $\beta_1$ is zero.
Two tests are commonly used: the t test and the F test.
(We won't discuss the F test in this subject.)
Both the t test and F test require an estimate of $\sigma^2$,
the variance of $\epsilon$ in the regression model.
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic (no need to memorise this formula)
$t = \frac{b_1}{s_{b_1}}$ where $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Testing for Significance: t Test
Rejection Rule
Reject $H_0$ if p-value < $\alpha$
or $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where:
$t_{\alpha/2}$ is based on a t distribution
with n - 2 degrees of freedom
Testing for Significance: t Test
1. Determine the hypotheses. $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. Specify the level of significance. $\alpha = .05$
3. Select the test statistic. $t = \frac{b_1}{s_{b_1}}$ (this t statistic is computed by Excel)
4. State the rejection rule. Reject $H_0$ if p-value < .05
or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test
5. Compute the value of the test statistic.
$t = \frac{b_1}{s_{b_1}} = \frac{5}{1.08} = 4.63$
6. Determine whether to reject $H_0$.
In the t table with 3 degrees of freedom, t = 4.541 provides an area of .01
in the upper tail. Because 4.63 > 4.541, the two-tailed p-value is less than .02.
(Also, t = 4.63 > 3.182.) We can reject $H_0$.
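The test statistic in step 5 can be reproduced by hand. The sketch below assumes that s is the square root of the sum of squared errors divided by n - 2, which is the usual estimate and is consistent with the Standard Error of 2.160 in the Excel output. The critical value 3.182 is copied from the rejection rule above and is not computed here.

```python
import math

# Reed Auto Sales sample and the fitted line from the lecture.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

n = len(tv_ads)
x_bar = sum(tv_ads) / n

# Standard error of the estimate: s = sqrt(SSE / (n - 2)) (assumed definition).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic.
s_b1 = s / math.sqrt(sum((x - x_bar) ** 2 for x in tv_ads))
t_stat = b1 / s_b1

t_critical = 3.182  # from the lecture, 3 degrees of freedom, alpha = .05
reject_h0 = abs(t_stat) > t_critical

print(round(s, 3), round(s_b1, 3), round(t_stat, 3), reject_h0)
```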
Computer Solution
Performing the regression analysis computations
without the help of a computer can be quite time
consuming.
On the next slide we show Excel output for the
Reed Auto Sales example.
Recall that the independent variable was named Ads
and the dependent variable was named Cars in the
example.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.937    ($r_{xy}$)
R Square             0.877    ($r_{xy}^2$)
Adjusted R Square    0.836
Standard Error       2.160
Observations         5.000    (sample size)

ANOVA (the ANOVA part of the summary output is not taught in this subject)
              df    SS         MS         F         Significance F
Regression    1     100.000    100.000    21.429    0.019
Residual      3     14.000     4.667
Total         4     114.000

                             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept (b0)               10.000          2.366             4.226     0.024      2.469        17.531
Number of TV Ads (X) (b1)    5.000           1.080             4.629     0.019      1.563        8.437

Estimated regression equation: Cars = 10.0 + 5.00 Ads.
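Several entries in the coefficients table can be reconstructed from quantities already used in this lecture. The sketch below is an illustration under two assumptions: the usual formula for the standard error of the intercept, which the lecture does not state, and the critical value 3.182 from the t test slides for the 95% interval of the slope.

```python
import math

# Reed Auto Sales sample and the fitted line from the lecture.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

n = len(tv_ads)
x_bar = sum(tv_ads) / n
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and intercept (intercept formula is assumed).
s_b1 = s / math.sqrt(s_xx)
s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# t statistic for the intercept and 95% interval for the slope.
t_b0 = b0 / s_b0
t_critical = 3.182  # from the lecture, 3 degrees of freedom
slope_lower = b1 - t_critical * s_b1
slope_upper = b1 + t_critical * s_b1

print(round(s_b0, 3), round(t_b0, 3), round(slope_lower, 3), round(slope_upper, 3))
```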
Sample Data for Model
Weekly sales in $1000s (Y)    Number of customers (X)
245                           1400
312                           1600
279                           1700
308                           1875
199                           1100
219                           1550
405                           2350
324                           2450
319                           1425
255                           1700
[Figure: Weekly sales model scatter plot. Weekly sales (0 to 500) against number of customers (0 to 3000).]
Regression Using Excel
Tools / Data Analysis / Regression
Excel Output
Graphical Presentation
[Figure: Weekly sales model, scatter plot and regression line. Weekly sales ($1000s, 0 to 450) against number of customers (0 to 3000); intercept = 98.248, slope = 0.10977.]
Weekly sales = 98.24833 + 0.10977 (customers)
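The fitted equation can be reproduced from the sample data and then used for prediction. The sketch below uses plain Python in place of Excel and assumes the standard least-squares formulas. The helper name fit_line and the query value of 2,000 customers are hypothetical, chosen only for illustration.

```python
# Sample data from the lecture: number of customers (X), weekly sales in $1000s (Y).
customers = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
weekly_sales = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(customers, weekly_sales)
print(round(intercept, 5), round(slope, 5))

# Hypothetical query: 2,000 customers is an illustrative value, not from the lecture.
predicted_sales = intercept + slope * 2000
print(round(predicted_sales, 2))
```

The prediction is in the same units as Y, thousands of dollars per week, and is best trusted only for customer counts within the range of the sample (1,100 to 2,450).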
Some Cautions about the
Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and concluding that the
relationship between x and y is significant does
not enable us to conclude that a cause-and-effect
relationship is present between x and y.
Just because we are able to reject $H_0: \beta_1 = 0$ and
demonstrate statistical significance does not enable
us to conclude that there is a linear relationship
between x and y.