SIMPLE LINEAR REGRESSION

15.1 SIMPLE LINEAR REGRESSION
A simple linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

Dependent variable: the variable that is being estimated or predicted.
Independent variable: the variable that provides a basis for estimation. It is the predictor variable.

The linear regression model postulates that

$$Y = \beta_0 + \beta_1 X + e$$

where:
$Y$ = dependent/response variable
$X$ = independent/explanatory variable
$\beta_0$ and $\beta_1$ = regression coefficients
$\beta_0$ = $y$-intercept of the regression line
$\beta_1$ = slope of the regression line
$e$ = residual/random error

In general, the goal of linear regression is to find the line that best predicts $Y$ from $X$. That is, it seeks the fitted line $\hat{Y} = a + bX$ that best estimates the regression model $Y = \beta_0 + \beta_1 X + e$, by determining the values $a$ and $b$ that best estimate $\beta_0$ and $\beta_1$.

Note: linear regression assumes that the relationship between the two variables is linear. It finds the slope and intercept that make a straight line best fit the data.

15.2 METHOD OF LEAST SQUARES
The slope:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

The $y$-intercept:

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \bar{y} - b\bar{x}$$

The goal of linear regression is to adjust the values of the slope and intercept to find the line that best predicts $Y$ from $X$. More precisely, the method of least squares chooses $a$ and $b$ to minimize the sum of the squares of the vertical distances of the points from the line, $\sum (y - a - bx)^2$.

15.3 THE COEFFICIENT OF DETERMINATION
The coefficient of determination, $r^2$, measures the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It indicates how much confidence one can place in predictions made from the fitted model.

The coefficient of determination takes values from 0 to 1 and measures how well the regression line represents the data. It is the proportion of the total variation in $y$ that is explained by the regression line.

For example, if $r = 0.922$, then $r^2 \approx 0.850$. This means that 85% of the total variation in $y$ can be explained by the linear relationship between $x$ and $y$, and the other 15% remains unexplained.

If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation ($r^2 = 1$). The further the line is from the points, the less of the variation it explains.

EXERCISES:
1. A study was made by a retail merchant to determine the relation between weekly advertising expenditures and sales. The following data were recorded:

Advertising Costs ($)   Sales ($)   Advertising Costs ($)   Sales ($)
         40                385               40                490
         20                400               20                420
         25                395               50                560
         20                365               40                525
         30                475               25                480
         50                440               50                510

a) Plot a scatter diagram.
b) Find the equation of the regression line to predict weekly sales from advertising expenditures.
c) Compute the coefficient of determination.
d) Estimate the weekly sales when advertising costs are $35.

2. A jeans manufacturer knows that a large budget for television advertising of his product will create a demand for it among department store buyers. The following table shows the amount (ten thousand pesos) spent on advertising the fall line of jeans for eight years and the number of pairs of jeans sold (in thousands) in each fall line.

Year                                    1    2    3    4    5    6    7    8
Amount spent (ten thousand pesos)      50   65   75  100  125  140  170  195
Number of jeans sold (thousands)       45   60   80   95  120  150  145  190

a) Plot a scatter diagram.
b) Find the equation of the regression line.

3. A woman wants to open a small fashion boutique business. Before selecting a location, she would like to be able to predict the profit in pesos that the store may be expected to earn per hundred square feet of selling space. She gathers the following information:

Store Size (ft²)     35   22   27   16   28   12   40   32
Profit (10,000)      20   15   17    9   16    7   22   23
Predict the profit of the store if the store measures 45
square feet.
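The least-squares formulas in Section 15.2 can be checked numerically. Below is a minimal Python sketch that is not part of the original notes. The helper names `least_squares` and `predict` are hypothetical. The sketch computes $b$ and $a$ directly from the sums $\sum x$, $\sum y$, $\sum xy$ and $\sum x^2$ that appear in the slope and intercept formulas. The demo plugs in the Exercise 1 data so you can compare hand calculations against it.

```python
def least_squares(xs, ys):
    """Return (a, b) for the fitted line y-hat = a + b*x, using the
    summation formulas for the slope and y-intercept (Section 15.2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: a = y-bar - b * x-bar
    return a, b


def predict(a, b, x):
    """Estimate y for a given x from the fitted line y-hat = a + b*x."""
    return a + b * x


if __name__ == "__main__":
    # Exercise 1 data: weekly advertising costs ($) and sales ($)
    costs = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]
    sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]
    a, b = least_squares(costs, sales)
    print(f"y-hat = {a:.3f} + {b:.3f}x")
    print(f"estimate at x = 35: {predict(a, b, 35):.2f}")
```

For real work, Python's standard `statistics.linear_regression` (Python 3.10+) returns the same slope and intercept. The explicit sums above are kept only so that each step mirrors the formulas.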
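Section 15.3 describes $r^2$ as the proportion of the total variation in $y$ that is explained by the regression line. The Python sketch below makes that reading concrete. It is an illustration only, and its function names are hypothetical.

It computes $r^2 = 1 - \mathrm{SSE}/\mathrm{SST}$:
- SSE (sum of squared errors) is the variation left unexplained by the least-squares line.
- SST (total sum of squares) is the total variation of $y$ about $\bar{y}$.

SSE and SST are standard labels that the notes themselves do not use. The slope is computed in the deviation form $b = S_{xy}/S_{xx}$, which is algebraically equivalent to the summation formula in Section 15.2.

```python
import math


def _deviation_sums(xs, ys):
    """Return the means and the deviation sums S_xx, S_yy, S_xy."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x or y has no variation; r^2 is undefined")
    return x_bar, y_bar, sxx, syy, sxy


def coefficient_of_determination(xs, ys):
    """r^2 = 1 - SSE/SST for the least-squares line through the points."""
    x_bar, y_bar, sxx, syy, sxy = _deviation_sums(xs, ys)
    b = sxy / sxx              # slope (deviation form)
    a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = syy                                                  # total variation in y
    return 1 - sse / sst


def correlation(xs, ys):
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    _, _, sxx, syy, sxy = _deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Exercise 1 data: weekly advertising costs ($) and sales ($)
    costs = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]
    sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]
    r2 = coefficient_of_determination(costs, sales)
    r = correlation(costs, sales)
    print(f"r^2 (1 - SSE/SST) = {r2:.4f}")
    print(f"r^2 (from r)      = {r ** 2:.4f}")
```

For a least-squares line, the two printed values agree, because $1 - \mathrm{SSE}/\mathrm{SST}$ equals the square of the correlation coefficient $r$. This is why the notes can move directly from $r = 0.922$ to $r^2 \approx 0.850$.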