Linear Regression
Models
Dinesh K. Vishwakarma, Ph.D.
Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
Learning Objectives…
7. Correlation Models
8. Link between a correlation model and a
regression model
9. Test of coefficient of Correlation
What is a Model?
1. Representation of Some Phenomenon
2. Non-Maths/Stats Model
EPI 809/Spring 2008
What is a Maths/Stats Model?
1. Often Describe Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height.
• Metric formula: BMI = Weight in kilograms / (Height in meters)²
• Non-metric formula: BMI = (Weight in pounds × 703) / (Height in inches)²
Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error
• SBP = 6 × age(days) + ε
• Random error may be due to factors other than age in days (e.g., birth weight)
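The probabilistic model above can be simulated in a few lines. The error standard deviation (sigma = 3) is an assumed value for illustration only, not from the source.

```python
import random

# Probabilistic model from the example: SBP = 6 * age_in_days + random error.
# sigma (the error standard deviation) is an assumed value for illustration.
random.seed(1)
sigma = 3.0

for age in range(1, 6):
    deterministic = 6 * age            # deterministic component
    error = random.gauss(0, sigma)     # random error component
    print(f"age={age} d, SBP={deterministic + error:.1f}")
```

Averaged over many newborns of the same age, the random errors cancel and the mean SBP approaches the deterministic part, 6 × age.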
Bivariate & multivariate models
• Bivariate (simple) regression model: one explanatory variable, e.g. Education (x) → Income (y)
• Multivariate (multiple) regression model: several explanatory variables, e.g. Education (x1), Sex (x2), Experience (x3), Age (x4) → Income (y)
• Model with a simultaneous relationship: e.g. Price of wheat ↔ Quantity of wheat produced
Regression Modeling Steps
1. Hypothesize Deterministic Component
• Estimate Unknown Parameters
2. Specify Probability Distribution of
Random Error Term
• Estimate Standard Deviation of Error
3. Evaluate the fitted Model
4. Use Model for Prediction & Estimation
Sources for Hypothesizing a Model
1. Theory of the Field (e.g., Epidemiology)
2. Mathematical Theory
3. Previous Research
4. ‘Common Sense’
Thinking Challenge: Which is more logical?
[Four candidate scatter plots (1–4) of Grade vs. Study Time]
Scatter Plot of Data
[Six scatter plots of Y vs. X illustrating correlations r = −1, r = −.6, r = 0, r = +1, r = +.3, and r = 0]
Types of Relationship
[Scatter plots: linear relationships vs. curvilinear relationships]
Types of Relationship…
[Scatter plots: strong relationships vs. weak relationships]
Types of Relationship…
[Scatter plot: no relationship]
Linear Regression Models
Linear regression is one of the simplest statistical models in machine learning.
It shows the linear relationship between a dependent variable (y) and one or more independent (explanatory) variables.
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent and an independent variable.
Types of Regression
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Stepwise Regression

Linear vs. Logistic Regression
• Core concept: the data is modelled using a straight line (Linear) vs. a sigmoid (Logistic)
• Used with: continuous variable (Linear) vs. categorical variable (Logistic)
• Output/Prediction: value of the variable (Linear) vs. probability of occurrence of an event (Logistic)
• Goodness of fit: measured by loss, R-squared, Adjusted R-squared, etc. (Linear) vs. accuracy, precision, recall, F1 score, ROC curve, confusion matrix, etc. (Logistic)
Applications of LR
• Evaluating trends and sales estimates, e.g. a company’s sales analysis (monthly sales vs. time)
• Analyzing the impact of price changes, e.g. when a company changes the price of a product several times
• Assessing risk, e.g. in health care (number of claims vs. age)
Linear Equations
Y = mX + b
• m = slope = (change in Y) / (change in X)
• b = Y-intercept
Linear Regression Model
The relationship between the variables is a linear function:
Yᵢ = β₀ + β₁Xᵢ + εᵢ
where β₀ = population Y-intercept, β₁ = population slope, εᵢ = random error;
Yᵢ = dependent (response) variable, e.g. Grade; Xᵢ = independent (explanatory) variable, e.g. Study Time.
Estimating the Coefficients
The estimates are determined by
• drawing a sample from the population of interest,
• calculating sample statistics,
• producing a straight line that cuts through the data.
Question: What should be considered a good line?
[Scatter plot with three candidate lines drawn through the data]
Sum of Squared Differences
Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5), (4, 3.2); the second line is the horizontal line y = 2.5.
• Line 1: sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
• Line 2: sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99
The smaller the sum of squared differences, the better the fit of the line to the data.
A good line is one that minimizes the sum of squared differences between the points and the line.
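The two sums can be checked with a short script. The first candidate line is taken to be ŷ = x, which reproduces the predictions 1, 2, 3, 4 implied by the computation.

```python
# Sum of squared differences (SSD) for two candidate lines through four points.
points = [(1, 2.0), (2, 4.0), (3, 1.5), (4, 3.2)]

def ssd(predict):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

print(round(ssd(lambda x: x), 2))    # sloped line y-hat = x   -> 7.89
print(round(ssd(lambda x: 2.5), 2))  # horizontal line y = 2.5 -> 3.99
```

The smaller value confirms that, for these four points, the horizontal line fits better.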
Population Linear Regression Model
Yᵢ = β₀ + β₁Xᵢ + εᵢ, with E(Y) = β₀ + β₁Xᵢ
εᵢ = random error; each observed value scatters about the population regression line.
Simple Linear Regression Model
Yᵢ = β̂₀ + β̂₁Xᵢ + ε̂ᵢ (observed value), with fitted line Ŷᵢ = β̂₀ + β̂₁Xᵢ
ε̂ᵢ = residual (estimated random error); the fitted line is also used to predict unsampled observations.
Estimating Parameters:
Least Squares Method
Scatter plot
1. Plot of all (Xᵢ, Yᵢ) pairs
2. Suggests how well the model will fit
[Scatter plot of Y vs. X]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Scatter plot with candidate lines of varying slope and intercept]
Least Squares Error
‘Best fit’ means the differences between the actual Y values and the predicted Ŷ values are a minimum. But positive differences offset negative ones, so square the errors:
Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ ε̂ᵢ²
Least squares minimizes the sum of the squared differences (errors), the SSE.
Least Squares Graphically
LS minimizes Σᵢ₌₁ⁿ ε̂ᵢ² = ε̂₁² + ε̂₂² + ε̂₃² + … + ε̂ₙ²
[Scatter plot showing residuals ε̂₁ … ε̂₄ about the fitted line Ŷᵢ = β̂₀ + β̂₁Xᵢ, where Yᵢ = β̂₀ + β̂₁Xᵢ + ε̂ᵢ]
Coefficient Equations
• Prediction equation: ŷᵢ = β̂₀ + β̂₁xᵢ
• Sample slope: β̂₁ = SSxy / SSxx = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
• Sample Y-intercept: β̂₀ = ȳ − β̂₁x̄
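A minimal sketch of these formulas in plain Python, using a small illustrative data set (the same five points used in the worked examples later):

```python
# Least-squares coefficients from the SSxy / SSxx formulas.
x = [1, 2, 3, 4, 5]   # sample data, for illustration
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = ss_xy / ss_xx        # sample slope
b0 = y_bar - b1 * x_bar   # sample Y-intercept
print(round(b1, 4), round(b0, 4))  # 0.7 -0.1
```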
Finding β₀: MSE Method
Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))²
Setting the partial derivative with respect to β₀ to zero:
0 = ∂/∂β₀ Σᵢ₌₁ⁿ [yᵢ² + β₀² + β₁²xᵢ² + 2β₀β₁xᵢ − 2yᵢβ₀ − 2yᵢβ₁xᵢ]
0 = 2Σᵢ₌₁ⁿ β₀ + 2β₁Σᵢ₌₁ⁿ xᵢ − 2Σᵢ₌₁ⁿ yᵢ
0 = 2(nβ₀ + nβ₁x̄ − nȳ)
β̂₀ = ȳ − β̂₁x̄  (minimum sum of squared errors method)
Finding β₁: MSE Method
Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)², where ŷᵢ = β₀ + β₁xᵢ
= Σᵢ₌₁ⁿ [yᵢ² + (β₀ + β₁xᵢ)² − 2yᵢ(β₀ + β₁xᵢ)]
= Σᵢ₌₁ⁿ [yᵢ² + β₀² + β₁²xᵢ² + 2β₀β₁xᵢ − 2yᵢβ₀ − 2yᵢβ₁xᵢ]
To find the minimum error, the derivative must be zero (optimization):
∂(Σᵢ₌₁ⁿ εᵢ²)/∂β₁ = ∂/∂β₁ Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))² = 0
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − β₀ − β₁xᵢ)
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β₁x̄ − β₁xᵢ)  (substituting β₀ = ȳ − β₁x̄)
Minimum Square Error Method…
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β₁x̄ − β₁xᵢ)  (after substituting β₀ = ȳ − β₁x̄)
β₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ)
Since Σ(xᵢ − x̄) = 0 and Σ(yᵢ − ȳ) = 0, replacing the leading factor xᵢ by (xᵢ − x̄) on both sides changes nothing, giving
β₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
β₁ = SSxy / SSxx
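As a sanity check on the derivation, the closed-form slope can be compared with a brute-force search over candidate slopes (with β₀ = ȳ − β₁x̄ substituted, as above). The data set is illustrative:

```python
# Check the closed-form slope against a brute-force grid search over beta1,
# with beta0 = y_bar - beta1 * x_bar substituted as in the derivation.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

def sse(b1):
    b0 = y_bar - b1 * x_bar
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed form: beta1 = SSxy / SSxx
b1_closed = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
            sum((xi - x_bar) ** 2 for xi in x)

# Grid search over candidate slopes in [-2, 2], step 0.001
grid = [i / 1000 for i in range(-2000, 2001)]
b1_grid = min(grid, key=sse)
print(b1_closed, b1_grid)  # 0.7 0.7
```

Both approaches agree, which is expected because the SSE is convex in β₁.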
Computation Table
Xᵢ      Yᵢ      Xᵢ²     Yᵢ²     XᵢYᵢ
X₁      Y₁      X₁²     Y₁²     X₁Y₁
X₂      Y₂      X₂²     Y₂²     X₂Y₂
:       :       :       :       :
Xₙ      Yₙ      Xₙ²     Yₙ²     XₙYₙ
ΣXᵢ     ΣYᵢ     ΣXᵢ²    ΣYᵢ²    ΣXᵢYᵢ
Interpretation of Coefficients
Slope (β̂₁)
• β̂₁ > 0: positive association; β̂₁ < 0: negative association; β̂₁ = 0: no association
• Estimated Y changes by β̂₁ for each 1-unit increase in X
• If β̂₁ = 2, then Y is expected to increase by 2 for each 1-unit increase in X
Y-intercept (β̂₀)
• Average value of Y when X = 0
• If β̂₀ = 4, then the average Y is expected to be 4 when X is 0
E.g. Parameter Estimation
What is the relationship between
Mother’s Estriol level & Birthweight using the
following data?
Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4
Scatterplot
[Scatter plot of Birthweight (g/1000) vs. Estriol level (mg/24h)]
Parameter Estimation Solution Table
Xᵢ    Yᵢ    Xᵢ²    Yᵢ²    XᵢYᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Parameter Estimation Solution
β̂₁ = (ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)/n) / (ΣXᵢ² − (ΣXᵢ)²/n) = (37 − (15)(10)/5) / (55 − 15²/5) = 7/10 = 0.70
β̂₀ = Ȳ − β̂₁X̄ = 2 − 0.7 × 3 = −0.1
Fitted line: ŷ = −0.1 + 0.7x
Coefficient Interpretation Solution
1. Slope (β̂₁ = 0.7): Birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in Estriol (X).
2. Intercept (β̂₀ = −0.1): Average birthweight (Y) is −0.1 units when estriol level (X) is 0.
• Difficult to explain: birthweight should always be positive.
Goodness: Variation Measures
Total sum of squares Σ(yᵢ − ȳ)² = Explained sum of squares Σ(ŷᵢ − ȳ)² + Unexplained sum of squares Σ(yᵢ − ŷᵢ)²
[Diagram: at each xᵢ, the deviation of yᵢ from ȳ splits into the part explained by the fitted line ŷᵢ = β̂₀ + β̂₁xᵢ and the residual]
Estimation of σ²
s² = SSE / (n − 2), where SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
The subtraction of 2 reflects the fact that we have estimated two parameters: β₀ and β₁.
s = √s² = √(SSE / (n − 2))
E.g. Compute SSE, s², s
You’re a marketing analyst for Any Toys. You gather the following data:
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Find SSE, s2, and s.
E.g. Solution: SSE, s², s
xᵢ    yᵢ    ŷ = −0.1 + 0.7x    y − ŷ    (y − ŷ)²
1     1     0.6                0.4      0.16
2     1     1.3                −0.3     0.09
3     2     2.0                0        0
4     2     2.7                −0.7     0.49
5     4     3.4                0.6      0.36
                                        SSE = 1.1
s² = SSE / (n − 2) = 1.1 / (5 − 2) = 0.36667
s = √0.36667 = 0.6055
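The same table can be reproduced programmatically:

```python
# SSE, s^2 and s for the fitted line y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

residuals = [yi - (-0.1 + 0.7 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - 2)    # n - 2: two estimated parameters (beta0, beta1)
s = s2 ** 0.5

print(round(sse, 2), round(s2, 5), round(s, 4))  # 1.1 0.36667 0.6055
```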
Residual Analysis
The residual for observation i, eᵢ = Yᵢ − Ŷᵢ, is the difference between its observed and predicted value.
Check the assumptions of regression by examining the residuals:
• Examine for the linearity assumption
• Evaluate the independence assumption
• Evaluate the normal distribution assumption
• Examine for constant variance at all levels of X (homoscedasticity)
Residual Analysis for Linearity, Independence, and Equal Variance
[Residual plots illustrating each assumption check]
Evaluating the Model
Testing for Significance
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random
error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves the population slope β₁
• Hypotheses
  H₀: β₁ = 0 (no linear relationship)
  Hₐ: β₁ ≠ 0 (linear relationship)
• Theoretical basis is the sampling distribution of the slope
Distribution of Sample Slopes
[Diagram: population line and several sample regression lines; e.g. sample slopes 2.5, 1.6, 1.8, 2.1, …]
Over a very large number of samples, the sample slopes form the sampling distribution of β̂₁, centred at β₁ with standard error S_β̂₁.
Slope Coefficient Test Statistic
t = β̂₁ / S_β̂₁ = β̂₁ / (s / √SSxx), with df = n − 2
where SSxx = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)² / n
E.g. Test of Slope Coefficient
You’re a marketing analyst for Any Toys.
You find β̂₀ = −0.1, β̂₁ = 0.7 and s = 0.6055.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Is the relationship significant
at the .05 level of significance?
Solution Table
xᵢ    yᵢ    xᵢ²    yᵢ²    xᵢyᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Slope Coefficient Test Statistic
t = β̂₁ / S_β̂₁, df = n − 2, where SSxx = Σxᵢ² − (Σxᵢ)²/n = 55 − 15²/5 = 10
S_β̂₁ = s / √SSxx = 0.6055 / √10 = 0.1914
t = 0.70 / 0.1914 = 3.657
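The test statistic can be recomputed end-to-end; small rounding differences from the slide values are expected, since the slide truncates s and S_β̂₁ to four decimals.

```python
from math import sqrt

# t statistic for H0: beta1 = 0, using the Ad/Sales data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
b0, b1 = -0.1, 0.7                                   # fitted coefficients

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))                              # ~0.6055
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n   # 55 - 225/5 = 10
t = b1 / (s / sqrt(ss_xx))
print(round(t, 2))  # 3.66
```

The result exceeds the critical value 3.182 (df = 3, α = .05), so H₀ is rejected.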
Test of Slope Coefficient Solution
H₀: β₁ = 0;  Hₐ: β₁ ≠ 0;  α = .05;  df = 5 − 2 = 3
Critical values: t = ±3.182 (reject H₀ in either tail, .025 in each)
Test statistic: t = β̂₁ / S_β̂₁ = 0.70 / 0.1914 = 3.657
Decision: Reject H₀ at α = .05.
Conclusion: There is evidence of a linear relationship.
Correlation Coefficient
Correlation Models
Answers ‘How strong is the linear
relationship between two variables?’
Coefficient of correlation
Sample correlation coefficient denoted r
Values range from –1 to +1
Measures degree of association
Does not indicate cause–effect relationship
Coefficient of Correlation
r = SSxy / √(SSxx · SSyy)
where SSxx = Σx² − (Σx)²/n
      SSyy = Σy² − (Σy)²/n
      SSxy = Σxy − (Σx)(Σy)/n
Correlation Coefficient Values
Perfect Negative No Linear Perfect Positive
Correlation Correlation Correlation
–1.0 –.5 0 +.5 +1.0
Increasing degree of negative Increasing degree of positive
correlation correlation
E.g. Coefficient of Correlation
You’re a marketing analyst for Any Toys.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Calculate the coefficient of
correlation.
Solution Table
xᵢ    yᵢ    xᵢ²    yᵢ²    xᵢyᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Coefficient of Correlation Solution
SSxx = Σx² − (Σx)²/n = 55 − 15²/5 = 10
SSyy = Σy² − (Σy)²/n = 26 − 10²/5 = 6
SSxy = Σxy − (Σx)(Σy)/n = 37 − (15)(10)/5 = 7
r = SSxy / √(SSxx · SSyy) = 7 / √(10 × 6) = .904
Because the correlation coefficient is high, y can be predicted well from x using linear regression.
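The SS quantities and r can be verified directly:

```python
from math import sqrt

# r = SSxy / sqrt(SSxx * SSyy) for the Ad/Sales data.
x = [1, 2, 3, 4, 5]   # Ad
y = [1, 1, 2, 2, 4]   # Sales
n = len(x)

ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n                 # 10
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n                 # 6
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 7
r = ss_xy / sqrt(ss_xx * ss_yy)
print(round(r, 3))  # 0.904
```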
Coefficient of Correlation Challenge
You’re an economist for the county cooperative.
You gather the following data:
Fertilizer (lb.) Yield (lb.)
4 3.0
6 5.5
10 6.5
12 9.0
Find the coefficient of correlation.
Solution Table*
xᵢ    yᵢ     xᵢ²    yᵢ²      xᵢyᵢ
4     3.0    16     9.00     12
6     5.5    36     30.25    33
10    6.5    100    42.25    65
12    9.0    144    81.00    108
32    24.0   296    162.50   218
Coefficient of Correlation Solution*
SSxx = Σx² − (Σx)²/n = 296 − 32²/4 = 40
SSyy = Σy² − (Σy)²/n = 162.5 − 24²/4 = 18.5
SSxy = Σxy − (Σx)(Σy)/n = 218 − (32)(24)/4 = 26
r = SSxy / √(SSxx · SSyy) = 26 / √(40 × 18.5) = .956
Coefficient of Determination
Proportion of variation ‘explained’ by the relationship between x and y:
r² = Explained Variation / Total Variation = (SSyy − SSE) / SSyy,  0 ≤ r² ≤ 1
r² = (coefficient of correlation)²
E.g. Approximate r² Values
r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
E.g. Approximate r² Values…
r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
E.g. Determination Coefficient
You’re a marketing analyst for Any Toys. You know r = .904.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Calculate and interpret the
coefficient of determination.
E.g. Determination Coefficient
r² = (coefficient of correlation)² = (.904)² = .817
Interpretation: About 81.7% of the sample variation
in Sales (y) can be explained by using Ad ₹ (x) to
predict Sales (y) in the linear model.
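r² can be computed both ways shown above, as the squared correlation and as explained over total variation, and the two agree:

```python
# r^2 two ways: as r squared, and as explained / total variation.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
y_bar = sum(y) / len(y)

y_hat = [-0.1 + 0.7 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # 1.1
ss_yy = sum((yi - y_bar) ** 2 for yi in y)              # 6

r2_from_ss = (ss_yy - sse) / ss_yy
r2_from_r = 0.904 ** 2
print(round(r2_from_ss, 3), round(r2_from_r, 3))  # 0.817 0.817
```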
Other Evaluation Metrics
Mean Squared Error (MSE): MSE = (1/m) Σᵢ₌₁ᵐ (yᵢ − ŷᵢ)²
• Most commonly used metric
• Differentiable due to its convex shape, so easier to optimize
• Penalizes large errors
Mean Absolute Error (MAE): MAE = (1/m) Σᵢ₌₁ᵐ |yᵢ − ŷᵢ|
• Not preferred in cases where outliers are prominent, since MAE does not penalize large errors
• A small MAE suggests the model is good at prediction, while a large MAE suggests the model may have trouble in certain areas
Other Evaluation Metrics…
Root Mean Squared Error (RMSE): RMSE = √((1/m) Σᵢ₌₁ᵐ (yᵢ − ŷᵢ)²)
• RMSE measures the scatter of the residuals
• RMSE penalizes large errors
• A lower RMSE indicates the model is better for predictions; a higher RMSE indicates large deviations between the predicted and actual values
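All three metrics on the worked Ad/Sales example with the fitted line ŷ = −0.1 + 0.7x:

```python
from math import sqrt

# MSE, MAE and RMSE for the fitted line y-hat = -0.1 + 0.7x.
y_true = [1, 1, 2, 2, 4]
y_pred = [-0.1 + 0.7 * x for x in [1, 2, 3, 4, 5]]
m = len(y_true)

mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m
mae = sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / m
rmse = sqrt(mse)
print(round(mse, 2), round(mae, 2), round(rmse, 3))  # 0.22 0.4 0.469
```

Note that RMSE > MAE here, as expected when some residuals are noticeably larger than others.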
Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Least Squares
4. Computed Regression Coefficients
5. Explained Correlation
6. Predicted Response Variable