Linear Regression
Models
Dinesh K. Vishwakarma, Ph.D.
Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
Learning Objectives…
7. Correlation Models
8. Link between a correlation model and a
regression model
9. Test of coefficient of Correlation
What is a Model?
1. Representation of Some Phenomenon
2. Non-Maths/Stats Model
EPI 809/Spring 2008
What is a Maths/Stats Model?
1. Often Describe Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height.
• Metric formula: BMI = Weight in kilograms / (Height in meters)²
• Non-metric formula: BMI = (Weight in pounds × 703) / (Height in inches)²
Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error
• SBP = 6 × age(days) + ε
• Random error may be due to factors other than age in days (e.g., birth weight)
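The probabilistic model above can be simulated in a few lines. The error standard deviation (sigma = 3) is an assumed value for illustration only, not from the source.

```python
import random

# Probabilistic model from the example: SBP = 6 * age_in_days + random error.
# sigma (the error standard deviation) is an assumed value for illustration.
random.seed(1)
sigma = 3.0

for age in range(1, 6):
    deterministic = 6 * age            # deterministic component
    error = random.gauss(0, sigma)     # random error component
    print(f"age={age} d, SBP={deterministic + error:.1f}")
```

Averaged over many newborns of the same age, the random errors cancel and the mean SBP approaches the deterministic part, 6 × age.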
Bivariate & multivariate models
• Bivariate (simple) regression model: one explanatory variable, e.g. Education (x) → Income (y)
• Multivariate (multiple) regression model: several explanatory variables, e.g. Education (x1), Sex (x2), Experience (x3), Age (x4) → Income (y)
• Model with a simultaneous relationship: e.g. Price of wheat ↔ Quantity of wheat produced
Regression Modeling Steps
1. Hypothesize Deterministic Component
• Estimate Unknown Parameters
2. Specify Probability Distribution of
Random Error Term
• Estimate Standard Deviation of Error
3. Evaluate the fitted Model
4. Use Model for Prediction & Estimation
Sources for Hypothesizing a Model
1. Theory of the Field (e.g., Epidemiology)
2. Mathematical Theory
3. Previous Research
4. ‘Common Sense’
Thinking Challenge: Which is more logical?
[Four candidate scatter plots (1–4) of Grade vs. Study Time]
Scatter Plot of Data
[Six scatter plots of Y vs. X illustrating correlations r = −1, r = −.6, r = 0, r = +1, r = +.3, and r = 0]
Types of Relationship
[Scatter plots: linear relationships vs. curvilinear relationships]
Types of Relationship…
[Scatter plots: strong relationships vs. weak relationships]
Types of Relationship…
[Scatter plot: no relationship]
Linear Regression Models
Linear regression is one of the simplest statistical models in machine learning.
It shows the linear relationship between a dependent variable (y) and one or more independent (explanatory) variables.
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent and an independent variable.
Types of Regression
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Stepwise Regression

Linear vs. Logistic Regression
• Core concept: the data is modelled using a straight line (Linear) vs. a sigmoid (Logistic)
• Used with: continuous variable (Linear) vs. categorical variable (Logistic)
• Output/Prediction: value of the variable (Linear) vs. probability of occurrence of an event (Logistic)
• Goodness of fit: measured by loss, R-squared, Adjusted R-squared, etc. (Linear) vs. accuracy, precision, recall, F1 score, ROC curve, confusion matrix, etc. (Logistic)
Applications of LR
• Evaluating trends and sales estimates, e.g. a company’s sales analysis (monthly sales vs. time)
• Analyzing the impact of price changes, e.g. when a company changes the price of a product several times
• Assessing risk, e.g. in health care (number of claims vs. age)
Linear Equations
Y = mX + b
• m = slope = (change in Y) / (change in X)
• b = Y-intercept
Linear Regression Model
The relationship between the variables is a linear function:
Yᵢ = β₀ + β₁Xᵢ + εᵢ
where β₀ = population Y-intercept, β₁ = population slope, εᵢ = random error;
Yᵢ = dependent (response) variable, e.g. Grade; Xᵢ = independent (explanatory) variable, e.g. Study Time.
Estimating the Coefficients
The estimates are determined by
• drawing a sample from the population of interest,
• calculating sample statistics,
• producing a straight line that cuts through the data.
Question: What should be considered a good line?
[Scatter plot with three candidate lines drawn through the data]
Sum of Squared Differences
Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5), (4, 3.2); the second line is the horizontal line y = 2.5.
• Line 1: sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
• Line 2: sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99
The smaller the sum of squared differences, the better the fit of the line to the data.
A good line is one that minimizes the sum of squared differences between the points and the line.
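The two sums can be checked with a short script. The first candidate line is taken to be ŷ = x, which reproduces the predictions 1, 2, 3, 4 implied by the computation.

```python
# Sum of squared differences (SSD) for two candidate lines through four points.
points = [(1, 2.0), (2, 4.0), (3, 1.5), (4, 3.2)]

def ssd(predict):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

print(round(ssd(lambda x: x), 2))    # sloped line y-hat = x   -> 7.89
print(round(ssd(lambda x: 2.5), 2))  # horizontal line y = 2.5 -> 3.99
```

The smaller value confirms that, for these four points, the horizontal line fits better.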
Population Linear Regression Model
Yᵢ = β₀ + β₁Xᵢ + εᵢ, with E(Y) = β₀ + β₁Xᵢ
εᵢ = random error; each observed value scatters about the population regression line.
Simple Linear Regression Model
Yᵢ = β̂₀ + β̂₁Xᵢ + ε̂ᵢ (observed value), with fitted line Ŷᵢ = β̂₀ + β̂₁Xᵢ
ε̂ᵢ = residual (estimated random error); the fitted line is also used to predict unsampled observations.
Estimating Parameters:
Least Squares Method
Scatter plot
1. Plot of all (Xᵢ, Yᵢ) pairs
2. Suggests how well the model will fit
[Scatter plot of Y vs. X]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Scatter plot with candidate lines of varying slope and intercept]
Least Squares Error
‘Best fit’ means the differences between the actual Y values and the predicted Ŷ values are a minimum. But positive differences offset negative ones, so square the errors:
Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ ε̂ᵢ²
Least squares minimizes the sum of the squared differences (errors), the SSE.
Least Squares Graphically
LS minimizes Σᵢ₌₁ⁿ ε̂ᵢ² = ε̂₁² + ε̂₂² + ε̂₃² + … + ε̂ₙ²
[Scatter plot showing residuals ε̂₁ … ε̂₄ about the fitted line Ŷᵢ = β̂₀ + β̂₁Xᵢ, where Yᵢ = β̂₀ + β̂₁Xᵢ + ε̂ᵢ]
Coefficient Equations
• Prediction equation: ŷᵢ = β̂₀ + β̂₁xᵢ
• Sample slope: β̂₁ = SSxy / SSxx = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
• Sample Y-intercept: β̂₀ = ȳ − β̂₁x̄
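A minimal sketch of these formulas in plain Python, using a small illustrative data set (the same five points used in the worked examples later):

```python
# Least-squares coefficients from the SSxy / SSxx formulas.
x = [1, 2, 3, 4, 5]   # sample data, for illustration
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = ss_xy / ss_xx        # sample slope
b0 = y_bar - b1 * x_bar   # sample Y-intercept
print(round(b1, 4), round(b0, 4))  # 0.7 -0.1
```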
Finding β₀: MSE Method
Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))²
Setting the partial derivative with respect to β₀ to zero:
0 = ∂/∂β₀ Σᵢ₌₁ⁿ [yᵢ² + β₀² + β₁²xᵢ² + 2β₀β₁xᵢ − 2yᵢβ₀ − 2yᵢβ₁xᵢ]
0 = 2Σᵢ₌₁ⁿ β₀ + 2β₁Σᵢ₌₁ⁿ xᵢ − 2Σᵢ₌₁ⁿ yᵢ
0 = 2(nβ₀ + nβ₁x̄ − nȳ)
β̂₀ = ȳ − β̂₁x̄  (minimum sum of squared errors method)
Finding β₁: MSE Method
Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)², where ŷᵢ = β₀ + β₁xᵢ
= Σᵢ₌₁ⁿ [yᵢ² + (β₀ + β₁xᵢ)² − 2yᵢ(β₀ + β₁xᵢ)]
= Σᵢ₌₁ⁿ [yᵢ² + β₀² + β₁²xᵢ² + 2β₀β₁xᵢ − 2yᵢβ₀ − 2yᵢβ₁xᵢ]
To find the minimum error, the derivative must be zero (optimization):
∂(Σᵢ₌₁ⁿ εᵢ²)/∂β₁ = ∂/∂β₁ Σᵢ₌₁ⁿ (yᵢ − (β₀ + β₁xᵢ))² = 0
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − β₀ − β₁xᵢ)
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β₁x̄ − β₁xᵢ)  (substituting β₀ = ȳ − β₁x̄)
Minimum Square Error Method…
0 = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β₁x̄ − β₁xᵢ)  (after substituting β₀ = ȳ − β₁x̄)
β₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ)
Since Σ(xᵢ − x̄) = 0 and Σ(yᵢ − ȳ) = 0, replacing the leading factor xᵢ by (xᵢ − x̄) on both sides changes nothing, giving
β₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
β₁ = SSxy / SSxx
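As a sanity check on the derivation, the closed-form slope can be compared with a brute-force search over candidate slopes (with β₀ = ȳ − β₁x̄ substituted, as above). The data set is illustrative:

```python
# Check the closed-form slope against a brute-force grid search over beta1,
# with beta0 = y_bar - beta1 * x_bar substituted as in the derivation.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

def sse(b1):
    b0 = y_bar - b1 * x_bar
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed form: beta1 = SSxy / SSxx
b1_closed = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
            sum((xi - x_bar) ** 2 for xi in x)

# Grid search over candidate slopes in [-2, 2], step 0.001
grid = [i / 1000 for i in range(-2000, 2001)]
b1_grid = min(grid, key=sse)
print(b1_closed, b1_grid)  # 0.7 0.7
```

Both approaches agree, which is expected because the SSE is convex in β₁.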
Computation Table
Xᵢ      Yᵢ      Xᵢ²     Yᵢ²     XᵢYᵢ
X₁      Y₁      X₁²     Y₁²     X₁Y₁
X₂      Y₂      X₂²     Y₂²     X₂Y₂
:       :       :       :       :
Xₙ      Yₙ      Xₙ²     Yₙ²     XₙYₙ
ΣXᵢ     ΣYᵢ     ΣXᵢ²    ΣYᵢ²    ΣXᵢYᵢ
Interpretation of Coefficients
Slope (β̂₁)
• β̂₁ > 0: positive association; β̂₁ < 0: negative association; β̂₁ = 0: no association
• Estimated Y changes by β̂₁ for each 1-unit increase in X
• If β̂₁ = 2, then Y is expected to increase by 2 for each 1-unit increase in X
Y-intercept (β̂₀)
• Average value of Y when X = 0
• If β̂₀ = 4, then the average Y is expected to be 4 when X is 0
E.g. Parameter Estimation
What is the relationship between
Mother’s Estriol level & Birthweight using the
following data?
Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4
Scatterplot
[Scatter plot of Birthweight (g/1000) vs. Estriol level (mg/24h)]
Parameter Estimation Solution Table
Xᵢ    Yᵢ    Xᵢ²    Yᵢ²    XᵢYᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Parameter Estimation Solution
β̂₁ = (ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)/n) / (ΣXᵢ² − (ΣXᵢ)²/n) = (37 − (15)(10)/5) / (55 − 15²/5) = 7/10 = 0.70
β̂₀ = Ȳ − β̂₁X̄ = 2 − 0.7 × 3 = −0.1
Fitted line: ŷ = −0.1 + 0.7x
Coefficient Interpretation Solution
1. Slope (β̂₁ = 0.7): Birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in Estriol (X).
2. Intercept (β̂₀ = −0.1): Average birthweight (Y) is −0.1 units when estriol level (X) is 0.
• Difficult to explain: birthweight should always be positive.
Goodness: Variation Measures
Total sum of squares Σ(yᵢ − ȳ)² = Explained sum of squares Σ(ŷᵢ − ȳ)² + Unexplained sum of squares Σ(yᵢ − ŷᵢ)²
[Diagram: at each xᵢ, the deviation of yᵢ from ȳ splits into the part explained by the fitted line ŷᵢ = β̂₀ + β̂₁xᵢ and the residual]
Estimation of σ²
s² = SSE / (n − 2), where SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
The subtraction of 2 reflects the fact that we have estimated two parameters: β₀ and β₁.
s = √s² = √(SSE / (n − 2))
E.g. Compute SSE, s², s
You’re a marketing analyst for Any Toys. You gather the following data:
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Find SSE, s2, and s.
E.g. Solution: SSE, s², s
xᵢ    yᵢ    ŷ = −0.1 + 0.7x    y − ŷ    (y − ŷ)²
1     1     0.6                0.4      0.16
2     1     1.3                −0.3     0.09
3     2     2.0                0        0
4     2     2.7                −0.7     0.49
5     4     3.4                0.6      0.36
                                        SSE = 1.1
s² = SSE / (n − 2) = 1.1 / (5 − 2) = 0.36667
s = √0.36667 = 0.6055
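The same table can be reproduced programmatically:

```python
# SSE, s^2 and s for the fitted line y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

residuals = [yi - (-0.1 + 0.7 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - 2)    # n - 2: two estimated parameters (beta0, beta1)
s = s2 ** 0.5

print(round(sse, 2), round(s2, 5), round(s, 4))  # 1.1 0.36667 0.6055
```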
Residual Analysis
The residual for observation i, eᵢ = Yᵢ − Ŷᵢ, is the difference between its observed and predicted value.
Check the assumptions of regression by examining the residuals:
• Examine for the linearity assumption
• Evaluate the independence assumption
• Evaluate the normal distribution assumption
• Examine for constant variance at all levels of X (homoscedasticity)
Residual Analysis for Linearity, Independence, and Equal Variance
[Residual plots illustrating each assumption check]
Evaluating the Model
Testing for Significance
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random
error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves the population slope β₁
• Hypotheses
  H₀: β₁ = 0 (no linear relationship)
  Hₐ: β₁ ≠ 0 (linear relationship)
• Theoretical basis is the sampling distribution of the slope
Distribution of Sample Slopes
[Diagram: population line and several sample regression lines; e.g. sample slopes 2.5, 1.6, 1.8, 2.1, …]
Over a very large number of samples, the sample slopes form the sampling distribution of β̂₁, centred at β₁ with standard error S_β̂₁.
Slope Coefficient Test Statistic
t = β̂₁ / S_β̂₁ = β̂₁ / (s / √SSxx), with df = n − 2
where SSxx = Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)² / n
E.g. Test of Slope Coefficient
You’re a marketing analyst for Any Toys.
You find β̂₀ = −0.1, β̂₁ = 0.7 and s = 0.6055.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Is the relationship significant
at the .05 level of significance?
Solution Table
xᵢ    yᵢ    xᵢ²    yᵢ²    xᵢyᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Slope Coefficient Test Statistic
t = β̂₁ / S_β̂₁, df = n − 2, where SSxx = Σxᵢ² − (Σxᵢ)²/n = 55 − 15²/5 = 10
S_β̂₁ = s / √SSxx = 0.6055 / √10 = 0.1914
t = 0.70 / 0.1914 = 3.657
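The test statistic can be recomputed end-to-end; small rounding differences from the slide values are expected, since the slide truncates s and S_β̂₁ to four decimals.

```python
from math import sqrt

# t statistic for H0: beta1 = 0, using the Ad/Sales data.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
b0, b1 = -0.1, 0.7                                   # fitted coefficients

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))                              # ~0.6055
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n   # 55 - 225/5 = 10
t = b1 / (s / sqrt(ss_xx))
print(round(t, 2))  # 3.66
```

The result exceeds the critical value 3.182 (df = 3, α = .05), so H₀ is rejected.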
Test of Slope Coefficient Solution
H₀: β₁ = 0;  Hₐ: β₁ ≠ 0;  α = .05;  df = 5 − 2 = 3
Critical values: t = ±3.182 (reject H₀ in either tail, .025 in each)
Test statistic: t = β̂₁ / S_β̂₁ = 0.70 / 0.1914 = 3.657
Decision: Reject H₀ at α = .05.
Conclusion: There is evidence of a linear relationship.
Correlation Coefficient
Correlation Models
Answers ‘How strong is the linear
relationship between two variables?’
Coefficient of correlation
Sample correlation coefficient denoted r
Values range from –1 to +1
Measures degree of association
Does not indicate cause–effect relationship
Coefficient of Correlation
r = SSxy / √(SSxx · SSyy)
where SSxx = Σx² − (Σx)²/n
      SSyy = Σy² − (Σy)²/n
      SSxy = Σxy − (Σx)(Σy)/n
Correlation Coefficient Values
Perfect Negative No Linear Perfect Positive
Correlation Correlation Correlation
–1.0 –.5 0 +.5 +1.0
Increasing degree of negative Increasing degree of positive
correlation correlation
E.g. Coefficient of Correlation
You’re a marketing analyst for Any Toys.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Calculate the coefficient of
correlation.
Solution Table
xᵢ    yᵢ    xᵢ²    yᵢ²    xᵢyᵢ
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37
Coefficient of Correlation Solution
SSxx = Σx² − (Σx)²/n = 55 − 15²/5 = 10
SSyy = Σy² − (Σy)²/n = 26 − 10²/5 = 6
SSxy = Σxy − (Σx)(Σy)/n = 37 − (15)(10)/5 = 7
r = SSxy / √(SSxx · SSyy) = 7 / √(10 × 6) = .904
Because the correlation coefficient is high, y can be predicted well from x using linear regression.
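The SS quantities and r can be verified directly:

```python
from math import sqrt

# r = SSxy / sqrt(SSxx * SSyy) for the Ad/Sales data.
x = [1, 2, 3, 4, 5]   # Ad
y = [1, 1, 2, 2, 4]   # Sales
n = len(x)

ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n                 # 10
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n                 # 6
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 7
r = ss_xy / sqrt(ss_xx * ss_yy)
print(round(r, 3))  # 0.904
```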
Coefficient of Correlation Challenge
You’re an economist for the county cooperative.
You gather the following data:
Fertilizer (lb.) Yield (lb.)
4 3.0
6 5.5
10 6.5
12 9.0
Find the coefficient of correlation.
Solution Table*
xᵢ    yᵢ     xᵢ²    yᵢ²      xᵢyᵢ
4     3.0    16     9.00     12
6     5.5    36     30.25    33
10    6.5    100    42.25    65
12    9.0    144    81.00    108
32    24.0   296    162.50   218
Coefficient of Correlation Solution*
SSxx = Σx² − (Σx)²/n = 296 − 32²/4 = 40
SSyy = Σy² − (Σy)²/n = 162.5 − 24²/4 = 18.5
SSxy = Σxy − (Σx)(Σy)/n = 218 − (32)(24)/4 = 26
r = SSxy / √(SSxx · SSyy) = 26 / √(40 × 18.5) = .956
Coefficient of Determination
Proportion of variation ‘explained’ by the relationship between x and y:
r² = Explained Variation / Total Variation = (SSyy − SSE) / SSyy,  0 ≤ r² ≤ 1
r² = (coefficient of correlation)²
E.g. Approximate r² Values
r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
E.g. Approximate r² Values…
r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
E.g. Determination Coefficient
You’re a marketing analyst for Any Toys. You know r = .904.
Ad (₹) Sales (Qty)
1 1
2 1
3 2
4 2
5 4
Calculate and interpret the
coefficient of determination.
E.g. Determination Coefficient
r² = (coefficient of correlation)² = (.904)² = .817
Interpretation: About 81.7% of the sample variation
in Sales (y) can be explained by using Ad ₹ (x) to
predict Sales (y) in the linear model.
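r² can be computed both ways shown above, as the squared correlation and as explained over total variation, and the two agree:

```python
# r^2 two ways: as r squared, and as explained / total variation.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
y_bar = sum(y) / len(y)

y_hat = [-0.1 + 0.7 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # 1.1
ss_yy = sum((yi - y_bar) ** 2 for yi in y)              # 6

r2_from_ss = (ss_yy - sse) / ss_yy
r2_from_r = 0.904 ** 2
print(round(r2_from_ss, 3), round(r2_from_r, 3))  # 0.817 0.817
```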
Other Evaluation Metrics
Mean Squared Error (MSE): MSE = (1/m) Σᵢ₌₁ᵐ (yᵢ − ŷᵢ)²
• Most commonly used metric
• Differentiable due to its convex shape, so easier to optimize
• Penalizes large errors
Mean Absolute Error (MAE): MAE = (1/m) Σᵢ₌₁ᵐ |yᵢ − ŷᵢ|
• Not preferred in cases where outliers are prominent, since MAE does not penalize large errors
• A small MAE suggests the model is good at prediction, while a large MAE suggests the model may have trouble in certain areas
Other Evaluation Metrics…
Root Mean Squared Error (RMSE): RMSE = √((1/m) Σᵢ₌₁ᵐ (yᵢ − ŷᵢ)²)
• RMSE measures the scatter of the residuals
• RMSE penalizes large errors
• A lower RMSE indicates the model is better for predictions; a higher RMSE indicates large deviations between the predicted and actual values
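All three metrics on the worked Ad/Sales example with the fitted line ŷ = −0.1 + 0.7x:

```python
from math import sqrt

# MSE, MAE and RMSE for the fitted line y-hat = -0.1 + 0.7x.
y_true = [1, 1, 2, 2, 4]
y_pred = [-0.1 + 0.7 * x for x in [1, 2, 3, 4, 5]]
m = len(y_true)

mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m
mae = sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / m
rmse = sqrt(mse)
print(round(mse, 2), round(mae, 2), round(rmse, 3))  # 0.22 0.4 0.469
```

Note that RMSE > MAE here, as expected when some residuals are noticeably larger than others.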
Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Least Squares
4. Computed Regression Coefficients
5. Explained Correlation
6. Predicted Response Variable