2603282
Stat BIO
(Section 2)
Instructor:
Sawitree Boonpatcharanon
Week 14: 16 November, 2023
1/ 53
Flow
2/ 53
Regression
Objective To study the relationship between two types of variables (X and Y )
and/or use X to predict Y .
3/ 53
Regression
Example 1 American Express Company has long believed that its cardholders
tend to travel more extensively than others - both on business and for pleasure.
As part of a comprehensive research effort undertaken by a New York market
research firm on behalf of American Express, a study was conducted to
determine the relationship between travel and charges on the American Express
card. The research firm selected a random sample of 25 cardholders from the
American Express computer file and recorded their total charges over a
specified period. For the selected cardholders, information was also obtained,
through a mailed questionnaire, on the total number of miles traveled by each
cardholder during the same period.
4/ 53
Regression
Example 1: Data Layout
5/ 53
Regression
Example 2 Alka-Seltzer recently embarked on an in-store promotional
campaign, with displays of its antacid featured prominently in supermarkets.
The company also ran its usual radio and television commercials. Over a period
of 10 weeks, the company kept track of its expenditure on radio and television
advertising, variable X1 , as well as its spending on in-store displays, variable X2 .
The resulting sales for each week in the area studied were recorded as the
dependent variable Y. The company analyst conducting the study hypothesized
a linear regression model of the form linking sales volume with the two
independent variables, advertising and in-store promotions. The analyst wanted
to use the available data, considered a random sample of 10 weekly
observations, to estimate the parameters of the regression relationship.
6/ 53
Regression
Example 2: Data Layout
7/ 53
Regression: Model
1 Simple Linear Regression (Example 1)
One X → Simple Linear Regression (SLR)
Y = β0 + β1X + ε
2 Multiple Linear Regression (Example 2)
More than one X → Multiple Linear Regression (MLR)
Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε
8/ 53
Regression: Model
Note
1 The relationship between X and Y is linear.
2 We call X, X1, X2, . . . , Xk the independent variables or predictor variables. We have these data.
3 Y is the dependent variable, i.e., the variable that we want to predict. We have these data.
4 ε is the error term or the residual term. We do not have these data.
5 The data type of every X and of Y is at least interval scale. (X can be a categorical variable, but that requires more steps than the material in this course.)
6 β0, β1, β2, . . . , βk are the coefficients. We will estimate these values to get the relationship or the prediction model.
9/ 53
Regression: Relationship between one X and Y
source: Complete Business Statistics 7th Edition, Aczel - Sounderpandian
10/ 53
Simple Linear Regression: Model
(Handwritten note: find β0, β1 that make the error smallest → use ordinary least squares.)
Y = β0 + β1X + ε
β0 is the intercept of the model.
β1 is the slope of the model.
E(Y | X) = β0 + β1X ⇒ average of Y given X
11/ 53
Simple Linear Regression: Estimation
Y = β0 + β1X + ε
Parameters: β0, β1
ŷ = b0 + b1x
Estimators: β̂0 = b0, β̂1 = b1
We also get ε̂ = e from y − ŷ.
Important!! Choose the best-fitting line for the data:
the best model has the smallest error.
12/ 53
Simple Linear Regression: Estimation
13/ 53
Simple Linear Regression: Estimation
Estimation method: the “least squares” method
The objective is to minimize the sum of squared errors (SSE):
SSE = Σⁿᵢ₌₁ ei² = Σⁿᵢ₌₁ (yi − ŷi)²
14/ 53
Simple Linear Regression: Estimation
SPSS Results
15/ 53
Simple Linear Regression: Example
The simple linear regression model for this data set is
Y = β0 + β1X + ε.
We then get
ŷ = b0 + b1x
ŷ = 274.85 + 1.255x.
Next, we need to confirm that there is a relationship between x and y (the slope is necessary to stay in the model). ⇒ It means we need to do the hypothesis testing that β1 ≠ 0.
Hypothesis setting: H0: β1 = 0, H1: β1 ≠ 0
Test statistic: tcal = (b1 − β10)/s(b1), where β10 = 0 under H0
ŷ = 274.85 + 1.255x
16/ 53
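The slope test above can be sketched from first principles in Python. The data below are a small made-up sample (not the card-charge data from Example 1), so every number is purely illustrative.

```python
import math

# Toy data (illustrative only, NOT the American Express data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
ss_x  = sum((xi - xbar) ** 2 for xi in x)
ss_y  = sum((yi - ybar) ** 2 for yi in y)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_x              # slope estimate
b0 = ybar - b1 * xbar          # intercept estimate
sse = ss_y - b1 * ss_xy        # shortcut for the sum of squared errors
mse = sse / (n - 2)            # s^2, unbiased estimate of sigma^2
se_b1 = math.sqrt(mse / ss_x)  # standard error s(b1)
t_cal = (b1 - 0) / se_b1       # test statistic for H0: beta1 = 0
```

Comparing `t_cal` with the two-sided t critical value at n − 2 degrees of freedom then decides whether the slope stays in the model.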
Simple Linear Regression: Assumptions
Y = β0 + β1X + ε
Model assumptions
1 The relationship between X and Y is a straight-line relationship.
2 The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations. In symbols:
ε ∼ N(0, σ²)
17/ 53
Simple Linear Regression: Assumptions
• The relationship between X and Y is a straight-line
relationship.
18/ 53
Simple Linear Regression: Assumptions
• ε ∼ N(0, σ²): normality assumption
• Check with the K-S or S-W test
19/ 53
Simple Linear Regression: Assumptions
• The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations.
20/ 53
Simple Linear Regression: Assumptions
• The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations.
21/ 53
Simple Linear Regression: Example
Example 1: American Express Company . . . , a study was conducted to
determine the relationship between travel and charges on the American Express
card. . . .
22/ 53
Simple Linear Regression: Example
First, check whether X and Y satisfy the linearity assumption. The error term ε, however, cannot be checked yet: we have to wait until we get ŷ, so that we can calculate the resulting e.
23/ 53
Simple Linear Regression: Example Con’t
Check assumption
24/ 53
Simple Linear Regression: Interpretation
The simple linear regression model for this data set is
ŷ = b0 + b1 x
ŷ = 274.85 + 1.26x
In practice, the two objectives of a regression model are
1 to study the relationship
b1: when x increases by 1 unit, y increases by b1 units.
2 to make a prediction, e.g., when x = 2,000, ŷ = 2,785.52, e = .....
25/ 53
Simple Linear Regression: Prediction
Note
1 We use x to predict y but not vice versa.
2 You should be aware that using a regression for extrapolating outside the
estimation range is risky, as the estimated relationship may not be
appropriate outside this range.
26/ 53
Simple Linear Regression: Model Evaluation
How good is the model?
The coefficient of determination R² is a measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
R² = SSR/SST = 1 − SSE/SST;  0 ≤ R² ≤ 1
The coefficient of determination can be interpreted as the proportion of the variation in Y that is explained by the regression relationship of Y with X.
27/ 53
Simple Linear Regression: Model Evaluation
28/ 53
Simple Linear Regression: Example Con’t
R² = SS²xy / (SSx SSy) = 0.97
Meaning: 97% of the variation in the dollars spent on the American Express card (Y) is explained by its relationship with the miles traveled (X).
29/ 53
Multiple Linear Regression
Example 2 (recap): Alka-Seltzer tracked, over 10 weeks, its radio and television advertising spending (X1), its in-store display spending (X2), and the resulting weekly sales (Y). The analyst wants to use these 10 weekly observations to estimate the parameters of the regression relating Y to X1 and X2.
30/ 53
Multiple Linear Regression
Example 2: Data Layout
31/ 53
Multiple Linear Regression
1 Simple Linear Regression
Y = β0 + β1X + ε
2 Multiple Linear Regression
Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε
32/ 53
Multiple Linear Regression: Assumptions
Model assumptions
1 The relationship between the X’s and Y is a straight-line relationship.
2 The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations. In symbols:
ε ∼ N(0, σ²)
3 All independent variables are uncorrelated. No multicollinearity problem!
33/ 53
Multiple Linear Regression: Estimation
Example 2:
We will get b0 = 47.165, b1 = 1.599, b2 = 1.149.
34/ 53
Multiple Linear Regression: Estimation
Example 2 Con’t:
How to get b0 = 47.165, b1 = 1.599, b2 = 1.149?
Excel: Data → Data Analysis → Regression
ŷ = 47.165 + 1.599x1 + 1.149x2
35/ 53
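Excel's Regression tool solves an ordinary least-squares problem, and the same fit can be sketched in Python with numpy.linalg.lstsq. The numbers below are hypothetical, not the Alka-Seltzer figures from the slide, so the fitted coefficients are illustrative only.

```python
import numpy as np

# Hypothetical weekly data (NOT the Alka-Seltzer data from Example 2)
x1 = np.array([1.0, 2.0, 3.0, 4.0])    # advertising spending
x2 = np.array([2.0, 1.0, 4.0, 3.0])    # in-store display spending
y  = np.array([9.0, 8.0, 19.0, 18.0])  # weekly sales

# Design matrix: a leading column of ones gives the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X b = y
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
```

With real data the residuals would not be zero; here y was built exactly linear in x1 and x2, so the solver recovers the coefficients exactly.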
Multiple Linear Regression: Hypothesis Testing
Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε
A hypothesis test for the existence of a linear relationship between any of the Xi and Y is
H0: β1 = β2 = β3 = ··· = βk = 0
H1: Not all βi are zero.
Looks similar to the . . . . . . . . . . . . . . . . . . . . test
36/ 53
Multiple Linear Regression: Hypothesis Testing
Variation    Sum of Squares   df            Mean Square               F Ratio
Regression   SSR              k             MSR = SSR/k               Fcal = MSR/MSE
Error        SSE              n − (k + 1)   MSE = SSE/(n − (k + 1))
Total        SST              n − 1
Excel: Data → Data Analysis → Regression
Example 2 Con’t:
37/ 53
Multiple Linear Regression: Hypothesis Testing
Rejection region: Fcal > Ftable = Fα, k, n−(k+1)
38/ 53
Multiple Linear Regression: Hypothesis Testing
Next question: which βi is/are significant?
Test each individual regression slope parameter:
H0: βi = 0
Ha: βi ≠ 0
Test statistic
tcal = (bi − 0)/s(bi);  df = n − (k + 1)
where s(bi) = s/√SSxi is the standard error of bi, with s = √MSE.
This test is under the assumption that the regression errors are normally distributed.
39/ 53
Multiple Linear Regression: Hypothesis Testing
Example 2 Con’t:
ŷ = 47.165 + 1.599x1 + 1.149x2
SPSS
40/ 53
Multiple Linear Regression: Assumptions
Example 2 Con’t:
41/ 53
Multiple Linear Regression: Assumptions
Multicollinearity
Ideally, the Xi variables in a regression equation are uncorrelated with one another; each variable then contains a unique piece of information about Y, information that is not contained in any of the other Xi.
Excel: Data → Data Analysis → Correlation
42/ 53
Multiple Linear Regression: Assumptions
The effects of multicollinearity
1 The variances (and standard errors) of regression coefficient estimators
are inflated.
2 The magnitudes of the regression coefficient estimates may be different
from what we expect.
3 The signs of the regression coefficient estimates may be the opposite of
what we expect.
4 Adding or removing variables produces large changes in the coefficient
estimates or their signs.
5 Removing a data point causes large changes in the coefficient estimates
or their signs.
6 In some cases, the F ratio is significant, but none of the t ratios is.
43/ 53
Multiple Linear Regression: Coefficient of Determination
The multiple coefficient of determination R² is
R² = SSR/SST = 1 − SSE/SST;  0 ≤ R² ≤ 1
The adjusted multiple coefficient of determination is
R²adj = 1 − [SSE/(n − (k + 1))] / [SST/(n − 1)];  0 ≤ R²adj ≤ 1
R²adj is always less than R².
44/ 53
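The penalty the adjusted R² applies for extra parameters is easy to see numerically. The sketch below uses a made-up single-predictor sample (k = 1), so the values are illustrative only.

```python
# Toy single-predictor data (illustrative only); k = number of predictors
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1

xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - ybar) ** 2 for yi in y)

r2     = 1 - sse / sst
# Adjusted R^2 divides each sum of squares by its degrees of freedom
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
```

Because SSE/(n − (k + 1)) is inflated relative to SSE/SST whenever k ≥ 1, `r2_adj` always comes out below `r2`.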
Appendix
45/ 53
Simple Linear Regression: Estimation
ŷi = b0 + b1xi
Minimize SSE through the normal equations, which are
Σⁿᵢ₌₁ yi = n b0 + b1 Σⁿᵢ₌₁ xi   (1)
Σⁿᵢ₌₁ xiyi = b0 Σⁿᵢ₌₁ xi + b1 Σⁿᵢ₌₁ xi²   (2)
Hence, from (1) and (2), the intercept is
b0 = ȳ − b1x̄
and the slope is
b1 = [Σⁿᵢ₌₁ xiyi − (Σⁿᵢ₌₁ xi)(Σⁿᵢ₌₁ yi)/n] / [Σⁿᵢ₌₁ xi² − (Σⁿᵢ₌₁ xi)²/n] = SSxy / SSx
46/ 53
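The closed-form b0 and b1 can be verified by substituting them back into normal equations (1) and (2). The data below are a small toy sample, not from either example in the slides.

```python
# Toy data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x  = sum(x)
sum_y  = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Closed-form solution of the two normal equations
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # SSxy / SSx
b0 = sum_y / n - b1 * sum_x / n                                # ybar - b1*xbar

# Plug back into normal equations (1) and (2); both residuals should vanish
eq1_gap = abs(sum_y - (n * b0 + b1 * sum_x))
eq2_gap = abs(sum_xy - (b0 * sum_x + b1 * sum_x2))
```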
Simple Linear Regression: Estimation
SSx = Σⁿᵢ₌₁ (xi − x̄)² = Σⁿᵢ₌₁ xi² − (Σⁿᵢ₌₁ xi)²/n
SSy = Σⁿᵢ₌₁ (yi − ȳ)² = Σⁿᵢ₌₁ yi² − (Σⁿᵢ₌₁ yi)²/n
SSxy = Σⁿᵢ₌₁ (xi − x̄)(yi − ȳ) = Σⁿᵢ₌₁ xiyi − (Σⁿᵢ₌₁ xi)(Σⁿᵢ₌₁ yi)/n
Note: SS comes from “sum of squares”. Therefore,
b1 = SSxy / SSx.
47/ 53
Simple Linear Regression: Hypothesis testing
A hypothesis test for the existence of a linear relationship between X and Y is
H0: β1 = 0
H1: β1 ≠ 0
H0, Ha can be written as one-sided hypotheses too.
Test statistic
Given the assumption of normality of the regression errors, the test statistic possesses the t distribution with n − 2 degrees of freedom:
tcal = (b1 − β10)/s(b1)
For the critical region, we use the same rules as when we do the t-test.
48/ 53
Simple Linear Regression: Hypothesis testing
From tcal = (b1 − β10)/s(b1),
s(b1) = s / √SSx
where s² is an unbiased estimator of σ²; then
s = √MSE
and
MSE = SSE / (n − 2),
where
SSE = Σⁿᵢ₌₁ ei² = Σⁿᵢ₌₁ (yi − ŷi)²
    = SSy − (SSxy)² / SSx
    = SSy − b1SSxy.
49/ 53
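The three expressions for SSE above are algebraically equivalent, which can be confirmed numerically on a toy sample (made up for illustration, not taken from the examples):

```python
import math

# Toy data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

ss_x  = sum((xi - xbar) ** 2 for xi in x)
ss_y  = sum((yi - ybar) ** 2 for yi in y)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_x
b0 = ybar - b1 * xbar

# SSE three ways: from residuals, and via the two shortcut formulas
sse_resid    = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sse_shortcut = ss_y - ss_xy ** 2 / ss_x
sse_slope    = ss_y - b1 * ss_xy

mse  = sse_resid / (n - 2)              # s^2
s_b1 = math.sqrt(mse) / math.sqrt(ss_x) # standard error of b1
```

The shortcut forms avoid computing every fitted value ŷi, which is why hand calculations (and the slide formulas) prefer them.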
Simple Linear Regression: Hypothesis testing
If you reject H0, then β1 ≠ 0 and your regression model will be
ŷ = b0 + b1x.
If you fail to reject (FTR) H0, then β1 = 0 and your regression model will be
ŷ = b0.
Moreover, we can also find the CIs of β0 and β1.
A (1 − α)100% confidence interval for β0 is
b0 ± tα/2, n−2 s(b0).
A (1 − α)100% confidence interval for β1 is
b1 ± tα/2, n−2 s(b1).
50/ 53
Simple Linear Regression: Model Evaluation
where
SST = SSE + SSR
Σⁿᵢ₌₁ (yi − ȳ)² = Σⁿᵢ₌₁ (yi − ŷi)² + Σⁿᵢ₌₁ (ŷi − ȳ)²
SSy = (SSy − b1SSxy) + b1SSxy
yi − ȳ = (yi − ŷi) + (ŷi − ȳ)
Total deviation = Unexplained deviation (error) + Explained deviation (regression)
Therefore,
R² = SS²xy / (SSx SSy);  0 ≤ R² ≤ 1
51/ 53
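The decomposition SST = SSE + SSR can be checked directly on a toy sample (made up for illustration), together with the SSxy form of R²:

```python
# Toy data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

ss_x  = sum((xi - xbar) ** 2 for xi in x)
ss_y  = sum((yi - ybar) ** 2 for yi in y)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_x
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
ssr = sum((yh - ybar) ** 2 for yh in y_hat)            # explained

r2 = ss_xy ** 2 / (ss_x * ss_y)  # R^2 via the SSxy formula
```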
Multiple Linear Regression: Estimation
Example with two independent variables
The objective is to minimize SSE through the normal equations, which are
Σⁿᵢ₌₁ yi = n b0 + b1 Σⁿᵢ₌₁ x1i + b2 Σⁿᵢ₌₁ x2i
Σⁿᵢ₌₁ x1iyi = b0 Σⁿᵢ₌₁ x1i + b1 Σⁿᵢ₌₁ x1i² + b2 Σⁿᵢ₌₁ x1ix2i
Σⁿᵢ₌₁ x2iyi = b0 Σⁿᵢ₌₁ x2i + b1 Σⁿᵢ₌₁ x1ix2i + b2 Σⁿᵢ₌₁ x2i²
To find b0, b1, b2, compute all the summations above, substitute them into the equations, and then solve the three equations.
52/ 53
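The three normal equations form a 3 × 3 linear system in b0, b1, b2, so "solve the three equations" can be done with a single call to a linear solver. The data below are hypothetical (not the Alka-Seltzer figures).

```python
import numpy as np

# Hypothetical data with two predictors (illustrative only)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])
y  = np.array([9.0, 8.0, 19.0, 18.0])
n = len(y)

# Coefficient matrix and right-hand side of the three normal equations
A = np.array([
    [n,        x1.sum(),        x2.sum()],
    [x1.sum(), (x1 ** 2).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 ** 2).sum()],
])
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)
```

This is exactly the hand procedure on the slide: compute all the summations, substitute, and solve the resulting system.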
Ending Ticket
https://forms.gle/dPEQqPTNNi16c34z6
After you submit the form, you will get a receipt ticket. Please check your email and keep the ticket as your evidence.
53/ 53