Ch3 Simple Linear Regression
For each value of the explanatory variable X there is a whole conditional distribution of Y, so we distinguish the unconditional mean E(Y) from the conditional means E(Y | X). As X changes, the conditional mean of Y generally changes with it, while E(Y) is a single number that ignores X altogether. The line connecting the conditional means is the population regression line (PRL): it describes how the mean of Y varies with x.

Formally, the conditional mean is some function of Xi,

E(Y | Xi) = f(Xi),

and under the assumption that this function is linear we obtain the population regression function (PRF)

E(Y | Xi) = β1 + β2 Xi,

where β1 and β2 are the regression coefficients, the intercept and slope coefficients of the PRF. The PRF states how the mean response of Y varies with Xi.
Linearity in the parameters

"Linear" regression means linear in the parameters β; the model may or may not be linear in the variable x. All of the following are linear regression models in this sense:

Yi = β1 + β2 Xi + β3 Xi² + εi → quadratic
Yi = β1 + β2 Xi + β3 Xi² + β4 Xi³ + εi → cubic
Yi = β1 + β2 (1/Xi) + εi → reciprocal
Yi = β1 + β2 ln Xi + εi → semilogarithmic
ln Yi = β1 + β2 Xi + εi → inverse semilogarithmic
ln Yi = β1 − β2 (1/Xi) + εi → logarithmic reciprocal
Yi = e^(β1 + β2 Xi + εi) → exponential
Yi = 1 / (1 + e^(β1 + β2 Xi + εi)) → logistic (probability) distribution function

By contrast, a model such as

Yi = β1 + (0.75 − β1) e^(−β2 (Xi − 2)) + εi

is not linear in the parameters.
The Cobb-Douglas (CD) production function illustrates a model that is nonlinear in the variables but can be made linear in the parameters. Depending on how the error enters,

yi = β1 x2i^β2 x3i^β3 e^εi, or ln yi = ln β1 + β2 ln x2i + β3 ln x3i + εi,
yi = β1 x2i^β2 x3i^β3 εi, or ln yi = ln β1 + β2 ln x2i + β3 ln x3i + ln εi.

After taking logarithms the model is linear in the parameters, so it is an intrinsically linear regression model. The CES production function,

yi = A [ δ Ki^(−β) + (1 − δ) Li^(−β) ]^(−1/β), β ≥ −1,

is intrinsically nonlinear: no transformation makes it linear in the parameters, however the error term εi is attached.
The stochastic error term and the PRF

The deviation of an individual Yi from its conditional mean is the stochastic error (disturbance) term:

εi = Yi − E(Y | Xi),

or

Yi = E(Y | Xi) + εi.

Taking conditional expectations of both sides, and noting that the term E(Y | Xi) is nonstochastic,

E(Yi | Xi) = E[E(Y | Xi)] + E(εi | Xi) = E(Y | Xi) + E(εi | Xi).

Since E(Yi | Xi) is the same thing as E(Y | Xi), it follows that

E(εi | Xi) = 0.

Why introduce the error term εi at all? The usual justifications are:
1. Vagueness of theory: the theory rarely tells us every variable that affects Y.
2. Unavailability of data: even known influences may not be measured or measurable.
3. Core variables versus peripheral variables: the joint effect of many minor variables is collected in εi.
4. Intrinsic randomness in human behavior.
5. Poor proxy variables: errors of measurement in the variables actually used.
6. Principle of parsimony (Occam's razor): keep the model as simple as possible and let εi absorb what is left out.
The sample regression function (SRF)

In practice we observe only a sample of (x, y) pairs, not the whole population, so the PRF must be estimated. The sample counterpart of the PRF is the sample regression function (SRF), whose graph is the sample regression line (SRL). Different samples drawn from the same population yield N different SRFs; all N SRFs are estimates of the single fixed PRF. The SRF is written

Ŷi = b1 + b2 Xi,    (2.6.1)

where Ŷi is the estimator of E(Y | Xi) and b1, b2 are the estimators of β1, β2. In its stochastic form,

Yi = b1 + b2 Xi + ei,    (2.6.2)

where ei is the residual, the sample counterpart of εi.
The classical linear regression model (CLRM)

The CLRM rests on 10 assumptions:

1. The regression model is linear in the parameters: Yi = β1 + β2 Xi + εi.
2. The values of x are fixed in repeated sampling; the regressor x is nonstochastic.
3. Zero mean value of the disturbance: E(εi | xi) = 0, so the factors swept into εi do not systematically shift the mean of y.
4. Homoscedasticity: var(εi | xi) = σ², the same for every x; the conditional variance of Y about its mean does not change with X. (If instead var(εi | xi) = σi², the disturbances are heteroscedastic.)
5. No autocorrelation between the disturbances: cov(εi, εj | xi, xj) = 0 for i ≠ j.
6. Zero covariance between εi and Xi: cov(εi, xi) = E(εi xi) = 0.
7. The number of observations n must be greater than the number of parameters to be estimated.
8. Variability in the X values: var(X) = Σ(Xi − X̄)² / (n − 1) must be a finite positive number; the X values in a given sample must not all be the same.
9. The regression model is correctly specified (no specification bias).
10. No perfect multicollinearity among the explanatory variables (relevant in multiple regression).
Estimation of the CLRM

1. Ordinary least squares (OLS)

OLS chooses the SRF coefficients b1, b2 so as to minimize the residual sum of squares

RSS = Σ ei² = Σ (yi − ŷi)² = Σ (yi − b1 − b2 xi)².    (3.1.2)

It is convenient to work with the centered model. With x̄ = (1/n) Σ xi, rewrite the model as

yi = β1* + β2 (xi − x̄) + εi,  where β1* = β1 + β2 x̄.

Setting the partial derivatives of the sum of squares to zero,

∂/∂b1* Σ [yi − b1* − b2 (xi − x̄)]² = 0,
∂/∂b2 Σ [yi − b1* − b2 (xi − x̄)]² = 0,

gives the normal equations

n b1* + b2 Σ (xi − x̄) = Σ yi,
b1* Σ (xi − x̄) + b2 Σ (xi − x̄)² = Σ (xi − x̄) yi.

Because Σ (xi − x̄) = 0, their solution is

b1* = ȳ,  b2 = Sxy / Sxx.

Since β1* = β1 + β2 x̄, the estimators in the original parameterization are

b1 = ȳ − b2 x̄,
b2 = Sxy / Sxx,

where

x̄ = (1/n) Σ xi,  ȳ = (1/n) Σ yi = (1/n) Σ ŷi,

Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xi (yi − ȳ) = Σ (xi − x̄) yi,

Sxx = Σ (xi − x̄)² = Σ xi (xi − x̄) = Σ (xi² − 2 xi x̄ + x̄²) = Σ xi² − n x̄².
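As a numerical check of these formulas (a sketch added here, not part of the original notes; the 10 data points are made up for illustration), the following Python snippet computes Sxy, Sxx and the OLS coefficients b2 = Sxy/Sxx and b1 = ȳ − b2 x̄, and cross-checks them against numpy's polynomial fit.

```python
import numpy as np

# Illustrative data: 10 observations of x and y (hypothetical values)
x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
S_xy = np.sum((x - x_bar) * (y - y_bar))   # S_xy = sum (x_i - xbar)(y_i - ybar)
S_xx = np.sum((x - x_bar) ** 2)            # S_xx = sum (x_i - xbar)^2

b2 = S_xy / S_xx            # slope estimate
b1 = y_bar - b2 * x_bar     # intercept estimate
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")

# Cross-check against numpy's least-squares polynomial fit (degree 1)
b1_np, b2_np = np.polynomial.polynomial.polyfit(x, y, 1)
print(f"polyfit: b1 = {b1_np:.4f}, b2 = {b2_np:.4f}")
```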
2. Properties of the OLS regression line

The fitted OLS line has the following numerical properties:
1. It passes through the sample means (x̄, ȳ).
2. The mean of the fitted values equals the mean of the actual values: ŷ̄ = ȳ.
3. The residuals sum to zero: Σ ei = 0.
4. The residuals are uncorrelated with the fitted values ŷi: Σ ei ŷi = 0.
5. The residuals are uncorrelated with xi: Σ ei xi = 0.
3. Precision of the OLS estimators

Under the CLRM assumptions the variances and standard errors of the OLS estimators are

var(b2) = σ² / Σ(xi − x̄)²,    (3.3.1)

se(b2) = σ / √(Σ(xi − x̄)²),    (3.3.2)

var(b1) = [Σxi² / (n Σ(xi − x̄)²)] σ²,    (3.3.3)

se(b1) = √(Σxi² / (n Σ(xi − x̄)²)) σ,    (3.3.4)

where σ² is the constant (homoscedastic) variance of εi from Assumption 4. Since σ² is unknown, it is estimated from the OLS residuals by

σ̂² = Σ ei² / (n − 2),    (3.3.5)

which is an unbiased estimator of σ²; n − 2 is the number of degrees of freedom (two parameters have been estimated), and σ̂² is also written s². Its positive square root

σ̂ = √(Σ ei² / (n − 2))    (3.3.8)

is the standard error of the regression, the standard deviation of the y values about the fitted line. The precision of b1 and b2 thus depends on the error variance σ² and on the amount of variation in x.
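Continuing the same kind of sketch (data again made up for illustration), this snippet computes the OLS residuals, the unbiased estimate σ̂² = Σei²/(n − 2), and the standard errors in (3.3.1)-(3.3.5).

```python
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b1 = y_bar - b2 * x_bar

e = y - (b1 + b2 * x)                      # OLS residuals
sigma2_hat = np.sum(e ** 2) / (n - 2)      # sigma^2-hat = RSS/(n-2), eq. (3.3.5)

var_b2 = sigma2_hat / S_xx                 # eq. (3.3.1) with sigma^2 replaced by its estimate
se_b2 = np.sqrt(var_b2)                    # eq. (3.3.2)
var_b1 = sigma2_hat * np.sum(x ** 2) / (n * S_xx)   # eq. (3.3.3)
se_b1 = np.sqrt(var_b1)                    # eq. (3.3.4)

print(f"sigma^2-hat = {sigma2_hat:.4f}")
print(f"se(b1) = {se_b1:.4f}, se(b2) = {se_b2:.4f}")
```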
The Gauss-Markov theorem summarizes the optimality of OLS under the CLRM: the OLS estimators are BLUE.

Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE.

In particular, the OLS estimator b2 of β2 is the best linear unbiased estimator (BLUE):
1. It is linear, i.e., a linear function of the observed values of y.
2. It is unbiased: E(b2) = β2.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.
The classical normal linear regression model (CNLRM)

(1) The normality assumption. The OLS estimator b2 is a linear function of the y values, and hence of the disturbances:

b2 = Σ ki yi = Σ ki (β1 + β2 xi + εi),  where ki = (xi − x̄)/Sxx,

so the sampling distribution of b2 depends on what is assumed about the distribution of εi. The CNLRM adds to the CLRM the assumption that the disturbances are normally distributed:

Mean: E(εi) = 0,    (4.2.1)
εi ∼ N(0, σ²).    (4.2.4)

(2) Under the CNLRM the OLS estimators b1, b2 gain properties beyond BLUE; in particular they coincide with the maximum likelihood estimators (see (4) below).

(3) Under the CNLRM the OLS estimators have the following sampling distributions:

1. b1 is normally distributed, b1 ∼ N(β1, σ²b1), with

σ²b1 = var(b1) = [Σxi² / (n Σ(xi − x̄)²)] σ² = (1/n + x̄²/Sxx) σ².    (3.3.3)(4.3.2)

2. b2 is normally distributed, b2 ∼ N(β2, σ²b2), with

σ²b2 = var(b2) = σ² / Σ(xi − x̄)² = σ² / Sxx.    (3.3.1)(4.3.5)

Replacing σ² by its estimator σ̂², the standardized ratios

tb1 = (b1 − β1) / √((1/n + x̄²/Sxx) σ̂²),  tb2 = (b2 − β2) / √(σ̂²/Sxx)

each follow the t distribution with n − 2 degrees of freedom.

3. (n − 2) σ̂²/σ² follows the χ² distribution with n − 2 degrees of freedom.
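A small Monte Carlo sketch can make these sampling-distribution claims concrete. Everything in it is an assumption chosen for illustration: the "true" values β1 = 2, β2 = 0.5, σ = 1 and the fixed grid of x values. It checks that the simulated mean and variance of b2 are close to β2 and σ²/Sxx.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma = 2.0, 0.5, 1.0          # arbitrary "true" values for the simulation
x = np.linspace(1, 20, 20)                   # fixed regressor values (Assumption 2)
S_xx = np.sum((x - x.mean()) ** 2)

b2_draws = []
for _ in range(20000):
    eps = rng.normal(0.0, sigma, size=x.size)    # eps_i ~ N(0, sigma^2)
    y = beta1 + beta2 * x + eps
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    b2_draws.append(b2)

b2_draws = np.array(b2_draws)
print("simulated  mean(b2) =", b2_draws.mean())      # should be close to beta2
print("simulated  var(b2)  =", b2_draws.var())       # should be close to sigma^2 / S_xx
print("theoretical var(b2) =", sigma**2 / S_xx)
```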
(4) Maximum likelihood estimation (MLE)

Under the normality assumption the ML estimators of β1 and β2 are identical to the OLS estimators b1, b2. The estimators of σ² differ: OLS uses σ̂² = Σ ei² / (n − 2), whereas the ML estimator is Σ ei² / n. The ML estimator of σ² is therefore biased downward in small samples, although the bias vanishes as n grows large.
The coefficient of determination: r² (two-variable case) and R² (multiple regression)

(1) How well does the fitted line explain the variation in y? The coefficient of determination provides such a measure of goodness of fit. Define

Total sum of squares (TSS) = Σ (yi − ȳ)²,

Explained sum of squares (ESS) = Σ (ŷi − ȳ)²  (or regression sum of squares, SSReg),

Residual sum of squares (RSS) = Σ (yi − ŷi)².

These satisfy the decomposition

Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)².

To see this, write

yi − ȳ = (ŷi − ȳ) + (yi − ŷi).

If we square both sides and sum, we have

Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)² + 2 Σ (ŷi − ȳ)(yi − ŷi)
            = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²,  ← because Σ ei ŷi = 0 and ȳ Σ ei = 0.

(2) The coefficient of determination r² is the proportion of the total variation in y explained by the regression:

r² = ESS / TSS = 1 − RSS / TSS,  with TSS = Σ (yi − ȳ)².

Two properties follow at once:
1. r² is a nonnegative quantity.
2. 0 ≤ r² ≤ 1.

In the two-variable model r² can be computed as

r² = b2² Σ(xi − x̄)² / Σ(yi − ȳ)² = b2² (Σxi² − n x̄²) / (Σyi² − n ȳ²).    (3.5.6)

Dividing the numerator and the denominator of (3.5.6) by the sample size n (or n − 1 if the sample size is small) gives

r² = b2² Sx² / Sy²,    (3.5.7)

where Sy² and Sx² are the sample variances of y and x, respectively.
(3) The coefficient of correlation r

Closely related to r² is the coefficient of correlation,

r = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² Σ(yi − ȳ)² ).    (3.5.13)

r² can also be computed as the squared coefficient of correlation between the actual yi and the estimated yi, ŷi; in the notation of (3.5.13),

r² = [ Σ(yi − ȳ)(ŷi − ȳ) ]² / [ Σ(yi − ȳ)² Σ(ŷi − ȳ)² ]

(using ŷ̄ = ȳ). In the two-variable case

r = ± √r²,

with the sign of r equal to the sign of the slope estimate b2.
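A short numerical sketch (illustrative data only) of the decomposition TSS = ESS + RSS and of r², also checking that r² equals the squared correlation between x and y in the two-variable case:

```python
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
b2 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b1 = y_bar - b2 * x_bar
y_hat = b1 + b2 * x

TSS = np.sum((y - y_bar) ** 2)
ESS = np.sum((y_hat - y_bar) ** 2)
RSS = np.sum((y - y_hat) ** 2)

print("TSS =", TSS, " ESS + RSS =", ESS + RSS)   # decomposition check
r2 = ESS / TSS
print("r^2 =", r2)

# r as the correlation coefficient (3.5.13); its square equals r^2 above
r = np.corrcoef(x, y)[0, 1]
print("r =", r, " r^2 from correlation =", r ** 2)
```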
Interval estimation

How "close" is the point estimator b2 to the true β2? In interval estimation we look for a δ > 0 and a confidence coefficient 1 − α such that the random interval (b2 − δ, b2 + δ) contains β2 with probability 1 − α:

Pr(b2 − δ ≤ β2 ≤ b2 + δ) = 1 − α.

Under the CLRM alone the sampling distributions of the estimators are not fully specified; under the CNLRM they are. If σ² were known we could use the standard normal variable

Z = (b2 − β2) / σb2 ∼ N(0, 1).

Since σ² is unknown and is replaced by its estimator σ̂², we use instead

t = (b2 − β2) / se(b2) = (b2 − β2) √(Σ(xi − x̄)²) / σ̂,    (5.3.2)

which follows the t distribution with n − 2 df. The t variable can be used to build a confidence interval for β2:

Pr[ −tα/2 ≤ (b2 − β2)/se(b2) ≤ tα/2 ] = 1 − α,

where tα/2 is the critical value of the t distribution with n − 2 df that leaves probability α/2 in each tail. Rearranging,

Pr[ b2 − tα/2 se(b2) ≤ β2 ≤ b2 + tα/2 se(b2) ] = 1 − α.    (5.3.5)

Equation (5.3.5) provides a 100(1 − α)% confidence interval for β2, written compactly as

100(1 − α)% confidence interval for β2: b2 ± tα/2 se(b2).    (5.3.6)

In exactly the same way,

Pr[ b1 − tα/2 se(b1) ≤ β1 ≤ b1 + tα/2 se(b1) ] = 1 − α.    (5.3.7)
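A sketch of the interval computations (made-up data; scipy.stats is used only to obtain the t critical value):

```python
import numpy as np
from scipy import stats

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b1 = y_bar - b2 * x_bar
e = y - (b1 + b2 * x)
sigma2_hat = np.sum(e ** 2) / (n - 2)

se_b2 = np.sqrt(sigma2_hat / S_xx)
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * S_xx))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # t_{alpha/2, n-2}

ci_b2 = (b2 - t_crit * se_b2, b2 + t_crit * se_b2)   # eq. (5.3.6)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # eq. (5.3.7)
print("95% CI for beta2:", ci_b2)
print("95% CI for beta1:", ci_b1)
```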
Confidence interval for σ²

Under the CNLRM,

χ² = (n − 2) σ̂² / σ²    (5.4.1)

follows the χ² distribution with n − 2 df. This statistic can be used to construct a confidence interval for σ²: choose the critical values χ²α/2 and χ²1−α/2 of the χ²(n − 2) distribution so that each tail carries probability α/2 (5.4.2); then, rearranging (5.4.1),

Pr[ (n − 2) σ̂² / χ²α/2 ≤ σ² ≤ (n − 2) σ̂² / χ²1−α/2 ] = 1 − α,    (5.4.3)

which gives the 100(1 − α)% confidence interval for σ².
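A sketch of (5.4.3) in code (same illustrative data; scipy.stats supplies the χ² critical values):

```python
import numpy as np
from scipy import stats

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
b2 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b1 = y_bar - b2 * x_bar
e = y - (b1 + b2 * x)
sigma2_hat = np.sum(e ** 2) / (n - 2)

alpha = 0.05
chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 2)   # chi^2_{alpha/2}: upper-tail critical value
chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 2)       # chi^2_{1-alpha/2}: lower-tail critical value

ci_sigma2 = ((n - 2) * sigma2_hat / chi2_upper,
             (n - 2) * sigma2_hat / chi2_lower)        # eq. (5.4.3)
print("95% CI for sigma^2:", ci_sigma2)
```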
Hypothesis testing

A hypothesis may be simple or composite, and the alternative may be one-sided (>, <) or two-sided (≠). For example,

H0: β2 = 0.3,  H1: β2 ≠ 0.3.

Confidence-interval approach: construct a 100(1 − α)% confidence interval for β2. If the value of β2 under H0 falls within this confidence interval, do not reject H0, but if it falls outside the interval, reject H0.

Test-of-significance approach and the "zero" null hypothesis. The null hypothesis H0: β2 = 0 asks whether y is linearly related to x at all. A convenient shortcut for this case is the "2-t" rule:

2-t Rule of Thumb for a two-tail test: If the number of degrees of freedom is 20 or more and if α is set at 0.05, then the null hypothesis β2 = 0 can be rejected if the t value computed from (5.3.2) exceeds 2 in absolute value.

More generally, compute the t statistic under H0, compare it with the critical value tn−2, α/2 (or tn−2, α for a one-tail test), or report the p-value P of the observed t. Whenever α ≥ P, the observed t lies in the rejection region and H0 is rejected at significance level α.
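A sketch of the test of H0: β2 = 0 by the t statistic and its two-tail p-value (illustrative data; scipy.stats supplies the t tail probability):

```python
import numpy as np
from scipy import stats

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b1 = y_bar - b2 * x_bar
e = y - (b1 + b2 * x)
se_b2 = np.sqrt(np.sum(e ** 2) / (n - 2) / S_xx)

# Test H0: beta2 = 0 against H1: beta2 != 0
t_stat = (b2 - 0.0) / se_b2
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tail p-value

alpha = 0.05
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value <= alpha else "do not reject H0")
# The 2-t rule: with df >= 20 and alpha = 0.05, |t| > 2 would already signal rejection.
```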
Hypotheses about σ² itself can be tested in the same spirit, using the χ² statistic (5.4.1).
Regression analysis and the analysis of variance (ANOVA)

Under the normal distribution assumption in simple regression, we have the following results. SSRes has n − 2 degrees of freedom and SSRes/σ² ∼ χ²n−2; under H0: β2 = 0 (against H1: β2 ≠ 0), SSReg has 1 degree of freedom, SSReg/σ² ∼ χ²1, and the two are independent. The expected sums of squares are

E(SSReg) = σ² + β2² Sxx,
E(SSRes) = σ² (n − 2),
E(SSTotal) = σ² (n − 1) + β2² Sxx.

ANOVA table for the two-variable regression:
---------------------------------------------------------
Source of variation        SS                  df     MSS
Due to regression (ESS)    b2² Σ(xi − x̄)²      1      b2² Σ(xi − x̄)²
Due to residuals (RSS)     Σ ei²               n − 2  Σ ei² / (n − 2) = σ̂²
TSS                        Σ (yi − ȳ)²         n − 1
---------------------------------------------------------
SS means sum of squares; MSS means mean sum of squares, which is obtained by dividing SS by their df.

The F statistic is the ratio of the two mean sums of squares:

F = MSS of ESS / MSS of RSS = b2² Σ(xi − x̄)² / σ̂².    (5.9.1)

Under the CNLRM and the null hypothesis H0: β2 = 0, F follows the F distribution with 1 df in the numerator and n − 2 df in the denominator:

F = (χ²1 / 1) / (χ²n−2 / (n − 2)) ∼ F1,n−2  (under H0).

Decision rule: if F ≥ Fα(1, n − 2), reject H0 at significance level α; a large F value is evidence against H0 in favor of H1. Equivalently, reject H0 whenever α ≥ P, the p-value of the observed F.

The F test and the t test are equivalent for this hypothesis in the two-variable model: under H0: β2 = 0,

t² = b2² Σ(xi − x̄)² / σ̂² = SSReg / σ̂² = F,

so the squared t statistic is exactly the F statistic, and the two tests lead to the same conclusion about H0: β2 = 0.
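A sketch of the ANOVA computations and of the equivalence t² = F for H0: β2 = 0 (illustrative data; scipy.stats supplies the F tail probability):

```python
import numpy as np
from scipy import stats

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b1 = y_bar - b2 * x_bar
y_hat = b1 + b2 * x

SS_reg = np.sum((y_hat - y_bar) ** 2)    # ESS, 1 df
SS_res = np.sum((y - y_hat) ** 2)        # RSS, n-2 df
MS_res = SS_res / (n - 2)                # = sigma^2-hat

F = (SS_reg / 1) / MS_res                # eq. (5.9.1)
p_value = stats.f.sf(F, 1, n - 2)
print(f"F = {F:.3f}, p-value = {p_value:.4f}")

# Equivalence with the t test of H0: beta2 = 0
t_stat = b2 / np.sqrt(MS_res / S_xx)
print(f"t^2 = {t_stat**2:.3f}  (equals F)")
```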
Prediction

1. Mean prediction. Given a value x0 of the regressor, the point predictor of the mean response is

ŷ(x0) = b1 + b2 x0 = estimator of E(y | x0); whereas E(y | x0) = β1 + β2 x0.

Under the CNLRM, ŷ(x0) is normally distributed with mean β1 + β2 x0 and variance

var[ŷ(x0)] = σ² [ 1/n + (x0 − x̄)² / Σ(xi − x̄)² ].    (5.10.2)

Proof: since ŷ(x0) = b1 + b2 x0,

E[ŷ(x0)] = E(b1) + E(b2) x0 = β1 + β2 x0 = E(y | x0),

var[ŷ(x0)] = var(b1) + var(b2) x0² + 2 cov(b1, b2) x0.

Substituting var(b2) = σ² / Σ(xi − x̄)², var(b1) = (1/n + x̄²/Sxx) σ², and cov(b1, b2) = −x̄ var(b2),

var[ŷ(x0)] = σ² [ 1/n + (x0 − x̄)² / Sxx ].

Replacing the unknown σ² by its estimator σ̂², the statistic

t = [ ŷ(x0) − (β1 + β2 x0) ] / se[ŷ(x0)]

follows the t distribution with n − 2 df, where

se[ŷ(x0)] = σ̂ √( 1/n + (x0 − x̄)² / Σ(xi − x̄)² ).

The t distribution then gives a confidence interval for the mean response E(y | x0):

Pr{ b1 + b2 x0 − tα/2 se[ŷ(x0)] ≤ E(y | x0) = β1 + β2 x0 ≤ b1 + b2 x0 + tα/2 se[ŷ(x0)] } = 1 − α.

2. Individual prediction. Because we are predicting an individual y value, we must account for the variability of y about its mean. Since y = ŷ + ε and ŷ is assumed to be independent of ε, we have

var[y(x0)] = var[ŷ(x0)] + σε² = σε² [ 1 + 1/n + (x0 − x̄)² / Sxx ].

So the prediction interval is

y0 = ŷ(x0) ± tα/2, n−2 σ̂ √( 1 + 1/n + (x0 − x̄)² / Sxx ).

[Figure: fitted line ŷ = b0 + b1 x1 (k = 1 regressor) with the narrower confidence interval for the mean response and the wider prediction interval for a single future y (assumed normal) at future data points x0.]

Both intervals are narrowest at x0 = x̄ and widen as x0 moves away from x̄ in either direction.
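A sketch of both intervals at a hypothetical new value x0 (illustrative data; x0 = 150 is an arbitrary choice):

```python
import numpy as np
from scipy import stats

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
b2 = np.sum((x - x_bar) * (y - y_bar)) / S_xx
b1 = y_bar - b2 * x_bar
e = y - (b1 + b2 * x)
sigma_hat = np.sqrt(np.sum(e ** 2) / (n - 2))

x0 = 150.0                                   # hypothetical new x value
y0_hat = b1 + b2 * x0                        # point prediction
t_crit = stats.t.ppf(0.975, df=n - 2)

# Confidence interval for the mean response E(y | x0)
se_mean = sigma_hat * np.sqrt(1 / n + (x0 - x_bar) ** 2 / S_xx)
ci_mean = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)

# Prediction interval for an individual future y at x0 (wider)
se_pred = sigma_hat * np.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / S_xx)
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)

print("point prediction:", y0_hat)
print("95% CI for E(y|x0):", ci_mean)
print("95% prediction interval:", pi)
```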
Extensions: units of measurement, regression through the origin, standardized variables

Changing the units of measurement (scaling) of x and/or y rescales the estimated coefficients and their standard errors in a predictable way, but it does not change the substance of the regression results.

Regression through the origin. Theory sometimes dictates a PRF with no intercept:

yi = β2 xi + ui.    (6.1.1)

The corresponding SRF is

yi = b2 xi + ei,    (6.1.5)

and applying OLS to it gives

b2 = Σ yi xi / Σ xi²,    (6.1.6)

var(b2) = σ² / Σ xi²,    (6.1.7)

where σ² is estimated by

σ̂² = Σ ei² / (n − 1).    (6.1.8)

Difference 1
In the case of regression-through-origin, from the first-order derivatives we obtain only the single normal equation Σ ei xi = 0, so the residuals need not sum to zero. But in the conventional model, i.e., when the intercept term is present in the model, Σ ei = 0 holds as well.

Difference 2
The mean of the fitted values need not equal the mean of the actual values: in general ŷ̄ ≠ ȳ.

Difference 3
It was noted that, for the zero-intercept model, r² can be negative, whereas for the conventional model it always lies between 0 and 1. For the zero-intercept model one therefore reports the raw r²,

raw r² = (Σ xi yi)² / ( Σ xi² Σ yi² ),

which is not directly comparable with the conventional r².

Regression on standardized variables. Define

yi* = (yi − ȳ) / Sy,
xi* = (xi − x̄) / Sx.

If we run the regression yi* = β1 + β2 xi* + εi on the standardized variables, the intercept term is always zero; that is, the above model is a regression through the origin.
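A sketch contrasting regression through the origin with the conventional model (made-up data): it verifies that Σ ei xi = 0 by construction in the zero-intercept fit while Σ ei is not forced to zero, and it computes the raw r².

```python
import numpy as np

# Illustrative data (made up); compare OLS with and without an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2, 6.8, 8.1])

# Regression through the origin: b2 = sum(x*y) / sum(x^2), eq. (6.1.6)
b2_0 = np.sum(x * y) / np.sum(x ** 2)
e_0 = y - b2_0 * x
print("through origin: b2 =", b2_0)
print("  sum(e*x) =", np.sum(e_0 * x))   # zero by construction
print("  sum(e)   =", np.sum(e_0))       # NOT forced to be zero

# Raw r^2 for the zero-intercept model
raw_r2 = np.sum(x * y) ** 2 / (np.sum(x ** 2) * np.sum(y ** 2))
print("  raw r^2 =", raw_r2)

# Conventional model with intercept, for comparison
x_bar, y_bar = x.mean(), y.mean()
b2 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b1 = y_bar - b2 * x_bar
e = y - (b1 + b2 * x)
print("with intercept: b1 =", b1, " b2 =", b2, " sum(e) =", np.sum(e))
```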
Functional forms

Log-log (log-linear) model. Taking logarithms of a multiplicative (constant-elasticity) model gives

ln yi = ln β1 + β2 ln xi + εi = α + β2 ln xi + εi,

in which the slope β2 measures the elasticity of y with respect to x, and this elasticity is constant.

Semilog (growth) model. Starting from compound growth, yt = y0 (1 + r)^t,

ln yt = ln y0 + t ln(1 + r),

which is estimated as

ln yt = β1 + β2 t  (plus an error term),

where β2 = ln(1 + r) measures the constant relative (percentage) growth rate of y per period. The corresponding linear trend model,

yt = β1 + β2 t + εt,

has β2 equal to the constant absolute change in y per period.

Lin-log model.

yi = β1 + β2 ln xi + εi,

where β2 measures the absolute change in y associated with a relative change in x.

a. Reciprocal model

yi = β1 + β2 (1/xi) + ei,

in which y approaches the asymptote β1 as x becomes large.

Log reciprocal model.

ln yi = β1 − β2 (1/xi) + ui.    (6.7.8)

These forms differ in their slope dy/dx and their elasticity (dy/dx)(x/y); the choice among them depends on whether the relationship of interest is in levels, in logs, or in rates of change.
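Finally, a sketch showing that all of these forms are estimated by ordinary OLS after transforming the variables. The data are simulated from an assumed constant-elasticity relationship (elasticity 0.8), so the log-log fit should recover a slope near 0.8; the lin-log and reciprocal fits are included only to illustrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with a constant-elasticity (log-log) relationship: y = 2 * x^0.8 * exp(eps)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * x ** 0.8 * np.exp(rng.normal(0.0, 0.05, size=x.size))

def ols(u, v):
    """Two-variable OLS of v on u; returns (intercept, slope)."""
    slope = np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)
    return v.mean() - slope * u.mean(), slope

# Log-log model: ln y = alpha + beta2 ln x; beta2 estimates the elasticity (here ~0.8)
a_ll, b_ll = ols(np.log(x), np.log(y))
print("log-log:    elasticity estimate =", b_ll)

# Lin-log model: y = beta1 + beta2 ln x
a_lg, b_lg = ols(np.log(x), y)
print("lin-log:    beta2 =", b_lg)

# Reciprocal model: y = beta1 + beta2 (1/x)
a_rc, b_rc = ols(1.0 / x, y)
print("reciprocal: beta1 (asymptote) =", a_rc, " beta2 =", b_rc)
```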