Ch3 Simple Linear Regression

1 The Population Regression Function (PRF)

Simple regression analysis is also called bivariate, or two-variable, regression: it studies the dependence of a variable Y on a single explanatory variable X.

For each given value X_i of the explanatory variable there is a conditional mean of Y, E(Y | X_i), and the locus of these conditional means is the population regression line (PRL). The conditional mean is some function of X_i:

E(Y | X_i) = f(X_i)

This is known as the conditional expectation function (CEF) or population regression function (PRF). If the PRF is assumed to be linear in X_i, it takes the form

E(Y | X_i) = \beta_1 + \beta_2 X_i

where β_1 and β_2 are the regression coefficients of the PRF, known respectively as the intercept and slope coefficients.

2 Linearity in the Variables versus Linearity in the Parameters

Linearity in the variables: the conditional expectation of Y is a linear function of X_i, so the regression curve is a straight line.

Linearity in the parameters: the conditional expectation of Y is linear in the parameters (the β's), even though it may be nonlinear in the variables. Some common examples:

Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \varepsilon_i  → quadratic
Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + \varepsilon_i  → cubic
Y_i = \beta_1 + \beta_2 (1/X_i) + \varepsilon_i  → reciprocal
Y_i = \beta_1 + \beta_2 \ln X_i + \varepsilon_i  → semilogarithmic
\ln Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i  → inverse semilogarithmic
\ln Y_i = \beta_1 - \beta_2 (1/X_i) + \varepsilon_i  → logarithmic reciprocal
\ln Y_i = \ln\beta_1 + \beta_2 \ln X_i + \varepsilon_i  → logarithmic or double logarithmic (let α = ln β_1)

A model that looks nonlinear in the parameters but can be made linear in the parameters by a suitable transformation is an intrinsically linear regression model, e.g.

Y_i = e^{\beta_1 + \beta_2 X_i + \varepsilon_i}  → exponential
Y_i = 1 / (1 + e^{\beta_1 + \beta_2 X_i + \varepsilon_i})  → logistic (probability) distribution function

Both can be linearized in the parameters, so they count as linear regression models (LRM) in the sense used here.

A model that cannot be linearized in the parameters is an intrinsically nonlinear regression model (NLRM), e.g.

Y_i = \beta_1 + (0.75 - \beta_1) e^{-\beta_2 (X_i - 2)} + \varepsilon_i

Cobb-Douglas (CD) examples: with a multiplicative error term,

y_i = \beta_1 x_{i2}^{\beta_2} x_{i3}^{\beta_3} e^{\varepsilon_i},  or  \ln y_i = \ln\beta_1 + \beta_2 \ln x_{i2} + \beta_3 \ln x_{i3} + \varepsilon_i

is an intrinsically linear regression model, and so is

y_i = \beta_1 x_{i2}^{\beta_2} x_{i3}^{\beta_3} \varepsilon_i,  or  \ln y_i = \ln\beta_1 + \beta_2 \ln x_{i2} + \beta_3 \ln x_{i3} + \ln\varepsilon_i

but with an additive error term,

y_i = \beta_1 x_{i2}^{\beta_2} x_{i3}^{\beta_3} + \varepsilon_i

is intrinsically a nonlinear model.

The CES production function

y_i = A [\delta K_i^{-\beta} + (1 - \delta) L_i^{-\beta}]^{-1/\beta}

where A = scale parameter, δ = distribution parameter (0 < δ < 1), and β = substitution parameter (β ≥ −1), is intrinsically nonlinear once an error term ε_i is added.

From now on, "linear regression" means a regression that is linear in the parameters, the β's; it may or may not be linear in the X's.
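As a concrete illustration (not part of the original notes), the following Python sketch simulates data from the exponential model above and estimates it by OLS after the log transformation; the parameter values, sample size, and variable names are assumptions chosen only for the example.

import numpy as np

# Simulate data from the exponential model Y_i = exp(beta1 + beta2*X_i + eps_i)
rng = np.random.default_rng(0)
n = 200
beta1, beta2 = 0.5, 1.2            # assumed "true" parameters for the illustration
x = rng.uniform(1.0, 5.0, size=n)
eps = rng.normal(0.0, 0.1, size=n)
y = np.exp(beta1 + beta2 * x + eps)

# The log transformation linearizes the model in the parameters:
# ln Y_i = beta1 + beta2*X_i + eps_i, which OLS can fit directly.
b2, b1 = np.polyfit(x, np.log(y), deg=1)   # polyfit returns the slope first
print(f"estimated intercept = {b1:.3f}, estimated slope = {b2:.3f}")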

3 The Stochastic Specification of the PRF

The deviation of an individual Y_i from its conditional expectation is

\varepsilon_i = Y_i - E(Y | X_i)
or
Y_i = E(Y | X_i) + \varepsilon_i

where ε_i is an unobservable random variable called the stochastic disturbance or stochastic error term. Taking conditional expectations of both sides,

E(Y_i | X_i) = E[E(Y | X_i)] + E(\varepsilon_i | X_i) = E(Y | X_i) + E(\varepsilon_i | X_i)

Since E(Y_i | X_i) and E(Y | X_i) are the same thing, it follows that

E(\varepsilon_i | X_i) = 0

Why include the disturbance term ε_i at all? The main reasons are:
1 Vagueness of theory
2 Unavailability of data
3 Core variables versus peripheral variables
4 Intrinsic randomness in human behavior
5 Poor proxy variables
6 Principle of parsimony (Occam's razor)

For all these reasons the error term ε_i is an indispensable part of the regression model.

4 The Sample Regression Function (SRF)

In practice we observe only a sample of (x, y) values, not the whole population, so the PRF must be estimated from sample data. Each sample yields a sample regression function (SRF), whose plot is the sample regression line (SRL); with N different samples we would obtain N different SRFs, all of them approximations to the single underlying PRF.

The sample counterpart of the PRF is written as

\hat{Y}_i = b_1 + b_2 X_i        (2.6.1)

where \hat{Y}_i is the estimator of E(Y | X_i), and b_1, b_2 are the estimators of β_1, β_2.

In stochastic form, corresponding to the PRF with its disturbance term, the SRF (2.6.1) becomes

Y_i = b_1 + b_2 X_i + e_i        (2.6.2)

where e_i is the sample residual term; e_i can be regarded as the sample counterpart (an estimate) of ε_i.

2 The Classical Linear Regression Model (CLRM)

The CLRM rests on 10 assumptions:

1 Linear regression model: the model is linear in the parameters.
2 The values of x are fixed in repeated sampling (x is nonstochastic).
3 Zero mean value of the disturbance: E(\varepsilon_i | x_i) = 0. Given the value of x, the disturbances ε_i average out to zero, so they do not systematically affect the mean of Y.
4 Homoscedasticity: var(\varepsilon_i | x_i) = \sigma^2. The conditional variance of Y is the same for every value of X; when this fails we have heteroscedasticity, var(\varepsilon_i | x_i) = \sigma_i^2.
5 No autocorrelation between the disturbances: given any two X values, the correlation between the corresponding disturbances is zero,
cov(\varepsilon_i, \varepsilon_j | x_i, x_j) = 0,  i \neq j
6 Zero covariance between ε_i and X_i: cov(\varepsilon_i, x_i) = E(\varepsilon_i x_i) = 0.
7 The number of observations n must be greater than the number of parameters to be estimated.
8 Variability in the X values: the variance of X,
var(X) = \frac{\sum (X_i - \bar{X})^2}{n - 1},
must be a finite positive number.
9 The regression model is correctly specified.
10 There is no perfect multicollinearity among the explanatory variables (relevant for multiple regression).
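A small simulation sketch (added for illustration; all numbers and names are assumed) showing how one might generate data consistent with these assumptions, with fixed x values and homoscedastic, uncorrelated, zero-mean disturbances:

import numpy as np

# Generate a sample that satisfies the classical assumptions:
# fixed x values with variability, errors with zero mean and constant
# variance sigma^2, no autocorrelation, and more observations than parameters.
rng = np.random.default_rng(1)
n = 50
beta1, beta2, sigma = 2.0, 0.8, 1.5        # assumed population values
x = np.linspace(1, 10, n)                  # x fixed (nonstochastic)
eps = rng.normal(0.0, sigma, size=n)       # homoscedastic, independent disturbances
y = beta1 + beta2 * x + eps                # linear in the parameters
print("sample mean of disturbances:", eps.mean())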

3 Estimation of the CLRM

The parameters of the CLRM can be estimated by ordinary least squares (OLS) or by maximum likelihood estimation (MLE); under the classical assumptions the two methods yield the same estimators of β_1 and β_2.

1 Ordinary Least Squares (OLS)

OLS chooses the coefficients of the SRF so as to minimize the residual sum of squares:

RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_1 - b_2 x_i)^2        (3.1.2)

The OLS minimization is most easily carried out in the centered model. Let

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

and rewrite the model as

y_i = \beta_1^{*} + \beta_2 (x_i - \bar{x}) + \varepsilon_i,  where  \beta_1^{*} = \beta_1 + \beta_2 \bar{x}

Setting the partial derivatives of the residual sum of squares to zero,

\frac{\partial}{\partial b_1^{*}} \sum_{i=1}^{n} \left[ y_i - b_1^{*} - b_2 (x_i - \bar{x}) \right]^2 = 0
\frac{\partial}{\partial b_2} \sum_{i=1}^{n} \left[ y_i - b_1^{*} - b_2 (x_i - \bar{x}) \right]^2 = 0

gives the normal equations

n b_1^{*} + b_2 \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} y_i
b_1^{*} \sum_{i=1}^{n} (x_i - \bar{x}) + b_2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \bar{x}) y_i

whose solutions are

b_1^{*} = \bar{y},   b_2 = S_{xy} / S_{xx}

Since \beta_1^{*} = \beta_1 + \beta_2 \bar{x}, the estimators of the original intercept and slope are

b_1 = \bar{y} - b_2 \bar{x}
b_2 = \frac{S_{xy}}{S_{xx}}

where

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,   \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{x} + \bar{x}^2) = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2

In terms of raw sums,

b_1 = \bar{y} - b_2 \bar{x} = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum y_i x_i}{n \sum x_i^2 - (\sum x_i)^2}

b_2 = \frac{S_{xy}}{S_{xx}} = \frac{n \sum y_i x_i - \sum y_i \sum x_i}{n \sum x_i^2 - (\sum x_i)^2}
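The formulas above translate directly into code. A minimal Python sketch (the data values are assumed, purely for illustration):

import numpy as np

def ols_slope_intercept(x, y):
    """OLS estimates b1 (intercept) and b2 (slope) via S_xy / S_xx."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xy = np.sum((x - x_bar) * (y - y_bar))   # S_xy
    s_xx = np.sum((x - x_bar) ** 2)            # S_xx
    b2 = s_xy / s_xx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Illustrative data (assumed):
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b1, b2 = ols_slope_intercept(x, y)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")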

2 Properties of the OLS regression line:

1 It passes through the sample means of x and y.
2 The mean of the fitted values equals the mean of the actual values: \bar{\hat{y}} = \bar{y}.
3 The mean of the residuals is zero.
4 The residuals are uncorrelated with the fitted values: \sum e_i \hat{y}_i = 0.
5 The residuals are uncorrelated with x_i: \sum e_i x_i = 0.

3 Precision or Standard Errors of the OLS Estimators

Under the CLRM assumptions, the variances and standard errors of the OLS estimators are

var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}        (3.3.1)

se(b_2) = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}        (3.3.2)

var(b_1) = \frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2} \sigma^2        (3.3.3)

se(b_1) = \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} \, \sigma        (3.3.4)

Here σ² is the (unknown) variance of the disturbance ε_i from Assumption 4. It is estimated from the residuals by

\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2}        (3.3.5)

which is an unbiased estimator of σ² in the OLS setting; n − 2 is the number of degrees of freedom, and this estimator is sometimes denoted s². Its square root,

\hat{\sigma} = \sqrt{\frac{\sum e_i^2}{n - 2}}        (3.3.8)

is the standard error of estimate (the standard error of the regression), a measure of the dispersion of the y values about the regression line.

The covariance between b_1 and b_2 is

cov(b_1, b_2) = -\bar{x} \, var(b_2)        (3.3.9)

Under the CLRM assumptions, the OLS estimators possess an important optimality property, summarized in the Gauss-Markov theorem.

Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance, that is, they are BLUE.

That is, the OLS estimator b_2 of β_2 (and likewise b_1) is the best linear unbiased estimator (BLUE):
1. It is linear.
2. It is unbiased.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.
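Continuing the sketch, the estimated disturbance variance, standard errors, and covariance in (3.3.1)-(3.3.9) can be computed as follows (data values again assumed for illustration):

import numpy as np

# Illustrative data (assumed)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)
x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

b2 = np.sum((x - x_bar) * (y - y.mean())) / s_xx
b1 = y.mean() - b2 * x_bar
resid = y - (b1 + b2 * x)

sigma2_hat = np.sum(resid ** 2) / (n - 2)                  # (3.3.5)
se_b2 = np.sqrt(sigma2_hat / s_xx)                         # estimated se, cf. (3.3.2)
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * s_xx))  # estimated se, cf. (3.3.4)
cov_b1_b2 = -x_bar * sigma2_hat / s_xx                     # estimated cov, cf. (3.3.9)
print(sigma2_hat, se_b1, se_b2, cov_b1_b2)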

4 The Classical Normal Linear Regression Model (CNLRM)

1 The normality assumption for ε_i

The OLS estimator b_2 is a linear function of the y_i, and hence of the disturbances:

b_2 = \sum k_i y_i = \sum k_i (\beta_1 + \beta_2 x_i + \varepsilon_i)

so the sampling distribution of b_2 depends on the distribution assumed for ε_i. To use the OLS estimators for interval estimation and hypothesis testing, we must specify the probability distribution of ε_i.

Adding the assumption that the ε_i are normally distributed to the CLRM gives the classical normal linear regression model (CNLRM). The CNLRM assumes that each ε_i satisfies

Mean: E(\varepsilon_i) = 0        (4.2.1)
Variance: var(\varepsilon_i) = \sigma^2        (4.2.2)
Covariance: cov(\varepsilon_i, \varepsilon_j) = 0,  i \neq j        (4.2.3)

or, compactly,

\varepsilon_i \sim N(0, \sigma^2)        (4.2.4)

2 OLS estimators under the CNLRM

Under the CNLRM the OLS estimators b_1, b_2 are not only BLUE but are best unbiased estimators (BUE): they have minimum variance in the entire class of unbiased estimators, linear or not.

3 Sampling distributions of the OLS estimators under the CNLRM

1. b_1 is normally distributed: b_1 \sim N(\beta_1, \sigma_{b_1}^2), with

\sigma_{b_1}^2 = var(b_1) = \frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2} \sigma^2 = \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \sigma^2        (3.3.3)(4.3.2)

2. b_2 is normally distributed: b_2 \sim N(\beta_2, \sigma_{b_2}^2), with

\sigma_{b_2}^2 = var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} = \frac{\sigma^2}{S_{xx}}        (3.3.1)(4.3.5)

When the unknown σ² is replaced by its estimator \hat{\sigma}^2, the standardized statistics

t_{b_1} = \frac{b_1 - \beta_1}{\sqrt{\left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \hat{\sigma}^2}}, \qquad t_{b_2} = \frac{b_2 - \beta_2}{\sqrt{\hat{\sigma}^2 / S_{xx}}}

follow the t distribution with n − 2 degrees of freedom.

3. (n - 2) \frac{\hat{\sigma}^2}{\sigma^2} follows the \chi^2 distribution with (n − 2) degrees of freedom.

4 Maximum Likelihood Estimation (MLE)

Under the normality assumption, the maximum likelihood (ML) estimators of β_1 and β_2 are identical to the OLS estimators. The two methods differ only in the estimator of σ²: OLS uses \hat{\sigma}^2 = \sum e_i^2 / (n - 2), whereas the MLE is \sum e_i^2 / n. The ML estimator of σ² is therefore biased in small samples, although the bias disappears as n grows large.

Goodness of fit. The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary measure of how well the sample regression line fits the data.

1 Decomposition of the variation in y

The total variation of the observed y values about their sample mean can be split into a part explained by the regression and an unexplained (residual) part:

Total sum of squares (TSS) = \sum_{i=1}^{n} (y_i - \bar{y})^2

Explained sum of squares (ESS) = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2   (or Regression sum of squares, SSReg)

Residual sum of squares (RSS) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

TSS = ESS + RSS        (SS_{Total} = SS_{Reg} + SS_{Res})

To see this, write

y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)

If we square both sides and sum, we have

\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

because \sum e_i \hat{y}_i = 0 and \bar{y} \sum e_i = 0, so the cross-product term vanishes.

2 The coefficient of determination r²

r^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

r² measures the proportion of the total variation in y explained by the regression model. Two properties follow directly:

1 r² is a nonnegative quantity.
2 0 ≤ r² ≤ 1.

r² can also be computed from the slope estimate:

r^2 = b_2^2 \left[ \frac{\sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2} \right] = b_2^2 \left[ \frac{\sum x_i^2 - n\bar{x}^2}{\sum y_i^2 - n\bar{y}^2} \right]        (3.5.6)

Dividing the numerator and the denominator of (3.5.6) by the sample size n (or n − 1 if the sample size is small), we obtain

r^2 = b_2^2 \left( \frac{S_x^2}{S_y^2} \right)        (3.5.7)

where S_y^2 and S_x^2 are the sample variances of y and x, respectively.

3 The coefficient of correlation r

Closely related to r² is the (sample) coefficient of correlation, r, a measure of the degree of linear association between x and y:

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}        (3.5.13)

In the two-variable case, r² can equivalently be computed as the squared correlation coefficient between the actual y_i and the estimated y_i, \hat{y}_i; that is, (3.5.13) implies

r^2 = \frac{\left[ \sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}}) \right]^2}{\sum (y_i - \bar{y})^2 \sum (\hat{y}_i - \bar{\hat{y}})^2}

and

r = \pm \sqrt{r^2}

where the sign of r is the sign of the slope coefficient b_2.
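A short Python sketch of the decomposition and of r² and r (illustrative data assumed):

import numpy as np

# Illustrative data (assumed); decompose TSS = ESS + RSS and compute r^2 and r
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained (regression) sum of squares
rss = np.sum((y - y_hat) ** 2)           # residual sum of squares

r2 = ess / tss                           # equals 1 - rss/tss
r = np.sign(b2) * np.sqrt(r2)            # sample correlation coefficient
print(f"TSS={tss:.3f}  ESS+RSS={ess + rss:.3f}  r^2={r2:.4f}  r={r:.4f}")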

Interval estimation. How "close" is the estimator b_2 to the true β_2? Instead of relying on the point estimate alone, we construct an interval around it: choose a half-width δ > 0 and a level α such that the random interval (b_2 − δ, b_2 + δ) contains the true β_2 with probability 1 − α,

Pr(b_2 - \delta \le \beta_2 \le b_2 + \delta) = 1 - \alpha

Here 1 − α is the confidence coefficient and α (0 < α < 1) is the level of significance.

The CLRM alone does not provide the sampling distributions needed to build such intervals; the normality assumption of the CNLRM supplies them. Under the CNLRM,

Z = \frac{b_2 - \beta_2}{\sigma_{b_2}} \sim N(0, 1)

Using Z for inference about β_2 requires knowing σ², which we do not. Replacing σ² with its estimator \hat{\sigma}^2 gives the t statistic

t = \frac{b_2 - \beta_2}{se(b_2)} = \frac{(b_2 - \beta_2)\sqrt{\sum (x_i - \bar{x})^2}}{\hat{\sigma}}

where se(b_2) is the estimated standard error; this statistic follows the t distribution with n − 2 degrees of freedom (df).

The t distribution can therefore be used to build a confidence interval for β_2:

Pr\left( -t_{\alpha/2} \le t = \frac{b_2 - \beta_2}{se(b_2)} \le t_{\alpha/2} \right) = 1 - \alpha

where t_{\alpha/2} is the critical value of the t distribution with n − 2 df that cuts off probability α/2 in each tail. Rearranging,

Pr\left( b_2 - t_{\alpha/2}\, se(b_2) \le \beta_2 \le b_2 + t_{\alpha/2}\, se(b_2) \right) = 1 - \alpha        (5.3.5)

Equation (5.3.5) provides a 100(1 − α)% confidence interval for β_2:

100(1 - \alpha)\% confidence interval for \beta_2:  b_2 \pm t_{\alpha/2}\, se(b_2)        (5.3.6)

Similarly, for the intercept,

Pr\left( b_1 - t_{\alpha/2}\, se(b_1) \le \beta_1 \le b_1 + t_{\alpha/2}\, se(b_1) \right) = 1 - \alpha        (5.3.7)

100(1 - \alpha)\% confidence interval for \beta_1:  b_1 \pm t_{\alpha/2}\, se(b_1)        (5.3.8)
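A minimal Python sketch of the confidence intervals (5.3.6) and (5.3.8), using scipy's t quantiles; the data are assumed for illustration:

import numpy as np
from scipy import stats

# Illustrative data (assumed); 95% confidence intervals for beta1 and beta2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)
s_xx = np.sum((x - x.mean()) ** 2)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b1 = y.mean() - b2 * x.mean()
resid = y - (b1 + b2 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)

se_b2 = np.sqrt(sigma2_hat / s_xx)
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * s_xx))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)        # t_{alpha/2, n-2}
print("95% CI for beta2:", (b2 - t_crit * se_b2, b2 + t_crit * se_b2))
print("95% CI for beta1:", (b1 - t_crit * se_b1, b1 + t_crit * se_b1))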

Confidence interval for σ². Under the CNLRM,

\chi^2 = (n - 2)\frac{\hat{\sigma}^2}{\sigma^2}        (5.4.1)

follows the χ² distribution with n − 2 df, which allows us to construct an interval for σ². Since

Pr\left( \chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2} \right) = 1 - \alpha        (5.4.2)

substituting (5.4.1) into (5.4.2) and rearranging gives

Pr\left[ (n - 2)\frac{\hat{\sigma}^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le (n - 2)\frac{\hat{\sigma}^2}{\chi^2_{1-\alpha/2}} \right] = 1 - \alpha        (5.4.3)

which gives the 100(1 − α)% confidence interval for σ².
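A sketch of the interval (5.4.3) in Python (the values of n and σ̂² are assumed for illustration):

import numpy as np
from scipy import stats

# Illustrative values (assumed): sigma2_hat from a fitted regression with n observations
n = 20
sigma2_hat = 1.37
alpha = 0.05

chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 2)   # chi^2_{alpha/2}: upper-tail critical value
chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 2)       # chi^2_{1-alpha/2}: lower-tail critical value
ci_sigma2 = ((n - 2) * sigma2_hat / chi2_upper, (n - 2) * sigma2_hat / chi2_lower)
print("95% CI for sigma^2:", ci_sigma2)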

4 Hypothesis Testing

A hypothesis may be simple or composite, and the alternative may be one-sided (>, <) or two-sided (≠). Two complementary approaches are used: the confidence interval approach and the test of significance approach.

Confidence interval approach. Consider, for example,

H_0: \beta_2 = 0.3
H_1: \beta_2 \neq 0.3

Decision Rule: Construct a 100(1 − α)% confidence interval for β_2. If the value of β_2 under H_0 falls within this confidence interval, do not reject H_0; but if it falls outside this interval, reject H_0.

6 The "Zero" Null Hypothesis, the 2-t Rule of Thumb, and the p Value

The null hypothesis H_0: β_2 = 0 asserts that y is not linearly related to x, i.e. that x has no explanatory power.

2-t Rule of Thumb for a two-tail test: If the number of degrees of freedom is 20 or more and if α is set at 0.05, then the null hypothesis β_2 = 0 can be rejected if the t value computed from (5.3.2) exceeds 2 in absolute value.

Instead of fixing α in advance, one can report the p value of the t statistic: the exact probability, under H_0, of obtaining a t value at least as extreme as the one computed (from the t distribution with n − 2 df, critical value t_{n-2, \alpha/2}). The decision rule is: if α ≥ p, reject H_0.
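A small Python sketch of the two-tail t test and its p value (the values of b_2, se(b_2), and n are assumed for illustration):

import numpy as np
from scipy import stats

# Illustrative values (assumed): test H0: beta2 = 0 against H1: beta2 != 0
b2, se_b2, n = 0.97, 0.12, 25
t_stat = (b2 - 0.0) / se_b2                        # t value under H0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-sided p value
alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject H0 at 5%: {alpha >= p_value}")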

The χ² distribution also underlies the analysis-of-variance results that follow.

7 Analysis of Variance (ANOVA)

Under the normal distribution assumption in simple regression, we have the following results:
---------------------------------------------------------
SS          df      Distribution
SS_Reg      1       \sigma^2 \chi^2_1   (under H_0)
SS_Res      n−2     \sigma^2 \chi^2_{n-2}
SS_Total    n−1     \sigma^2 \chi^2_{n-1}   (under H_0)
----------------------------------------------------------

where the hypotheses are

H_0: \beta_2 = 0;  H_1: \beta_2 \neq 0

The expected values of the sums of squares are

E(SS_{Reg}) = \sigma^2 + \beta_2^2 S_{xx}
E(SS_{Res}) = \sigma^2 (n - 2)
E(SS_{Total}) = \sigma^2 (n - 1) + \beta_2^2 S_{xx}

ANOVA table for the two-variable regression model, noting that ESS = \sum (\hat{y}_i - \bar{y})^2 = b_2^2 \sum (x_i - \bar{x})^2:

Source of variation        SS                               df     MSS                                    F
Due to regression (ESS)    b_2^2 \sum (x_i - \bar{x})^2     1      b_2^2 \sum (x_i - \bar{x})^2 / 1       F = b_2^2 \sum (x_i - \bar{x})^2 / \hat{\sigma}^2
Due to residuals (RSS)     \sum e_i^2                       n−2    \sum e_i^2 / (n - 2) = \hat{\sigma}^2
TSS                        \sum (y_i - \bar{y})^2           n−1

SS means sum of squares; MSS means mean sum of squares, which is obtained by dividing each SS by its df.

F = \frac{MSS of ESS}{MSS of RSS} = \frac{b_2^2 \sum (x_i - \bar{x})^2}{\hat{\sigma}^2}        (5.9.1)

Under the CNLRM and the null hypothesis H_0: β_2 = 0, this F statistic follows the F distribution with 1 df in the numerator and n − 2 df in the denominator:

F \sim \frac{\chi^2_1 / 1}{\chi^2_{n-2} / (n - 2)} \sim F_{1, n-2}   (under H_0)

1 Decision rule: if F ≥ F_α(1, n − 2), reject H_0 at the significance level α; the evidence then favors H_1. (The F test is a right-tail test.)

2 Significance probability (p value): ANOVA output reports the p value of the F statistic; if α ≥ p, reject H_0.

Relationship between the F and t tests. Under H_0: β_2 = 0,

t^2 = \frac{b_2^2 \sum (x_i - \bar{x})^2}{\hat{\sigma}^2} = \frac{SS_{Reg}}{\hat{\sigma}^2} = F

so the square of the t statistic equals the F statistic, and the t test and the F test of H_0: β_2 = 0 are equivalent.
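The following Python sketch builds the ANOVA quantities, the F statistic of (5.9.1) and its p value, and verifies numerically that t² = F (illustrative data assumed):

import numpy as np
from scipy import stats

# Illustrative data (assumed)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.8, 4.1, 4.9, 5.8, 6.2, 7.4, 8.1])
n = len(x)
s_xx = np.sum((x - x.mean()) ** 2)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b1 = y.mean() - b2 * x.mean()
resid = y - (b1 + b2 * x)

ss_reg = b2 ** 2 * s_xx                 # ESS, 1 df
ss_res = np.sum(resid ** 2)             # RSS, n-2 df
sigma2_hat = ss_res / (n - 2)           # MSS of RSS

F = ss_reg / sigma2_hat                 # (5.9.1)
p_value = stats.f.sf(F, 1, n - 2)       # upper-tail probability of F(1, n-2)
t = b2 / np.sqrt(sigma2_hat / s_xx)     # t statistic for H0: beta2 = 0
print(f"F = {F:.3f}, t^2 = {t**2:.3f}, p = {p_value:.4f}")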

8 Prediction

Two kinds of prediction are of interest: 1) mean prediction and 2) individual prediction. The interval for the conditional mean is a confidence interval; the interval for an individual value is a prediction interval.

1 Mean prediction

The point predictor of the conditional mean at x = x_0 is

\hat{y}(x_0) = b_1 + b_2 x_0 = estimator of E(y | x_0);  whereas  E(y | x_0) = \beta_1 + \beta_2 x_0

Under the CNLRM, \hat{y}(x_0) is normally distributed with mean β_1 + β_2 x_0 and variance

var[\hat{y}(x_0)] = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]        (5.10.2)

To see this, note that \hat{y}(x_0) = b_1 + b_2 x_0, so

E[\hat{y}(x_0)] = E(b_1) + E(b_2) x_0 = \beta_1 + \beta_2 x_0 = E(y | x_0)

and, using var(a + b) = var(a) + var(b) + 2 cov(a, b),

var[\hat{y}(x_0)] = var(b_1) + var(b_2) x_0^2 + 2 cov(b_1, b_2) x_0

Substituting var(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2},  var(b_1) = \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)\sigma^2,  and  cov(b_1, b_2) = -\bar{x}\, var(b_2)  gives

var[\hat{y}(x_0)] = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]

Replacing σ² by its estimator \hat{\sigma}^2, the statistic

t = \frac{\hat{y}(x_0) - (\beta_1 + \beta_2 x_0)}{se[\hat{y}(x_0)]}

follows the t distribution with n − 2 df, where

se[\hat{y}(x_0)] = \hat{\sigma} \sqrt{ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} }

The t distribution therefore yields a confidence interval for E(y | x_0):

Pr\left[ b_1 + b_2 x_0 - t_{\alpha/2}\, se[\hat{y}(x_0)] \le E(y | x_0) = \beta_1 + \beta_2 x_0 \le b_1 + b_2 x_0 + t_{\alpha/2}\, se[\hat{y}(x_0)] \right] = 1 - \alpha

2 Individual prediction

Because we are predicting an individual y value, we must account for the variability of y about its mean. Since y = \hat{y} + \varepsilon and \hat{y} is assumed to be independent of ε, we have

var[y(x_0)] = \sigma^2_{\hat{y}(x_0)} + \sigma^2_{\varepsilon} = \sigma^2_{\varepsilon} \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]

So the prediction interval is

\hat{y}_0 = \hat{y}(x_0) \pm t_{\alpha/2,\, n-2}\, \hat{\sigma} \sqrt{ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} }
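A Python sketch of both intervals at a chosen x_0 (data and x_0 assumed for illustration):

import numpy as np
from scipy import stats

# Illustrative data (assumed); interval for the mean E(y|x0) and for a single y at x0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)
x0 = 4.5
s_xx = np.sum((x - x.mean()) ** 2)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b1 = y.mean() - b2 * x.mean()
sigma_hat = np.sqrt(np.sum((y - (b1 + b2 * x)) ** 2) / (n - 2))

y0_hat = b1 + b2 * x0
t_crit = stats.t.ppf(0.975, df=n - 2)

se_mean = sigma_hat * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / s_xx)       # mean prediction
se_indiv = sigma_hat * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / s_xx)  # individual prediction
print("95% CI for E(y|x0):", (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean))
print("95% prediction interval:", (y0_hat - t_crit * se_indiv, y0_hat + t_crit * se_indiv))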

[Figure: fitted line \hat{y} = b_0 + b_1 x_1 (k = 1) with two bands around it: the narrower confidence interval for the mean response and the wider prediction interval for a single future y (y ~ Normal). Both bands are narrowest at x = \bar{x} and widen as x_0 moves away from \bar{x} toward future data points.]

9 Extensions of the CLRM: Units of Measurement, Regression through the Origin, and Functional Forms

Changing the units of measurement of x and y rescales the estimated coefficients and their standard errors in a predictable way, but it does not alter the substance of the results (t ratios and r² are unchanged).

Regression through the origin. When theory dictates a zero intercept, the PRF is

y_i = \beta_2 x_i + u_i        (6.1.1)

and the SRF is

y_i = b_2 x_i + e_i        (6.1.5)

Applying OLS to this model gives

b_2 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}        (6.1.6)

var(b_2) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}        (6.1.7)

where σ² is estimated by

\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 1}        (6.1.8)

Difference 1
In the case of regression through the origin, the first-order conditions yield only \sum e_i x_i = 0; we cannot also obtain \sum e_i = 0. In the conventional model, i.e., when the intercept term is present in the model, both \sum e_i x_i = 0 and \sum e_i = 0 hold.

Difference 2
Since \sum e_i \neq 0 (so \bar{e} \neq 0), it follows that the mean of the fitted values need not equal the mean of the actual values: \bar{\hat{y}} \neq \bar{y}. For the intercept-present model, \bar{\hat{y}} = \bar{y}.

Difference 3
For the zero-intercept model the conventionally computed r² can be negative, whereas for the conventional (intercept-present) model it can never be negative. Therefore, the conventionally computed r² may not be appropriate for regression-through-the-origin models; the raw r² is used instead:

raw r^2 = \frac{\left( \sum x_i y_i \right)^2}{\sum x_i^2 \sum y_i^2}
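A Python sketch of regression through the origin, (6.1.6)-(6.1.8), and the raw r² (illustrative data assumed):

import numpy as np

# Illustrative data (assumed); regression through the origin and the raw r^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 2.8, 4.1, 4.9])
n = len(x)

b2 = np.sum(x * y) / np.sum(x ** 2)          # (6.1.6)
resid = y - b2 * x
sigma2_hat = np.sum(resid ** 2) / (n - 1)    # (6.1.8): n-1 df, one parameter estimated
var_b2 = sigma2_hat / np.sum(x ** 2)         # estimated version of (6.1.7)

raw_r2 = np.sum(x * y) ** 2 / (np.sum(x ** 2) * np.sum(y ** 2))
print(f"b2 = {b2:.4f}, var(b2) = {var_b2:.5f}, raw r^2 = {raw_r2:.4f}")
print("sum of residuals (need not be zero):", resid.sum())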

Standardized variables. Define

y_i^{*} = \frac{y_i - \bar{y}}{S_y}, \qquad x_i^{*} = \frac{x_i - \bar{x}}{S_x}

Now, instead of running the standard regression

y_i = \beta_1 + \beta_2 x_i + \varepsilon_i

we could run the regression on the standardized variables as

y_i^{*} = \beta_1^{*} + \beta_2^{*} x_i^{*} + \varepsilon_i^{*} = \beta_2^{*} x_i^{*} + \varepsilon_i^{*}

since it is easy to show that, in the regression involving standardized variables, the intercept term is always zero; that is, the above model is a regression through the origin.

Functional forms of regression models:

1> The log-linear model: how to measure elasticity

\ln y_i = \ln \beta_1 + \beta_2 \ln x_i + \varepsilon_i = \alpha + \beta_2 \ln x_i + \varepsilon_i

Here β_2 measures the elasticity of y with respect to x.

2> Semilog models: log-lin and lin-log models

a. The log-lin model: how to measure the growth rate or semielasticity. From the compound growth relation,

\ln y_t = \ln y_0 + t \ln(1 + r)

which is estimated as

\ln y_t = \beta_1 + \beta_2 t

so that β_2 = ln(1 + r) measures the growth rate of y.

b. Linear trend model

y_t = \beta_1 + \beta_2 t + \varepsilon_t

c. The lin-log model

y_i = \beta_1 + \beta_2 \ln x_i + \varepsilon_i

3> Reciprocal models

a. Reciprocal model

y_i = \beta_1 + \beta_2 \left( \frac{1}{x_i} \right) + e_i

b. Log reciprocal or log hyperbola model

\ln y_i = \beta_1 - \beta_2 \left( \frac{1}{x_i} \right) + u_i        (6.7.8)

For each functional form one must distinguish between the slope, dy/dx, and the elasticity, (dy/dx)(x/y): the interpretation of β_2 depends on which form is used.
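For example, a log-lin model can be used to estimate a growth rate; in the Python sketch below the series, its true growth rate, and the variable names are assumptions for illustration only:

import numpy as np

# Illustrative time series (assumed): estimate a growth rate from a log-lin model
t = np.arange(1, 21)                              # time index
y = 100 * (1.035 ** t) * np.exp(np.random.default_rng(2).normal(0, 0.01, 20))

b2, b1 = np.polyfit(t, np.log(y), deg=1)          # fits ln y_t = b1 + b2 * t
growth_rate = np.exp(b2) - 1                      # since b2 estimates ln(1 + r)
print(f"b2 = {b2:.4f}, implied compound growth rate r = {growth_rate:.4%}")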
