
Lecture 2:

The Linear Regression Model

Đinh Thi Thanh Binh, PhD


Faculty of International Economics, FTU

1. Introduction to regression model

• The term "regression" means "regression to mediocrity".

• It was coined by Galton (1886) when he studied the relationship between the heights of sons and the heights of their fathers.
Distribution of sons' heights conditional on the heights of their fathers
The study shows that:

• Given the height of the father, the heights of sons are distributed around a mean value.
• On average, as the height of the father increases, the height of the son also increases.
• If we connect all the mean points, we obtain a straight line.
• This line is called the regression line; it shows the average relationship between the heights of sons and the heights of fathers.
2. Population Regression Function (PRF) and Sample Regression Function (SRF)
2.1. Definition of PRF
The PRF is a regression function constructed from data on the entire population.

For example, Galton studied the relationship between the heights of fathers and the heights of sons in one city. He collected data on all fathers with adult sons, so he could build the PRF.
So E(Y|Xi) is a function of the independent variable Xi:

E(Y|Xi) = f(Xi) = β0 + β1Xi    [1]

• Equation [1] is called the population regression function (PRF).

– The PRF shows how the expected value of Y changes at different values of X.
– If the PRF has one independent variable, it is a simple regression function.
– If the PRF has two or more independent variables, it is a multiple regression function.
• Suppose that the PRF E(Y|Xi) is a linear function. Then:

E(Y|Xi) = β0 + β1Xi    [2]

– β0, β1: regression coefficients/parameters
• β0: intercept coefficient
• β1: slope coefficient

• Equation [2] is a simple regression function.
2.2. Error/disturbance term
• Because E(Y|Xi) is the expected value of Y given Xi, individual values Yi are not necessarily equal to E(Y|Xi); rather, they are scattered around E(Y|Xi).

• Denote by ui the difference between Yi and E(Y|Xi). Then:

ui = Yi − E(Y|Xi)    [3]

or: Yi = E(Y|Xi) + ui    [4]

⇒ ui is a random variable, called the error or disturbance term.
2.3. Sample Regression Function (SRF)
• In reality, we usually cannot survey the entire population ⇒ we cannot build the PRF.
• We can only estimate the expected value of Y, i.e., estimate the PRF, based on sample(s) taken from the population.
• Obviously, the estimated function cannot be perfectly exact.

⇒ The regression function constructed from a sample is called the Sample Regression Function (SRF).
Graph 2.03. Scatter plot and regression lines of two samples, SRF1 and SRF2
• From the population, we can draw many samples. Each sample gives a different SRF.

• To obtain the "best" SRF, i.e., the SRF that is the closest estimate of the PRF, we need some criteria, even though we do not observe the PRF to compare against.
2.3. Sample Regression Function (SRF) – Simple

PRF:
E(Y|Xi) = β0 + β1Xi
Yi = E(Y|Xi) + ui = β0 + β1Xi + ui

SRF:
Ŷi = β̂0 + β̂1Xi
Yi = Ŷi + ûi = β̂0 + β̂1Xi + ûi

• Ŷi is an estimate of E(Y|Xi); it is the fitted/predicted value of Y.
• β̂0, β̂1 are estimates of β0, β1.
• ûi is an estimate of ui and is called the residual.
2.3. Sample Regression Function (SRF) – Multiple

• PRF:
E(Y|X1, ..., Xk) = β0 + β1X1 + β2X2 + ... + βkXk
Y = β0 + β1X1 + β2X2 + ... + βkXk + u

• SRF:
Ŷ = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk
Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + û
3. The Ordinary Least Squares (OLS) Method

• The OLS method was developed by the German mathematician Carl Friedrich Gauss.
• It is used to estimate the parameters under certain assumptions.
• Under those assumptions, the estimators have desirable properties (linearity, unbiasedness, and efficiency).
• It is the most widely used estimation method today.
3.1. The Ordinary Least Squares (OLS) Method

• Assume that the PRF has the form:

Yi = β0 + β1Xi + ui    [3.01]

• Because we cannot observe the PRF, we estimate it through the SRF:

Yi = β̂0 + β̂1Xi + ûi = Ŷi + ûi    [3.02]

where Ŷi is the predicted/fitted value of Yi.
3.1. The Ordinary Least Squares (OLS) Method

From [3.02], we have:

ûi = Yi − Ŷi    [3.03]

In [3.03], ûi is the difference between the actual value and the predicted value of Yi.

⇒ The smaller ûi is, the smaller the difference between Yi and Ŷi, i.e., the closer the fitted value Ŷi is to Yi.
3.1. The Ordinary Least Squares (OLS) Method

• Suppose we have n observations of Y and X. We want to find the SRF whose fitted values Ŷi are closest to the actual values Yi.
• A first idea is to choose the SRF so that the sum of residuals,

Σûi = Σ(Yi − Ŷi)    (all sums run over i = 1, …, n),

is as small as possible.

• However, this is not a good criterion: positive and negative residuals cancel out, so a line with large individual errors can still have a sum of residuals near zero, as the figure below illustrates.
[Figure: Scatter of four observations around the SRF Ŷi = β̂0 + β̂1Xi, with residuals û1, û2, û3, û4 of opposite signs.]
3.1. The Ordinary Least Squares (OLS) Method

We can overcome these problems by choosing the SRF so that

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1Xi)²

is minimized. Here Σûi² is the sum of squared residuals (SSR).
3.1. The Ordinary Least Squares (OLS) Method

From equation [3.03], the sum of squared residuals is a function of β̂0 and β̂1:

Σûi² = f(β̂0, β̂1) = Σ(Yi − β̂0 − β̂1Xi)²
3.1. The Ordinary Least Squares (OLS) Method

• Recall that a function f(X) attains a minimum when:

f′(X) = 0 and f″(X) > 0

• So the function Σûi² = f(β̂0, β̂1) is minimized when its first-order partial derivatives with respect to β̂0 and β̂1 are both zero (the second-order conditions hold here).
• Then β̂0 and β̂1 are the solutions of the equations:

∂f(β̂0, β̂1)/∂β̂0 = Σ 2(Yi − β̂0 − β̂1Xi)(−1) = 0

⇒ nβ̂0 + β̂1 ΣXi = ΣYi
∂f(β̂0, β̂1)/∂β̂1 = Σ 2(Yi − β̂0 − β̂1Xi)(−Xi) = 0

⇒ β̂0 ΣXi + β̂1 ΣXi² = ΣXiYi
• In sum, the normal equations are:

nβ̂0 + β̂1 ΣXi = ΣYi
β̂0 ΣXi + β̂1 ΣXi² = ΣXiYi    [3.05]
• Solving the equations [3.05], we obtain:

β̂1 = (n ΣXiYi − ΣXi ΣYi) / (n ΣXi² − (ΣXi)²) = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²    [3.06]

β̂0 = (ΣXi² ΣYi − ΣXi ΣXiYi) / (n ΣXi² − (ΣXi)²) = Ȳ − β̂1X̄
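The closed-form formulas in [3.06] translate directly into code. Below is a minimal sketch in Python, assuming NumPy is available; the helper name `ols_simple` is illustrative, not from the lecture.

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates for Y = b0 + b1*X + u, via the closed form in [3.06]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # beta1_hat = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # beta0_hat = Y_bar - beta1_hat * X_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1
```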
Example
X: personal income per day, in thousand VND
Y: personal consumption per day, in thousand VND

X 5 4 2 8 8
Y 1 2 3 4 5

Questions:
a. Compute the main descriptive statistics of X and Y (mean, variance, median, mode)
b. Estimate the parameters of the SRF
c. Write down the SRF
d. Compute SST, SSE, SSR, and R-squared
e. Interpret the meaning of R-squared
• β̂0 = 69/68; β̂1 = 25/68
• SST = 10, SSE = 125/34, SSR = 215/34, R-squared = 25/68 ≈ 0.3676
Interpretation: income explains 36.76% of the sample variation in consumption. The remaining 63.24% of the sample variation is left unexplained, attributable to factors not included in the model.
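As a check, a short sketch (reusing the hypothetical `ols_simple` above, plus exact rational arithmetic) reproduces these numbers:

```python
from fractions import Fraction

x = [Fraction(v) for v in [5, 4, 2, 8, 8]]
y = [Fraction(v) for v in [1, 2, 3, 4, 5]]
n = len(x)
x_bar = sum(x) / n   # 27/5
y_bar = sum(y) / n   # 3

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx                # Fraction(25, 68)
b0 = y_bar - b1 * x_bar       # Fraction(69, 68)

sst = sum((yi - y_bar) ** 2 for yi in y)             # 10
sse = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)   # Fraction(125, 34)
ssr = sst - sse                                      # Fraction(215, 34)
r2 = sse / sst                                       # Fraction(25, 68) ≈ 0.3676
print(b0, b1, sst, sse, ssr, float(r2))
```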
Example
X: personal income per day, in thousand VND
Y: personal consumption per day, in thousand VND

X 6 5 2 4 4
Y 5 2 2 3 1

Questions:
a. Compute the main descriptive statistics of X and Y (mean, variance, median, mode)
b. Estimate the parameters of the SRF
c. Write down the SRF
d. Compute SST, SSE, SSR, and R-squared
e. Interpret the meaning of R-squared
• X̄ = 21/5 = 4.2
• Ȳ = 13/5 = 2.6
• Var(X) = 1.76
• Var(Y) = 1.84
• β̂0 = 1/44; β̂1 = 27/44
• SST = 9.2, SSE ≈ 3.31, SSR ≈ 5.89, R-squared ≈ 0.3602 (36.02%)
Interpretation: income explains about 36% of the sample variation in consumption. The remaining 64% of the sample variation is left unexplained, attributable to factors not included in the model.
3.2. The statistical properties of OLS estimators

• The OLS estimators are expressed solely in terms of observable (i.e., sample) quantities.

• They are point estimators: given the sample, each estimator provides a single (point) value of the relevant population parameter.

• Once the OLS estimates are obtained from the sample data, the sample regression line is easily obtained.
3.2. The statistical properties of OLS estimators
The regression line thus obtained has the following properties:
1. It passes through the point of sample means (X̄, Ȳ).
2. The mean of the fitted values Ŷi is equal to the mean of the actual values: mean(Ŷ) = Ȳ.
3. The mean of the residuals is zero: Σûi = 0.
4. The residuals ûi are uncorrelated with the fitted values Ŷi: ΣŶiûi = 0.
5. The residuals ûi are uncorrelated with Xi: ΣûiXi = 0.
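These properties are easy to confirm numerically. A small sketch (again assuming the hypothetical `ols_simple` helper from above) checks properties 3-5 on the first example's data:

```python
import numpy as np

x = np.array([5, 4, 2, 8, 8], dtype=float)
y = np.array([1, 2, 3, 4, 5], dtype=float)
b0, b1 = ols_simple(x, y)
y_hat = b0 + b1 * x
u_hat = y - y_hat

print(np.isclose(u_hat.sum(), 0))            # property 3: residuals sum to zero
print(np.isclose((y_hat * u_hat).sum(), 0))  # property 4: orthogonal to fitted values
print(np.isclose((x * u_hat).sum(), 0))      # property 5: orthogonal to X
```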
3.3. The sums of squares
• SST (Total Sum of Squares):
SST = Σ(Yi − Ȳ)²

• SSE (Explained Sum of Squares):
SSE = Σ(Ŷi − Ȳ)²

• SSR (Residual Sum of Squares):
SSR = Σ(Yi − Ŷi)² = Σûi²
• SST measures the total sample variation in the Yi, i.e., how spread out the Yi are in the sample.
• SSE measures the sample variation in the fitted values Ŷi.
• SSR measures the sample variation in the residuals ûi.
• The total variation in Y can always be decomposed into the explained variation SSE and the unexplained variation SSR:

SST = SSE + SSR, so 1 = SSE/SST + SSR/SST
3.4. Coefficient of determination (R-squared)
R² is the fraction (percentage) of the sample variation in Y that is explained by X:

R² = SSE/SST = 1 − SSR/SST

Simple linear regression:
R² = (n ΣXiYi − ΣXi ΣYi)² / ([n ΣXi² − (ΣXi)²][n ΣYi² − (ΣYi)²])

Multiple linear regression:
R² = [Σ(Yi − Ȳ)(Ŷi − Ȳ)]² / [Σ(Yi − Ȳ)² · Σ(Ŷi − Ȳ)²]
Properties of R²

• 0 ≤ R² ≤ 1
• 100·R² is the percentage of the sample variation in Y that is explained by X.
• R² = 1: the data points all lie on the same line; OLS provides a perfect fit to the data.
• R² = 0: X and Y have no (linear) relationship.
• Weakness: R² increases whenever more regressors are added to the model, even if they have no significant effect on Y.
⇒ Use the adjusted R² to judge whether added variables belong in the model.
3.5. AjustedR2

2 n 1
R  1  (1  R )
2

n  k 1

• When adding new variable into the model,


adjusted R2 increases  that variable should
be included in the model

39
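For instance, with the first example's n = 5, k = 1, and R² ≈ 0.3676, a quick sketch gives the adjusted value:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalizes R2 for the number of regressors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.3676, n=5, k=1))  # ≈ 0.1568
```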
3.6. Assumptions of OLS
Assumption 1 – Linear in parameters: In the PRF, the dependent variable Y is related to the independent variable X and the error term u as

Y = β0 + β1X + u

Assumption 2 – Random sampling: We have a random sample of size n.

Assumption 3 – Sample variation in the explanatory variable: The sample outcomes on X, namely {Xi, i = 1, …, n}, are not all the same value.
Assumption 4 – No perfect collinearity: In the sample, there are no exact linear relationships among the independent variables.
Assumption 5 – Zero conditional mean: The error term has an expected value of zero given any value of the explanatory variable. In other words, E(u|X) = 0.

⇒ This assumption says that the factors not explicitly included in the model, and therefore subsumed in ui, do not systematically affect the mean value of Y; the positive ui values cancel out the negative ui values, so that their average effect on Y is zero.
⇒ Geometrically, this assumption can be pictured as in Figure 3.3, which shows a few values of X and the population of Y values associated with each of them. As shown, each Y corresponding to a given X is distributed around its mean value.
Figure 3.3. Conditional distribution of the disturbances ui
Assumption 6 – Homoskedasticity: The error term ui has the same variance given any value of the independent variable. In other words,

Var(ui|Xi) = E[ui − E(ui|Xi)]² = E(ui²|Xi) = σ²

• Var(u) reflects the spread of Y around its conditional mean E(Y|X).

⇒ This assumption means that the Y values corresponding to various X values have the same variance. The variance surrounding the regression line is the same across X values; it neither increases nor decreases as X varies.
Figure 3.4. The simple regression model under homoskedasticity
• Consider Figure 3.5, where the conditional variance of the Y population varies with X.
• Let Y represent weekly consumption expenditure and X weekly income.
• Figures 3.4 and 3.5 both show that as income increases, average consumption expenditure also increases.
• In Figure 3.4, the variance of consumption expenditure remains the same at all levels of income.
• In Figure 3.5, it increases as income increases.
• Richer families on average consume more than poorer families, and there is also more variability in the consumption expenditure of the former.
Figure 3.5. The simple regression model under heteroskedasticity
3.7. Properties of the OLS estimators – the Gauss-Markov Theorem

• Gauss-Markov Theorem: Under the OLS assumptions, the OLS estimators are the Best Linear Unbiased Estimators (BLUE).
– Linear: the OLS estimators are linear functions of the random variable Y.
– Unbiased: E(β̂j) = βj.
– Best: smallest variance within the class of linear unbiased estimators.
Theorem 1: Unbiasedness of OLS
Under the assumptions above:

E(β̂0) = β0 and E(β̂1) = β1

for any values of β0 and β1. In other words, β̂0 is unbiased for β0, and β̂1 is unbiased for β1.
Unbiasedness:
[Figure (a): Sampling distribution of β̂2, centered at β2, i.e., E(β̂2) = β2.]
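Unbiasedness can be illustrated by simulation: under the assumptions, β̂1 averaged over many random samples should be close to the true β1. A rough sketch, with true values β0 = 1 and β1 = 2 chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 2.0, 50, 5000
estimates = []
for _ in range(reps):
    x = rng.uniform(0, 10, n)
    u = rng.normal(0, 1, n)          # E(u|X) = 0, homoskedastic
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))  # close to 2.0: the sampling distribution centers on beta1
```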
Theorem 2: Sampling variances of the OLS estimators
Under assumptions 1 through 6,

Var(β̂j) = σ² / [Σ(Xij − X̄j)² · (1 − Rj²)],    j = 1, 2, …, k

where Rj² is the R² from the regression of Xj on the other independent variables.
Best (efficient):
[Figure (c): Sampling distributions of β̂2 and of an alternative linear unbiased estimator β2*; the OLS estimator β̂2 has the smaller variance.]
3.8. Measuring the accuracy of OLS estimates

Variance of the estimators:

Var(β̂j) = σ² / [SSTj (1 − Rj²)], where SSTj = Σ(Xij − X̄j)²

Standard deviation of the estimators:

sd(β̂j) = σ / [SSTj (1 − Rj²)]^(1/2)

Estimated error variance (the standard error of the regression is σ̂):

σ̂² = (Σûi²)/(n − k − 1)

where n − k − 1 is the degrees of freedom, n the number of observations, and k the number of independent variables.

Standard error of the estimators:

se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2)
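A sketch of these calculations for the simple-regression case (k = 1, so Rj² = 0; the helper names are illustrative, and `ols_simple` is the hypothetical function from above):

```python
import numpy as np

def ols_std_errors(x, y):
    """Return (sigma_hat, se_b1) for the simple regression y = b0 + b1*x + u."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, k = len(x), 1
    b0, b1 = ols_simple(x, y)
    u_hat = y - (b0 + b1 * x)
    sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)  # estimated error variance
    sst_x = np.sum((x - x.mean()) ** 2)            # total sample variation in x
    se_b1 = np.sqrt(sigma2_hat / sst_x)            # se(beta1_hat)
    return np.sqrt(sigma2_hat), se_b1
```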
3.9. The components of the OLS variances

Var(β̂j) = σ² / [Σ(Xij − X̄j)² · (1 − Rj²)]

1. The error variance, σ²
• A larger σ² means a larger Var(β̂j).
• More noise (a larger σ²) makes it more difficult to estimate the partial effect of any independent variable on Y.
• The only way to reduce the error variance is to add more explanatory variables to the model.
2. The total sample variation in Xj:

SSTj = Σ(Xij − X̄j)²

• The larger the total variation in Xj, the smaller Var(β̂j).
• Everything else equal, for estimating βj we prefer as much sample variation in Xj as possible.
• One way to increase the variation is to increase the sample size.


3. The linear relationships among the independent variables, Rj²

• Rj² is the proportion of the total variation in Xj that can be explained by the other independent variables.
• Rj² = 0 happens if, and only if, Xj has zero sample correlation with every other independent variable; in that case Var(β̂j) is smallest.
3.10. Units of measurement

How does changing the units of measurement of the dependent and/or independent variables affect the OLS estimates?
• Example: data set "CEO Salary and Return on Equity"

salary: CEO annual salary, in thousands of dollars
roe: average return on equity, in percent

salary = 963.191 + 18.501·roe    [1]

⇒ When roe increases by 1 percentage point, the CEO's annual salary is expected to increase by 18.501 thousand USD.
Case 1
• Salary is now measured in USD: salarydol = 1000·salary
• The unit of roe is unchanged.

salarydol = 963,191 + 18,501·roe

⇒ If the dependent variable is multiplied or divided by a constant c, then the OLS intercept and slope estimates are also multiplied or divided by c.
Case 2
• The unit of salary is unchanged.
• The unit of roe is changed: roedec = roe/100

salary = 963.191 + 1850.1·roedec

• The coefficient of roedec is 100 times greater than the coefficient of roe in [1].
⇒ If the independent variable is divided or multiplied by some nonzero constant c, then the OLS slope coefficient is multiplied or divided by c, respectively. The intercept is unchanged.
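This rescaling behavior is easy to verify on any data. A sketch using simulated data (all names and numbers illustrative, `ols_simple` is the hypothetical helper from above):

```python
import numpy as np

rng = np.random.default_rng(1)
roe = rng.uniform(0, 30, 100)
salary = 963.191 + 18.501 * roe + rng.normal(0, 50, 100)  # in thousands of USD

b0, b1 = ols_simple(roe, salary)
b0_dol, b1_dol = ols_simple(roe, 1000 * salary)   # Case 1: rescale Y
b0_dec, b1_dec = ols_simple(roe / 100, salary)    # Case 2: rescale X

print(np.isclose(b0_dol, 1000 * b0), np.isclose(b1_dol, 1000 * b1))  # both scale by 1000
print(np.isclose(b0_dec, b0), np.isclose(b1_dec, 100 * b1))          # slope scales by 100
```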
3.11. "Linear" in the regression model

• Linear regression requires linearity only in the parameters, not in the variables.

Yi = β1 + β2Xi + ui            → yes
Yi = β1 + β2(1/Xi) + ui        → yes
Yi = β1 + (1 − β2)²Xi + ui     → no (nonlinear in β2)
ln(Yi) = β1 + β2/Xi + ui       → yes
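Models that are nonlinear in the variables but linear in the parameters can still be estimated by OLS after transforming the data. A brief sketch for the second model above (the data are made up, `ols_simple` is the hypothetical helper from above):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 10.0])
y = np.array([9.8, 6.1, 4.0, 3.6, 2.9])  # illustrative data

z = 1.0 / x                   # transform: Y = b1 + b2*(1/X) + u is linear in b1, b2
b1_hat, b2_hat = ols_simple(z, y)   # intercept estimates beta1, slope estimates beta2
```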
3.12. Functional form

Model     Dependent variable   Independent variable   Interpretation of β1
Lin-lin   y                    x                      Δy = β1·Δx
Lin-log   y                    log(x)                 Δy = (β1/100)·%Δx
Log-lin   log(y)               x                      %Δy = (100·β1)·Δx
Log-log   log(y)               log(x)                 %Δy = β1·%Δx
Data on "Wage and Education"
wage: USD per hour
educ: years of education

1. Lin-lin:

wage = −0.90 + 0.54·educ

• Each additional year of education is expected to increase the wage by 54 cents per hour.
• Because wage and education are linearly related, the effect of education on wage is the same at every education level (54 cents): the effect of the 2nd year of education equals that of the 20th year, for example.
2. Log-lin:

log(wage) = 0.584 + 0.083·educ

• Interpretation: %Δy = (100·β1)·Δx
• Each additional year of education increases the wage by a fixed percentage, so the absolute wage gain grows as education increases ⇒ an increasing return to education.
• Here, each additional year of education increases the wage by about 8.3%.
• The higher the education level, the larger the absolute effect on the wage.
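Note that 100·β1 is only an approximation for small changes; the exact percentage change implied by the log-lin model is 100·(exp(β1) − 1). A quick check:

```python
import math

b1 = 0.083
approx = 100 * b1                 # ≈ 8.3% (approximation)
exact = 100 * (math.exp(b1) - 1)  # ≈ 8.65% (exact implied change)
print(approx, exact)
```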
log(wage) = β0 + β1·educ + u  ⇔  wage = exp(β0 + β1·educ + u)

[Figure: wage as a function of educ when u = 0 and β1 > 0: an increasing, convex (exponential) curve.]
3. Lin-log:

demand = 0.584 − 94.3·log(price)

• Interpretation: Δy = (β1/100)·%Δx
• When the price of product X increases by 1%, the demand for that product decreases by about 0.943 thousand units.
4. Log-log:

log(demand) = 0.584 − 0.253·log(price)

• Interpretation: %Δy = β1·%Δx (β1 is an elasticity)
• When the price of product X increases by 1%, the demand for X decreases by about 0.25% (a price elasticity of −0.253).
