Lecture 2:
The Linear Regression Model
Đinh Thi Thanh Binh, PhD
Faculty of International Economics, FTU
1. Introduction to regression model
• The term "regression" means "regression to mediocrity"
• It was coined by Galton (1886) in his study of the relationship between the heights of sons and the heights of their fathers
Distribution of the heights of sons with respect to the heights of the fathers
The study shows that:
• Given the height of the fathers, the heights of the sons are distributed around a mean value
• On average, as the height of the fathers increases, the height of the sons also increases
• If we connect all the mean points, we get a straight line
• This line is called the regression line; it shows the relationship between the height of sons and the height of fathers on average
2. Population Regression Function (PRF)
and Sample Regression Function (SRF)
2.1. Definition of PRF
The PRF is a regression function constructed from a survey of the entire population.
For example, Galton studied the relationship between the heights of fathers and the heights of sons in one city. He collected data on all fathers having adult sons, so he could build the PRF.
So, E(Y|Xi) is a function of the independent variable Xi:
E(Y|Xi) = f(Xi) = β0 + β1Xi [1]
• Equation [1] is called the Population Regression Function (PRF).
– The PRF shows how the expected value of Y changes at different values of X
– If the PRF has 1 independent variable → simple regression function
– If the PRF has 2 or more independent variables → multiple regression function
• Suppose that the PRF E(Y|Xi) is a linear function; then:
E(Y|Xi) = β0 + β1Xi [2]
- β0, β1: regression coefficients/parameters
• β0: intercept coefficient
• β1: slope coefficient
• Equation [2] is a simple regression function
2.2. Error/ disturbance term
• Because E(Y|Xi) is the expected value of Y given Xi, individual values Yi are not necessarily equal to E(Y|Xi), but they are distributed around it.
• Denote by ui the difference between Yi and E(Y|Xi); then:
ui = Yi − E(Y|Xi) [3]
Or: Yi = E(Y|Xi) + ui [4]
ui is a random component, called the error or disturbance term
2.3. Sample regression function (SRF)
• In reality, we usually cannot survey the whole population → we cannot build the PRF
• We can only estimate the expected value of Y, or in other words, estimate the PRF, based on sample(s) taken from the population
• Obviously the estimated function cannot be absolutely exact
The regression function that is constructed from a sample is called the Sample Regression Function (SRF).
Graph 2.03. Scatter graph and regression lines of the two samples, SRF1 and SRF2
• From the population, we can draw many samples. Each sample gives a different SRF
• To obtain the "best" SRF, that is, the SRF that is the closest estimate of the PRF, we must rely on certain criteria, even though we do not have the PRF to compare against.
2.3. Sample Regression Function (SRF) – Simple
PRF:
E(Y|Xi) = β0 + β1Xi
Yi = E(Y|Xi) + ui = β0 + β1Xi + ui
SRF:
Ŷi = β̂0 + β̂1Xi
Yi = Ŷi + ûi = β̂0 + β̂1Xi + ûi
• Ŷi is an estimate of E(Y|Xi) and is the fitted/predicted value of Y
• β̂0, β̂1 are estimates of β0 and β1
• ûi is an estimate of ui and is called the residual
2.3. Sample Regression Function (SRF) – Multiple
• PRF:
E(Y|X1, X2, …, Xk) = β0 + β1X1 + β2X2 + … + βkXk
Y = β0 + β1X1 + β2X2 + … + βkXk + u
• SRF:
Ŷ = β̂0 + β̂1X1 + β̂2X2 + … + β̂kXk
Y = β̂0 + β̂1X1 + β̂2X2 + … + β̂kXk + û
3. The Ordinary Least Square (OLS)
• The OLS method was invented by the German mathematician Carl Friedrich Gauss.
• It is used to estimate the parameters, given some assumptions.
• The resulting estimators have desirable properties (linearity, unbiasedness, and efficiency).
• It is the most widely used estimation method today.
3.1. The Ordinary Least Square (OLS)
• Assume that the PRF has the form:
Yi = β0 + β1Xi + ui [3.01]
• Because we cannot observe the PRF, we estimate it through the SRF:
Yi = β̂0 + β̂1Xi + ûi = Ŷi + ûi [3.02]
where Ŷi is the predicted/fitted value of Yi
3.1. The Ordinary Least Square (OLS)
From [3.02], we have:
ûi = Yi − Ŷi [3.03]
[3.03] → ûi is the difference between the actual value and the predicted value of Yi.
The smaller ûi is, the smaller the difference between Yi and Ŷi; that is, the estimated value Ŷi is closer to Yi.
3.1. The Ordinary Least Square (OLS)
• Suppose that we have n observations of Y and X; we want to find the SRF such that Ŷi is as close as possible to Yi.
• One idea is to choose the SRF so that the sum of residuals
∑ᵢ₌₁ⁿ ûi = ∑ᵢ₌₁ⁿ (Yi − Ŷi)
is minimized.
• However, this is not the best choice: large positive and large negative residuals can cancel out, so the sum can be near zero even when the fit is poor.
[Figure: scatter of the observations at X1, …, X4 around the SRF line Ŷi = β̂0 + β̂1Xi, with residuals û1, û2, û3, û4 of opposite signs that can offset each other in a simple sum]
3.1. The Ordinary Least Square (OLS)
We can overcome these problems by choosing the SRF so that
∑ᵢ₌₁ⁿ ûi² = ∑ᵢ₌₁ⁿ (Yi − Ŷi)² = ∑ᵢ₌₁ⁿ (Yi − β̂0 − β̂1Xi)²
is minimized. Here ∑ᵢ₌₁ⁿ ûi² is the sum of squared residuals (SSR).
3.1. The Ordinary Least Square (OLS)
From equation [3.03], we have that ∑ᵢ₌₁ⁿ ûi² is a function of β̂0 and β̂1:
∑ᵢ₌₁ⁿ ûi² = f(β̂0, β̂1) = ∑ᵢ₌₁ⁿ (Yi − β̂0 − β̂1Xi)²
3.1. The Ordinary Least Square (OLS)
• We know that f(X) attains a minimum at a point where:
f′(X) = 0 and f″(X) > 0
3.1. The Ordinary Least Square (OLS)
• So, the function ∑ᵢ₌₁ⁿ ûi² is minimized when its first-order partial derivatives with respect to β̂0 and β̂1 equal zero (and the second-order conditions for a minimum hold).
• Then β̂0 and β̂1 are the solutions of the equations:
∂f(β̂0, β̂1)/∂β̂0 = ∑ᵢ₌₁ⁿ 2(Yi − β̂0 − β̂1Xi)(−1) = 0
⇔ nβ̂0 + β̂1 ∑ᵢ₌₁ⁿ Xi = ∑ᵢ₌₁ⁿ Yi
∂f(β̂0, β̂1)/∂β̂1 = ∑ᵢ₌₁ⁿ 2(Yi − β̂0 − β̂1Xi)(−Xi) = 0
⇔ β̂0 ∑ᵢ₌₁ⁿ Xi + β̂1 ∑ᵢ₌₁ⁿ Xi² = ∑ᵢ₌₁ⁿ XiYi
• In sum, the normal equations [3.05]:
nβ̂0 + β̂1 ∑ᵢ₌₁ⁿ Xi = ∑ᵢ₌₁ⁿ Yi
β̂0 ∑ᵢ₌₁ⁿ Xi + β̂1 ∑ᵢ₌₁ⁿ Xi² = ∑ᵢ₌₁ⁿ XiYi
• Solving the equations [3.05], we obtain [3.06]:
β̂1 = (n∑XiYi − ∑Xi∑Yi) / (n∑Xi² − (∑Xi)²) = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
β̂0 = (∑Xi²∑Yi − ∑Xi∑XiYi) / (n∑Xi² − (∑Xi)²) = Ȳ − β̂1X̄
(all sums run over i = 1, …, n)
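As an illustration (not part of the original slides), the closed-form formulas [3.06] can be coded directly; the sample used below is the income–consumption exercise from the lecture:

```python
# Simple OLS via the closed-form formulas [3.06]:
# slope = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2), intercept = Ybar - slope * Xbar
def simple_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope coefficient
    b0 = ybar - b1 * xbar     # intercept coefficient
    return b0, b1

# Income-consumption sample used later in the lecture:
b0, b1 = simple_ols([5, 4, 2, 8, 8], [1, 2, 3, 4, 5])  # b0 = 69/68, b1 = 25/68
```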
Example
A random sample is given as follows:
X: personal income per day, in thousand VND
Y: personal consumption per day, in thousand VND
Questions:
a. Calculate the main properties of X and Y (expected value, variance, median, mode)
b. Estimate the parameters of the SRF
c. Write the SRF
Example
X: personal income per day, in thousand VND
Y: personal consumption per day, in thousand VND
X 5 4 2 8 8
Y 1 2 3 4 5
Questions:
a. Calculate the main properties of X and Y (expected value, variance, median, mode)
b. Estimate the parameters of the SRF
c. Write the SRF
d. Calculate SST, SSE, SSR, R-squared
e. Explain the meaning of R-squared
• β̂0 = 69/68 ≈ 1.015; β̂1 = 25/68 ≈ 0.368
• SST = 10, SSE = 125/34 ≈ 3.68, SSR = 215/34 ≈ 6.32
• R² = 0.3676
This means that income explains 36.76% of the sample variation in consumption. The remaining 63.24% of the sample variation in consumption is due to other factors that are not included in the model.
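The stated answers can be checked by direct computation (a sketch; the data are exactly the sample from the slide):

```python
# Verify beta0, beta1, SST, SSE, SSR and R^2 for the income-consumption sample.
x = [5, 4, 2, 8, 8]   # income (thousand VND/day)
y = [1, 2, 3, 4, 5]   # consumption (thousand VND/day)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]                           # fitted values
sst = sum((b - ybar) ** 2 for b in y)                     # total sum of squares
sse = sum((yh - ybar) ** 2 for yh in yhat)                # explained sum of squares
ssr = sum((b - yh) ** 2 for b, yh in zip(y, yhat))        # residual sum of squares
r2 = sse / sst                                            # = 0.3676...
```

Running this reproduces β̂0 = 69/68, β̂1 = 25/68, SST = 10, SSE = 125/34, SSR = 215/34, and R² ≈ 0.3676, matching the slide.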
Example
X: personal income per day, in thousand VND
Y: personal consumption per day, in thousand VND
X 6 5 2 4 4
Y 5 2 2 3 1
Questions:
a. Calculate the main properties of X and Y (expected value, variance, median, mode)
b. Estimate the parameters of the SRF
c. Write the SRF
d. Calculate SST, SSE, SSR, R-squared
e. Explain the meaning of R-squared
• E(X) = 21/5 = 4.2
• E(Y) = 13/5 = 2.6
• Var(X) = 1.76
• Var(Y) = 1.84
• β̂0 = 1/44 ≈ 0.023; β̂1 = 27/44 ≈ 0.614
• SST = 9.2, SSE ≈ 3.31, SSR ≈ 5.89, R² ≈ 0.3602
This means that income explains about 36.02% of the sample variation in consumption. The remaining 63.98% of the sample variation in consumption is due to other factors that are not included in the model.
3.2. The statistical properties of OLS estimators
• The OLS estimators are expressed solely in terms of
the observable (i.e., sample) quantities.
• They are point estimators; that is, given the sample,
each estimator will provide only a single (point) value
of the relevant population parameter.
• Once the OLS estimates are obtained from the sample
data, the sample regression line can be easily
obtained.
3.2. The statistical properties of OLS estimators
The regression line thus obtained has the following
properties:
1. It passes through the sample means of X and Y, the point (X̄, Ȳ)
2. The mean of the fitted values Ŷi equals the mean of the actual Y values: mean(Ŷ) = Ȳ
3. The mean of the residuals is zero:
∑ᵢ₌₁ⁿ ûi = 0
3.2. The statistical properties of OLS estimators
4. The residuals ûi are uncorrelated with the fitted values Ŷi:
∑ᵢ₌₁ⁿ Ŷiûi = 0
5. The residuals ûi are uncorrelated with Xi:
∑ᵢ₌₁ⁿ ûiXi = 0
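These properties can be verified numerically; a minimal sketch using the earlier income–consumption sample (any sample would do):

```python
# Check residual properties 3-5 of the OLS regression line by direct computation.
x = [5, 4, 2, 8, 8]
y = [1, 2, 3, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]                        # fitted values
u = [b - yh for b, yh in zip(y, yhat)]                 # residuals
sum_u = sum(u)                                         # property 3: = 0
sum_u_yhat = sum(ui * yh for ui, yh in zip(u, yhat))   # property 4: = 0
sum_u_x = sum(ui * xi for ui, xi in zip(u, x))         # property 5: = 0
```

All three sums come out as zero up to floating-point rounding, exactly as the theorems state.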
3.3. The sum of squares
• SST (Total Sum of Squares):
SST = ∑(Yi − Ȳ)²
• SSE (Explained Sum of Squares):
SSE = ∑(Ŷi − Ȳ)²
• SSR (Residual Sum of Squares):
SSR = ∑ᵢ₌₁ⁿ ûi² = ∑(Yi − Ŷi)²
• SST measures the total sample variation in the Yi; that is, how spread out the Yi are in the sample
• SSE measures the sample variation in the fitted values Ŷi
• SSR measures the sample variation in the residuals ûi
• The total variation in Y can always be decomposed into the explained variation SSE and the unexplained variation SSR. Thus:
SST = SSE + SSR, or equivalently SSE/SST + SSR/SST = 1
3.4. Determination Coefficient (R-squared)
R² is the fraction (percentage) of the sample variation in Y that is explained by X:
R² = SSE/SST = 1 − SSR/SST
Simple linear regression:
R² = (n∑XiYi − ∑Xi∑Yi)² / {[n∑Xi² − (∑Xi)²][n∑Yi² − (∑Yi)²]}
Multiple linear regression:
R² = ∑(Ŷi − Ȳ)² / ∑(Yi − Ȳ)²
Properties of R²
• 0 ≤ R² ≤ 1
• 100·R² is the percentage of the sample variation in Y that is explained by X
• R² = 1: the data points all lie on the same line; OLS provides a perfect fit to the data
• R² = 0: X and Y have no linear relationship
• Weakness: R² increases whenever more regressors X are added to the model, even if they have no significant effect on Y
→ Use the adjusted R² to judge whether variables added to the model belong there
3.5. Adjusted R²
R̄² = 1 − (1 − R²)·(n − 1)/(n − k − 1)
• When adding a new variable to the model, if the adjusted R² increases → that variable should be included in the model
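A minimal sketch of the adjusted R² formula (the example values assume the first exercise's fit, with R² = 0.3676, n = 5 and k = 1):

```python
# Adjusted R-squared: penalizes additional regressors.
# n = number of observations, k = number of independent variables.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2_adj = adjusted_r2(0.3676, 5, 1)   # ~0.157: far below R^2 in such a tiny sample
```

Note how heavily the penalty bites when n is small relative to k.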
3.6. Assumptions of the OLS
Assumption 1 – Linear in parameters: In the PRF, the dependent variable Y is related to the independent variable X and the error term u as
Y = β0 + β1X + u
Assumption 2 – Random sampling: We have a random sample of size n
Assumption 3 – Sample variation in the explanatory variable: The sample outcomes on X, namely {Xi, i = 1,…, n}, are not all the same value.
3.6. Assumptions of the OLS
• Assumption 4 – No perfect collinearity: In
the sample, there are no exact linear
relationships among the independent variables.
• Assumption 5: The error term has an expected value of
zero given any value of the explanatory variable. In other
words, E(u|X)=0.
This assumption simply says that the factors not explicitly
included in the model, therefore subsumed in 𝑢𝑖 , do not
systematically affect the mean value of Y; the positive 𝑢𝑖
values cancel out the negative 𝑢𝑖 values so that their average
or mean effect on Y is zero.
Geometrically, this assumption can be pictured as in Figure 3.3, which shows a few values of X and the Y populations associated with each of them. As shown, each Y population corresponding to a given X is distributed around its mean value.
Figure 3.3. Conditional distribution of the
disturbances ui
Assumption 6 – Homoskedasticity: The error term ui has the same variance given any value of the independent variable. In other words,
Var(ui|Xi) = E[ui − E(ui|Xi)]² = E(ui²|Xi) = σ²
Var(u) reflects the spread of Y around its conditional mean E(Y|X).
This assumption means that the Y populations corresponding to the various X values have the same variance. The variance surrounding the regression line is the same across the X values; it neither increases nor decreases as X varies.
Figure 3.4. The simple regression model under
homoskedasticity
• Consider Figure 3.5, where the conditional variance of the Y population varies with X.
• Let Y represent weekly consumption expenditure and X weekly income.
• Figures 3.4 and 3.5 both show that as income increases, average consumption expenditure also increases.
• In Figure 3.4, the variance of consumption expenditure remains the same at all levels of income.
• In Figure 3.5, it increases as income increases.
• Richer families on average consume more than poorer families, and there is also more variability in the consumption expenditure of the former.
Figure 3.5. The simple regression model under
heteroskedasticity
3.7. Properties of the OLS estimators -
Gauss-Markov Theorem
• The Gauss–Markov Theorem: Under the OLS assumptions, the OLS estimators are BLUE (Best Linear Unbiased Estimators):
– Linear: the OLS estimators are linear functions of the sample values of the dependent variable
– Unbiased: E(β̂j) = βj
– Best: they have the smallest variance within the class of linear unbiased estimators
Theorem 1: Unbiasedness of OLS
Under the assumptions above:
E(β̂0) = β0 and E(β̂1) = β1
for any values of β0 and β1. In other words, β̂0 is unbiased for β0, and β̂1 is unbiased for β1.
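A small Monte Carlo simulation can illustrate Theorem 1. This is only a sketch: the data-generating process, parameter values, and variable names below are invented for illustration, not taken from the lecture:

```python
# Monte Carlo sketch of unbiasedness: average the OLS slope over many simulated samples.
import random

random.seed(0)
beta0, beta1 = 1.0, 0.5                       # true parameters of the synthetic PRF
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # fixed regressor values
xbar = sum(x) / len(x)
sxx = sum((a - xbar) ** 2 for a in x)

estimates = []
for _ in range(20000):
    # Y = beta0 + beta1*X + u, with u ~ N(0, 1) satisfying E(u|X) = 0
    y = [beta0 + beta1 * a + random.gauss(0, 1) for a in x]
    ybar = sum(y) / len(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)     # close to the true beta1 = 0.5
```

Each individual b1 varies from sample to sample, but their average converges to the true slope, which is exactly what E(β̂1) = β1 asserts.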
Unbiased
[Figure (a): sampling distribution of β̂2, centered at β2, illustrating unbiasedness: E(β̂2) = β2]
Theorem 2: Sampling variances of the OLS estimators
Under Assumptions 1 through 6,
Var(β̂j) = σ² / [∑ᵢ₌₁ⁿ (Xij − X̄j)² · (1 − Rj²)],  j = 1, 2, …, k
where Rj² is the R² from the regression of Xj on the other independent variables.
Best (efficient)
[Figure (c): sampling distributions of β̂2 and of another unbiased estimator β2*, both centered at β2; β̂2 has the smaller variance]
3.8. Measuring the accuracy of OLS estimates
Variance of the estimators:
Var(β̂j) = σ² / [∑ᵢ₌₁ⁿ (Xij − X̄j)² · (1 − Rj²)]
Standard deviation of the estimators (writing SSTj = ∑ᵢ₌₁ⁿ (Xij − X̄j)²):
sd(β̂j) = σ / [SSTj(1 − Rj²)]^(1/2)
Standard error of the regression (σ̂):
σ̂² = (∑ᵢ₌₁ⁿ ûi²)/(n − k − 1)
where (n − k − 1) is the degrees of freedom, n the number of observations, and k the number of independent variables.
Standard error of the estimators:
se(β̂j) = σ̂ / [SSTj(1 − Rj²)]^(1/2)
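For the simple regression case (k = 1, so R₁² = 0 and SST₁ = ∑(Xi − X̄)²), these formulas can be sketched as follows, reusing the earlier income–consumption sample:

```python
# Standard error of the slope in simple regression: se(b1) = sigma_hat / sqrt(SST_x).
import math

x = [5, 4, 2, 8, 8]
y = [1, 2, 3, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
u = [b - (b0 + b1 * a) for a, b in zip(x, y)]       # residuals
sigma2_hat = sum(ui ** 2 for ui in u) / (n - 1 - 1)  # sigma_hat^2 = SSR / (n - k - 1), k = 1
sst_x = sum((a - xbar) ** 2 for a in x)              # SST_1; R_1^2 = 0 with one regressor
se_b1 = math.sqrt(sigma2_hat / sst_x)                # ~0.278 for this sample
```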
3.9. The components of the OLS variances
Var(β̂j) = σ² / [∑ᵢ₌₁ⁿ (Xij − X̄j)² · (1 − Rj²)]
1. The error variance, σ²
• A larger σ² means a larger Var(β̂j)
• More noise (a larger σ²) makes it more difficult to estimate the partial effect of any independent variable on Y
• The only way to reduce the error variance is to add more explanatory variables to the model
2. The total sample variation in Xj, ∑ᵢ₌₁ⁿ (Xij − X̄j)²
• The larger the total variation in Xj, the smaller Var(β̂j)
• Everything else being equal, for estimating βj we prefer as much sample variation in Xj as possible
• Increasing the sample size increases the total sample variation in Xj
3. The linear relationships among the independent variables, Rj²
• Rj² is the proportion of the total variation in Xj that can be explained by the other independent variables in the model
• Rj² = 0 happens if, and only if, Xj has zero sample correlation with every other independent variable
3.10. Units of measurement
How does changing the units of measurement of
the dependent and/or independent variables affect
OLS estimates?
3.10. Units of measurement
• Example: data set "CEO Salary and Return on Equity"
salary: CEO salary per year, in thousands of dollars
roe: average return on equity, in percent
salary = 963.191 + 18.501·roe [1]
=> When roe increases by 1 percentage point, the CEO's yearly salary is expected to increase by 18.501 thousand USD
Case 1
• Measure salary in USD: salarydol = 1000·salary
• The unit of roe is unchanged
salarydol = 963,191 + 18,501·roe
=> If the dependent variable is multiplied or divided by a constant c, then the OLS intercept and slope estimates are also multiplied or divided by c.
Case 2
• The unit of salary is unchanged
• The unit of roe is changed: roedec = roe/100
salary = 963.191 + 1850.1·roedec
• The coefficient of roedec is 100 times the coefficient of roe in [1]
=> If an independent variable is divided or multiplied by some nonzero constant c, then the OLS slope coefficient is multiplied or divided by c, respectively. The intercept is unchanged.
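Both cases can be demonstrated numerically. This is a sketch on a small synthetic sample (not the CEO data set):

```python
# Rescaling variables rescales the OLS estimates in the way the two cases describe.
def simple_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

x = [5.0, 4.0, 2.0, 8.0, 8.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
b0, b1 = simple_ols(x, y)

# Case 1: multiply the dependent variable by c = 1000 -> both estimates scale by 1000
c0, c1 = simple_ols(x, [1000 * v for v in y])
assert abs(c0 - 1000 * b0) < 1e-6 and abs(c1 - 1000 * b1) < 1e-6

# Case 2: divide the regressor by c = 100 -> slope multiplied by 100, intercept unchanged
d0, d1 = simple_ols([v / 100 for v in x], y)
assert abs(d0 - b0) < 1e-9 and abs(d1 - 100 * b1) < 1e-6
```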
3.11. "Linear" in the regression model
• Linear regression requires linearity only in the parameters, not in the variables.
Yi = β1 + β2Xi + ui → yes
Yi = β1 + β2(1/Xi) + ui → yes
Yi = β1 + (1 − β2)²Xi + ui → no (nonlinear in β2)
ln(Yi) = β1 + β2/Xi + ui → yes
3.12. Functional form

Model   | Dependent variable | Independent variable | Interpretation of β1
Lin-lin | y                  | x                    | Δy = β1·Δx
Lin-log | y                  | log(x)               | Δy = (β1/100)·%Δx
Log-lin | log(y)             | x                    | %Δy = (100·β1)·Δx
Log-log | log(y)             | log(x)               | %Δy = β1·%Δx
Data on "Wage and Education"
wage: USD per hour
educ: years of education
1. Lin-lin:
wage = −0.90 + 0.54·educ
• Each additional year of education is expected to increase the wage by 54 cents per hour.
• Because wage and education have a linear relation → the effect of education on wage is the same at all education levels (54 cents) → the effect of the 2nd year of education equals that of the 20th year, for example
2. Log-lin:
log(wage) = 0.584 + 0.083·educ
• Interpretation: %Δy = (100·β1)·Δx
• Each additional year of education increases the wage by a certain percentage → the absolute change in wage grows as years of education increase → increasing returns to education
• Each additional year of education increases the wage by about 8.3%.
• The higher the education level, the higher the value/effect of each extra year
log(wage) = β0 + β1·educ + u  ⇔  wage = exp(β0 + β1·educ + u)
[Figure: wage as a function of educ (with u = 0 and β1 > 0): an increasing, convex curve]
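One caveat worth making explicit: 100·β1 is only an approximation of the percentage effect, valid for small β1; the exact effect implied by the log-lin model is 100·(exp(β1) − 1). A sketch using the coefficient from the wage equation above:

```python
# Approximate vs exact percentage effect in a log-lin model.
import math

b1 = 0.083                            # slope from log(wage) = 0.584 + 0.083*educ
approx_pct = 100 * b1                 # the usual reading: ~8.3% per extra year
exact_pct = 100 * (math.exp(b1) - 1)  # exact implied effect: ~8.65% per extra year
```

For small coefficients the two readings are close; the gap widens as |β1| grows.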
3. Lin-log:
demand = 0.584 − 94.3·log(price)
• Interpretation: Δy = (β1/100)·%Δx
• When the price of product X increases by 1%, the demand for that product decreases by about 0.94 thousand units.
4. Log-log:
log(demand) = 0.584 − 0.253·log(price)
• Interpretation: %Δy = β1·%Δx
• When the price of product X increases by 1%, the demand for X decreases by about 0.25% (β1 is the price elasticity of demand)