MULTIPLE LINEAR REGRESSION MODELS
DEBRE BERHAN UNIVERSITY
COLLEGE OF BUSINESS AND ECONOMICS
DEPARTMENT OF ECONOMICS
Solomon Estifanos
MARCH, 2022
DEBRE BERHAN, ETHIOPIA
Outline
Multiple Linear Regression Models
Method of Ordinary Least Squares revised
Partial Correlation Coefficients & their Interpretation
Coefficient of Multiple Determination
Properties of Least Squares and Gauss-Markov Theorem
Hypothesis Testing in Multiple Linear Regression
Predictions using Multiple Linear Regression
Multiple Linear Regression Models
In the preceding chapter, we discussed the simple linear regression in which we
considered a dependent variable to be a function of one independent variable.
In real life, however, a dependent variable is a function of many explanatory
variables.
For instance, in demand studies we study the relationship between quantity
demanded of a good and price of the good, price of substitute goods and the
consumer’s income. The model we assume is:
\[ Y_i = \beta_0 + \beta_1 P_{1i} + \beta_2 P_{2i} + \beta_3 X_i + u_i \]
where $Y_i$ is quantity demanded, $P_{1i}$ is the price of the good, $P_{2i}$ is the price of substitute goods, $X_i$ is
consumer's income, $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ are unknown parameters and $u_i$ is the disturbance.
Method of Ordinary Least Squares Revised
Assumptions of Multiple Linear Regression Model
1. Randomness of the error term: The disturbance $u_i$ is a real random variable.
2. Zero mean of the error term: $E(u_i) = 0$.
3. Homoscedasticity: The variance of each $u_i$ is the same for all values of the explanatory variables, i.e. $E(u_i^2) = \sigma_u^2$ (constant).
4. Normality of u: Each $u_i$ is normally distributed, i.e. $u_i \sim N(0, \sigma_u^2)$.
5. No auto or serial correlation: The value of $u_i$ (corresponding to $X_i$) is independent of the value of any other
$u_j$ (corresponding to $X_j$) for $i \neq j$, i.e. $E(u_i u_j) = 0$ for $i \neq j$.
6. Independence of $u_i$ and the explanatory variables: Every disturbance term is independent of the explanatory variables,
i.e. $E(u_i X_{1i}) = E(u_i X_{2i}) = 0$.
7. No perfect multicollinearity: The explanatory variables are not perfectly linearly correlated.
Note that this list is not exhaustive, but these are the basic assumptions that enable us to
proceed with the analysis.
A Model with Two Explanatory Variables
Estimation of parameters of two-explanatory variables model
The model:
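\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i \]
is a multiple regression with two explanatory variables. The expected value of the above
model is called the population regression equation, i.e.
\[ E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}, \quad \text{since } E(U_i) = 0. \]
Given sample observations on $Y$, $X_1$ and $X_2$, we estimate $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$
using the method of ordinary least squares (OLS). The sample relation between $Y$, $X_1$ and $X_2$ is
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} \]
and the residuals are
\[ e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \]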
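To obtain expressions for the least squares estimators, we partially
differentiate $\sum e_i^2$ with respect to $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ and set the partial derivatives
equal to zero:
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = -2 \sum \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0 \]
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum X_{1i} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0 \]
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum X_{2i} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} \right) = 0 \]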
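Minimizing the residual sum of squares for the multiple regression equation produces three normal equations:
\[ \sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} \]
\[ \sum X_{1i} Y_i = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \]
\[ \sum X_{2i} Y_i = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{1i} X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 \]
From the first equation we obtain $\hat{\beta}_0$:
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 \]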
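Substituting $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$ into
\[ \sum X_{1i} Y_i = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \]
\[ \sum X_{2i} Y_i = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{1i} X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 \]
we obtain the normal equations in deviation form, where lowercase letters denote deviations from the sample means ($y = Y - \bar{Y}$, $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$):
\[ \sum x_1 y = \hat{\beta}_1 \sum x_1^2 + \hat{\beta}_2 \sum x_1 x_2 \]
\[ \sum x_2 y = \hat{\beta}_1 \sum x_1 x_2 + \hat{\beta}_2 \sum x_2^2 \]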
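The substitution step is compressed in the notes. A sketch of the intermediate algebra for the equation in $X_1$ follows; it uses only the standard identity $\sum X_{1i} = n\bar{X}_1$ and the deviation-form identities listed at the end, and the equation in $X_2$ works the same way.

\[ \sum X_{1i} Y_i = \left( \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 \right) n\bar{X}_1 + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \]
\[ \sum X_{1i} Y_i - n\bar{X}_1 \bar{Y} = \hat{\beta}_1 \left( \sum X_{1i}^2 - n\bar{X}_1^2 \right) + \hat{\beta}_2 \left( \sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2 \right) \]
\[ \text{with } \sum x_1 y = \sum X_{1i} Y_i - n\bar{X}_1 \bar{Y}, \quad \sum x_1^2 = \sum X_{1i}^2 - n\bar{X}_1^2, \quad \sum x_1 x_2 = \sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2 \]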
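We can rewrite the above two equations in matrix form as follows:
\[ \begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum y_i x_{1i} \\ \sum y_i x_{2i} \end{bmatrix} \]
Solving this system by Cramer's rule, we obtain:
\[ \hat{\beta}_1 = \frac{\sum x_1 y \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 y}{\sum x_1^2 \cdot \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} \]
\[ \hat{\beta}_2 = \frac{\sum x_2 y \cdot \sum x_1^2 - \sum x_1 x_2 \cdot \sum x_1 y}{\sum x_1^2 \cdot \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} \]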
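The two Cramer's-rule formulas can be sketched as a small function. This is a minimal illustration, not part of the original notes: the function name `ols_two_regressors` and its argument names are assumptions chosen here, and the inputs are the deviation sums reported in the fertilizer and rainfall example worked later in these notes.

```python
def ols_two_regressors(s11, s22, s12, s1y, s2y, y_bar, x1_bar, x2_bar):
    """Return (b0, b1, b2) from deviation-form sums.

    s11 = sum(x1^2), s22 = sum(x2^2), s12 = sum(x1*x2),
    s1y = sum(x1*y), s2y = sum(x2*y), where lowercase variables
    are deviations from their sample means.
    """
    det = s11 * s22 - s12 ** 2            # determinant of the 2x2 system
    if det == 0:
        raise ValueError("perfect multicollinearity: determinant is zero")
    b1 = (s1y * s22 - s12 * s2y) / det    # Cramer's rule for the first slope
    b2 = (s2y * s11 - s12 * s1y) / det    # Cramer's rule for the second slope
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


# Sums taken from the fertilizer/rainfall example in these notes
b0, b1, b2 = ols_two_regressors(
    s11=384_000, s22=622.5, s12=5_700,
    s1y=18_400, s2y=620,
    y_bar=61, x1_bar=460, x2_bar=20.5,
)
print(round(b1, 3), round(b2, 3), round(b0, 3))
```

The slopes round to the 0.038 and 0.645 reported in the example. Because this sketch keeps the slopes at full precision, the intercept it prints is slightly lower than one computed from the rounded slopes.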
The Coefficient of Determination (R2): Two Explanatory Variables Case
In the simple regression model, we introduced R2 as a measure of the
proportion of variation in the dependent variable that is explained by
variation in the explanatory variable.
In multiple regression model the same measure is relevant, and the same
formulas are valid but now we talk of the proportion of variation in the
dependent variable explained by all explanatory variables included in the
model. The coefficient of determination is:
\[ R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2} \]
The total variation in the dependent variable decomposes as
\[ \underbrace{\sum y_i^2}_{\text{total sum of squares (total variation)}} = \underbrace{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}_{\text{explained sum of squares (explained variation)}} + \underbrace{\sum e_i^2}_{\text{residual sum of squares (unexplained variation)}} \]
so that
\[ R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}{\sum y_i^2} \]
If $R^2$ is high, there is a close association between the values of $Y_i$
and the values predicted by the model, $\hat{Y}_i$. In this case, the model is said to
“fit” the data well.
If $R^2$ is low, there is little association between the values of $Y_i$ and the values
predicted by the model, $\hat{Y}_i$, and the model does not fit the data well.
Adjusted Coefficient of Determination ($\bar{R}^2$)
One difficulty with $R^2$ is that it can be made large by adding more and more variables,
even if the variables added have no economic justification.
Algebraically, as variables are added the residual sum of squares (RSS)
goes down (it can remain unchanged, but this is rare) and thus $R^2$ goes up. If the model
contains n-1 explanatory variables in addition to the intercept, then $R^2 = 1$.
Manipulating the model just to obtain a high $R^2$ is not wise. An alternative measure of
goodness of fit, called the adjusted $R^2$ and often symbolized as $\bar{R}^2$, is usually reported by
regression programs. It is computed as:
\[ \bar{R}^2 = 1 - \frac{\sum e_i^2 / (n-k)}{\sum y_i^2 / (n-1)} = 1 - (1 - R^2) \frac{n-1}{n-k} \]
where n is the number of observations and k is the number of estimated parameters, including the intercept.
This measure does not always go up when a variable is added, because the
degrees-of-freedom term n-k divides the residual sum of squares in $\sum e_i^2 / (n-k)$.
As the number of parameters k increases, RSS goes down, but so does n-k. The effect
on $\bar{R}^2$ depends on the amount by which RSS falls.
While solving one problem, this corrected measure of goodness of fit unfortunately
introduces another one. It loses its interpretation: $\bar{R}^2$ is no longer the percentage of
variation explained.
This modified measure is sometimes used, and misused, as a device for selecting the
appropriate set of explanatory variables.
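The adjustment can be sketched as a short function. The name `adjusted_r2` is an assumption made here; the first two calls use figures that appear elsewhere in these notes (the fertilizer example and the Stata output), while the third uses a made-up $R^2$ of 0.79 to show how a fourth parameter that barely raises $R^2$ can lower the adjusted value.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared; k counts all estimated parameters, intercept included."""
    if n <= k:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Fertilizer example in these notes: R^2 = 0.7863, n = 10, k = 3
adj_example = adjusted_r2(0.7863, n=10, k=3)

# Stata output in these notes: R^2 = 0.7503, n = 30, k = 3
adj_stata = adjusted_r2(0.7503, n=30, k=3)

# Hypothetical: a fourth parameter lifts R^2 only to 0.79 (made-up number)
adj_extra = adjusted_r2(0.79, n=10, k=4)

print(round(adj_example, 4), round(adj_stata, 4), round(adj_extra, 4))
```

The second value reproduces the Adj R-squared line of the Stata output, which is a quick check that k is meant to include the intercept.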
Hypothesis Testing in Multiple Regression Model
Tests of Individual Significance
If we invoke the assumption that $U_i \sim N(0, \sigma^2)$, then we can use either the t-test or the
standard error test to test a hypothesis about any individual partial regression coefficient.
To illustrate, consider the following example. Let
\[ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + e_i \]
A. $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$
B. $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$
The null hypothesis (A) states that, holding $X_2$ constant, $X_1$ has no (linear) influence on Y.
Similarly, null hypothesis (B) states that, holding $X_1$ constant, $X_2$ has no (linear) influence on the
dependent variable Y. To test these null hypotheses we will use the following tests:
Standard Error Test
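Find the standard errors of the parameter estimates from their variances:
\[ Var(\hat{\beta}_0) = \left[ \frac{1}{n} + \frac{\bar{X}_1^2 \sum x_2^2 + \bar{X}_2^2 \sum x_1^2 - 2 \bar{X}_1 \bar{X}_2 \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} \right] \hat{\sigma}^2 \]
\[ Var(\hat{\beta}_1) = \frac{\hat{\sigma}^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} \quad \text{and} \quad Var(\hat{\beta}_2) = \frac{\hat{\sigma}^2 \sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} \]
\[ SE(\hat{\beta}_j) = \sqrt{Var(\hat{\beta}_j)}, \quad \text{where } \hat{\sigma}^2 = \frac{\sum e_i^2}{n-3} \]
Decision Rule:
• If $SE(\hat{\beta}_1) > \tfrac{1}{2} |\hat{\beta}_1|$, we do not reject the null hypothesis; that is, we conclude that $\hat{\beta}_1$ is not
statistically significant.
• If $SE(\hat{\beta}_1) < \tfrac{1}{2} |\hat{\beta}_1|$, we reject the null hypothesis; that is, we conclude that the estimate $\hat{\beta}_1$ is
statistically significant.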
Note: The smaller the standard errors, the stronger the evidence that the estimates are statistically reliable.
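As a sketch of how the slope standard errors and the standard error test fit together, the snippet below evaluates the variance formulas with the sums from the fertilizer and rainfall example in these notes. The function names are assumptions introduced here for illustration.

```python
import math


def slope_std_errors(rss, n, s11, s22, s12):
    """Return (sigma2_hat, se_b1, se_b2) for the two-regressor model.

    s11 = sum(x1^2), s22 = sum(x2^2), s12 = sum(x1*x2) in deviation form.
    """
    sigma2_hat = rss / (n - 3)          # three estimated parameters
    det = s11 * s22 - s12 ** 2
    se_b1 = math.sqrt(sigma2_hat * s22 / det)
    se_b2 = math.sqrt(sigma2_hat * s11 / det)
    return sigma2_hat, se_b1, se_b2


def passes_standard_error_test(estimate, std_error):
    """True when SE < |estimate| / 2, i.e. H0: beta = 0 is rejected."""
    return std_error < abs(estimate) / 2


sigma2_hat, se_b1, se_b2 = slope_std_errors(
    rss=284.686, n=10, s11=384_000, s22=622.5, s12=5_700
)
fert_significant = passes_standard_error_test(0.038, se_b1)
rain_significant = passes_standard_error_test(0.645, se_b2)
print(round(sigma2_hat, 2), round(se_b1, 4), round(se_b2, 3))
print(fert_significant, rain_significant)
```

The rule SE < |estimate| / 2 is the same as |t| > 2, so it is a rule of thumb. With few degrees of freedom the tabulated t value exceeds 2, so the two tests can disagree, as happens for rainfall here.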
Student t-test
\[ t_{cal} = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)} \sim t_{n-k} \]
where n is the number of observations and k is the number of estimated parameters. If we have 3 parameters, the
degrees of freedom will be n-3.
Under the null hypothesis $\beta_2 = 0$, the test statistic becomes:
\[ t_{cal} = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} \]
• If $|t_{cal}| < t_{tab}$, we do not reject the null hypothesis, i.e. we conclude that $\hat{\beta}_2$ is not significant
and hence the regressor does not appear to contribute to the explanation of the variations in Y.
• If $|t_{cal}| > t_{tab}$, we reject the null hypothesis, i.e. $\hat{\beta}_2$ is statistically significant. Thus, the greater
the value of $|t_{cal}|$, the stronger the evidence that $\hat{\beta}_2$ is statistically significant.
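The decision rule can be written as a small helper. This is only a sketch: the name `t_test` is an assumption, and the tabulated value is passed in by hand (2.365 for 7 degrees of freedom at the 5% level, as used in the worked example in these notes) rather than computed.

```python
def t_test(estimate, std_error, t_tab, hypothesized=0.0):
    """Return (t_cal, reject) for a two-sided test of H0: beta = hypothesized."""
    t_cal = (estimate - hypothesized) / std_error
    return t_cal, abs(t_cal) > t_tab


# Rounded estimates and standard errors from the worked example in these notes
t_fert, reject_fert = t_test(0.038, 0.0111, t_tab=2.365)
t_rain, reject_rain = t_test(0.645, 0.275, t_tab=2.365)
print(round(t_fert, 3), reject_fert)
print(round(t_rain, 3), reject_rain)
```

Taking the absolute value inside the helper makes the two-sided rule work for negative coefficients as well.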
Test of Overall Significance
Throughout the previous section we were concerned with testing the
significance of the estimated partial regression coefficients individually, i.e.
under the separate hypothesis that each of the true population partial
regression coefficients was zero.
In this section we extend this idea to joint test of the relevance of all the
included explanatory variables. Now consider the following:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_{k-1} X_{k-1,i} + U_i \]
where k is the number of parameters including the intercept, so the model has k-1 slope coefficients.
\[ H_0: \beta_1 = \beta_2 = \dots = \beta_{k-1} = 0 \]
\[ H_1: \text{at least one of the } \beta_j \text{ is non-zero} \]
This null hypothesis is a joint hypothesis that $\beta_1, \beta_2, \dots, \beta_{k-1}$ are jointly or
simultaneously equal to zero. A test of such a hypothesis is called a test of
overall significance of the observed or estimated regression line, that is,
whether Y is linearly related to $X_1, X_2, \dots, X_{k-1}$.
The overall significance test for multiple linear regression is based on the F-distribution, and the
formula needed to compute the calculated F value is
\[ F = \frac{(TSS - RSS)/(k-1)}{RSS/(n-k)} = \frac{ESS/(k-1)}{RSS/(n-k)} \sim F(k-1,\ n-k) \]
Dividing the numerator and the denominator by TSS gives the equivalent form
\[ F = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \]
• If $F_{cal} > F_{tab}$, we reject the null hypothesis, i.e. the parameters of the model are
jointly significant, or the dependent variable Y is linearly related to the independent
variables included in the model.
• If $F_{cal} < F_{tab}$, we do not reject the null hypothesis, i.e. the parameters of the model
are not jointly significant, or the dependent variable Y is not linearly related to the
independent variables included in the model.
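The two forms of the F statistic can be checked against each other numerically. The sketch below uses made-up sums of squares (ESS = 80, RSS = 20, n = 23, k = 3), chosen only so the arithmetic is easy to follow. The function names are assumptions.

```python
def f_from_sums(ess, rss, n, k):
    """F statistic from explained and residual sums of squares."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r2(r2, n, k):
    """Same statistic written in terms of R-squared."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Hypothetical numbers, not taken from the notes
ess, rss, n, k = 80.0, 20.0, 23, 3
r2 = ess / (ess + rss)

f_a = f_from_sums(ess, rss, n, k)
f_b = f_from_r2(r2, n, k)
print(f_a, round(f_b, 6))
```

The two values agree up to floating-point error whenever ESS + RSS = TSS holds exactly. This is worth keeping in mind when sums of squares have been computed from rounded coefficients.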
Predictions Using Multiple Linear Regressions
The formulas for prediction in the multiple regression are similar to those in the case of simple
regression. Let the estimated regression be
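\[ \hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} \]
Now consider the prediction of the value $Y_0$ of Y given values $X_{10}$ of $X_1$ and $X_{20}$ of $X_2$. The true value is
\[ Y_0 = \alpha + \beta_1 X_{10} + \beta_2 X_{20} + U_0 \]
where $U_0$ is the error term, and we predict it by
\[ \hat{Y}_0 = \hat{\alpha} + \hat{\beta}_1 X_{10} + \hat{\beta}_2 X_{20} \]
Hence, the prediction error is
\[ \hat{Y}_0 - Y_0 = (\hat{\alpha} - \alpha) + (\hat{\beta}_1 - \beta_1) X_{10} + (\hat{\beta}_2 - \beta_2) X_{20} - U_0 \]
Since $E(\hat{\alpha} - \alpha) = 0$, $E(\hat{\beta}_1 - \beta_1) = 0$, $E(\hat{\beta}_2 - \beta_2) = 0$ and $E(U_0) = 0$, we have $E(\hat{Y}_0 - Y_0) = 0$.
Note that the predictor is unbiased in the sense that $E(\hat{Y}_0) = E(Y_0)$, since both $\hat{Y}_0$ and $Y_0$ are random
variables.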
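A point prediction is simply the estimated equation evaluated at the chosen regressor values. The sketch below uses the estimated fertilizer and rainfall equation from the example in these notes and checks it against the first observation in the data table. The function name `predict` is an assumption.

```python
def predict(alpha_hat, beta1_hat, beta2_hat, x10, x20):
    """Point prediction of Y at X1 = x10 and X2 = x20."""
    return alpha_hat + beta1_hat * x10 + beta2_hat * x20


# Estimated equation from the example: Yield = 30.298 + 0.038*fert + 0.645*rain
y0_hat = predict(30.298, 0.038, 0.645, x10=100, x20=10)

# Observation 1 in the table has fertilizer 100, rainfall 10 and yield 40
residual = 40 - y0_hat
print(round(y0_hat, 3), round(residual, 3))
```

The prediction and residual match the fitted value and residual listed for observation 1 in the example table.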
Example
In the table below, agricultural output (yield) is explained by the amount of fertilizer
used and the amount of rainfall received. Suppose the econometric model is
expressed as:
\[ \text{Yield}_i = \beta_0 + \beta_1\, \text{fertilizer}_i + \beta_2\, \text{rainfall}_i + e_i \]
A) Examine the significance of each independent variable in affecting yield at the
5% level of significance.
B) Examine the joint significance of the independent variables in affecting
yield at the 5% level of significance.
C) Calculate the coefficient of determination and interpret the result.
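| Obs | Yield ($Y$) | Fertilizer ($X_1$) | Rainfall ($X_2$) | $y = Y - \bar{Y}$ | $x_1 = X_1 - \bar{X}_1$ | $x_2 = X_2 - \bar{X}_2$ | $y^2$ | $x_1^2$ | $x_2^2$ | $y x_1$ | $y x_2$ | $x_1 x_2$ | $\hat{Y}$ | $\hat{e}$ | $\hat{y}^2$ | $\hat{e}^2$ |
| 1 | 40 | 100 | 10 | -21 | -360 | -10.5 | 441 | 129600 | 110.25 | 7560 | 220.5 | 3780 | 40.548 | -0.548 | 418.3048 | 0.300304 |
| 2 | 50 | 200 | 20 | -11 | -260 | -0.5 | 121 | 67600 | 0.25 | 2860 | 5.5 | 130 | 50.798 | -0.798 | 104.091 | 0.636804 |
| 3 | 50 | 300 | 10 | -11 | -160 | -10.5 | 121 | 25600 | 110.25 | 1760 | 115.5 | 1680 | 48.148 | 1.852 | 165.1868 | 3.429904 |
| 4 | 70 | 400 | 30 | 9 | -60 | 9.5 | 81 | 3600 | 90.25 | -540 | 85.5 | -570 | 64.848 | 5.152 | 14.80326 | 26.543104 |
| 5 | 65 | 500 | 20 | 4 | 40 | -0.5 | 16 | 1600 | 0.25 | 160 | -2 | -20 | 62.198 | 2.802 | 1.434006 | 7.851204 |
| 6 | 65 | 600 | 20 | 4 | 140 | -0.5 | 16 | 19600 | 0.25 | 560 | -2 | -70 | 65.998 | -0.998 | 24.97501 | 0.996004 |
| 7 | 80 | 700 | 30 | 19 | 240 | 9.5 | 361 | 57600 | 90.25 | 4560 | 180.5 | 2280 | 76.248 | 3.752 | 232.4863 | 14.077504 |
| 8 | 75 | 600 | 25 | 14 | 140 | 4.5 | 196 | 19600 | 20.25 | 1960 | 63 | 630 | 69.223 | 5.777 | 67.60951 | 33.373729 |
| 9 | 55 | 500 | 30 | -6 | 40 | 9.5 | 36 | 1600 | 90.25 | -240 | -57 | 380 | 68.648 | -13.648 | 58.48426 | 186.2679 |
| 10 | 60 | 700 | 10 | -1 | 240 | -10.5 | 1 | 57600 | 110.25 | -240 | 10.5 | -2520 | 63.348 | -3.348 | 5.510756 | 11.209104 |

Here $\hat{Y}$ is the fitted value, $\hat{e} = Y - \hat{Y}$ is the residual and $\hat{y} = \hat{Y} - \bar{Y}$. Column totals:
$\sum Y = 610$, $\sum X_1 = 4{,}600$, $\sum X_2 = 205$
$\sum y^2 = 1390$, $\sum x_1^2 = 384{,}000$, $\sum x_2^2 = 622.5$
$\sum y x_1 = 18{,}400$, $\sum y x_2 = 620$, $\sum x_1 x_2 = 5{,}700$
$\sum \hat{y}^2 = 1092.89$, $\sum \hat{e}^2 = 284.686$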
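The deviation columns and their totals can be rebuilt from the three raw data columns. The sketch below does this with plain Python lists; the helper names `deviations` and `cross_sum` and the dictionary keys are assumptions introduced here.

```python
# Raw data columns from the table: yield, fertilizer, rainfall
yield_obs = [40, 50, 50, 70, 65, 65, 80, 75, 55, 60]
fertilizer = [100, 200, 300, 400, 500, 600, 700, 600, 500, 700]
rainfall = [10, 20, 10, 30, 20, 20, 30, 25, 30, 10]


def deviations(values):
    """Subtract the sample mean from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross_sum(a, b):
    """Sum of element-wise products of two equally long lists."""
    return sum(p * q for p, q in zip(a, b))


y, x1, x2 = deviations(yield_obs), deviations(fertilizer), deviations(rainfall)

sums = {
    "y2": cross_sum(y, y),
    "x1x1": cross_sum(x1, x1),
    "x2x2": cross_sum(x2, x2),
    "yx1": cross_sum(y, x1),
    "yx2": cross_sum(y, x2),
    "x1x2": cross_sum(x1, x2),
}
for name, value in sums.items():
    print(name, value)
```

These totals are the only inputs needed by the deviation-form formulas for the slope estimates.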
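Then, let us obtain the parameter estimates for
\[ \text{Yield}_i = \beta_0 + \beta_1\, \text{fertilizer}_i + \beta_2\, \text{rainfall}_i + e_i \]
\[ \hat{\beta}_1 = \frac{\sum y x_1 \sum x_2^2 - \sum y x_2 \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} = \frac{18{,}400 \times 622.5 - 620 \times 5{,}700}{384{,}000 \times 622.5 - 5{,}700^2} = \frac{7{,}920{,}000}{206{,}550{,}000} = 0.038 \]
\[ \hat{\beta}_2 = \frac{\sum y x_2 \sum x_1^2 - \sum y x_1 \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2} = \frac{620 \times 384{,}000 - 18{,}400 \times 5{,}700}{384{,}000 \times 622.5 - 5{,}700^2} = \frac{133{,}200{,}000}{206{,}550{,}000} = 0.645 \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 = 61 - 0.038 \times 460 - 0.645 \times 20.5 = 30.298 \]
Then, the estimated regression function will be:
\[ \widehat{\text{Yield}} = \hat{\beta}_0 + \hat{\beta}_1\, \text{fertilizer} + \hat{\beta}_2\, \text{rainfall} = 30.298 + 0.038\, \text{fertilizer} + 0.645\, \text{rainfall} \]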
The procedure for an individual significance test of each independent variable is as follows:
1. Set up the hypotheses. The hypotheses for testing a given regression coefficient are:
\[ H_0: \beta_i = 0 \]
\[ H_1: \beta_i \neq 0 \]
2. Determine the level of significance for carrying out the test. We usually use either the 1% or the 5% level of significance in
applied econometric research.
3. Determine the tabulated value of t from the table with n-k degrees of freedom, where k is the number of parameters
estimated and n is the sample size.
4. Determine the calculated value of t. The test statistic (using the t-test) is given by:
\[ t_{cal} = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \]
The test rule or decision is given as follows:
• Reject $H_0$ if $|t_{cal}| \geq t_{\alpha/2,\ n-k}$
• Do not reject $H_0$ if $|t_{cal}| < t_{\alpha/2,\ n-k}$
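For fertilizer use ($\hat{\beta}_1$):
\[ H_0: \beta_1 = 0 \]
\[ H_1: \beta_1 \neq 0 \]
\[ \alpha = 5\% = 0.05, \quad t_{\alpha/2,\ n-k} = t_{0.025,\ 10-3} = t_{0.025,\ 7} = 2.365 \]
\[ t_{cal} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}, \quad \text{but } se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \sqrt{\frac{\hat{\sigma}^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2}}, \quad \text{where } \hat{\sigma}^2 = \frac{\sum e^2}{n-3} \]
\[ \hat{\sigma}^2 = \frac{\sum e^2}{n-3} = \frac{284.686}{10-3} = 40.67 \]
\[ se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2 \sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2}} = \sqrt{\frac{40.67 \times 622.5}{384{,}000 \times 622.5 - (5{,}700)^2}} = 0.0111 \]
\[ t_{cal} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{0.038}{0.0111} = 3.423 \]
Decision: Since the calculated t ($t_{cal} = 3.423$) is greater than the tabulated value ($t_{tab} = 2.365$), we reject the null hypothesis and
conclude that fertilizer use is statistically significant in determining agricultural yield at the 5% level of significance.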
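For rainfall obtained ($\hat{\beta}_2$):
\[ H_0: \beta_2 = 0 \]
\[ H_1: \beta_2 \neq 0 \]
\[ \alpha = 5\% = 0.05 \]
\[ t_{\alpha/2,\ n-k} = t_{0.025,\ 10-3} = t_{0.025,\ 7} = 2.365 \]
\[ t_{cal} = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)}, \quad \text{but } se(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)} = \sqrt{\frac{\hat{\sigma}^2 \sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2}}, \quad \text{where } \hat{\sigma}^2 = \frac{\sum e^2}{n-3} \]
\[ \hat{\sigma}^2 = \frac{\sum e^2}{n-3} = \frac{284.686}{10-3} = 40.67 \]
\[ se(\hat{\beta}_2) = \sqrt{\frac{\hat{\sigma}^2 \sum x_1^2}{\sum x_1^2 \sum x_2^2 - \left( \sum x_1 x_2 \right)^2}} = \sqrt{\frac{40.67 \times 384{,}000}{384{,}000 \times 622.5 - (5{,}700)^2}} = 0.275 \]
\[ t_{cal} = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} = \frac{0.645}{0.275} = 2.345 \]
Decision: Since the calculated t ($t_{cal} = 2.345$) is less than the tabulated value ($t_{tab} = 2.365$),
we do not reject the null hypothesis and conclude that rainfall is not
statistically significant in determining agricultural yield at the 5% level of significance.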
B. Examine the joint significance of the independent variables in affecting
yield at the 5% level of significance
\[ H_0: \beta_1 = \beta_2 = 0 \]
\[ H_1: \text{at least one } \beta_i \neq 0 \]
\[ \alpha = 5\% = 0.05 \]
\[ F_{\alpha}(k-1,\ n-k) = F_{0.05}(3-1,\ 10-3) = F_{0.05}(2,\ 7) = 4.737 \]
\[ F_{cal} = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{1092.89/(3-1)}{284.686/(10-3)} = \frac{546.445}{40.67} = 13.436 \]
Decision: Since the calculated F ($F_{cal} = 13.436$) is greater than the tabulated value
($F_{tab} = 4.737$), we reject the null hypothesis and conclude that fertilizer use and
rainfall are jointly statistically significant in determining agricultural
yield. In short, the model as a whole is statistically significant.
C. Calculate the coefficient of determination and interpret the result
\[ R^2 = \frac{ESS}{TSS} \]
where ESS = explained sum of squares and
TSS = total sum of squares.
\[ R^2 = \frac{ESS}{TSS} = \frac{1092.89}{1390} = 0.7863 = 78.63\% \]
Interpretation: 78.63% of the variation in agricultural yield
is explained by the variation in fertilizer use and rainfall; the remaining
21.37% is unexplained. In other words, there may be other important
explanatory variables left out of the model that contribute to the variation in agricultural
yield.
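A reader checking the arithmetic may notice that the tabulated ESS (1092.89) and RSS (284.686) do not add up to TSS (1390) exactly. The sketch below compares $R^2$ computed both ways and recomputes ESS from full-precision slopes. The gap appears to come from the slopes having been rounded to three decimals before the fitted values were computed. That explanation is an inference from the numbers, not something the notes state.

```python
tss = 1390.0          # sum of y^2 from the table
ess_table = 1092.89   # sum of squared fitted deviations, as tabulated
rss_table = 284.686   # sum of squared residuals, as tabulated

r2_from_ess = ess_table / tss
r2_from_rss = 1 - rss_table / tss
gap = tss - (ess_table + rss_table)   # would be 0 with exact OLS estimates

# Full-precision slopes from the deviation sums in the table
det = 384_000 * 622.5 - 5_700 ** 2
b1 = (18_400 * 622.5 - 620 * 5_700) / det
b2 = (620 * 384_000 - 18_400 * 5_700) / det

# ESS = b1 * sum(x1*y) + b2 * sum(x2*y)
ess_exact = b1 * 18_400 + b2 * 620
r2_exact = ess_exact / tss

print(round(r2_from_ess, 4), round(r2_from_rss, 4), round(gap, 3))
print(round(ess_exact, 2), round(r2_exact, 4))
```

Under these assumptions the full-precision $R^2$ is close to the value obtained from 1 - RSS/TSS, which suggests that form is less sensitive to rounding of the coefficients.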
Example using STATA Output
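. reg qtybeer pricebeer incomecons

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   40.57
       Model |  1343.37384     2  671.686918           Prob > F      =  0.0000
    Residual |   447.04083    27  16.5570678           R-squared     =  0.7503
-------------+------------------------------           Adj R-squared =  0.7318
       Total |  1790.41467    29  61.7384368           Root MSE      =   4.069

------------------------------------------------------------------------------
     qtybeer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   pricebeer |   -27.6527   5.438333    -5.08   0.000    -38.81124   -16.49416
  incomecons |   .0025803   .0007689     3.36   0.002     .0010026    .0041581
       _cons |   57.15986   9.467852     6.04   0.000     37.73343    76.58629
------------------------------------------------------------------------------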
\[ \widehat{qtybeer} = 57.160 - 27.653 \times pricebeer + 0.0026 \times incomecons \]
For each one-unit increase in the price of beer, the quantity of beer demanded
decreases by 27.653 million liters, holding consumer income constant.
For each one-unit increase in consumer income, the quantity of beer demanded
increases by 0.0026 million liters, holding the price of beer constant.
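The summary statistics in the Stata header can be reproduced from the ANOVA block alone, which ties the output back to the formulas for F, $R^2$ and $\bar{R}^2$. The sketch below copies the sums of squares and degrees of freedom from the output. The variable names are assumptions.

```python
import math

# Copied from the Stata ANOVA block
ss_model, df_model = 1343.37384, 2
ss_resid, df_resid = 447.04083, 27

ms_model = ss_model / df_model            # mean square, model
ms_resid = ss_resid / df_resid            # mean square, residual
f_stat = ms_model / ms_resid              # F(2, 27)
r2 = ss_model / (ss_model + ss_resid)     # R-squared = ESS / TSS
adj_r2 = 1 - (1 - r2) * (df_model + df_resid) / df_resid
root_mse = math.sqrt(ms_resid)            # standard error of the regression

# t ratio for pricebeer: coefficient divided by its standard error
t_price = -27.6527 / 5.438333

print(round(f_stat, 2), round(r2, 4), round(adj_r2, 4), round(root_mse, 3))
print(round(t_price, 2))
```

Each printed value should match the corresponding entry in the Stata output up to rounding.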