Topic 3 - Simple Regression Analysis
3.1 INTRODUCTION
Just like correlation, regression analysis is concerned with the study of the dependence of one
variable (the explained/dependent variable) on one or more other variables (the
explanatory/independent variable(s)), with a view to estimating and/or predicting the average value
of the dependent variable. The main advantage of regression is that it can easily be extended to
more than two variables.
To illustrate, consider the scatter graph below which shows the relationship between two variables
x and y and a line fitted among the scatter points.
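[Figure: scatter plot of Y against X with a fitted straight line; points above the line have positive deviations (+u) and points below it have negative deviations (-u).]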
From the scatter graph, we notice that despite the variability between X and Y, there is a general
“tendency” for the variables to move together, i.e., as X increases, so does Y, as shown by the line
of best fit. The line fitted among the scatter points is the regression line. Thus regression
analysis aims at finding the line of best fit among variables. The regression line is given as:
$Y = \beta_0 + \beta_1 X$, such that:
$Y$ ≡ dependent variable, $X$ ≡ explanatory variable, $\beta_0$ = intercept and $\beta_1$ = slope.
However, not all the scatter points lie on the fitted line: some are above the line while others are
below it, and only a few are on the line. The relationship is therefore written as
$Y = \beta_0 + \beta_1 X + u$, such that $u$ is called an error term or the disturbance term. The
function $Y = \beta_0 + \beta_1 X + u$ is called a stochastic or statistical function. For points
above the regression line, $u$ is positive, while for points below the regression line, $u$ is
negative. A variable such as $u$, which can take on any set of values, positive or negative, with a
given probability, is called a random variable or a stochastic variable. Thus the error term implies
that not all points will lie on the line. It represents the variation in Y that cannot be explained
by X.
The reasons for including an error term in a regression model are discussed later in this topic.
A point to note is that any regression analysis must be guided a priori by economic theory.
Thus, although the function $Y = \beta_0 + \beta_1 X + u$ assumes causation, i.e. that X causes Y, this
causation must be informed by economic theory.
A distinction is always made between correlation and regression. While correlation analysis
aims at finding the strength of linear association between variables, regression analysis, on the
other hand, aims at finding the direction of relationship between variables. There are thus two
(2) primary differences between correlation and regression analysis, as outlined below:
CORRELATION | REGRESSION
1) Assumes symmetry between X and Y, i.e., there is no distinction as to which variable is dependent (causality is not important) | Assumes asymmetry between X and Y, i.e., distinguishes which variable is dependent and which is explanatory (causality is important)
Thus, correlation does not imply causality, whereas regression presupposes a direction of dependence from X to Y, which must itself be justified by theory.
In simple regression analysis, we study the effect of only one explanatory variable on the
dependent variable. For example, how X affects Y in: $Y = \beta_0 + \beta_1 X + u$. Thus, Y = f(X).
For this reason, simple regression analysis is also known as two-variable regression analysis
or bivariate regression.
In multiple regression analysis, we study the effect of more than one explanatory variable on the
dependent variable. For example, how X1, X2 and X3 will affect Y in:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$. Thus Y = f(X1, X2, ..., Xn).
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (3.1)
Where β0 and β1 are unknown but fixed parameters known as the regression coefficients. They are
also known as intercept and slope coefficients, respectively. In regression analysis our interest is
in estimating the values of unknowns β0 and β1 on the basis of observations on Y and X. ɛi is an
unobservable random variable known as the stochastic disturbance or stochastic error term. The
stochastic term ɛi captures other variables besides X that affect Y and are not included in the model.
As indicated, the disturbance term ɛi is a surrogate for all those variables that are omitted from the
model but that collectively affect Y. The obvious question is: Why not introduce these variables
into the model explicitly? I.e. why not develop a multiple regression model with as many variables
as possible? The reasons are many.
1. Vagueness of theory: The theory, if any, determining the behaviour of Y may be, and often
is, incomplete. We might know for certain that X influences Y, but we might be ignorant or
unsure about the other variables affecting Y. Therefore, ɛi may be used as a substitute for all the
excluded or omitted variables from the model.
2. Unavailability of data: Even if we know what some of the excluded variables are and
therefore consider a multiple regression rather than a simple regression, we may not have
quantitative information about these variables. It is a common experience in empirical analysis,
that the data we would ideally like to have often are not available. For example, in principle
we could introduce family wealth as an explanatory variable in addition to the income variable
to explain family consumption expenditure. But unfortunately, information on family wealth
generally is not available. Therefore, we may be forced to omit the wealth variable from our
model despite its great theoretical relevance in explaining consumption expenditure.
3. Core Variables versus Peripheral Variables: Assume in our consumption income example
that besides income X1, the number of children per family X2, sex X3, religion X4, education
X5, and geographical region X6 also affect consumption expenditure. But it is quite possible
that the joint influence of all or some of these variables may be so small and at best
nonsystematic or random that as a practical matter and for cost considerations it does not pay
to introduce them into the model explicitly. One hopes that their combined effect can be treated
as a random variable ɛi.
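4. Intrinsic randomness in human behaviour: Even if all the relevant variables are introduced into
the model, there is bound to be some intrinsic randomness in individual Y’s that cannot be
explained no matter how hard we try. The disturbances, the ɛ’s, may very well reflect this
intrinsic randomness.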
5. Poor Proxy Variables: Although the classical regression model assumes that the variables
Y and X are measured accurately, in practice the data may be plagued by errors of measurement.
Consider, for example, Milton Friedman’s well-known theory of the consumption function. He
regards permanent consumption (Yp) as a function of permanent income (Xp). But since data
on these variables are not directly observable, in practice we use proxy variables, such as
current consumption (Y) and current income (X), which are observable. Since the observed
Y and X may not equal Yp and Xp, there is the problem of errors of measurement. The
disturbance term ɛ may in this case also represent the errors of measurement. As we will
see in a later chapter, if there are such errors of measurement, they can have serious
implications for estimating the regression coefficients, the β’s.
For all these reasons, the stochastic disturbances ɛi assume an extremely critical role in regression
analysis, which we will see as we progress.
Equation 3.1 describes a Population Regression Function (PRF). However, in most practical
situations what we have is but a sample of Y values corresponding to some fixed X’s. Therefore,
our task now is to estimate the PRF on the basis of the sample information. That is, our primary
objective in regression analysis is to estimate the PRF (equation 3.1) on the basis of the Sample
Regression Function (SRF) because more often than not our analysis is based upon a single sample
from some population. The SRF is given as follows:
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{e}_i$ (3.2)
where $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimators of $\beta_0$ and $\beta_1$ respectively. Because of sampling fluctuations our
estimate of the PRF based on the SRF is at best an approximate one. The critical question now is:
given that the SRF is but an approximation of the PRF, can we devise a rule or a method that will
make this approximation as “close” as possible? In other words, how should the SRF be
constructed so that $\hat{\beta}_0$ is as “close” as possible to the true $\beta_0$ and $\hat{\beta}_1$ is as “close” as possible to the
true $\beta_1$, even though we will never know the true $\beta_0$ and $\beta_1$? To that end, we must not only specify
the functional form of the model, but also make certain assumptions about the manner in which the Yi
are generated. To see why this requirement is needed, look at the PRF: Yi = β0 + β1Xi + ɛi. It shows
that Yi depends on both Xi and ɛi. Therefore, unless we are specific about how Xi and ɛi are created
or generated, there is no way we can make any statistical inference about the Yi and also about β0
and β1. Thus, the assumptions made about the Xi variable(s) and the error term are extremely
critical to the valid interpretation of the regression estimates.
The Classical Linear Regression Model (CLRM), which is the cornerstone of most econometric
theory, makes the assumptions listed below. We first discuss these assumptions in the context of the
two-variable regression model; and in Chapter 4 we extend them to multiple regression models.
1. Linear Regression Model: The regression model is linear in the parameters, correctly
specified, and has an additive error term (the regressand Y and the regressor X themselves
may be nonlinear). Linearity in the parameters means that the regression coefficients do not
enter the function being estimated as exponents (although the variables can have exponents).
2. X values are fixed in repeated sampling: Values taken by the regressor X are considered
fixed in repeated samples. More technically, X is assumed to be non-stochastic. “Fixed values
in repeated sampling” can be explained using an example. Let Y be consumption expenditure
and X weekly income. Keeping the value of income X fixed, say, at level $80, we draw
at random a family and observe its weekly family consumption expenditure Y as, say, $60.
Still keeping X at $80, we draw at random another family and observe its Y value as $7. In
each of these drawings (i.e., repeated sampling), the value of X is fixed at $80. We can repeat
this process for all the X values we want. What all this means is that our regression analysis is
conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
3. Zero mean value of disturbance ɛi: Given the value of X, the mean, or expected, value of the
random disturbance term ɛi is zero. Technically, the conditional mean value of ɛi is zero.
Symbolically, we have
𝐸 (𝜀𝑖|𝑋𝑖 ) = 0
This assumption implies that the average or mean value of these deviations corresponding to
any given X should be zero. That is, the factors not explicitly included in the model, and
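therefore subsumed in ɛi, do not systematically affect the mean value of Y.
4. Homoscedasticity or constant variance of ɛi: Given the value of X, the variance of ɛi is the
same for all observations. Symbolically, we have
$Var(\varepsilon_i \mid X_i) = \sigma^2$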
5. The error terms are uncorrelated with each other: No autocorrelation or serial correlation.
Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ɛi and ɛj (i ≠ j) is
zero.
6. Disturbance ɛ and explanatory variable X are uncorrelated: X and ɛ (which represents the
influence of all the omitted variables) have separate (and additive) influences on Y. But if X and
ɛ are correlated, it is not possible to assess their individual effects on Y. Thus, if X and ɛ are
positively correlated, X increases when ɛ increases and it decreases when ɛ decreases.
Similarly, if X and ɛ are negatively correlated, X increases when ɛ decreases and it decreases
when ɛ increases. In either case, it is difficult to isolate the influence of X and ɛ on Y.
7. The number of observations n must be greater than the number of parameters to be
estimated: Alternatively, the number of observations n must be greater than the number of
explanatory variables.
8. The regression model is correctly specified: Alternatively, there is no specification bias or
error in the model used in empirical analysis.
9. There is no perfect multicollinearity: That is, there are no perfect linear relationships among
the explanatory variables.
10. The error term is normally distributed (optional assumption for hypothesis testing) with
zero mean and constant variance. Symbolically, 𝜀~𝑁(0, 𝜎 2 ).
11. The values for the independent variables are derived from a random sample of the population,
and they contain variability.
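Assumptions 2 and 3 above (X fixed in repeated sampling, zero conditional mean of the disturbance) can be illustrated with a small simulation. The sketch below is only an illustration: the intercept, slope and error standard deviation are hypothetical values chosen for the example, not estimates from any data in these notes.

```python
import random
import statistics

# Hypothetical population regression function (values assumed for illustration only).
BETA0, BETA1, SIGMA = 17.0, 0.6, 5.0

def draw_consumption(income, rng):
    """Draw one family's consumption Y for a given (fixed) income X."""
    disturbance = rng.gauss(0.0, SIGMA)   # E(e | X) = 0, Var(e | X) = SIGMA**2
    return BETA0 + BETA1 * income + disturbance

rng = random.Random(42)
income = 80.0                              # X is held fixed across the repeated draws
draws = [draw_consumption(income, rng) for _ in range(20000)]

# With a zero-mean disturbance, the average Y at X = 80 should be close to
# BETA0 + BETA1 * 80 = 65.
conditional_mean = statistics.fmean(draws)
print(round(conditional_mean, 2))
```

Because the disturbance averages out to zero, the sample mean of Y at the fixed income level settles near the value given by the regression line, which is what "conditional regression analysis" refers to.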
The ordinary least squares (OLS) estimators are the main technique used to estimate regression models.
The name OLS is derived from the fact that OLS aims at minimizing the sum of squared residuals.
In so doing, OLS finds the values of the model parameters ($\hat{\beta}_0$ and $\hat{\beta}_1$) that give the line of best fit.
The ordinary least squares estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$) are derived using the following eight steps:
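STEP 1: Begin with the observed values and the sample regression line and obtain the residual as the
difference of the two:
$e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$
STEP 2: Square the residuals and sum them over all observations:
$e_i^2 = (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$
$\sum e_i^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$
The result, i.e. $\sum e_i^2$, is called the SUM OF SQUARED RESIDUALS (RSS).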
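STEP 3: Obtain the partial derivatives of the sum of squared residuals with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$
as follows:
$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = 2\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)(-1)$
$= -2\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)$
$= -2\sum Y_i + 2\sum \hat{\beta}_0 + 2\hat{\beta}_1 \sum X_i$, but $\sum \hat{\beta}_0 = n\hat{\beta}_0$
$= -2\sum Y_i + 2n\hat{\beta}_0 + 2\hat{\beta}_1 \sum X_i$ (1)
$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = 2\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)(-X_i)$
$= -2\sum X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)$
$= -2\sum Y_i X_i + 2\hat{\beta}_0 \sum X_i + 2\hat{\beta}_1 \sum X_i^2$ (2)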
STEP 4: The first-order necessary condition for a maximum or minimum requires that each partial
derivative is equal to zero. Thus we equate equation 1 and equation 2 to zero:
From Equation 1:
$-2\sum Y_i + 2n\hat{\beta}_0 + 2\hat{\beta}_1 \sum X_i = 0$
$2n\hat{\beta}_0 + 2\hat{\beta}_1 \sum X_i = 2\sum Y_i$
$n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i = \sum Y_i$ (3)
From Equation 2:
$-2\sum Y_i X_i + 2\hat{\beta}_0 \sum X_i + 2\hat{\beta}_1 \sum X_i^2 = 0$
$2\hat{\beta}_0 \sum X_i + 2\hat{\beta}_1 \sum X_i^2 = 2\sum Y_i X_i$
$\hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 = \sum Y_i X_i$ (4)
The two resulting equations, 3 and 4, are the famous normal equations.
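STEP 5: Check whether the normal equations maximize or minimize the residual sum of squares $\sum e_i^2$.
To do so, we use the second-order conditions on equations 1 and 2 as follows.
$\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_0^2} = 2n > 0$ (minimum)
$\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1^2} = 2\sum X_i^2 > 0$ (minimum)
Since both second-order derivatives are positive, and the determinant of the matrix of second
derivatives, $4[n\sum X_i^2 - (\sum X_i)^2]$, is also positive whenever X varies, the second-order
conditions are positive definite. It means that $\hat{\beta}_0$ and $\hat{\beta}_1$ will indeed minimize
the residual sum of squares.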
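The second-order condition can also be checked numerically. The sketch below uses a small hypothetical data set (the five X and Y values are invented for illustration) and confirms that moving the intercept or slope slightly away from the solution of the normal equations always increases the RSS.

```python
# Hypothetical data, used only to illustrate the second-order condition.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

def rss(b0, b1):
    """Residual sum of squares for a candidate intercept b0 and slope b1."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))

n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)

# Solution of the two normal equations.
b1_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0_hat = sy / n - b1_hat * sx / n

best = rss(b0_hat, b1_hat)
# Every nearby candidate should have a larger RSS if the solution is a minimum.
neighbours = [rss(b0_hat + d0, b1_hat + d1)
              for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)
              if (d0, d1) != (0.0, 0.0)]
print(all(r > best for r in neighbours))
```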
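STEP 6: Express the two normal equations (equation 3 and equation 4) in matrix form as
follows:
$n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i = \sum Y_i$
$\hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2 = \sum Y_i X_i$
In matrix form:
$\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum Y_i X_i \end{bmatrix}$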
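STEP 7: Solve for $\hat{\beta}_1$ using Cramer's Rule (replace the second column of the coefficient matrix with the right-hand-side column):
$\hat{\beta}_1 = \frac{\begin{vmatrix} n & \sum Y_i \\ \sum X_i & \sum Y_i X_i \end{vmatrix}}{\begin{vmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{vmatrix}} = \frac{n\sum Y_i X_i - \sum Y_i \sum X_i}{n\sum X_i^2 - (\sum X_i)^2}$
Thus, the OLS estimator $\hat{\beta}_1$ is given by the formula:
$\hat{\beta}_1 = \frac{n\sum Y_i X_i - \sum Y_i \sum X_i}{n\sum X_i^2 - (\sum X_i)^2}$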
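STEP 8: Obtain the intercept parameter $\hat{\beta}_0$, which is given by the formula: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
where $\bar{Y} = \frac{\sum Y_i}{n}$ and $\bar{X} = \frac{\sum X_i}{n}$, i.e. the mean values.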
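The eight steps can be condensed into a few lines of code. The following is a minimal sketch, not a full regression routine: the function name `ols_from_sums` is a hypothetical helper, and the inputs are the sums used for the ABC Company worked example in this topic (n = 10, ΣX = 550, ΣY = 90, ΣXY = 6340, ΣX² = 38500).

```python
def ols_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Solve the two normal equations by Cramer's rule.

    n*b0     + b1*sum_x  = sum_y
    b0*sum_x + b1*sum_xx = sum_xy
    """
    det = n * sum_xx - sum_x ** 2          # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("X has no variation; the normal equations have no unique solution")
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / det   # first column replaced
    b1 = (n * sum_xy - sum_x * sum_y) / det        # second column replaced
    return b0, b1

# Sums for the ABC Company example (10 years of sales X and profit Y).
b0, b1 = ols_from_sums(n=10, sum_x=550, sum_y=90, sum_xy=6340, sum_xx=38500)
print(round(b0, 4), round(b1, 4))
```

Solving for the intercept by Cramer's rule as well gives the same value as the Step 8 formula, since both come from the same pair of normal equations.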
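Recall the example (on the sales and profit of ABC Company Limited) provided under Correlation
Analysis. From the data table in that example, with sales as X and profit as Y, we got the following:
n = 10, $\sum X_i = 550$, $\sum Y_i = 90$, $\sum Y_i X_i = 6340$, $\sum X_i^2 = 38500$, $\sum xy = 1390$, $\sum x^2 = 8250$ and $\sum y^2 = 244$
(lower-case letters denote deviations from the mean).
Using this information, we can now obtain the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ respectively as
follows: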
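$\hat{\beta}_1 = \frac{n\sum Y_i X_i - \sum Y_i \sum X_i}{n\sum X_i^2 - (\sum X_i)^2} = \frac{10(6340) - (550)(90)}{10(38500) - (550)^2} = \frac{63400 - 49500}{385000 - 302500} = \frac{13900}{82500} = 0.1685$
Similarly, in deviation form, $\hat{\beta}_1 = \frac{\sum xy}{\sum x^2} = \frac{1390}{8250} = 0.1685$
For $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, but $\bar{Y} = \frac{\sum Y}{n} = \frac{90}{10} = 9$ and $\bar{X} = \frac{\sum X}{n} = \frac{550}{10} = 55$, so
$\hat{\beta}_0 = 9 - 0.1685(55) = 9 - 9.2667 = -0.2667$ (the product is computed with the unrounded slope 0.168485)
Thus the OLS regression equation is $\hat{Y} = -0.2667 + 0.1685X$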
From the OLS regression equation, we can also predict or forecast the value of Y for any given
value of X.
For example, given that X = 150, we can predict Y as follows:
$\hat{Y} = -0.2667 + 0.1685X$
$\hat{Y} = -0.2667 + 0.1685(150)$
$\hat{Y} \approx 25$
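The prediction step amounts to evaluating the fitted line. A minimal sketch follows; `predict_profit` is a hypothetical helper name, and the default coefficients are the rounded estimates from the example.

```python
def predict_profit(sales, intercept=-0.2667, slope=0.1685):
    """Fitted profit from the estimated line Y-hat = intercept + slope * sales."""
    return intercept + slope * sales

print(round(predict_profit(150), 4))
```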
Apart from prediction or forecasting, we can also calculate the elasticity of profit with respect
to sales using either point elasticity or arc elasticity as follows:
For point elasticity, the elasticity of profit (p) with respect to sales (s) is:
$e_{p,s} = \frac{\partial p}{\partial s} \cdot \frac{s}{p}$
Thus, since Profit = -0.2667 + 0.1685(Sales), at the mean values of profit and
sales we obtain:
$e_{p,s} = \frac{\partial p}{\partial s} \cdot \frac{\bar{S}}{\bar{P}}$
where $\frac{\partial p}{\partial s} = 0.1685$, $\bar{S} = \frac{\sum S}{n} = \frac{550}{10} = 55$ and $\bar{P} = \frac{\sum P}{n} = \frac{90}{10} = 9$.
Thus, $e_{p,s} = 0.1685 \times \frac{55}{9} = 1.0297$
Interpretation: A 1% increase in the value of sales will lead to a 1.0297% increase in profit,
ceteris paribus. Thus, profit is elastic with respect to sales.
For arc elasticity, the elasticity of profit with respect to sales is:
$e_{p,s} = \frac{\Delta p}{\Delta s} \cdot \frac{S_1 + S_2}{P_1 + P_2}$
Taking $S_1 = 40$ and $S_2 = 60$, with fitted profits $P_1 = 6.4733$ and $P_2 = 9.8433$:
Arc elasticity = $e_{p,s} = 0.1685 \times \frac{40 + 60}{6.4733 + 9.8433} = 1.0327$
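Both elasticity measures can be reproduced from the fitted line. The sketch below assumes the rounded estimates from the example; the function names are hypothetical helpers introduced for illustration.

```python
INTERCEPT, SLOPE = -0.2667, 0.1685      # estimated profit-on-sales line from the example

def fitted_profit(sales):
    """Profit predicted by the fitted line at a given level of sales."""
    return INTERCEPT + SLOPE * sales

def point_elasticity(sales, profit):
    """Slope times the ratio sales/profit, evaluated at a single point."""
    return SLOPE * sales / profit

def arc_elasticity(s1, s2):
    """Slope times (S1 + S2)/(P1 + P2), with P taken from the fitted line."""
    return SLOPE * (s1 + s2) / (fitted_profit(s1) + fitted_profit(s2))

print(round(point_elasticity(55, 9), 4))   # at the sample means
print(round(arc_elasticity(40, 60), 4))    # between sales of 40 and 60
```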
Finally, we can also obtain $\hat{Y}$ (i.e. the estimated value of profits), the residual ($e_i$) and the
squared residuals ($e_i^2$), whose sum is the RSS, in a table with the following columns:
Time | X | Y | $\hat{Y} = -0.2667 + 0.1685X$ | $e_i = Y - \hat{Y}$ | $e_i^2$
The column totals are $\sum e_i = 0.008$ and $\sum e_i^2 = 9.806$.
$E(e_i) = \bar{e} = \frac{\sum e_i}{n} = \frac{0.008}{10} = 0.0008$
The expected value or mean of the residuals should be zero. In this case,
it is not exactly zero due to rounding off.
Thus $E(e_i) = 0$.
The value 9.806 is called the sum of squared residuals, i.e. $\sum e_i^2 = 9.806$ (RSS).
ASSUMPTIONS OF THE ORDINARY LEAST SQUARES
i. The expected value or the mean of the error term is zero: $E(e_i) = \frac{\sum e_i}{n} = 0$
This concept was introduced in the table provided above.
ii. When the variance of the error term is constant, this is the assumption of
homoscedasticity; otherwise, if the variance is not constant, that is a case of
heteroscedasticity, which is a violation of the OLS assumption of
homoscedasticity. Therefore, the error term should be homoscedastic. The problem of
heteroscedasticity is common in cross-sectional data.
iii. The error term is assumed to follow a normal distribution with a mean of zero and a
variance of $\sigma^2$: $e_i \sim N(0, \sigma^2)$
iv. There is a linear relationship between the dependent variable and the independent
variables: $Y = \beta_0 + \beta_1 X + e$
Thus, the relationship between X and Y is linear in the OLS parameters $\beta_0$ and $\beta_1$.
v. Assumption of no multicollinearity:
$corr(X_1, X_2) = 0$.
vi. Assumption of zero correlation between the independent variable and the error
term, i.e., the error term and the independent variable should not be correlated:
$Cov(X_i, e_i) = 0$
vii. The error term in period i and the error term in period j should not be correlated. Thus,
there should be no autocorrelation, otherwise known as serial correlation:
$Cov(e_i, e_j) = E(e_i e_j) = 0$ for all $i \neq j$.
An outlier is a value that is very large or very small, in relation to the rest of the other
observations
The variance of the error term, i.e. $Var(u_i)$, is given by:
$Var(u_i) = \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2}$
N/B: $n - 2$ is called the degrees of freedom (df); we subtract 2 since the regression
model we obtained has 2 OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$.
$\sum \hat{u}_i^2$ is the sum of squared residuals (RSS), which we found earlier: $\sum e_i^2 = 9.806$
Thus $Var(u_i) = \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} = \frac{9.806}{10 - 2} = \frac{9.806}{8} = 1.22575$
THE STANDARD ERROR OF THE REGRESSION MODEL
The standard error of the regression model ($se$, also written $\hat{\sigma}$) is obtained by taking the square root of the
variance of the error term, i.e. $se = \hat{\sigma} = \sqrt{Var(u_i)} = \sqrt{\frac{\sum \hat{u}_i^2}{n - 2}}$
Hence $se = \sqrt{1.22575} = 1.10714$
N/B: The standard error of the regression model is the standard deviation of the Y values
about the estimated regression line. It is not the standard deviation of Y about its own mean.
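The two calculations above can be sketched in code. The function name `error_variance` is a hypothetical helper; the inputs (RSS = 9.806, n = 10) are the values from the ABC Company example.

```python
import math

def error_variance(rss, n, k=2):
    """Estimated error variance: RSS divided by the degrees of freedom n - k."""
    return rss / (n - k)

rss, n = 9.806, 10                      # values from the ABC Company example
sigma2_hat = error_variance(rss, n)
se_regression = math.sqrt(sigma2_hat)   # standard error of the regression
print(round(sigma2_hat, 5), round(se_regression, 5))
```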
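The standard error of the slope coefficient $\hat{\beta}_1$ is given by: $se(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum x^2}}$
From our example, we notice that $\hat{\sigma} = 1.10714$ and $\sum x^2 = 8250$.
Thus,
$se(\hat{\beta}_1) = \frac{1.10714}{\sqrt{8250}} = \frac{1.10714}{90.8295}$
$se(\hat{\beta}_1) = 0.01219$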
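The standard error of the intercept $\hat{\beta}_0$ is given by: $se(\hat{\beta}_0) = \sqrt{\frac{\sum X_i^2}{n \sum x^2}} \cdot \hat{\sigma}$
From our example, we notice that $\hat{\sigma} = 1.10714$, $\sum X_i^2 = 38500$, $n = 10$ and $\sum x^2 = 8250$.
Thus,
$se(\hat{\beta}_0) = \sqrt{\frac{38500}{10 \times 8250}} \times 1.10714$
$se(\hat{\beta}_0) = 0.68313 \times 1.10714$
$se(\hat{\beta}_0) = 0.75632$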
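The t-value of an OLS estimator is obtained as:
$t = \frac{\text{OLS estimator}}{\text{standard error of OLS estimator}}$
For the slope coefficient, $t = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{0.1685}{0.01219} = 13.8228$
This is the calculated t-statistic for $\hat{\beta}_1$.
On the other hand, the t-value for the intercept parameter can be obtained in a similar way as
follows:
$t = \frac{\hat{\beta}_0}{se(\hat{\beta}_0)} = \frac{-0.2667}{0.75632} = -0.35258$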
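The standard errors and t-values of the two coefficients can be reproduced as follows. This is a sketch under the example's rounded inputs, so the last decimal places may differ slightly from the hand calculation.

```python
import math

# Quantities from the ABC Company example.
n, sum_X2, sum_x2 = 10, 38500.0, 8250.0      # raw sum of X^2 and deviation-form sum of x^2
sigma_hat = 1.10714                            # standard error of the regression
b0, b1 = -0.2667, 0.1685                       # intercept and slope estimates

se_b1 = sigma_hat / math.sqrt(sum_x2)
se_b0 = sigma_hat * math.sqrt(sum_X2 / (n * sum_x2))
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0
print(round(se_b1, 5), round(se_b0, 5), round(t_b1, 2), round(t_b0, 2))
```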
THE COMPLETE REGRESSION MODEL
Hence we can now present the complete regression model for ABC Company, where we regressed
profit (Y) on sales (X), as follows:
Profit = -0.2667 + 0.1685(Sales)
se = (0.75632) (0.01219)
t-values = (-0.35258) (13.8228), $R^2 = 0.9598$
More formally, these results can be presented in a table of regression results as follows:
Profit | Coefficient | Std Error | t-value
Constant | -0.2667 | 0.75632 | -0.35258
Sales | 0.1685 | 0.01219 | 13.8228
$R^2 = 0.9598$
THE ADJUSTED $R^2$
The simple $r^2$ has the following limitations:
i. We cannot compare the $r^2$ computed from models which have different dependent
variables. Thus, any re-arrangement of the model will yield different values for $r^2$.
ii. The value of $r^2$ never decreases, and usually increases, as the number of independent variables
in the model increases. With this, $r^2$ loses its usefulness since we cannot tell
whether it is measuring the goodness of fit or the number of independent variables.
iii. $r^2$ also cannot discriminate among models, i.e. it cannot tell us which particular
model to choose among 2 or more models.
Due to the above limitations of $r^2$, an alternative measure of goodness of fit, known
as adjusted $r^2$ or commonly $\bar{R}^2$, has been developed to help overcome these limitations
of the simple $r^2$.
The adjusted $r^2$ is modified or adjusted so as to accommodate the changes in degrees
of freedom that result from the addition or removal of independent variables in
a regression model.
The formula for adjusted $R^2$ is:
$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k}$
$\bar{R}^2 = 1 - (1 - 0.9598)\frac{10 - 1}{10 - 2}$
$\bar{R}^2 = 1 - \frac{9}{8}(0.0402)$
$\bar{R}^2 = 1 - 0.0452$
$\bar{R}^2 = 0.9548$
Interpretation:
Holding all other factors constant, sales (X) explains or accounts for 95.48% of the variation in
profit (Y), when adjusted for degrees of freedom. Whenever k > 1 and $R^2 < 1$, $\bar{R}^2 < R^2$.
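The adjustment is a one-line calculation. A minimal sketch, with `adjusted_r_squared` as a hypothetical helper name and k counting all estimated parameters including the intercept, as in the formula above:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k).

    n is the number of observations and k the number of estimated
    parameters (including the intercept).
    """
    if n <= k:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

r2_adj = adjusted_r_squared(0.9598, n=10, k=2)
print(round(r2_adj, 4))
```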
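PROPERTIES OF THE ORDINARY LEAST SQUARES ESTIMATORS
An OLS estimator, such as $\hat{\beta}_1$, is said to be BLUE, i.e. the best linear unbiased estimator, if it has the
following properties:
i. LINEAR
The dependent variable Y should be linear in the parameters ($\beta_0$ and $\beta_1$) as shown below:
$Y_i = \beta_0 + \beta_1 X_i + e_i$
ii. UNBIASEDNESS
The average or expected value of $\hat{\beta}_1$, denoted by $E(\hat{\beta}_1)$, is equal to its true value $\beta_1$. Thus
$E(\hat{\beta}_1) = \beta_1$, or also $E(\hat{\beta}_1) - \beta_1 = 0$. In such a case, we say $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.
Similarly $E(\hat{\beta}_0) = \beta_0$, i.e., $\hat{\beta}_0$ is an unbiased estimator of $\beta_0$. To demonstrate that $\hat{\beta}_1$ is an
unbiased estimator of $\beta_1$, write it as a weighted sum of the $Y_i$, $\hat{\beta}_1 = \sum a_i Y_i$ with weights $a_i = x_i / \sum x_i^2$, and substitute $Y_i = \beta_0 + \beta_1 X_i + e_i$:
$\hat{\beta}_1 = \sum a_i (\beta_0 + \beta_1 X_i + e_i)$
$\hat{\beta}_1 = \beta_0 \sum a_i + \beta_1 \sum a_i X_i + \sum a_i e_i$
Assumptions:
$\sum a_i = 0$, $\sum a_i X_i = 1$ and $E(\sum a_i e_i) = 0$
Thus, $E(\hat{\beta}_1) = 0 + \beta_1 + 0$, i.e. $E(\hat{\beta}_1) = \beta_1$. Thus $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.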
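Next, we want to demonstrate that $E(\hat{\beta}_0) = \beta_0$.
We start from the formula $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. Writing $\hat{\beta}_1 = \sum w_i Y_i$, where $w_i = x_i / \sum x_i^2$:
$\hat{\beta}_0 = \frac{\sum Y_i}{n} - \bar{X} \sum w_i Y_i$
or
$\hat{\beta}_0 = \sum \left(\frac{1}{n} - w_i \bar{X}\right) Y_i$
Let $\frac{1}{n} - w_i \bar{X} = h_i$ be a constant. Therefore $\hat{\beta}_0 = \sum h_i Y_i$.
Since $\hat{\beta}_0 = \sum \left(\frac{1}{n} - w_i \bar{X}\right) Y_i$ and $Y_i = \beta_0 + \beta_1 X_i + e_i$, we substitute $Y_i$ into $\hat{\beta}_0$ as
follows:
$\hat{\beta}_0 = \sum \left(\frac{1}{n} - w_i \bar{X}\right)(\beta_0 + \beta_1 X_i + e_i)$
$= \beta_0 \sum \left(\frac{1}{n} - w_i \bar{X}\right) + \beta_1 \sum \left(\frac{1}{n} - w_i \bar{X}\right) X_i + \sum \left(\frac{1}{n} - w_i \bar{X}\right) e_i$
$= \beta_0 \left(1 - \bar{X} \sum w_i\right) + \beta_1 \left(\frac{\sum X_i}{n} - \bar{X} \sum w_i X_i\right) + \sum \left(\frac{1}{n} - w_i \bar{X}\right) e_i$
Since $\sum w_i = 0$, $\frac{\sum X_i}{n} - \bar{X} \sum w_i X_i = \bar{X} - \bar{X} = 0$ and $E(e_i) = 0$:
$E(\hat{\beta}_0) = \beta_0$, and also $E(\hat{\beta}_1) = \beta_1$.
Thus $\hat{\beta}_0$ is an unbiased estimator of $\beta_0$.
In general, therefore, the OLS estimators are unbiased estimators of their actual or
population values.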
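Unbiasedness can also be illustrated by simulation. The sketch below is a Monte Carlo check under assumed conditions: the true parameters, the error standard deviation and the fixed X values are all hypothetical, and the errors are drawn as independent normal variables with zero mean. Averaged over many repeated samples, the OLS estimates should settle close to the true values.

```python
import random
import statistics

# Hypothetical true parameters and fixed regressor values (assumed for illustration).
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
X = [10.0 * i for i in range(1, 11)]          # held fixed in repeated sampling

def ols(xs, ys):
    """Return (intercept, slope) from the deviation-form OLS formulas."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(0)
intercepts, slopes = [], []
for _ in range(5000):                          # 5000 repeated samples
    Y = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in X]
    b0, b1 = ols(X, Y)
    intercepts.append(b0)
    slopes.append(b1)

mean_b0 = statistics.fmean(intercepts)
mean_b1 = statistics.fmean(slopes)
print(round(mean_b0, 2), round(mean_b1, 3))
```

Any single sample gives estimates that differ from the true values; it is only their average over repeated samples that matches, which is what E(estimator) = parameter means.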
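iii. BEST (MINIMUM VARIANCE)
An estimator is best, or efficient, if it has the smallest variance among all linear unbiased
estimators, including any estimator obtained from another econometric method.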
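We first define another linear estimator ($\beta^*$) as $\beta^* = \sum w_i Y_i$, where the $w_i$ are weights that keep $\beta^*$ unbiased.
The variance of $\beta^*$ is:
$Var(\beta^*) = Var(\sum w_i Y_i) = \sum w_i^2 \, Var(e_i) = \sum w_i^2 \, E(e_i^2) = \sigma^2 \sum w_i^2$
Adding and subtracting $x_i / \sum x_i^2$ inside the square gives:
$Var(\beta^*) = \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \frac{\sigma^2}{\sum x_i^2}$
The first term is never negative, so the variance is smallest when it is zero, i.e. letting $w_i = \frac{x_i}{\sum x_i^2}$:
$Var(\beta^*) = Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2}$
From the above equation we can note that $Var(\hat{\beta}_1) \le Var(\beta^*)$.
Thus the OLS estimator ($\hat{\beta}_1$) has minimum variance when compared to the variance of another
estimator ($\beta^*$) obtained from another econometric method. Thus $\hat{\beta}_1$ is an efficient estimator.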
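The minimum-variance property can be made concrete by comparing the OLS slope with one rival linear unbiased estimator. In the sketch below the rival is the slope of the line through the first and last observations; this rival, the X values and the error variance are assumptions chosen for illustration, not part of the derivation above.

```python
# Fixed regressor values and error variance, assumed for illustration only.
X = [10.0 * i for i in range(1, 11)]
SIGMA2 = 1.0

n = len(X)
x_bar = sum(X) / n
x_dev = [x - x_bar for x in X]
sxx = sum(d * d for d in x_dev)

# OLS slope weights: w_i = x_i / sum(x_i^2), with x_i in deviation form.
w_ols = [d / sxx for d in x_dev]

# A rival linear unbiased estimator: the slope through the first and last points,
# (Y_n - Y_1) / (X_n - X_1). Its weights are zero except at the two end points.
w_rival = [0.0] * n
w_rival[0] = -1.0 / (X[-1] - X[0])
w_rival[-1] = 1.0 / (X[-1] - X[0])

def is_unbiased(weights):
    """Check the two unbiasedness conditions: sum(w) = 0 and sum(w * X) = 1."""
    return (abs(sum(weights)) < 1e-12
            and abs(sum(w * x for w, x in zip(weights, X)) - 1.0) < 1e-12)

def variance(weights):
    """Var(sum w_i Y_i) = sigma^2 * sum(w_i^2) under the classical assumptions."""
    return SIGMA2 * sum(w * w for w in weights)

print(is_unbiased(w_ols), is_unbiased(w_rival))
print(variance(w_ols) < variance(w_rival))
```

Both sets of weights satisfy the unbiasedness conditions, but the rival throws away the information in the interior observations and therefore has the larger variance.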
In summary, the Gauss-Markov Theorem states as follows: “Given the assumptions of the
classical linear regression model, the ordinary least squares (OLS) estimators,
in the class of unbiased linear estimators, have minimum variance, i.e. they are BLUE.”
GOODNESS OF FIT
By goodness of fit, we mean: “How well does the sample regression line fit the data?” The
goodness of fit, otherwise known as the coefficient of determination, is denoted by $r^2$.
The value of $r^2$ ranges from 0 to 1, i.e. from no fit to a perfect fit.
Therefore: $0 \le r^2 \le 1$.
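The following steps illustrate the derivation of $r^2$:
Step 1: Begin with an OLS regression model: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$
Recall that the sample regression line is: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. Thus: $Y_i = \hat{Y}_i + e_i$
Step 2: Subtract the mean $\bar{Y}$ from both sides: $Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + e_i$
Step 3: Square both sides and take summations (the cross-product term sums to zero): $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum e_i^2$
where
$\sum (Y_i - \bar{Y})^2 = \sum y_i^2$ = Total sum of squares (TSS)
$\sum (\hat{Y}_i - \bar{Y})^2 = \sum \hat{y}_i^2$ = Explained sum of squares (ESS)
$\sum e_i^2$ = Residual sum of squares (RSS)
Therefore: TSS = ESS + RSS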
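Now, the ratio $\frac{ESS}{TSS}$ is called the goodness of fit ($r^2$). Therefore:
$r^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}^2}{\sum y^2}$ or $r^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y^2}$
Another formula for $r^2$ is: $r^2 = \frac{(\sum xy)^2}{\sum x^2 \cdot \sum y^2}$ or $r^2 = \hat{\beta}_1^2 \frac{\sum x^2}{\sum y^2}$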
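Recall that $\sum e_i^2 = 9.806$, $\sum y^2 = 244$, $\sum xy = 1390$ and $\sum x^2 = 8250$. Then:
$r^2 = 1 - \frac{\sum e_i^2}{\sum y^2} = 1 - \frac{9.806}{244} = 0.9598$
Or, $r^2 = \frac{(\sum xy)^2}{\sum x^2 \cdot \sum y^2} = \frac{(1390)^2}{8250 \times 244} = 0.9598$, or 95.98%
Or, $r^2 = \hat{\beta}_1^2 \frac{\sum x^2}{\sum y^2} = (0.1685)^2 \times \frac{8250}{244} \approx 0.96$ (0.9598, or 95.98%, when the unrounded slope 0.168485 is used)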
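The equivalence of the three formulas can be checked directly from the deviation-form sums of the example. The sketch below uses only those three sums; the variable names are chosen for illustration.

```python
# Deviation-form sums from the ABC Company example.
sum_xy, sum_x2, sum_y2 = 1390.0, 8250.0, 244.0

slope = sum_xy / sum_x2                    # OLS slope in deviation form
ess = slope ** 2 * sum_x2                  # explained sum of squares
rss = sum_y2 - ess                         # residual sum of squares (TSS - ESS)

r2_from_rss = 1.0 - rss / sum_y2
r2_from_ess = ess / sum_y2
r2_from_sums = sum_xy ** 2 / (sum_x2 * sum_y2)
print(round(rss, 3), round(r2_from_sums, 4))
```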
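Confidence interval estimation aims at constructing an interval around the OLS estimators. The
confidence interval for the slope coefficient is given by:
$\Pr[\hat{\beta}_1 - t_{\alpha/2} \cdot se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2} \cdot se(\hat{\beta}_1)] = 1 - \alpha$
Where:
- $\hat{\beta}_1$ is the OLS estimate of $\beta_1$
- $t_{\alpha/2}$ is the critical t value for a two-tailed test at n-k degrees of freedom
- $se(\hat{\beta}_1)$ is the standard error of the slope coefficient ($\hat{\beta}_1$)
[Figure: t distribution with the central area $1 - \alpha$ lying between $\hat{\beta}_1 - t_{\alpha/2} \cdot se(\hat{\beta}_1)$ and $\hat{\beta}_1 + t_{\alpha/2} \cdot se(\hat{\beta}_1)$, and an area of $\alpha/2$ shaded in each tail.]
In the diagram, the shaded part is the rejection region, while the un-shaded part is the
acceptance region.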
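The following table shows the appropriate critical values (for large samples) at various levels of significance
for one-tail and two-tail tests:
Level of significance | $\alpha$ | $\alpha/2$ | One-tail critical value | Two-tail critical value | Confidence level
$\alpha$ = 5% | 0.05 | 0.025 | 1.645 | 1.960 | 95%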
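For example, a 95% confidence interval for $\beta_1$ (two-tailed) is
obtained as follows.
$\hat{\beta}_1 = 0.1685$; n = 10; k = 2; n-k = (10-2) = 8 degrees of freedom; $se(\hat{\beta}_1) = 0.01219$ and $1 - \alpha = 95\%$
Thus:
$\Pr[\hat{\beta}_1 - t_{\alpha/2, 8df} \cdot se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2, 8df} \cdot se(\hat{\beta}_1)] = 1 - \alpha$
$\Pr[\hat{\beta}_1 - t_{0.025, 8df} \cdot se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{0.025, 8df} \cdot se(\hat{\beta}_1)] = 95\%$
With $t_{0.025, 8df} = 2.306$: $\Pr[0.1685 - 2.306(0.01219) \le \beta_1 \le 0.1685 + 2.306(0.01219)] = 95\%$, i.e. $0.1404 \le \beta_1 \le 0.1966$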
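The interval and the confidence-interval decision rule can be sketched as below. The critical value 2.306 is the tabulated two-tailed 5% value at 8 degrees of freedom used in these notes; it is typed in by hand because the Python standard library has no t-distribution table. The function names are hypothetical helpers.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Two-sided interval: estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

# Slope and standard error from the example; 2.306 is the tabulated
# two-tailed 5% critical t value at 8 degrees of freedom.
lower, upper = confidence_interval(0.1685, 0.01219, 2.306)
print(round(lower, 4), round(upper, 4))

def rejects(hypothesised, lower, upper):
    """Confidence-interval decision rule: reject H0 if the value lies outside."""
    return not (lower <= hypothesised <= upper)

print(rejects(0.0, lower, upper), rejects(0.16, lower, upper))
```

A hypothesised slope of 0 falls outside the interval and is rejected, while 0.16 falls inside it and is not.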
HYPOTHESIS TESTING
By hypothesis testing, we mean: “Can our regression results be trusted?” or also, “Do our
regression estimates matter?”
Hypothesis testing involves two hypotheses:
- The null hypothesis
- The alternative hypothesis
The null hypothesis is the hypothesis of interest. It is usually denoted by Ho. For example, to test
whether the slope coefficient is significant, we state: Ho: $\beta_1 = 0$.
The alternate hypothesis is the hypothesis that is tested against the hypothesis of interest, i.e. the
null hypothesis. The alternate hypothesis is denoted by H1 or HA. For example, the
alternate hypothesis used to test whether the slope coefficient is significant is stated as follows:
- H1: $\beta_1 \neq 0$ for the case of a two-tailed test
- H1: $\beta_1 > 0$ or H1: $\beta_1 < 0$ for the case of a one-tail test.
Point to note
The hypothesis Ho: $\beta_1 = 0$ means as follows:
- The slope coefficient is equal to zero, or
- The slope coefficient is not statistically significant, or
- X does not influence Y
The hypothesis H1: $\beta_1 \neq 0$ means as follows:
- The slope coefficient is different from zero,
- The slope coefficient is statistically significant
- X does influence Y
In hypothesis testing, there are 2 possible types of errors that can be committed, i.e. Type I error
and Type II error.
Type I error occurs when we reject the null hypothesis when, in actual sense, it should not
have been rejected; i.e. “killing an innocent man”.
Type II error occurs when we do not reject (accept) the null hypothesis when, in actual
sense, it should have been rejected, i.e. “letting a guilty man go scot-free”.
The aim of hypothesis testing is to reduce the chances of committing both Type I and Type II errors.
This is the reason why in hypothesis testing we specify the level of significance ($\alpha$ = 1% or 5%
or 10%), which is the probability of committing a Type I error.
There are 3 common approaches used in hypothesis testing:
1. The confidence interval approach
2. The test of significance approach
3. The probability-value (P-Value) approach
The decision rule for hypothesis testing using the confidence interval approach states as follows:
“If the OLS parameter of interest under the Null hypothesis falls within the constructed confidence
interval, we do not reject the Null hypothesis. However, if it falls outside the confidence interval,
then we reject the Null hypothesis.”
This decision rule is demonstrated as under:
For the second set of hypotheses, we notice that the value 0.16 lies within the
confidence interval, i.e. it lies in the acceptance region. Thus, we do not reject (accept) the null
hypothesis.
In conclusion, it means that $\beta_1$ is statistically equal to 0.16, or that $\beta_1$ is not statistically different
from 0.16.
The test of significance (t-test) approach is the most commonly used approach to hypothesis testing
in econometrics. In this approach, which is similar in spirit to the confidence interval approach,
the null and alternate hypotheses are stated respectively as:
Ho: $\beta_1 = \beta^*$
HA: $\beta_1 \neq \beta^*$
Such that: $\beta_1$ is the true value of the OLS coefficient and
$\beta^*$ is a hypothesized or guessed value of $\beta_1$.
The general formula for the t-test is as follows: $t_{calculated} = \frac{\hat{\beta}_1 - \beta^*}{se(\hat{\beta}_1)}$
Where: $se(\hat{\beta}_1)$ is the standard error of the OLS parameter $\hat{\beta}_1$
If $\hat{\beta}_1 > \beta^*$, then t-calculated will be positive
If $\hat{\beta}_1 < \beta^*$, then t-calculated will be negative
Irrespective of the sign of t-calculated, we always take its absolute value.
Having obtained t-calculated, we then proceed to obtain the critical value for the t-statistic, i.e.
t-critical, from the t-tables.
The critical t is obtained as follows: t-critical = $t_{\alpha/2, n-k}$ for a two-tailed test, where $\alpha$ is the level of significance and n-k is the degrees of freedom.
The decision Rule for hypothesis testing using the test of significance approach states as follows:
“If t-calculated is greater than t-critical, reject the Null hypothesis, but if t-calculated is less than
t-critical, do not reject (accept) the null hypothesis.”
For example, we can now test the following hypotheses using the t-test approach, assuming a
level of significance $\alpha$ = 5%:
i. Ho: $\beta_1 = 0$ against HA: $\beta_1 \neq 0$
ii. Ho: $\beta_1 = 0.16$ against HA: $\beta_1 \neq 0.16$
For the first set of hypotheses, $\hat{\beta}_1 = 0.1685$, $\beta^* = 0$ and $se(\hat{\beta}_1) = 0.01219$, thus,
$t_{calculated} = \frac{0.1685 - 0}{0.01219} = 13.8228$
Then t-critical = $t_{\alpha/2, n-k}$, where $\alpha$ = 5%, $\alpha/2$ = 2.5% = 0.025, n = 10, k = 2 and n-k = 8 df.
Thus, t-critical = $t_{0.025, 8df} = 2.306$
Since t-calculated (13.8228) is greater than t-critical (2.306), according to our decision rule we reject the null hypothesis in favour of the
alternative hypothesis.
In conclusion, we can therefore say that $\beta_1$ is not equal to zero, or that $\beta_1$ is statistically
different from zero.
For the second set of hypotheses, we can obtain t-calculated as follows:
$t_{calculated} = \frac{\hat{\beta}_1 - \beta^*}{se(\hat{\beta}_1)} = \frac{0.1685 - 0.16}{0.01219} = 0.6973$
The value for t-critical remains the same: t-critical = 2.306. Upon comparing t-calculated and
t-critical, we notice that $t_{calculated} < t_{critical}$. Thus, following the decision rule, we
do not reject (accept) the null hypothesis. In conclusion, we can therefore say that $\beta_1$ is statistically
equal to 0.16.
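The two tests can be run with one small function. This is a sketch of the decision rule as stated in these notes; `t_test` is a hypothetical helper, and the critical value is again the tabulated 2.306 entered by hand.

```python
def t_test(estimate, hypothesised, std_error, t_critical):
    """Return (|t-calculated|, reject?) for H0: beta = hypothesised (two-tailed)."""
    t_calc = abs((estimate - hypothesised) / std_error)
    return t_calc, t_calc > t_critical

T_CRITICAL = 2.306                      # tabulated t at alpha/2 = 0.025 and 8 df

for beta_star in (0.0, 0.16):
    t_calc, reject = t_test(0.1685, beta_star, 0.01219, T_CRITICAL)
    print(beta_star, round(t_calc, 4), "reject H0" if reject else "do not reject H0")
```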
NOTE: The conclusions from the confidence interval approach actually resemble the conclusions
from the test of significance approach and this must always be so. Indeed, the confidence interval
approach is simply a mirror image of the test of significance approach.
The probability (P) value approach is also an ideal way of testing hypotheses. The P-value is
the smallest level of significance ($\alpha$) at which the null hypothesis can be rejected.
The beauty of the P-value approach is that most computer software (Excel, SPSS, STATA, E-views,
SHAZAM, RATS, etc.) automatically provides this P-value whenever you run a regression.
For example, if the software reports a P-value of 0.07, it means that 7% is the lowest level of significance at which we can
reject the null hypothesis. Thus, we can reject the null hypothesis at $\alpha$ = 10%, but we cannot
reject the null hypothesis at $\alpha$ = 5% or 1%.
P-value | Details | Significant at 1%? | Significant at 5%? | Significant at 10%?
P = 0.074 | $\hat{\beta}_1$ is significant at 7.4% | No | No | Yes
P = 0.1025 | $\hat{\beta}_1$ is significant at 10.25% | No | No | No
In summary, the smaller the P-value, the more significant is $\hat{\beta}_1$.
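The P-value decision rule is a simple comparison with each significance level. A minimal sketch, with `significance_table` as a hypothetical helper that reproduces the rows of the table above:

```python
def significance_table(p_value, levels=(0.01, 0.05, 0.10)):
    """Map each significance level to True if H0 is rejected at that level."""
    return {alpha: p_value <= alpha for alpha in levels}

for p in (0.074, 0.1025):
    print(p, significance_table(p))
```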
$\sum \hat{y}^2 + \sum \hat{u}^2 = \sum y^2$, i.e. ESS + RSS = TSS
By dividing the sums of squares (SS) by their associated degrees of freedom (df), we get the mean
sum of squares (MSS). The ANOVA table therefore shows the source of variation, the sum of squares (SS), the degrees of
freedom (df) and the mean sum of squares (MSS).
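Source of variation | Sum of squares (SS) | df | Mean sum of squares (MSS)
Regression (ESS) | $\sum \hat{y}^2 = \hat{\beta}_1^2 \sum x^2$ | k-1 | $MSS_{reg} = \frac{ESS}{df} = \frac{\hat{\beta}_1^2 \sum x^2}{k - 1}$
Residual (RSS) | $\sum \hat{u}^2$ | n-k | $MSS_{res} = \frac{RSS}{df} = \frac{\sum \hat{u}^2}{n - k}$
The F statistic is the ratio of the two mean sums of squares: $F = \frac{MSS_{reg}}{MSS_{res}}$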
The F statistic follows the F distribution with (k-1) degrees of freedom on the numerator and (n-
k) degrees of freedom on the denominator. The F statistic is used to test for overall significance of
the model.
If F-calculated >F-critical, the model is statistically significant
If F-calculated <F-critical, the model is not statistically significant.
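Example
Recall the example of the sales (X) and profit (Y) of ABC Company Limited for a period of 10
years. The following values were obtained: ESS = 234.236 with k-1 = 1 df, and RSS = 9.806 with n-k = 8 df.
Notice that 234.236 + 9.806 ≈ 244, i.e. ESS + RSS = TSS.
Calculated F = $\frac{234.236 / 1}{9.806 / 8} = \frac{234.236}{1.22575} \approx 191.1$
Critical F = $F_{k-1, n-k} = F_{1, 8}$ at $\alpha$ = 5% = 5.32
N.B: The F ratio is always a one-tailed test. We notice that calculated F is greater than critical F,
i.e. (Fcal > Fcrit), so the model is statistically significant.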
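The F test can be sketched from the ANOVA quantities. The ESS and RSS are the values quoted in the example, and 5.32 is the tabulated critical F(1, 8) at the 5% level, entered by hand; `f_statistic` is a hypothetical helper name.

```python
def f_statistic(ess, rss, n, k):
    """F = (ESS / (k - 1)) / (RSS / (n - k))."""
    mss_reg = ess / (k - 1)
    mss_res = rss / (n - k)
    return mss_reg / mss_res

F_CRITICAL = 5.32                       # tabulated F(1, 8) at the 5% level

f_calc = f_statistic(ess=234.236, rss=9.806, n=10, k=2)
print(round(f_calc, 1), f_calc > F_CRITICAL)
```

In a simple regression the F statistic equals the square of the slope's t-value, which is why the result is close to 13.8228 squared.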
Summary
If a model contains only one explanatory variable, then it is called a simple regression model.
When there is more than one independent variable, it is called a multiple regression
model. When there is only one study variable, the regression is termed univariate regression.
When there is more than one study variable, the regression is termed multivariate
regression. Note that simple and multiple regressions are not the same as univariate and
multivariate regressions. Simple and multiple regressions are determined by the number of
explanatory variables, whereas univariate and multivariate regressions are determined by the
number of study variables.
Regression uses the historical relationship between an independent and a dependent variable to
predict the future values of the dependent variable. Businesses use regression to predict such things
as future sales, stock prices, currency exchange rates and productivity gains resulting from a
training program, etc.