CHAPTER TWO
THE CLASSICAL REGRESSION ANALYSIS
[The Simple Linear Regression Model]
2/4/2023 AAU; Prepared by: Hulunayen Y.
Terminology and Notation
Dependent variable:
the variable that is influenced by the independent
variable(s).
For example, in a Multiple Linear Regression Model
(MLRM), output is influenced by independent variables
such as fertilizer cost, labor cost, pesticide cost, etc.
Independent variable:
a variable whose values do not depend on other
variables in the model, but which influences the dependent variable.
Examples include fertilizer cost, pesticide cost, etc.
Regression Analysis
Economic theories are mainly concerned with the relationships
among various economic variables.
Example: demand theory, supply theory, consumption
theory, etc.
These relationships, when phrased in mathematical terms, can
predict the effect of one variable on another.
The functional relationship among these variables defines the
dependence of one variable upon the other variable(s) in a
specific form.
The specific functional form may be
linear, quadratic, logarithmic, exponential, hyperbolic, or
any other form.
The objective of linear regression analysis is to estimate
and/or predict the mean or average value of the dependent
variable on the basis of the known or fixed values of the
explanatory variables.
That is, to estimate the population regression function (PRF) on
the basis of the sample regression function (SRF) as accurately as
possible.
2.1 The concept of regression Analysis
Regression is the most important tool of econometricians.
Regression analysis:
It is concerned with the study of the dependence of
one variable (the dependent variable) on one or more other
variables (the explanatory variables).
Its aim is to estimate and/or predict the
(population) mean or average value of the dependent variable
in terms of the known or fixed (in repeated sampling)
values of the explanatory variables.
Simple, or two-variable, regression analysis: if we are studying
the dependence of a variable on only a single explanatory
variable.
E.g. consumption expenditure on real income
Multiple regression analysis: if we are studying the dependence
of one variable on more than one explanatory variable.
Note: in two-variable regression there is only one explanatory variable,
whereas in multiple regressions there is more than one explanatory variable.
Simple Linear Regression:
Represented by a single-equation regression model
Y = f(X)
The dependent variable is expressed as a function of
only a single explanatory variable.
The causal relationship between the variables flows in one
direction only.
Example:
Multiple Linear Regression:
The dependent variable is explained by more than one explanatory
variable.
Example: Y = f(X, Z, K, O)
E.g. Consumption = f(Income, Wage rate)
• Regression equation of Y on X.
Variation in consumption (C) = systematic variation + random variation.
Note: a frequent objective in research is the specification of a
functional relationship between two variables.
E.g. Y = f(X)
Y: explained variable, dependent variable, or predicted
variable.
X: explanatory variable, independent variable, control
variable, or regressor.
In this chapter we consider the simple linear regression model,
i.e. a relationship between two variables related in a linear
form.
We first discuss two important forms of relationship:
i. stochastic and
ii. non-stochastic.
Note: of these, we use the stochastic form in econometric
analysis.
2.2. Stochastic and Non-stochastic Relationships
Econometricians hold that relationships between variables (X and Y) are
generally inexact (stochastic).
A. Non-Stochastic Model:
A relationship between X and Y, characterized as Y = f(X) is
said to be deterministic or non-stochastic if for each value of the
independent variable (X) there is one and only one
corresponding value of dependent variable (Y).
Example:
Without the error or disturbance term (u), the relationship is said
to be exact (deterministic); with it, the relationship is stochastic (inexact).
B. Stochastic /Inexact Relationship
A relationship between X and Y is said to be stochastic if for a
particular value of X there is a whole probabilistic distribution of
values of Y.
In such a case, for any given value of X, the dependent variable
Y assumes some specific value only with some probability.
Stochastic model: a model in which the dependent variable is
determined not only by the explanatory variable(s) but also by other
variables which are not included in the model.
E.g
The existence of the disturbance term is justified on the following grounds:
Omission of other variables
Measurement error and data collection difficulties
Randomness in human behavior (humans are not machines
that will do as instructed)
Imperfect specification of the model
Poor proxy variables
Note: In regression analysis we are concerned with a stochastic
(statistical) relationship, not a deterministic (non-stochastic,
mathematical) relationship.
Example: assume a supply function.
The supply of a certain commodity depends on its price (other
determinants taken to be constant). If the function is linear,
the relationship can be written as:
$Q = \alpha + \beta P$
For a particular value of P, there is only one corresponding value
of Q.
This is a deterministic (non-stochastic) relationship since for each
price there is always only one corresponding quantity supplied.
All the variation in Q is due solely to changes in P, and there
are no other factors affecting the dependent variable.
If this were true, all the price-quantity pairs, if plotted on
a two-dimensional plane, would fall on a straight line.
However, if we gather observations on the quantity actually
supplied in the market at various prices and plot them on a
diagram, we see that they do not fall on a straight line.
Note: The deviation of the observations from the line may be
attributed to several factors:
Omission of variables from the function
Random behavior of human beings
Imperfect specification of the mathematical form of
the model
Error of aggregation
Error of measurement
To take the above sources of error into account, we introduce into
econometric functions a random variable, usually
denoted by the letter ‘u’ or ‘ε’,
which is called the error term, random disturbance, or
stochastic term of the function.
It is so called because u is supposed to ‘disturb’ the exact linear
relationship which is assumed to exist between X and Y.
By introducing this random variable into the function, the model is
rendered stochastic, of the form:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Thus a stochastic model is a model in which the dependent
variable is not only determined by the explanatory variable(s)
included in the model but also by others which are not included
in the model.
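The contrast between an exact and a stochastic relationship can be sketched in a few lines of Python. This is only an illustration: the supply-function parameters (10 and 2), the disturbance standard deviation and the prices are hypothetical values, not taken from the chapter.

```python
import random

# Hypothetical supply-function parameters, chosen only for illustration.
ALPHA, BETA = 10.0, 2.0


def supply_deterministic(price):
    """Exact relationship: exactly one quantity for each price."""
    return ALPHA + BETA * price


def supply_stochastic(price, rng, sigma=3.0):
    """Inexact relationship: the exact part plus a random disturbance u."""
    u = rng.gauss(0.0, sigma)
    return ALPHA + BETA * price + u


rng = random.Random(0)  # fixed seed so the run is repeatable
prices = [1, 2, 3, 4, 5]
exact = [supply_deterministic(p) for p in prices]
observed = [supply_stochastic(p, rng) for p in prices]
disturbances = [q_obs - q_exact for q_obs, q_exact in zip(observed, exact)]

for p, q_exact, q_obs, u in zip(prices, exact, observed, disturbances):
    print(f"P={p}  exact Q={q_exact:.1f}  observed Q={q_obs:.2f}  u={u:+.2f}")
```

The exact quantities all lie on one straight line, while the observed quantities scatter around it by the amount of the disturbance.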
In order to take all these sources of error into account, we
introduce the stochastic (random) disturbance term into our
econometric models, and hence the complete simple econometric
model becomes:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
2.3. Simple Linear Regression model.
Economic theories are mainly concerned with the relationships
among various economic variables.
A stochastic relationship with one explanatory variable is
called a simple linear regression model.
A simple linear regression model:
It is a relationship between two variables related in a
linear form.
The true relationship which connects the variables
involved is split into two parts:
i. a part represented by a line and
ii. a part represented by the random term ‘u’.
The scatter of observations represents the true relationship
between Y and X.
The line represents the exact part of the relationship and the
deviation of the observation from the line represents the random
component of the relationship.
These points diverge from the regression line by $u_1, u_2, \ldots, u_n$.
The first component is the part of Y explained by the changes in X, and
the second is the part of Y not explained by X, that is to say, the
change in Y due to the random influence of $u_i$.
Definition of the simple linear regression model
It explains the variable Y in terms of the variable X.
2.3.1 Assumptions of the Classical Linear Stochastic Regression Model
The objective of regression analysis is not only to estimate the
unknown parameters (the coefficients, β's) but also to draw
inferences about their true values.
Yi = f(Xi) + Ui = β0 + β1 Xi + Ui
The classical econometricians made important assumptions in their
analysis of regression.
A. Some assumptions relate to the relationship between Y and X
B. Some assumptions relate to the relationships among the X's
C. Some assumptions relate to U
The most important of these assumptions are discussed as
follows.
Assumption 1:
A. The model is linear in parameters
The model should be linear in the parameters regardless of
whether it is linear in the explanatory and dependent variables.
This is because parameters that enter the model non-linearly are
difficult to estimate: their values are unknown, and all we are given
is data on the dependent and independent variables.
Check for yourself whether a given model satisfies the above
assumption.
Linearity in variables: an equation is linear in the variables if
its graph is a straight line.
Linearity in parameters: the parameters are
raised to the first degree only.
Note: Linear regression means linear in the parameters; the model
need not be linear in the explanatory variable.
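As a small sketch of this point, the model Y = β0 + β1 X² is non-linear in X but linear in the parameters, so it can be fitted as an ordinary straight-line regression of Y on the transformed variable Z = X². The data below are hypothetical, generated exactly from Y = 1 + 3X², and the helper `ols` uses the standard least squares slope and intercept formulas.

```python
def ols(x, y):
    """Least squares intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data generated exactly from Y = 1 + 3 * X**2.
xs = [1, 2, 3, 4, 5]
ys = [1 + 3 * x ** 2 for x in xs]

# Transform the variable, not the parameters: Z = X**2.
zs = [x ** 2 for x in xs]
b0, b1 = ols(zs, ys)
print(f"intercept = {b0:.4f}, slope on X^2 = {b1:.4f}")
```

The regression of Y on Z recovers the parameters because they still enter the model in the first degree.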
Assumption 2:
B. Ui is a Random Real Variable
This means that the value which u may assume in any one
period depends on chance;
it may be positive,
negative or
zero.
Every value has a certain probability of being assumed by u in
any particular instance.
Assumption 3:
C. Zero Mean Value of the Error term
That is, given the value of X, the mean or expected value
of the disturbance term is zero.
Technically, the conditional mean value of u is zero.
Mathematically,
$E(u_i \mid X_i) = 0$
Assumption 4:
D. The variance of the random variable (U) is constant in each
period (the assumption of homoscedasticity)
Equal variance of the error term: given the value of X, the
variance of the error term (u) is the same for all observations,
$\mathrm{Var}(u_i \mid X_i) = \sigma^2$.
The variation of each $u_i$ around its zero mean is the same for all
values of the explanatory variable.
The dispersion of the disturbance is the same.
This assumption is called the homoscedasticity assumption, and
the constant variance itself is called the homoscedastic variance.
Assumption 5:
E. The Random Variable (U) has a Normal Distribution
This means the values of u (for each X) have a bell-shaped,
symmetrical distribution around their zero mean, with constant
variance: $u_i \sim N(0, \sigma^2)$
Normality Test
Assumption 6:
F. The random terms of different observations (Ui, Uj)
are independent
(the assumption of no autocorrelation)
This means the value which the random term assumes in one
period does not depend on the value which it assumes in any
other period: $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$.
Assumption 7:
G. The random variable (U) is independent of the explanatory variables.
There is no correlation between the random variable and the
explanatory variable.
If two variables are unrelated, their covariance is zero: $\mathrm{Cov}(X_i, u_i) = 0$.
Assumption 8:
H. The explanatory variables are measured without error
Y = f(X) + Ui
U absorbs the influence of omitted variables and possibly errors
of measurement in the y’s.
i.e., we will assume that the regressors are error free, while y
values may or may not include errors of measurement.
Additionally:
The regression model is correctly specified.
There is no perfect multicollinearity (this applies in the case of
the multiple linear regression model).
The number of observations (n) must be greater than the number
of parameters (k) to be estimated (in the multiple linear regression
model).
Assumptions about the dependent variable:
We have two assumptions about the dependent variable:
o The dependent variable $Y_i$ is normally distributed, and
o Successive values of the dependent variable are independent.
Example 1:
y = a + bx is a linear relationship between x and y.
y = a + bx² is a non-linear relationship between x and y.
Example:
Variable Y   Variable X
2            1
4            2
6            3
8            4
a. Find the function.
b. Identify the slope and intercept of the function.
c. Interpret the slope and intercept of the function.
First find the intercept and slope of the function.
Then write the mathematical relationship between x and y.
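One possible way to work through this example is sketched below: take the slope from any two points, back out the intercept, and then check that every observation lies exactly on the resulting line. The variable names are illustrative only.

```python
# Data from the example table above.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

# Slope from the first two points, intercept from the first point.
slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
intercept = ys[0] - slope * xs[0]

# An exact (deterministic) relationship: every point satisfies y = a + b*x.
is_exact = all(y == intercept + slope * x for x, y in zip(xs, ys))

print(f"y = {intercept} + {slope} x, exact fit: {is_exact}")
```

Here the slope says that Y rises by 2 units for each unit increase in X, and the intercept says that Y is 0 when X is 0.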
Example 2: you are given data on the saving and income of five
households as follows:
i   Y = saving   X = income
1   200          500
2   100          300
3   600          1000
4   700          800
5   400          450
a. Write the function.
b. What do you observe?
c. Is the slope the same?
d. Can we solve the above equations using mathematics?
The slope varies, but we need to establish a linear relationship between x and y.
The relationship is not exact,
so mathematics alone fails to do so.
But econometrics can do it. How?
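A quick check of the claim that the slope varies: the sketch below computes the slope between each pair of successive households in the table. Pairing the households in table order is an arbitrary choice made for illustration.

```python
# Saving (Y) and income (X) for the five households in Example 2.
incomes = [500, 300, 1000, 800, 450]
savings = [200, 100, 600, 700, 400]

# Slope between each pair of successive households (in table order).
slopes = []
for i in range(len(incomes) - 1):
    dx = incomes[i + 1] - incomes[i]
    dy = savings[i + 1] - savings[i]
    slopes.append(dy / dx)

for i, s in enumerate(slopes, start=1):
    print(f"households {i} -> {i + 1}: slope = {s:.3f}")
```

Because the pairwise slopes differ, and one is even negative, no single straight line passes through all five points.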
2.3.2. Methods of Estimation
Specifying the model and stating its underlying assumptions is
the first stage of any econometric application.
The next step is the estimation of the numerical values of the
parameters of economic relationships.
The parameters of the simple linear regression model can be
estimated by various methods.
Three of the most commonly used methods are:
Method of moments (MM)
Ordinary least squares method (OLS)
Maximum likelihood method (MLM)
Here we deal with the MM and OLS methods of estimation.
2.3.2.1 Method of Moments
Method of estimation: keep equating population moments to
their sample counterparts until all the population
parameters have been estimated.
In general, the method of moments, sometimes called MM for short, estimates
population moments by the corresponding sample moments.
In order to apply this method to regression models, we must use the facts that
population moments are expectations, and that regression models are
specified in terms of the conditional expectations of the error terms.
Look at the following simple linear regression model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
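The assumptions we have made about the error term $\varepsilon_i$ imply
that
$E(\varepsilon_i) = 0$ and $E(X_i \varepsilon_i) = 0$
In the method of moments, we replace these conditions by their
sample counterparts:
$\frac{1}{n}\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
$\frac{1}{n}\sum X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$
The first condition gives:
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
Substituting the value of $\hat{\beta}_0$ from the above equation into the second condition we get:
$\hat{\beta}_1 = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$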
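As a rough sketch of the method of moments, the two sample moment conditions (residuals average to zero, and residuals are uncorrelated with X in the sample) form two linear equations in the two unknown estimates, which can be solved by Cramer's rule. The function name `method_of_moments` is illustrative, and the data are the saving and income figures from Example 2.

```python
def method_of_moments(x, y):
    """Solve the two sample moment conditions for (b0, b1).

    sum(y - b0 - b1*x)     = 0   ->  n*b0      + sum(x)*b1    = sum(y)
    sum(x*(y - b0 - b1*x)) = 0   ->  sum(x)*b0 + sum(x^2)*b1  = sum(x*y)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

b0, b1 = method_of_moments(income, saving)

# Check that both sample moment conditions hold at the solution.
resid = [yi - b0 - b1 * xi for xi, yi in zip(income, saving)]
moment_1 = sum(resid) / len(resid)
moment_2 = sum(xi * ei for xi, ei in zip(income, resid)) / len(resid)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

Both sample moments come out as zero up to floating-point rounding, which is exactly what the estimator imposes.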
2.3.2.2. The Ordinary Least Squares (OLS) Method
The model $Y_i = \alpha + \beta X_i + \varepsilon_i$ is called the true relationship between
Y and X because Y and X represent their respective population
values, and α and β are called the true parameters since they
are estimated from the population values of Y and X.
But it is difficult to obtain the population values of Y and X
for technical or economic reasons.
So we are forced to take sample values of Y and X.
The parameters estimated from the sample values of Y and X are
called the estimators of the true parameters α and β and are
symbolized as $\hat{\alpha}$ and $\hat{\beta}$.
Estimation of α and β by the ordinary least squares (OLS) or
classical least squares (CLS) method involves finding values for the
estimates $\hat{\alpha}$ and $\hat{\beta}$ which minimize the sum of
the squared residuals ($\sum e_i^2$).
That is, the residuals should be small.
When assessing the fit of a line, the vertical distances of
the points from the line are the only distances that matter.
The OLS method calculates the best-fitting line for a dataset by
minimizing the sum of the squares of the vertical deviations from
each data point to the line (the Residual Sum of Squares, RSS):
Minimize RSS = $\sum e_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$
To do so, we use differential calculus.
Why the sum of the squared residuals?
Why not just minimize the sum of the residuals?
To prevent negative residuals from cancelling positive
ones.
If we used $\sum e_i$, all the residuals $e_i$ would receive equal
importance no matter how closely or widely scattered the
individual observations are around the SRF.
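The cancelling problem can be illustrated with a short sketch. Using the Example 2 saving and income data, it compares the least squares line with a deliberately poor line, a horizontal line at the mean of Y. Both have residuals that sum to zero, so the plain sum cannot tell them apart, whereas the sum of squares can. The helper names are illustrative.

```python
income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

# Standard least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sxx = sum((x - x_bar) ** 2 for x in income)
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def residuals(a, b):
    """Residuals of the line Y = a + b*X on the sample."""
    return [y - a - b * x for x, y in zip(income, saving)]


ols_res = residuals(intercept, slope)
flat_res = residuals(y_bar, 0.0)      # horizontal line at the mean of Y

sum_ols, sum_flat = sum(ols_res), sum(flat_res)
ssr_ols = sum(e ** 2 for e in ols_res)
ssr_flat = sum(e ** 2 for e in flat_res)

print(f"sum of residuals:  OLS {sum_ols:.6f}   flat line {sum_flat:.6f}")
print(f"sum of squares:    OLS {ssr_ols:.2f}   flat line {ssr_flat:.2f}")
```

Only the squared criterion ranks the least squares line ahead of the flat line.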
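Setting the partial derivative of $\sum e_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$ with respect to $\hat{\alpha}$ equal to zero:
$-2\sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$
Rearranging we will get:
$\sum Y_i = n\hat{\alpha} + \hat{\beta}\sum X_i$
Divide both sides by “n”:
$\bar{Y} = \hat{\alpha} + \hat{\beta}\bar{X}$
Rewriting:
$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$
Setting the partial derivative with respect to $\hat{\beta}$ equal to zero:
$-2\sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$
Rearranging the above equation we obtain:
$\sum X_i Y_i = \hat{\alpha}\sum X_i + \hat{\beta}\sum X_i^2$
Substituting the value of $\hat{\alpha}$ we get:
$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$
Rewritten in a somewhat different way, in deviation form with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$:
$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}$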
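A minimal sketch of the OLS estimates, applied to the saving and income data of Example 2. It uses the standard deviation-form formulas (slope = Σxy / Σx², intercept = mean of Y minus slope times mean of X); the function name `ols_fit` is illustrative.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) using the deviation-form formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum_xy / sum_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

alpha_hat, beta_hat = ols_fit(income, saving)

# Fitted values and residuals from the estimated line.
fitted = [alpha_hat + beta_hat * x for x in income]
resid = [y - f for y, f in zip(saving, fitted)]

print(f"saving_hat = {alpha_hat:.3f} + {beta_hat:.4f} * income")
```

On this sample the estimated slope is about 0.78, i.e. an extra unit of income is associated with roughly 0.78 units of additional saving, and the residuals sum to zero up to rounding.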
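Estimation of a function with zero intercept
Suppose the model is $Y_i = \beta X_i + \varepsilon_i$, i.e. the intercept is restricted to zero.
The OLS estimate minimizes $\sum e_i^2 = \sum (Y_i - \hat{\beta} X_i)^2$, and the first-order condition is:
$-2\sum X_i (Y_i - \hat{\beta} X_i) = 0$
Substituting and rearranging:
$\hat{\beta} = \frac{\sum X_i Y_i}{\sum X_i^2}$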
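For a line forced through the origin, the standard least squares slope is ΣXY / ΣX² (raw values, not deviations from the means). The sketch below applies this to the Example 1 data, where Y is exactly twice X; the function name is illustrative.

```python
def ols_through_origin(x, y):
    """Least squares slope for the model Y = beta * X (no intercept)."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return sum_xy / sum_xx


# Example 1 data: Y = 2, 4, 6, 8 for X = 1, 2, 3, 4.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

beta_hat = ols_through_origin(xs, ys)
resid = [y - beta_hat * x for x, y in zip(xs, ys)]

print(f"beta_hat = {beta_hat}")
```

Since these points lie exactly on Y = 2X, the residuals are all zero.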
2.3.2.3. Statistical Properties of Least Squares Estimators
There are various econometric methods with which
we may obtain estimates of the parameters of
economic relationships.
We would like an estimate to be as close as possible to the
value of the true population parameter, i.e. to vary
within only a small range around the true
parameter.
How are we to choose, among the different
econometric methods, the one that gives ‘good’
estimates?
We need some criteria for judging the ‘goodness’ of
an estimate.
‘Closeness’ of the estimate to the population
parameter is measured by the mean and variance
(or standard deviation) of the sampling distribution
of the estimates obtained by the different econometric
methods.
We assume the usual process of repeated
sampling, i.e. we assume that we draw a very large
number of samples, each of size ‘n’; we compute
the estimates $\hat{\beta}$ from each sample for each
econometric method, and we form their
distribution.
We next compare the mean (expected value) and
the variances of these distributions and we choose
among the alternative estimates the one whose
distribution is concentrated as close as possible
around the population parameter.
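According to the Gauss-Markov theorem, the OLS estimators possess all the
BLUE properties. That is, they are:
Best: they have the minimum variance among all linear unbiased estimators.
Linear: they are linear functions of the observations on the dependent variable Y.
Unbiased: their expected values equal the true parameters, e.g. $E(\hat{\beta}) = \beta$.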
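The repeated-sampling idea can be sketched with a small simulation. Everything below is an assumption made for illustration: the "true" parameters (1 and 2), the disturbance standard deviation (1), the fixed X values 1 to 20, and the number of replications (2000). The sketch draws many samples, estimates the slope by least squares in each, and looks at the mean and variance of the resulting sampling distribution.

```python
import random

# Hypothetical population values, chosen only for this illustration.
TRUE_ALPHA, TRUE_BETA, SIGMA = 1.0, 2.0, 1.0
X = list(range(1, 21))        # X values held fixed in repeated sampling
REPLICATIONS = 2000


def ols_slope(x, y):
    """Least squares slope in deviation form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sum_xy / sum_xx


rng = random.Random(42)       # fixed seed so the run is repeatable
estimates = []
for _ in range(REPLICATIONS):
    y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    estimates.append(ols_slope(X, y))

mean_estimate = sum(estimates) / len(estimates)
var_estimate = sum((b - mean_estimate) ** 2 for b in estimates) / (len(estimates) - 1)

# Theoretical variance of the slope estimator: sigma^2 / sum of squared deviations of X.
x_bar = sum(X) / len(X)
theoretical_var = SIGMA ** 2 / sum((xi - x_bar) ** 2 for xi in X)

print(f"mean of slope estimates = {mean_estimate:.4f} (true value {TRUE_BETA})")
print(f"variance of estimates   = {var_estimate:.6f} (theory {theoretical_var:.6f})")
```

The average of the estimates should sit very close to the true slope, which is what unbiasedness means in this repeated-sampling sense.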
2.3.2.4. Statistical Tests of Significance of the OLS Estimators (First Order Tests)
After the estimation of the parameters and the determination of
the least square regression line, we need to know how ‘good’ is
the fit of this line to the sample observation of Y and X, that is to
say we need to measure the dispersion of observations around
the regression line.
This knowledge is essential because the closer the observation to
the line, the better the goodness of fit, i.e. the better is the
explanation of the variations of Y by the changes in the
explanatory variables.
We divide the available criteria into three groups: the
theoretical a priori criteria, the statistical criteria, and the
econometric criteria. Under this section, our focus is on
statistical criteria (first order tests). The two most
commonly used first order tests in econometric analysis are:
i. The coefficient of determination (the square of the
correlation coefficient, i.e. R²). This test is used for judging
the explanatory power of the independent variable(s).
ii. The standard error test of the estimators. This test is used
for judging the statistical reliability of the estimates of the
regression coefficients.
A. TESTS OF THE ‘GOODNESS OF FIT’ WITH R²
R² shows the percentage of the total variation of the dependent
variable that can be explained by the changes in the explanatory
variable(s) included in the model.
To elaborate, draw a horizontal line corresponding to the
mean value of the dependent variable, $\bar{Y}$.
By fitting the line $\hat{Y} = \hat{\alpha} + \hat{\beta}X$ we try to obtain the
explanation of the variation of the dependent variable Y produced
by the changes of the explanatory variable X.
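In deviation form ($x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$, $\hat{y}_i = \hat{Y}_i - \bar{Y}$), the total variation splits into an explained part and a residual part:
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$, i.e. TSS = ESS + RSS
$R^2 = \frac{ESS}{TSS} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{\hat{\beta}^2 \sum x_i^2}{\sum y_i^2} = \frac{(\sum x_i y_i)^2}{\sum x_i^2 \sum y_i^2}$
Comparing the above with the formula of the correlation
coefficient:
$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$, so that $r^2 = \frac{(\sum x_i y_i)^2}{\sum x_i^2 \sum y_i^2}$
Comparing both, we see exactly the same expression. Therefore:
$R^2 = r^2$
RSS = TSS − ESS. Hence R² becomes:
$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$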
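As a sketch, the coefficient of determination can be computed for the Example 2 saving and income data in two ways: as one minus the residual share of the total variation, and as the square of the simple correlation coefficient. Here TSS is the total sum of squares, ESS the explained sum of squares and RSS the residual sum of squares; the variable names are illustrative.

```python
import math

income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sum_xx = sum((x - x_bar) ** 2 for x in income)
tss = sum((y - y_bar) ** 2 for y in saving)          # total variation in Y

beta_hat = sum_xy / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residual and explained sums of squares from the fitted line.
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(income, saving))
ess = tss - rss

r_squared = 1 - rss / tss
r = sum_xy / math.sqrt(sum_xx * tss)                 # simple correlation coefficient

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

On this sample roughly three quarters of the variation in saving is accounted for by income, and the two routes to R² agree.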
B. TESTING THE SIGNIFICANCE OF OLS PARAMETERS
MOST COMMON TESTING METHODS
A. Standard error test
B. Student's t-test
C. Confidence interval
All of these testing procedures lead to the same
conclusion.
Let us now see these testing methods one by one.
A. Standard error test
B. Student’s t-test
We can derive the t-value of the OLS estimates as:
$t = \frac{\hat{\beta} - \beta}{se(\hat{\beta})}$, with n − 2 degrees of freedom,
where $se(\hat{\beta}) = \sqrt{\hat{\sigma}^2 / \sum x_i^2}$ and $\hat{\sigma}^2 = \sum e_i^2 / (n - 2)$.
Under the null hypothesis $H_0: \beta = 0$ this reduces to $t = \hat{\beta} / se(\hat{\beta})$.
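A hedged sketch of a t-test on the slope, using the Example 2 saving and income data. The assumptions are: the null hypothesis is that the true slope is zero, the error variance is estimated by RSS / (n − 2), the standard error of the slope is the square root of that variance divided by Σx², and the two-tailed 5% critical value for 3 degrees of freedom (3.182) is hard-coded from a standard t table rather than computed.

```python
import math

income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sum_xx = sum((x - x_bar) ** 2 for x in income)

beta_hat = sum_xy / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

# Estimated error variance and standard error of the slope.
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(income, saving))
sigma2_hat = rss / (n - 2)
se_beta = math.sqrt(sigma2_hat / sum_xx)

# t-statistic under H0: beta = 0.
t_value = beta_hat / se_beta

# Two-tailed 5% critical value for n - 2 = 3 degrees of freedom (t table).
T_CRITICAL = 3.182
reject_h0 = abs(t_value) > T_CRITICAL

print(f"beta_hat = {beta_hat:.4f}, se = {se_beta:.4f}, t = {t_value:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

With only five observations the critical value is large, so on this sample the slope is not statistically significant at the 5% level even though the point estimate is sizeable.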
C. Confidence interval
Rejection of the null hypothesis does not mean that our sample
estimates $\hat{\alpha}$ and $\hat{\beta}$ are the correct estimates of the true population
parameters α and β.
It simply means that our estimate comes from a sample drawn
from a population whose parameter β is different from zero.
In order to define how close the estimate is to the true
parameter, we must construct a confidence interval for the true
parameter.
In other words, we must establish limiting values around the
estimate within which the true parameter is expected to lie
with a certain “degree of confidence”.
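For a 95% confidence level, with $t_c$ the critical t-value for n − 2 degrees of freedom:
$P(-t_c < t < t_c) = 0.95$, where $t = \frac{\hat{\beta} - \beta}{se(\hat{\beta})}$
Substituting, we obtain the following expressions.
$P(\hat{\beta} - t_c\, se(\hat{\beta}) < \beta < \hat{\beta} + t_c\, se(\hat{\beta})) = 0.95$
i.e. the 95% confidence interval for β is $\hat{\beta} \pm t_c\, se(\hat{\beta})$.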
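A sketch of a 95% confidence interval for the slope, again on the Example 2 saving and income data. It assumes the usual form (estimate plus or minus critical t-value times standard error), with the critical value for 3 degrees of freedom (3.182) hard-coded from a standard t table.

```python
import math

income = [500, 300, 1000, 800, 450]   # X
saving = [200, 100, 600, 700, 400]    # Y

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sum_xx = sum((x - x_bar) ** 2 for x in income)

beta_hat = sum_xy / sum_xx
alpha_hat = y_bar - beta_hat * x_bar

rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(income, saving))
se_beta = math.sqrt(rss / (n - 2) / sum_xx)

# Two-tailed 5% critical value for n - 2 = 3 degrees of freedom (t table).
T_CRITICAL = 3.182
margin = T_CRITICAL * se_beta
lower, upper = beta_hat - margin, beta_hat + margin

print(f"95% confidence interval for beta: ({lower:.4f}, {upper:.4f})")
```

The interval on this small sample is wide and includes zero, which matches a t-test that fails to reject a zero slope at the 5% level.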
Example
Reporting the Results of Regression Analysis
Review Questions