Basic ideas of linear regression: the two-variable model
G&P Chapter 2
Outcomes
• At the end of this presentation you should be able to:
o Define/describe key terms and concepts such as linear regression model,
dependent variable, independent variable, error term, residual term, regression
coefficients, OLS estimators.
o List the goals of regression analysis.
o Distinguish between regression and causality.
o Distinguish between the deterministic and stochastic components of a regression
function, and to explain the reason for the inclusion of the error term.
o Explain the nature of the stochastic error term.
o Explain how the sample regression function estimates the population regression
function.
o Distinguish between linearity in variables and linearity in coefficients (parameters).
o Interpret the results of a regression model.
What is regression analysis?
• The study of the relationship between a dependent (explained)
variable and one or more independent (explanatory) variables.
o Test an empirical relationship
o Based on an underlying theory
• Note that regression does not imply causation.
• Objectives of regression analysis
o Estimate the mean of a dependent variable
o Test hypotheses
o Forecast/predict
The population regression function
• The PRL: $E(Y \mid X_i) = B_1 + B_2 X_i$
o Note that the conditional mean (conditional expected value) of Y, $E(Y \mid X_i)$,
is a function of $X_i$.
o Thus, the regression of Y on X is the mean of the distribution of Y values for the
given X.
• B1 and B2 are the parameters, or regression coefficients.
• B1 is the intercept:
o The conditional mean value of Y if X is zero.
• B2 is the slope:
o The rate of change in the (conditional) mean value of Y per unit change in X.
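The idea of a conditional mean can be sketched with a tiny made-up population. The data values and the names `population` and `conditional_means` are assumptions for illustration only; they are not taken from G&P.

```python
from statistics import mean

# Hypothetical population: every Y value that occurs at each level of X.
population = {
    1: [2, 3, 4],
    2: [4, 5, 6],
    3: [6, 7, 8],
}

# Conditional mean E(Y | X = x): the average of the Y values at that x.
conditional_means = {x: mean(ys) for x, ys in population.items()}

# In this made-up population the conditional means lie on a straight line,
# so the PRL is E(Y | X) = B1 + B2 * X with the values computed below.
B2 = conditional_means[2] - conditional_means[1]  # slope: change in mean per unit X
B1 = conditional_means[1] - B2 * 1                # intercept: mean of Y when X = 0

for x, m in conditional_means.items():
    print(f"E(Y | X = {x}) = {m}; PRL gives {B1 + B2 * x}")
```

Individual Y values scatter around the line, but each conditional mean sits exactly on it, which is what the PRL describes.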
Stochastic specification of PRF
• The deterministic component of the PRF, $B_1 + B_2 X_i$, does not completely
explain $Y_i$
o E.g. individual expenditure in relation to income.
• Add an error term $u_i$ to the model, the stochastic component: $Y_i = B_1 + B_2 X_i + u_i$
The nature of the stochastic error term:
o Capture influence of omitted variables
o Random human behaviour
o Errors of measurement
o Simplicity: "do more with less"
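The split into a deterministic and a stochastic component can be sketched by simulation. Everything numeric here is an assumption: the parameter values, the income range, and the choice of a normal distribution for the error term (written `u` in the code) are picked only to make the example concrete.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

B1, B2 = 50.0, 0.6  # hypothetical PRF parameters
SIGMA = 5.0         # hypothetical standard deviation of the error term

# Hypothetical incomes for 1000 individuals.
incomes = [random.uniform(100, 500) for _ in range(1000)]

deterministic = [B1 + B2 * x for x in incomes]      # B1 + B2 * X_i
errors = [random.gauss(0, SIGMA) for _ in incomes]  # u_i (assumed normal here)
expenditure = [d + u for d, u in zip(deterministic, errors)]  # Y_i

mean_error = sum(errors) / len(errors)
print(f"first individual: deterministic = {deterministic[0]:.2f}, "
      f"error = {errors[0]:.2f}, Y = {expenditure[0]:.2f}")
print(f"average error across individuals: {mean_error:.3f}")
```

Two individuals with the same income can spend different amounts; the error term absorbs that difference, while its average across individuals stays close to zero.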
The sample regression function
• We estimate the PRF using sample information, but there will
be sampling error.
• Different samples give different sample regression lines; which one represents
the true PRL?
• You want to estimate the PRF on the basis of the SRF.
• Note that for a given $X_i$, the estimator $\hat{Y}_i$ and the residual $e_i$
approximate the true values $E(Y \mid X_i)$ and the error term.
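Sampling error can be illustrated by drawing several samples from one simulated population and fitting a line to each. This is a minimal sketch under stated assumptions: the population, its parameters, and the sample size are invented, and the helper `ols` uses the least-squares estimator that these slides introduce in the OLS section.

```python
import random

def ols(xs, ys):
    """Return (b1, b2) for the two-variable least-squares fit of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b1 = y_bar - b2 * x_bar
    return b1, b2

random.seed(1)
B1, B2 = 20.0, 0.8  # hypothetical "true" PRF parameters

# Hypothetical population of (X, Y) pairs generated from the PRF plus an error.
population = []
for _ in range(5000):
    x = random.uniform(0, 100)
    population.append((x, B1 + B2 * x + random.gauss(0, 10)))

estimates = []
for s in range(3):
    sample = random.sample(population, 50)  # a fresh sample of 50 observations
    xs, ys = zip(*sample)
    b1, b2 = ols(xs, ys)
    estimates.append((b1, b2))
    print(f"sample {s + 1}: b1 = {b1:.2f}, b2 = {b2:.3f}")

# Residuals for the last sample: e_i = Y_i - (b1 + b2 * X_i).
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]
print(f"sum of residuals in the last sample: {sum(residuals):.6f}")
```

Each sample yields its own SRF. None of them equals the PRL exactly, but all of them estimate the same underlying B1 and B2.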
Note: “linear” regression
• Linearity in variables:
o The rate of change in the dependent variable remains constant for a unit
change in the explanatory variable
• Linearity in parameters: the Bs are raised only to the power of 1
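The two senses of "linear" can be contrasted numerically. In this sketch the parameter values and the two example models are assumptions chosen for illustration: one model is linear in X, the other uses X squared but is still linear in the parameters.

```python
B1, B2 = 3.0, 2.0  # hypothetical parameter values
xs = [1, 2, 3, 4, 5]

def unit_changes(values):
    """Change in Y between consecutive X values that are one unit apart."""
    return [b - a for a, b in zip(values, values[1:])]

# Model A: Y = B1 + B2 * X (linear in the variable and in the parameters).
model_a = [B1 + B2 * x for x in xs]

# Model B: Y = B1 + B2 * X**2 (not linear in X, but B1 and B2 still enter
# only to the power of 1, so it is linear in the parameters).
model_b = [B1 + B2 * x ** 2 for x in xs]

changes_a = unit_changes(model_a)  # constant: [2.0, 2.0, 2.0, 2.0]
changes_b = unit_changes(model_b)  # not constant: [6.0, 10.0, 14.0, 18.0]

# Treating Z = X**2 as the regressor, model B has a constant slope in Z.
zs = [x ** 2 for x in xs]
slopes_in_z = [(y2 - y1) / (z2 - z1)
               for (z1, y1), (z2, y2) in zip(zip(zs, model_b), zip(zs[1:], model_b[1:]))]

print("Model A changes per unit X:", changes_a)
print("Model B changes per unit X:", changes_b)
print("Model B slope per unit Z = X**2:", slopes_in_z)
```

Model B fails linearity in variables but can still be handled as a linear regression, because only linearity in the parameters matters for that.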
Estimating the parameters – OLS
• OLS: Ordinary Least Squares
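• The PRF is $Y_i = B_1 + B_2 X_i + u_i$, where $u_i$ is the stochastic error term
• We estimate the PRF from the SRF, $Y_i = b_1 + b_2 X_i + e_i$,
and can re-write it as $e_i = Y_i - b_1 - b_2 X_i$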
• OLS chooses b1 and b2 in such a way that the residual sum of squares is as small as possible:
Minimise $\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$
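• Formulas: $b_2 = \dfrac{\sum x_i y_i}{\sum x_i^2}$ and $b_1 = \bar{Y} - b_2 \bar{X}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$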
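The least-squares idea can be sketched on a handful of points using the standard closed-form solution for the two-variable model in deviation-from-mean form. The five data points and the function names `ols` and `rss` are made up for illustration.

```python
def ols(xs, ys):
    """Two-variable OLS: return the intercept b1 and slope b2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b2 = sum(x_i * y_i) / sum(x_i ** 2), with x_i and y_i as deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    # b1 = Y_bar - b2 * X_bar
    b1 = y_bar - b2 * x_bar
    return b1, b2

def rss(b1, b2, xs, ys):
    """Residual sum of squares: sum of (Y_i - b1 - b2 * X_i) ** 2."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b1, b2 = ols(xs, ys)
best = rss(b1, b2, xs, ys)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, RSS = {best:.2f}")

# Moving either coefficient away from the OLS values makes the RSS larger.
worse_slope = rss(b1, b2 + 0.1, xs, ys)
worse_intercept = rss(b1 + 0.1, b2, xs, ys)
print(f"RSS with slope + 0.1: {worse_slope:.2f}; with intercept + 0.1: {worse_intercept:.2f}")
```

For these five points the sketch gives b1 = 2.2, b2 = 0.6 and RSS = 2.4, and any other line through the data has a larger RSS.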
Putting it all together
• OLS gives the following sample regression: Ŷ = 432.41 + 0.0013X
o The slope coefficient means that if family income goes up by a dollar, the
mean maths score goes up by about 0.0013 points.
o The intercept means that if income were zero, the mean maths score would be
about 432.41.
• Section 2.10 gives more examples of regression analysis.
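The interpretation of the fitted line Ŷ = 432.41 + 0.0013X can be checked with a few lines of arithmetic. The coefficients come from the slide; the income values plugged in (0, 50,000 and 60,000 dollars) and the function name `predicted_score` are assumptions for illustration.

```python
b1 = 432.41   # estimated intercept from the sample regression
b2 = 0.0013   # estimated slope: points of mean maths score per dollar of income

def predicted_score(income):
    """Estimated mean maths score for a given family income in dollars."""
    return b1 + b2 * income

at_zero = predicted_score(0)        # equals the intercept
at_50k = predicted_score(50_000)    # 432.41 + 0.0013 * 50000
gain_per_10k = predicted_score(60_000) - predicted_score(50_000)

print(f"income 0:      {at_zero:.2f}")
print(f"income 50,000: {at_50k:.2f}")
print(f"extra 10,000:  +{gain_per_10k:.2f} points")
```

A slope of 0.0013 points per dollar looks tiny, but it corresponds to about 13 points per additional 10,000 dollars of family income.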