SIMPLE LINEAR REGRESSION
SRT605/SRT666
LINEAR REGRESSION
Is based on correlation and used when we want to
predict the value of a variable (DV; outcome variable)
based on the value of another variable (IV;
explanatory variable)
Example research questions (RQ):
Can exam performance be predicted based on
revision time?
Can alcohol consumption be predicted based on
smoking duration?
SIMPLE LINEAR REGRESSION
Regression equation:
Y = βX + α
Regression coefficient (β)
- the slope of the line is the regression coefficient
- measures the change in the average value of Y
(DV) for a unit change in X value (IV)
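As a rough illustration of how β and α in Y = βX + α can be estimated, the sketch below fits a least-squares line to a small made-up dataset. The data values and the function name fit_line are hypothetical and only serve the example.

```python
# Hypothetical toy data: revision time in hours (IV) and exam score (DV).
revision_hours = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 68]


def fit_line(x, y):
    """Return (beta, alpha) of the least-squares line Y = beta*X + alpha."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                 # slope (regression coefficient)
    alpha = mean_y - beta * mean_x   # constant (intercept)
    return beta, alpha


beta, alpha = fit_line(revision_hours, exam_scores)
print(f"Y = {beta:.2f}X + {alpha:.2f}")
```

In this toy example the slope is read as the average change in exam score for each extra hour of revision.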
RESIDUALS SCATTERPLOTS
Generated as part of linear regression procedure
Residuals are the differences between the obtained
and the predicted DV scores
Residual = observed value – predicted value
[Figure: diagram labelling the predicted value, the observed value, and the residual between them]
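The residual definition can be checked with a few lines of code. The observed and predicted scores below are hypothetical values chosen for illustration, not taken from the course dataset.

```python
# Hypothetical observed DV scores and the scores predicted by a fitted line.
observed = [52, 55, 61, 64, 68]
predicted = [51.8, 55.9, 60.0, 64.1, 68.2]

# Residual = observed value - predicted value (rounded to tidy float noise).
residuals = [round(obs - pred, 1) for obs, pred in zip(observed, predicted)]
print(residuals)
```

A positive residual means the case scored above the line; a negative residual means it scored below.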
The residuals scatterplots allow you to check:-
➢ normality : the residuals should be normally
distributed about the predicted DV scores
[Figure: Normal P-P Plot of Regression Standardized Residual]
➢ linearity : The residuals should have a straight-
line relationship with predicted DV scores
➢ homoscedasticity : The variance of the residuals
around predicted DV scores should be the same
for all predicted scores
ASSUMPTIONS
Assumption 1 : Two continuous variables
Assumption 2 : The relationship between the two
variables should be linear (scatterplot)
Assumption 3 : Independence of residuals
(Durbin-Watson test)
Assumption 4 : The residuals should be normally
distributed about the predicted DV scores
Assumption 5 : The data needs to show
homoscedasticity (i.e. equal/similar variances)
Assumption 6 : There should be no significant outliers
RQ: CAN WE PREDICT ACADEMIC SCORE BASED
ON THE NUMBER OF FREE MEALS RECEIVED?
Dataset : https://bit.ly/2E1xFWL
Assumption 1 : Two continuous variables
Variable 1 (IV) : free meals
Variable 2 (DV) : academic score
Academic score = β(free meals) + α
Where,
Academic score is the dependent (response) variable
Free meals is the independent (explanatory) variable
α is a constant
β is the regression coefficient
Assumption 2 : The relationship between the two
variables should be linear (scatterplot)
Generate the unstandardized predicted values (PRE) and
unstandardized residuals (RES)
To investigate assumptions 3 (independence), 4
(normality), 5 (homoscedasticity), & 6 (outliers)
Analyze → Regression → Linear
Select the DV and IV, click the Statistics button, and tick
Durbin-Watson and Casewise diagnostics in the Residuals box
Click Plots button
➢ move *ZRESID into the Y box
➢ move *ZPRED into the X box
➢ In the Standardized Residual Plots, tick the Normal
probability plot and histogram options
Assumption 3 : Independence of residuals
The Durbin-Watson statistic ranges from 0 to 4; a value
near 2 indicates no autocorrelation in the residuals
As a rule of thumb, a value between 1.5 and 2.5 is
treated as acceptable
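For readers who want to see what the Durbin-Watson statistic measures, the sketch below computes it by hand as the sum of squared differences between successive residuals divided by the sum of squared residuals. The residual values are hypothetical, and the 1.5 to 2.5 band is the rule of thumb from the slide rather than a formal test.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in case order."""
    # Squared differences between each residual and the one before it.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only.
residuals = [0.5, -0.3, -0.4, 0.6, 0.1, -0.5]
dw = durbin_watson(residuals)
within_rule_of_thumb = 1.5 <= dw <= 2.5
print(f"Durbin-Watson = {dw:.3f}, within 1.5 to 2.5: {within_rule_of_thumb}")
```

Values well below 2 point towards positive autocorrelation and values well above 2 towards negative autocorrelation.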
Assumption 4 : The residuals should be normally
distributed about the predicted DV scores
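The idea behind a Normal P-P plot can be sketched in code: each residual's observed cumulative proportion is paired with the cumulative probability expected under a normal distribution, and points close to the diagonal suggest normality. The residuals are hypothetical, and the plotting position (rank - 0.5) / n is an assumption made for this sketch; SPSS may use a different formula.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probability pairs."""
    n = len(residuals)
    mu, sd = mean(residuals), stdev(residuals)
    standard_normal = NormalDist()
    points = []
    for rank, e in enumerate(sorted(residuals), start=1):
        observed_p = (rank - 0.5) / n  # assumed plotting position
        expected_p = standard_normal.cdf((e - mu) / sd)
        points.append((observed_p, expected_p))
    return points


# Hypothetical residuals, for illustration only.
residuals = [0.5, -0.3, -0.4, 0.6, 0.1, -0.5]
points = pp_plot_points(residuals)
# Largest vertical distance from the diagonal line.
max_gap = max(abs(obs - exp) for obs, exp in points)
print(f"largest departure from the diagonal: {max_gap:.3f}")
```

With so few cases the result says little; the sketch only shows how the plotted points are formed.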
Assumption 5 : The data needs to show
homoscedasticity (i.e. equal/similar variances)
Assumption 6 : There should be no significant outliers
Outliers : cases that have a standardised residual of
more than 3.3 or less than –3.3
Casewise Diagnostics lists cases that have
standardised residual values above 3 or below –3
In a normally distributed sample, we would expect
fewer than 1% of cases (about 0.3%) to fall outside ±3
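The cut-offs above can be illustrated with a short sketch that computes the share of a standard normal distribution lying beyond ±3 and ±3.3, and flags cases whose standardised residual exceeds a chosen cut-off. The standardised residuals and the function name flag_outliers are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Two-tailed share of a standard normal distribution beyond each cut-off.
share_beyond_3 = 2 * (1 - standard_normal.cdf(3.0))
share_beyond_3_3 = 2 * (1 - standard_normal.cdf(3.3))
print(f"beyond +/-3.0: {share_beyond_3:.2%}")
print(f"beyond +/-3.3: {share_beyond_3_3:.2%}")


def flag_outliers(std_residuals, cutoff=3.0):
    """Return (case number, residual) pairs beyond +/- cutoff."""
    return [
        (case, r)
        for case, r in enumerate(std_residuals, start=1)
        if abs(r) > cutoff
    ]


# Hypothetical standardised residuals, for illustration only.
std_residuals = [0.4, -1.2, 3.5, 0.9, -3.1, 2.8]
flagged = flag_outliers(std_residuals)
print(flagged)
```

Passing cutoff=3.3 applies the stricter outlier definition instead of the Casewise Diagnostics threshold.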
As all assumptions are met, we can proceed to simple
linear regression (SLR)
Coefficients (a)
Model 1           B         Std. Error   Beta     t         Sig.   95.0% Confidence Interval for B
                                                                   Lower Bound   Upper Bound
(Constant)        866.482   11.309                76.621    .000   844.231       888.733
free meal given   -3.762    .149         -.819    -25.280   .000   -4.055        -3.469
B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient
a. Dependent Variable: academic performance
In the Coefficients table, check whether the variable
shows a Sig. value less than .05
A Sig. value < .05 indicates that the variable is
making a statistically significant contribution to the
equation
Academic score = -3.762(free meal) + 866.48
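A small sketch of how the fitted equation is used for prediction. The coefficients are the B values from the Coefficients table; the free meal inputs and the function name predict_academic_score are hypothetical, and the inputs are taken to be in the same units as the dataset variable.

```python
INTERCEPT = 866.482  # B for (Constant) in the Coefficients table
SLOPE = -3.762       # B for "free meal given" in the Coefficients table


def predict_academic_score(free_meals):
    """Predicted academic score for a given free meal value."""
    return SLOPE * free_meals + INTERCEPT


# Hypothetical free meal values, for illustration only.
for free_meals in (0, 50, 100):
    score = predict_academic_score(free_meals)
    print(f"free meals = {free_meals:>3}: predicted score = {score:.3f}")
```

Because the slope is negative, each one-unit increase in free meals lowers the predicted academic score by 3.762 points.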
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .819 (a)   .671       .670                64.299
Change Statistics
R Square Change   F Change   df1   df2   Sig. F Change
.671              639.098    1     313   .000
a. Predictors: (Constant), free meal given
The coefficient of determination (r²) tells you how
much of the variance in the DV (academic score) is
explained by the IV (free meal)
Obtained by squaring the r value
Can only take on values from 0 to 1
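The figures in the Model Summary can be reproduced approximately from R alone. The sketch below squares R and applies the usual adjusted R² formula, taking the sample size from df2 = n - 2 for a model with one predictor. Small differences from SPSS output can arise because R is rounded to three decimals here.

```python
r = 0.819           # R from the Model Summary
df1, df2 = 1, 313   # degrees of freedom from the Model Summary
n = df1 + df2 + 1   # sample size implied by df2 = n - 2

# Coefficient of determination: the square of r.
r_squared = r ** 2

# Adjusted R squared for one predictor (k = df1 = 1).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df2

print(f"n = {n}")
print(f"r squared = {r_squared:.3f}")
print(f"adjusted r squared = {adjusted_r_squared:.3f}")
print(f"variance explained = {r_squared * 100:.1f}%")
```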
When expressed as a percentage (multiply r² by
100), the model explains 67.1% of the variance in
academic score
The closer r² is to 1, the greater the ability of the
model to predict a trend
Academic score = -3.762(free meal) + 866.48