Introduction to Simple Linear Regression
Medical Rehabilitation Programs
2025
05/07/2025 Munyaradzi Paul Mapingure, Department of Global Public Health and Family Medicine, Faculty of Health Sciences and Medici
Introduction to Simple Linear Regression
• Simple linear regression is a statistical method you can use to
understand the relationship between two variables, x and y.
• One variable, x, is known as the predictor variable.
• The other variable, y, is known as the response variable.
• For example, suppose we have the following dataset with the weight
and height of seven individuals:
• Let weight be the predictor variable and let height be the response
variable.
• If we graph these two variables using a scatterplot, with weight
on the x-axis and height on the y-axis, here’s what it would look
like:
• Suppose we’re interested in understanding the relationship between
weight and height.
• From the scatterplot we can clearly see that as weight increases,
height tends to increase as well, but to actually quantify this
relationship between weight and height, we need to use linear
regression.
• Using linear regression, we can find the line that best “fits” our data.
• This line is known as the least squares regression line, and it can be used to help us understand the relationship between weight and height.
• Usually you would use software like Microsoft Excel, SPSS, or a graphing
calculator to actually find the equation for this line.
• The formula for the line of best fit is written as:
• ŷ = b₀ + b₁x
• where ŷ is the predicted value of the response variable, b₀ is the y-intercept, b₁ is the slope (also called the regression coefficient), and x is the value of the predictor variable.
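The slides leave the arithmetic to software. As a minimal sketch (not how any particular calculator is implemented), the coefficients in ŷ = b₀ + b₁x can be computed with the standard closed-form least squares formulas b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄. The slide's data table did not survive extraction, so the weights and heights below are an assumption: seven values chosen because they reproduce the line ŷ = 32.7830 + 0.2001x reported on these slides. The function name `fit_least_squares` is our own.

```python
from statistics import fmean


def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of cross-products of deviations, and sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = s_xy / s_xx          # slope (regression coefficient)
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# ASSUMED data (the original table was not preserved); these values
# reproduce the coefficients reported on the slides.
weight = [140, 155, 159, 179, 192, 200, 212]  # predictor x, pounds
height = [60, 62, 67, 70, 71, 72, 75]         # response y, inches

b0, b1 = fit_least_squares(weight, height)
print(f"y-hat = {b0:.4f} + {b1:.4f}x")  # prints: y-hat = 32.7830 + 0.2001x
```

Statistical packages such as Excel or SPSS report the same two numbers; the sketch only makes the formulas behind them visible.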
Finding the “Line of Best Fit”
• For this example, we can simply plug our data into the Statology Linear Regression Calculator and hit Calculate (other statistical software works too):
• The calculator automatically finds the least squares regression line:
• ŷ = 32.7830 + 0.2001x
• If we zoom out on our scatterplot from earlier and add this line to the chart, here’s what it would look like:
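• Notice how closely our data points are scattered around this line.
• That’s because the least squares regression line is the best-fitting line for our data out of all the possible lines we could draw, in the sense that it minimizes the sum of the squared vertical distances (residuals) between the data points and the line.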
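To illustrate what “best fitting” means here, the sketch below computes the sum of squared residuals (SSE) for the least squares line and for a few nudged lines. Because SSE is a convex quadratic in b₀ and b₁, every nudge makes it larger. The data values are an assumption (the slide's table was not preserved): seven points chosen because they reproduce ŷ = 32.7830 + 0.2001x. The helper name `sse` is our own.

```python
from statistics import fmean


def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# ASSUMED data: values that reproduce the slides' fitted line
weight = [140, 155, 159, 179, 192, 200, 212]  # pounds
height = [60, 62, 67, 70, 71, 72, 75]         # inches

# Closed-form least squares coefficients
x_bar, y_bar = fmean(weight), fmean(height)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, height)) / sum(
    (x - x_bar) ** 2 for x in weight
)
b0 = y_bar - b1 * x_bar
best = sse(weight, height, b0, b1)

# Shift the intercept and/or slope: each alternative line fits worse
for d0, d1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.01), (0.0, -0.01), (2.0, -0.01)]:
    alt = sse(weight, height, b0 + d0, b1 + d1)
    print(f"b0 {d0:+.1f}, b1 {d1:+.2f}: SSE = {alt:.2f} (least squares: {best:.2f})")
```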
How to Interpret a Least Squares Regression Line
• Here is how to interpret this least squares regression line: ŷ = 32.7830 + 0.2001x
• b₀ = 32.7830. This means that when the predictor variable, weight, is zero pounds, the predicted height is 32.7830 inches. Sometimes the value of b₀ is useful to know, but in this specific example it doesn’t make sense to interpret b₀, since a person can’t weigh zero pounds.
• b₁ = 0.2001. This means that a one-unit increase in x is associated with a 0.2001-unit increase in the predicted value of y. In this case, a one-pound increase in weight is associated with a 0.2001-inch increase in predicted height.
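A quick numerical check of these interpretations, using the coefficients reported on the slides (the function name `predict_height` is ours, for illustration): the prediction at x = 0 is just b₀, and moving one pound along the line changes the prediction by b₁ no matter where we start.

```python
B0, B1 = 32.7830, 0.2001  # intercept and slope reported on the slides


def predict_height(weight_lbs):
    """Predicted height (inches) from y-hat = B0 + B1 * x."""
    return B0 + B1 * weight_lbs


# Intercept: the prediction at x = 0 (not meaningful here; nobody weighs 0 lbs)
print(predict_height(0))

# Slope: each extra pound adds B1 inches to the predicted height,
# wherever on the line we start
for w in (150, 180, 210):
    print(w, "->", w + 1, ":", round(predict_height(w + 1) - predict_height(w), 4))

# A 10-pound difference corresponds to 10 * B1 = 2.001 inches
print(round(predict_height(190) - predict_height(180), 4))
```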
How to Use the Least Squares Regression Line
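• Using this least squares regression line, we can answer questions like:
• For a person who weighs 170 pounds, how tall would we expect them to be?
• Plugging x = 170 into the equation gives ŷ = 32.7830 + 0.2001(170) = 32.7830 + 34.0170 = 66.8000, so we would expect this person to be about 66.8 inches tall.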
• Caution: When using a regression equation to answer questions like
these, make sure you only use values for the predictor variable that
are within the range of the predictor variable in the original dataset
we used to generate the least squares regression line.
• For example, the weights in our dataset ranged from 140 lbs to 212
lbs, so it only makes sense to answer questions about predicted height
when the weight is between 140 lbs and 212 lbs.
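One way to build this caution into code is to refuse predictions outside the observed range. This is a minimal sketch using the slides' coefficients and the 140 to 212 lb range; the function name and the choice to raise `ValueError` (rather than, say, issue a warning) are our own design decisions.

```python
B0, B1 = 32.7830, 0.2001  # intercept and slope reported on the slides
X_MIN, X_MAX = 140, 212   # range of weights (lbs) in the original dataset


def predict_height(weight_lbs):
    """Predict height (inches), refusing to extrapolate beyond the data."""
    if not X_MIN <= weight_lbs <= X_MAX:
        raise ValueError(
            f"weight {weight_lbs} lbs is outside the observed range "
            f"{X_MIN}-{X_MAX} lbs; the fitted line may not hold there"
        )
    return B0 + B1 * weight_lbs


print(f"{predict_height(170):.1f} inches")  # prints: 66.8 inches

try:
    predict_height(250)  # extrapolation: rejected
except ValueError as err:
    print("Refused:", err)
```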
The Coefficient of Determination
• One way to measure how well the least squares regression line “fits” the data is the coefficient of determination, denoted R².
• The coefficient of determination is the proportion of the variance in
the response variable that can be explained by the predictor variable.
• The coefficient of determination ranges from 0 to 1.
• A value of 0 indicates that none of the variance in the response variable is explained by the predictor variable.
• A value of 1 indicates that the response variable is perfectly explained, without error, by the predictor variable.
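As a sketch of the definition, R² can be computed as 1 − SS_res/SS_tot, where SS_res is the sum of squared residuals around the fitted line and SS_tot is the sum of squared deviations of y from its mean. The two toy datasets below are our own, chosen to hit the endpoints: points lying exactly on a straight line give 1, and a pattern with no linear trend gives 0. The function name `r_squared` is also ours.

```python
from statistics import fmean


def r_squared(xs, ys):
    """Coefficient of determination for the least squares line of ys on xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                        # total
    if ss_tot == 0:
        raise ValueError("y is constant, so R^2 is undefined")
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4]
print(r_squared(xs, [3, 5, 7, 9]))  # y = 2x + 1 exactly -> 1.0
print(r_squared(xs, [5, 3, 3, 5]))  # symmetric, no linear trend -> 0.0
```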
• Values of R² between 0 and 1 indicate partial explanation: the closer R² is to 1, the more of the variance in the response variable the predictor variable explains.
• For example, an R² of 0.2 indicates that 20% of the variance in the response variable can be explained by the predictor variable; an R² of 0.77 indicates that 77% of the variance in the response variable can be explained by the predictor variable.
• Notice in our output from earlier we got an R² of 0.9311, which indicates that 93.11% of the variability in height can be explained by the predictor variable, weight.
• This tells us that, in this dataset, weight is a very good predictor of height.
Assumptions of Linear Regression
• 1. Linear relationship: There exists a linear relationship between the independent
variable, x, and the dependent variable, y.
• 2. Independence: The residuals are independent. In particular, there is no correlation
between consecutive residuals in time series data.
• 3. Homoscedasticity: The residuals have constant variance at every level of x.
• 4. Normality: The residuals of the model are normally distributed.
• If one or more of these assumptions are violated, then the results of our linear regression
may be unreliable or even misleading.
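The slides do not show how to check these assumptions. As a rough sketch, the code below computes residuals (observed minus predicted), which you would inspect against x for curves (nonlinearity) or funnels (non-constant variance), and the Durbin-Watson statistic, a common numeric check for correlation between consecutive residuals when observations have a natural time order. The time-ordered measurements and the function names are hypothetical, for illustration only; residual plots and formal normality tests are the usual next steps and are not shown.

```python
from statistics import fmean


def residuals(xs, ys):
    """Observed minus predicted values from the least squares line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(res):
    """Sum of squared successive differences divided by sum of squares.

    Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation.
    Only meaningful when the residuals are in time (or sequence) order.
    """
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    return num / sum(e ** 2 for e in res)


# Hypothetical time-ordered measurements (illustration only)
time = [1, 2, 3, 4, 5, 6, 7, 8]
value = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]

res = residuals(time, value)
for t, e in zip(time, res):
    print(t, round(e, 3))  # look for patterns, not just size
print("mean residual:", round(fmean(res), 6))  # about 0 by construction
print("Durbin-Watson:", round(durbin_watson(res), 2))
```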