Stat Prob 11 Q4 Mod3 Regression Analysis v4
BS Mathematics (University of Mindanao)
Studocu is not sponsored or endorsed by any college or university
Downloaded by Ellah Iracielli Teves (hallesevet@gmail.com)
SENIOR HIGH SCHOOL
STATISTICS AND
PROBABILITY
Quarter 4 – Module 3
Regression Analysis
Source image: surveygiz.com
Prepared by: SIXIE ROZZ O. PENASO
SHS Teacher III
Department of Education • Republic of the Philippines
Statistics and Probability for Senior High School
Alternative Delivery Mode
Quarter 4 – Module 3: Regression Analysis
First Edition, 2019
Republic Act 8293, section 176 states that “No copyright shall subsist in any work of
the Government of the Philippines. However, prior approval of the government agency
or office wherein the work is created shall be necessary for exploitation of such work
for profit. Such agency or office may, among other things, impose as a condition the
payment of royalties.”
Borrowed materials (i.e., songs, stories, poems, pictures, photos, brand names,
trademarks, etc.) included in this book are owned by their respective copyright holders.
Every effort has been exerted to locate and seek permission to use these materials
from their respective copyright owners. The publisher and authors do not represent
nor claim ownership over them.
Published by the Department of Education – Division of Misamis Oriental
Division Superintendent: Dr. Jonathan S. Dela Peña, CESO V
Development Team of the Module
Author/s: Ariel A. Tarucan
Editor: Glenn C. Aradilla Milger A. Baang, PhD
Reviewer/s: Flordeliz D. Laput
Illustrator:
Layout Artist:
Management Team:
Chairperson: Jonathan S. Dela Peña, PhD, CESO V
Schools Division Superintendent
Co-Chairpersons: Nimfa R. Lago, PhD, CESO VI
Assistant Schools Division Superintendent
Members:
Erlinda G. Dael, PhD, CES - CID
Lindo M. Cayadong, PhD, EPS-Science & Mathematics
Celieto B. Magsayo, EPS- LRMS Manager
Loucille M. Paclar, Librarian II
Kim Eric G. Lubguban, PDO II
Printed in the Philippines by
Department of Education – Division of Misamis Oriental
Office Address: Del Pilar corner Velez Street, Brgy. 29, Cagayan de Oro City, 9000
Telephone Nos.: (088) 881-3094: Text: 0917-8992245 (Globe)
Email: misamis.oriental@deped.gov.ph
This instructional material is collaboratively developed and reviewed by
educators from public and private schools, colleges, and/or universities. We
encourage teachers and other education stakeholders to email their feedback,
comments, and recommendation to the Department of Education at
action@deped.gov.ph
We value your feedback and recommendations.
FAIR USE AND CONTENT DISCLAIMER: This SLM (Self Learning Module)
is for educational purposes only. Borrowed materials (i.e., songs, stories,
poems, pictures, photos, brand names, trademarks, etc.) included in these
modules are owned by their respective copyright holders. The publisher and
authors do not represent nor claim ownership over them. Sincerest
appreciation to those who have made significant contributions to these
modules.
TABLE OF CONTENTS
Cover Page
Copyright Page
Title Page
Table of Contents
Introduction
Lesson 1: Calculation and Interpretation of the Slope and y-intercept of the Regression Line
What you need to know
What I know
What’s In
What’s New (Equation of a Regression Line)
What is it
What’s more
What I have learned
What can I do
Assessment
Answer key
Lesson 2: Solving Problems Involving Regression Analysis
What you need to know
What I know
What’s In
What’s New (Activity)
What is it (Solution)
What’s more
What I have learned
What can I do
Assessment
Answer key
References
Writer’s Profile
Lesson 1 Calculation and Interpretation of the Slope and Y-intercept of a Regression Line
What you need to know
This module will help you understand the slope and y-intercept of a
regression line. It covers the following lesson:
• Calculation and interpretation of the slope and y-intercept of a regression
line
You are expected to learn...
After going through this module, you are expected to calculate and interpret the
slope and y-intercept of a regression line (M11/12SP-IVi-3 & M11/12SP-IVi-4).
How to learn from this module...
To achieve the objectives of this module, you need to read its contents
comprehensively and follow the instructions provided in every activity.
What I Know
Direction: Read the following questions carefully and choose the letter of your answer.
You may use a separate sheet of paper.
1. In the equation 𝑦 ′ = 3 + 4𝑥 , what is the slope?
a. 3 b. 𝑦′ c. 4 d. 4𝑥
2. In the equation 𝑦 ′ = 10 − 2𝑥 , what is the value of the y-intercept?
a. 𝑦′ b. −2 c. −10 d. 10
3. Which of the following scenarios could give you a meaningful regression analysis?
a. There is no linear relationship between the variables
b. The value of 𝑟 is not significant
c. There is a strong negative linear relationship between the variables
d. Correlation will be done after the regression analysis
4. In a regression line, what do you call the magnitude of the change in one variable
when the other variable changes by exactly one unit?
a. Unit change c. marginal change
b. Variable change d. regression change
5. If the equation of the regression line is 𝑦′ = 5 + .123𝑥, how can it be interpreted?
a. For every one-unit change in the value of 𝑥, the value of 𝑦 changes by 5 units
on average
b. For every one-unit change in the value of 𝑦, the value of 𝑥 changes by 5 units
on average
c. For every one-unit change in the value of 𝑥, the value of 𝑦 changes by .123 unit
on average
d. The slope of the line is .123
What’s In
In the previous lessons, you learned that when we study relationships between variables, we
first collect our data and use correlation to determine whether a linear relationship exists.
The most commonly used measure is the Pearson correlation coefficient 𝒓. If we find that
a relationship exists between the variables, we then test whether that relationship
is significant. If it is found to be significant, we can proceed to determining the
equation of the regression line.
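As a reminder of that first step, here is a minimal Python sketch that computes the Pearson correlation coefficient from raw sums. It assumes the usual computational formula r = [n(∑xy) − (∑x)(∑y)] / √{[n(∑x²) − (∑x)²][n(∑y²) − (∑y)²]}, which this module does not restate. It borrows the study-hours data from Activity 1 of this lesson purely for illustration. The function name `pearson_r` is hypothetical, and the sketch does not carry out the significance test.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums (assumed formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Study hours and final grades taken from Activity 1 of this lesson
hours = [2, 3, 5, 9, 11, 15]
grades = [79, 83, 85, 88, 89, 93]

r = pearson_r(hours, grades)
print(f"r = {r:.3f}")
```

A value of r close to 1 or −1 suggests a strong linear relationship, but whether it is significant still has to be tested as taught in the previous lessons.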
What’s New
The regression line is also called the line of best fit. It is significant because it
enables us to interpret trends in data and helps us make predictions based on that data.
Prediction is discussed further in the next lesson.
Again, please take note that before doing regression, you first need to check the
following assumptions:
a. There exists a relationship between the variables; and
b. The relationship is tested to be significant.
Both conditions must be met first; otherwise, doing a regression
analysis would be pointless.
A scatterplot is one way of illustrating a line of best fit. In a scatterplot of
data on two variables, several different lines can be drawn near the points, but only
one of them is the line of best fit. Best fit means
that the sum of the squares of the vertical distances from each point to the line is at a
minimum.
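To make the idea of "best fit" concrete, the following Python sketch compares the sum of squared vertical distances for three candidate lines. It uses the six data pairs from Activity 1 of this lesson. The first candidate is the regression line found there, while the other two lines are arbitrary choices made only for comparison. The function name is hypothetical.

```python
# (study hours, final grade) pairs from Activity 1 of this lesson
points = [(2, 79), (3, 83), (5, 85), (9, 88), (11, 89), (15, 93)]

def sum_squared_vertical_distances(a, b, data):
    """Sum of squared vertical distances from each point to the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in data)

# Candidate lines as (a, b). The first is the regression line from Activity 1;
# the other two are arbitrary lines chosen only for comparison.
candidates = {
    "y' = 79.078 + 0.945x": (79.078, 0.945),
    "y' = 78 + 1.0x": (78.0, 1.0),
    "y' = 80 + 0.9x": (80.0, 0.9),
}

for label, (a, b) in candidates.items():
    total = sum_squared_vertical_distances(a, b, points)
    print(f"{label}: sum of squares = {total:.3f}")
```

Of these three lines, the regression line gives the smallest sum, which is what "best fit" means here.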
The Equation of a Regression Line
Recalling our algebra concepts, the equation of a line is given by 𝑦 = 𝑚𝑥 + 𝑏,
where 𝑚 stands for the slope and 𝑏 for the y-intercept. Similarly, the equation of a
regression line is given by 𝑦′ = 𝑎 + 𝑏𝑥, where 𝑏 is the slope and 𝑎 is the y-intercept.
Note that in the regression equation 𝑏 denotes the slope, not the y-intercept.
The corresponding formulas for the y-intercept 𝑎 and the slope 𝑏 are
as follows:
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
where 𝑛 is the number of data pairs.
Round both 𝑎 and 𝑏 to three decimal places.
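The two formulas can be sketched in Python as follows. This is only an illustrative sketch: the function name `regression_coefficients` and the small sample data are hypothetical, and the rounding follows the three-decimal rule stated above.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line y' = a + b*x, rounded to three decimal places."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")

    n = len(xs)                                    # number of data pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Both formulas share the same denominator: n(sum of x^2) - (sum of x)^2
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return round(a, 3), round(b, 3)

# Hypothetical data lying exactly on the line y = 1 + 2x
a, b = regression_coefficients([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the hypothetical points lie exactly on y = 1 + 2x, the function should recover a y-intercept of 1 and a slope of 2.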
What is it
Activity 1
Given the data below, find the equation of the regression line and provide an
interpretation of the results.
Student   No. of Study Hours (𝑥)   Final Grade in Math (𝑦)
A         2                        79
B         3                        83
C         5                        85
D         9                        88
E         11                       89
F         15                       93
Solution
Before we can successfully proceed to solving for the equation of the regression
line, we need to solve first for the necessary summations. As such, a completed table like
the one shown below would be of great help.
Student   No. of Study Hours (𝒙)   Final Grade in Math (𝒚)   𝒙𝒚     𝒙²
A         2                        79                        158    4
B         3                        83                        249    9
C         5                        85                        425    25
D         9                        88                        792    81
E         11                       89                        979    121
F         15                       93                        1395   225
Total     45                       517                       3998   465
The values needed for solving the equation are as follows:
𝑛 = 6, since there are six pairs of data
∑ 𝑥 = 45
∑ 𝑦 = 517
∑ 𝑥𝑦 = 3998
∑ 𝑥² = 465
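The summations above can be double-checked with a short Python sketch that rebuilds the 𝑥𝑦 and 𝑥² columns from the Activity 1 data. The variable names are illustrative only.

```python
# Activity 1 data: study hours (x) and final grades in Math (y)
hours = [2, 3, 5, 9, 11, 15]
grades = [79, 83, 85, 88, 89, 93]

# Each row holds x, y, xy and x^2, like one row of the completed table
rows = [(x, y, x * y, x * x) for x, y in zip(hours, grades)]

n = len(rows)
sum_x = sum(row[0] for row in rows)
sum_y = sum(row[1] for row in rows)
sum_xy = sum(row[2] for row in rows)
sum_x2 = sum(row[3] for row in rows)

for row in rows:
    print(*row)
print("n =", n, "| sums:", sum_x, sum_y, sum_xy, sum_x2)
```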
Solving for the y-intercept 𝑎, we get
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(517)(465) - (45)(3998)}{6(465) - 45^2} = \frac{240405 - 179910}{2790 - 2025} = \frac{60495}{765} = 79.078$$
Solving for the slope 𝑏, we also get
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{6(3998) - (45)(517)}{6(465) - 45^2} = \frac{23988 - 23265}{2790 - 2025} = \frac{723}{765} = .945$$
Hence, the equation of the regression line 𝑦′ = 𝑎 + 𝑏𝑥 is 𝑦′ = 79.078 + .945𝑥,
where the slope is .945 and the y-intercept is 79.078. The y-intercept is the value of 𝑦′
when 𝑥 = 0. That is, it is the value at the point where the line intersects the y-axis.
Interpretation
Marginal change is the magnitude of the change in one variable when the other
variable changes by exactly one unit. In the problem, the value of the slope 𝑏, which is .945,
is the marginal change. This means that for every one-hour increase in 𝑥, the
number of study hours, the value of 𝑦, the grade, increases by .945 unit on
average. Similarly, the value of the y-intercept 𝑎 is 79.078. This means that the predicted grade of
a student would be 79.078 if he/she has zero hours of study.
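The interpretation can be illustrated with a small Python sketch that evaluates the Activity 1 regression line at a few points. The function name `predict_grade` is hypothetical, and the study-hour values 5 and 6 are arbitrary choices used only to show a one-unit change.

```python
def predict_grade(study_hours, a=79.078, b=0.945):
    """Predicted grade from the Activity 1 regression line y' = a + b*x."""
    return a + b * study_hours

# The y-intercept is the predicted grade at zero hours of study
grade_at_zero = predict_grade(0)

# The slope is the change in the predicted grade for one additional hour
change_per_hour = predict_grade(6) - predict_grade(5)

print(f"Predicted grade at 0 hours: {grade_at_zero:.3f}")
print(f"Change for one extra hour: {change_per_hour:.3f}")
```

Any two study-hour values that are one unit apart would show the same change, since the slope of a line is constant.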
What’s more
Given the data below, find the equation of the regression line and provide an
interpretation of the results.
Student   No. of Days Absent (𝑥)   Score in 50-point Math Quiz (𝑦)
A         1                        47
B         2                        40
C         3                        35
D         4                        27
E         5                        15
What I have learned
Answer the following in your answer sheet.
1. What are the two things you should do before you start finding the equation of
the regression line?
2. What are the assumptions in conducting a regression?
3. If the value of the Pearson coefficient 𝑟 is found to be insignificant, what would
be the expected result of the regression analysis?
4. What is the function of the slope 𝑏 in a regression line?
What can I do
Activity. Please follow the instructions below.
1. Think of any pair of variables (𝑥 and 𝑦) that interests you (e.g., age and
number of sleep hours).
2. Interview at least five (5) persons in your household and
record their responses for your chosen variables.
3. Present the results in tabular form and find the corresponding equation of
the regression line.
4. Provide an interpretation of the results.
Assessment
Direction: Read the following questions carefully and choose the letter of
your answer. You may use a separate sheet of paper.
1. In the equation 𝑦 ′ = 5 + 6𝑥 , what is the slope?
a. 5 c. 6𝑥
b. 𝑦′ d. 6
2. In the equation 𝑦′ = 12 − 6𝑥, what is the value of the y-intercept?
a. 𝑦′ c. −12
b. 12 d. −6
3. Which of the following scenarios could give you a meaningful regression
analysis?
a. There is no linear relationship between the variables
b. Correlation will be done after the regression analysis
c. There is a strong negative linear relationship between the variables
d. The value of 𝑟 is not significant
4. In a regression line, what do you call the magnitude of the change in one
variable when the other variable changes by exactly one unit?
a. Unit change c. marginal change
b. Variable change d. regression change
5. If the equation of the regression line is 𝑦′ = 6 + .234𝑥, how can it be
interpreted?
a. For every one-unit change in the value of 𝑦, the value of 𝑥 changes by 6
units on average
b. The slope of the line is .234
c. For every one-unit change in the value of 𝑥, the value of 𝑦 changes by
.234 unit on average
d. For every one-unit change in the value of 𝑥, the value of 𝑦 changes by 6
units on average
ANSWER KEY
What I know
What’s more
What I have learned
Assessment
Lesson 2 Solving Problems Involving
Regression Analysis
What you need to know
This module will help you understand the concept of regression analysis.
It covers the following lessons:
- Prediction of the value of the dependent variable given the value of the
independent variable; and
- Solving problems involving regression analysis
What you are expected to learn...
After going through this module, you are expected to be able to:
1. Predict the value of the dependent variable given the value of the independent
variable (M11/12SP-IVj-1 ); and
2. Solve problems involving regression analysis (M11/12SP-IVj-2 ).
How to learn from this module...
To achieve the objectives of this module, you need to read its contents
comprehensively and follow the instructions provided in every activity accordingly.
What I Know
Direction: Read the following questions carefully and choose the letter of your
answer. You may use a separate sheet of paper.
1. What is the value of 𝑦 in the equation 𝑦 = 3 + 4𝑥 if the value of 𝑥 is twice the slope?
a. 7 b. 11 c. 19 d. 35
2. If the linear regression equation is 𝑦′ = 103 − 1.7𝑥, what would be the value of 𝑦′
when 𝑥 = 10?
a. 101.3 b. 86 c. 93 d. 106
3. Which of the following scenarios could not possibly give us an acceptable prediction
using the equation of regression?
a. Predicting the future company revenue based on past sales
b. Predicting the crop yield of a farm depending on rainfall days
c. Predicting the number of hospital patients based on the season
d. Predicting the age of a student based on his/her grade
4. In a study involving number of student tardy days and their corresponding quiz
scores, the resulting regression equation is 𝑦 = 43 + .756𝑥. What would be the
corresponding score if a student never committed any tardy day?
a. 41 b. 43 c. 42 d. 44
5. In a study involving the number of assists in a basketball game (independent) and
the total points (dependent), find the total points of a game when the number of
assists is 30. Use 𝑦′ = 2.693 + 1.962𝑥.
a. 61 b. 63 c. 62 d. 64
What’s In
In the previous lesson, you have learned how to identify the dependent and
independent variables of a data set. You also learned that the equation of a regression
line is in the form 𝑦 ′ = 𝑎 + 𝑏𝑥, where 𝑏 is the slope and 𝑎 is the y-intercept. Similarly, you
learned how to interpret a regression line equation.
What’s New
Today, you will learn how to use the equation of a regression line to make
predictions about the value of the dependent variable. That’s right: prediction,
or estimation, of the value of the dependent variable for a value
of the independent variable that is not present in your data.
To give you an idea of how to make such a prediction (or estimate), let me start by
showing you a sample problem.
Activity 1
Below are sample data on the top-achieving students of a school: their
number of study hours (𝑥) and their scores in the math final exam (𝑦). Find the equation
of the regression line and predict the value of the dependent variable if the value of the
independent variable is 14.
Student No. of Study Hours Score (out of 100)
A 5 83
B 7 87
C 8 89
D 11 93
E 13 96
Before we proceed with our initial computation, we must remember that in
regression analysis, the data must be correlated and the correlation must be
significant. For the sake of this discussion, let us assume that these
requirements have been met.
Now, as we did in the previous lesson, we first need to solve for the
values needed to find the y-intercept 𝑎 and the slope 𝑏. Hence, we should come up
with the following:
Student   No. of Study Hours (x)   Score out of 100 (y)   xy     x^2
A         5                        83                     415    25
B         7                        87                     609    49
C         8                        89                     712    64
D         11                       93                     1023   121
E         13                       96                     1248   169
Total     44                       448                    4007   428
The values needed for solving the equation are as follows:
𝑛 = 5, since there are five pairs of data
∑ 𝑥 = 44
∑ 𝑦 = 448
∑ 𝑥𝑦 = 4007
∑ 𝑥² = 428
Solving for the y-intercept 𝑎, we get
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(448)(428) - (44)(4007)}{5(428) - 44^2} = \frac{191744 - 176308}{2140 - 1936} = \frac{15436}{204} = 75.667$$
Solving for the slope 𝑏, we also get
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{5(4007) - (44)(448)}{5(428) - 44^2} = \frac{20035 - 19712}{2140 - 1936} = \frac{323}{204} = 1.583$$
Hence, the equation of the regression line 𝑦 ′ = 𝑎 + 𝑏𝑥 is 𝑦 ′ = 75.667 + 1.583𝑥
where the slope is 1.583 and the y-intercept is 75.667.
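The computation above can be re-checked with the following Python sketch. It keeps the fractions exact using the standard-library `fractions.Fraction` type and rounds only at the end. Using exact fractions is a choice made for this sketch, not something the module prescribes, and the variable names are illustrative.

```python
from fractions import Fraction

# Activity 1 data for Lesson 2: study hours (x) and scores out of 100 (y)
hours = [5, 7, 8, 11, 13]
scores = [83, 87, 89, 93, 96]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Shared denominator of both formulas: n(sum of x^2) - (sum of x)^2
denominator = n * sum_x2 - sum_x ** 2

# Exact values of the y-intercept a and the slope b (reduced automatically)
a_exact = Fraction(sum_y * sum_x2 - sum_x * sum_xy, denominator)
b_exact = Fraction(n * sum_xy - sum_x * sum_y, denominator)

# Round to three decimal places only at the very end
a = round(float(a_exact), 3)
b = round(float(b_exact), 3)
print(f"y' = {a} + {b}x")
```

Rounding only at the end avoids carrying rounding error through intermediate steps, which matters more when the sums are large.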
Interpretation
In the regression line equation, our slope 𝑏 is 1.583, which means that for every
one-hour increase in 𝑥, the number of study hours, the value of 𝑦, the
score, increases by 1.583 units on average. Similarly, the value of the y-intercept 𝑎
is 75.667. This means that the predicted score of a student would be 75.667 if he/she has zero
hours of study.
What is it
Now, since our main objective is to predict the value of 𝑦 when the value of 𝑥 is
14, we will now use our newfound equation. We will replace 𝑥 with 14.
𝑦′ = 75.667 + 1.583𝑥
𝑦′ = 75.667 + 1.583(14)
𝑦′ = 75.667 + 22.162
𝑦′ = 97.829
Hence, if a student studies for 14 hours, his/her expected score in the math exam
would be 97.829.
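The substitution can be expressed as a small Python function. This is a minimal sketch in which the function name `predict_score` is hypothetical and the default coefficients are the rounded values of 𝑎 and 𝑏 found above.

```python
def predict_score(study_hours, a=75.667, b=1.583):
    """Predicted exam score from y' = a + b*x, rounded to three decimal places."""
    return round(a + b * study_hours, 3)

# Replace x with 14, as in the worked solution
predicted = predict_score(14)
print(f"Predicted score for 14 study hours: {predicted}")
```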
PLEASE TAKE NOTE:
When using a regression line, you can only apply the interpretations of the slope
and y-intercept within the range of the observed 𝑥 values. It is dangerous to make predictions or
statements beyond the scope of what you observed in the data set.
In our example, we found that a student who studies for about 14 hours
would be expected to score 97.829. But should we use that same equation to predict
scores when the number of study hours is very large, say 100? Definitely not.
The equation would give 75.667 + 1.583(100) = 233.967, which is impossible on a 100-point exam.
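One way to make this warning concrete is to have the prediction report whether the chosen 𝑥 lies inside the observed range. The Python sketch below is only an illustration: the function name and the strict "inside the minimum and maximum" rule are assumptions of this sketch, not a rule stated in the module. Under this strict check even 𝑥 = 14 is flagged, since the largest observed value is 13, although it lies far closer to the data than 100 does.

```python
# Observed study hours from the Lesson 2 data set
OBSERVED_HOURS = (5, 7, 8, 11, 13)

def predict_with_range_check(x, a=75.667, b=1.583, observed_x=OBSERVED_HOURS):
    """Return (prediction, is_extrapolation) for the line y' = a + b*x."""
    prediction = a + b * x
    # Flag any x outside the smallest and largest observed values
    is_extrapolation = x < min(observed_x) or x > max(observed_x)
    return prediction, is_extrapolation

for hours in (10, 14, 100):
    predicted, outside = predict_with_range_check(hours)
    note = "outside observed range" if outside else "within observed range"
    print(f"x = {hours}: y' = {predicted:.3f} ({note})")
```

For 𝑥 = 100 the predicted score is far above the maximum possible score of 100, which shows why the equation should not be trusted that far from the data.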
What’s more
The data below show the ages 𝑥 of students in a certain school and the
corresponding number of them 𝑦 who have smartphones. Find the equation of the regression
line and predict the number of students with smartphones at age 20. Assume that
the variables are correlated and that the correlation is significant.
Age (x)   No. of Students with Smartphones (y)
13        19
14        32
16        37
17        45
19        49
What I have learned
Answer the following in your answer sheet.
1. Why is it dangerous to make predictions beyond the scope of what you have
observed in your data set?
2. Give at least two examples of situations wherein prediction of the value of the
dependent variable using the equation of the regression line would be
meaningless.
What can I do
A study has suggested that the speed of a vehicle before an
accident can be estimated by measuring the length of the skid marks it created
during full braking. Consider the table below.
MPH Braking Distance (ft)
20 20
30 45
40 81
50 133
60 205
80 411
Assume MPH is going to be used to predict stopping distance.
1. Find the regression equation.
2. Interpret the slope and the y-intercept of the equation.
3. Find the braking distance when MPH=45.
4. Find the braking distance when MPH=100.
5. What can you say about predicting beyond the range of the data values?
Assessment
Direction: Read the following questions carefully and choose the letter of
your answer. You may use a separate sheet of paper.
1. What is the value of 𝑦 in the equation 𝑦 = 4 + 5𝑥 if the value of 𝑥 is twice the slope?
a. 7 c. 19
b. 11 d. 54
2. If the linear regression equation is 𝑦′ = 103 − 1.7𝑥, what would be the value
of 𝑦′ when 𝑥 = 2?
a. 101.3 c. 93
b. 86 d. 99.6
3. Which of the following scenarios could not possibly give us an acceptable
prediction using the equation of regression?
a. Predicting the future company revenue based on past sales
b. Predicting the crop yield of a farm depending on rainfall days
c. Predicting the number of hospital patients based on the season
d. Predicting the age of a student based on his/her grade
4. In a study involving number of student tardy days and their corresponding quiz
scores, the resulting regression equation is 𝑦 = 34 + .756𝑥. What would be
the corresponding score if a student never committed any tardy day?
a. 41 c. 34
b. 43 d. 44
5. In a study involving the number of assists in a basketball game (independent)
and the total points (dependent), find the total points of a game when the
number of assists is 30. Use 𝑦′ = 2.693 + 1.962𝑥.
a. 61 c. 62
b. 63 d. 64
Key Answers
Pretest Enrichment Activity
Post test
Reference
Bluman, Alan. (2012) Elementary Statistics: A Step by Step Approach. New York:
McGraw Hill
Writer’s Profile
Name: ARIEL A. TARUCAN
Position: Teacher III
Educational Attainment:
Master of Science in Teaching Mathematics (CAR)
Bachelor of Secondary Education Major in Mathematics
Module Title: Module 2 – Regression Analysis
Division: Misamis Oriental
School: Talisayan National High School – Senior High School
District: Talisayan
For inquiries or feedback, please write or call:
Department of Education – Division of Misamis Oriental
Office Address: Del Pilar corner Velez Street, Brgy. 29,
Cagayan de Oro City, 9000
Telephone Nos.: (088) 881-3094: Text: 0917-8992245
(Globe)
Email: misamis.oriental@deped.gov.ph