Module9-Correlation and Regression (Business)

This document discusses correlation and regression. It defines correlation as a test used to determine the relationship between paired variables. Correlation coefficients range from -1 to 1, where -1 is a perfect negative correlation, 1 is a perfect positive correlation, and 0 is no correlation. The document also discusses calculating correlation using Pearson's r and Spearman's rho, as well as interpreting the results. Linear regression is introduced as a way to model the relationship between variables and estimate future values.

Uploaded by

CIELICA BURCA
Copyright
© © All Rights Reserved

Module 9

Correlation and Regression

Objectives:

At the end of the lesson, the students are expected to:

1. identify the direction of the correlation of paired variables;
2. calculate the correlation coefficient of a data set;
3. interpret the obtained correlation coefficient;
4. compute the linear least-squares regression equation of a given data set; and
5. estimate the value of Y, given a value of X.

CORRELATION AND REGRESSION

In everyday discourse, many statements about the relationship between variables are accepted
without question: age and physical capacity, income and educational attainment, intelligence and
academic performance, cigarette smoking and lung disease, unemployment and the condition of the
economy, and so on. In almost every field, we find that one variable is related in some way to
another. It should be noted, however, that a relationship does not mean causality. That is, a
relationship between two variables does not necessarily imply that one is the cause of the other.

The investigation of two or more variables requires not only procedures for defining and
measuring the variables under study, but also for describing the nature of relations between them.
A procedure that may be used to determine the relationship between variables is correlation analysis.

Correlation is a statistical measure of the degree of relationship between variables.
The statistic used to describe the degree or magnitude of the relationship is called the
correlation coefficient (r), which conveys both the direction and the magnitude.

Correlations may be classified in terms of their magnitude and direction. The degree or
magnitude may be described as perfect, high, moderate or low. The direction may be classified as
positive correlation, negative correlation or zero correlation. A positive correlation means that
there is a direct relationship between the variables. It exists when high values in one variable
are associated with high values in the other variable, and low values in one variable are
associated with low values in the other variable. A negative correlation, on the other hand,
exists when high values in one variable are associated with low values in the second variable, and
vice versa. When high or low values in one variable are not systematically associated with high or
low values in the other variable, there is a zero correlation.

The correlation coefficient r ranges from -1 to 1. If the value of r is 1, then there is a
perfect positive correlation. If the value of r is -1, then there is a perfect negative correlation.
If r = 0, then there is no linear correlation between the variables.

Business Statistics
Here is the correlation scale and the corresponding interpretation of r.

Value of r          Interpretation

±1.00               perfect correlation
±0.80 to ±0.99      high correlation
±0.60 to ±0.79      moderately high correlation
±0.40 to ±0.59      moderate correlation
±0.20 to ±0.39      low correlation
±0.01 to ±0.19      negligible correlation
0                   zero correlation
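
The scale above can be applied mechanically. The following is a minimal Python sketch; the function name `interpret_r` is hypothetical, and the handling of values that fall between two printed bands (for example 0.795) is an assumption, since the scale does not say where they belong.

```python
def interpret_r(r: float) -> str:
    """Return the verbal interpretation of a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    # Lower bounds follow the scale in the text. Values between two printed
    # bands (e.g. 0.795) are assigned to the lower band, which is an assumption.
    if size == 1:
        degree = "perfect"
    elif size >= 0.80:
        degree = "high"
    elif size >= 0.60:
        degree = "moderately high"
    elif size >= 0.40:
        degree = "moderate"
    elif size >= 0.20:
        degree = "low"
    else:
        degree = "negligible"
    return f"{degree} {direction} correlation"


print(interpret_r(-0.85))  # high negative correlation
```

The sign of r supplies the direction and its absolute value supplies the degree, so the two parts of the interpretation are computed separately.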

The degree of relationship between two variables may be represented using a scatter diagram.
If the points in the scatter diagram follow a straight line, this suggests a linear relationship.
However, we must take note that not all relationships are linear. If a curved line fits the points
in the scatterplot of the X and Y variables better than a straight line, then the relationship
tends to be curvilinear.

Below is the graphical representation of the different types of correlation.

Pearson Product-Moment Correlation Coefficient

The most widely used measure of correlation is the Pearson product-moment correlation
coefficient, or simply Pearson r, which was developed by Karl Pearson. This statistic is used for
interval and ratio types of data. If two variables, X and Y, are under investigation, the
correlation coefficient is determined by:

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$

where: n – number of paired observations

Example:

Determine whether a relationship exists between the employees' satisfactory ratings in the
first half of the year and their ratings in the second half of the year in a certain company.
Use Pearson's r and test at the 0.05 level of significance.

First Half Rating    Second Half Rating

84                   85
88                   89
78                   86
79                   83
91                   88
84                   87
77                   81
83                   86
85                   82
86                   85

Solution:

First Half     Second Half
Rating (X)     Rating (Y)      XY        X²        Y²

84             85              7140      7056      7225
88             89              7832      7744      7921
78             86              6708      6084      7396
79             83              6557      6241      6889
91             88              8008      8281      7744
84             87              7308      7056      7569
77             81              6237      5929      6561
83             86              7138      6889      7396
85             82              6970      7225      6724
86             85              7310      7396      7225

Totals: ∑X = 835, ∑Y = 852, ∑XY = 71,208, ∑X² = 69,901, ∑Y² = 72,650

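$$ r = \frac{10(71208) - (835)(852)}{\sqrt{\left[10(69901) - (835)^2\right]\left[10(72650) - (852)^2\right]}} $$

$$ r = \frac{712080 - 711420}{\sqrt{[699010 - 697225][726500 - 725904]}} $$

$$ r = \frac{660}{\sqrt{(1785)(596)}} $$

$$ r = 0.640 $$

There is a moderately high positive correlation between the first half and second half ratings.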
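
The hand computation above can be checked with a short script. This is a minimal Python sketch of the same computational formula; the function name `pearson_r` and the variable names are illustrative and not part of the module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from paired observations, using the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Ratings from the example in the text.
first_half = [84, 88, 78, 79, 91, 84, 77, 83, 85, 86]
second_half = [85, 89, 86, 83, 88, 87, 81, 86, 82, 85]

r = pearson_r(first_half, second_half)
print(f"r = {r:.3f}")  # r = 0.640
```

The intermediate sums inside the function correspond to the column totals of the solution table, so each one can be compared with the hand-computed value.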

Name: ____________________________________________ Date: ____________________

Section: ___________ Professor: ___________________________ Score: _____________

Activity 1

I. Indicate the direction of the correlation between two variables as positive, negative, or zero
correlation.
___________________1. Grade of a student and number of hours spent in studying
___________________2. Price of a computer unit and monthly water consumption
___________________3. Stress level and blood pressure of a patient
___________________4. Heights of husbands and income of their wives
___________________5. Unemployment rate and interest rates

II. An insurance company wants to know how the amount of life insurance depends on the
income of persons. The research department of the company collected information on ten
policy holders. The following table lists the monthly incomes (in thousand pesos) and
amounts (in million pesos) of their life insurance policies.
Policy Holder    Monthly Income        Life Insurance
                 (thousand pesos)      (million pesos)

01               62                    4.0
02               75                    5.3
03               50                    5.0
04               34                    2.0
05               28                    2.2
06               23                    1.5
07               34                    1.8
08               34                    2.0
09               67                    4.5
10               39                    3.4

Find the correlation coefficient between income and insurance amount using Pearson r. Interpret
the result.

Spearman rho (𝝆) Rank Correlation

The Spearman rho (ρ) is a rank correlation coefficient. It is used to find out if there
is a significant relationship between two variables of ordinal type. The formula of Spearman
rho is as follows:

$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$

where: D - the difference between ranks

Example:

Using Spearman rho, determine the relationship between the capital and profit of
cinnamon rolls of a certain store at the 0.05 level of significance.

Capital (X)    Profit (Y)

20,000         5,000
50,000         15,000
10,000         3,000
100,000        30,000
18,000         4,000
25,000         9,000
11,000         6,000
150,000        70,000
5,000          3,000
40,000         15,000

Solution:

Capital (X)    Profit (Y)    Rx     Ry     D       D²

20,000         5,000         6      7      -1      1
50,000         15,000        3      3.5    -0.5    0.25
10,000         3,000         9      9.5    -0.5    0.25
100,000        30,000        2      2      0       0
18,000         4,000         7      8      -1      1
25,000         9,000         5      5      0       0
11,000         6,000         8      6      2       4
150,000        70,000        1      1      0       0
5,000          3,000         10     9.5    0.5     0.25
40,000         15,000        4      3.5    0.5     0.25

∑D² = 7

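Rx and Ry signify the ranks of X and Y. Rank 1 is given to the highest value, and tied values
share the average of the ranks they occupy.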

$$ \rho = 1 - \frac{6(7)}{10(10^2 - 1)} $$

$$ \rho = 1 - \frac{42}{10(100 - 1)} $$

$$ \rho = 1 - \frac{42}{990} $$

$$ \rho = 0.958 $$

There is a high positive correlation between capital and profit of cinnamon rolls.
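
The ranking and the formula can be reproduced in a few lines. The following Python sketch is illustrative only: the helper names `average_ranks` and `spearman_rho` are hypothetical, and ranking from highest to lowest with tied values sharing the average rank is inferred from the solution table. When ties are present, as here, the shortcut formula is an approximation; the text applies it directly, and so does this sketch.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; ties get the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1  # 1-based rank of first occurrence
        count = ordered.count(v)               # number of tied values
        rank_of[v] = first_position + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rho(x, y):
    """Spearman rho via the shortcut formula 1 - 6*sum(D^2) / (n(n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Capital and profit from the example in the text.
capital = [20000, 50000, 10000, 100000, 18000, 25000, 11000, 150000, 5000, 40000]
profit = [5000, 15000, 3000, 30000, 4000, 9000, 6000, 70000, 3000, 15000]

rho = spearman_rho(capital, profit)
print(f"rho = {rho:.3f}")  # rho = 0.958
```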

Name: ____________________________________________ Date: ____________________

Section: ___________ Professor: ___________________________ Score: _____________

Activity 2

1. Using Spearman rho, determine the relationship between the weight and height of
babies who were admitted to a certain hospital at the 0.05 level of significance.

Infant Weight (X) Height (Y)


1 27 0.70
2 25 0.64
3 28 0.77
4 23 0.62
5 21 0.60
6 20 0.62
7 29 0.77
8 24 0.64

Name: ______________________________________________ Date: __________________

Section: ______________ Professor: ________________________ Score: _____________

Activity 3

Solve for the following.

A. Pearson’s r

Determine if there is a relationship between the company employees' first-half-year rating and
second-half-year rating. Test at the 0.05 significance level.

First-half rating Second-half rating


86 88
80 86
85 87
94 93
95 95
95 92
75 75

B. Spearman rho

Due to the growth of technology nowadays, a group of researchers decides to examine
whether there is a correlation between the cost of internet consumption at home and
the degree of satisfaction with the speed of the service, rated from 1 to 10. The sample
data are shown below. Use the 0.01 significance level.

Cost of Service (Pesos) Degree of Satisfaction


300 6
200 8
500 7
600 9
300 10
400 9
800 5
900 6
400 8
600 7
500 7
700 9

Linear Regression of Y on X

A regression line is a model that summarizes the relationship between two variables by
fitting a straight line through the center of the scatterplot that represents the data.

In general, the equation of any non-vertical line is given by Y = bX + a, where a and b are
constants (b ≠ 0 unless the line is horizontal). The constant a is the signed distance on the Y
axis from the origin to the point where the line cuts the Y axis; in other words, the y-intercept.
The quantity b is the slope of the line. The slope of any line is simply the ratio of the change
in the vertical direction to the change in the horizontal direction between any two points on the
line. The slope describes the rate of change in Y per unit increase in X.
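
As a small illustration of slope and intercept, the sketch below recovers b and a for the line through two points. The function name `line_through` and the two sample points are made up for this example and do not come from the module.

```python
def line_through(p1, p2):
    """Return (b, a) for the line Y = bX + a passing through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no defined slope")
    b = (y2 - y1) / (x2 - x1)  # vertical change divided by horizontal change
    a = y1 - b * x1            # value of Y where the line crosses X = 0
    return b, a


# Hypothetical points: Y rises by 8 while X rises by 4.
b, a = line_through((0, 2), (4, 10))
print(f"Y = {b}X + {a}")  # Y = 2.0X + 2.0
```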

If X and Y are correlated variables, we can predict or estimate the value of Y given the
value of X by finding the regression equation. The regression line of Y on X is represented by the
equation

$$ Y' = b_{yx} X + a_{yx} $$

In the regression equation, the regression coefficient is simply the slope of the regression
line. It represents the change in Y for every one unit change in X.

To obtain the values of the coefficients $a_{yx}$ and $b_{yx}$, the following formulas may be used:

$$ a_{yx} = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} $$

$$ b_{yx} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} $$
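
As a side check on the two formulas, the intercept can also be written in terms of the slope and the means of X and Y. The symbols $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$ are introduced here for illustration and are not used elsewhere in the module. Writing $D = n\sum X^2 - (\sum X)^2$ for the common denominator:

$$ \bar{Y} - b_{yx}\bar{X} = \frac{\sum Y}{n} - \frac{\left[n\sum XY - (\sum X)(\sum Y)\right]\sum X}{nD} = \frac{(\sum Y)\,D - n(\sum X)(\sum XY) + (\sum X)^2(\sum Y)}{nD} $$

$$ = \frac{n(\sum Y)(\sum X^2) - n(\sum X)(\sum XY)}{nD} = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} = a_{yx} $$

So $a_{yx} = \bar{Y} - b_{yx}\bar{X}$, which means the regression line always passes through the point of means $(\bar{X}, \bar{Y})$.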

Given that 10 students have taken the college admission test (X) and have a general
weighted average (Y), the computed values from the data are as follows:

∑X = 745    ∑Y = 836    ∑X² = 57269    ∑Y² = 70172    ∑XY = 62945

What is the estimated general weighted average of a student who scored 95 in the college
admission test?

Solution:

To find the coefficients $b_{yx}$ and $a_{yx}$, substitute these values in the formulas:

$$ b_{yx} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{(10)(62945) - (745)(836)}{10(57269) - (745)^2} = 0.375 $$

$$ a_{yx} = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} = \frac{(836)(57269) - (745)(62945)}{10(57269) - (745)^2} = 55.64 $$

The equation of the regression line is given by:

$$ Y' = b_{yx} X + a_{yx} $$

$$ Y' = 0.375X + 55.64 $$

If X = 95, then:

$$ Y' = 0.375(95) + 55.64 = 91.265 \approx 91 $$

Therefore, if a student scored 95 in the college admission test, his estimated or predicted
weighted average at the end of the semester is 91.
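
The substitution above can be verified with a short script. This is a minimal Python sketch working from the summary totals given in the example; the function name `regression_from_sums` is hypothetical.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b, a) of the least-squares line Y' = bX + a from summary totals."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return b, a


# Totals from the admission-test example in the text.
b, a = regression_from_sums(n=10, sum_x=745, sum_y=836, sum_x2=57269, sum_xy=62945)
print(f"Y' = {b:.3f}X + {a:.2f}")  # Y' = 0.375X + 55.64

predicted = b * 95 + a
print(f"Estimated Y at X = 95: {predicted:.2f}")
```

Because the script keeps the unrounded coefficients, its estimate is about 91.29 rather than the 91.265 obtained from the rounded values; both round to 91.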

Name: ____________________________________________ Date: ____________________

Section: ___________ Professor: ___________________________ Score: _____________

Activity 4

Using the same table, try this.

∑X = 745    ∑Y = 836    ∑X² = 57269    ∑Y² = 70172    ∑XY = 62945

a) Get the equation of the regression line.
b) What is the estimated amount of life insurance (in millions) of a policy holder
who has a monthly income of 25 (in thousands)?
c) What is the estimated amount of life insurance (in millions) of a policy holder
who has a monthly income of 80 (in thousands)?

Name: ____________________________________________ Date: ____________________

Section: ___________ Professor: ___________________________ Score: _____________

Activity 5

CORRELATION AND REGRESSION

I. Indicate the direction of the correlation between two variables as positive, negative, or zero
correlation.

___________________1. Hours spent in computer network gaming and academic achievement
___________________2. Supply and prices of a commodity
___________________3. Number of hours of practice and Skill test score in keyboarding
___________________4. Attitude towards Math and Math achievement
___________________5. Learning anxiety and scholastic success
II. The director of an agency would like to determine the relationship between the foreign
revenues (in million dollars) and foreign assets (in million dollars) of companies. Ten randomly
selected companies were asked to provide company reports to the agency, and the following data
were obtained.

Company Assets Revenues


01 52 50
02 70 45
03 40 20
04 35 22
05 28 22
06 23 15
07 28 18
08 42 22
09 65 45
10 39 34

1. Compute r then interpret the result.

2. Find the equation of the regression line.

