UNIVERSITY OF DODOMA
COLLEGE OF HEALTH AND ALLIED SCIENCES
SCHOOL OF PUBLIC HEALTH
Topic 5: Correlation and Regression Analysis
Instructor: C. Mbotwa
Relationship Between two Variables
Introduction
So far we have studied problems relating to one variable only. In practice we come across a large
number of problems involving two or more variables. If two quantities vary in such a way that
movements in one are accompanied by movements in the other, the quantities are said to be correlated.
The study of this relationship is called BIVARIATE ANALYSIS. More formally, if for every
measurement of a variable X we know a corresponding value of a second variable Y, the resulting
set of pairs of values is called a BIVARIATE POPULATION and the data used are called
BIVARIATE DATA. In other words, data which involve two variables are referred to
as BIVARIATE DATA.
Examples,
i. In health studies of populations, it is common to obtain variables such as height and weight.
ii. Economic studies may be interested in, among other things, personal income and years of
education, personal income and private consumption.
iii. Most university admissions committees ask for an applicant’s high school grade point
average and standardized admission test scores.
Scatter Diagrams
The simplest device for determining a relationship between two variables is a special type of dot
chart called a scatter diagram. The method is so called because it indicates the scatter of the
various points. When this method is used, the given bivariate data are plotted on a graph in the form
of dots, i.e., for each pair of X and Y values we put a dot, and we thus obtain as many points as we
have observations. To plot the data, the dependent variable Y is always plotted on the Y-axis
(vertical axis) and the independent variable X on the X-axis (horizontal axis).
By looking at the scatter of the various points we can form an idea as to whether the variables are
related or not. The more the plotted points “scatter” over a chart the less relationship there is
between the two variables. The more nearly the points come to falling on a line, the higher the
degree of a linear relationship. If all points lie on the straight line rising from the lower left-hand
corner to the upper right-hand corner the relationship is said to be PERFECTLY POSITIVE (Figure
5.2). On the other hand, if all points are lying on a straight line falling from the upper left-hand
corner to the lower right-hand corner of the diagram, the relationship is said to be PERFECTLY
NEGATIVE (Figure 5.4).
If the plotted points fall in a narrow band there would be a high degree of relationship between the
variables. The relationship shall be positive if the points show a RISING TENDENCY from the
lower left-hand corner to the upper right-hand corner (Figure 5.1). Conversely, the relationship will
be negative if the points show a DECLINING TENDENCY from the upper left-hand corner to the
lower right-hand corner of the diagram (Figure 5.3).
Correlation Analysis
Correlation refers to the extent of a linear relationship between two or more variables. If there is a
close linear relationship between the two variables, the variables are said to be highly correlated; if
there is no linear relationship between the two variables, the variables are said to be uncorrelated.
Thus, correlation analysis refers to the technique used in measuring the closeness of the linear
relationship between variables.
Correlation Coefficient
The strength of the linear relationship between two variables is popularly measured by what is
known as the PEARSON PRODUCT-MOMENT COEFFICIENT OF CORRELATION. The sample
correlation coefficient is computed as:

r = SSxy / √(SSxx · SSyy)

where SSxy = ΣXY - (ΣX)(ΣY)/n, SSxx = ΣX² - (ΣX)²/n and SSyy = ΣY² - (ΣY)²/n.

The value of the coefficient of correlation obtained by the above formula lies between -1 and +1.
When r = +1, it means there is a PERFECT POSITIVE linear relationship between the variables.
When r = -1, it means there is a PERFECT NEGATIVE linear relationship between the variables.
When r = 0, it means there is NO LINEAR RELATIONSHIP between the variables. However, in
practice, such values of r as ±1 and 0 are rare. We normally get non-zero values which lie
between -1 and +1.
The coefficient of correlation describes not only the magnitude of correlation but also its
direction. Thus, r = +0.95 and r = -0.95 have the same magnitude of 0.95, but the first coefficient
describes a positive correlation whereas the second describes a negative correlation.
Properties of the Coefficient of Correlation
The following are the important properties of the correlation coefficient, r.
1. The coefficient of correlation lies between -1 and +1. In symbols we write -1 ≤ r ≤ +1, or
|r| ≤ 1.
2. The coefficient of correlation is independent of change of the scale and origin of the
variables X and Y.
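The correlation coefficient can be computed directly from the sums ΣX, ΣY, ΣXY, ΣX² and ΣY². Below is a minimal sketch in plain Python; the helper name pearson_r is illustrative, not part of any particular library:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, computed from
    the shortcut sums SSxy, SSxx and SSyy."""
    n = len(x)
    ssxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssxx = sum(a * a for a in x) - sum(x) ** 2 / n
    ssyy = sum(b * b for b in y) - sum(y) ** 2 / n
    return ssxy / math.sqrt(ssxx * ssyy)

# A perfectly positive linear relationship gives r = +1,
# and a perfectly negative one gives r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))         # -1.0
```

Note that r is undefined (division by zero) when either variable is constant, since then SSxx or SSyy is zero.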
[Figures omitted: scatter diagrams illustrating the relationships described above.]
Figure 5.1: Positive linear relationship (r = 0.98)
Figure 5.2: Perfectly positive linear relationship (r = 1)
Figure 5.3: Negative linear relationship
Figure 5.4: Perfectly negative linear relationship (r = -1)
Figure 5.5: No linear relationship (r = 0.02)
Regression Analysis
The purpose of linear regression is to develop a mathematical relationship (model) between the
variables that can be used to estimate the value of one variable when the value of the other variable
is known. The relationship that is developed has the form of a straight line, and that is why it is
called linear regression. Linear regression is further classified into two types, i.e., simple linear
regression and multiple linear regression.
Simple Linear Regression
In simple linear regression, we develop a linear relationship between one dependent variable and
one independent (explanatory) variable. The relationship we need to establish in simple linear
regression has the form:

Y = β0 + β1X + ε

Where:
Y is the dependent variable.
β0 is the intercept on the Y-axis (constant).
β1 is the gradient (slope) of the relationship, or the coefficient of the independent variable.
X is the independent (explanatory or predictor) variable.
ε is the random error in Y.
Since we cannot fit the true line exactly, as is the case in inferential statistics, we estimate the
relationship by:

Ŷ = a + bX

Where:
a is the estimate of β0
b is the estimate of β1
In order to establish the relationship, we need to find the values of a and b. By using the method of
least squares, the values of a and b can be shown to be:

b = SSxy / SSxx = (ΣXY - (ΣX)(ΣY)/n) / (ΣX² - (ΣX)²/n)
a = ȳ - b·x̄
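The least squares estimates can be turned into a short computation. A minimal sketch in plain Python (the helper name least_squares and the sample data are illustrative only):

```python
def least_squares(x, y):
    """Least squares estimates for the line Y-hat = a + b*X:
    b = SSxy / SSxx and a = y-bar - b * x-bar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    ssxx = sum(xi * xi for xi in x) - n * xbar ** 2
    b = ssxy / ssxx
    a = ybar - b * xbar
    return a, b

# Points lying exactly on Y = 2X give a = 0 and b = 2.
print(least_squares([1, 2, 3], [2, 4, 6]))   # (0.0, 2.0)
```

The fitted line always passes through the point (x̄, ȳ), which is exactly what the formula a = ȳ - b·x̄ enforces.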
Interpretation of the Regression Coefficients
“a” and “b” are called regression coefficients and have the following interpretation:
a (Y-intercept) shows the value of the dependent variable Y when X = 0, i.e., the value Y takes
without any impact of X. If the slope b is positive, this is the minimum value Y can take; if the
slope is negative, the intercept shows the maximum value that Y can attain when there is no impact of X.
On the other hand, b (slope) has two interpretations. First, it shows the direction of the relationship.
If its value is positive, then we say that there is a positive relationship between the regressed
variables. Conversely, if the value of the slope is negative, then we understand that the two variables
are negatively related. The second interpretation is that it shows the amount by which Y will change
when X increases by one unit.
Fitting the Least Squares Line or Line of Best Fit
Consider a set of n data points identified by corresponding values of X and Y, say
(x1, y1), (x2, y2), ..., (xn, yn).
The straight-line model for the response Y in terms of X is Y = β0 + β1X + ε.
The line that gives the mean (or expected) value of Y for a given value of X is E(Y) = β0 + β1X, and
the fitted line, which we hope to find, is represented as Ŷ = a + bX,
where Ŷ is the estimator of the mean value of Y and a predictor of some future value of Y, and
a and b are estimators of β0 and β1 respectively.
For a given data point, say (xi, yi), the observed value of Y is yi and the predicted value of Y would
be obtained by substituting xi into the prediction equation: ŷi = a + b·xi.
The deviation of the ith value of Y from its predicted value is (yi - ŷi).
Then the sum of squares of the deviations of the Y values about their predicted values for all of the
data points is:

SSE = Σ(yi - ŷi)²

The quantities a and b that make the SSE a minimum are called the least squares estimates of the
population parameters β0 and β1, and the prediction equation Ŷ = a + bX is called the least squares line.
Definition: The least squares line is the one that has a smaller SSE than any other straight-line model.
The values of a and b that minimize the SSE are derived by taking partial derivatives of SSE with
respect to a and b, whereby we get two equations known as the normal equations.
Example 5.1
Suppose an experiment involving five subjects is conducted to determine the relationship between
the percentage of a certain drug in the bloodstream and the length of time it takes to react to a
stimulus. The results are shown below:
Subject Amount of Drug X (%) Reaction time Y (Minutes)
1 1 1
2 2 1
3 3 2
4 4 2
5 5 4
a) Plot these points in an X-Y plane (Scatter diagram).
b) Determine the Pearson Correlation coefficient of amount of drug and reaction time and
interpret it.
c) Determine the least square line and interpret the slope.
d) Determine the value of SSE
e) Estimate the time it takes to react to a stimulus if 4.7% of the drug is used.
Solution
a) The scatter diagram of the data is shown below.

[Scatter plot omitted: reaction time (min) against amount of drug (%).]
For the next parts, preliminary computations are done as shown below.

X        Y        X²       Y²       XY
1        1        1        1        1
2        1        4        1        2
3        2        9        4        6
4        2        16       4        8
5        4        25       16       20
Totals:  15       10       55       26       37

x̄ = 15/5 = 3 and ȳ = 10/5 = 2
b) The correlation coefficient is

r = SSxy / √(SSxx · SSyy) = 7 / √(10 × 6) = 0.90

where SSxy = 37 - (15)(10)/5 = 7, SSxx = 55 - 15²/5 = 10 and SSyy = 26 - 10²/5 = 6.

Therefore, the Pearson correlation coefficient of amount of drug and reaction time is 0.90. This
shows that amount of drug and reaction time have a strong positive linear relationship.
c) The least squares line
The slope of the least squares line is

b = SSxy / SSxx = 7 / 10 = 0.7

And the y-intercept is

a = ȳ - b·x̄ = 2 - 0.7 × 3 = -0.1

Thus, the least squares line is

Ŷ = -0.1 + 0.7X, or Ŷ = 0.7X - 0.1
Interpretation of the Slope
The slope shows that there is a positive linear relationship between reaction time and amount of
drug. That is as amount of drug increases by 1%, reaction time will increase by 0.7 minutes (42
seconds).
d) Determination of the SSE
The observed and predicted values of Y, the deviations of the Y values about their predicted values,
and the squares of these deviations are shown below:

X        Y        Ŷ        Y - Ŷ       (Y - Ŷ)²
1        1        0.6      0.4         0.16
2        1        1.3      -0.3        0.09
3        2        2.0      0           0.00
4        2        2.7      -0.7        0.49
5        4        3.4      0.6         0.36
                           Σ = 0       SSE = 1.10
e) Estimate the time it takes to react to a stimulus if 4.7% of the drug is used.
This is given by Ŷ = -0.1 + 0.7 × 4.7 = 3.19 minutes.
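The hand computations in Example 5.1 can be checked with a few lines of Python. This is a sketch; the variable names are illustrative:

```python
import math

x = [1, 2, 3, 4, 5]   # amount of drug (%)
y = [1, 1, 2, 2, 4]   # reaction time (minutes)
n = len(x)

# Shortcut sums of squares
ssxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # 7
ssxx = sum(a * a for a in x) - sum(x) ** 2 / n                  # 10
ssyy = sum(b * b for b in y) - sum(y) ** 2 / n                  # 6

r = ssxy / math.sqrt(ssxx * ssyy)     # about 0.90
b = ssxy / ssxx                       # 0.7
a = sum(y) / n - b * sum(x) / n       # -0.1
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))     # about 1.10
pred = a + b * 4.7                    # 3.19 minutes
```

Every quantity agrees with the worked solution above, which is a useful sanity check when doing these computations by hand.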
Model Assumptions
Regression analysis requires us to specify the probability distribution of the random error ε. We will
make four basic assumptions about the general form of this probability distribution.
Assumption 1
The mean of the probability distribution of ε is zero, i.e., E(ε) = 0. This assumption implies that the
mean value of Y for a given value of X is E(Y) = β0 + β1X.
Assumption 2
The variance of the probability distribution of ε is constant for all settings of the independent
variable X, i.e., Var(ε) = σ² for all values of X.
Assumption 3
The probability distribution of ε is normal, i.e., ε ~ N(0, σ²).
Assumption 4
The errors associated with any two different observations are independent, i.e., Cov(εi, εj) = 0 for i ≠ j.
That is, the errors associated with one value of Y have no effect on errors associated with other Y values.
Differences between Correlation and Regression Analysis
1. Whereas correlation coefficient is the measure of the strength of the linear relationship, the objective of
regression analysis is to study the nature of the relationship between the variables so that we may be
able to predict the value of one variable on the basis of another. Conventionally, the variable which is
the basis of prediction is called the independent or explanatory variable and the variable that is to be
predicted is referred to as the dependent variable. The choice of dependent and independent variables is
a crucial one in regression analysis.
2. The cause and effect relationship is more clearly indicated through regression analysis than by
correlation. Correlation is merely a tool for ascertaining the degree of relationship between two
variables, and therefore we cannot say that one variable is the cause and the other the effect.
Multiple Linear Regression
Multiple linear regression (MLR) is an extension of simple linear regression. In Simple Linear
Regression we considered a single dependent variable, Y, and a single independent variable X.
MLR is used when there are two or more independent variables, where the model using population
information is:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Examples:
i. The height of a child can depend on the height of the mother, the height of the father,
nutrition, and environmental factors.
ii. The level of blood pressure may be influenced by the weight of an individual, the amount of salt
in the diet, alcohol consumption, age, stress, etc.
A multiple linear regression model with k predictor variables X1, X2, ..., Xk and a response Y can
be written as:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
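The coefficients β0, β1, ..., βk can be estimated by least squares, i.e., by solving the normal equations (XᵀX)b = Xᵀy. Below is a minimal pure-Python sketch (the helper name fit_mlr and the sample data are hypothetical, used only to illustrate the technique):

```python
def fit_mlr(rows, y):
    """Least squares fit of y = b0 + b1*x1 + ... + bk*xk.
    Builds the normal equations (X'X) b = X'y and solves them with
    Gaussian elimination and partial pivoting."""
    n = len(rows)
    X = [[1.0] + list(r) for r in rows]        # prepend the intercept column
    p = len(X[0])
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
         for a in range(p)]                                        # X'X
    v = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]  # X'y
    for col in range(p):                       # forward elimination
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    b = [0.0] * p                              # back substitution
    for r in range(p - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b

# Data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit recovers it.
coefs = fit_mlr([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [1, 3, 4, 6, 8])
print([round(c, 6) for c in coefs])   # [1.0, 2.0, 3.0]
```

In practice one would use a statistical package rather than solving the normal equations by hand, but this makes the least squares machinery behind MLR explicit.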
Classical Linear Regression Assumptions
i. There should be no multicollinearity problem.
That means the correlations among the independent variables should be low (say rij < 0.3).
ii. The variance of the error terms should be equal and constant (homoscedasticity).
iii. There should be no autocorrelation.
The error terms should not be correlated with each other, i.e., Cov(εi, εj) = 0 for i ≠ j.
iv. Each of the error terms should be normally distributed. That is, εi ~ N(0, σ²).
Example 5.2:
Consider the following regression of baby weight (grams) on maternal height (cm), maternal weight
(kg) and maternal age at birth (years), whose fitted slope coefficients are:

Ŷ = β̂0 - 96.82X1 + 9.05X2 + 68.39X3 (the estimated intercept β̂0 is not reported)

Where:
X1 : Maternal height
X2 : Maternal weight
X3 : Age of mother
Y : Baby weight
Interpretations
- The above results show that baby weight is negatively related to maternal height and
positively related to the other explanatory variables (maternal weight and age of the mother).
- If height of the mother increases by one centimetre, the weight of the baby will decrease by
96.82 grams.
- If the weight of the mother increases by one kilogram, the weight of the baby will increase by
9.05 grams.
- If the age of the mother increases by one year, the weight of the baby will increase by 68.39
grams
Coefficient of Multiple Determination (R2)
R² is the proportion of the variation in the dependent variable Y that is "explained" by the k
independent variables. The adjusted R² is used to compare models with different sets of independent
variables in terms of predictive capability; it penalizes models with unnecessary or redundant
predictors.
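Both quantities can be computed from the fitted values. Below is a plain-Python sketch (the helper name r_squared is illustrative), applied to the fitted values from Example 5.1, where SSE = 1.10 and SST = 6:

```python
def r_squared(y, yhat, k):
    """R^2 = 1 - SSE/SST, and adjusted R^2, which penalizes extra
    predictors: 1 - (1 - R^2)(n - 1)/(n - k - 1), with k predictors."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Observed and fitted values from Example 5.1 (one predictor, k = 1).
r2, adj = r_squared([1, 1, 2, 2, 4], [0.6, 1.3, 2.0, 2.7, 3.4], k=1)
print(round(r2, 4), round(adj, 4))   # 0.8167 0.7556
```

Note that in simple linear regression R² equals the square of the Pearson correlation coefficient: 0.90² ≈ 0.82, consistent with Example 5.1.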
REVIEW QUESTIONS
1. The table below shows the average rate of growth of GDP, g, and employment, e, for 25
countries for the period of 1988-1997.
Country   Employment (e)   GDP (g)   Country   Employment (e)   GDP (g)
1 1.68 3.04 14 2.57 7.73
2 0.65 2.55 15 3.02 5.64
3 0.34 2.16 16 1.88 2.86
4 1.17 2.03 17 0.91 2.01
5 0.02 2.02 18 0.36 2.98
6 -1.06 1.78 19 0.33 2.79
7 0.28 2.08 20 0.89 2.60
8 0.08 2.71 21 -0.94 1.17
9 0.87 2.08 22 0.79 1.15
10 -0.13 1.54 23 2.02 4.18
11 2.16 6.40 24 0.66 1.97
12 -0.30 1.68 25 1.53 2.46
13 1.06 2.81
i. Find the Pearson correlation coefficient of Employment and GDP and interpret it.
ii. Estimate the linear regression of “g” on “e”, and interpret the coefficients.
iii. What will be the value of GDP when employment rate is 2.00?
2. The linear regression equation obtained after regressing the weight (kg) of a respondent against
her height (cm) and the hours spent on physical exercise per week is given below. Interpret the
coefficients.
.
3. The table below shows years of schooling, S, and hourly earnings in 1994, in dollars, Y, for a
subset of 20 respondents from the United States National Longitudinal Survey of Youth.
Observation S Y Observation S Y
1 15 17.24 11 17 15.38
2 16 15.00 12 12 12.70
3 8 14.91 13 12 26.00
4 6 4.50 14 9 7.50
5 15 18.00 15 15 5.00
6 12 6.29 16 12 21.63
7 12 19.23 17 16 12.10
8 18 18.69 18 12 5.55
9 12 7.21 19 12 7.50
10 20 42.06 20 14 8.00
a) Find the correlation coefficient.
b) Fit the regression line of Y on S and interpret the coefficients.
4. A public health scientist exploring the relationship between family size and food expenditure
randomly selected six women on a street. Each selected woman was asked how many children
under the age of 18 years lived with her, and the average number of litres of milk consumed
weekly by her household. The data resulting from this inquiry are shown below.
Number of children under 18 years Weekly milk expenditure (litres)
2 14
4 20
2 9
6 25
3 16
1 14
i. Construct a scatter plot for the given data.
ii. Determine the least square line that exists between the two variables.
iii. Compute and interpret the coefficient of correlation between the variables
5. Given below is level of education (X) attained and number of children born (Y) by a sample
of six women:
X 0 4 8 12 14 17
Y 8 7 5 4 3 2
i. Compute a sample correlation coefficient of X and Y
ii. Estimate a linear regression of Y on X.
iii. Estimate the expected number of children in a family of a mother who spent 16 years
at school.
6. A study carried out in Iringa last year revealed the following data concerning family size and
years spent at school by the family head for ten respondents. The data are provided in the table
below:
Family Size   Years at School
10 14
5 12
8 8
25 0
22 2
19 7
16 16
14 10
9 20
6 6
Fit the linear regression of family size on years at school.