
Wachemo University

School of Public Health

Abriham S. Areba (Assistant Professor)

Simple Linear Regression and Correlation

October, 2024
Hossana, Ethiopia
Variable: a characteristic that takes on different values for
different individuals or objects.

Types of Variables

 Quantitative (Numerical) variables
Example: number of children in a family, weight, height, BP, …

 Qualitative (Categorical) variables
Example: marital status, religion, education status, patient satisfaction, …
Variables can also be classified into two broad categories:

 Dependent Variable
o Also called the response/regressand/endogenous/outcome/effect/
explained variable
o It is the focus of the research
o It is affected by other (independent) variables

 Independent Variables
o Also called explanatory/regressor/exogenous/predictor/
covariate/causal variables
o They affect the outcome variable

Simple Linear Regression and Correlation

o Regression analysis is concerned with describing and evaluating
the relationship between a given variable (often called the
dependent variable) and one or more variables which are assumed
to influence it (often called explanatory variables).

o It is used to predict the value of a dependent variable based on
the value of at least one independent variable.

o It is also used to explain the impact of changes in an independent
variable on the dependent variable.
Linear Regression Model
o When we observe pairs (X, Y), we would like to write a statistical
relation with uniformly small error.
o Since we do not know Y exactly for every X, we often approximate
the relation between X and Y.

 The relationship between X and Y is described by a linear
function.
 Changes in Y are assumed to be caused by changes in X.
Simple Linear Regression Model
o The aim is to determine how the average value of the continuous
outcome y varies with the value of a single predictor x.

o The model is linear in the parameters: no parameter appears as an
exponent or is multiplied or divided by another parameter.

Consider the following 2 models:

Model 1: Yᵢ = β₀ + β₁Xᵢ + εᵢ
Model 2: Yᵢ = β₀ + β₁Xᵢ + β₂Xᵢ² + εᵢ

Models 1 and 2 are both linear in the parameters, and
can thus both be considered linear models.
Error term (ε)
In this context, error does not mean mistake; it is a statistical
term representing random fluctuations, measurement error, or
the effect of factors outside of our control.

ε ~ N(0, σ²)

The true model cannot be observed since β₀ and β₁ are not
known. We must estimate them from the data.
This gives the estimated or fitted regression line:

ŷᵢ = β̂₀ + β̂₁xᵢ

Where: β̂₀ is the estimate of β₀,
β̂₁ is the estimate of β₁, and
ŷᵢ is the estimate of yᵢ.
Assumptions of Simple Linear Regression Model
1. Linearity
2. Normality
3. Homogeneity of variance
4. Independence of errors
Linearity: the relationship between the predictor and the
outcome variable should be linear.
Normality: the errors should be normally distributed.
Homogeneity of variance: the error variance should be constant.
Independence: the errors associated with one observation are
not correlated with the errors of any other observation.

Assumption 3: the variance of y is the same for any x; that is,
the spread of values of y at each level of x remains approximately
constant.
o The magnitude of a residual is the vertical distance
between an actual observed point and the estimating line.

o The estimating line has a 'good fit' if it minimizes the
error between the estimated points on the line and the actual
observed points that were used to draw it.
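A residual can be computed directly as the vertical distance yᵢ − ŷᵢ. A minimal sketch (the line coefficients and data values here are illustrative, not from the slides):

```python
def residuals(x, y, b0, b1):
    """Vertical distances between observed points and the fitted line b0 + b1*x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# For the line y-hat = 1 + 2x and two toy observations:
res = residuals([1, 2], [3.5, 4.5], b0=1, b1=2)
print(res)  # [0.5, -0.5]
```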
The parameters β̂₀ and β̂₁ are calculated as:

β̂₀ = ȳ − β̂₁x̄

β̂₁ = [n Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / [n Σxᵢ² − (Σxᵢ)²]

where each sum runs over i = 1, …, n.
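These closed-form least-squares estimators can be sketched directly from the sums, for example in Python (the toy data set below is illustrative, not from the slides):

```python
def least_squares(x, y):
    """Estimate intercept b0 and slope b1 of the simple linear
    regression of y on x using the closed-form formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b1 = [n*Sum(xy) - Sum(x)*Sum(y)] / [n*Sum(x^2) - (Sum(x))^2]
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b0 = ybar - b1 * xbar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# A perfectly linear toy data set: y = 2 + 3x
b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```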
o The regression line implies that, for a given value of the
independent variable, the corresponding dependent variable is
normally distributed with:
❖ mean β₀ + β₁x
and
❖ variance σ²
o If σ² were 0, then every point would fall exactly on the
regression line, whereas
o the larger σ² is, the more scatter occurs about the regression
line.
β₀ and β₁ are not known; we must estimate them.
This gives the estimated or fitted regression line:

Ŷᵢ = β̂₀ + β̂₁Xᵢ

β̂₀ is the estimated mean response when X = 0.
β̂₁ is the estimated change in the mean response for a unit increase
in X.

β̂₁ > 0 indicates a direct linear relationship between x and y.

β̂₁ < 0 indicates an inverse linear relationship between x and y.

β̂₁ = 0 indicates no linear relationship between x and y.
Tests of Significance of Regression Coefficients

The null hypothesis that there is no relationship between X and
Y is expressed as:

H₀: β₁ = 0

The alternative hypothesis that there is a significant relationship
between X and Y is:

Hₐ: β₁ ≠ 0

To decide whether or not to reject H₀, we calculate the test
statistic:

t = (β̂₁ − β₁)/Se(β̂₁) = β̂₁/Se(β̂₁)   (since β₁ = 0 under H₀)

and compare it with the Student's t distribution with (n − 2) df
at a given significance level α.

Decision rule:
If |t| > t_(α/2)(n − 2),

then we reject the null hypothesis and conclude that there is a
significant relationship between X and Y.

A (1 − α)100% CI for β₁ is given by:

β̂₁ ± t_(α/2)(n − 2) · Se(β̂₁)

(a CI for β₀ is constructed analogously with Se(β̂₀)).
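The slides do not derive Se(β̂₁); a sketch of the test statistic using the standard formula Se(β̂₁) = √(MSE/SSxx), with MSE = SSE/(n − 2), might look like this (the data set is an illustrative toy example):

```python
import math

def slope_t_statistic(x, y):
    """t statistic for H0: beta1 = 0 in simple linear regression.
    Uses Se(b1) = sqrt(MSE / SSxx), MSE = SSE / (n - 2) -- the
    standard formula, which is assumed here, not taken from the slides."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / ss_xx)
    return b1 / se_b1

t = slope_t_statistic([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
# Compare |t| with the critical value t_(alpha/2)(n - 2), e.g. 2.776 for
# alpha = 0.05 and n - 2 = 4 df; here |t| is far larger, so reject H0.
print(t)
```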
Correlation
Correlation Analysis: deals with measuring the closeness of the
relationship described by the regression equation.
❖ Correlation: measures the relative strength of the linear
relationship between two variables
❖ Unit-less
❖ Ranges between −1 and 1
❖ r = −1 implies perfect negative linear correlation between the
variables under consideration
❖ r = +1 implies perfect positive linear correlation between the
variables under consideration

❖ The closer to −1, the stronger the negative linear relationship
❖ The closer to +1, the stronger the positive linear relationship
❖ The closer to 0, the weaker any linear relationship

The correlation coefficient r between x and y is given by:

r = Covariance(x, y) / √[Var(x) · Var(y)]

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

r = [n Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / √{[n Σxᵢ² − (Σxᵢ)²] · [n Σyᵢ² − (Σyᵢ)²]}

r = SSxy / √(SSxx · SSyy)
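The last (computational) form translates directly into code; a minimal sketch with toy data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation via the computational formula
    r = [n*Sum(xy) - Sum(x)*Sum(y)] / sqrt{[n*Sum(x^2) - (Sum(x))^2] *
    [n*Sum(y^2) - (Sum(y))^2]}."""
    n = len(x)
    ss_xy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    ss_xx = n * sum(a * a for a in x) - sum(x) ** 2
    ss_yy = n * sum(b * b for b in y) - sum(y) ** 2
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r_pos = pearson_r([1, 2, 3], [2, 4, 6])  # perfectly positive: 1.0
r_neg = pearson_r([1, 2, 3], [6, 4, 2])  # perfectly negative: -1.0
print(r_pos, r_neg)
```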
Scatter Plots of Data with Various Correlation Coefficients

Hypothesis Testing for Correlation

H₀: ρ = 0 (no correlation between the two variables)
Hₐ: ρ ≠ 0 (a correlation exists between the two variables)

Test statistic: t = r · √[(n − 2)/(1 − r²)]

which has a t distribution with n − 2 degrees of freedom.

Conclusion: if P < 0.05, reject H₀ and conclude that there is
evidence of a correlation between the two variables.
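In simple linear regression this statistic coincides with the t test for the slope. A minimal sketch (the values r = 0.92, n = 6 are taken from the father/son example later in the slides):

```python
import math

def corr_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t = corr_t(0.92, 6)
# t is about 4.69, larger than the critical value t_(0.025)(4) = 2.776,
# so H0 would be rejected at the 5% level.
print(t)
```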

Coefficient of Determination (R²)

o The coefficient of determination is the portion of the total
variation in the dependent variable that is explained by
variation in the independent variable.
o It is an indicator of how well the model fits the data.
o Adding more predictors to the model increases R².

R² = SSR/SST = 1 − SSE/SST,   0 ≤ R² ≤ 1

The proportion of total variation in the dependent variable (y)
that is explained by changes in the independent variable (x), i.e.
by the regression line, is equal to R² × 100%.
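The 1 − SSE/SST form can be computed from observed and fitted values; a minimal sketch with toy data:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST, where SSE is the residual sum of squares of the
    fitted values y_hat and SST is the total sum of squares of y."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# A perfect fit explains all of the variation:
perfect = r_squared([1, 2, 3], [1, 2, 3])
print(perfect)  # 1.0
```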
Example: A researcher wants to find out if there is any relationship between
the height of a son and that of his father. He took a random sample of 6 fathers
and their sons. The heights in inches give the following summary statistics:

ΣXᵢ = 392, ΣXᵢ² = 25628, ΣXᵢYᵢ = 26476, ΣYᵢ = 405, ΣYᵢ² = 27355

A. Estimate the parameters β̂₀ and β̂₁
B. Fit a simple linear regression line and interpret the estimates
C. What would be the height of the son if his father's height is 70 inches?
D. Calculate the coefficient of correlation and interpret it
E. Calculate the coefficient of determination
A.
β̂₁ = [n ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)] / [n ΣXᵢ² − (ΣXᵢ)²]
   = (6·26476 − 392·405) / (6·25628 − 392²) = 96/104 ≈ 0.92

β̂₀ = ȳ − β̂₁x̄ = 405/6 − 0.9231 · (392/6) ≈ 7.2

B. The fitted (regression) line of Y on X is:

ŷ = β̂₀ + β̂₁x = 7.2 + 0.92x

β̂₀ = 7.2 is the predicted value of Y when X = 0, i.e. when X has no
effect on Y.

β̂₁ = 0.92 indicates that for every one-inch increase in the height of
the father, the mean height of the son increases by 0.92 inches.

β̂₁ > 0: there is a direct relationship between the heights of father
and son.

C. ŷ = 7.2 + 0.92x; using the unrounded estimates, ŷ = 7.19 + 0.923(70)
≈ 71.8, thus the predicted height of the son is 71.8 inches.

D.
r = [n Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / √{[n Σxᵢ² − (Σxᵢ)²][n Σyᵢ² − (Σyᵢ)²]}
  = (6·26476 − 392·405) / √[(6·25628 − 392²)(6·27355 − 405²)]
  = 96/√(104·105) ≈ 0.92

There is a strong positive correlation between the height of the father
and the height of the son.

E.
r² = 0.92² ≈ 84.6%
About 84.6% of the variation in the height of the son is explained by
variation in the height of the father.
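The whole worked example can be reproduced from the summary statistics alone. Note that using the unrounded estimates gives ŷ(70) ≈ 71.8 and r² ≈ 84.4%; the 84.6% above comes from squaring the rounded r = 0.92.

```python
import math

# Summary statistics from the father/son example
n = 6
sum_x, sum_x2 = 392, 25628
sum_y, sum_y2 = 405, 27355
sum_xy = 26476

ss_xy = n * sum_xy - sum_x * sum_y   # 96
ss_xx = n * sum_x2 - sum_x ** 2      # 104
ss_yy = n * sum_y2 - sum_y ** 2      # 105

b1 = ss_xy / ss_xx                   # ~0.923, rounds to 0.92
b0 = sum_y / n - b1 * sum_x / n      # ~7.19, rounds to 7.2
pred_70 = b0 + b1 * 70               # ~71.8 inches
r = ss_xy / math.sqrt(ss_xx * ss_yy) # ~0.919, rounds to 0.92
r2 = r ** 2                          # ~0.844 with the unrounded r

print(round(b1, 2), round(b0, 1), round(pred_70, 1), round(r, 2))
```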
