05 Correlation Regression1

Uploaded by Daniel Rotari
EM Regression 1 2022

Introduction Correlation Regression Calculations Assumptions Comparisons

Correlation and Regression I
Day 5
Trends · Correlation · Causation · Spurious correlation · Pearson · Spearman · Linear regression · General model · 1 dependent variable · Strength · Model · Least squares method · Explained / remaining variation · (Adjusted) R square · Effect size · Normality of residuals · Linearity · Homoscedasticity · GLM · Coefficients · ANCOVA

Ecological Methods 2025


Trend analysis: Correlation and Regression

Regression
1. Correlation and introduction to regression
   - General model
2. Different distributions
   - Different response curves
3. Multiple regression
   - More than 1 independent variable
4. Zero-inflated models
   - Lots of zeros

Today's topics
1. Correlation
2. Linear regression
   - Basics of regression
   - Linear model
   - Calculations
   - Assumptions
   - Regression compared to …

Correlation
Relation between two variables:
- Null hypothesis: there is no relationship
- Not necessarily a causal relation!
[Figure: scatter plot of variable 2 against variable 1]

Correlation IS NOT causation
- Causation: X → Y. Once X has happened, Y will follow.
- Correlation: X and Y vary together, which may be caused by a third variable Z.

Spurious correlations
Spurious correlation occurs when two variables appear causally related to one another but are not.
Examples
[Figure: examples of spurious correlations]
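A small simulation can make the role of a third variable concrete. The sketch below assumes a hypothetical common driver `z` (for example a shared environmental gradient) that affects both `x` and `y`, while `x` and `y` have no direct link; all names and effect sizes are illustrative assumptions.

```r
# Sketch: a spurious correlation produced by a shared driver z.
# x and y do not influence each other; both depend on z (hypothetical).
set.seed(1)
n <- 1000
z <- rnorm(n)                   # common cause
x <- z + rnorm(n, sd = 0.5)     # x depends on z only
y <- z + rnorm(n, sd = 0.5)     # y depends on z only

# Strong correlation between x and y, although neither causes the other
r_xy <- cor(x, y)

# Remove the effect of z from both variables and correlate what is left
# (a partial correlation): the apparent relation largely disappears.
r_partial <- cor(residuals(lm(x ~ z)), residuals(lm(y ~ z)))

round(c(r_xy = r_xy, r_partial = r_partial), 2)
```

With these settings the raw correlation should come out around 0.8, while the partial correlation should be close to 0.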

Correlation or not?

Measure for the relation between two variables
- Between –1 (negative correlation) and 1 (positive correlation)
- Correlation coefficient r = 0: no relation
- Correlation coefficient r ≠ 0: but is it significant?

Calculate the correlation coefficient

Significant correlation?
[Figure: two scatter plots of weight against length]
Left: r = 0.978, P < 0.001, N = 25
Right: r = 0.494, P = 0.12, N = 25

Strong/weak correlation
[Figure: scatter plots of weight against length with r = 0, r = 0.7 and r = –0.7]
- 0 < |r| < 0.3 (positive or negative): weak correlation
- 0.3 < |r| < 0.6: moderate correlation
- 0.6 < |r| < 1: strong correlation
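The rule of thumb above can be written as a small helper. This is a sketch: the function name `correlation_strength` is hypothetical, the cut points are taken from the slide, and values exactly on a boundary (0.3, 0.6) are assigned to the lower class here, which the slide leaves open. Note that r = 0 is labelled "weak" by this helper, although the slides treat it as no relation.

```r
# Sketch: label correlation coefficients as weak / moderate / strong.
correlation_strength <- function(r) {
  stopifnot(all(abs(r) <= 1))
  # Work with |r| so that positive and negative correlations are treated alike
  classes <- cut(abs(r),
                 breaks = c(0, 0.3, 0.6, 1),
                 labels = c("weak", "moderate", "strong"),
                 include.lowest = TRUE)
  as.character(classes)
}

correlation_strength(c(0.978, 0.494, -0.7, 0.1))
```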

Pearson correlation coefficient, r_p (parametric test)
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
- Normally distributed data
- Linear relationship between variables

Spearman rank correlation coefficient, r_s (non-parametric test)
$$ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} $$
(d = difference between the ranks of a pair, n = number of pairs)
- Can always be used, including for data that are not normally distributed
- For relationship between ranks

Worked example (Pearson)
x: 12, 18, 13
y: 1, 5, 6
Averages: $\bar{x} = 14.3$, $\bar{y} = 4$
$$ r = \frac{(12 - 14.3)(1 - 4) + (18 - 14.3)(5 - 4) + (13 - 14.3)(6 - 4)}{\sqrt{\left[(12 - 14.3)^2 + (18 - 14.3)^2 + (13 - 14.3)^2\right]\left[(1 - 4)^2 + (5 - 4)^2 + (6 - 4)^2\right]}} = 0.47 $$
df = total number of pairs of observations – 2
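The worked example can be checked in R, the language of the course's own examples. This sketch recomputes r from its definition for the three pairs above and compares it with the built-in `cor()`; without rounding the mean of x, r is about 0.4703.

```r
# Sketch: Pearson's r from its definition, using the three example pairs.
x <- c(12, 18, 13)
y <- c(1, 5, 6)

dx <- x - mean(x)   # deviations from the mean of x (mean = 14.33)
dy <- y - mean(y)   # deviations from the mean of y (mean = 4)

r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(x, y, method = "pearson")

round(c(manual = r_manual, builtin = r_builtin), 2)
```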

Pearson correlation coefficient, r_p (parametric test)
Standard error of r:
$$ s_r = \sqrt{\frac{1 - r^2}{n - 2}} $$
- Normal distribution of both variables (bivariate normal distribution)
- If not: transform one or both
[Figure: two scatter plots of weight against height]
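The standard error of r leads directly to the usual significance test, t = r / s_r with n – 2 degrees of freedom. The sketch below reuses the three example pairs and compares the hand calculation with `cor.test()`, which uses the same t statistic for Pearson's r.

```r
# Sketch: standard error of Pearson's r and the t-test built on it.
x <- c(12, 18, 13)
y <- c(1, 5, 6)
n <- length(x)

r      <- cor(x, y)
s_r    <- sqrt((1 - r^2) / (n - 2))          # standard error of r
t_stat <- r / s_r                            # t statistic, df = n - 2
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided P value

ct <- cor.test(x, y, method = "pearson")
c(se = s_r, t = t_stat, p = p_val, p_cor.test = ct$p.value)
```

With only three pairs (df = 1) the correlation of 0.47 is far from significant, which illustrates why r ≠ 0 alone is not enough.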

Spearman rank correlation coefficient (non-parametric test)
Correlation between length and body mass of birds
Null hypothesis: there is no correlation between length and body mass

x    y    r_x  r_y  d
12   1    1    1    0
18   5    3    2    1
13   6    2    3    –1

$$ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} $$
n = number of pairs
$$ r_s = 1 - \frac{6 \times \left(0^2 + 1^2 + (-1)^2\right)}{3^3 - 3} = 1 - \frac{12}{24} = 0.5 $$
df = total number of pairs of observations – 2
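The same calculation in R, as a sketch: `rank()` gives the ranks, and the result is compared with `cor(method = "spearman")`. The shortcut formula assumes there are no tied ranks; with ties, the built-in value (Pearson's r on the ranks) can differ from it.

```r
# Sketch: Spearman's r_s from rank differences, using the table above.
x <- c(12, 18, 13)
y <- c(1, 5, 6)
n <- length(x)                  # number of pairs

d   <- rank(x) - rank(y)        # rank differences: 0, 1, -1
r_s <- 1 - 6 * sum(d^2) / (n^3 - n)

c(formula = r_s, builtin = cor(x, y, method = "spearman"))
```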

Example R
Packages: readxl, tidyverse
library(readxl)     # reading the data from an Excel file
library(tidyverse)  # loads ggplot2, among others
# flycatcher: a data frame with the columns length and weight

# Make scatter plot
ggplot(data = flycatcher, mapping = aes(x = length, y = weight)) +
  geom_point()

# Compute correlation coefficient
cor.test(flycatcher$weight, flycatcher$length, method = "pearson")
cor.test(flycatcher$weight, flycatcher$length, method = "spearman")

Regression
- Relationship between 2 or more variables
- Independent variable(s) affect(s) the dependent variable
- Not for more than 1 dependent variable
- Dependent variable: biomass, abundance, etc.
- Causality is assumed, not tested!

How an environmental variable affects one particular species
- Results often come from observational studies
- At 8 locations (1–8):
  - Environmental variable: pH
  - Response variable: abundance of a plant species

What is the strength of the relation?
[Figure: abundance of a species against pH. Left: less variability, P << 0.001. Right: a lot of variability, P > 0.05]

What is the direction of the relation (i.e. effect size and positive or negative effect)?
[Figure: abundance of a species against pH. Left: large effect. Right: small effect]

What is the shape of the relation?
[Figure: abundance of a species against pH, showing relations of different shapes]

Regression model
- Based on locations that have been measured
[Figure: abundance against pH; measured data points and the values expected from the model]

Simple regression model
$$ y = b_0 + b_1 x + e $$
- y: dependent variable
- $b_0 + b_1 x$: systematic part (explained variation)
- e: error part (unexplained variation); the remaining variation is normally distributed
[Figure: regression line of Y against X; b_0 is the value of Y at X = 0, b_1 is the change in Y when X increases by 1]

Linear regression
- Remaining variation (residuals) is normally distributed
- Normality test on residuals
- If not: transform the dependent variable (ln, sqrt, etc.)
- Then: calculate the residuals again

Simple regression model
- b_0 and b_1 are estimated from the observations
- Using the least-squares method:
  - sum of (observed – expected)² over all observations
  - minimise this sum
- Ordinary Least Squares (OLS) regression
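The check-transform-recheck procedure can be sketched as follows. The data are simulated with multiplicative (log-normal) errors, an assumption chosen so that the residuals on the original scale are clearly skewed; `x` and `y` are hypothetical variables, and the natural log is one of the transformations the slide lists.

```r
# Sketch: normality of residuals before and after transforming y.
set.seed(42)
n <- 300
x <- runif(n, 0, 10)
y <- exp(0.2 * x + rnorm(n, sd = 0.4))   # right-skewed response (simulated)

# 1. Fit on the original scale and test the residuals
fit_raw <- lm(y ~ x)
p_raw <- shapiro.test(residuals(fit_raw))$p.value

# 2. Transform the dependent variable (natural log) and refit
fit_log <- lm(log(y) ~ x)

# 3. Calculate the residuals again and repeat the test
p_log <- shapiro.test(residuals(fit_log))$p.value

c(p_raw = p_raw, p_log = p_log)
```

Because the simulated errors are normal on the log scale, the second P value will usually be much larger than the first; with real data there is no such guarantee, so the residuals should always be checked again.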

Simple regression model
[Figure: abundance against pH with the fitted line; the vertical distances between the observations and the line are the residuals]

Residuals: unexplained variation
[Figure: residuals against pH, scattered around 0]
Minimize unexplained variation (residuals)
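For simple regression the least-squares minimum has a closed form: b_1 = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and b_0 = ȳ – b_1 x̄. The sketch below uses eight made-up pH and abundance values (hypothetical, echoing the 8 locations) and checks the result against `lm()`.

```r
# Sketch: least-squares estimates of b0 and b1 from the closed-form solution.
ph        <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)  # made-up values
abundance <- c(3, 5, 6, 9, 10, 12, 15, 16)               # made-up values

b1 <- sum((ph - mean(ph)) * (abundance - mean(abundance))) /
  sum((ph - mean(ph))^2)
b0 <- mean(abundance) - b1 * mean(ph)

# The quantity least squares minimises: sum of (observed - expected)^2
rss <- function(a, b) sum((abundance - (a + b * ph))^2)

fit <- lm(abundance ~ ph)
c(b0 = b0, b1 = b1)
coef(fit)                          # same estimates from lm()
rss(b0, b1) < rss(b0, b1 + 0.1)    # moving away from the optimum increases the sum
```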

ANOVA calculation
- Variation explained by regression (Regression SS)
- Remaining variation (Residual SS)

[Figure: abundance (y) against pH (x) with the fitted regression line]
The regression line always passes through the point (mean of x, mean of y).

F-ratio and ANOVA output for regression
Source       SS   d.f.    M.S.         F                            p
Regression   …    1       SS / d.f.    MS_regression / MS_residuals  …
Residuals    …    n – 2   SS / d.f.
Total        …    n – 1
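The ANOVA table can be filled in by hand and compared with R's `anova()`. This sketch reuses made-up pH and abundance values (hypothetical data) and also checks that the fitted line passes through (mean of x, mean of y).

```r
# Sketch: sums of squares, mean squares and F for a simple regression.
ph        <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)  # made-up values
abundance <- c(3, 5, 6, 9, 10, 12, 15, 16)               # made-up values
n <- length(abundance)

fit <- lm(abundance ~ ph)
fitted_y <- fitted(fit)

ss_total      <- sum((abundance - mean(abundance))^2)  # d.f. = n - 1
ss_regression <- sum((fitted_y - mean(abundance))^2)   # d.f. = 1
ss_residuals  <- sum((abundance - fitted_y)^2)         # d.f. = n - 2

ms_regression <- ss_regression / 1
ms_residuals  <- ss_residuals / (n - 2)
f_ratio <- ms_regression / ms_residuals
p_value <- pf(f_ratio, df1 = 1, df2 = n - 2, lower.tail = FALSE)

tab <- anova(fit)    # R's ANOVA table for comparison
tab
c(F = f_ratio, p = p_value)

# The fitted line passes through (mean of x, mean of y)
predict(fit, newdata = data.frame(ph = mean(ph)))
mean(abundance)
```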

$$ F = \frac{MS_\text{regression}}{MS_\text{residuals}} $$
- Variation explained by regression (Regression SS)
- Remaining variation (Residual SS)
- A large value for F gives a low P value; a low value for F gives a high P value

Regression model
- Null hypothesis: no effect of the independent variable on the dependent variable, i.e. the regression coefficient b_1 is zero
- Using the t- or F-value to test
- Fraction of variation explained: adjusted R square
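A sketch of the fraction of explained variation: R² = 1 – Residual SS / Total SS, and the adjusted R² corrects for the number of independent variables k: R²adj = 1 – (1 – R²)(n – 1)/(n – k – 1). With one independent variable, the t-value of b_1 squared equals F, which is why either can be used for the test. The data are made up for illustration.

```r
# Sketch: R-squared, adjusted R-squared and t^2 = F for simple regression.
ph        <- c(4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0)  # made-up values
abundance <- c(3, 5, 6, 9, 10, 12, 15, 16)               # made-up values
n <- length(abundance)
k <- 1                                  # number of independent variables

fit <- lm(abundance ~ ph)
s <- summary(fit)

ss_residuals <- sum(residuals(fit)^2)
ss_total     <- sum((abundance - mean(abundance))^2)

r2     <- 1 - ss_residuals / ss_total
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With one independent variable, the squared t-value of b1 equals F
t_b1 <- s$coefficients["ph", "t value"]
f    <- s$fstatistic[["value"]]

c(R2 = r2, R2_adj = r2_adj, t_squared = t_b1^2, F = f)
```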

Relation between size and predator density
- Clones of Daphnia reared in the presence of predators
- Null hypothesis: there is no effect of predator density on size

Example of a regression analysis

Linear regression
- Null hypothesis: there is no effect of predator density on size. Reject or not?
- Null hypothesis: the intercept equals 0. Reject or not?

Density of predators has a positive effect on the size of Daphnia (linear regression, df = 11, t = 13.386, p < 0.001)

Linear regression
$$ y = b_0 + b_1 x $$
Size = 0.865 + 0.247 × Pred_density

Fraction of explained variance
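The fitted equation can be used directly to compute expected sizes. This sketch plugs the reported (rounded) coefficients into a small function; the function name and the predator densities are hypothetical and only illustrate the arithmetic.

```r
# Sketch: expected Daphnia size from the reported regression equation.
b0 <- 0.865   # intercept: expected size at predator density 0
b1 <- 0.247   # slope: change in size per unit of predator density

predict_size <- function(pred_density) b0 + b1 * pred_density

# Hypothetical predator densities, for illustration only
predict_size(c(0, 1, 2, 5))
```

Each unit increase in predator density adds 0.247 to the expected size, so the four values are 0.865, 1.112, 1.359 and 2.100.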

Example R
Packages: readxl, tidyverse, ggh4x
library(readxl)     # reading the data from an Excel file
library(tidyverse)  # loads ggplot2, among others
# daphnia: a data frame with the columns size and pred_dens

# Compute regression
regr.model <- lm(size ~ pred_dens, data = daphnia)
summary(regr.model)

# Retrieve residuals
daphnia$residuals <- residuals(regr.model)

# Test for normality
shapiro.test(regr.model$residuals)

# Add predicted values to the dataset
daphnia$predicted <- predict(regr.model)

# Make plot of observations and predictions
ggplot(data = daphnia, mapping = aes(x = pred_dens)) +
  geom_point(aes(y = predicted, colour = "predicted")) +
  geom_line(aes(y = predicted, colour = "predicted")) +
  geom_point(aes(y = size, colour = "observations")) +
  labs(x = "Predator density", y = "Daphnia size")

What variation can be explained?
Strength of relation
[Figure: Daphnia size against predator density. Left: R²adj = 0.90, less variation, P << 0.001. Right: R²adj = 0.08, a lot of variation, P > 0.05]

What variation can be explained?
Strength of relation
[Figure: Daphnia size against predator density; both panels show a lot of variation (R²adj = 0.08). Left: small sample size, P > 0.05. Right: large sample size, P < 0.01]

How precisely have we estimated the response?
[Figure: Daphnia size against predator density]

Effect size?
[Figure: Daphnia size against predator density; both panels R²adj = 0.90 and P < 0.05. Left: large effect. Right: small effect]

How precisely have we estimated the response?
Confidence interval for regression coefficients
95% confidence interval for b_1:
$$ b_1 \pm t_{0.05}(df) \times SE(b_1) $$
- df = degrees of freedom of the residuals
- $t_{0.05}(df)$: two-tailed critical value at the 5% level, from a t-table
[Figure: Daphnia size against predator density]
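A sketch of the confidence-interval recipe. Part 1 applies it to the Daphnia result, assuming SE(b_1) can be recovered as b_1 / t from the reported slope (0.247) and t-value (13.386, df = 11); because these numbers are rounded, the interval is approximate. Part 2 uses small made-up data to show that the same recipe reproduces R's `confint()`.

```r
# Part 1: approximate 95% CI for the Daphnia slope from the reported numbers.
b1     <- 0.247        # reported slope
t_b1   <- 13.386       # reported t-value
df_res <- 11           # residual degrees of freedom
se_b1  <- b1 / t_b1    # t = b1 / SE(b1) for H0: b1 = 0
t_crit <- qt(0.975, df_res)   # two-tailed 5% critical value
ci_daphnia <- b1 + c(-1, 1) * t_crit * se_b1
round(ci_daphnia, 3)

# Part 2: the same recipe reproduces confint() for a fitted model.
x <- 1:6
y <- c(1.1, 1.9, 3.2, 3.8, 5.1, 6.0)   # made-up data
fit <- lm(y ~ x)
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
ci_manual
confint(fit, "x", level = 0.95)
```

For the Daphnia slope this gives roughly 0.206 to 0.288; the interval excludes 0, in line with the significant t-test.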


[Diagram: assumptions of regression (today), shown alongside model selection, multiple regression and zero-inflated models]

Assumptions of regression
- Normal distribution (residuals)
- Linearity: the dependent variable is a linear combination of the regression coefficients and the independent variable
- Independence of errors

Residuals should be random
- Test by eye

Assumptions of regression
- Normal distribution (residuals)
- Linearity: the dependent variable is a linear combination of the regression coefficients and the independent variable
- Independence of errors
- Homogeneity of variances: homoscedasticity
If needed: transform the dependent variable

[Figure: Daphnia size against predator density, two panels]
- Test by eye
- Transformation?

Homoscedasticity vs. Heteroscedasticity
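The by-eye check is a plot of residuals against fitted values. The sketch below simulates data whose spread grows with the mean (an assumption chosen to produce heteroscedasticity), draws that plot, and adds a crude numeric summary: the correlation between absolute residuals and fitted values. This summary is only an informal indicator, not a formal test, and all variable names are hypothetical.

```r
# Sketch: residuals-versus-fitted check, before and after a log transform.
set.seed(42)
n <- 300
x <- runif(n, 0, 10)
y <- exp(0.2 * x + rnorm(n, sd = 0.4))   # spread of y grows with its mean

fit_raw <- lm(y ~ x)
fit_log <- lm(log(y) ~ x)

# "Test by eye": a funnel shape indicates heteroscedasticity
plot(fitted(fit_raw), residuals(fit_raw), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)

# Crude summary: do absolute residuals increase with the fitted values?
spread_raw <- cor(abs(residuals(fit_raw)), fitted(fit_raw))
spread_log <- cor(abs(residuals(fit_log)), fitted(fit_log))
round(c(raw = spread_raw, log = spread_log), 2)
```

On the original scale the absolute residuals should increase with the fitted values; after the log transform this pattern should largely disappear.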

Correlation versus Regression
- Different interpretation of the coefficient:
  - correlation coefficient r lies between –1 and 1
  - regression coefficient b lies between –∞ and ∞
- When r = 0, there is no relation. The same holds for b_1 = 0
- Units:
  - the correlation coefficient has no units
  - the regression coefficient has units and meaning
- Correlation: causality between X1 and X2 unknown
- Regression: X has an effect on Y (causality assumed)
- Different shapes of relationship are possible in regression
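The difference in units can be shown with one data set: b_1 = r × s_y / s_x, so rescaling x changes b_1 but leaves r untouched. The bird lengths and weights below are simulated and hypothetical.

```r
# Sketch: same data, two coefficients (r is unitless, b1 has units).
set.seed(3)
length_cm <- runif(25, 10, 20)                        # hypothetical lengths (cm)
weight_g  <- 2 + 0.8 * length_cm + rnorm(25, sd = 1)  # hypothetical weights (g)

r  <- cor(length_cm, weight_g)                        # between -1 and 1
b1 <- coef(lm(weight_g ~ length_cm))[["length_cm"]]   # grams per cm

# Express length in mm instead of cm
length_mm <- length_cm * 10
r_mm  <- cor(length_mm, weight_g)                        # unchanged
b1_mm <- coef(lm(weight_g ~ length_mm))[["length_mm"]]  # grams per mm, 10 times smaller

c(r = r, r_mm = r_mm, b1 = b1, b1_mm = b1_mm,
  b1_from_r = r * sd(weight_g) / sd(length_cm))
```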

Groups and Trends
- ANOVA ... for comparing different groups
  - for example: measurements of plant biomass in different climate types (humid, arid)
- Regression ... for trend analysis
  - for example: measurements of plant biomass over a gradient of rainfall

General Linear Model (GLM)
$$ Y = B \times X + \epsilon $$
(B = coefficients)
- GLM incorporates a number of different statistical models: ANOVA, ANCOVA, linear regression, t-test
- ANCOVA: combination of both groups (factor) and trends (covariate)
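In R, all of these models can be fitted with `lm()`. The sketch below uses made-up plant biomass data (hypothetical values) to show that a two-sample t-test with equal variances, a one-way ANOVA and a regression on a factor give the same P value, and how an ANCOVA combines a factor with a covariate.

```r
# Made-up plant biomass data: two climate types and a rainfall covariate.
biomass  <- c(12, 15, 14, 18, 20, 7, 9, 8, 11, 13)
climate  <- factor(rep(c("humid", "arid"), each = 5))
rainfall <- c(800, 950, 900, 1100, 1200, 300, 380, 350, 450, 520)

# t-test (equal variances) and lm() with a factor give the same P value
tt <- t.test(biomass ~ climate, var.equal = TRUE)
fit_groups <- lm(biomass ~ climate)
p_lm <- summary(fit_groups)$coefficients["climatehumid", "Pr(>|t|)"]
c(t_test = tt$p.value, lm = p_lm)

# One-way ANOVA: the same model viewed through an ANOVA table
anova(fit_groups)

# Regression: biomass along the rainfall gradient
fit_trend <- lm(biomass ~ rainfall)
coef(fit_trend)

# ANCOVA: group (factor) and trend (covariate) in one model
fit_ancova <- lm(biomass ~ climate + rainfall)
coef(fit_ancova)
```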

Tomorrow
- General linear model: errors follow a normal distribution
- Generalized linear model: for errors that do not follow a normal distribution