Correlation and Regression

The document outlines the assumptions necessary for conducting correlation analysis, including linearity, continuous variables, normality, homoscedasticity, and the absence of significant outliers. It also describes different types of correlation, statistical methods, and the interpretation of p-values, as well as regression analysis techniques and their associated metrics. Key concepts such as R, R-squared, F-statistic, and the importance of residuals in regression analysis are also discussed.


Assumptions of Correlation

Before conducting a correlation analysis, certain assumptions should be met:

1. Linearity:

There must be a linear relationship between the two variables, that is, a straight-line pattern in a scatter plot: when one variable increases, the other increases or decreases at a roughly constant rate.

2. Continuous Variables:

Correlation typically requires interval or ratio-level data (not ordinal or nominal).

3. Normality:

For Pearson correlation, both variables should be approximately normally distributed.

4. Homoscedasticity:

The variability in scores for one variable should be similar at all values of the other; that is, the spread (variance) of the dependent variable is roughly the same at all levels of the independent variable.

5. No Significant Outliers:

Extreme values can distort the correlation coefficient.
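
To make these checks concrete, here is a minimal Python sketch (the variables x and y and the numpy/scipy calls are illustrative assumptions, not part of the original notes):

    import numpy as np
    from scipy import stats

    # Illustrative data: 100 paired observations (e.g., stress and sleep scores)
    rng = np.random.default_rng(42)
    x = rng.normal(50, 10, 100)
    y = 60 - 0.4 * x + rng.normal(0, 5, 100)

    # Normality (assumption 3): Shapiro-Wilk for each variable; p > .05 suggests normality
    print(stats.shapiro(x))
    print(stats.shapiro(y))

    # Outliers (assumption 5): flag cases more than 3 SDs from the mean
    z_x, z_y = stats.zscore(x), stats.zscore(y)
    print("Potential outliers:", np.where((np.abs(z_x) > 3) | (np.abs(z_y) > 3))[0])

Linearity and homoscedasticity (assumptions 1 and 4) are usually judged visually from a scatter plot of x against y.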

Types of Correlation
1. Based on Direction:

• Positive Correlation: As one variable increases, the other also increases (e.g., height and weight).

• Negative Correlation: As one variable increases, the other decreases (e.g., stress level and sleep duration).

• Zero/No Correlation: No predictable relationship between the variables (e.g., shoe size and intelligence).

2. Based on Statistical Method:

• Pearson’s Correlation (r): Measures linear correlation between two continuous, normally distributed variables.

• Spearman’s Rank Correlation (ρ): A non-parametric measure, used when variables are ordinal or not normally distributed. Non-parametric tests do not assume that the data follow a normal distribution or that the variables have equal variances.
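
As a quick illustration, both coefficients can be computed with scipy (reusing the illustrative x and y from the sketch above; the function names are scipy's, not from the notes):

    from scipy.stats import pearsonr, spearmanr

    r, p = pearsonr(x, y)        # linear correlation; assumes continuous, normal data
    rho, p_s = spearmanr(x, y)   # rank-based; no normality assumption
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")
    print(f"Spearman rho = {rho:.2f}, p = {p_s:.3f}")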


The strength of a correlation (relationship) is often described using words like weak, moderate, strong, or very strong/excellent, depending on the value of the correlation coefficient (r).

Here is a commonly used guideline (adapted from Cohen, 1988) for Pearson’s r and similar coefficients:

Correlation (r)    Strength of Relationship
0.00 – 0.10        Negligible / No correlation
0.10 – 0.30        Weak
0.30 – 0.50        Moderate
0.50 – 0.70        Strong
0.70 – 0.90        Very Strong / Excellent
0.90 – 1.00        Near-perfect / Perfect

Example Interpretations:

• r = .28: “There is a weak positive correlation between anxiety and screen time.”
• r = –.52: “There is a strong negative correlation between sleep quality and stress.”
• r = .87: “There is a very strong positive correlation between height and weight.”

Understanding p-values

The p-value tells you the probability of obtaining results at least as extreme as yours by chance, assuming the null hypothesis is true.

🔹 Meaning of Specific p-values

p value    Interpretation
.05        There's a 5% chance the results are due to random chance. This is the standard cutoff for statistical significance: if p ≤ .05, the result is considered statistically significant.
.001       Only a 0.1% chance the result is due to chance; highly significant. Strong evidence against the null hypothesis.
.000       Often appears in SPSS output. It does not mean absolute zero, but that p < .001. The result is extremely significant, with very strong evidence against the null.

🔹 APA Style Reporting:

• Report exact p-values if available (e.g., p = .004).
• If SPSS shows .000, report as: p < .001.
• Do not write p = .000 in APA style.
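
A minimal helper that applies these reporting rules (a hypothetical Python function, not an SPSS feature):

    def format_p(p: float) -> str:
        # APA style: report p < .001 for very small values, else the exact value
        if p < .001:
            return "p < .001"
        return "p = " + f"{p:.3f}".replace("0.", ".", 1)

    print(format_p(0.0004))  # p < .001
    print(format_p(0.004))   # p = .004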

🔹 Example:

"A Pearson correlation showed a strong, significant relationship between resilience and well-

being, r(98) = .68, p < .001."

Pearson Correlation – Example Interpretation according to APA

• A Pearson correlation was conducted to assess the relationship between stress levels and sleep duration. There was a significant negative correlation, r(98) = –.45, p < .001, indicating that higher stress levels were associated with shorter sleep duration. The effect size was moderate, suggesting a meaningful relationship.

Spearman Correlation – Example Interpretation according to APA

• A Spearman’s rank-order correlation showed a significant positive association between rank in class and hours studied per week, rs(48) = .62, p < .01. This suggests that students who studied more tended to have higher class ranks.

What is Regression Analysis?

Regression analysis is a predictive modeling technique used to examine the relationship between:

• Independent variable(s) (predictors)
• Dependent variable (outcome)

It answers: how much does Y change when X changes?

✅ 2. Types of Regression

Type                        Use Case
Simple Linear Regression    Predicts one dependent variable from one independent variable.
Multiple Linear Regression  Predicts one outcome from two or more predictors.
Logistic Regression         Used when the dependent variable is binary (e.g., yes/no).
Hierarchical Regression     Predictors are entered in blocks/steps based on theory.
Stepwise Regression         Predictors are selected automatically based on statistical criteria.
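
As a sketch of the first two types, here is how a simple and a multiple linear regression could be fit with statsmodels (the data frame and column names are invented for illustration):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Illustrative data: academic performance predicted from study time and stress
    rng = np.random.default_rng(1)
    df = pd.DataFrame({"study_time": rng.normal(10, 3, 100),
                       "stress": rng.normal(5, 2, 100)})
    df["performance"] = (40 + 1.2 * df["study_time"]
                         - 0.5 * df["stress"] + rng.normal(0, 4, 100))

    simple = smf.ols("performance ~ study_time", data=df).fit()
    multiple = smf.ols("performance ~ study_time + stress", data=df).fit()
    print(multiple.summary())  # reports R², adjusted R², F, B, t, and p values

The summary() output maps directly onto the table of values discussed below.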

For Simple & Multiple Linear Regression:

1. Linearity – The relationship between predictors and outcome is linear.

2. Independence of Errors – Residuals are independent of one another (checked with the Durbin-Watson test).

3. Residuals – In regression analysis, residuals are the differences between the observed values (actual data points) and the predicted values (values predicted by the regression model).

Why Residuals Are Important:

Residuals are essential for diagnosing goodness of fit and validating model assumptions:

1. Linearity – Check if the relationship between predictors and the outcome is linear.
2. Homoscedasticity – Verify if residuals have constant variance across all levels of predictors.
3. Normality of residuals – Residuals should ideally be normally distributed for reliable inference.

What Should Residuals Look Like?

For a well-fitting regression model:

• The residuals should be randomly distributed (no clear pattern).
• They should be normally distributed with a mean of zero.
• Their variance should remain constant across the range of predicted values.
4. Homoscedasticity – Constant variance of errors across all levels of predictors.

5. Normality of Residuals – Errors should be normally distributed (check a histogram or Q-Q plot).

6. No Multicollinearity (for multiple regression) – Predictors shouldn't be too highly correlated (VIF < 10).

7. No significant outliers or influential points – Check Cook's Distance (see the diagnostic sketch below).
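
A diagnostic sketch for these assumptions, run on the illustrative multiple model fitted earlier (statsmodels and scipy calls; the data are invented):

    from scipy import stats
    from statsmodels.stats.stattools import durbin_watson

    resid = multiple.resid
    print("Durbin-Watson:", durbin_watson(resid))   # ~2 suggests independent errors
    print("Shapiro-Wilk:", stats.shapiro(resid))    # normality of residuals
    print("Mean residual:", resid.mean())           # should be close to zero

    # Influential points: Cook's distance for each case
    cooks_d = multiple.get_influence().cooks_distance[0]
    print("Max Cook's D:", cooks_d.max())           # unusually large values deserve a look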

Value                            Meaning
R                                Correlation between predicted and actual values.
R² (R-squared)                   Proportion of variance in the dependent variable explained by predictors.
Adjusted R²                      Like R², but adjusts for the number of predictors (more accurate in multiple regression).
F-statistic                      Tests whether the overall regression model is significant.
p-value                          If p < .05, the model or predictor is statistically significant.
B (Unstandardized Beta)          Change in DV for each one-unit change in IV.
β (Standardized Beta)            Shows strength of each predictor's effect in standardized terms.
t-value                          Tests the significance of each individual predictor.
VIF (Variance Inflation Factor)  Assesses multicollinearity; should be less than 10.

1. R (Correlation Coefficient)

• Meaning:
  o R represents the strength of the relationship between the predicted values and the actual values.
  o R values range from 0 to 1 (0 = no relationship, 1 = perfect relationship).
  o For a simple linear regression, R is just the correlation between the predictor and the outcome variable.
  o In multiple regression, it's the correlation between the observed and predicted values.

✅ 2. R² (R-squared)

• Meaning:
  o R² tells you the proportion of variance in the dependent variable (Y) that can be explained by the independent variables (X).
  o Values range from 0 to 1, where 0 means the model explains 0% of the variance and 1 means it explains 100%.
  o For example: R² = 0.45 means the model explains 45% of the variation in the dependent variable.

✅ 3. Adjusted R²

• Meaning:
  o Adjusted R² adjusts R² for the number of predictors in the model. It's especially useful for multiple regression, as adding more predictors always increases R², even if they don't improve the model.
  o Adjusted R² can be negative if the model is a poor fit.
  o Adjusted R² = 1 − (1 − R²) × (N − 1) / (N − k − 1), where N is the sample size and k is the number of predictors.
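
A quick numerical check of this formula against the illustrative model fitted earlier (N = 100 cases, k = 2 predictors):

    r2, N, k = multiple.rsquared, 100, 2
    adj = 1 - (1 - r2) * (N - 1) / (N - k - 1)
    print(adj, multiple.rsquared_adj)  # the two values should agree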

✅ 4. F-statistic

• Meaning:
  o The F-statistic tests whether the overall regression model is significant (i.e., whether the model explains a significant amount of variance in the dependent variable).
  o A higher F-statistic means the model is a better fit.
  o The p-value for the F-statistic tests whether the overall model is significant (if p ≤ .05, the model is significant).

✅ 5. p-value

• Meaning:
  o The p-value tests the significance of the relationship. It tells you the probability of obtaining the observed results if the null hypothesis were true (i.e., no relationship).
  o A p-value less than 0.05 typically means the result is statistically significant.

✅ 6. B (Unstandardized Beta)

• Meaning:
  o The unstandardized beta (B) represents the actual change in the dependent variable for each one-unit change in the independent variable.
  o For example: if B = 2, then for every 1-unit increase in the predictor, the dependent variable increases by 2 units.


✅ 7. β (Standardized Beta)

• Meaning:
  o The standardized beta (β) represents the change in the dependent variable in standard deviation units for a one-standard-deviation change in the predictor variable.
  o It's useful for comparing the relative importance of predictors when the variables are on different scales.
  o Larger absolute values of β indicate more influence.
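
statsmodels reports unstandardized B by default; one common way to obtain β is to z-score all variables first (a sketch continuing the illustrative example):

    # Standardize every column, then refit: the coefficients are now betas (β)
    zdf = (df - df.mean()) / df.std()
    std_model = smf.ols("performance ~ study_time + stress", data=zdf).fit()
    print(std_model.params)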

✅ 8. t-value

• Meaning:
  o The t-value tests the significance of each individual predictor in the model.
  o It's the ratio of the estimated coefficient (B or β) to the standard error of that coefficient.
  o The higher the absolute t-value, the more significant the predictor.

✅ 9. VIF (Variance Inflation Factor)

• Meaning:
  o The VIF measures how much the variance of a regression coefficient is inflated due to multicollinearity (i.e., the predictors are highly correlated with each other).
  o If VIF > 10, it suggests significant multicollinearity.
  o The higher the VIF, the more problematic the multicollinearity.
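
VIF can be computed directly with statsmodels (continuing the illustrative example; a constant must be added to the predictor matrix):

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(df[["study_time", "stress"]])
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, variance_inflation_factor(X.values, i))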

📝 APA Style Example Interpretation:

A multiple regression was conducted to predict academic performance based on study time, stress level, and social support. The overall model was significant, F(3, 96) = 8.67, p < .001, explaining 27% of the variance in academic performance (R² = .27). Study time was a significant positive predictor (B = 1.23, p < .001), while stress level was a significant negative predictor (B = –0.45, p = .02). Social support did not significantly predict academic performance (B = 0.12, p = .36). Multicollinearity was not an issue, with VIF values all below 10.

APA Interpretation Format (Example)

A multiple linear regression was conducted to examine whether self-esteem, social support, and stress predicted well-being. The overall model was significant, F(3, 96) = 18.45, p < .001, and explained 36% of the variance in well-being, R² = .36. Social support (β = .45, p < .001) and self-esteem (β = .33, p = .002) were significant positive predictors, while stress was not significant (β = –.12, p = .09).
