Assumptions of Correlation
Before conducting a correlation analysis, certain assumptions should be met:
   1. Linearity:
       There must be a linear relationship between the two variables, i.e., a straight-line pattern
       in a scatter plot: as one variable increases, the other increases or decreases at a
       roughly constant rate.
   2. Continuous Variables:
       Pearson correlation requires interval or ratio-level data (not ordinal or nominal);
       ordinal data call for Spearman’s ρ instead.
   3. Normality:
       For Pearson correlation, both variables should be approximately normally distributed.
   4. Homoscedasticity:
       The variability in scores for one variable should be similar at all values of the other.
       In a scatter plot, the spread of points around the trend should stay roughly the same
       along its whole length instead of fanning out or narrowing.
   5. No Significant Outliers:
       Extreme values can distort the correlation coefficient.
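To see how much one extreme value can matter, here is a minimal pure-Python sketch. It computes Pearson’s r by hand for a small hypothetical dataset, first with a perfect negative pattern and then with one extreme point added. The data and the helper name `pearson_r` are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a perfect negative linear pattern
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8, 7, 6, 5, 4, 3, 2, 1]
print(round(pearson_r(x, y), 2))          # -1.0

# The same data plus one extreme case far from the pattern
x_out = x + [30]
y_out = y + [30]
print(round(pearson_r(x_out, y_out), 2))  # 0.86
```

That single added point flips r from –1.00 to about +.86. This is why a scatter plot should be inspected before r is reported.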
Types of Correlation
1. Based on Direction:
      Positive Correlation:
       As one variable increases, the other also increases.
       (e.g., height and weight)
      Negative Correlation:
       As one variable increases, the other decreases.
       (e.g., stress level and sleep duration)
      Zero/No Correlation:
       No predictable relationship between the variables.
       (e.g., shoe size and intelligence)
2. Based on Statistical Method:
      Pearson’s Correlation (r):
       Measures linear correlation between two continuous, normally distributed variables.
      Spearman’s Rank Correlation (ρ):
       Non-parametric measure of monotonic association; used when variables are ordinal or not
       normally distributed.
       Non-parametric tests do not assume that the data follow a normal distribution or that
       the variables have equal variances.
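Spearman’s ρ is Pearson’s r applied to the ranks of the data. It therefore measures monotonic association (consistently increasing or decreasing) rather than strictly linear association. Below is a minimal pure-Python sketch. It assumes that tied values receive the average of their ranks, which is the usual convention, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r on raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # perfectly monotonic, but curved
print(round(pearson_r(x, y), 2))  # 0.98
print(spearman_rho(x, y))         # 1.0
```

For y = x², the relationship is perfectly monotonic but not a straight line. As a result, ρ = 1 while r is slightly lower.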
       The strength of a correlation (relationship) is often described using words like weak,
       moderate, strong, or very strong/excellent, depending on the correlation coefficient
       value (r).
      Here’s a commonly used guideline for Pearson’s r and similar coefficients, extending
       Cohen’s (1988) benchmarks of .10 (small), .30 (medium), and .50 (large). Strength is
       judged from the absolute value |r|; the sign only gives the direction:
          Correlation (|r|)        Strength of Relationship
          0.00 – 0.10              Negligible / No correlation
          0.10 – 0.30              Weak
          0.30 – 0.50              Moderate
          0.50 – 0.70              Strong
          0.70 – 0.90              Very Strong / Excellent
          0.90 – 1.00              Near-perfect / Perfect
Example Interpretations:
      r = .28: “There is a weak positive correlation between anxiety and screen time.”
      r = –.52: “There is a strong negative correlation between sleep quality and stress.”
      r = .87: “There is a very strong positive correlation between height and weight.”
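The guideline table can be expressed as a small lookup, shown here as a minimal sketch. One assumption is built in: the table’s ranges share their endpoints, so a value that falls exactly on a boundary (for example .30) is placed in the higher band. The function name `strength_label` is hypothetical.

```python
def strength_label(r):
    """Verbal strength label for a correlation coefficient.

    Uses |r|, because the sign only indicates direction. Boundary values
    go to the higher band (an assumption; the table's ranges overlap).
    """
    bands = [
        (0.10, "Negligible / No correlation"),
        (0.30, "Weak"),
        (0.50, "Moderate"),
        (0.70, "Strong"),
        (0.90, "Very Strong / Excellent"),
    ]
    size = abs(r)
    for upper, label in bands:
        if size < upper:
            return label
    return "Near-perfect / Perfect"


for r in (0.28, -0.52, 0.87):
    direction = "positive" if r > 0 else "negative"
    print(f"r = {r:.2f}: {strength_label(r)} {direction}")
```

The three example coefficients come out as Weak, Strong, and Very Strong / Excellent, which matches the written interpretations.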
Understanding p-values
The p-value is the probability of obtaining results at least as extreme as yours, assuming the
null hypothesis is true. It is not the probability that the null hypothesis is true.
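A permutation test makes this definition concrete. Shuffling one variable breaks any real pairing, so each shuffled dataset is one where the null hypothesis holds. The p-value is then the share of shuffles that produce an |r| at least as large as the observed |r|. This is a minimal sketch of a permutation test, which is a different technique from the t-based p-value that SPSS reports for Pearson’s r. The function names, the number of shuffles, and the "+1" convention are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's r on raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for Pearson's r."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # a dataset generated under the null hypothesis
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Counting the observed data as one permutation keeps p above zero
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data with a clear positive trend
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
print(permutation_p_value(x, y))
```

A small result means that shuffled (null) datasets rarely produce a correlation as strong as the real one.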
🔹 Meaning of Specific p-values
  p value    Interpretation
  .05        The standard cutoff for statistical significance: if the null hypothesis were true,
             results at least this extreme would occur 5% of the time. If p ≤ .05, the result is
             considered statistically significant.
  .001       Results this extreme would occur only 0.1% of the time under the null hypothesis.
             Highly significant: strong evidence against the null hypothesis.
  .000       Often appears in SPSS output. It does not mean absolute zero, but that p < .001.
             The result is extremely significant: very strong evidence against the null.
🔹 APA Style Reporting:
         Report exact p-values if available (e.g., p = .004).
         If SPSS shows .000, report as: p < .001.
         Do not write p = .000 in APA style.
🔹 Example:
"A Pearson correlation showed a strong, significant relationship between resilience and
well-being, r(98) = .68, p < .001."
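The APA rules above are mechanical, so they can be sketched as small formatting helpers. This is a minimal sketch with hypothetical function names. It assumes the statistics are already computed and that negative values use the en dash seen in this guide’s examples. For a Pearson correlation, the value in parentheses is the degrees of freedom, N − 2, so r(98) implies N = 100.

```python
def apa_number(value, decimals=2):
    """Format a value that cannot exceed 1 in absolute size (r, p) without a
    leading zero, e.g. 0.68 -> '.68'. Negative values get an en dash."""
    text = f"{abs(value):.{decimals}f}"
    if text.startswith("0"):
        text = text[1:]
    return ("\u2013" if value < 0 else "") + text


def apa_p(p):
    """Exact p to three decimals, but never 'p = .000'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + apa_number(p, 3)


def apa_pearson(r, n, p):
    """Pearson's r reported with df = N - 2 in parentheses."""
    return f"r({n - 2}) = {apa_number(r)}, {apa_p(p)}"


print(apa_pearson(0.68, 100, 0.00003))  # r(98) = .68, p < .001
print(apa_p(0.004))                     # p = .004
```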
          Pearson Correlation – Example Interpretation according to APA
   A Pearson correlation was conducted to assess the relationship between stress levels and
    sleep duration.
    There was a significant negative correlation, r(98) = –.45, p < .001, indicating that
    higher stress levels were associated with shorter sleep duration.
    The effect size was moderate, suggesting a meaningful relationship.
   Spearman Correlation – Example Interpretation according to APA
   A Spearman’s rank-order correlation showed a significant positive association between
    rank in class and hours studied per week, rs(48) = .62, p < .01.
    This suggests that students who studied more tended to have higher class ranks.
              What is Regression Analysis?
Regression analysis is a predictive modeling technique used to examine the relationship
between:
      Independent variable(s) (predictors)
      Dependent variable (outcome)
It answers:
How much does Y change when X changes?
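In simple linear regression, the answer to "how much does Y change when X changes?" is the least-squares slope. Here is a minimal pure-Python sketch on hypothetical data. The names `simple_regression`, `hours`, and `score` are illustrative.

```python
def simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope: change in Y for a one-unit change in X
    a = my - b * mx    # intercept: predicted Y when X = 0
    return a, b


# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
a, b = simple_regression(hours, score)
print(f"predicted score = {a:.1f} + {b:.1f} * hours")  # predicted score = 47.7 + 4.1 * hours
```

Read it as: each additional hour studied is associated with about 4.1 more points in these made-up data.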
✅ 2. Types of Regression
Type                          Use Case
Simple Linear Regression      Predicts one dependent variable from one independent variable.
Multiple Linear Regression    Predicts one outcome from two or more predictors.
Logistic Regression           Used when the dependent variable is binary (e.g., yes/no).
Hierarchical Regression       Predictors are entered in blocks/steps based on theory.
Stepwise Regression           Predictors are selected automatically based on statistical criteria.
Assumptions for Simple & Multiple Linear Regression:
   1. Linearity – The relationship between predictors and outcome is linear.
   2. Independence of Errors – Residuals are independent of one another (check with the
       Durbin-Watson test).
   3. Residuals: In regression analysis, residuals are the differences between the observed
       values (actual data points) and the predicted values (values predicted by the regression
       model).
Why Residuals Are Important:
Residuals are essential for diagnosing the goodness of fit and validating model assumptions:
   1. Linearity – Check if the relationship between predictors and the outcome is linear.
   2. Homoscedasticity – Verify if residuals have constant variance across all levels of
      predictors.
   3. Normality of residuals – Residuals should ideally be normally distributed for reliable
      inference.
What Should Residuals Look Like?
For a well-fitting regression model:
      The residuals should be randomly scattered (no clear pattern).
      They should be normally distributed with a mean of zero.
      Their variance should remain constant across the range of predicted values.
4. Homoscedasticity – Constant variance of errors across all levels of predictors.
5. Normality of Residuals – Errors should be normally distributed (check a histogram or
   Q-Q plot).
6. No Multicollinearity (for multiple regression) – Predictors shouldn’t be too highly
   correlated (VIF < 10).
7. No significant outliers or influential points – Check Cook’s Distance.
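Several of these checks can be computed directly from the residuals. The sketch below is minimal, pure Python, and limited to a simple regression (one predictor). It computes the residuals, the Durbin-Watson statistic, and Cook’s distance for each case.

* The Durbin-Watson statistic ranges from 0 to 4. Values near 2 suggest independent errors.
* For Cook’s distance, common rules of thumb flag values above 1 (or above 4/N).

The data and function names are hypothetical. In practice, SPSS or a statistics library reports these values for you.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def residuals(x, y):
    """Observed minus predicted value for every case."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson statistic for residuals in data-collection order.
    About 2: no autocorrelation; toward 0: positive; toward 4: negative."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    return num / sum(e ** 2 for e in res)


def cooks_distance(x, y):
    """Cook's distance per case in a simple regression (p = 2 parameters)."""
    n = len(x)
    res = residuals(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    mse = sum(e ** 2 for e in res) / (n - 2)
    distances = []
    for xi, e in zip(x, res):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of this case
        distances.append(e ** 2 / (2 * mse) * h / (1 - h) ** 2)
    return distances


# Hypothetical data: the last case is extreme on x and off the trend in y
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 5]
res = residuals(x, y)
print("mean residual:", round(sum(res) / len(res), 10))
print("Durbin-Watson:", round(durbin_watson(res), 2))
print("Cook's D:", [round(d, 2) for d in cooks_distance(x, y)])
```

With an intercept in the model, the residuals always average to zero. The last case’s Cook’s distance stands far above the rest, so it would be flagged as influential.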
Value                              Meaning
R                                  Correlation between predicted and actual values.
R² (R-squared)                     Proportion of variance in the dependent variable explained by
                                   the predictors.
Adjusted R²                        Like R², but adjusts for the number of predictors (more accurate
                                   in multiple regression).
F-statistic                        Tests whether the overall regression model is significant.
p-value                            If p < .05, the model or predictor is statistically significant.
B (Unstandardized Beta)            Change in DV for each one-unit change in IV.
β (Standardized Beta)              Shows strength of each predictor’s effect in standardized terms.
t-value                            Tests the significance of each individual predictor.
VIF (Variance Inflation Factor)    Assesses multicollinearity; should be less than 10.
✅ 1. R (Correlation Coefficient)
      Meaning:
              o   R represents the strength of the relationship between the predicted values and
                  the actual values.
              o   R values range from 0 to 1 (0 = no relationship, 1 = perfect relationship).
              o   For a simple linear regression, R equals the absolute value of the correlation
                  between the predictor and the outcome variable (R = |r|).
              o   In multiple regression, it’s the correlation between the observed and predicted
                  values.
✅ 2. R² (R-squared)
      Meaning:
           o   R² tells you the proportion of variance in the dependent variable (Y) that can be
               explained by the independent variables (X).
           o   Values range from 0 to 1, where 0 means the model explains 0% of the variance,
               and 1 means it explains 100%.
           o   For example:
               R² = 0.45 means the model explains 45% of the variation in the dependent
               variable.
✅ 3. Adjusted R²
      Meaning:
           o   Adjusted R² adjusts R² for the number of predictors in the model. It’s especially
               useful for multiple regression, because adding more predictors never decreases R²
               (it usually increases it), even if they don’t improve the model.
           o   Adjusted R² can be negative if the model is a poor fit.
           o   Formula: Adjusted R² = 1 − (1 − R²)(N − 1) / (N − k − 1),
               where N is the sample size and k is the number of predictors.
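R, R², and adjusted R² can all be computed from one least-squares fit. The sketch below is minimal and assumes NumPy is available. It adds an intercept, computes R² as 1 − SS_residual / SS_total, and applies the adjusted-R² formula with N cases and k predictors. It also computes multiple R as the correlation between observed and predicted values. The data are simulated and the function name is hypothetical.

```python
import numpy as np


def fit_summary(X, y):
    """OLS with an intercept; returns (R, R², adjusted R²).

    X: (N, k) array of predictors; y: length-N outcome.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])       # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef                           # predicted values
    ss_res = np.sum((y - y_hat) ** 2)               # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)            # total variation in y
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    r = np.corrcoef(y, y_hat)[0, 1]                 # multiple R = corr(observed, predicted)
    return r, r2, adj_r2


# Simulated data: 50 cases, 3 predictors; the third has no real effect
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=50)
r, r2, adj_r2 = fit_summary(X, y)
print(f"R = {r:.2f}, R² = {r2:.2f}, adjusted R² = {adj_r2:.2f}")
```

With an intercept in the model, R² equals R squared, and adjusted R² is slightly lower because it penalizes the three predictors.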
✅ 4. F-statistic
      Meaning:
          o    The F-statistic tests whether the overall regression model is significant (i.e.,
               whether the model explains a significant amount of variance in the dependent
               variable).
          o    For given degrees of freedom, a larger F-statistic means the model explains more
               variance relative to the unexplained (residual) variance.
          o    The p-value for the F-statistic tests whether the overall model is significant (if p ≤
               .05, the model is significant).
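The overall F can be recovered from R², the sample size N, and the number of predictors k: F = (R² / k) / ((1 − R²) / (N − k − 1)), with df1 = k and df2 = N − k − 1. Here is a minimal sketch using hypothetical input values and a hypothetical function name.

```python
def overall_f(r2, n, k):
    """Overall-model F from R², sample size n, and k predictors."""
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1 - r2) / df2)  # explained vs. unexplained variance per df
    return f, df1, df2


# Hypothetical values: R² = .25 with 3 predictors and 100 cases
f, df1, df2 = overall_f(0.25, 100, 3)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(3, 96) = 10.67
```

The p-value then comes from the F distribution with (df1, df2) degrees of freedom. If SciPy is installed, `scipy.stats.f.sf(f, df1, df2)` computes it.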
✅ 5. p-value
      Meaning:
          o    The p-value tests the significance of the relationship. It is the probability of
               obtaining results at least as extreme as those observed if the null hypothesis
               were true (i.e., no relationship).
          o    A p-value less than 0.05 typically means the result is statistically significant.
✅ 6. B (Unstandardized Beta)
      Meaning:
          o    The unstandardized beta (B) represents the actual change in the dependent
               variable for each one-unit change in the independent variable.
          o    For example: If B = 2, it means that for every 1 unit increase in the predictor, the
               dependent variable increases by 2 units.
✅ 7. β (Standardized Beta)
      Meaning:
           o   The standardized beta (β) represents the change in the dependent variable in
               standard deviation units for a one-standard deviation change in the predictor
               variable.
           o   It's useful for comparing the relative importance of predictors when the variables
               are on different scales.
           o   Larger absolute values of β indicate more influence.
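A standardized β can be obtained from B by rescaling: β = B × SD(X) / SD(Y). Here is a minimal pure-Python sketch on hypothetical data; the function names are illustrative. In a simple regression with a single predictor, β equals Pearson’s r, which provides a convenient check.

```python
import statistics


def slope(x, y):
    """Unstandardized slope B from ordinary least squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b, x, y):
    """beta = B * SD(x) / SD(y): the slope expressed in SD units."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
b = slope(hours, score)
beta = standardized_beta(b, hours, score)
print(f"B = {b:.2f}, beta = {beta:.3f}")
```

Here B is in exam points per hour, while β says that a 1 SD increase in hours goes with roughly a 1 SD increase in score.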
✅ 8. t-value
      Meaning:
           o   The t-value tests the significance of each individual predictor in the model.
           o   It’s the ratio of the estimated coefficient (B) to its standard error.
           o   For the same degrees of freedom, a larger absolute t-value gives a smaller p-value,
               i.e., stronger evidence that the predictor contributes to the model.
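For a simple regression, t = B / SE(B), where SE(B) = √[(SSE / (n − 2)) / Σ(x − x̄)²]. Here is a minimal pure-Python sketch on hypothetical data; the function name is illustrative. The t-value would then be compared against a t distribution with n − 2 degrees of freedom.

```python
import math


def slope_t(x, y):
    """t statistic for the slope of a simple linear regression: t = B / SE(B)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of B
    return b / se_b


hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
print(round(slope_t(hours, score), 2))
```

In simple regression, this t-value also equals r·√(n − 2) / √(1 − r²), the same test used for Pearson’s r.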
✅ 9. VIF (Variance Inflation Factor)
      Meaning:
           o   The VIF measures how much the variance of a regression coefficient is inflated
               due to multicollinearity (i.e., the predictors are highly correlated with each
               other).
           o   If VIF > 10, it suggests significant multicollinearity.
           o   The higher the VIF, the more problematic the multicollinearity.
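Each VIF comes from an auxiliary regression. Regress predictor j on all the other predictors, take that model’s R²_j, and compute VIF_j = 1 / (1 − R²_j). Here is a minimal sketch, assuming NumPy is available. The simulated predictors and the function name are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each predictor (column) of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2_j = 1 - np.sum(resid ** 2) / ss_tot            # R² of the auxiliary regression
        factors.append(1 / (1 - r2_j))
    return factors


# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
print([round(v, 1) for v in vif(np.column_stack([x1, x2, x3]))])
```

Here x1 and x2 should show VIFs far above 10, while x3 stays close to 1.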
📝 APA Style Example Interpretation:
A multiple regression was conducted to predict academic performance based on study time,
stress level, and social support.
The overall model was significant, F(3, 96) = 8.67, p < .001, explaining 27% of the variance in
academic performance (R² = .27).
Study time was a significant positive predictor (B = 1.23, p < .001), while stress level was a
significant negative predictor (B = –0.45, p = .02). Social support did not significantly predict
academic performance (B = 0.12, p = .36).
Multicollinearity was not an issue, with VIF values all below
APA Interpretation Format (Example)
A multiple linear regression was conducted to examine whether self-esteem, social support,
and stress predicted well-being.
The overall model was significant, F(3, 96) = 18.45, p < .001, and explained 36% of the
variance in well-being, R² = .36.
Social support (β = .45, p < .001) and self-esteem (β = .33, p = .002) were significant positive
predictors, while stress was not significant (β = –.12, p = .09).