Research Methodology II
Module 3
Distribution
Distribution refers to the way in which the values of a variable
or dataset are spread or arranged. It describes how data points
are distributed across a range of values and helps in
understanding the frequency or probability of different
outcomes.
Types of Distribution
• Normal Distribution: A bell-shaped curve where most data points cluster
around the mean, with symmetrical tails on both sides.
• Skewed Distribution: When data points are not evenly distributed (e.g.,
more values concentrated on one side).
Positive Skew: Tail on the right side.
Negative Skew: Tail on the left side.
• Example: In a normal distribution of students' test scores, most scores fall
near the class average (mean), with fewer students scoring extremely high
or low.
Discrete Distribution
• A discrete distribution refers to a probability distribution that describes the
likelihood of outcomes for a discrete random variable, one that can take only a finite
or countable set of distinct values (e.g., 1, 2, 3, or categorical outcomes such as yes/no and true/false).
• Discrete distributions help model situations where variables can only take specific,
countable values. They are widely used in probability theory and real-world
scenarios to analyze events such as successes in trials, counts of events, and
outcomes in sampling processes.
Example of Discrete Distribution in Real Life
• Tossing a Coin:
When you flip a coin multiple times, the number of heads or tails follows a
binomial distribution.
• Each toss can result in one of two outcomes (head or tail), and the total
number of heads in multiple flips is a discrete variable. For example, if you
toss the coin 10 times, the number of heads could be any value between 0
and 10, but only whole numbers are possible (like 3, 5, or 7 heads).
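The coin-toss example can be made concrete with a short, illustrative Python sketch (the course material uses SPSS; this standard-library version simply computes the binomial probabilities directly, with the 10 flips and fair coin taken from the example above):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (binomial distribution)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin flips
p_five_heads = binom_pmf(5, 10, 0.5)
print(round(p_five_heads, 4))  # → 0.2461

# The variable is discrete: only the whole numbers 0..10 are possible,
# and their probabilities sum to 1.
total = sum(binom_pmf(k, 10, 0.5) for k in range(11))
```

Note that only whole-number counts of heads appear; there is no probability attached to, say, 4.5 heads, which is exactly what makes the distribution discrete.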
Continuous Distribution in Statistics
• A continuous distribution refers to a probability distribution where the
variable can take any value within a given range.
• Unlike discrete distributions, which deal with specific, countable values,
continuous distributions involve measurements (like time, height, or weight)
that can take an infinite number of possible values within an interval.
• Continuous distributions are essential in real-life scenarios involving
measurements, such as height, temperature, and time. They reveal patterns
in the data and allow us to calculate probabilities for ranges of values.
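Because a continuous variable takes infinitely many values, probabilities are computed for ranges rather than single points. A minimal standard-library Python sketch (the mean of 170 cm and SD of 10 cm for heights are hypothetical values chosen for illustration):

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and
    standard deviation, computed via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Hypothetical example: adult heights ~ Normal(mean=170 cm, sd=10 cm).
# Probability that a randomly chosen height falls between 160 and 180 cm:
p_range = normal_cdf(180, 170, 10) - normal_cdf(160, 170, 10)
print(round(p_range, 4))  # → 0.6827 (the familiar ~68% within ±1 SD)
```

The probability of any single exact value (e.g., exactly 170.000 cm) is zero; only intervals carry probability, which is the defining contrast with discrete distributions.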
Discrete Distributions in SPSS
• Discrete distributions involve countable outcomes, such as frequencies or success
counts (e.g., binomial, Poisson).
Here’s how to analyze them:
1. Binomial Distribution
Example: Number of heads in 10 coin flips (success/failure).
Steps in SPSS:
• Go to Transform > Compute Variable.
• Use the function RV.BINOM(n, prob), where:
• n: Number of trials
• prob: Probability of success
• Click OK to generate random binomial values in a new variable.
2. Poisson Distribution
• Example: Number of emails received in an hour.
Steps in SPSS:
• Go to Transform > Compute Variable.
• Use RV.POISSON(lambda), where lambda is the expected number of occurrences.
• Click OK to generate a Poisson-distributed variable.
3. Frequency Analysis (for Discrete Data)
• Go to Analyze > Descriptive Statistics > Frequencies.
• Select your categorical (discrete) variable and click OK.
• SPSS will generate frequency tables showing the distribution of the variable’s values.
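The SPSS steps above can be mirrored in plain Python as an illustrative sketch (the parameter values are hypothetical). Here `rv_binom` and `rv_poisson` are hand-rolled stand-ins for SPSS's `RV.BINOM` and `RV.POISSON`, and `Counter` plays the role of the Frequencies table:

```python
import random
from collections import Counter
from math import exp

random.seed(42)  # reproducible draws

def rv_binom(n, prob):
    """One random binomial value: successes in n Bernoulli trials
    (rough equivalent of SPSS's RV.BINOM(n, prob))."""
    return sum(random.random() < prob for _ in range(n))

def rv_poisson(lam):
    """One random Poisson value via Knuth's algorithm
    (rough equivalent of SPSS's RV.POISSON(lambda))."""
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Generate 1000 binomial values (10 trials, p = 0.5) and tabulate them,
# much like Analyze > Descriptive Statistics > Frequencies.
heads = [rv_binom(10, 0.5) for _ in range(1000)]
freq_table = Counter(heads)
for value in sorted(freq_table):
    print(value, freq_table[value])
```

The frequency table clusters around 5 heads, as the binomial distribution predicts for 10 fair flips.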
Continuous Distributions in SPSS
• Continuous distributions involve measurable outcomes, such as time, weight, or height.
Below are the key distributions and their analysis in SPSS.
1. Normal Distribution
• Steps:
• Go to Analyze > Descriptive Statistics > Descriptives.
• Select your continuous variable (e.g., height or test scores).
• Click Options and choose Mean, Std. Deviation, and Skewness/Kurtosis to check normality.
• Additionally, go to Analyze > Descriptive Statistics > Explore to generate histograms and Q-Q plots.
Testing for Normality:
• Use Analyze > Descriptive Statistics > Explore.
• Under Plots, select Normality plots with tests to run the Shapiro-Wilk or Kolmogorov-Smirnov test.
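The descriptive normality check (mean, SD, skewness, kurtosis) can be sketched in standard-library Python; the simulated scores below are hypothetical, and a formal Shapiro-Wilk test would normally be run in SPSS as described above (or, in Python, with `scipy.stats.shapiro`):

```python
import random
from statistics import mean, stdev

random.seed(0)
# Hypothetical sample: 200 simulated test scores, roughly Normal(70, 8)
scores = [random.gauss(70, 8) for _ in range(200)]

m, s, n = mean(scores), stdev(scores), len(scores)

# Sample skewness and excess kurtosis: both should be near 0 for
# normally distributed data (SPSS reports these under Descriptives).
skewness = sum(((x - m) / s) ** 3 for x in scores) / n
kurtosis = sum(((x - m) / s) ** 4 for x in scores) / n - 3

print(round(skewness, 3), round(kurtosis, 3))
```

A common rule of thumb treats skewness or excess kurtosis well outside roughly ±1 as a warning sign that the data depart from normality.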
Substantive Research
• Substantive research refers to the exploration and investigation of underlying
concepts, theories, and real-world phenomena in order to develop a deeper
understanding of specific issues or topics. Rather than focusing solely on
statistical methods, it emphasizes the content, meaning, and interpretation of
data within a particular field of study.
• The primary aim is to generate theoretical insights, contribute to the
development of concepts, or apply knowledge to real-world contexts.
• Example - A study exploring the impact of classroom culture on student
motivation. It may use interviews with teachers and students to interpret how
specific practices affect behavior and learning outcomes.
• Substantive research focuses on understanding the meaning and context
behind phenomena, emphasizing conceptual development and real-world
applications. It offers valuable insights that go beyond numbers, contributing
to theoretical progress and practical solutions in various fields.
Difference between Statistical and Substantive Research
• Statistical research centres on the methods of analysis: quantifying relationships,
testing hypotheses, and reporting numerical results.
• Substantive research centres on the content: the meaning, context, and interpretation
of phenomena within a field, aiming at conceptual development and real-world application.
1-Tailed and 2-Tailed Tests in Hypothesis Testing
• In hypothesis testing, a 1-tailed test and a 2-tailed test determine the
direction of the relationship between variables. The key difference lies in
whether the researcher is looking for an effect in one specific direction (1-
tailed) or in either direction (2-tailed).
• A 1-tailed test is used when the hypothesis specifies a direction (e.g., increase
or decrease), whereas a 2-tailed test is appropriate when the hypothesis only
seeks to detect any difference, regardless of direction.
1-Tailed Test
A 1-tailed test checks whether a parameter is greater than or less than a certain value. It is
used when the researcher has a specific directional hypothesis.
• Null Hypothesis (H₀): There is no effect or no difference.
• Alternative Hypothesis (H₁): The effect is either positive or negative (but not both).
• Example:
• A company believes a new training program will increase employee productivity.
• H₀: The productivity after training ≤ productivity before training.
• H₁: The productivity after training > productivity before training.
• Significance Level: If we use α = 0.05, all 5% is assigned to one tail of the distribution.
• When to Use: When the direction of the effect is known (e.g., "increase" or "decrease").
2-Tailed Test
• A 2-tailed test checks whether a parameter is different (either higher or lower) from a certain
value. It is used when the researcher does not assume a specific direction of the effect.
• Null Hypothesis (H₀): There is no effect or difference.
• Alternative Hypothesis (H₁): There is some difference, either positive or negative.
• Example: A researcher wants to know if a new drug has any effect (either increase or decrease)
on blood pressure.
• H₀: The drug has no effect on blood pressure (mean difference = 0).
• H₁: The drug has some effect (mean ≠ 0).
• Significance Level: If α = 0.05, it is split into 2.5% in each tail of the distribution.
• When to Use: When the direction of the effect is unknown or when both directions are of
interest.
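The contrast between the two tests can be seen numerically in a short Python sketch using the standard normal distribution (the z value of 1.8 is a hypothetical test statistic chosen for illustration):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical test statistic from a study
z = 1.8

p_one_tailed = 1 - normal_cdf(z)             # H1: effect is positive only
p_two_tailed = 2 * (1 - normal_cdf(abs(z)))  # H1: effect in either direction

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
# At alpha = 0.05 the one-tailed test rejects H0 (p ≈ 0.036),
# while the two-tailed test does not (p ≈ 0.072).
```

This is exactly the effect of splitting α: placing all 5% in one tail makes the one-tailed test more sensitive to an effect in the predicted direction, at the cost of never detecting an effect in the other direction.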
Type I and Type II Errors in Hypothesis Testing
• In hypothesis testing, Type I and Type II errors occur when incorrect
conclusions are drawn about the null hypothesis.
• These errors correspond to incorrectly rejecting a true null hypothesis or
incorrectly failing to reject a false one.
Type I Error (False Positive)
A Type I error occurs when the null hypothesis (H₀) is rejected even though it is true.
• Meaning: The test incorrectly concludes that there is an effect or difference when none exists. The
probability of a Type I error is denoted by α (significance level, usually 0.05).
• Example:
• A medical test incorrectly indicates that a healthy patient has a disease (false positive).
• Null Hypothesis: The patient is healthy.
• Type I Error: The test says the patient is sick.
• Consequences: Unnecessary interventions or actions (e.g., treating someone for a disease they don’t
have).
Type II Error (False Negative)
A Type II error occurs when the null hypothesis is not rejected even though it is false.
• Meaning: The test fails to detect an effect or difference that actually exists. The probability
of a Type II error is denoted by β.
• Example:
• A medical test fails to detect a disease in a sick patient (false negative).
• Null Hypothesis: The patient is healthy.
• Type II Error: The test says the patient is healthy when they are sick.
• Consequences: Missed opportunities to act or intervene (e.g., failing to treat a sick person).
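To see why α is called the Type I error rate, here is a small, illustrative simulation in standard-library Python (all data are hypothetical): we generate samples for which H₀ is genuinely true, so every rejection is a false positive, and the long-run rejection rate settles near α = 0.05.

```python
import random
from math import sqrt

random.seed(1)

def z_test_rejects(sample, mu0, sigma, critical_z=1.96):
    """Two-tailed z-test with known sigma: does it reject H0 (mean == mu0)?"""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return abs(z) > critical_z

# Simulate 2000 experiments in which H0 is TRUE (the data really have
# mean 0), so every rejection is a Type I error.
trials = 2000
rejections = sum(
    z_test_rejects([random.gauss(0, 1) for _ in range(30)], 0, 1)
    for _ in range(trials)
)
print(rejections / trials)  # close to 0.05
```

A Type II error rate (β) could be estimated the same way by simulating data for which H₀ is false and counting how often the test fails to reject.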
Significance Level
The level of significance (α) is a threshold used to determine whether the null
hypothesis (H₀) should be rejected in hypothesis testing. It represents the
probability of making a Type I error, i.e., rejecting the null hypothesis when it is
actually true.
In other words, it is the cut-off point or critical value set by the researcher to
judge the strength of the evidence against the null hypothesis. It quantifies the
risk of concluding that there is an effect or difference when none exists (false
positive).
Common Levels of Significance
• 0.05 (5%): The most common threshold. There is a 5% chance of making a Type I
error.
• 0.01 (1%): Used when stricter criteria are needed, reducing the chance of a Type I
error to 1%.
• 0.10 (10%): Occasionally used in exploratory research where a higher tolerance for
error is acceptable.
0.05 is standard in many fields (social sciences, business); 0.01 is preferred in critical
fields like medicine or engineering, where the cost of a false positive is high.
Steps of Hypothesis Testing
• Hypothesis testing is a structured process used to determine whether there is
enough evidence to support or reject a hypothesis about a population
parameter. Below are the key steps involved in hypothesis testing.
1. State the Null and Alternative Hypotheses (H₀ and H₁)
• Null Hypothesis (H₀): Assumes there is no effect or no difference. It
represents the status quo or what is assumed to be true.
• Alternative Hypothesis (H₁): Assumes there is an effect or a difference.
It is what the researcher seeks to prove.
• Example:
H₀: There is no difference in the average test scores of two teaching
methods.
H₁: There is a difference in the average test scores of two teaching methods.
2. Choose the Significance Level (α)
• The significance level (α) is the threshold for rejecting the null hypothesis,
typically set at 0.05 or 5%.
• It reflects the probability of making a Type I error (false positive).
3. Select the Appropriate Test
• Choose the statistical test based on the type of data and hypothesis.
Common tests include:
• t-test (for comparing means)
• Chi-square test (for categorical data)
• ANOVA (for comparing more than two groups)
• Z-test (for large samples with known population variance)
4. Formulate the Decision Rule and Identify the Critical Value
• Based on the selected test and α, identify the critical value (cut-off point)
from the relevant statistical distribution (e.g., t-distribution, normal
distribution).
• Decision Rule:
• If the test statistic exceeds the critical value, reject H₀.
• If not, fail to reject H₀.
5. Collect Data and Compute the Test Statistic
• Gather the required data and calculate the test statistic (e.g., t-statistic, z-
score). This value reflects how far the observed result is from the expected
result under H₀.
• Software like SPSS, R, or Excel can be used to compute the statistic.
6. Calculate the p-value
• The p-value is the probability of obtaining the observed result (or more
extreme) assuming the null hypothesis is true.
• If p ≤ α, reject the null hypothesis (H₀).
• If p > α, fail to reject the null hypothesis.
7. Make the Decision and Interpret Result
• Reject H₀ if the test statistic is beyond the critical value or the p-value is less than α.
• Fail to reject H₀ if the test statistic is within the critical value or the p-value is greater than α.
Interpretation
Explain what the statistical decision means in the context of the research problem.
Example: If H₀ is rejected, conclude that the new teaching method significantly affects student
performance.
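The seven steps can be walked through end to end in a short, illustrative Python sketch. The data and group means are hypothetical, and a large-sample z-test stands in for the t-test so that only the standard library is needed:

```python
import random
from math import erf, sqrt
from statistics import mean, stdev

random.seed(7)

# Step 1: H0: no difference in mean scores between the two teaching
#         methods; H1: there is a difference (two-tailed).
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: with two large independent samples, use a z-test on the
#         difference of means (a t-test is the textbook choice for
#         small samples).
# Hypothetical data: 100 scores under each teaching method.
method_a = [random.gauss(70, 10) for _ in range(100)]
method_b = [random.gauss(80, 10) for _ in range(100)]

# Step 4: the critical value for a two-tailed test at alpha = 0.05
#         is |z| > 1.96.
# Step 5: compute the test statistic.
se = sqrt(stdev(method_a) ** 2 / len(method_a)
          + stdev(method_b) ** 2 / len(method_b))
z = (mean(method_b) - mean(method_a)) / se

# Step 6: two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Step 7: decide and interpret in context.
if p_value <= alpha:
    print(f"Reject H0 (z = {z:.2f}, p = {p_value:.4f}): the methods differ.")
else:
    print(f"Fail to reject H0 (z = {z:.2f}, p = {p_value:.4f}).")
```

In SPSS the same comparison would typically be run via Analyze > Compare Means > Independent-Samples T Test, which reports the test statistic and p-value directly.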