DIMS
KMBN 104: Business Statistics and Analytics
Unit-5
Hypothesis Testing
Hypothesis: Once you have developed a clear and focused research question or set of
research questions, you are ready to conduct further research (a literature review) on
the topic to help you make an educated guess about the answer to your question(s).
This educated guess is called a hypothesis.
In research, there are two types of hypotheses: null and alternative. They work as a
complementary pair, each stating that the other is wrong.
Null Hypothesis (H0) – This can be thought of as the implied hypothesis. "Null"
means "nothing." This hypothesis states that there is no difference between
groups or no relationship between variables. The null hypothesis is a
presumption of status quo or no change.
Alternative Hypothesis (Ha) – This is also known as the claim. This hypothesis
should state what you expect the data to show, based on your research on the
topic. This is your answer to your research question.
Examples:
Null Hypothesis:
1. H0: There is no difference in the salary of factory workers based on gender.
2. H0: There is no relationship between height and shoe size.
3. H0: Experience on the job has no impact on the quality of a brick mason's work.
Alternative Hypothesis:
1. Ha: Male factory workers have a higher salary than female factory workers.
2. Ha: There is a positive relationship between height and shoe size.
3. Ha: The quality of a brick mason's work is influenced by on-the-job experience.
In statistics, a Type I error is a false positive conclusion, while a Type II error is a false
negative conclusion.
Making a statistical decision always involves uncertainties, so the risks of making these
errors are unavoidable in hypothesis testing.
The probability of making a Type I error is the significance level, or alpha (α), while the
probability of making a Type II error is beta (β). These risks can be minimized through
careful planning in your study design.
Example: Type I vs. Type II error
You decide to get tested for COVID-19 based on mild symptoms. There are two errors
that could potentially occur:
Type I error (false positive): the test result says you have coronavirus, but you actually
don’t.
Type II error (false negative): the test result says you don’t have coronavirus, but you
actually do.
Error in statistical decision-making
Using hypothesis testing, you can make decisions about whether your data support or
refute your research predictions with null and alternative hypotheses.
Hypothesis testing starts with the assumption of no difference between groups or no
relationship between variables in the population; this is the null hypothesis. It is
always paired with an alternative hypothesis, which is your research prediction of an
actual difference between groups or a true relationship between variables.
Example: Null and alternative hypothesis
You test whether a new drug intervention can alleviate symptoms of an autoimmune
disease.
In this case:
The null hypothesis (H0) is that the new drug has no effect on symptoms of the
disease.
The alternative hypothesis (H1) is that the drug is effective for alleviating
symptoms of the disease.
Then, you decide whether the null hypothesis can be rejected based on your data and
the results of a statistical test. Since these decisions are based on probabilities, there is
always a risk of making the wrong conclusion.
If your results show statistical significance, that means they are very unlikely to
occur if the null hypothesis is true. In this case, you would reject your null
hypothesis. But sometimes, this may actually be a Type I error.
If your findings do not show statistical significance, they have a high chance of
occurring if the null hypothesis is true. Therefore, you fail to reject your null
hypothesis. But sometimes, this may be a Type II error.
Example: Type I and Type II errors
A Type I error happens when you get false positive results: you conclude that
the drug intervention improved symptoms when it actually didn't. These
improvements could have arisen from other random factors or measurement
errors.
A Type II error happens when you get false negative results: you conclude that
the drug intervention didn't improve symptoms when it actually did. Your study
may have missed key indicators of improvements or attributed any improvements
to other factors instead.
Type I error
A Type I error means rejecting the null hypothesis when it's actually true. It means
concluding that results are statistically significant when, in reality, they came about
purely by chance or because of unrelated factors.
The risk of committing this error is the significance level (alpha or α) you choose. This
is a threshold that you set at the beginning of your study and against which you compare
the p value: the probability of obtaining results at least as extreme as yours if the
null hypothesis is true.
The significance level is usually set at 0.05 or 5%. This means that you call your
results significant only if they would occur 5% of the time, or less, when the null
hypothesis is actually true.
If the p value of your test is lower than the significance level, it means your results are
statistically significant and consistent with the alternative hypothesis. If your p value is
higher than the significance level, then your results are considered statistically non-
significant.
Example: Statistical significance and Type I error
In your clinical study, you compare the symptoms of patients who received the new drug
intervention or a control treatment. Using a t test, you obtain a p value of .035. This
p value is lower than your alpha of .05, so you consider your results statistically
significant and reject the null hypothesis.
However, the p value means that there is a 3.5% chance of results at least this extreme
occurring if the null hypothesis is true. Therefore, there is still a risk of making a
Type I error.
To reduce the Type I error probability, you can simply set a lower significance level.
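The link between the significance level and the Type I error rate can be checked by simulation. The sketch below is illustrative only: it assumes a two-tailed z-test on samples drawn from a standard normal population (so the null hypothesis is true by construction), and the function name, sample size, number of trials and seed are all hypothetical choices, not part of the notes above.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=50, trials=2000, seed=1):
    """Simulate studies in which H0 (mu = 0, sigma = 1) is true and return
    the fraction that a two-tailed z-test wrongly rejects."""
    rng = random.Random(seed)                      # fixed seed: repeatable runs
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)         # (mean - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # a false positive
    return rejections / trials

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(rate_05, rate_01)
```

The first rate should land close to 0.05 and the second close to 0.01, which is the sense in which a lower significance level reduces the Type I error probability.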
Type II error
A Type II error means not rejecting the null hypothesis when it's actually false. This is
not quite the same as "accepting" the null hypothesis, because hypothesis testing can
only tell you whether to reject the null hypothesis.
Instead, a Type II error means failing to conclude there was an effect when there
actually was. In reality, your study may not have had enough statistical power to
detect an effect of a certain size.
Power is the extent to which a test can correctly detect a real effect when there is one.
A power level of 80% or higher is usually considered acceptable.
The risk of a Type II error is inversely related to the statistical power of a study. The
higher the statistical power, the lower the probability of making a Type II error.
Example: Statistical power and Type II error
When preparing your clinical study, you complete a power analysis and determine that
with your sample size, you have an 80% chance of detecting an effect size of 20% or
greater. An effect size of 20% means that the drug intervention reduces symptoms by 20%
more than the control treatment.
However, a Type II error may occur if the true effect is smaller than this size. A
smaller effect size is unlikely to be detected in your study due to inadequate
statistical power.
Statistical power is determined by:
Size of the effect: Larger effects are more easily detected.
Measurement error: Systematic and random errors in recorded data reduce
power.
Sample size: Larger samples reduce sampling error and increase power.
Significance level: Increasing the significance level increases power.
To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the
significance level.
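How effect size, sample size and significance level drive power can be sketched numerically. The example assumes a one-tailed, one-sample z-test with known standard deviation, for which power = 1 − Φ(z_α − d·√n), where d is the true effect in standard-deviation units. The function name and the numbers tried are hypothetical and are not the clinical study's figures.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-tailed, one-sample z-test.

    effect_size: true shift of the mean in standard-deviation units (assumed).
    n: sample size.  alpha: significance level.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # rejection cutoff under H0
    # Probability that the test statistic exceeds the cutoff when the effect is real
    return 1 - std_normal.cdf(z_crit - effect_size * n ** 0.5)

# Larger samples give more power for the same effect and alpha
for n in (20, 50, 100):
    print(n, round(z_test_power(0.3, n), 3))

# A higher significance level also gives more power (at the cost of more Type I risk)
print(round(z_test_power(0.3, 50, alpha=0.10), 3))
```

With an effect size of zero the function returns alpha itself, which is a quick sanity check: when there is nothing to detect, the only rejections are false positives.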
Trade-off between Type I and Type II errors
The Type I and Type II error rates influence each other. That's because the significance
level (the Type I error rate) affects statistical power, which is inversely related to the
Type II error rate.
This means there's an important trade-off between Type I and Type II errors:
Setting a lower significance level decreases the Type I error risk, but increases the
Type II error risk.
Increasing the power of a test by raising the significance level decreases the
Type II error risk, but increases the Type I error risk.
This trade-off is visualized in the graph below. It shows two curves:
The null hypothesis distribution shows all possible results you'd obtain if the
null hypothesis is true. The correct conclusion for any point on this distribution
means not rejecting the null hypothesis.
The alternative hypothesis distribution shows all possible results you'd obtain
if the alternative hypothesis is true. The correct conclusion for any point on this
distribution means rejecting the null hypothesis.
Type I and Type II errors occur where these two distributions overlap. The blue
shaded area represents alpha, the Type I error rate, and the green shaded area
represents beta, the Type II error rate.
By setting the Type I error rate, you indirectly influence the size of the Type II
error rate as well.
It's important to strike a balance between the risks of making Type I and Type II
errors. For a fixed sample size and effect size, reducing alpha always comes at the
cost of increasing beta, and vice versa.
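The two error rates can be computed as areas under the null and alternative distributions. The sketch below is a minimal illustration under stated assumptions: both sampling distributions are normal with the same spread, the alternative is centred 2.5 standard errors above the null (a hypothetical value), and the test is one-tailed.

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical sampling distribution under H0
alt_dist = NormalDist(mu=2.5, sigma=1.0)   # hypothetical sampling distribution under Ha

def error_rates(alpha):
    """Return (cutoff, beta) for a one-tailed test at significance level alpha."""
    cutoff = null_dist.inv_cdf(1 - alpha)  # reject H0 when the result exceeds this
    beta = alt_dist.cdf(cutoff)            # alternative-curve area left of the cutoff
    return cutoff, beta

for alpha in (0.10, 0.05, 0.01):
    cutoff, beta = error_rates(alpha)
    print(f"alpha={alpha:.2f}  cutoff={cutoff:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Lowering alpha moves the cutoff to the right, which shrinks the Type I area under the null curve and enlarges the Type II area under the alternative curve.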
Testing of Hypothesis: Large Sample Tests, Small Sample test, (t, F, Z Test and
Chi Square Test)
Large-Sample Test of Hypothesis about a Population Mean
Assumptions:
1. The sample is randomly selected.
2. The sample is large (n > 30), so the Central Limit Theorem (CLT) applies.
3. If σ is unknown, we can use the sample standard deviation s as an estimate for σ.
Goal: Identify a sample result that is significantly different from the claimed value; in this
case, is our sample mean statistically different from the claimed null hypothesis mean?
Large-Sample Test of Hypothesis about a Population Mean Step by Step
1. Identify the null hypothesis (specific claim to be tested)
H0: µ = µ0
2. Identify the alternative hypothesis that must be true when the original claim is
false.
One-tailed test: Ha: µ > µ0 (or Ha: µ < µ0)
Two-tailed test: Ha: µ ≠ µ0
3. Calculate the test statistic:
$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Select the significance level based on the seriousness of a Type I error. The
values of 0.05 and 0.01 are very common.
5. Determine the critical values and the critical region. Draw a graph and include the
test statistic, critical value(s), and critical (rejection) region.
6. Reject H0 if the test statistic is in the critical region. Fail to reject H0 if the test
statistic is not in the critical region.
7. Restate this decision in simple, non-technical terms.
Large-Sample Test of Hypothesis about a Population Mean
Example: Given a data set of 106 healthy body temperatures, where the mean was
98.20° and s = 0.62°, at the 0.05 significance level, test the claim that the mean body
temperature of all healthy adults is equal to 98.6°. (Example taken from Triola, Chapter
7, Elementary Statistics, Eighth Ed.)
Steps 1 and 2: Identify the hypotheses.
H0: µ = 98.6
Ha: µ ≠ 98.6
Step 3: Calculate the test statistic:
$z = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{98.20 - 98.6}{0.62 / \sqrt{106}} = -6.64$
z = -6.64
Step 4: Select the significance level. This is given in our example to be 0.05.
H0: µ = 98.6
Ha: µ ≠ 98.6
z = -6.64
α = 0.05
Step 5: Two-tailed test, so split the area between the two tails:
α = 0.05
α/2 = 0.025
Use the normal table to find the critical values ±1.96, as we did for the confidence
intervals.
Step 6: Reject H0, since the test statistic z = -6.64 falls in the critical region in
the left tail (z < -1.96).
Step 7: Restate this decision in simple, nontechnical terms:
There is sufficient evidence to warrant rejection of the claim that the mean body
temperature of healthy adults is equal to 98.6°.
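The seven steps can be reproduced with a few lines of code. This is a minimal sketch, not a library routine: the function name is hypothetical, and it assumes a two-tailed test in which s stands in for σ because the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z_test(sample_mean, claimed_mean, s, n, alpha=0.05):
    """Two-tailed large-sample z-test; s is used as the estimate of sigma."""
    z = (sample_mean - claimed_mean) / (s / sqrt(n))   # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # critical value for alpha/2 per tail
    p_value = 2 * NormalDist().cdf(-abs(z))            # two-tailed p value
    reject = abs(z) > z_crit                           # is z in the critical region?
    return z, z_crit, p_value, reject

# Body-temperature example: n = 106, mean 98.20, s = 0.62, claimed mean 98.6
z, z_crit, p_value, reject = large_sample_z_test(98.20, 98.6, 0.62, 106)
print(round(z, 2), round(z_crit, 2), reject)
```

For the body-temperature data this prints a statistic of -6.64, a critical value of 1.96 and a decision to reject, in line with the hand calculation.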
Student’s t-Distribution (Test of Significance for Small Samples)
If the sample size is less than 30, i.e., n < 30, the sample may be regarded as a small
sample. The greatest contribution to the theory of small samples was made by
William Sealy Gosset and R.A. Fisher. Gosset published his discovery in 1908 under the
pen name "Student", and it is popularly known as the t-test, Student's t-distribution or
Student's distribution. The following are some important applications of the
t-distribution:
1. Test of hypothesis about the population mean. The formula is
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{S}$, where $S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Illustration 1: For a random sample of size 20 from a normal population, the mean is
12.1 and the standard deviation is 3.2. Is it reasonable to suppose that the population
mean is 14.5? Test at 5% significance level.
(Given t0.05 at 19 d.f= 1.729)
Solution: In this problem the degrees of freedom (v) must be computed:
v = n - 1 = 20 - 1 = 19
Let us take the null hypothesis that there is no significant difference between the sample
mean and the population mean.
H0: µ = 14.5
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{S}$
x̄ = 12.1; µ = 14.5; S = 3.2; n = 20
t = [(12.1 - 14.5)√20]/3.2
= -0.75 × 4.47 = -3.35, so |t| = 3.35
For 19 degrees of freedom, t0.05 = 1.729
The absolute value of the calculated t (3.35) is greater than the corresponding table
value (1.729), hence the null hypothesis is rejected. It can be concluded that there is a
significant difference between the sample mean and the population mean, and that it is
not due to sampling fluctuation.
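The arithmetic of Illustration 1 can be checked with a short sketch. The function name is hypothetical, and the critical value is simply the table value quoted in the illustration, since the Python standard library has no t-distribution.

```python
from math import sqrt

def one_sample_t(sample_mean, hypothesised_mean, s, n):
    """Return (t, degrees of freedom) for t = (mean - mu) * sqrt(n) / S."""
    t = (sample_mean - hypothesised_mean) * sqrt(n) / s
    return t, n - 1

t, df = one_sample_t(12.1, 14.5, 3.2, 20)
t_table = 1.729   # table value quoted in the illustration for 19 d.f.

# Compare the size of t (ignoring its sign) with the table value
print(round(t, 2), df, abs(t) > t_table)
```

The sign of t only records that the sample mean lies below the hypothesised mean; the decision compares its absolute value with the table value.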
2. Test of significance of the difference between two means (independent samples)
The formula is:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{S} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$
where
x̄1 = Mean of the first sample
x̄2 = Mean of the second sample
n1 = Number of observations in the first sample
n2 = Number of observations in the second sample
S = Combined standard deviation
The value of S is calculated by the following formula:
$S = \sqrt{\dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$
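Illustration 2: The heights of six randomly chosen soldiers are, in inches: 76, 70, 68, 69,
68 and 69. Those of six randomly chosen sailors are 68, 64, 65, 69, 72 and 70. Discuss, in
the light of these data, whether soldiers are, on average, taller than sailors. Use the
t-test.
Solution:
Let us take the null hypothesis that there is no difference in the height of soldiers and
sailors. H0: µ1 = µ2
Level of significance: it can be taken as 5%, i.e. α = 0.05.
Applying the t-test:
Height x1   x1 - x̄1   (x1 - x̄1)²   Height x2   x2 - x̄2   (x2 - x̄2)²
76           6         36           68           0          0
70           0          0           64          -4         16
68          -2          4           65          -3          9
69          -1          1           69           1          1
68          -2          4           72           4         16
69          -1          1           70           2          4
∑ 420                  46           408                    46
x̄1 = 420/6 = 70
x̄2 = 408/6 = 68
$S = \sqrt{\dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{46 + 46}{6 + 6 - 2}} = \sqrt{9.2} = 3.03$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{S} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}} = \dfrac{70 - 68}{3.03} \sqrt{\dfrac{6 \times 6}{6 + 6}} = 1.14$
For v = n1 + n2 - 2 = 10 degrees of freedom, the one-tailed table value is t0.05 = 1.812.
The calculated value of t is smaller than the table value, so the null hypothesis is not
rejected: these data do not show that soldiers are, on average, taller than sailors.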
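The pooled two-sample calculation is easy to get wrong by hand, so the sketch below recomputes every sum directly from the raw heights. It is illustrative only: the function name is hypothetical, and the soldiers' heights are the six values listed in the Height 1 column of the working table.

```python
from math import sqrt

def pooled_t(sample1, sample2):
    """Two-sample t statistic with a combined (pooled) standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)   # sum of squared deviations, sample 1
    ss2 = sum((x - mean2) ** 2 for x in sample2)   # sum of squared deviations, sample 2
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))          # combined standard deviation
    t = (mean1 - mean2) / s * sqrt(n1 * n2 / (n1 + n2))
    return t, s, n1 + n2 - 2

soldiers = [76, 70, 68, 69, 68, 69]   # Height 1 column of the working table
sailors = [68, 64, 65, 69, 72, 70]    # Height 2 column of the working table
t, s, df = pooled_t(soldiers, sailors)
print(round(t, 2), round(s, 2), df)
```

The returned degrees of freedom (n1 + n2 − 2) tell you which row of the t table to consult when judging the statistic.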
Types of Hypothesis Testing
Z Test
A z-test is used in hypothesis testing to determine whether a finding or relationship is
statistically significant. It usually checks whether two means are the same (the null
hypothesis). A z-test is appropriate when the population standard deviation is known
(or, for large samples, well estimated by s) and the sample size is 30 data points or
more.
T Test
A t-test is a statistical test used to compare the means of two groups. It is frequently
used in hypothesis testing to determine whether two groups differ or whether a procedure
or treatment affects the population of interest.
F Test
The F test is a statistical test used in hypothesis testing to check whether the
variances of two populations or two samples are equal. In an F test, the test statistic
follows an F distribution. The test uses the F statistic to compare two variances by
dividing one by the other. An F test can be either one-tailed or two-tailed, depending
on the parameters of the problem.
The F value obtained after conducting an F test is also used to perform the one-way
ANOVA (analysis of variance) test.
Chi-Square
You use a Chi-square test for hypothesis testing about whether your data are as
predicted. To determine whether the expected and observed results fit well, the
Chi-square test analyzes the differences between observed and expected frequencies of
categorical variables from a random sample. The test's fundamental premise is that the
observed values in your data should be compared to the predicted values that would be
present if the null hypothesis were true.
The Chi-Square test is a statistical procedure for determining the difference between
observed and expected data. It can also be used to determine whether two categorical
variables in our data are related: it helps to find out whether a difference between
two categorical variables is due to chance or to a relationship between them.
Chi-Square Test Definition
A chi-square test is a statistical test that is used to compare observed and expected
results. The goal of this test is to identify whether a disparity between actual and
predicted data is due to chance or to a link between the variables under consideration.
As a result, the chi-square test is an ideal choice for aiding in our understanding and
interpretation of the connection between our two categorical variables.
A chi-square test or comparable nonparametric test is required to test a hypothesis
regarding the distribution of a categorical variable. Categorical variables, which indicate
categories such as animals or countries, can be nominal or ordinal. They cannot have a
normal distribution since they can only have a few particular values.
For example, a meal delivery firm in India wants to investigate the link between gender,
geography, and people's food preferences.
The test is used to decide whether an observed difference between two categorical
variables arises:
As a result of chance, or
Because of a relationship between them
Formula For Chi-Square Test
$\chi^2_c = \sum \dfrac{(O_i - E_i)^2}{E_i}$
Where
c = Degrees of freedom
O = Observed Value
E = Expected Value
The degrees of freedom in a statistical calculation represent the number of values
that are free to vary in that calculation. They must be worked out so that the correct
critical value is used and the chi-square test is statistically valid. These tests are
frequently used to compare observed data with data that would be expected if a
particular hypothesis were true.
The observed values are those you gather yourself.
The expected values are the frequencies expected, based on the null hypothesis.
Fundamentals of Hypothesis Testing
Hypothesis testing is a technique for interpreting and drawing inferences about a
population based on sample data. It aids in determining which sample data best support
mutually exclusive population claims.
Null Hypothesis (H0) - The Null Hypothesis is the assumption that the event will not
occur, i.e. that there is no effect or relationship. It is treated as true unless the
data lead you to reject it. H0 is the symbol for it, and it is pronounced H-naught.
Alternate Hypothesis (H1 or Ha) - The Alternate Hypothesis is the logical opposite of the
null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the
null hypothesis. H1 is the symbol for it.
Example
Let's say you want to know if gender has anything to do with political party preference.
You poll 440 voters in a simple random sample to find out which political party they
prefer. The results of the survey are shown in the table below:
To see if gender is linked to political party preference, perform a Chi-Square test of
independence using the steps below.
Step 1: Define the Hypothesis
H0: There is no link between gender and political party preference.
H1: There is a link between gender and political party preference.
Step 2: Calculate the Expected Values
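Now you will calculate the expected frequency of each cell:
E = (row total × column total) / grand total
For example, the expected value for Male Republicans is the Male row total multiplied by
the Republican column total, divided by the 440 voters surveyed.
Similarly, you can calculate the expected value for each of the cells.
Step 3: Calculate (O - E)² / E for Each Cell in the Table
Now you will calculate (O - E)² / E for each cell in the table.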
Where
O = Observed Value
E = Expected Value
Step 4: Calculate the Test Statistic χ²
χ² is the sum of all the values in the last table
= 0.743 + 2.05 + 2.33 + 3.33 + 0.384 + 1
= 9.837
Before you can conclude, you must first determine the critical value, which requires
the degrees of freedom. The degrees of freedom in this case are equal to the table's
number of rows minus one multiplied by the table's number of columns minus one, or
(r-1) (c-1). We have (3-1) (2-1) = 2.
Finally, you compare the obtained statistic to the critical value found in the chi-square
table. For an alpha level of 0.05 and two degrees of freedom, the critical value is
5.991, which is less than the obtained statistic of 9.837. You can reject the null
hypothesis because the obtained statistic is higher than the critical value.
This means you have sufficient evidence to say that there is an association between
gender and political party preference.
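The steps of the independence test can be sketched in code. The helper names below are hypothetical, and the small 2 × 3 table is made-up data used only to show the mechanics; it is not the survey table from the example. The last lines re-add the six cell contributions quoted in Step 4 and compare the total with the critical value 5.991.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)
    return stat, df

# Hypothetical counts, for illustration only (NOT the survey data from the example)
demo = [[30, 20, 10],
        [20, 30, 10]]
stat, df = chi_square_independence(demo)
print(round(stat, 3), df)

# Cell contributions quoted in Step 4 of the example
reported = sum([0.743, 2.05, 2.33, 3.33, 0.384, 1])
print(round(reported, 3), reported > 5.991)
```

Replacing `demo` with the observed survey counts would reproduce Steps 2 to 4 in one call.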
Question:
A survey on cars was conducted in 2011 and determined that 60% of car owners
have only one car, 28% have two cars, and 12% have three or more. Suppose that
you have decided to conduct your own survey and have collected the data below;
determine whether your data support the results of the study.
Use a significance level of 0.05. You are given that, out of 129 car owners, 73 had one
car and 38 had two cars.
Solution:
Let us state the null and alternative hypotheses.
H0: The proportion of car owners with one, two or three cars is 0.60, 0.28 and 0.12
respectively.
H1: The proportion of car owners with one, two or three cars does not match the
proposed model.
A Chi-Square goodness of fit test is appropriate because we are examining the
distribution of a single categorical variable.
Let's tabulate the given information and calculate the required values.
                     Observed (Oi)   Expected (Ei)        Oi - Ei   (Oi - Ei)²   (Oi - Ei)²/Ei
One car              73              0.60 × 129 = 77.4    -4.4      19.36        0.2501
Two cars             38              0.28 × 129 = 36.1     1.9       3.61        0.1
Three or more cars   18              0.12 × 129 = 15.5     2.5       6.25        0.4032
Total                129                                                         0.7533
Therefore, χ² = ∑(Oi - Ei)²/Ei = 0.7533
Let's compare it to the critical chi-square value for the significance level 0.05.
The degrees of freedom = 3 - 1 = 2
Using the table, the critical value for a 0.05 significance level with df = 2 is 5.99.
That means that, if the null hypothesis is true, 95 surveys out of 100 will give a
χ² value of 5.99 or less.
The Chi-square statistic is only 0.7533, well below 5.99, so we fail to reject the null
hypothesis: the survey data are consistent with the proportions reported in the study.
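The goodness-of-fit calculation can be reproduced with the short sketch below. The function name is hypothetical. Because the code keeps the expected counts unrounded (36.12 and 15.48 rather than 36.1 and 15.5), its statistic comes out near 0.758 instead of the tabulated 0.7533; the conclusion is unaffected.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic for observed counts vs. claimed proportions."""
    total = sum(observed)
    expected = [p * total for p in proportions]          # Ei = proportion * sample size
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1             # df = categories - 1

observed = [73, 38, 18]              # one car, two cars, three or more cars
proportions = [0.60, 0.28, 0.12]     # proportions claimed by the 2011 survey
stat, expected, df = chi_square_gof(observed, proportions)

critical = 5.99                      # table value for alpha = 0.05, df = 2
print(round(stat, 4), df, stat > critical)
```

The final comparison is False, meaning the statistic does not reach the critical region and the null hypothesis is not rejected.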