EDA Module
University of Nueva Caceres
College of Engineering and Architecture
ELECTRONICS ENGINEERING
We champion excellence
We nurture dreams
We do the right things right
We are dynamic and creative
______________________________________
Name of Student
______________________________________
Instructor
cea@unc.edu.ph
054-4726100 loc. 121
COPYRIGHT © 2020
UNIVERSITY OF NUEVA CACERES COLLEGE OF ENGINEERING AND ARCHITECTURE
Contents
Course Details
Week 1: Introduction to Engineering Data Analysis
  I. Introduction
  II. Objectives
  III. The Engineering Method
  IV. Methods of Data Collection
  V. Planning and Conducting Surveys
    Conducting a Survey
    Designing a Survey
    Constructing a Survey: Sample Problems
  VI. Planning and Conducting Experiments
    Experimental Design
    Determining the Parameters in an Experimental Study
  VII. Summary
Week 2: Introduction to Probability
  I. Introduction
  II. Objectives
  III. Introduction to Probability
  IV. Basic Terminologies in Probability
  V. Determining the Sample Space: Tree Diagram
  VI. Review on Combinatorics
    The Fundamental Counting Principle
    Permutation
    Combination
  VII. Three Basic Interpretations of Probability
    Classical Probability
    Empirical Probability
    Subjective Probability
  VIII. Summary
  IX. Exercises
  II. Objectives
  III. Confidence Intervals
  IV. Confidence Interval for a Single Population Mean μ: σ Known
    Steps in Calculating the Confidence Interval
  V. Confidence Interval for a Single Population Mean μ: σ Unknown
    Properties of the Student’s t-Distribution
    Steps in Calculating the Confidence Interval
  VI. Confidence Interval for a Population Proportion
    Steps in Calculating the Confidence Interval
  VII. Summary
  VIII. Licenses and Attributions
  IX. Exercises
Week 11: Statistical Intervals
  I. Introduction
  II. Objectives
  III. Confidence Interval for Variance and Standard Deviation
    The Chi-Square (χ²) Distribution
    Steps in Calculating the Confidence Interval
  IV. Which Procedure Do I Use?
  V. Prediction Interval
    Steps in Calculating the Prediction Interval
  VI. Tolerance Interval
    Steps in Calculating the Tolerance Interval
  VII. Summary
  VIII. Exercises
Week 12: Hypothesis Testing
  I. Introduction
  II. Objectives
  III. Introduction to Hypothesis Testing
    Null and Alternative Hypotheses
    Deciding Whether to Reject the Null Hypothesis: One-Tailed and Two-Tailed Hypothesis Tests
    Type I and Type II Errors
    A Courtroom Analogy for Hypothesis Tests
Course Details
Grading System:
              Midterm                     Finals
Assessments   Percentage    Assessments   Percentage
Quizzes       30%           Quizzes       30%
Exercises     20%           Exercises     20%
Examination   50%           Examination   50%
Total         100%          Total         100%
General Average: 40% Midterm Ave. + 60% Final Ave.
Required Gen. Ave. to Pass: 75%
Course Outline
Prelim Exam
Week 5: Discrete Probability Distribution
1. Distinguish between the three discrete probability distributions
2. Recognize the appropriate discrete probability distribution for any given problem
3. Use the three probability distributions in solving statistical problems
4. Calculate the mean and standard deviation of a discrete probability distribution
Solve problems/exercises
I. Introduction
In this module, we will examine the importance of data to engineering, discuss the phases of the
engineering method, and look at the two common methods of collecting engineering data: surveys
and experiments.
This section explores how to design and conduct surveys that accurately reflect the perspectives
of a larger group of people by identifying unbiased sample groups and by writing unbiased questions. It
also investigates different methods for representing survey results through graphs, and the use of
experiments and observational studies as other data-gathering techniques.
II. Objectives
Engineers efficiently apply scientific principles to solve various problems existing in our society.
This can be accomplished either by improving an existing process or product or by designing a new
product or process to meet specific needs [1]. In fact, problem solving has become such an integral part
of an engineer’s life that Scott Adams even said, “if there are no problems handily available, engineers
will create their own problems.” Engineers use a systematic approach to attain the desired solution to any
problem. This approach is called the Engineering Method or Engineering Design [2].
The phases in the engineering method clearly show that it is essential for an engineer to be skilled
in planning experiments; collecting, analyzing, and interpreting data; and drawing conclusions and
developing solutions to problems based on the results. Since an engineer deals with many different
aspects of data, some knowledge of statistics is necessary. Oxford Languages defines statistics as “The
practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose
of inferring proportions in a whole from those in a representative sample.” The statistical techniques you
will learn in this course can be a powerful tool once you start innovating or improving products, processes,
and systems.
[3] There are various ways of gathering the data required for any given engineering problem. Each
method has its own advantages and disadvantages. Understanding each method will help you
gather appropriate data, which will lead to the right conclusions and solutions. In this lesson, we will
only cover the two most common data-gathering methods used by engineers:
1. Survey. This method collects data directly from respondents through questionnaires, polls, and
interviews. The data are based on the beliefs, principles, points of view, opinions, personal
observations, and/or information of a selected number of participants with qualifications
appropriate to the study. A researcher prepares a set of questions administered through
paper-and-pen surveys, personal interviews, social media polls, or online surveys. Because of this,
the data gathered through this method are acquired immediately from the respondents and are
mostly qualitative.
2. Experiments. This method uses scientific and systematic procedures to extract data. It
involves conducting tests and experiments, commonly in laboratories. In experiments, the
subjects are separated into two groups: the experimental (or manipulated) group and the
control group. This method mostly provides promising results because the evidence is taken
directly from the experiments.
[3] A survey is a method of gathering data from a large population by asking a few well-constructed
questions. A survey questionnaire is a series of unbiased questions that a participant must answer. Table
1.1 presents the advantages and disadvantages of conducting a survey to gather data.
Table 1.1 Advantages and Disadvantages of a Survey
Advantages Disadvantages
Conducting a Survey
There are two common methods of administering a survey, and each has its own advantages and
disadvantages. Table 1.2 presents a comparison of these two methods.
Designing a Survey
Surveys can take different forms. They can be used to ask only one question, or they can ask a series
of questions. We can use surveys to test out people’s opinions or to test a hypothesis. When designing a
survey, the following steps are useful:
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered
paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is important
if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
d. Create a data collection sheet that she can use to record her results.
To collect the data for this simple survey, Martha can design a data collection
sheet such as the one below:
Table 1.3 Tally Sheet for Example 1
Sport         Tally
Baseball
Basketball
Football
Soccer
Volleyball
Swimming
The answer from each interviewee can be quickly collected and then the data
collector can move on to the next person.
Once the data has been collected, suitable graphs can be made to display the results.
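The tally sheet above can also be kept electronically. The following is a minimal sketch in Python, assuming a short list of made-up responses (the `responses` values and the `tally_sheet` helper are hypothetical and are not part of the original example):

```python
from collections import Counter

# Hypothetical responses, invented for illustration only.
responses = ["Basketball", "Soccer", "Basketball", "Volleyball",
             "Swimming", "Basketball", "Soccer"]

# Categories taken from the tally sheet in Table 1.3.
sports = ["Baseball", "Basketball", "Football", "Soccer", "Volleyball", "Swimming"]

# Counter returns 0 for categories that nobody chose.
tally = Counter(responses)

def tally_sheet(counts, categories):
    """Return one text row per category, using '|' as a tally mark."""
    return [f"{sport:<12}{'|' * counts[sport]}" for sport in categories]

for row in tally_sheet(tally, sports):
    print(row)
```

The counts in `tally` can then be passed to any plotting tool to produce a bar graph of the results.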
2. Raoul wants to construct a survey that shows how many hours per week the average student at
his school works.
d. Create a data collection sheet that Raoul can use to record his results.
In order to collect the data for this survey Raoul designed the data collection sheet
shown below:
This data collection sheet allows Raoul to write down the actual numbers of hours
worked per week by students as opposed to just collecting tally marks for several
categories.
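Because Raoul records actual numbers rather than tally marks, he can compute summary values directly. A small sketch, assuming hypothetical recorded hours (the values in `hours_worked` are invented for illustration):

```python
from statistics import mean, median

# Hypothetical hours worked per week, one entry per surveyed student.
hours_worked = [0, 10, 12, 8, 0, 15, 20, 6]

average_hours = mean(hours_worked)    # arithmetic mean of the recorded hours
typical_hours = median(hours_worked)  # middle value, less sensitive to extremes

print(f"Mean hours per week:   {average_hours}")
print(f"Median hours per week: {typical_hours}")
```

Reporting both the mean and the median is a design choice: a few students who work many hours can pull the mean upward, while the median is less affected.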
[3] If we had unlimited time and resources, there probably wouldn't be much fuss made over
designing experiments. In production and quality control, we want to control the error and learn as much
as we can about the process or the underlying theory with the resources at hand. From an engineering
perspective, we use experimentation for the following purposes:
Experimental Design²
Does aspirin reduce the risk of heart attacks? Is one brand of fertilizer more effective at growing
roses than another? Is fatigue as dangerous to a driver as the influence of alcohol? Questions like these
are answered using randomized experiments. Proper study design ensures the production of reliable and
accurate data.
The purpose of an experiment is to investigate the relationship between two variables. When one
variable causes change in another, we call the first variable the explanatory variable. The affected
variable is called the response variable. In a randomized experiment, the researcher manipulates values
of the explanatory variable and measures the resulting changes in the response variable. The different
values of the explanatory variable are called treatments. An experimental unit is a single object or
individual to be measured.
You want to investigate the effectiveness of vitamin E in preventing disease. You recruit a group
of subjects and ask them if they regularly take vitamin E. You notice that the subjects who take vitamin E
exhibit better health on average than those who do not. Does this prove that vitamin E is effective in
disease prevention? It does not. There are many differences between the two groups compared in
addition to vitamin E consumption. People who take vitamin E regularly often take other steps to improve
their health: exercise, diet, other vitamin supplements, choosing not to smoke. Any one of these factors
could be influencing health. As described, this study does not prove that vitamin E is the key to disease
prevention.
Additional variables that can cloud a study are called lurking variables. In order to prove that the
explanatory variable is causing a change in the response variable, it is necessary to isolate the explanatory
variable. The researcher must design her experiment in such a way that there is only one difference
between groups being compared: the planned treatments. This is accomplished by the random
assignment of experimental units to treatment groups. When subjects are assigned treatments randomly,
all of the potential lurking variables are spread equally among the groups. At this point the only difference
between groups is the one imposed by the researcher. Different outcomes measured in the response
variable, therefore, must be a direct result of the different treatments. In this way, an experiment can
prove a cause-and-effect connection between the explanatory and response variables.
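Random assignment of experimental units to treatment groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_treatments`, the subject labels, and the fixed seed are hypothetical and chosen so the example is reproducible.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly split units into (nearly) equal groups, one per treatment."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experimental units.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = assign_treatments(subjects, ["treatment", "control"], seed=42)

for name, members in groups.items():
    print(name, len(members))
```

Because the order is decided by shuffling, no characteristic of a subject can influence which group the subject lands in.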
¹ A Quick History of the Design of Experiments (DOE) by The Pennsylvania State University, available under a CC
BY-NC 4.0 license at https://online.stat.psu.edu/stat503/lesson/1/1.1
² Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction
The power of suggestion can have an important influence on the outcome of an experiment.
Studies have shown that the expectation of the study participant can be as important as the actual
medication. In one study of performance-enhancing drugs, researchers noted:
“Results showed that believing one had taken the substance resulted in [performance] times almost
as fast as those associated with consuming the drug itself. In contrast, taking the drug without knowledge
yielded no significant performance increment.”³
1. Researchers want to investigate whether taking aspirin regularly reduces the risk of heart attack.
Four hundred men between the ages of 50 and 84 are recruited as participants. The men are divided
randomly into two groups: one group will take aspirin, and the other group will take a placebo. Each
man takes one pill each day for three years, but he does not know whether he is taking aspirin or the
placebo. At the end of the study, researchers count the number of men in each group who have had
heart attacks. Identify the following values for this study: population, sample, experimental units,
explanatory variable, response variable, treatments.
Solution:
The population is men aged 50 to 84.
The sample is the 400 men who participated.
The experimental units are the individual men in the study.
The explanatory variable is oral medication.
The treatments are aspirin and a placebo.
The response variable is whether a subject had a heart attack.
2. The Smell & Taste Treatment and Research Foundation conducted a study to investigate whether
smell can affect learning. Subjects completed mazes multiple times while wearing masks. They
completed the pencil and paper mazes three times wearing floral-scented masks, and three times
with unscented masks. Participants were assigned at random to wear the floral mask during the first
three trials or during the last three trials. For each trial, researchers recorded the time it took to
complete the maze and the subject’s impression of the mask’s scent: positive, negative, or neutral.
a. Describe the explanatory and response variables in this study.
b. What are the treatments?
c. Identify any lurking variables that could interfere with this study.
d. Is it possible to use blinding in this study?
³ McClung, M., Collins, D. “Because I know it will!”: placebo effects of an ergogenic aid on athletic performance.
Journal of Sport & Exercise Psychology. 2007 Jun. 29(3):382-94. Web. April 30, 2013.
Solution:
a. The explanatory variable is scent, and the response variable is the time it takes to complete
the maze.
b. There are two treatments: a floral-scented mask and an unscented mask.
c. All subjects experienced both treatments. The order of treatments was randomly assigned so
there were no differences between the treatment groups. Random assignment eliminates the
problem of lurking variables.
d. Subjects will clearly know whether they can smell flowers or not, so subjects cannot be blinded
in this study. Researchers timing the mazes can be blinded, though. The researcher who is
observing a subject will not know which mask is being worn.
3. A researcher wants to study the effects of birth order on personality. Explain why this study could
not be conducted as a randomized experiment. What is the main problem in a study that cannot be
designed as a randomized experiment?
Solution:
The explanatory variable is birth order. You cannot randomly assign a person’s birth order.
Random assignment eliminates the impact of lurking variables. When you cannot assign subjects to
treatment groups at random, there will be differences between the groups other than the
explanatory variable.
Designing an Experiment
1. You are concerned about the effects of texting on driving performance. Design a study to
test the response time of drivers while texting and while driving only. How many seconds
does it take for a driver to respond when a leading car hits the brakes?
a. Describe the explanatory and response variables in the study.
b. What are the treatments?
c. What should you consider when selecting participants?
d. Your research partner wants to divide participants randomly into two groups: one to
drive without distraction and one to text and drive simultaneously. Is this a good idea?
Why or why not?
e. Identify any lurking variables that could interfere with this study.
f. How can blinding be used in this study?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
2. How does sleep deprivation affect your ability to drive? A recent study measured the effects
on 19 professional drivers. Each driver participated in two experimental sessions: one after
normal sleep and one after 27 hours of total sleep deprivation. The treatments were
assigned in random order. In each session, performance was measured on a variety of tasks
including a driving simulation.
Use key terms from this module to describe the design of this experiment.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
VII. Summary
The goal of the experiment is to provide evidence for a cause-and-effect relationship between
two variables. When we investigate a relationship between two variables, we identify an explanatory
variable and a response variable. To establish a cause-and-effect relationship, we want to make sure the
explanatory variable is the only factor that impacts the response variable. But other factors, called
confounding variables, may also influence the response.
A well-designed experiment takes steps to eliminate the effects of confounding variables. These
steps include direct control, random assignment of people to treatment groups, use of a control or
placebo, and blind conditions. Incorporating such precautions, a well-designed experiment provides
convincing evidence of cause-and-effect.
Random assignment uses random chance to assign participants to treatments, which creates
similar treatment groups. With random assignment, we can be confident that any differences we observe
in the responses of the treatment groups are due to the explanatory variable. In this way, we have
evidence for a cause-and-effect relationship.
I. Introduction
It is often necessary to “guess” about the outcome of an event in order to make a decision. Politicians study polls to
guess their likelihood of winning an election. Teachers choose a course of study based on what they think
students can comprehend. Doctors choose the treatments needed for various diseases based on their
assessment of likely results. You may have visited a casino where people play games chosen because of
the belief that the likelihood of winning is good. You may have chosen your course of study based on the
probable availability of jobs.
You have, more than likely, used probability. In fact, you probably have an intuitive sense of
probability. Probability deals with the chance of an event occurring. Whenever you weigh the odds of
whether to do your homework or to study for an exam, you are using probability. In this chapter, you will
learn how to solve probability problems using a systematic approach.
In this module, we will examine the concepts behind probability. We will also consider different
terminologies and properties related to probability, and briefly review the basic combinatorics
that will be used later in calculating probabilities. Lastly, you will learn to differentiate the three
basic interpretations of probability and calculate the probability under each.
II. Objectives
1. identify sample spaces and events for random experiments with graphs, tables, lists, or tree
diagrams
2. calculate the classical and empirical probability of an event
“The only two sure things are death and taxes.” – Ben Franklin
This philosophy no doubt arose because so much in our lives is affected by chance. From the time you
wake up until you go to bed, you make decisions about possible events that are governed, at
least in part, by chance. For example, you rely on chance when you decide whether to carry an
umbrella to school or when you choose the best answer in a multiple-choice exam.
To illustrate, assume you have an opportunity to invest some money in a software company. Suppose
you know that the company’s records indicate that in the past five years, its profits have been consistently
decreasing. Would you still invest your money in it? Do you think the chances are good for the company
in the future?
Also, suppose that you are playing a game that involves tossing a single die. Assume that you have
already tossed it 10 times, and the outcome is always a 2. What is your prediction of the eleventh toss?
Would you be willing to bet $100 that you will not get a 2 on the next toss? Do you think the die is loaded?
Notice that the decision concerning a successful investment in the software company and the decision
of whether to bet $100 on the next outcome of the die are both based on probabilities of certain sample
results. Namely, the software company’s profits have been declining for the past five years, and the
outcome of rolling 2 ten times in a row seems strange. From these sample results, we might conclude that
we are not going to invest our money in the software company or bet on this die. In this lesson, you will
learn mathematical ideas and tools that can help you understand such situations.
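To see why ten 2s in a row seems strange, we can compute how likely that result would be if the die were fair. The sketch below assumes a fair six-sided die and independent tosses; neither assumption is stated in the scenario, and the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: a fair die, so each face has probability 1/6 on every toss.
p_two = Fraction(1, 6)

# Assumption: tosses are independent, so the probabilities multiply.
p_ten_twos = p_two ** 10

print(p_ten_twos)         # exact value as a fraction
print(float(p_ten_twos))  # the same value as a decimal
```

Under these assumptions the chance is 1 in 60,466,176, which is why the observed streak makes a loaded die a reasonable suspicion. Note that for a truly fair die the eleventh toss would still show a 2 with probability 1/6.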
Probability is generally defined as the chance of an event occurring. Many people are familiar with
probability from observing or playing games of chance, such as card games, slot machines, or lotteries. In
addition to games of chance, probability theory is used in the fields of insurance, investments, and
weather forecasting. But the application of probability goes beyond that. Inferential statistics, such as
prediction, is based on probability. Hypothesis testing also relies on probability.
An event is something that occurs or happens. For example, flipping a coin is an event, and so is
walking in the park and passing by a bench. Anything that could possibly happen is an event.
A sample space is the set of all possible outcomes of a probability experiment and is denoted by S.
Experiment: Roll a die. Sample space: S = {1, 2, 3, 4, 5, 6}
[4] A tree diagram is a device consisting of line segments emanating from a starting point to the
outcome points. It is used to determine the possible outcomes of a probability experiment.
Examples:
1. Find the sample space for the gender of the children if a family has three children. Use B for boy
and G for girl.
Solution:
To draw a tree diagram, we must first mark a starting point; in this case we use the letter S. In this
problem we are to find the sample space for the gender of the children if a family has three
children.
For the first child (marked green), we can have either a boy (B) or a girl (G).
For the second child (marked blue),
o If the eldest is B, the second child can again be either B or G, which will give us BB
and BG.
o If the eldest is G, the second child can again be either B or G, which will give us GB
and GG.
For the third child (marked orange),
o If the eldest is a boy and the second is also a boy (BB), the third child can again
be either B or G, which will give us BBB and BBG.
o If the eldest is a boy and the second is a girl (BG), the third child can again be
either B or G, which will give us BGB and BGG.
o If the eldest is a girl and the second is a boy (GB), the third child can again be
either B or G, which will give us GBB and GBG.
o If the eldest is a girl and the second is also a girl (GG), the third child can again
be either B or G, which will give us GGB and GGG.
Thus, the sample space is S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG} with 8 outcomes.
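The same sample space can also be listed programmatically. The sketch below is only an illustration in Python (it is not part of the original module); it assumes the standard-library function itertools.product, which walks every branch of the tree, and the variable names are illustrative.

```python
from itertools import product

# Each child is either a boy (B) or a girl (G); three children means
# three levels in the tree diagram.
genders = "BG"
sample_space = ["".join(outcome) for outcome in product(genders, repeat=3)]

print(sample_space)
print(len(sample_space))
```

Changing repeat=3 to another number gives the sample space for a family of that size.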
[Tree diagram: from the start S, each of the three children branches into B or G, ending in the 8 outcomes of the sample space.]
Solution:
To draw a tree diagram, we first must signify a start, in this case we use the letter S. In this
problem we are to find the sample space of a probability experiment that a coin is tossed, and a
die is rolled.
For the first experiment (marked green), which is tossing of the coin, we can either get a
Head (H) or a Tail (T)
For the second experiment (marked blue),
o If the coin toss gave us H, the possible rolls of die is any number from 1 to 6,
which will give us H1, H2, H3, H4, H5, H6.
o If the coin toss gave us T, the possible rolls of die is any number from 1 to 6,
which will give us T1, T2, T3, T4, T5, T6.
Thus, the sample space is S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6} with 12 outcomes.
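[Tree diagram: from the start S, the coin toss branches into H and T, and each branch then splits into the die rolls 1 to 6.]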
By the Fundamental Counting Principle, in a sequence of events in which the first event has m
possibilities, the second event has n possibilities, the third has o, and so forth, the total number of
possibilities of the sequence will be
m ● n ● o…
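As a quick check of this rule, the sketch below compares the product of the number of possibilities with a brute-force listing, using the coin-and-die experiment from the tree diagram example. It is a minimal illustration in Python, assuming the standard-library functions math.prod and itertools.product; the variable names are illustrative.

```python
from itertools import product
from math import prod

# Possibilities for each event in the sequence (coin toss, then die roll).
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Fundamental counting principle: multiply the number of possibilities.
count_by_rule = prod(len(options) for options in (coin, die))

# Brute-force check: list every (coin, die) pair.
outcomes = [f"{c}{d}" for c, d in product(coin, die)]

print(count_by_rule, len(outcomes))
```

Both counts agree with the 12 outcomes found from the tree diagram.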
Examples:
1. A paint manufacturer wishes to manufacture several different paints. The categories include:
How many different types of paint can be made if a person can select one color, one type, one
texture, and one use?
2. There are 4 blood types: A, B, AB, and O. Blood can also be Rh+ and Rh-. Finally, a blood donor
can be classified as either male or female. How many ways can a donor have his or her blood
labeled?
4 ● 2 ● 2 = 16 ways
Permutation
The number of ways to choose and arrange k objects from a group of n objects is
ₙPₖ = n! / (n − k)!
Examples:
4. Suppose a business owner has a choice of five locations in which to establish her business. She
decides to rank each location according to certain criteria such as price of the store, parking
facilities, etc.
a. How many ways can she rank the five locations?
b. Now, suppose she only wanted to rank the top three. How many ways can she rank
them?
5. A television news director wishes to use three news stories on an evening news show. One story
will be the lead story, one will be the second story, and the last will be the closing story. If the
director has a total of eight stories to choose from, how many possible ways can the program be
set up?
6. How many possible ways can a chairperson and an assistant chairperson be selected for a research
project if there are seven scientists available?
Given: n = 7 scientists, k = 2 positions
₇P₂ = n! / (n − k)! = 7! / (7 − 2)! = 42 ways
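The permutation formula can be checked in a few lines of code. The sketch below is an illustration in Python for the chairperson example, assuming math.perm (available in Python 3.8 and later) and itertools.permutations; numbering the scientists 1 to 7 is an assumption made only for the listing.

```python
from itertools import permutations
from math import factorial, perm

n, k = 7, 2  # seven scientists, two ranked positions

by_formula = factorial(n) // factorial(n - k)
by_library = perm(n, k)  # built-in permutation count

# Enumerate every ordered (chairperson, assistant) pair as a check.
scientists = range(1, n + 1)
by_enumeration = len(list(permutations(scientists, k)))

print(by_formula, by_library, by_enumeration)
```

All three values agree with the 42 ways computed by hand.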
Combination
The number of ways to choose k objects from a group of n objects where order does not matter
is
ₙCₖ = n! / (k! (n − k)!)
The difference between a permutation and a combination is that in a combination, the order or
arrangement of objects is not important; by contrast, order is important in a permutation. The
following example illustrates this difference.
Given the letters A, B, C, and D, list the permutations and combinations for selecting 2 letters.
Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
Combinations: AB, AC, AD, BC, BD, CD
Note that in permutations, AB is different from BA. But in combinations, AB is the same as BA,
so only AB is listed.
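The two lists for the letters A, B, C, and D can be generated directly. This is a minimal sketch in Python, assuming the standard-library functions itertools.permutations and itertools.combinations; the variable names are illustrative.

```python
from itertools import combinations, permutations

letters = "ABCD"

perms = ["".join(p) for p in permutations(letters, 2)]
combs = ["".join(c) for c in combinations(letters, 2)]

print(perms)  # ordered pairs: AB and BA are both listed
print(combs)  # unordered pairs: only AB is listed
```

The first list has 12 entries and the second has 6, since each unordered pair corresponds to 2 ordered pairs.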
Examples:
7. A bicycle shop owner has 12 mountain bicycles in the showroom. The owner wishes to select 5
of them to display at a bicycle show. How many ways can a group of 5 be selected?
₁₂C₅ = n! / (k! (n − k)!) = 12! / (5! (12 − 5)!) = 792 ways
8. In a club there are 7 women and 5 men. A committee of 3 women and 2 men is to be chosen.
How many different possibilities are there?
Also, since there are two categories, women and men, you also need to apply the
fundamental counting principle.
For the women, n = 7 and k = 3, so ₇C₃ = 35 ways.
For the men, n = 5 and k = 2, so ₅C₂ = 10 ways.
Using the Fundamental Counting Principle to calculate the total possibilities,
35 ● 10 = 350 ways
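The combination counts in Examples 7 and 8 can be verified with a short script. The sketch below is an illustration in Python, assuming math.comb (available in Python 3.8 and later); the variable names are illustrative.

```python
from math import comb

# Example 7: choose 5 of 12 bicycles, order irrelevant.
bicycles = comb(12, 5)

# Example 8: choose 3 of 7 women AND 2 of 5 men, then multiply
# (fundamental counting principle).
women = comb(7, 3)
men = comb(5, 2)
committees = women * men

print(bicycles, women, men, committees)
```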
2. Abby is registering at a Web site. She must select a password containing six digits from 1 to 9
to be able to use the site. How many passwords are allowed if no digit may be used more
than once?
3. There are 10 finalists in a figure skating competition. How many ways can gold, silver, and
bronze medals be awarded?
4. How many ways can the letters of the word Mississippi be arranged?
5. A group of seven students working on a project needs to choose two students to present the
group's report. How many ways can they select the two students?
6. Jack has a reading list of 12 books. How many ways can he select 9 books from the list to
check out from the library?
7. The manager of a softball team has 7 possible players in mind for the top 4 spots in the lineup.
How many ways can she choose the top 4 spots?
8. A store manager wishes to display 8 different brands of shampoo in a row. How many ways
can this be done?
9. How many ways can a person select 7 television commercials from 11 television
commercials?
10. Anderson Research Co. decides to test-market a product in 6 areas. How many ways can 3
areas be selected in a certain order for the first test?
11. How many ways can 3 cars and 4 trucks be selected from 8 cars and 11 trucks to be tested
for a safety inspection?
12. There are 7 women and 5 men in a department. How many ways can a committee of 4
people be selected? How many ways can this committee be selected if there must be 2 men
and 2 women on the committee?
13. How many different 3-digit identification tags can be made if the digits can be used more
than once? If the first digit must be a 5 and repetitions are not permitted?
14. In a beauty pageant with 20 contestants, how many ways can a winner, a first runner-up, a
second runner-up, and a third runner-up be selected?
15. How many 5-digit zip codes are possible if digits can be repeated? If there cannot be
repetitions?
Classical Probability
Classical Probability is defined to be the ratio of the number of cases favorable to an event to the
number of all outcomes possible, where each of the outcomes is equally likely. It uses sample spaces to
determine the numerical probability that an event will happen, which means that you don’t need to
perform the experiment to determine the probability. It also assumes that all outcomes in the sample
space are equally likely to occur, for example, when a single die is rolled, each outcome would have the
same probability of occurring, which is 1/6.
Properties of Probability
1. The probability of any event E is a number (either a fraction or decimal) between and including 0 and
1.
0 ≤ P(E) ≤ 1
2. The sum of the probabilities of all the outcomes in the sample space is 1.
∑ P(E) = 1
3. If an event E cannot occur (i.e., the event contains no members in the sample space), its probability is
0. This is sometimes called an impossible event.
4. If an event E is certain, then the probability of E is 1. This is sometimes called a guaranteed event.
Examples:
1. Find the probability of getting a red face card (jack, queen, or king) when randomly drawing a card
from an ordinary deck.
Solution:
Let P(A) be the probability of getting a red face card from an ordinary deck.
As shown in Figure 2.3, an ordinary deck of cards has 52 cards in 4 suits: 13 hearts, 13
diamonds, 13 spades, and 13 clubs. Thus, n(S) = 52.
The red cards are hearts and diamonds. There are 3 face cards in each suit, thus n(A) = 6.
P(A) = n(A) / n(S) = 6/52 = 0.115
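Classical probability is a matter of counting outcomes, so it can be reproduced by building the sample space and counting. The sketch below is an illustration in Python, assuming the standard-library modules fractions and itertools; the rank and suit labels are assumptions chosen for readability.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = list(product(ranks, suits))  # 52 equally likely outcomes

red_suits = {"hearts", "diamonds"}
face_ranks = {"J", "Q", "K"}
event = [(r, s) for r, s in deck if r in face_ranks and s in red_suits]

# Classical probability: favorable outcomes over all outcomes.
p = Fraction(len(event), len(deck))
print(len(event), len(deck), p, round(float(p), 3))
```

Fraction reduces 6/52 to 3/26, which is the same value as the 0.115 obtained by hand.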
2. If a family has three children, find the probability that exactly two of the three children are girls.
Solution:
Let P(A) be the probability that exactly two of the three children are girls.
Thus, n(S) = 8.
The outcomes with exactly two girls are BGG, GBG, GGB, thus n(A) = 3.
P(A) = n(A) / n(S) = 3/8 = 0.375
3. Find the probability of getting two heads when tossing two coins.
Solution:
Thus, n(S) = 4
Solution:
S = {1, 2, 3, 4, 5, 6}
Thus, n(S) = 6
5. In a group of 500 women, 120 have played golf at least once. Suppose one of these 500 women
is randomly selected. What is the probability that she has played golf at least once?
Solution:
Let P(A) be the probability that a woman has played golf at least once.
The number of women who have played golf at least once is 120, thus n(A) = 120.
P(A) = n(A) / n(S) = 120/500 = 0.24
Empirical Probability
Empirical Probability relies on actual experience to determine the likelihood of outcomes. The
difference between classical and empirical probability is that classical probability assumes that certain
outcomes are equally likely while empirical probability relies on actual experience to determine the
likelihood of outcomes. In empirical probability, one might actually roll a given die 6000 times, observe
the various frequencies, and use these frequencies to determine the probability of an outcome.
Given a frequency distribution, the empirical probability of an event A in a given class is
P(A) = f / n
where f is the frequency of the class and n is the total of the frequencies.
Solution:
a. Let P(A) be the probability that the person will travel by airplane.
Based on the frequency table, the number of people who will travel by airplane is 6, thus f=6.
The total frequencies are 50, thus n=50.
Calculating P(A),
P(A) = f / n = 6/50 = 0.12
Calculating P(A),
P(A) = f / n = 41/50 = 0.82
2. In a sample of 50 people, 21 had type O blood, 22 had type A blood, 5 had type B blood, and 2
had type AB blood. Set up a frequency distribution and find the following probabilities.
a. A person has type O blood.
b. A person has type A or type B blood.
c. A person has neither type A nor type O blood.
d. A person does not have type AB blood.
Solution:
a. Let P(A) be the probability that the person has type O blood.
Based on the frequency table, the number of people with type O blood is 21, thus f=21.
The total frequencies are 50, thus n=50.
Calculating P(A),
P(A) = f / n = 21/50 = 0.42
b. Let P(A) be the probability that the person has type A or type B blood.
Based on the frequency table, the number of people with type A blood is 22, and those
with type B blood is 5, thus f = 22 + 5 = 27.
The total frequencies are 50, thus n=50.
Calculating P(A),
P(A) = f / n = 27/50 = 0.54
c. Let P(A) be the probability that the person has neither type A nor type O blood.
If the person has neither type A nor type O blood, the person can only have type B or type AB.
Based on the frequency table, the number of people with type B blood is 5 and those with
type AB is 2, thus f = 5 + 2 = 7.
The total frequencies are 50, thus n=50.
Calculating P(A),
P(A) = f / n = 7/50 = 0.14
d. Let P(A) be the probability that the person does not have type AB blood.
If the person does not have type AB blood, then the person selected should have either
type O, type B, or type A.
Based on the frequency table, the number of people with type O blood is 21, type A blood
is 22, and those with type B blood is 5, thus f = 21 + 22 + 5 = 48.
The total frequencies are 50, thus n=50.
Calculating P(A),
P(A) = f / n = 48/50 = 0.96
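The four blood-type answers all follow the same pattern: add up the frequencies of the classes in the event, then divide by the total. The sketch below is an illustration in Python, assuming the standard-library fractions module; the helper empirical_probability is a hypothetical name introduced only for this example.

```python
from fractions import Fraction

# Frequency distribution from the problem statement.
freq = {"O": 21, "A": 22, "B": 5, "AB": 2}
n = sum(freq.values())

def empirical_probability(classes):
    """Return f/n, where f is the total frequency of the given classes."""
    f = sum(freq[c] for c in classes)
    return Fraction(f, n)

p_a = empirical_probability(["O"])
p_b = empirical_probability(["A", "B"])
p_c = empirical_probability(["B", "AB"])      # neither A nor O
p_d = empirical_probability(["O", "A", "B"])  # not AB

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, p, float(p))
```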
Solution:
a. Let P(A) be the probability that the patient stayed exactly 5 days.
Based on the frequency table, the number of people who stayed exactly 5 days is 56, thus
f=56.
The total frequencies are 127, thus n=127.
Calculating P(A),
P(A) = f / n = 56/127 = 0.44
b. Let P(A) be the probability that the patient stayed at most 4 days.
The keyword “at most” means not more than or at maximum, thus mathematically, ≤.
So, you will only consider the patients who stayed less than or equal to 4 days which are
the patients who stayed for 3 and 4 days.
Based on the frequency table, the number of people who stayed for 3 days is 15 while
those who stayed for 4 days is 32, thus f=15+32 = 47.
Calculating P(A),
P(A) = f / n = 47/127 = 0.37
c. Let P(A) be the probability that the patient stayed fewer than 6 days.
Thus, you will only consider the patients who stayed fewer than 6 days, or those who
stayed for 5 days, 4 days, and 3 days.
Based on the frequency table, the number of people who stayed for 3 days is 15, those
who stayed for 4 days is 32, and those who stayed for 5 days is 56, thus f=15 + 32 + 56 = 103.
Calculating P(A),
P(A) = f / n = 103/127 = 0.81
d. Let P(A) be the probability that the patient stayed at least 5 days.
The keyword “at least” means not less than or at the minimum, thus mathematically, ≥.
So, you will only consider the patients who stayed greater than or equal to 5 days which
are the patients who stayed for 5, 6 and 7 days.
Based on the frequency table, the number of people who stayed for 5 days is 56, those
who stayed for 6 days is 19, and those who stayed for 7 days is 5, thus f=56 + 19 + 5 = 80.
The total frequencies are 127, thus n=127.
Calculating P(A),
P(A) = f / n = 80/127 = 0.63
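Keywords such as "at most" and "at least" translate directly into comparison operators. The sketch below is an illustration in Python, assuming the standard-library fractions module; the frequencies are the ones quoted in the solution above, and the helper name probability is hypothetical.

```python
from fractions import Fraction

# Days stayed -> number of patients, as quoted in the solution.
stays = {3: 15, 4: 32, 5: 56, 6: 19, 7: 5}
n = sum(stays.values())

def probability(condition):
    """Empirical probability f/n over the days that satisfy `condition`."""
    f = sum(count for days, count in stays.items() if condition(days))
    return Fraction(f, n)

exactly_5 = probability(lambda d: d == 5)
at_most_4 = probability(lambda d: d <= 4)     # "at most" means <=
fewer_than_6 = probability(lambda d: d < 6)   # "fewer than" means <
at_least_5 = probability(lambda d: d >= 5)    # "at least" means >=

for p in (exactly_5, at_most_4, fewer_than_6, at_least_5):
    print(p, round(float(p), 2))
```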
Subjective Probability
Subjective probability uses a probability value based on an educated guess or estimate,
employing opinions and inexact information. It is the probability assigned to an event based on subjective
judgment, experience, information, and belief. In subjective probability, a person or group makes an
educated guess at the chance that an event will occur. This guess is based on the person’s experience and
evaluation of a solution.
Examples:
1. A sportswriter may say that there is a 41% probability that the Lakers will make it to the 2019 –
2020 NBA playoffs [6].
2. A physician might say that, based on her diagnosis, there is a 30% chance the patient will need an
operation.
3. A seismologist might say there is an 80% probability that an earthquake will occur in a certain
area.
4. The probability that Taylor, who is taking an Engineering Data Analysis course, will earn a 1.0 mark
in the course.
5. The probability that the Dow Jones Industrial Average will be higher at the end of the next trading
day.
VIII. Summary
An event is something that occurs, or happens, with one or more possible outcomes.
An experiment is the process of taking a measurement or making an observation.
A simple event is the simplest outcome of an experiment.
The sample space is the set of all possible outcomes of an experiment, typically denoted by S.
The Fundamental Counting Principle states that if one event has m possible outcomes and a second
event has n possible outcomes, then there are m●n total possible outcomes for the two events
together.
A combination is the number of ways of choosing k objects from a total of n objects (order
does not matter).
ₙCₖ = n! / (k! (n − k)!)
A permutation is the number of ways of choosing and arranging k objects from a total
of n objects (order does matter).
ₙPₖ = n! / (n − k)!
The probability of an event is a measure of the likelihood that the event occurs.
Probability is always a number between 0 and 1, where 0 means an event is impossible and 1
means an event is certain. The closer the probability is to 0, the less likely the event is to occur.
The closer the probability is to 1, the more likely the event is to occur.
The probabilities of all outcomes in the sample space must sum to 1.
When the outcomes of an experiment are all equally likely, we can find the probability of an
event by dividing the number of outcomes in the event by the total number of outcomes in the
sample space for the experiment.
The two ways of determining probabilities are empirical and classical.
o Empirical methods use a series of trials that produce outcomes that cannot be predicted
in advance (hence the uncertainty). The probability of an event is approximated by the
relative frequency of the event.
o Classical methods use the nature of the situation to determine probabilities. Probability
rules allow us to calculate theoretical probabilities.
IX. Exercises
1. A box contains a $1 bill, a $5 bill, a $10 bill, and a $20 bill. A bill is selected at random, and it is
not replaced; then a second bill is selected at random. Draw a tree diagram and determine the
sample space.
2. An experiment consists of flipping a coin and then flipping it a second time if a head occurs. If a
tail occurs on the first flip, then a die is tossed once. Draw a tree diagram and determine the
sample space.
3. Suppose a box contains four balls, one red, one blue, one yellow, and one green. One ball is
selected, its color is observed, and then the ball is placed back in the box. The balls are
scrambled, and again, a ball is selected, and its color is observed. Draw a tree diagram and
determine the sample space.
4. Four balls numbered 1 through 4 are placed in a box. A ball is selected at random, and its
number is noted; then it is replaced. A second ball is selected at random, and its number is
noted. Draw a tree diagram and determine the sample space.
5. The source of federal government revenue for a specific year is 50% from individual income
taxes, 32% from social insurance payroll taxes, 10% from corporate income taxes, 3% from
excise taxes and 5% other. If a revenue source is selected at random, what is the probability that
it comes from individual or corporate income taxes?
6. According to the 2011 primary energy consumption of the Philippines, 31% of the consumption
was met by oil, 20% by coal, 22% by geothermal, 12% by biomass, 6% by hydro and 1% by other
renewable energy like wind, solar and biofuel. [7] Choose one energy source at random. Find
the probability that it is (a) not hydro; (b) geothermal or biomass; (c) coal.
7. A roulette wheel has 38 spaces numbered 1 through 36, 0, and 00. Find the probability of
getting (a) An odd number (Do not count 0 or 00.); (b) A number greater than 27; and (c) A
number that contains the digit 0.
8. In a large city, 15,000 workers lost their jobs last year. Of them, 7400 lost their jobs because
their companies closed down or moved, 4600 lost their jobs due to insufficient work, and the
remainder lost their jobs because their positions were abolished. If one of these 15,000 workers
is selected at random, find the probability that this worker lost his or her job (a) because the
company closed down or moved; (b) due to insufficient work; (c) because the position was
abolished.
9. Roll two dice and multiply the numbers. (a) What is the probability that the product is a multiple
of 6? (b) What is the probability that the product is less than 10?
10. A bowl of candy holds 16 peppermint, 14 butterscotch, and 10 strawberry flavored candies.
Suppose a person grabs a handful of 7 candies. What is the percent chance that exactly 3 are
butterscotch?
11. Classify each statement as an example of classical probability (CP), empirical probability (EP), or
subjective probability (SP). Write your answers beside every number.
a. The probability that a person will watch the 6 o’clock evening news is 0.15.
b. The probability that a bus will be in an accident on a specific run is about 6%.
c. The probability that a student will get a C or better in a statistics course is about 70%.
d. The probability that a new fast-food restaurant will be a success in Chicago is 35%.
e. The probability that interest rates will rise in the next 6 months is 0.50.
I. Introduction
In this module, we will examine the different rules of probability. You will also learn how to
combine two or more events by finding the union of the two events or the intersection of the two events.
You will learn what is meant by the complement of an event, and you will be introduced to the
Complement Rule. You will also be presented with information about mutually exclusive events and
independent events. You will also learn how to calculate probabilities using the different rules of
probability.
II. Objectives
A compound event is a combination of two or more events into one. It can be formed in two ways:
The union of events A and B occurs if either event A, event B, or both occur in a single
performance of an experiment. We denote the union of the two events by the symbol A∪B. You
read this as either “A union B” or “A or B.” A∪B means everything that is in set A or in set B or in
both sets.
The intersection of events A and B occurs if both event A and event B occur in a single
performance of an experiment. It is where the two events overlap. We denote the intersection of
two events by the symbol A∩B. You read this as either “A intersection B” or
“A and B.” A∩B means everything that is in set A and in set B. That is, when looking at the
intersection of two sets, we are looking for where the sets overlap.
Example:
1. Consider the throw of a die experiment. Assume we define the following events:
Solution:
a. An observation on a single toss of the die is an element of the union of A and B if it is either an
even number, a number that is less than or equal to 3, or a number that is both even and less
than or equal to 3. In other words, the simple events of A∪B are those for
which A occurs, B occurs, or both occur:
P(A∪B) = n(A∪B) / n(S) = 5/6
P(A∩B) = n(A∩B) / n(S) = 1/6
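Unions and intersections of events map directly onto set operations. The sketch below is an illustration in Python using built-in sets; it assumes, based on the wording of the solution, that A is "observe an even number" and B is "observe a number less than or equal to 3".

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}  # even number (assumed definition)
B = {x for x in S if x <= 3}      # less than or equal to 3 (assumed definition)

union = A | B          # A ∪ B: in A, in B, or in both
intersection = A & B   # A ∩ B: in both A and B

p_union = Fraction(len(union), len(S))
p_intersection = Fraction(len(intersection), len(S))

print(sorted(union), p_union)
print(sorted(intersection), p_intersection)
```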
Intersections and unions can also be defined for more than two events. For
example, A∪B∪C represents the union of three events.
1. Consider the throw of a die experiment. Assume we define the following events:
Solution:
where ∅ is the empty set. This means that there are no elements in the set A∩D.
c. Here, we need to be a little careful. We need to find the intersection of the three sets. To do so,
it is a good idea to use the associative property by first finding the intersection of
sets A and B and then intersecting the resulting set with C.
The complement of an event E, denoted E′, is the set of outcomes in the sample space that are not
included in the outcomes of event E.
To illustrate, let us refer to the experiment of throwing one die. As you know, the sample space of a fair
die is S = {1,2,3,4,5,6}. If we define the event A as observing an odd number, then A = {1,3,5}. The
complement of A will be all the elements of the sample space that are not in A. Thus, A′ = {2,4,6}. A Venn
diagram that illustrates the relationship between A and A′ is shown in Figure 3.2:
[Figure 3.2: Venn diagram of the sample space S showing A = {1, 3, 5} and its complement A′ = {2, 4, 6}]
This leads us to say that the possible outcomes for event A together with the possible outcomes for its
complement, A′, make up all the possible outcomes in the sample space of the experiment. Therefore, the
probabilities of an event and its complement must sum to 1; this is known as the Complement Rule.
The Complement Rule states that the sum of the probabilities of an event and its complement must
equal 1.
P(A) + P(A′) = 1
Example:
1. Suppose you know that the probability of getting the flu this winter is 0.43. What is the probability
that you will not get the flu?
Let the event A be getting the flu this winter. We are given P(A)=0.43.
By the Complement Rule, P(A′) = 1 − P(A) = 1 − 0.43 = 0.57.
2. Two coins are tossed simultaneously. Let the event A be observing at least one head.
a. What is the complement of A?
a. The sample space of tossing two coins is S = {HH, HT, TH, TT}.
Event A = {HT, TH, HH}.
The complement of A consists of all outcomes in the sample space that are not in A, that is, those
that do not involve heads. That is, A′={TT}.
Since P(A′) = 1/4, by the Complement Rule,
P(A) = 1 − P(A′) = 1 − 1/4 = 3/4 = 0.75
3. Consider the experiment of tossing a coin ten times. What is the probability that we will observe
at least one head?
What are the simple events of this experiment? As you can imagine, there are many simple events, and
it would take a very long time to list them. One simple event may be HTTHTHHTTH, another may
be THTHHHTHTH, and so on. In fact, calculating using the fundamental counting principle, there are
2●2●2●2●2●2●2●2●2●2 = 2^10 = 1024 simple events.
To calculate the probability, it's necessary to keep in mind that each time we toss the coin, the chance
is the same for heads as it is for tails. Therefore, we can say that each simple event among the 1024
possible events is equally likely to occur. Thus, the probability of any one of these events is 1/1024.
We are being asked to calculate the probability that we will observe at least one head. You will probably
find it difficult to calculate, since heads will almost always occur at least once during 10 consecutive tosses.
However, if we determine the probability of the complement of A (i.e., the probability that no heads will
be observed), our answer will become a lot easier to calculate. The complement of A contains only one
event: A′={TTTTTTTTTT}. This is the only event in which no heads appear, and since all simple events are
equally likely, P(A′) = 1/1024. By the Complement Rule,
P(A) = 1 − P(A′) = 1 − 1/1024 = 1023/1024 ≈ 0.999
That is a very high chance (about 99.9%) of observing at least one head in ten tosses of a coin.
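The ten-toss experiment is small enough to list in full, which makes it possible to compare the Complement Rule with a direct count. The sketch below is an illustration in Python, assuming the standard-library modules fractions and itertools; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

tosses = 10
outcomes = list(product("HT", repeat=tosses))  # every simple event

# Complement rule: P(at least one head) = 1 - P(no heads).
p_no_heads = Fraction(1, len(outcomes))
p_at_least_one = 1 - p_no_heads

# Direct count of the outcomes containing a head, for comparison.
direct = Fraction(sum("H" in o for o in outcomes), len(outcomes))

print(len(outcomes), p_at_least_one, float(p_at_least_one))
```

The direct count needs to inspect all 1024 outcomes, while the Complement Rule needs only one, which is why the rule is preferred here.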
Two events A and B that cannot occur at the same time are mutually exclusive events. Mutually
exclusive events have no common outcomes. As shown in Figure 3.3, P(A∩B) = 0. Notice that there is
no intersection between the possible outcomes of event A and the possible outcomes of event B. For
example, if you were asked to pick a number between 1 and 10, you cannot pick a number that is both
even and odd. These events are mutually exclusive.
[Figure 3.3: Venn diagram of two mutually exclusive events A and B, with no overlap]
Another example is when choosing a card from an ordinary deck of cards. Suppose event A is
choosing an eight and event B is choosing an Ace.
Notice that the sets containing the possible outcomes of the events have no elements in common.
Therefore, the events are mutually exclusive.
By contrast, if events A and B share some overlap in the Venn diagram, they may be considered mutually
inclusive events. Figure 3.4 shows the Venn diagram for mutually inclusive events. Mutually inclusive
events can occur at the same time. Say, for example, you wanted to pick a number from 1 to 10 that is
less than 4 and even. Let event A be picking a number less than 4 and event B be picking
an even number. Event A is {1,2,3} while event B is {2,4,6,8,10},
and the intersection of A and B is {2}.
Two events are independent events if the occurrence of one event does not impact the probability of
the other event. To illustrate, say you were asked to pick a card from a deck of cards and roll a 6 on a die.
It does not matter if you choose the card first and roll a 6 second, or vice versa. The probability of rolling
the 6 would remain the same, as would the probability of choosing the card. As another example, if you flip
a coin 3 times and get heads 3 times, the probability of getting a tail on the 4th flip isn’t affected by the
first 3 flips. Because one flip of the coin has no effect on the outcome of any other flips, each flip of the
coin counts as an independent event.
Dependent events, by contrast, are events where the outcome of one impacts the probability of the other.
For 2 events to be dependent, the probability of the second event must depend on the outcome of the first
event. In English, remember, the term dependent means to be unable to do without. This is like the
mathematical definition of dependent events, where the probability of the second event occurring is
affected by the first event occurring.
To illustrate, if you pick two cards from a deck without replacement, the chance of getting an ace on
the first pick is 4 out of 52, or 1⁄13. If you keep that ace and draw again, the chance of getting another ace on
your second pick is less: there are now only 3 aces left in the deck (of 51 cards), so the chance of getting
an ace is 3 out of 51, or 1⁄17. The probability of the second event is affected by the first event, so they are
dependent events.
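The effect of the first draw on the second can be seen by writing out the fractions. The sketch below is an illustration in Python, assuming the standard-library fractions module; the third value (the case where the first card is not an ace) is not in the text above and is included only for contrast.

```python
from fractions import Fraction

aces, cards = 4, 52

p_first_ace = Fraction(aces, cards)                       # 4/52
p_second_ace_given_first = Fraction(aces - 1, cards - 1)  # 3/51

# If the first card was NOT an ace, all 4 aces remain among 51 cards.
p_second_ace_given_no_ace = Fraction(aces, cards - 1)     # 4/51

print(p_first_ace, p_second_ace_given_first, p_second_ace_given_no_ace)
```

Because the second probability changes with the result of the first draw, the two draws are dependent.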
The Additive Rule of Probability states that the probability of the union of two events can be obtained
by adding the individual probabilities and subtracting the probability of their intersection:
P(A∪B) = P(A) + P(B) − P(A∩B)
Examples:
1. Suppose we have a loaded (unfair) die, and we toss it several times and record the outcomes.
We will define the following events:
A: observe an even number
B: observe a number less than 3
Let us suppose that we have P(A) = 0.4, P(B) = 0.3, and P(A∩B) = 0.1. Find P(A∪B).
Solution:
It is probably best to draw a Venn diagram to illustrate this situation. The event "A or B" is
the union of events A and B.
P(A∩B) = P(2) = 0.1
Note that if we simply add P(A) and P(B), P(2) is included twice, because 2 is in the intersection
of A and B, where the two sets overlap. We need to be sure not to double-count this probability.
Thus, applying the additive rule of probability,
P(A∪B) = P(A) + P(B) − P(A∩B) = 0.4 + 0.3 − 0.1 = 0.6
2. Consider the experiment of randomly selecting a card from a deck of 52 playing cards. What is
the probability that the card selected is either a spade or a face card?
Solution:
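Let A be the event that the card selected is a spade and B the event that it is a face card.
There are 13 spades, 12 face cards, and 3 cards that are both, so
P(A∪B) = P(A) + P(B) − P(A∩B) = 13/52 + 12/52 − 3/52 = 22/52
P(A∪B) = 0.423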
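The spade-or-face-card result can be checked two ways: with the additive rule, and by counting the union directly. The sketch below is an illustration in Python, assuming the standard-library modules fractions and itertools; the rank and suit labels are assumptions chosen for readability.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = set(product(ranks, suits))
n = len(deck)

A = {card for card in deck if card[1] == "spades"}         # spade
B = {card for card in deck if card[0] in {"J", "Q", "K"}}  # face card

# Additive rule: subtract the overlap so it is not counted twice.
by_rule = Fraction(len(A), n) + Fraction(len(B), n) - Fraction(len(A & B), n)

# Direct count of the union, for comparison.
by_count = Fraction(len(A | B), n)

print(len(A), len(B), len(A & B), by_rule, round(float(by_rule), 3))
```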
3. If you know that 84.2% of the people arrested in the mid 1990’s were males, 18.3% of those
arrested were under the age of 18, and 14.1% were males under the age of 18, what is the
probability that a person selected at random from all those arrested is either male or under the
age of 18?
Solution:
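Let A be the event that the person arrested is male and B the event that the person is under 18.
The following probabilities have been given to us:
P(A) = 0.842, P(B) = 0.183, P(A∩B) = 0.141
Therefore, the probability of the person selected being male or under 18 is P(A∪B) and is
calculated as follows:
P(A∪B) = P(A) + P(B) − P(A∩B) = 0.842 + 0.183 − 0.141 = 0.884
This means that 88.4% of the people arrested in the mid 1990’s were either male or under 18.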
4. What is the probability of choosing a card from a deck of cards that is a club or a ten?
Solution:
P(A∪B) = 4/13
5. What is the probability of choosing a number from 1 to 10 that is less than 5 or odd?
Solution:
P(A∪B) = 7/10
6. If two coins are tossed, what is the probability of observing at least one head?
Solution:
Since the problem is asking for the probability of at least one head, we are to have a minimum
of one head, that is, one head or two heads since you are to toss two coins. Thus, we can define
the events as follows:
You will notice that A∩B is empty (A∩B=∅), or, in other words, there is no overlap between the
two sets. Then, we say that A and B are mutually exclusive.
If the events A and B are mutually exclusive, then the probability of the union of A and B is
the sum of the probabilities of A and B:
P(A∪B) = P(A) + P(B)
7. What is the probability of randomly picking a number from 1 to 10 that is even or randomly picking
a number from 1 to 10 that is odd?
Solution:
Picking a number that is both even and odd is mathematically impossible; thus, events A and B
are mutually exclusive. So, we use the additive rule for mutually exclusive
events.
8. 2 fair dice are rolled. What is the probability of getting a sum less than 7 or a sum equal to 10?
Solution:
To find the outcomes in event A (blue boxes) and the outcomes in event B (pink boxes), use the
following table of sums:
Table 3.1 Summary Table for Example 8
+ | 1   2   3   4   5   6
1 | 2   3   4   5   6   7
2 | 3   4   5   6   7   8
3 | 4   5   6   7   8   9
4 | 5   6   7   8   9   10
5 | 6   7   8   9   10  11
6 | 7   8   9   10  11  12
Writing each outcome as a pair of digits (first die, second die), event A is {11, 12, 13, 14, 15, 21, 22,
23, 24, 31, 32, 33, 41, 42, 51}, thus, P(A) = 15/36.
Event B is {46, 55, 64}, thus, P(B) = 3/36.
Notice that there are no elements that are common, so the events are mutually exclusive. Therefore,
use the Additive Rule for Mutually Exclusive Events.
P (A or B) = P(A) + P(B) = 15/36 + 3/36 = 18/36
P (A or B) = ½
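The table of sums can be rebuilt by listing all 36 rolls, which also confirms that the two events do not overlap. The sketch below is an illustration in Python, assuming the standard-library modules fractions and itertools; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

A = {r for r in rolls if sum(r) < 7}    # sum less than 7
B = {r for r in rolls if sum(r) == 10}  # sum equal to 10

# No common outcomes means the events are mutually exclusive.
mutually_exclusive = A.isdisjoint(B)

# Additive rule for mutually exclusive events.
p = Fraction(len(A), len(rolls)) + Fraction(len(B), len(rolls))

print(len(A), len(B), mutually_exclusive, p)
```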
9. A card is chosen at random from a standard deck of cards. What is the probability that the card
chosen is a diamond or club? Are these events mutually exclusive?
Solution:
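There are 13 diamond cards in a standard deck with 52 cards, so P(A) = 13/52.
There are 13 club cards in a standard deck with 52 cards, so P(B) = 13/52.
A card cannot be both a diamond and a club, so the events are mutually exclusive.
P (A or B) = P(A) + P(B) = 13/52 + 13/52 = 26/52
P (A or B) = ½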
10. 3 coins are tossed simultaneously. What is the probability of getting 1 or 2 heads?
Solution:
We can use the Fundamental Counting Principle in calculating the possible outcomes in tossing 3 coins,
2●2●2 = 8
The sample space is {HHH, HTH, HHT, HTT, THH, THT, TTH, TTT}.
Based on the sample space, there are 3 possible ways to get 1 head, thus, P(A) = 3/8.
There are also 3 possible ways to get 2 heads, thus P(B) = 3/8.
Since getting 1 head and 2 heads cannot occur at the same time, we say that these two events are
mutually exclusive. So, we use the Additive Rule of Probability for Mutually Exclusive Events to calculate
the probability,
P (A or B) = P(A) + P(B) = 3/8 + 3/8 = 6/8
P (A or B) = 3/4
The Multiplicative Rule of Probability says that the probability that both A and B occur equals the
probability that B occurs times the conditional probability that A occurs, given that B has occurred:
P(A∩B) = P(B) ∙ P(A|B)
Equivalently, P(A∩B) = P(A) ∙ P(B|A). It is sometimes called the probability of the intersection of events.
Examples:
1. In a certain city in the USA some time ago, 30.7% of all employed female workers were white-
collar workers. If 10.3% of all workers employed at the city government were female, what is
the probability that a randomly selected employed worker would have been a female white-
collar worker?
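Solution:
Let A be the event that a randomly selected worker is female, and let B be the event that the
worker is a white-collar worker.
We are trying to find the probability of randomly selecting a female worker who is also a white-collar
worker. This can be expressed as P(A∩B).
P(A) = 10.3% = 0.103
P(B|A) = 30.7% = 0.307
P(A ∩ B) = P(A) ∙ P(B|A) = (0.103)(0.307) ≈ 0.0316
Thus, there is a probability of about 3.16% that a randomly selected worker is a female white-collar
worker.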
2. A college class has 42 students of which 17 are male and 25 are female. Suppose the teacher
selects two students at random from the class. Assume that the first student who is selected is
not returned to the class population. What is the probability that the first student selected is
female and the second is male?
Solution:
There are 42 students of which 25 are female, thus, probability of selecting a female student, P(A) =
25/42.
Now, given that the first student selected is not returned to the population, the remaining
number of students is 41, of which 24 are female and 17 are male. Thus, the conditional probability that
a male student is selected (B), given that the first student selected was a female (A), can be calculated as
P(B|A) = 17/41
In this problem, we have a conditional probability situation. We want to determine the probability
that the first student selected is female and the second student selected is male. To do so, we apply the
Multiplicative Rule:
P(A ∩ B) = P(A) ∙ P(B|A) = (25/42) ∙ (17/41) = 0.247
Thus, there is a probability of 24.7% that the first student selected is female and the second student
selected is male.
3. If Mark goes to the store, the probability that he buys ice cream is 30%. The probability that he
goes to the store is 10%. What is the probability of him going to the store and buying ice cream?
Solution:
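The first sentence of the problem is a statement of conditional probability. You could restate it as
“the probability of Mark buying ice cream given that Mark has gone to the store is 30%”, that is
P(B|A) = 0.30. The probability that he goes to the store is P(A) = 0.10.
Since we are asked to find the probability of Mark going to the store AND buying ice cream, we are to
use the Multiplicative Rule of Probability:
P(A ∩ B) = P(A) ∙ P(B|A) = (0.10)(0.30) = 0.03
Thus, there is a 3% probability that Mark goes to the store and buys ice cream.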
4. Consider the experiment of choosing a card from a deck, keeping it, and then choosing a second
card from the deck. Let A be the event that the first card is a diamond. Let B be the event that
the second card is a red card. Find P(B∩A).
Solution:
There are 13 diamonds in the deck of 52 cards, so the probability that the first card is a diamond is
P(A) = 13/52.
P(B|A) is the probability that the second card is a red card given that the first card was a diamond.
After the first card was chosen, there are 51 cards left in the deck. 25 of them are red since the first card
was a diamond. Therefore, P(B|A) = 25/51.
P(A ∩ B) = P(A) ∙ P(B|A) = (13/52) ∙ (25/51) ≈ 0.1225
5. 10% of the emails that Michelle receives are spam emails. Her spam filter catches spam 95% of
the time. Her spam filter misidentifies non-spam as spam 2% of the time. What is the probability
of an email chosen at random being spam and being correctly identified as spam by her spam
filter?
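Solution:
A: email is spam
B: email is identified as spam by the filter
P(A) = 10% = 0.10
P(B|A) = 95% = 0.95
P(B|A′) = 2% = 0.02
We are to find the probability of an email chosen at random being spam AND being correctly
identified as spam by her spam filter, that is P(A∩B):
P(A ∩ B) = P(A) ∙ P(B|A) = (0.10)(0.95) = 0.095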
6. 0.1% of the population is said to have a new disease. A test is developed to test for the disease.
98% of people without the disease will receive a negative test result. 99.5% of people with the
disease will receive a positive test result. A random person who was tested for the disease is
chosen. What is the probability that the chosen person does not have the disease and got a
negative test result?
Solution:
We are to find the probability that the chosen person does not have the disease and got a negative
test result, that is P(A′∩B′), where A is the event that the person has the disease and B is the event
that the test result is positive. We use the complements because of the key terms “does not have” and
“negative”.
P(A′) = 1 − 0.001 = 0.999
P(B′|A′) = 98% = 0.98
P(A′ ∩ B′) = P(A′) ∙ P(B′|A′) = (0.999)(0.98) = 0.97902
This means that the probability that a random person who had this test done doesn't have the
disease and got a negative test result is 97.902%. Most of the people who took the test for the disease
will not have it and will get a negative test result.
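The Multiplicative Rule is the same one-line calculation in every example above, so it can be wrapped in a small helper. The following is a minimal sketch; the function name `joint_probability` and the variable names are our own choices, and the inputs are the values given in Examples 2, 4, and 6.

```python
def joint_probability(p_a, p_b_given_a):
    """Multiplicative Rule: P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a

# Example 2: first student female (25/42), then male given a female first (17/41).
p_female_then_male = joint_probability(25 / 42, 17 / 41)

# Example 4: first card a diamond (13/52), then a red card given that (25/51).
p_diamond_then_red = joint_probability(13 / 52, 25 / 51)

# Example 6: no disease (1 - 0.001), negative test given no disease (0.98).
p_healthy_and_negative = joint_probability(1 - 0.001, 0.98)

print(round(p_female_then_male, 3))
print(round(p_diamond_then_red, 4))
print(round(p_healthy_and_negative, 5))
```

Note that the first argument is always the probability of the conditioning event, and the second is the conditional probability given that event.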
IX. Summary
The union of the two events A and B, written A∪B, occurs if either event A, event B, or both occur
on a single performance of an experiment. A union is an 'or' relationship.
The intersection of the two events A and B, written A∩B, occurs only if both event A and event B
occur on a single performance of an experiment. An intersection is an 'and' relationship.
Intersections and unions can be used to combine more than two events.
The complement A′ of the event A consists of all outcomes in the sample space that are not in
event A.
The Complement Rule states that the sum of the probabilities of an event and its complement
must equal 1, or for the event A, P(A)+P(A′) = 1.
The Additive Rule of Probability states that the union of two events can be found by adding the
probabilities of each event and subtracting the intersection of the two events,
or P(A∪B)=P(A)+P(B)−P(A∩B). If A∩B contains no simple events, then A and B are mutually
exclusive. Mathematically, this means P(A∪B) = P(A)+P(B).
The Multiplicative Rule of Probability states P(A∩B) = P(B)∙P(A|B). If event B is independent of
event A, then the occurrence of event A does not affect the probability of the occurrence of
event B. Mathematically, this means P(B)=P(B|A). Another formulation of independence is that if
the two events A and B are independent, then P(A∩B) = P(A)∙P(B).
X. Exercises
1. Are these events mutually exclusive (E) or mutually inclusive (I)? Write your answers before the
number.
a. Rolling an even and an odd number on one die.
b. Rolling an even number and a multiple of three on one die.
c. Randomly drawing one card and getting a result of a jack and a heart.
d. Randomly drawing one card and getting a result of a black and a diamond.
e. Choosing an orange and fruit from a basket.
f. Choosing a vowel and a consonant from a Scrabble bag.
2. Determine whether the events are dependent (D) or independent (I).
a. Driving at night and falling asleep at the wheel.
b. Visiting the zoo and seeing a giraffe.
c. The next 2 cars you see are both red.
d. A coin tossed twice comes up heads both times.
e. Being dealt 4 aces in a hand of poker.
f. It is your birthday, and it is a windy day.
g. Doug flips a coin and Marlene chooses a card out of a deck.
h. In a bag with 5 white marbles and 5 black marbles, Sanjay pulls out a white marble.
Without returning the marble to the bag, Sanjay pulls out a second marble.
i. Eddie chooses blue for his new bike. Eddie chooses lasagna from the dinner menu.
j. The probability that it will rain tomorrow. The probability that the Red Wings hockey team
will win their game tomorrow.
3. Consider a sample set as S = {2,4,6,8,10,12,14,16,18,20}. Event A is the multiples of 4, while
event B is the multiples of 5.
a. What is the probability that a number chosen at random will be from both A and B?
b. What is the probability that a number chosen at random will be from either A or B?
4. Jack is a student in Bluenose High School. He noticed that a lot of the students in his math class
were also in his chemistry class. In fact, of the 60 students in his grade, 28 students were in his
math class, 32 students were in his chemistry class, and 15 students were in both his math class
and his chemistry class. He decided to calculate what the probability was of selecting a student at
random who was either in his math class or his chemistry class, but not both. Draw a Venn diagram
and help Jack with his calculation.
5. What is the probability of choosing a number from 1 to 10 that is greater than 5 or even?
6. A bag contains 26 tiles with a letter on each, one tile for each letter of the alphabet. What is the
probability of reaching into the bag and randomly choosing a tile with one of the letters in the
word ENGLISH on it or randomly choosing a tile with a vowel on it?
7. Thomas bought a bag of jellybeans that contained 10 red jellybeans, 15 blue jellybeans, and 12
green jellybeans. What is the probability of Thomas reaching into the bag and pulling out a blue
or green jellybean?
8. 0.05% of the population is said to have a new disease. A test is developed to test for the disease.
97% of people without the disease will receive a negative test result. 99% of people with the
disease will receive a positive test result. A random person who was tested for the disease is
chosen.
a. What is the probability that the chosen person does not have the disease?
b. What is the probability that the chosen person does not have the disease and received a
negative test result?
c. What is the probability that the chosen person does have the disease and received a
negative test result?
d. If 1,000,000 people were given the test, how many of them would you expect to have the
disease but receive a negative test result?
9. If Kaitlyn goes to the store, the probability that she buys blueberries is 90%. The probability of her
going to the store is 30%. What is the probability of her going to the store and buying blueberries?
10. On rainy weekend days, the probability that Karen bakes bread is 90%. On the weekend, the
probability of rain is 50%. There is a 29% chance that today is a weekend day. What is the
probability that today is a rainy weekend day in which Karen is baking bread?
11. In a survey of 65 people, 28 consider themselves Republicans, 27 consider themselves Democrats.
The rest are considered independent. What is the probability that a person chosen at random will
be a Democrat or independent?
12. Amy is taking a statistics class and a biology class. Suppose her probabilities of getting A’s are:
P(grade of A in statistics) = .65 P( grade of A in biology) = .70 P(grade of A in statistics and a grade
of A in biology) = .50. Find the probability that Amy will get at least one A between her statistics
and biology classes.
I. Introduction
In this lesson, we will examine random variables and learn how to find the probabilities of specific
numerical outcomes. You will learn how to construct a probability distribution for a discrete random
variable and represent it with a graph or a table, as well as the two conditions that all probability
distributions must satisfy. You will also be presented with the formulas for the mean, variance, and
standard deviation of a discrete random variable, together with many real-world examples of how to use
them. In addition, the meaning of expected value will be discussed.
II. Objectives
If we let X represent a quantitative variable that can be measured or observed, then we will be
interested in finding the numerical value of this quantitative variable. A random variable is a function that
maps the elements of the sample space to a set of numbers.
To illustrate, three voters are asked whether they are in favor of building a charter school in a certain
district. Each voter’s response is recorded as 'Yes (Y)' or 'No (N)'. What are the random variables that could
be of interest in this experiment?
To answer this question, we must note that the simple events in this experiment are not numerical in
nature, since each outcome is either a 'Yes' or a 'No'. However, one random variable of interest is the
number of voters who are in favor of building the school. We can summarize all the possible outcomes
using the table below. Notice that we assigned 3 to the first simple event (3 'Yes' votes), 2 to the second
(2 'Yes' votes), 1 to the third (1 'Yes' vote), and 0 to the fourth (0 'Yes' votes).
Table 4.1 Summary Table of Outcomes
6 N Y N 1
7 N N Y 1
8 N N N 0
In the light of this example, what do we mean by random variable? The adjective 'random' means that
the probability experiment may result in one of several possible values of the variable. So, random
variables are simply quantities that take on different values depending on chance, or probability. A
random variable assigns numeric values to the outcomes of independent random events.
Example:
Probability Experiment: Count the number of customers who use the drive-up window in a fast-food
restaurant between the hours of 8 AM and 11 AM
Random Variable: The number of customers who drive up within this time interval.
Possible Values: The values may range from 0 to the maximum number that the restaurant can handle,
and it may vary from day to day, depending on random phenomena, such as today’s weather, among
other things
The probability of each value of a discrete random variable can range anywhere from 0 to 1. The less
likely a value is to occur, the closer its probability will be to 0, and the more likely a value is to
occur, the closer its probability will be to 1.
Examples:
The length of time it takes a truck driver to go from New York City to Miami
The depth of drilling to find oil
The weight of a truck in a truck-weighing station
The amount of water in a 12-ounce bottle
For each of these, if the variable is X, then x>0 and less than some maximum value possible, but it can
take on any value within this range.
In short: If the possible values of a random variable are countable, it is a discrete random variable. If
the values are uncountable, it is a continuous random variable.
When we talk about the probability of discrete random variables, we normally talk about a probability
distribution. The probability distribution of a random variable is a complete description of all the
possible values of the random variable X, along with their associated probabilities, P(X). It may be
represented as a table, a graph, or a chart.
Examples:
1. Suppose you simultaneously toss two fair coins. Let X be the number of heads observed. Find the
probability associated with each value of the random variable X.
Solution:
Since there are two coins, and each coin can be either heads or tails, there are four possible
outcomes (HH, HT, TH, TT), each with a probability of 0.25. Since X is the number of heads
observed, its possible values are 0, 1, and 2.
The next step is to determine the probabilities of the simple events associated with each value of X.
For x = 0, we have one possible outcome, which is TT, thus P(X=0) = 1/4.
For x = 1, we have two possible outcomes, which are HT and TH, thus P(X=1) = 2/4.
For x = 2, we have one possible outcome, which is HH, thus P(X=2) = 1/4.
Lastly, we can present the probability distribution in two ways. The first one is by using a table:
Table 4.2 Probability Distribution Table of Example 1
x P(X)
0 0.25
1 0.50
2 0.25
The second one is by using a graph:
[Figure: graph of P(X) against x for Example 1]
2. What is the probability distribution for the number of yes votes for three voters?
Solution:
The first thing that we can do is to identify the possible outcomes.
The sample space of the probability experiment has 8 possible outcomes:
{YYY, YYN, YNY, NYY, YNN, NYN, NNY, NNN}
Since X is the number of yes votes observed, the possible values are 0, 1, 2, and 3.
The next step is to determine the probabilities of the simple events associated with each value of X.
For x = 0, we have one possible outcome, which is NNN, thus P(X=0) = 1/8.
For x = 1, we have three possible outcomes, which are YNN, NYN, and NNY, thus P(X=1) = 3/8.
For x = 2, we have three possible outcomes, which are YYN, YNY, and NYY, thus P(X=2) = 3/8.
For x = 3, we have one possible outcome, which is YYY, thus P(X=3) = 1/8.
Lastly, we can present the probability distribution in two ways. The first one is by using a table:
Table 4.3 Probability Distribution Table of Example 2
x P(x)
0 0.125
1 0.375
2 0.375
3 0.125
The second one is by using a graph:
[Figure: graph of P(X) against x for Example 2]
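The procedure used for the three voters (list the outcomes, map each outcome to a value of X, then count) can be sketched in a few lines of code. This is only an illustration under the same assumption as the example, namely that the 8 outcomes are equally likely; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2*2*2 = 8 responses of three voters, assumed equally likely here.
outcomes = ["".join(votes) for votes in product("YN", repeat=3)]

# The random variable X maps each outcome to its number of 'Yes' votes.
counts = Counter(outcome.count("Y") for outcome in outcomes)

# P(X = x) = (number of outcomes with x 'Yes' votes) / (total outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p, float(p))
```

Replacing `"YN"` with `"HT"` and `repeat=3` with `repeat=2` would give the two-coin distribution of the first example in the same way.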
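3. Does the following table represent the probability distribution for a discrete random variable?
X 0 1 2 3
P(X) 0.1 0.2 0.3 0.4
Solution:
To identify whether a table represents a probability distribution, check the two conditions:
Each probability must be between 0 and 1: 0.1, 0.2, 0.3, and 0.4 all satisfy this condition.
The probabilities must add up to 1: 0.1 + 0.2 + 0.3 + 0.4 = 1.
Since the table satisfies both conditions, it represents a probability distribution.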
4. Does the following table represent the probability distribution for a discrete random variable?
X 1 2 3 4 5
P(X) 0.202 0.174 0.096 0.078 0.055
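Solution:
To identify whether a table represents a probability distribution, check the two conditions:
Each probability must be between 0 and 1: all five values satisfy this condition.
The probabilities must add up to 1: 0.202 + 0.174 + 0.096 + 0.078 + 0.055 = 0.605, which is not 1.
Since the second condition is not satisfied, this table does not represent a probability distribution.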
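The two conditions can also be checked mechanically. The sketch below applies them to the two tables above; the function name `is_probability_distribution` and the small tolerance used for floating-point rounding are our own assumptions.

```python
import math

def is_probability_distribution(probabilities, tolerance=1e-9):
    """Check both conditions: each P(x) is in [0, 1] and all P(x) sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tolerance)
    return in_range and sums_to_one

first_table = [0.1, 0.2, 0.3, 0.4]
second_table = [0.202, 0.174, 0.096, 0.078, 0.055]

print(is_probability_distribution(first_table))
print(is_probability_distribution(second_table), round(sum(second_table), 3))
```

A tolerance is used instead of an exact comparison because decimal fractions such as 0.1 are not stored exactly by the computer.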
The most important characteristics of any probability distribution are the mean (or average value) and
the standard deviation (a measure of how spread out the values are). A common symbol for the mean
is μ (mu), the lowercase m of the Greek alphabet. A common symbol for the standard deviation
is σ (sigma), the Greek lowercase s.
Mean
When evaluating the long-term results of statistical experiments, we often want to know the “average”
outcome. This “long-term average” is known as the mean or expected value of the experiment and is
denoted by the Greek letter μ. In other words, after conducting many trials of an experiment, you would
expect this average value.
To illustrate, suppose you toss a coin and record the result. What is the probability that the result is
heads? We might say that it is ½. But when you flip a coin two times, you might record two tails or two
heads. You might even toss a fair coin ten times and record nine heads. What does this mean?
This simply tells us that probability does not describe the short-term results of an experiment. It gives
information about what can be expected in the long term. To demonstrate this, the mathematician Karl
Pearson once tossed a fair coin 24,000 times! He recorded the results of each toss, obtaining heads 12,012
times. In his experiment, Pearson illustrated the Law of Large Numbers.
The Law of Large Numbers states that, as the number of trials in a probability
experiment increases, the difference between the theoretical (classical)
probability of an event and the empirical probability approaches zero (the classical
and the empirical probability get closer and closer together).
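The Law of Large Numbers can be explored with a simulated coin. The sketch below is an illustration only: the function name `proportion_of_heads` is our own, the coin is simulated with Python's pseudo-random generator, and the seed is fixed so that the run can be repeated. The gap between the empirical and theoretical probability tends to shrink as the number of tosses grows, although it does not have to shrink at every step.

```python
import random

def proportion_of_heads(num_tosses, seed=0):
    """Toss a simulated fair coin num_tosses times; return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

for n in (10, 100, 1_000, 24_000):
    p_hat = proportion_of_heads(n)
    # Print the number of tosses, the empirical probability, and its gap from 1/2.
    print(n, p_hat, abs(p_hat - 0.5))
```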
To find the mean or expected value, we simply multiply each value of the random variable by its
probability and add the products.
μ = E(X) = ∑ x ∙ P(X)
Examples:
5. Suppose you simultaneously toss two fair coins. Let X be the number of heads observed. Calculate
the mean of this distribution.
Solution:
The first thing that we need to do is to construct the probability distribution table. Based on our
previous example, the probability distribution table is:
x P(X)
0 0.25
1 0.50
2 0.25
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
x P(X) x●P(X)
0 0.25 0(0.25) = 0
1 0.50 1(0.50) = 0.50
2 0.25 2(0.25) = 0.50
μ = ∑ x ∙ P(X) = 0 + 0.50 + 0.50 = 1
6. A child psychologist is interested in the number of times a newborn baby's crying wakes its mother
after midnight. For a random sample of 50 mothers, the following information was obtained.
Let X = the number of times per week a newborn baby's crying wakes its mother after midnight.
Find the mean of this probability distribution.
Table 4.6 Probability Distribution of Example 6
x P(X)
0 2/50
1 11/50
2 23/50
3 9/50
4 4/50
5 1/50
Solution:
The first thing that we need to do is to construct the probability distribution table. Since it is
already given, we may proceed to the next procedure.
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
x P(X) x●P(X)
0 2/50 0(2/50) = 0
1 11/50 1(11/50) = 11/50
2 23/50 2(23/50) = 46/50
3 9/50 3(9/50) = 27/50
4 4/50 4(4/50) = 16/50
5 1/50 5(1/50) = 5/50
μ = ∑ x ∙ P(X) = 0 + 11/50 + 46/50 + 27/50 + 16/50 + 5/50 = 105/50 = 2.1
7. An insurance company sells life insurance of $15,000 for a premium of $310 per year. Actuarial
tables show that the probability of death in the year following the purchase of this policy is 0.1%.
What is the expected gain for this type of policy?
Solution:
The first thing that we need to do is to construct the probability distribution table. You may
notice that there are two simple events here: either the customer will live this year or will die.
Let X be the company’s gain from this policy in the year after the purchase.
Event X P(X)
Live $310 0.999
Die −$14,690 0.001
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
μ = ∑ x ∙ P(X) = 310(0.999) + (−14,690)(0.001) = 309.69 − 14.69 = 295
Thus, the average gain or profit of the company for every insurance policy they sold is $295.
8. A men's soccer team plays soccer zero, one, or two days a week. The probability that they play
zero days is 0.2, the probability that they play one day is 0.5, and the probability that they play
two days is 0.3. Find the expected value, μ, of the number of days per week the men's soccer team
plays soccer.
Solution:
The first thing that we need to do is to construct the probability distribution table.
Let X = the number of days the men's soccer team plays soccer per week.
Based on the problem, its possible values are 0, 1, 2. The probabilities are also given and it is
summarized in Table 4.10.
x P(X)
0 0.20
1 0.50
2 0.30
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
x P(X) x●P(X)
0 0.20 0(0.20) = 0
1 0.50 1(0.50) = 0.50
2 0.30 2(0.30) = 0.60
μ = ∑ x ∙ P(X) = 0 + 0.50 + 0.60 = 1.1
The mean of this probability distribution is 1.1. The men’s soccer team would, on average, expect
to play soccer 1.1 days per week.
9. Suppose you play a game of chance in which five numbers are chosen from 0, 1, 2, 3, 4, 5, 6, 7, 8,
9. A computer randomly selects five numbers from zero to nine with replacement. You pay $2 to
play and could profit $100,000 if you match all five numbers in order (you get your $2 back plus
$100,000). Over the long term, what is your expected profit of playing the game?
Solution:
The first thing that we need to do is to construct the probability distribution table.
You may notice that there are two simple events here: either you will win the game or will lose.
So, we let X be the amount of money you profit.
Since you are interested in your profit, the values of X are not the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
that are drawn. Instead, the values of X are:
If you win the game, you’ll get your $2 back plus the $100,000, so your profit is 100,000.
If you lose the game, you’ll lose your $2, so your profit is −2.
To win, you must get all five numbers correct, in order. The probability of choosing one correct
number is 1/10 because there are ten numbers. You may choose a number more than once. The
probability of choosing all five numbers correctly and in order by using the fundamental counting
principle is
P(win) = (1/10) ∙ (1/10) ∙ (1/10) ∙ (1/10) ∙ (1/10) = 1/100,000 = 0.00001
Thus, by using the Complement Rule, we can get the probability of losing,
P(lose) = 1 − 0.00001 = 0.99999
Event X P(X)
Win $100,000 0.00001
Lose −$2 0.99999
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
μ = ∑ x ∙ P(X) = 100,000(0.00001) + (−2)(0.99999) = 1 − 1.99998 = −0.99998
Thus, your average profit for every game you play is about −$1. This means that on average, you are
expected to lose approximately $1 for each game you play after playing this game over and over.
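The expected-value calculation is the same weighted sum in every example, so it is easy to reuse. The sketch below is a minimal illustration for the insurance and lottery examples; the function name `expected_value` and the list-of-pairs layout are our own choices.

```python
def expected_value(distribution):
    """Mean of a discrete random variable: sum of x * P(x) over all values."""
    return sum(x * p for x, p in distribution)

# Example 7: the insurer keeps the $310 premium, or pays $15,000 after collecting $310.
insurance = [(310, 0.999), (310 - 15_000, 0.001)]

# Example 9: win $100,000 with probability (1/10)**5, otherwise lose the $2 fee.
p_win = (1 / 10) ** 5
lottery = [(100_000, p_win), (-2, 1 - p_win)]

print(round(expected_value(insurance), 2))
print(round(expected_value(lottery), 5))
```

Each pair holds a value of X and its probability, which mirrors the rows of the probability distribution tables.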
The variance, σ², of a discrete random variable X is the average of the squared
distance of the values of the random variable X from the mean value. It is given
by the following formula:
σ² = ∑ (x − μ)² ∙ P(X)
The standard deviation, σ, is the square root of the variance:
σ = √σ²
Examples:
10. A university medical research center finds out that treatment of skin cancer using chemotherapy
has a success rate of 70%. Suppose five patients are treated with chemotherapy. The probability
distribution of x successful cures of the five patients is given in the table below.
x P(X)
0 0.002
1 0.029
2 0.132
3 0.309
4 0.360
5 0.168
a. Find μ.
b. Find σ.
Solution:
The first thing that we need to do is to construct the probability distribution table. Since it is already
given, we may proceed to the next procedure.
Next, we calculate the population mean. We multiply each possible outcome of the random
variable X by its associated probability and add the products.
Table 4.15 Calculation Table for Mean
x P(X) x●P(X)
0 0.002 0
1 0.029 0.029
2 0.132 0.264
3 0.309 0.927
4 0.360 1.44
5 0.168 0.84
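μ = ∑ x ∙ P(X) = 0 + 0.029 + 0.264 + 0.927 + 1.44 + 0.84 = 3.5
The standard deviation is the square root of the variance, so we need to get the variance first. To solve
for the variance we use the formula,
σ² = ∑ (x − μ)² ∙ P(X)
σ² = (0 − 3.5)²(0.002) + (1 − 3.5)²(0.029) + (2 − 3.5)²(0.132) + (3 − 3.5)²(0.309) + (4 − 3.5)²(0.360) + (5 − 3.5)²(0.168)
σ² = 0.0245 + 0.18125 + 0.297 + 0.07725 + 0.09 + 0.378 = 1.048
σ = √σ² = √1.048 ≈ 1.02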
11. Find the standard deviation of the random variable in Problem no. 6. That is, what is the standard
deviation of the number of times a newborn baby's crying wakes its mother after midnight?
Solution:
The standard deviation is the square root of the variance, so we need to get the variance first. To solve
for the variance we use the formula, with μ = 2.1 from Problem no. 6,
σ² = ∑ (x − μ)² ∙ P(X) = 1.05
σ = √σ² = √1.05 = 1.02
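The mean, variance, and standard deviation of a discrete random variable can be computed directly from its probability distribution table. The sketch below does this for the chemotherapy and newborn-crying distributions given above; the function names and the dictionary layout (value of x mapped to P(x)) are our own assumptions.

```python
import math

def mean(distribution):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in distribution.items())

def variance(distribution):
    """sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = mean(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())

chemo = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}
crying = {0: 2 / 50, 1: 11 / 50, 2: 23 / 50, 3: 9 / 50, 4: 4 / 50, 5: 1 / 50}

for name, dist in (("Example 10", chemo), ("Example 11", crying)):
    sigma_squared = variance(dist)
    sigma = math.sqrt(sigma_squared)  # standard deviation = square root of variance
    print(name, round(mean(dist), 3), round(sigma_squared, 3), round(sigma, 2))
```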
VI. Summary
A random variable represents the numerical value of a simple event of an experiment. There are
two types of random variables:
o Discrete random variables have numeric values that can be listed and often can be counted.
o Continuous random variables can take any value in an interval and are often
measurements.
A probability distribution of a random variable tells us the probabilities of all the possible outcomes
(for discrete random variables) of the variable or ranges of values (for continuous random
variables). A probability distribution shows us the regular, predictable distribution of outcomes in
many repetitions of a random variable.
The probability distribution of a discrete random variable is a graph, a table, or a formula that
specifies the probability associated with each possible value that the random variable can assume.
All probability distributions must satisfy the following two conditions:
o 0 ≤ P(x) ≤ 1, for all values of X
o ∑P(x)=1, for all values of X
The mean value, or expected value, of the discrete random variable X is given by μ = E(X) =
∑ x ∙ P(X).
The variance of the discrete random variable X is given by σ² = ∑ (x − μ)² ∙ P(X).
The square root of the variance, or, in other words, the square root of σ², is the standard deviation
of a discrete random variable: σ = √σ².
VII. Exercises
1. Determine whether each situation is a discrete (D) or continuous random variable (C), or if it is
neither (N). Write your answers before the number.
a. The number of cats in a shelter at any given time.
b. The weight of newborn babies.
c. The weight of a book in the library.
d. The types of book in the library.
e. The number of books in the library.
f. The average number of stars a business is rated online.
g. The grade given to a student, as a letter.
h. The grade given to a student, as a percentage.
2. Does the following table represent the probability distribution for a discrete random variable?
X 2 4 6 8
P(X) 0.2 0.4 0.6 0.8
3. Does the following table represent the probability distribution for a discrete random variable?
X 1 2 3 4 5 6
P(X) 0.302 0.251 0.174 0.109 0.097 0.067
4. A fair die is tossed twice, and the up face is recorded each time. Let X be the sum of the up faces.
Give the probability distribution for X in tabular form.
5. A stockroom clerk returns three safety helmets at random to three steel mill employees who had
previously checked them. If Smith, Jones, and Brown, in that order, receive one of the three hats,
list the sample points for the possible orders of returning the helmets, and find the value m of the
random variable M that represents the number of correct matches.
6. Jeremiah has basketball practice two days a week. Ninety percent of the time, he attends both
practices. Eight percent of the time, he attends one practice. Two percent of the time, he does
not attend either practice. What is X and what values does it take on?
7. A hospital researcher is interested in the number of times the average post-op patient will ring
the nurse during a 12-hour shift. For a random sample of 50 patients, the following information
was obtained. What is the expected value and the standard deviation?
x P(x)
0 4/50
1 8/50
2 16/50
3 14/50
4 6/50
5 2/50
8. You are playing a game of chance in which four cards are drawn from a standard deck of 52 cards.
You guess the suit of each card before it is drawn. The cards are replaced in the deck on each
draw. You pay $1 to play. If you guess the right suit every time, you get your money back and
$256. What is your expected profit of playing the game over the long term?
9. Suppose you play a game with a spinner. You play each game by spinning the spinner once. P(red)
= 2/5, P(blue) = 2/5, and P(green) = 1/5. If you land on red, you pay $10. If you land on blue, you
don't pay or win anything. If you land on green, you win $10. Find the expected value.
10. On May 11, 2013 at 9:30 PM, the probability that moderate seismic activity (one moderate
earthquake) would occur in the next 48 hours in Iran was about 21.42%. Suppose you make a bet
that a moderate earthquake will occur in Iran during this period. If you win the bet, you win $50.
If you lose the bet, you pay $20. Let X = the amount of profit from a bet.
a. If you bet many times, will you come out ahead?
b. What is the standard deviation of X?
11. Suppose you must take the bus to school. The probability that you will have to wait for the bus is
0.25. If you don’t have to wait for the bus, the commute takes 20 minutes, but if you must wait
for the bus, the commute takes 30 minutes.
a. What is the expected value of the time it takes you to commute to school?
b. Find the standard deviation.
I. Introduction
A probability distribution function is a pattern. You try to fit a probability problem into a pattern or
distribution in order to perform the necessary calculations. These distributions are tools to make solving
probability problems easier. Each distribution has its own special characteristics. Learning the
characteristics enables you to distinguish among the different distributions.
In this lesson, we will only describe and discuss discrete random variables and the aspects that make
them important for the study of statistics. Some of the more common discrete probability functions are
binomial, hypergeometric, and Poisson. You will be introduced to these three common discrete
probability distributions, including their unique characteristics. Not only will you learn how to describe
these distributions, but you will also learn how to apply the formula used for each type of distribution.
Many real-world problems will be shown.
II. Objectives
Many probability experiments result in responses for which there are only two possible outcomes,
such as either 'yes' or 'no', 'pass' or 'fail', 'good' or 'defective', 'male' or 'female', etc. A simple example is
the toss of a coin wherein in each toss, we will observe either a head, H, or a tail, T. Scenarios like these
are called Binomial Experiments.
Binomial Experiment
1. The experiment consists of n independent and identical trials. Trials are repetitions
of a probability experiment.
2. There are only two possible outcomes on each trial - one known as a success (S), and the other
known as a failure (F).
3. The probability of S, denoted by p, remains constant from trial to trial. The probability of F,
denoted by q, is 1-p.
4. The binomial random variable X is the number of successes in n trials.
Example:
1. Suppose a university decides to give two scholarships to two students. The pool of applicants is
ten students: six males and four females. All ten of the applicants are equally qualified, and the
university decides to randomly select two. Let X be the number of female students who receive
the scholarship.
Solution:
To determine whether a random variable is binomial, it must satisfy ALL the characteristics of a
binomial experiment.
The probability of selecting a female applicant for the first trial is 4/10.
If a female applicant is selected first, the probability of selecting a female applicant for the
second trial is 3/9.
As you have noticed, the trials are DEPENDENT. The success of choosing a female student on
the second trial depends on the outcome of the first trial.
Therefore, the trials are not independent, and X is NOT a binomial random variable.
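The dependence between the two selections can be made explicit by computing the probability of a female on the second draw under each possible first draw. The sketch below is only an illustration of this scholarship example; the variable names are our own.

```python
from fractions import Fraction

females, males = 4, 6
total = females + males

# First draw: 4 females among 10 applicants.
p_first_female = Fraction(females, total)

# Second draw, without replacement: the pool shrinks to 9 applicants,
# and the number of females left depends on who was picked first.
p_second_female_given_first_female = Fraction(females - 1, total - 1)
p_second_female_given_first_male = Fraction(females, total - 1)

print(p_first_female)
print(p_second_female_given_first_female, p_second_female_given_first_male)

# In a binomial experiment these two conditional probabilities would be equal.
trials_are_independent = (
    p_second_female_given_first_female == p_second_female_given_first_male
)
print(trials_are_independent)
```

Because the two conditional probabilities differ, the probability of success is not constant from trial to trial, which is why X is not binomial here.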
2. Suppose we select 100 students from a large university campus and ask them whether they are
in favor of a certain issue that is being debated on their campus. Let X be the number of students
who favor the issue (a 'yes').
Solution:
To determine whether a random variable is binomial, it must satisfy ALL the characteristics of a
binomial experiment.
1. The experiment consists of n independent and identical trials.
In this experiment we have n = 100 trials since we select 100 students.
The trials are IDENTICAL since each student only answers whether they are in favor of the
issue or not. Each answer is also INDEPENDENT of the answers of the other students.
2. There are only two possible outcomes on each trial: S (for success) or F (for failure).
A student can only answer in two ways: In Favor (Yes) and Not in favor (No).
The random variable X is the number of students who favor the issue thus we will consider
those who answered YES as the SUCCESS (S) and those who answered NO as the FAILURE
(F).
3. The probability of S remains constant from trial to trial. We will denote it by p. We will
denote the probability of F by q. Thus, q=1−p.
The probability of answering YES is taken to be ½ for every student, so it is constant for
every student selected.
4. The binomial random variable X is the number of successes in n trials.
The random variable X is the number of students who favor the issue, that is, the number
of students who answered YES, the SUCCESS (S).
Because the random variable X satisfies ALL four characteristics, X is a binomial random
variable.
3. A company decides to conduct a survey of customers to see if its new product, a new brand of
shampoo, will sell well. The company chooses 100 randomly selected customers and asks them
to state their preference among the new shampoo and two other leading shampoos on the
market. Let X be the number of the 100 customers who choose the new brand over the other two.
Solution:
To determine whether a random variable is binomial, it must satisfy ALL the characteristics of a
binomial experiment.
1. The experiment consists of n independent and identical trials.
In this experiment we have n = 100 trials since we select 100 customers.
The trials are IDENTICAL since each customer only states whether they choose the new brand
or one of the other two. Each answer is also INDEPENDENT of the answers of the other customers.
2. There are only two possible outcomes on each trial: S (for success) or F (for failure).
The only possible outcome of this experiment is either the new brand (NB) or the other two
brands (OB).
It might seem like there are three outcomes because there are two other brands, but note
that the random variable X only counts whether the customer chose the new brand or not.
Thus, we will consider the customers who chose the new brand as the SUCCESS (S) and those
who chose either of the other two brands as the FAILURE (F).
3. The probability of S remains constant from trial to trial. We will denote it by p. We will
denote the probability of F by q. Thus, q=1−p.
Assuming each customer is equally likely to choose any of the three brands, the probability of
choosing the new brand is 1/3. It is constant for every customer.
4. The binomial random variable X is the number of successes in n trials.
The random variable X is the number of customers who choose the new brand over the other
two, that is, the number of successes (S).
Because the random variable X satisfies ALL four characteristics, X is a binomial random
variable.
For a random variable X having a binomial distribution with n trials and probability of success p,
the probability of getting exactly k successes is:
$$P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$$
where:
$${}_{n}C_{k} = \frac{n!}{k!\,(n-k)!}$$
The expected value (mean) and standard deviation for the binomial distribution can be determined
by the following formulas:
$$E(X) = \mu = np$$
$$\sigma = \sqrt{npq}$$
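The binomial formulas above can be checked numerically. The following is a minimal Python sketch and is not part of the original module; the function names binomial_pmf and binomial_mean_sd are illustrative choices, and math.comb (available in Python 3.8 and later) supplies the combination nCk.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Return P(X = k) for n independent trials with success probability p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)


def binomial_mean_sd(n, p):
    """Return (mu, sigma), where mu = n*p and sigma = sqrt(n*p*q)."""
    q = 1.0 - p
    return n * p, sqrt(n * p * q)


# Usage: three tosses of a fair coin (n = 3, p = 0.5), X = number of heads.
coin_probs = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(coin_probs)  # [0.125, 0.375, 0.375, 0.125]
print(binomial_mean_sd(3, 0.5))
```

The four probabilities add up to 1, as the probabilities of any complete distribution must.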
Examples:
4. According to a study conducted by a telephone company, the probability is 25% that a randomly
selected phone call will last longer than the mean value of 3.8 minutes. What is the probability
that out of three randomly selected calls
a. Exactly two last longer than 3.8 minutes?
b. None last longer than 3.8 minutes?
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
2. Identify a success.
Based on the problem we are asked “What is the probability that out of three randomly
selected calls exactly two last longer than 3.8 minutes?”, thus,
S: The call lasts longer than 3.8 minutes.
F: The call does not last longer than 3.8 minutes.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “the probability is 25% that a randomly selected phone call will
last longer than the mean value of 3.8 minutes”, thus,
𝑝 = 25% = 0.25
𝑞 = 1 − 𝑝 = 1 − 0.25 = 0.75
4. Determine n, the number of experiments or trials.
Based on the problem we are asked “What is the probability that out of three randomly
selected calls exactly two last longer than 3.8 minutes?”, thus,
𝑛=3
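5. Use the binomial formula to write the probability distribution of X.
$$P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$$
For a, “What is the probability that out of three randomly selected calls exactly two last
longer than 3.8 minutes?” Thus, k = 2,
$$P(X = 2) = {}_{3}C_{2} \cdot (0.25)^{2} \cdot (0.75)^{1} = 0.140625 \approx 0.1406$$
For b, “What is the probability that out of three randomly selected calls none last longer
than 3.8 minutes?” Thus, k = 0,
$$P(X = 0) = {}_{3}C_{0} \cdot (0.25)^{0} \cdot (0.75)^{3} = 0.421875 \approx 0.4219$$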
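As a quick numerical check of Example 4, the two requested probabilities can be computed directly from the binomial formula. This is a small sketch only; the variable names are illustrative and do not come from the module.

```python
from math import comb

n = 3       # three randomly selected calls
p = 0.25    # probability that a call lasts longer than 3.8 minutes
q = 1 - p   # probability that it does not

# a. exactly two calls last longer than 3.8 minutes (k = 2)
p_exactly_two = comb(n, 2) * p**2 * q**(n - 2)

# b. none of the calls last longer than 3.8 minutes (k = 0)
p_none = comb(n, 0) * p**0 * q**n

print(p_exactly_two)  # 0.140625
print(p_none)         # 0.421875
```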
5. A car dealer knows from experience that he can make a sale to 20% of the customers who he
interacts with.
a. What is the probability that, in five randomly selected interactions, he will make a sale to:
i. Exactly three customers?
ii. At most one customer?
iii. At least one customer?
b. Determine the probability distribution for the number of sales.
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. Based on the characteristics of a binomial experiment. This is a binomial experiment with
two possible outcomes whose probability is constant, either he can make a sale or not. Every
sale is also independent from one another.
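2. Identify a success.
Based on the problem we are asked “What is the probability that, in five randomly selected
interactions, he will make a sale”, thus,
S: The dealer makes a sale to the customer.
F: The dealer does not make a sale to the customer.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “A car dealer knows from experience that he can make a sale to 20%
of the customers who he interacts with.”, thus,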
𝑝 = 20% = 0.20
𝑞 = 1 − 𝑝 = 1 − 0.20 = 0.80
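4. Determine n, the number of experiments or trials.
Based on the problem, “in five randomly selected interactions”, thus,
𝑛 = 5
5. Use the binomial formula to write the probability distribution of X.
$$P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$$
For i, exactly three customers, thus k = 3,
$$P(X = 3) = {}_{5}C_{3} \cdot (0.20)^{3} \cdot (0.80)^{2} = 0.0512$$
For ii, the keyword “at most” means not more than or at maximum, thus k ≤ 1. We need the
probability that he will make 0 or 1 sale.
Thus,
$$P(X \le 1) = P(X = 0) + P(X = 1) = 0.32768 + 0.4096 = 0.737$$
For iii, the keyword “at least” means not less than or at the minimum, thus k ≥ 1. We need the
probability that he will make 1, 2, 3, 4, or 5 sales.
Thus,
$$P(X \ge 1) = 0.4096 + 0.2048 + 0.0512 + 0.0064 + 0.00032 = 0.672$$
Another Approach:
The only probability that is not included in the sum above is P(X=0). So, we can save time by
calculating the complement of the probability we're looking for and subtracting it from 1, as
stated by the complement rule.
$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0.32768 = 0.672$$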
For b:
Summarizing the answers that we have in part a to create a probability distribution we have,
Table 5.1 Probability Distribution Table for Example 5
x       P(X = x)
0       0.32768
1       0.40960
2       0.20480
3       0.05120
4       0.00640
5       0.00032
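The probability distribution for Example 5 can be regenerated with a short loop, which also makes it easy to confirm that the entries add up to 1. This is a minimal sketch assuming n = 5 and p = 0.20 as given in the problem; the variable names are illustrative.

```python
from math import comb

n, p = 5, 0.20

# P(X = k) for every possible number of sales k = 0, 1, ..., 5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

p_exactly_three = pmf[3]
p_at_most_one = pmf[0] + pmf[1]
p_at_least_one = 1 - pmf[0]   # complement rule

for k, prob in enumerate(pmf):
    print(k, f"{prob:.5f}")
print(f"total = {sum(pmf):.5f}")
```

Because the six probabilities must total 1, a table whose entries do not sum to 1 signals an arithmetic or transcription error.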
6. Owen flips a coin 3 times. Find the probability of flipping exactly 0, 1, 2 and 3 heads.
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. Based on the characteristics of a binomial experiment. This is a binomial experiment with
two possible outcomes whose probability is constant, either Owen flips Heads or Tails. Every
flip is also independent from one another.
2. Identify a success.
Based on the problem we are asked “Find the probability of flipping exactly 0, 1, 2 and 3
heads”, thus,
S: Owen will flip HEAD (H).
F: Owen will flip TAIL (T).
3. Determine p, the probability of a success, and q, the probability of a failure.
As we have discussed in the previous lessons, the probability of getting a head when flipping
a coin is 0.50, thus,
𝑝 = 0.50
𝑞 = 1 − 𝑝 = 1 − 0.50 = 0.50
4. Determine n, the number of experiments or trials.
Based on the problem “Owen flips a coin 3 times.”, thus,
𝑛=3
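5. Use the binomial formula to write the probability distribution of X.
$$P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$$
For exactly 0 heads, k = 0:
$$P(X = 0) = {}_{3}C_{0} \cdot (0.50)^{0} \cdot (0.50)^{3} = 0.125$$
For exactly 1 head, k = 1:
$$P(X = 1) = {}_{3}C_{1} \cdot (0.50)^{1} \cdot (0.50)^{2} = 0.375$$
Thus, the probability that Owen will flip exactly 1 head in 3 flips of a coin is 0.375.
For exactly 2 heads, k = 2:
$$P(X = 2) = {}_{3}C_{2} \cdot (0.50)^{2} \cdot (0.50)^{1} = 0.375$$
For exactly 3 heads, k = 3:
$$P(X = 3) = {}_{3}C_{3} \cdot (0.50)^{3} \cdot (0.50)^{0} = 0.125$$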
7. Mark is taking a multiple-choice quiz that he did not study for. There are 10 questions on the quiz
and each question has 4 possible answer choices. What is the probability that Mark will pass the
quiz with a score of 6 or better if he guesses randomly on each question?
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. This is a binomial experiment: each question is a trial with two possible outcomes whose
probabilities are constant, either Mark answers the question correctly or incorrectly.
Every answer is also independent of the others.
2. Identify a success.
Based on the problem we are asked “What is the probability that Mark will pass the quiz with
a score of 6 or better if he guesses randomly on each question?”, thus,
S: Mark answers a question correctly.
F: Mark answers a question incorrectly.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “There are 10 questions on the quiz and each question has 4 possible
answer choices.” Only 1 of the 4 choices is correct, thus,
$$p = \frac{1}{4} = 0.25$$
𝑞 = 1 − 𝑝 = 1 − 0.25 = 0.75
4. Determine n, the number of experiments or trials.
Based on the problem “There are 10 questions on the quiz and each question has 4 possible
answer choices.” thus,
𝑛 = 10
5. Use the binomial formula to write the probability distribution of X.
$$P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$$
Since we are asked “What is the probability that Mark will pass the quiz with a
score of 6 or better if he guesses randomly on each question?”, k ≥ 6. So, we need
the sum of the probabilities of getting a score of 6, 7, 8, 9, and 10:
$$P(X \ge 6) = P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10) \approx 0.0197$$
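Summing five binomial terms by hand is tedious, so Example 7 is a natural place to let a short program do the arithmetic. The sketch below assumes n = 10 and p = 0.25 from the problem; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.25   # 10 questions, 1 correct choice out of 4

# P(X >= 6): add the probabilities of exactly 6, 7, 8, 9, and 10 correct answers
p_pass = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6, n + 1))

print(f"{p_pass:.4f}")  # 0.0197
```

So a student who guesses on every item passes only about 2% of the time.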
8. [8]According to a Gallup poll, 60% of American adults prefer saving over spending. Let X be the
number of American adults out of a random sample of 50 who prefer saving to spending. Calculate
the mean and standard deviation of X.
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. Based on the characteristics of a binomial experiment. This is a binomial experiment with
two possible outcomes whose probability is constant, either an American adult prefers to
save or to spend. Every chosen American adult is also independent from one another.
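2. Identify a success.
Based on the problem “Let X be the number of American adults out of a random sample of 50
who prefer saving to spending.”, thus,
S: The adult prefers saving to spending.
F: The adult prefers spending to saving.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “According to a Gallup poll, 60% of American adults prefer saving
over spending.”, thus,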
𝑝 = 60% = 0.60
𝑞 = 1 − 𝑝 = 1 − 0.60 = 0.40
4. Determine n, the number of experiments or trials.
Based on the problem “Let X be the number of American adults out of a random sample of
50 who prefer saving to spending”, thus,
𝑛 = 50
For this problem we are asked to find the mean and standard deviation. The formulas we
need are:
$$E(X) = \mu = np$$
$$\sigma = \sqrt{npq}$$
Substituting,
$$\mu = np = 50(0.60) = 30$$
$$\sigma = \sqrt{npq} = \sqrt{50(0.60)(0.40)} = \sqrt{12} \approx 3.46$$
Thus, on average, out of every 50 American adults, 30 ± 3.46 prefer saving to
spending.
9. [9]The lifetime risk of developing pancreatic cancer is about one in 78 (1.28%). Suppose we
randomly sample 200 people. Let X be the number of people who will develop pancreatic cancer.
Calculate the mean and standard deviation of X.
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. Based on the characteristics of a binomial experiment. This is a binomial experiment with
two possible outcomes whose probability is constant, either a person will develop pancreatic
cancer or not. Every person is also independent from one another.
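2. Identify a success.
Based on the problem “Let X be the number of people who will develop pancreatic cancer.”,
thus,
S: The person develops pancreatic cancer.
F: The person does not develop pancreatic cancer.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “The lifetime risk of developing pancreatic cancer is about one in 78
(1.28%).”, thus,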
𝑝 = 1.28% = 0.0128
𝑞 = 1 − 𝑝 = 1 − 0.0128 = 0.9872
4. Determine n, the number of experiments or trials.
Based on the problem “Suppose we randomly sample 200 people.”, thus,
𝑛 = 200
For this problem we are to find the mean and standard deviation. Substituting in the
respective formulas,
$$\mu = np = 200(0.0128) = 2.56$$
$$\sigma = \sqrt{npq} = \sqrt{200(0.0128)(0.9872)} \approx 1.59$$
Thus, on average, out of every 200 people, 2.56 ± 1.59 are expected to
develop pancreatic cancer.
10. [10]During the 2013 regular NBA season, DeAndre Jordan of the Los Angeles Clippers had the
highest field goal completion rate in the league. DeAndre scored with 61.3% of his shots. Suppose
you choose a random sample of 80 shots made by DeAndre during the 2013 season. Let X be the
number of shots that scored points. Calculate the mean and standard deviation of X.
Solution:
1. Determine if the problem is a binomial probability. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. Based on the characteristics of a binomial experiment. This is a binomial experiment with
two possible outcomes whose probability is constant, either DeAndre will score or not. Every
shot is also independent from one another.
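2. Identify a success.
Based on the problem “Let X be the number of shots that scored points.”, thus,
S: The shot scores.
F: The shot does not score.
3. Determine p, the probability of a success, and q, the probability of a failure.
Based on the problem: “DeAndre scored with 61.3% of his shots.”, thus,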
𝑝 = 61.3% = 0.613
𝑞 = 1 − 𝑝 = 1 − 0.613 = 0.387
4. Determine n, the number of experiments or trials.
Based on the problem “Suppose you choose a random sample of 80 shots made by DeAndre
during the 2013 season.”, thus,
𝑛 = 80
For this problem we are to find the mean and standard deviation. Substituting in the
respective formulas,
$$\mu = np = 80(0.613) = 49.04$$
$$\sigma = \sqrt{npq} = \sqrt{80(0.613)(0.387)} \approx 4.36$$
Thus, on average, out of every 80 shots DeAndre takes, 49.04 ± 4.36 score.
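Examples 8 to 10 all apply the same two formulas, μ = np and σ = √(npq), so they can be checked together. The following sketch assumes the values of n and p stated in each problem; the dictionary layout and names are illustrative only.

```python
from math import sqrt

# (n, p) as stated in each example
cases = {
    "Example 8": (50, 0.60),     # adults who prefer saving
    "Example 9": (200, 0.0128),  # people who develop pancreatic cancer
    "Example 10": (80, 0.613),   # shots that score
}

results = {}
for name, (n, p) in cases.items():
    q = 1 - p
    mu = n * p                 # mean of a binomial random variable
    sigma = sqrt(n * p * q)    # standard deviation
    results[name] = (mu, sigma)
    print(f"{name}: mean = {mu:.2f}, standard deviation = {sigma:.2f}")
```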
The Poisson Probability Distribution is useful for describing the number of events that will occur
during a specific interval of time or in a specific distance, area, or volume. It is popular for modelling the
number of times an event occurs in an interval of time or space. It is a discrete probability distribution
that expresses the probability of a given number of events occurring in a fixed interval of time and/or
space if these events occur with a known average rate and independently of the time since the last event.
In a binomial distribution, if the number of trials, n, gets larger and larger as the probability
of success, p, gets smaller and smaller, with the product np held fixed at the average rate λ,
we obtain a Poisson distribution.
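The link between the binomial and Poisson distributions can be seen numerically. The sketch below is an illustration only: it fixes the product np at an arbitrarily chosen value λ = 2, lets n grow, and compares the binomial probability of k = 3 successes with the Poisson probability λ^k e^(−λ) / k!. The function name binom_pmf and the chosen values are assumptions made for the demonstration.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


lam, k = 2.0, 3   # illustrative average rate and number of events
poisson = lam**k * exp(-lam) / factorial(k)

for n in (10, 100, 1000, 10000):
    p = lam / n   # p shrinks as n grows, so that n * p stays equal to lam
    print(n, f"{binom_pmf(k, n, p):.5f}", f"{poisson:.5f}")
```

As n increases, the binomial column settles toward the Poisson value.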
Poisson Experiment
1. The experiment consists of counting the number of events that will occur during a specific
interval of time or in a specific distance, area, or volume.
2. The probability that an event occurs is the same for every interval of time, distance, area, or
volume of equal size.
3. Each event is independent of all other events. For example, the number of people who arrive in
the first hour is independent of the number who arrive in any other hour.
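The probability distribution, mean, and variance of a Poisson random variable are given as follows.
The probability of getting exactly k events for a random variable X having a Poisson distribution
is:
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$
The mean and variance of the Poisson distribution are given by the following
formulas, and the standard deviation is the square root of the variance:
$$\mu = \lambda$$
$$\sigma^{2} = \lambda$$
where:
λ is the average number of events in the given interval of time or space,
e is the base of the natural logarithm, approximately 2.71828, and
k is the number of events whose probability is sought.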
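The Poisson formula translates directly into code. Since many Poisson questions ask for “k or more” events, which is a sum with no upper limit, the sketch below also includes a helper that applies the complement rule. The function names poisson_pmf and poisson_at_least are illustrative, and the rate λ = 1 in the usage lines is an arbitrary value chosen for demonstration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Return P(X = k) = lam^k * e^(-lam) / k! for average rate lam."""
    return lam**k * exp(-lam) / factorial(k)


def poisson_at_least(k_min, lam):
    """Return P(X >= k_min) using the complement rule: 1 - P(X < k_min)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


# Usage with an illustrative rate of 1 event per interval
print(poisson_pmf(0, 1.0))       # probability of no event
print(poisson_at_least(1, 1.0))  # probability of at least one event
```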
Examples:
11. A lake, popular among boat fishermen, has an average catch of three fish every two hours during
the month of October.
a. What is the probability that you will catch 0 fish in seven hours of fishing?
b. What is the probability of catching 3 fish in seven hours of fishing?
c. What is the probability that you will catch 4 or more fish in 7 hours?
Solution:
1. Determine if the problem is a Poisson probability distribution. To check, the average rate,
λ, for the events to occur must be known. If yes, proceed to the next step, if not determine
the kind of probability distribution.
2. Identify a success.
Based on the problem we are asked “What is the probability that you will catch 0 fish in seven
hours of fishing?”, thus, the event being counted (the success) is catching a fish.
3. Determine λ, the average rate at which the events occur.
Based on the problem, “A lake, popular among boat fishermen, has an average catch of three
fish every two hours during the month of October.”, thus,
$$\lambda = \frac{3\ \text{fish}}{2\ \text{hours}} = 1.5\ \text{fish/hr}$$
4. Use the Poisson formula to write the probability distribution of X.
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$
For a, we are asked to find “the probability that you will catch 0 fish in seven hours of fishing?”,
our k = 0.
Also note that the unit of the rate must be consistent with the interval of the unknown probability.
The unit of the average rate is fish per hour while the unknown probability concerns fish
per 7 hours, so we convert:
$$\lambda = \frac{1.5\ \text{fish}}{\text{hour}} \cdot 7\ \text{hours} = 10.5\ \text{fish}$$
Substituting into the Poisson formula,
$$P(X = 0) = \frac{(10.5)^{0} \cdot e^{-10.5}}{0!} = 0.0000275 \approx 0$$
Thus, the probability that you will catch 0 fish is approximately 0. This means that it is
almost guaranteed that you will catch fish in 7 hours.
For b, we are asked to find “the probability that you will catch 3 fish in seven hours of
fishing?”, our k = 3.
$$P(X = 3) = \frac{(10.5)^{3} \cdot e^{-10.5}}{3!} = 0.0053$$
Thus, the probability that you will catch exactly 3 fish is 0.0053.
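For c, we are asked to find “the probability that you will catch 4 or more fish in 7 hours”, our
k ≥ 4. Thus, we are to find the sum of all the probabilities that you will catch 4 fish, 5 fish, 6
fish, and so on. Because this sum has no upper limit, we use the complement rule and subtract
the probabilities of catching 0, 1, 2, and 3 fish from 1.
$$P(X = 0) = \frac{(10.5)^{0} \cdot e^{-10.5}}{0!} = 0.0000275$$
$$P(X = 1) = \frac{(10.5)^{1} \cdot e^{-10.5}}{1!} = 0.000289$$
$$P(X = 2) = \frac{(10.5)^{2} \cdot e^{-10.5}}{2!} = 0.00152$$
$$P(X = 3) = \frac{(10.5)^{3} \cdot e^{-10.5}}{3!} = 0.0053$$
Thus,
$$P(X \ge 4) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)] = 1 - 0.0071 = 0.9929$$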
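The three parts of Example 11 can be verified with a few lines of code. This sketch assumes the rate of three fish every two hours given in the problem and rescales it to the 7-hour window before applying the Poisson formula; the names are illustrative.

```python
from math import exp, factorial

rate_per_hour = 3 / 2        # three fish every two hours
lam = rate_per_hour * 7      # expected catch in seven hours: 10.5 fish


def pmf(k):
    """Poisson probability of catching exactly k fish in seven hours."""
    return lam**k * exp(-lam) / factorial(k)


p_zero = pmf(0)                                      # part a
p_three = pmf(3)                                     # part b
p_four_or_more = 1 - sum(pmf(k) for k in range(4))   # part c, complement rule

print(f"{p_zero:.7f}  {p_three:.4f}  {p_four_or_more:.4f}")
```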
12. A zoologist is studying the number of times a rare kind of bird has been sighted. The random
variable X is the number of times the bird is sighted every month. We assume that X has a Poisson
distribution with a mean value of 2.5.
a. Find the probability that exactly five birds are sighted in one month.
b. Find the probability that two or more birds are sighted in a 1-month period.
c. Find the mean and standard deviation of X.
Solution:
1. Determine if the problem is a Poisson probability distribution. To check, the average rate,
λ, for the events to occur must be known. If yes, proceed to the next step, if not determine
the kind of probability distribution.
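2. Identify a success.
Based on the problem “The random variable X is the number of times the bird is sighted every
month.”, thus, the event being counted (the success) is a sighting of the bird.
3. Determine λ, the average rate at which the events occur.
Based on the problem, “We assume that X has a Poisson distribution with a mean value of
2.5.”, thus,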
𝜆 = 2.5 𝑏𝑖𝑟𝑑𝑠/𝑚𝑜𝑛𝑡ℎ
4. Use the Poisson formula to write the probability distribution of X.
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$
For a, we are asked to find “the probability that exactly five birds are sighted in one month”,
our k = 5.
Also note that the unit of the rate must be consistent with the interval of the unknown probability.
Since both are per month, there is no need to convert.
$$P(X = 5) = \frac{(2.5)^{5} \cdot e^{-2.5}}{5!} = 0.0668$$
Therefore, the probability that exactly 5 birds are sighted in one month is 0.0668.
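For b, we are asked to “Find the probability that two or more birds are sighted in a 1-month
period.”, our k ≥ 2. Thus, we are to find the sum of all the probabilities that the zoologist
will sight 2 birds, 3 birds, and so on. Because this sum has no upper limit, we use the
complement rule and subtract the probabilities of sighting 0 birds and 1 bird from 1.
$$P(X = 0) = \frac{(2.5)^{0} \cdot e^{-2.5}}{0!} = 0.0821$$
$$P(X = 1) = \frac{(2.5)^{1} \cdot e^{-2.5}}{1!} = 0.2052$$
Thus,
$$P(X \ge 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - (0.0821 + 0.2052) = 0.7127$$
For c, we are asked to “Find the mean and standard deviation of X.”
$$\mu = \lambda = 2.5$$
For the standard deviation, which is the square root of the variance,
$$\sigma^{2} = \lambda = 2.5$$
$$\sigma = \sqrt{2.5} = 1.58$$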
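The answers to Example 12 can be cross-checked numerically. The sketch below assumes λ = 2.5 sightings per month as stated in the problem; the variable names are illustrative.

```python
from math import exp, factorial, sqrt

lam = 2.5   # average number of sightings per month


def pmf(k):
    """Poisson probability of exactly k sightings in one month."""
    return lam**k * exp(-lam) / factorial(k)


p_five = pmf(5)                      # part a: exactly five sightings
p_two_or_more = 1 - pmf(0) - pmf(1)  # part b: complement rule
mean, sd = lam, sqrt(lam)            # part c

print(f"{p_five:.4f}  {p_two_or_more:.4f}  {mean}  {sd:.2f}")
```

Note that the denominator for part a must be 5!, matching k = 5; using 3! with the third power instead gives P(X = 3) = 0.2138, a different probability.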
13. The average number of loaves of bread put on a shelf in a bakery in a half-hour period is 12. Of
interest is the number of loaves of bread put on the shelf in five minutes. What is the probability
that the number of loaves, selected randomly, put on the shelf in five minutes is three?
Solution:
1. Determine if the problem is a Poisson probability distribution. To check, the average rate,
λ, for the events to occur must be known. If yes, proceed to the next step, if not determine
the kind of probability distribution.
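2. Identify a success.
Based on the problem “What is the probability that the number of loaves, selected randomly,
put on the shelf in five minutes is three?”, thus, the event being counted (the success) is a
loaf of bread being put on the shelf.
3. Determine λ, the average rate at which the events occur.
Based on the problem, “The average number of loaves of bread put on a shelf in a bakery in a
half-hour period is 12”, thus,
$$\lambda = \frac{12\ \text{loaves}}{0.5\ \text{hour}} = \frac{12\ \text{loaves}}{30\ \text{mins}} = 0.4\ \text{loaves per min}$$
4. Use the Poisson formula to write the probability distribution of X.
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$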
We are asked to find “the probability that the number of loaves, selected randomly, put on the
shelf in five minutes is three?”. So, k=3
Also note that the unit of the rate must be consistent with the interval of the unknown probability.
The unit of the average rate is loaves per min while the unknown probability concerns
loaves per 5 mins, so we convert:
$$\lambda = \frac{0.4\ \text{loaves}}{\text{min}} \cdot 5\ \text{mins} = 2\ \text{loaves}$$
Substituting into the Poisson formula,
$$P(X = 3) = \frac{(2)^{3} \cdot e^{-2}}{3!} = 0.180$$
Therefore, the probability that exactly 3 loaves of bread are put on the shelf in 5 minutes is 0.180.
14. Leah’s answering machine receives about six telephone calls between 8 a.m. and 10 a.m. What is
the probability that Leah receives more than one call in the next 15 minutes?
Solution:
1. Determine if the problem is a Poisson probability distribution. To check, the average rate,
λ, for the events to occur must be known. If yes, proceed to the next step, if not determine
the kind of probability distribution.
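2. Identify a success.
Based on the problem “What is the probability that Leah receives more than one call in the
next 15 minutes?”, thus, the event being counted (the success) is a call received by Leah.
3. Determine λ, the average rate at which the events occur.
Based on the problem, “Leah’s answering machine receives about six telephone calls between
8 a.m. and 10 a.m.”, thus,
$$\lambda = \frac{6\ \text{calls}}{2\ \text{hours}} = 3\ \text{calls per hour}$$
4. Use the Poisson formula to write the probability distribution of X.
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$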
We are asked to find, “What is the probability that Leah receives more than one call in the
next 15 minutes?”, so our k > 1. Thus, we are to find the sum of all the probabilities that Leah
will receive 2 calls, 3 calls, and so on.
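The unit of the average rate is calls per hour while the unknown probability concerns calls
in 15 mins, so we convert (1 hour = 60 mins):
$$\lambda = \frac{3\ \text{calls}}{60\ \text{mins}} \cdot 15\ \text{mins} = 0.75\ \text{calls}$$
Because the sum for k > 1 has no upper limit, we use the complement rule and subtract the
probabilities of receiving 0 calls and 1 call from 1.
$$P(X = 0) = \frac{(0.75)^{0} \cdot e^{-0.75}}{0!} = 0.4724$$
$$P(X = 1) = \frac{(0.75)^{1} \cdot e^{-0.75}}{1!} = 0.3543$$
Thus,
$$P(X > 1) = 1 - [P(X = 0) + P(X = 1)] = 1 - (0.4724 + 0.3543) = 0.1733$$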
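Examples 13 and 14 both hinge on rescaling the average rate to the interval asked about before applying the Poisson formula. The sketch below wraps that conversion in a small helper; the function names scaled_lambda and poisson_pmf are illustrative and not part of the module, and all intervals are expressed in minutes.

```python
from math import exp, factorial


def scaled_lambda(events, per_minutes, target_minutes):
    """Rescale an average count observed over per_minutes to target_minutes."""
    return events / per_minutes * target_minutes


def poisson_pmf(k, lam):
    """Return P(X = k) for a Poisson random variable with average lam."""
    return lam**k * exp(-lam) / factorial(k)


# Example 13: 12 loaves per 30 minutes; exactly 3 loaves in 5 minutes
lam_loaves = scaled_lambda(12, 30, 5)
p_three_loaves = poisson_pmf(3, lam_loaves)

# Example 14: 6 calls per 120 minutes; more than 1 call in 15 minutes
lam_calls = scaled_lambda(6, 120, 15)
p_more_than_one_call = 1 - poisson_pmf(0, lam_calls) - poisson_pmf(1, lam_calls)

print(f"{lam_loaves:.2f}  {p_three_loaves:.4f}")
print(f"{lam_calls:.2f}  {p_more_than_one_call:.4f}")
```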
15. [11]According to Baydin, an email management company, an email user gets, on average, 147
emails per day. Let X be the number of emails an email user receives per day. What is the mean
and standard deviation?
Solution:
1. Determine if the problem is a Poisson probability distribution. To check, the average rate,
λ, for the events to occur must be known. If yes, proceed to the next step, if not determine
the kind of probability distribution.
2. Identify a success.
Based on the problem “Let X be the number of emails an email user receives per day.”, thus,
the event being counted (the success) is the receipt of an email.
3. Determine λ, the average rate at which the events occur.
Based on the problem, “According to Baydin, an email management company, an email user
gets, on average, 147 emails per day”, thus,
$$\lambda = 147\ \text{emails per day}$$
4. Use the Poisson formula to write the probability distribution of X.
$$P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$$
We are asked to find the mean and standard deviation of X.
$$\mu = \lambda = 147$$
For the standard deviation, which is the square root of the variance,
$$\sigma^{2} = \lambda = 147$$
$$\sigma = \sqrt{147} = 12.12$$
The hypergeometric probability distribution is the simplest discrete probability distribution. It is
the most basic one because it is built by combining our knowledge of probabilities from Venn
diagrams, the addition and multiplication rules, and the combinatorial counting formula. The
hypergeometric distribution is a discrete probability distribution because there is no possibility
of partial success. It is best suited to dependent events, that is, cases where the probability of a
success changes with each draw.
Hypergeometric Experiment
1. You take samples from two groups. The population must be divisible into two and
only two non-overlapping subsets.
2. You sample without replacement from the combined groups.
To illustrate, you want to choose a softball team from a combined group of 11 men and 13
women. The team consists of ten players.
3. Each pick is dependent, since sampling is without replacement, that is, the probability
of success must change with each trial.
To illustrate, in the softball example, the probability of picking a woman first is 13/24. The
probability of picking a man second is 11/23 if a woman was picked first. It is 10/23 if a
man was picked first. The probability of the second pick depends on what happened in the
first pick.
4. You are not dealing with Bernoulli trials, that is, the probability of success changes
for every trial.
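For a random variable X having a hypergeometric distribution, with population size N, number
of successes in the population K, and sample size n, the probability of getting exactly k
successes in the sample is:
$$P(X = k) = \frac{({}_{K}C_{k}) \cdot ({}_{N-K}C_{n-k})}{{}_{N}C_{n}}$$
The mean and standard deviation for the hypergeometric distribution can be determined by the
following formulas:
$$\mu = \frac{n \cdot K}{N}$$
$$\sigma = \sqrt{\frac{n \cdot K \cdot (N - K) \cdot (N - n)}{N^{2}(N - 1)}}$$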
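The hypergeometric formulas can be sketched in code using the softball illustration from the characteristics above (N = 24 people, K = 13 women, a team of n = 10). Treating “picking a woman” as the success is an assumption made for this demonstration, and the function names are illustrative.

```python
from math import comb, sqrt


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in a sample of n drawn without replacement
    from a population of N that contains K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_mean_sd(N, K, n):
    """Return (mu, sigma) for the hypergeometric distribution."""
    mu = n * K / N
    variance = n * K * (N - K) * (N - n) / (N**2 * (N - 1))
    return mu, sqrt(variance)


# Softball illustration: 11 men and 13 women, team of ten players
N, K, n = 24, 13, 10
k_min = max(0, n - (N - K))   # fewest women possible on the team
k_max = min(n, K)             # most women possible on the team
team_pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(k_min, k_max + 1)}

print(sum(team_pmf.values()))   # total probability, should be 1 up to rounding
print(hypergeom_mean_sd(N, K, n))
```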
Examples:
16. You are president of an on-campus special events organization. You need a committee of seven
students to plan a special birthday party for the president of the college. Your organization
consists of 18 women and 15 men. You are interested in the number of men on your committee.
If the members of the committee are randomly selected, what is the probability that your
committee has more than four men?
Solution:
1. Determine if the problem is hypergeometric. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. There are two groups: men and women. The experiment of selecting a member without
replacement (you can only select a specific person once) to form a committee is a dependent
event because the probability changes as you select a person one after the other.
2. Identify a success.
Based on the problem, we are being asked “If the members of the committee are randomly
selected, what is the probability that your committee has more than four men?”, thus,
S: A man is selected for the committee.
F: A woman is selected for the committee.
Based on the problem, “You need a committee of seven students to plan a special birthday
party for the president of the college.”, thus our sample is the committee that you will form.
The population is the sum of the elements of the two groups. Based on the problem, “Your
organization consists of 18 women and 15 men.” Thus, N = 18 + 15 = 33.
Based on the problem, “You need a committee of seven students to plan a special birthday
party for the president of the college.” Thus, n = 7.
Since the success is “selecting a man”, the number of successes in the population is the
total number of men, which is 15, K = 15.
Since we are asked to find “the probability that your committee has more than four men”, k
> 4. This means that we are to find the sum of the probabilities that you will select
5, 6, and 7 men. We stop at 7 because it is our sample size.
$$P(X = 5) = \frac{({}_{15}C_{5}) \cdot ({}_{18}C_{2})}{{}_{33}C_{7}} = 0.10755$$
$$P(X = 6) = \frac{({}_{15}C_{6}) \cdot ({}_{18}C_{1})}{{}_{33}C_{7}} = 0.02109$$
$$P(X = 7) = \frac{({}_{15}C_{7}) \cdot ({}_{18}C_{0})}{{}_{33}C_{7}} = 0.00151$$
Thus,
$$P(X > 4) = 0.10755 + 0.02109 + 0.00151 \approx 0.130$$
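Example 16 can be checked by summing the hypergeometric probabilities for 5, 6, and 7 men. This is a minimal sketch using the values N = 33, K = 15, and n = 7 identified in the solution; the variable names are illustrative.

```python
from math import comb

N, K, n = 33, 15, 7   # 33 members, 15 men, committee of 7

# P(X = k) for k = 5, 6, 7 men on the committee
terms = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in (5, 6, 7)}
p_more_than_four = sum(terms.values())

for k, prob in terms.items():
    print(k, f"{prob:.5f}")
print(f"P(X > 4) = {p_more_than_four:.4f}")
```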
17. Suppose that a technology task force is being formed to study technology awareness among
instructors. Assume that ten people will be randomly chosen to be on the committee from a group
of 28 volunteers, 20 who are technically proficient and eight who are not. We are interested in
the number on the committee who are not technically proficient. Find the probability that at least
five on the committee are not technically proficient.
Solution:
1. Determine if the problem is hypergeometric. If yes, proceed to the next step, if not
determine the kind of probability distribution.
Yes. There are two groups: technically proficient and not. The experiment of selecting a
member without replacement (you can only select a specific person once) to form a
committee is a dependent event because the probability changes as you select a person one
after the other.
2. Identify a success.
Based on the problem, “We are interested in the number on the committee who are not
technically proficient.”, thus,
S: The member selected is not technically proficient.
F: The member selected is technically proficient.
Based on the problem, “Assume that ten people will be randomly chosen to be on the
committee from a group…”, thus our sample is the committee that will be formed.
Based on the problem, “Assume that ten people will be randomly chosen to be on the
committee from a group of 28 volunteers, 20 who are technically proficient and eight who
are not.” Thus, N = 28.
Based on the problem, “Assume that ten people will be randomly chosen to be on the
committee from a group…” Thus, n = 10.
Since the success is “a member who is not technically proficient”, the number of successes in
the population is the total number of those who are not proficient, which is 8, K = 8.
We are asked to find “the probability that at least five on the committee are not
technically proficient”. You may recall that the key term “at least” means greater than or
equal to, so k ≥ 5. This means that we are to find the sum of the probabilities that
5, 6, 7, or 8 of the selected members are not technically proficient. We stop at 8 because
only eight volunteers are not technically proficient.
$$P(X \ge 5) = \sum_{k=5}^{8} \frac{({}_{8}C_{k}) \cdot ({}_{20}C_{10-k})}{{}_{28}C_{10}} = 0.06616 + 0.01034 + 0.00069 + 0.00001 \approx 0.0772$$
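Example 17 can be verified the same way as the committee problem, by adding the hypergeometric probabilities for k = 5 through k = 8. The sketch assumes N = 28, K = 8, and n = 10 as identified in the solution; the variable names are illustrative.

```python
from math import comb

N, K, n = 28, 8, 10   # 28 volunteers, 8 not proficient, committee of 10

# k cannot exceed K = 8, the number of volunteers who are not proficient
p_at_least_five = sum(
    comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(5, K + 1)
)

print(f"{p_at_least_five:.4f}")  # 0.0772
```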
VI. Summary
A statistical experiment can be classified as a binomial experiment if the following conditions are
met:
o There are a fixed number of trials, n.
o There are only two possible outcomes, called “success” and, “failure” for each trial. The
letter p denotes the probability of a success on one trial and q denotes the probability of a
failure on one trial. The n trials are independent and are repeated using identical conditions.
The binomial probability distribution is: $P(X = k) = {}_{n}C_{k} \cdot p^{k} \cdot q^{n-k}$.
For a binomial random variable, the mean is $\mu = np$ and the standard deviation is $\sigma = \sqrt{npq}$.
Characteristics of a Poisson distribution:
o The experiment consists of counting the number of events that will occur during a specific
interval of time or in a specific distance, area, or volume.
o The probability that an event occurs in a given time, distance, area, or volume is the same.
o Each event is independent of all other events.
The probability distribution of a Poisson random variable is $P(X = k) = \frac{\lambda^{k} \cdot e^{-\lambda}}{k!}$.
A hypergeometric experiment is a statistical experiment with the following properties
You take samples from two groups.
You are concerned with a group of interest, called the first group.
You sample without replacement from the combined groups.
Each pick is not independent, since sampling is without replacement.
The outcomes of a hypergeometric experiment fit a hypergeometric probability distribution. The
random variable X is the number of items from the group of interest, and its distribution is given by
$$P(X = k) = \frac{({}_{K}C_{k}) \cdot ({}_{N-K}C_{n-k})}{{}_{N}C_{n}}$$
VII. Exercises
1. Over the years, a medical researcher has found that one out of every ten diabetic patients
receiving insulin develops antibodies against the hormone, thus, requiring a more costly form of
medication.
a. Find the probability that in the next five patients the researcher treats, none will
develop antibodies against insulin.
b. Find the probability that at least one will develop antibodies.
2. According to the Canadian census of 2006, the median annual family income for families in Nova
Scotia is $56,400. [Source: Stats Canada. www.statcan.ca] Consider a random sample of 24 Nova
Scotia households.
a. What is the expected number of households with annual incomes less than $56,400?
b. What is the standard deviation of households with incomes less than $56,400?
c. What is the probability of getting at least 18 out of the 24 households with annual
incomes under $56,400?
3. At the Fencing Center, 60% of the fencers use the foil as their main weapon. We randomly survey
25 fencers at The Fencing Center. We are interested in the number of fencers who do not use the
foil as their main weapon.
a. How many are expected not to use the foil as their main weapon?
b. Find the probability that six do not use the foil as their main weapon.
4. Rent-A-Car has five cars to rent out daily. The number of requests each day is distributed
according to a Poisson distribution with a mean of 4. Determine each of the following
probabilities:
a. None of its cars are rented
I. Introduction
In this chapter, you will study the normal distribution, the standard normal distribution, and
applications associated with them. We will continue our investigation of normal distributions to include
density curves and learn various methods for calculating probabilities from the normal density curve. We
will also calculate the probability of getting a value with a z-score between two other z-scores, by using a
reference table to look up the value for both scores and subtract them to find the difference.
We will also investigate the normal distribution; the normal distribution is the foundation for
statistical inference and will be an essential part of many of those topics in later chapters. In the
meantime, we will cover some of the types of questions that can be answered using the properties of a
normal distribution. The first examples deal with more theoretical questions that will help you master
basic understandings and computational skills, while the later problems will provide examples with real
data, or at least a real context.
Lastly, we will also approximate discrete probability distributions, specifically the binomial and Poisson
distributions, using the normal distribution.
II. Objectives
In the previous section, we learned about discrete probability distributions. We used both probability
tables and probability histograms to display these distributions. In this section, we shift our focus from
discrete to continuous random variables. We start by looking at the probability distribution of a discrete
random variable and use it to introduce our first example of a probability distribution for a continuous
random variable.
Let X be the shoe size of an adult male. X is a discrete random variable, since shoe sizes can only be
whole and half number values, nothing in between. For this example, we will consider shoe sizes from 6.5
to 15.5. So, the possible values of X are 6.5, 7.0, 7.5, 8.0, and so on, up to and including 15.5. The
probability distribution of the random variable X is given in Table 6.1.
X P(X)
6.5 0.001
7 0.003
7.5 0.007
8 0.018
8.5 0.034
9 0.054
9.5 0.08
10 0.113
10.5 0.127
11 0.134
11.5 0.122
12 0.107
12.5 0.085
13 0.052
13.5 0.032
14 0.016
14.5 0.009
15 0.004
15.5 0.002
The probability histogram to represent the random variable X is shown in Figure 6.1.
[Figure 6.1: probability histogram of X. The rectangle centered above X = 12 has an area of 0.107.]
For probability histograms, the area of the rectangle centered above each value is equal to the
corresponding probability. For example, in the preceding table, we see that the probability for X = 12 is
0.107. In the probability histogram, the rectangle centered above 12 has an area of 0.107. We write this
probability as P (X = 12) = 0.107.
Also, for all probability histograms, because the sum of the probabilities of all possible outcomes
must add up to 1, the sums of the areas of all the rectangles shown must also add up to 1.
Now we can find the probability of shoe size taking a value in any interval just by finding the area of
the rectangles over that interval. For instance, the area of the rectangles up to and including 9 shows the
probability of having a shoe size less than or equal to 9 as shown in Figure 6.2.
Just like what we did in the previous lesson, we can find this probability from the table by adding
together the probabilities for shoe sizes 6.5, 7.0, 7.5, 8.0, 8.5, and 9:
P (X ≤ 9) = 0.001 + 0.003 + 0.007 + 0.018 + 0.034 + 0.054 = 0.117
Thus, the probability that the shoe size is less than or equal to 9 is 0.117.
Also, recall that for a discrete random variable like shoe size, the probability is affected by whether
we include the end point of the interval. For example, the area – and corresponding probability – is
reduced if we consider only shoe sizes strictly less than 9 as shown in Figure 6.3.
This time when we add the probabilities from the table, we exclude the probability for shoe size 9
and just add together the probabilities for shoe sizes 6.5, 7.0, 7.5, 8.0, and 8.5:
P (X < 9) = 0.001 + 0.003 + 0.007 + 0.018 + 0.034 = 0.063
Thus, the probability that the shoe size is less than 9 is 0.063.
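The two sums above can also be checked with a short script. The following is a minimal sketch in Python; the dictionary simply copies Table 6.1, and the function name `prob_up_to` is our own choice for illustration.

```python
# Table 6.1: probability of each shoe size X (values copied from the table).
SHOE_SIZE_PROBS = {
    6.5: 0.001, 7.0: 0.003, 7.5: 0.007, 8.0: 0.018, 8.5: 0.034,
    9.0: 0.054, 9.5: 0.080, 10.0: 0.113, 10.5: 0.127, 11.0: 0.134,
    11.5: 0.122, 12.0: 0.107, 12.5: 0.085, 13.0: 0.052, 13.5: 0.032,
    14.0: 0.016, 14.5: 0.009, 15.0: 0.004, 15.5: 0.002,
}


def prob_up_to(limit, inclusive=True):
    """Add the probabilities of all sizes below `limit` (and at it, if inclusive)."""
    return sum(
        p for size, p in SHOE_SIZE_PROBS.items()
        if size < limit or (inclusive and size == limit)
    )


p_at_most_9 = prob_up_to(9.0, inclusive=True)   # P(X <= 9), endpoint included
p_below_9 = prob_up_to(9.0, inclusive=False)    # P(X < 9), endpoint excluded
total = sum(SHOE_SIZE_PROBS.values())           # all rectangles together

print(f"P(X <= 9) = {p_at_most_9:.3f}")
print(f"P(X < 9)  = {p_below_9:.3f}")
print(f"Total     = {total:.3f}")
```

The difference between the two results is exactly P (X = 9) = 0.054, which is why the endpoint matters for a discrete random variable.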
What happens to the probability histogram when we measure foot length with more precision?
When we increase the precision of the measurement, we will have a larger number of bins in our
histogram. This makes sense because each bin contains measurements that fall within a smaller interval
of values. For example, if we measure foot lengths to the nearest inch, one bin will contain measurements from 6
inches up to 7 inches. But if we measure foot lengths to the nearest half-inch, then we now have two
bins: one bin with lengths from 6 up to 6.5 inches and the next bin with lengths from 6.5 up to 7 inches.
To illustrate, let’s analyze the following probability distributions in Figure 6.4. The curve you see in
the following figures is generated by a mathematical formula to fit the shape of the probability histogram.
Figure 6. 4 Probability distribution of a foot length rounded to the nearest (a) 0.5 in; (b) 0.25 in; (c) 0.1 in. [12]
Notice that as the width of the intervals gets smaller, the probability histogram gets closer to this
curve. More specifically, the area in the histogram’s rectangles more closely approximates the area under
the curve. If we continue to reduce the size of the intervals, the curve becomes a better and better way
to estimate the probability histogram. We’ll use smooth curves like this one to represent the probability
distributions of continuous random variables. This smooth curve shown in Figure 6.5 is called a probability
density curve.
As in a probability histogram, the total area under the density curve equals 1, and the area under
the curve represents probabilities. Density curves, like probability histograms, may have any shape
imaginable as long as the total area underneath the curve is 1. Each density curve is a mathematical model
with an equation that is used to find the area underneath the curve.
To find the probability that X is in an interval, find the area above the interval and below the density
curve. For example, if the random variable X is foot length, to find the probability that a randomly chosen
male has a foot length anywhere between 10 and 12 inches or P(10 < X < 12), we simply find the area
above the interval 10 < X < 12 and below the curve. This area is shaded green in Figure 6.6.
If, for example, we are interested in P(X < 9), the probability that a randomly chosen male has a foot
length of less than 9 inches, we have to find the area shaded in green as shown in Figure 6.7.
We have seen that for a discrete random variable the endpoint of the interval changes the probability.
For example, P (X < 9) ≠ P(X ≤ 9) in a discrete probability distribution. In contrast, for a continuous random
variable the endpoint of the interval does not change the probability. For example, P (X < 9) = P(X ≤ 9).
Visually, in terms of our density curve, the area under the curve up to and including a certain point is
the same as the area up to and excluding the point. This is because there is no area over a single point.
There are infinitely many possible values for a continuous random variable, so technically, the probability
of any single value occurring is zero!
Figure 6. 8 The heights of these radish plants are continuous random variables. (Credit: Rev Stan)
Continuous random variables have many applications. Baseball batting averages, IQ scores, the
length of time a long-distance telephone call lasts, the amount of money a person carries, the length of
time a computer chip lasts, and SAT scores are just a few. The field of reliability depends on a variety of
continuous random variables.
Probability is represented by area under the curve and is given by a different function called
the cumulative distribution function (cdf). The cumulative distribution function is used to evaluate
probability as area.
There are many continuous probability distributions. When using a continuous probability distribution
to model probability, the distribution used is selected to model and fit the specific situation in the best
way.
- Standard deviation 𝜎 =
2. Exponential Distribution
- a continuous random variable (RV) that appears when we are interested in the
intervals of time between some random events, for example, the length of time
between emergency arrivals at a hospital
- notation: X∼Exp(m)
- Mean: $\mu = \frac{1}{m}$
- Standard deviation: $\sigma = \frac{1}{m}$
- Probability Density Function: $f(x) = m e^{-mx},\ x \ge 0$
- Cumulative Distribution Function: $P(X \le x) = 1 - e^{-mx}$
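As an illustration of the exponential distribution, the sketch below evaluates the density m·e^(−mx) and the cumulative probability 1 − e^(−mx) in Python. The rate m = 0.25 (one arrival every 4 minutes on average) is a made-up value chosen only for the example, and the function names are our own.

```python
import math


def exp_pdf(x, m):
    """Density f(x) = m * e^(-m x) for x >= 0, and 0 otherwise."""
    return m * math.exp(-m * x) if x >= 0 else 0.0


def exp_cdf(x, m):
    """Cumulative probability P(X <= x) = 1 - e^(-m x) for x >= 0."""
    return 1.0 - math.exp(-m * x) if x >= 0 else 0.0


m = 0.25                           # hypothetical rate (assumption for this example)
mean = 1 / m                       # mu = 1/m
stdev = 1 / m                      # sigma = 1/m
p_within_mean = exp_cdf(mean, m)   # P(X <= mu) = 1 - e^(-1)

print(f"mean = {mean}, standard deviation = {stdev}")
print(f"P(X <= {mean}) = {p_within_mean:.4f}")
```

Note that P (X ≤ μ) is about 0.632 for any rate m, because the exponent −m·(1/m) is always −1.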
3. Normal Distribution
The normal distribution is a continuous probability distribution that can be described by two
numerical descriptive measures: the mean (μ) and the standard deviation (σ). Its probability
density function is a bell-shaped curve as shown in Figure 6.11.
The normal distribution is an extremely important concept, because it occurs so often in the data we
collect from the natural world, as well as in many of the more theoretical ideas that are the foundation of
statistics. Here are some of the applications of a normal distribution:
The probability density function of a normal distribution is a rather complicated function. Do not
memorize it. It is not necessary.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
The cumulative distribution function is P (X < x). It is calculated either by a calculator or a computer,
or it is looked up in a table.
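For readers who prefer a computer to a table, the sketch below shows one possible way to evaluate the normal density and the cumulative distribution function in Python, using `NormalDist` from the standard `statistics` module. The mean of 20 and standard deviation of 5 are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Evaluate the normal density formula directly."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 20.0, 5.0            # hypothetical mean and standard deviation
dist = NormalDist(mu, sigma)     # standard-library normal distribution

peak_by_formula = normal_pdf(mu, mu, sigma)   # height of the curve at the mean
peak_by_library = dist.pdf(mu)                # same height, computed by the library
area_left_of_mean = dist.cdf(mu)              # P(X < mu): half of the total area

print(f"f(mu) by formula = {peak_by_formula:.5f}")
print(f"f(mu) by library = {peak_by_library:.5f}")
print(f"P(X < mu)        = {area_left_of_mean:.2f}")
```

The hand-written formula and the library agree, and the area to the left of the mean is 0.5, as expected from the symmetry of the curve.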
Because so many real data sets closely approximate a normal distribution, we can use the idealized
normal curve to learn a great deal about such data. With a practical data collection, the distribution will
never be exactly symmetric, so just like situations involving probability, a true normal distribution only
results from an infinite collection of data. Also, it is important to note that the normal distribution
describes a continuous random variable; that is, there are no gaps or holes, and for each value of X there is a
corresponding value of Y.
2. Center
Due to the exact symmetry of a normal curve, the center of a normal distribution, or a data set that
approximates a normal distribution, is located at the highest point of the distribution, and all the statistical
measures of central tendency (the mean, median, and mode) are equal.
It is also important to realize that this center peak divides the data into two equal parts as shown in
Figure 6.14.
3. Spread
asymptotically. In other words, it extends towards infinity in both the negative and positive directions, but it
will never touch the horizontal axis.
Because of this infinite spread, the range would not be a useful statistical measure of spread. The
most common way to measure the spread of a normal distribution is with the standard deviation, or the
typical distance away from the mean. Because of the symmetry of a normal distribution, the standard
deviation indicates how far away from the maximum peak the data will be.
A change in the standard deviation, σ, causes a change in the shape of the curve; the curve
becomes fatter or skinnier depending on σ.
A change in μ causes the graph to shift to the left or right.
To illustrate, Figure 6.16 shows two normal distributions with different standard deviations.
The normal distribution in Fig 6.16a has a smaller standard deviation, and so more of the data are
heavily concentrated around the mean than in the normal distribution in Figure 6.16b. Also, in (a) there
are fewer data values at the extremes than in (b). Because (b) has a larger standard deviation, the data
are spread farther from the mean value, with more of the data appearing in the tails.
Because of the similar shape of all normal distributions, we can measure the percentage of data
that is a certain distance from the mean no matter what the standard deviation of the data set is. Figure
6.17 shows a standard normal curve, a normal distribution with μ = 0 and σ = 1. In this case, the values
of x represent the number of standard deviations away from the mean.
The Empirical Rule states that the percentages of data in a normal distribution within 1, 2, and 3
standard deviations of the mean are approximately 68%, 95%, and 99.7%, respectively. It is commonly
known as the 68 – 95 – 99.7 Rule. To accommodate the percentages given by the Empirical Rule, there
are defined values in each of the regions to the left and to the right of the mean.
Note that the total area under a normal distribution curve is equal to 1.00, or 100%.
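The percentages in the Empirical Rule can be reproduced numerically. The following sketch, which assumes Python's standard `statistics.NormalDist`, computes the area within 1, 2, and 3 standard deviations of the mean of a standard normal curve; the helper name `area_within` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The computed areas (about 0.6827, 0.9545, and 0.9973) round to the 68%, 95%, and 99.7% quoted in the rule.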
5. z-Scores
Z-scores are related to the Empirical Rule from the standpoint of being a method of evaluating how
extreme a specific value, X, is in a data set. You can think of a z-score as the number of standard deviations
there are between a given value and the mean of the set. While the Empirical Rule allows you to associate
the first three standard deviations with the percentage of data that each σ includes, the z-score allows
you to state, as accurately as you like, just how many σ a given value is above or below the mean.
$$z = \frac{X - \mu}{\sigma}$$
Since σ is always positive, all values that are below the mean have negative z-scores, while all values
that are above the mean have positive z-scores. A z-score of zero means that the term has the same value
as the mean.
Examples:
1. On a nationwide math test, the mean was 65 and the standard deviation was 10. If Robert scored
81, what was his z-score?
Solution:
μ = 65
σ = 10
x = 81
z = (x − μ)/σ = (81 − 65)/10 = 1.6
2. What is the z-score of the price of a pair of skis that cost $247, if the mean ski price is $279, with
a standard deviation of $16?
Solution:
μ = 279
σ = 16
x = 247
z = (x − μ)/σ = (247 − 279)/16 = −2
3. What is the z-score of a 5-scoop ice cream cone if the mean number of scoops is 3, with a standard
deviation of 1 scoop?
Solution:
μ=3
σ=1
x=5
z = (x − μ)/σ = (5 − 3)/1 = 2
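The z-score formula is easy to wrap in a small function. The sketch below, written in Python with a function name of our own choosing, applies it to the three examples above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma


z_test = z_score(81, 65, 10)     # Example 1: Robert's math test score
z_skis = z_score(247, 279, 16)   # Example 2: price of the skis
z_cone = z_score(5, 3, 1)        # Example 3: 5-scoop ice cream cone

print(z_test, z_skis, z_cone)
```

A negative result, as in the ski example, indicates a value below the mean.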
VI. Finding the Area Under the Curve given the Z - Score
As we have discussed, to find the probability of a continuous random variable X in an interval, we simply
find the area above the interval and below the density curve [4]. To find the area, the following two-step
process is recommended, with the use of the Procedure Table shown. The two steps are as follows:
Figure 6. 19 Procedure Table of Finding the Area under the curve [4]
Examples:
4. Find the area under the curve that lies between 1.20 and 2.31.
Solution:
We are to find the area for 1.20 < z < 2.31. Following the two – step process:
Find the appropriate figure in the Procedure Table and follow the directions given.
The appropriate figure in the Procedure Table is 3. Thus, we need to look up both the z values
and then subtract the corresponding areas.
To find the corresponding area, we read down the left side of the table for the z-score first 2
digits (the whole number and the first number after the decimal point), then we read across the
top part of the table for the 2nd decimal place of the z-score that we are interested in. Their
intersection is the required area.
Based on the table, the area for z = 1.20 is 0.88493 while the area for z = 2.31 is 0.98956. Subtracting,
the required area is 0.98956 − 0.88493 = 0.10463.
5. Find the area under the curve that lies between -1.32 and +1.49.
Solution:
We are to find the area for -1.32 < z < 1.49. Following the two – step process:
Find the appropriate figure in the Procedure Table and follow the directions given.
The appropriate figure in the Procedure Table is 3. Thus, we need to look up both the z values
and then subtract the corresponding areas.
To find the corresponding area, we read down the left side of the table for the z-score first 2 digits
(the whole number and the first number after the decimal point), then we read across the top part of
the table for the 2nd decimal place of the z-score that we are interested in. Their intersection is the
required area.
For z = +1.49,
For z = -1.32,
Based on the table, the area for z = 1.49 is 0.93189 while the area for z = -1.32 is 0.09342. Subtracting,
the required area is 0.93189 − 0.09342 = 0.83847.
6. Find the area under the curve for z-scores greater than +0.09.
Solution:
We are to find the area for z > 0.09. Following the two – step process:
Since we are to get the area greater than 0.09, we will be shading the areas to the right of 0.09.
Find the appropriate figure in the Procedure Table and follow the directions given.
The appropriate figure in the Procedure Table is 2. Thus, we need to look up the z value and then
subtract the corresponding area from 1.
To find the corresponding area, we read down the left side of the table for the z-score first 2 digits
(the whole number and the first number after the decimal point), then we read across the top part of
the table for the 2nd decimal place of the z-score that we are interested in. Their intersection is the
required area.
Based on the table, the area for z = 0.09 is 0.53586. Thus, the required area is 1 − 0.53586 = 0.46414.
7. Find the area under the curve for z-scores less than -0.02.
Solution:
We are to find the area for z < -0.02. Following the two – step process:
Since we are to get the area less than -0.02, we will be shading the areas to the left of -0.02.
Find the appropriate figure in the Procedure Table and follow the directions given.
The appropriate figure in the Procedure Table is 1. Thus, we simply look up the z value, and use
the corresponding area.
To find the corresponding area, we read down the left side of the table for the z-score first 2 digits
(the whole number and the first number after the decimal point), then we read across the top part of
the table for the 2nd decimal place of the z-score that we are interested in. Their intersection is the
required area.
Based on the table, the area for z = -0.02 is 0.49202, which is the required area.
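The three cases of the Procedure Table (area to the left, area to the right, and area between two z-scores) can be sketched in Python as follows. This assumes the standard `statistics.NormalDist` class, whose `cdf` method returns the area to the left of a value, just like the reference table; the helper names are our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_left(z):
    """Procedure 1: area to the left of z (read directly from the table)."""
    return Z.cdf(z)


def area_right(z):
    """Procedure 2: area to the right of z (1 minus the table value)."""
    return 1.0 - Z.cdf(z)


def area_between(z1, z2):
    """Procedure 3: area between z1 and z2 (difference of two table values)."""
    return Z.cdf(z2) - Z.cdf(z1)


example_4 = area_between(1.20, 2.31)
example_5 = area_between(-1.32, 1.49)
example_6 = area_right(0.09)
example_7 = area_left(-0.02)

for name, area in [("4", example_4), ("5", example_5),
                   ("6", example_6), ("7", example_7)]:
    print(f"Example {name}: area = {area:.5f}")
```

Results may differ from a table-based answer in the last decimal place, because the table entries are themselves rounded.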
If you understand the relationship between the area under a density curve and mean, standard deviation,
and z-scores, you should be able to solve problems in which you are provided all but one of these values
and are asked to calculate the remaining value. In the previous section, we found the area under
a density curve within a specific range. What if you are asked to find a value that gives a specific area?
Examples:
8. Find the z value such that the area under the standard normal distribution curve between 0 and
the z value is 0.2123.
Solution:
We are to find the z value that would give the area of 0.2123 from z = 0. That is, A (0 < z < X) =
0.2123.
Calculate the area of the curve to the left of the unknown z score.
As you may have noticed, the areas found in our reference table are always measured to the left of a z-score, thus
we need to get the total area to the left of the unknown z-score. Because a normal distribution
curve is symmetrical, z = 0 divides the curve into 2 equal parts with an area of 0.50 each.
As shown in Figure 6.30, the total area to the left of the unknown z – score is
A = 0.5 + 0.2123 = 0.7123
Look for the required area in the body of the table. Read across that row to the left side of the table to find
the first 2 digits (the whole number and the first number after the decimal point), then read up that column to
the top of the table for the 2nd decimal place of the z-score. In case there is no exact value, you can use the
nearest area available.
The nearest area in the table is 0.71226, thus z = 0.56.
9. Find the z value to the right of the mean so that 54.78% of the area under the distribution curve
lies to the left of it.
Solution:
We are to find the z value to the right of the mean or z = 0 that would give the area of 54.78% or
0.5478. Since we are looking for a value to the right of the mean, it is a POSITIVE value. Also, the
area is to the left of the z – score, thus it moves towards the negative infinity.
Calculate the area of the curve to the left of the unknown z score.
Because the shaded area is already an area to the left of the unknown z- score, we can already
proceed to the next procedure.
A = 0.5478
Look for the required area in the body of the table. Read across that row to the left side of the table to find
the first 2 digits (the whole number and the first number after the decimal point), then read up that column to
the top of the table for the 2nd decimal place of the z-score. In case there is no exact value, you can use the
nearest area available.
The nearest area in the table is 0.54776, thus z = 0.12.
10. Find the z value to the left of the mean so that 98.87% of the area under the distribution curve
lies to the right of it.
Solution:
We are to find the z value to the left of the mean or z = 0 that would give the area of 98.87% or
0.9887. Since we are looking for a value to the left of the mean, it is a NEGATIVE value. Also, the
area is to the right of the z – score, thus it moves towards the positive infinity.
Calculate the area of the curve to the left of the unknown z score.
Because the shaded area is an area to the right of the unknown z- score and remembering
that the total area under the curve is 1, we simply subtract the given area from 1. Thus,
A = 1 – 0.9887 = 0.0113
Look for the required area in the body of the table. Read across that row to the left side of the table to find
the first 2 digits (the whole number and the first number after the decimal point), then read up that column to
the top of the table for the 2nd decimal place of the z-score. In case there is no exact value, you can use the
nearest area available.
The nearest area in the table is 0.01130, thus z = -2.28.
11. Find two z values, one positive and one negative, that are equidistant from the mean so that the
areas in the two tails add to 5%.
Solution:
We are to find two z – scores that are equidistant from the mean with a total area of 5% or 0.05.
Calculate the area of the curve to the left of each unknown z score.
Note that because the two z – scores are equidistant from the mean, the total tail area of 0.05 is split
equally between the two tails. This gives an area of 0.025 in the left tail and 0.025 in the right tail.
Figure 6. 37 Required area for the Positive z score in Example 11. [14]
For the positive z – score, we need to get the area to its left. Since the area in the right tail is 0.025
and the total area under the curve is 1, we subtract the tail area from 1,
A = 1 – 0.025 = 0.975
Figure 6. 38 Required area for the negative z score in Example 11. [14]
For the negative z – score, we again need to get the area to its left. This is simply the area of the
left tail,
A = 0.025
Look for the required area in the body of the table. Read across that row to the left side of the table to find
the first 2 digits (the whole number and the first number after the decimal point), then read up that column to
the top of the table for the 2nd decimal place of the z-score. In case there is no exact value, you can use the
nearest area available.
For the positive z – score, the area 0.97500 appears in the table, thus z = 1.96.
For the negative z – score, the area 0.02500 appears in the table, thus z = -1.96.
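Finding a z value from a given area is the reverse of a table lookup. As a sketch, Python's `statistics.NormalDist` provides `inv_cdf`, which takes the area to the left and returns the corresponding z value. The code below reworks Examples 8 to 11 this way; the helper name `z_for_left_area` is our own, and each call first converts the given area into an area to the left, as in the procedure above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve


def z_for_left_area(area):
    """Return the z value that has `area` to its left (0 < area < 1)."""
    return Z.inv_cdf(area)


# Example 8: area of 0.2123 between 0 and z, so the left area is 0.5 + 0.2123.
z_8 = z_for_left_area(0.5 + 0.2123)

# Example 9: 54.78% of the area already lies to the left of z.
z_9 = z_for_left_area(0.5478)

# Example 10: 98.87% lies to the right, so the left area is 1 - 0.9887.
z_10 = z_for_left_area(1 - 0.9887)

# Example 11: the two tails add to 5%, so each tail holds 0.025.
tail = 0.05 / 2
z_11_negative = z_for_left_area(tail)       # left tail only
z_11_positive = z_for_left_area(1 - tail)   # everything except the right tail

for label, z in [("8", z_8), ("9", z_9), ("10", z_10),
                 ("11 (negative)", z_11_negative),
                 ("11 (positive)", z_11_positive)]:
    print(f"Example {label}: z = {z:.2f}")
```

Rounded to two decimal places, the results are 0.56, 0.12, -2.28, -1.96, and 1.96.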
The normal distribution is the foundation for statistical inference. In the meantime, this section will
cover some of the types of questions that can be answered using the properties of a normal distribution.
The problems will provide examples with real data, or at least a real context. We will be using the concepts
from the two previous sections in these problems.
1. Draw a normal curve and shade the desired area that represents the
probability, proportion, or percentile.
2. Find the z – value from the table that corresponds to the desired area.
3. Calculate the X value by using the formula
𝑋 = 𝑧𝜎 + 𝜇
Examples:
12. The Information Centre of the National Health Service in Britain collects and publishes a great deal
of information and statistics on health issues affecting the population. One such comprehensive
data set tracks information about the health of children [16]. According to its statistics, in 2006,
the mean height of 12-year-old boys was 152.9 cm, with a standard deviation estimate of
approximately 8.5 cm.
a. If 12-year-old Cecil is 158 cm, approximately what percentage of all 12-year-old boys in
Britain is he taller than?
b. How tall would Cecil need to be in order to be in the top 1% of all 12-year-old boys in
Britain?
Solution:
For a, we are to find the percentage of all 12-year-old boys in Britain that Cecil is taller than. In other
words, we are to look for the probability of a randomly selected British 12-year-old boy being shorter than Cecil.
Mathematically, P (X<158 cm). Since a probability is required, we use the steps in finding the probability.
Note that the mean is 152.9 cm, thus when we plot 158 cm, it’s to the right of the mean, or at the
upper half. Because we want to obtain the probability of a height LESS than 158, we are to shade the area to
the left.
Figure 6. 41 Probability Density Curve for Example 12a [15] (shaded area: P(X<158); marked values: 152.9 cm and 158 cm)
μ = 152.9 cm
σ = 8.5 cm
X = 158 cm
Substituting we have,
$$z = \frac{X - \mu}{\sigma} = \frac{158 - 152.9}{8.5} = 0.60$$
Since the area required is to the left, we use Procedure 1 (Refer to Figure 6.19), which means that we
simply obtain the desired area from the table.
P(X<158) = 0.72575
Thus, Cecil is taller than approximately 72.6% of all 12-year-old boys in Britain.
For b, we are to find the height Cecil would need in order to be in the top 1% of all 12-year-old boys in
height. In other words, 99% of the boys are shorter than him. Since we are to find the data value, we
use the steps in finding the data value.
1. Draw a normal curve and shade the desired area that represents the probability, proportion,
or percentile.
Figure 6. 42 Probability Density Curve for Example 12b [15] (shaded area: 0.99 to the left of X)
2. Find the z – value from the table that corresponds to the desired area.
Since the area given is already to the left of the required z – score, we simply obtain the z – score
from the table.
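From the table, there is no exact value, so we use the nearest area, which is 0.99010, thus z = 2.33.
3. Calculate the X value by using the formula
X = zσ + μ = 2.33(8.5) + 152.9
X = 172.705 cm
Thus, Cecil would need to be about 173 cm tall to be in the top 1% of 12-year-old boys in Britain.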
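Both parts of Example 12 can be cross-checked by computer. The sketch below assumes Python's `statistics.NormalDist`; `cdf` gives the probability to the left of a height and `inv_cdf` gives the height that has a stated probability to its left. The variable names are our own.

```python
from statistics import NormalDist

# Heights of 12-year-old boys, using the mean and standard deviation of Example 12.
heights = NormalDist(mu=152.9, sigma=8.5)

# Part a: proportion of boys shorter than Cecil (158 cm).
p_shorter = heights.cdf(158)

# Part b: height with 99% of the boys below it (top 1%).
height_top_1_percent = heights.inv_cdf(0.99)

print(f"P(X < 158)          = {p_shorter:.5f}")
print(f"Height for top 1%   = {height_top_1_percent:.2f} cm")
```

The software works with an unrounded z value (about 2.326 for an area of 0.99), so its answer for part b, about 172.67 cm, can differ slightly from a table-based answer that rounds z to two decimal places. Either way, the height rounds to about 173 cm.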
13. Suppose yearly rainfall totals for a city in upstate New York follow a normal distribution, with
mean 20 inches and standard deviation of 5 inches. For a randomly selected year, what is the
probability that total rainfall will be in each of the following intervals?
a. Less than 12 inches
b. Greater than 25 inches
c. Between 14 and 24 inches
Solution:
For a, we are to find the probability that the total rainfall is less than 12 inches or P(X < 12 in). We use
the steps in finding the probability.
Note that the mean is 20 inches, thus when we plot 12 in, it’s to the left of the mean, or at the lower
half of the curve. Because we want to obtain the probability of rainfall LESS than 12, we are to
shade the area to the left.
Figure 6. 43 Probability Density Curve for Example 13a [15] (marked values: 12 in and 20 in)
μ = 20 in
σ = 5 in
X = 12 in
Substituting we have,
$$z = \frac{X - \mu}{\sigma} = \frac{12 - 20}{5} = -1.60$$
Since the area required is to the left, we use Procedure 1 (refer to Figure 6.19), which means that we
simply obtain the desired area from the table.
P(X<12) = 0.05480
Thus, the probability that the total rainfall is less than 12 inches in a randomly selected year is 0.0548
or 5.48%.
For b, we are to find the probability that the total rainfall is greater than 25 inches or P(X > 25 in). We
use the steps in finding the probability.
Note that the mean is 20 inches, thus when we plot 25 in, it is to the right of the mean, or at the
upper half of the curve. Because we want to obtain the probability of rainfall GREATER than 25, we
are to shade the area to the right.
Figure 6. 44 Probability Density Curve for Example 13b [15] (marked values: 20 in and 25 in)
μ = 20 in
σ = 5 in
X = 25 in
Substituting we have,
$$z = \frac{X - \mu}{\sigma} = \frac{25 - 20}{5} = 1.00$$
Since the area required is to the right, we use Procedure 2 (refer to Figure 6.19), which means that
we obtain the desired area from the table and then subtract the area from 1.
From the table, A (z = 1.00) = 0.84134. Thus,
P(X>25) = 1 − 0.84134 = 0.15866
Thus, the probability that the total rainfall is greater than 25 inches in a randomly selected year is
0.15866 or 15.87%.
For c, we are to find the probability that the total rainfall is between 14 and 24 inches or P (14< X <
24). We use the steps in finding the probability.
Note that the mean is 20 inches, thus when we plot 14 in, it is to the left of the mean, or at the lower
half of the curve, while 24 in is to the right of the mean, or at the upper half. Because we want to obtain
the probability of rainfall between the two values, we are to shade the area between them.
Figure 6. 45 Probability Density Curve for Example 13c [15] (marked values: 14 in, 20 in, and 24 in)
μ = 20 in
σ = 5 in
X1 = 14 in
X2 = 24 in
$$z_1 = \frac{X_1 - \mu}{\sigma} = \frac{14 - 20}{5} = -1.20$$
$$z_2 = \frac{X_2 - \mu}{\sigma} = \frac{24 - 20}{5} = 0.80$$
-1.20 < z < 0.80
Since the area required is between two values, we use Procedure 3 (refer to Figure 6.19), which means
that we obtain the desired areas from the table and then subtract.
From the table, A (z = -1.20) = 0.11507 and A(z = 0.80) = 0.78814. Thus,
P (14 < X < 24) = 0.78814 − 0.11507 = 0.67307
Thus, the probability that the total rainfall is between 14 and 24 inches in a randomly selected year is
0.67307 or 67.31%.
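The three rainfall probabilities of Example 13 can be reproduced with a few lines of Python. This is a minimal sketch assuming `statistics.NormalDist`, which handles the conversion to z-scores internally; the variable names are our own.

```python
from statistics import NormalDist

# Yearly rainfall in inches, using the mean and standard deviation of Example 13.
rainfall = NormalDist(mu=20, sigma=5)

p_less_than_12 = rainfall.cdf(12)                       # a. area to the left
p_greater_than_25 = 1 - rainfall.cdf(25)                # b. area to the right
p_between_14_and_24 = rainfall.cdf(24) - rainfall.cdf(14)  # c. area in between

print(f"a. P(X < 12)      = {p_less_than_12:.5f}")
print(f"b. P(X > 25)      = {p_greater_than_25:.5f}")
print(f"c. P(14 < X < 24) = {p_between_14_and_24:.5f}")
```

The three lines correspond to Procedures 1, 2, and 3 of the Procedure Table, respectively.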
14. The final exam scores in a statistics class were normally distributed with a mean of 63 and a
standard deviation of five. Find the 90th percentile.
Solution:
We are to find the 90th percentile, that is, the score x that has 90% of the scores below x and 10%
of the scores above x. Since we are to find X, we use the steps in finding the data value.
1. Draw a normal curve and shade the desired area that represents the probability, proportion,
or percentile.
Figure 6. 46 Probability Density Curve for Example 14 [15] (shaded area: 0.90 to the left of X)
2. Find the z – value from the table that corresponds to the desired area.
Since the area given is already to the left of the required z – score, we simply obtain the z – score
from the table.
From the table, there is no exact value, so we just use the nearest which is 0.89973, thus z = 1.28.
μ = 63
σ=5
X = zσ + μ = 1.28(5) + 63
X = 69.4
Thus, the 90th percentile is a score of about 69.4.
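The percentile calculation of Example 14 can be sketched in Python in two ways: with the z value read from the table (z = 1.28) and with the unrounded z value that `statistics.NormalDist.inv_cdf` uses internally. The variable names are our own.

```python
from statistics import NormalDist

# Final exam scores, using the mean and standard deviation of Example 14.
scores = NormalDist(mu=63, sigma=5)

# Table-based approach: X = z * sigma + mu with the z value from the table.
z_from_table = 1.28
x_from_table = z_from_table * scores.stdev + scores.mean

# Software approach: ask directly for the score with 90% of the area to its left.
x_from_software = scores.inv_cdf(0.90)

print(f"90th percentile (table z = 1.28): {x_from_table:.1f}")
print(f"90th percentile (software):       {x_from_software:.2f}")
```

The two results agree to one decimal place; the small difference comes from rounding z to two decimal places in the table.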
Many real-life situations involve binomial probabilities, as we saw in prior lessons on binomial
experiments. In fact, even many questions that don’t appear binomial at first can be formatted so that
they are, allowing the probability of success or failure of a given study to be calculated as a binomial
probability. Unfortunately, if the outcomes of interest span a wide range of possible values, the
calculation can become very burdensome.
To illustrate, suppose you were completing a multiple-choice test, and you are worried that you don’t
know the information well enough. If there are 75 questions, each with 4 answers, what is the probability
that you would get at least 60 correct just by guessing randomly?
You could probably answer this question using binomial probability, but it would be quite a
calculation, requiring you to individually calculate the probability of getting 60 correct, adding it to the
probability of getting 61 correct, and so on, all the way up to 75!
The good news is that there is another way to approximate the probability of success, and you can
see what it is by comparing the following graphs. The first graph displays the probability of getting various
numbers of heads over 100 flips of a fair coin, in other words, the distribution of a binomial random
variable with P(success)=.50. The second graph is a normal distribution. Notice any similarities?
They are extremely similar in shape, in fact, if you follow a “rule of thumb”, you can use a normal
distribution to estimate the results of a binomial distribution with quite acceptable accuracy.
The rule of thumb for knowing when the normal distribution will provide a good
approximation of a binomial distribution with the same mean and standard deviation
is.
where:
Learning Activity 6
(in this case, 7.5 to 8.5) must be used. Hence, when you employ a normal distribution to approximate the
binomial, you must use the boundaries of any specific value X as they are shown in the binomial
distribution. Table 6.2 summarizes the correction factors used in Binomial Approximation.
Table 6. 2 Table of Correction for Continuity
Binomial Normal
When finding: Use:
P (X = a) P (a ─ 0.5 < X < a + 0.5)
P (X ≥ a) P (X > a ─ 0.5)
P (X > a) P (X > a + 0.5)
P (X ≤ a) P (X < a + 0.5)
P (X < a) P (X < a ─ 0.5)
There are two major reasons to employ a correction for continuity adjustment here.
First, recall that a discrete random variable can take on only specified values while a continuous random
variable can take on any values within a continuum or interval around those specified values. Hence, when
using the normal distribution to approximate the binomial or the Poisson distributions, more accurate
approximations of the probabilities are likely to be obtained if a correction for continuity adjustment is
employed.
Second, recall that with a continuous distribution (such as the normal), the probability of obtaining a
particular value of a random variable is zero. On the other hand, when the normal distribution is used to
approximate a discrete distribution, a correction for continuity adjustment can be employed so that the
probability of a specific value of the discrete distribution can be approximated.
1. Using the rule of thumb, check to see whether the normal approximation can
be utilized.
2. Find the mean μ and the standard deviation σ.
$$\mu = n \cdot p \qquad \sigma = \sqrt{n \cdot p \cdot q}$$
3. Write the problem in probability notation, using X.
4. Rewrite the notation by using the continuity correction factor and show the
corresponding area under the normal distribution.
5. Convert the values of X to z values.
$$z = \frac{X - \mu}{\sigma}$$
6. Find the corresponding areas using the reference table.
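Steps 2 to 6 above can be sketched as a single Python function for the "at least" case, P (X ≥ a), which Table 6.2 rewrites as P (X > a − 0.5). The function name and the coin-flip question (at least 60 heads in 100 flips of a fair coin) are our own illustration; step 1, the rule-of-thumb check, is not coded here and should still be done before relying on the approximation.

```python
import math
from statistics import NormalDist


def normal_approx_at_least(n, p, a):
    """Approximate the binomial probability P(X >= a) with a normal curve."""
    q = 1.0 - p
    mu = n * p                      # step 2: mean of the binomial
    sigma = math.sqrt(n * p * q)    # step 2: standard deviation of the binomial
    boundary = a - 0.5              # step 4: continuity correction for P(X >= a)
    z = (boundary - mu) / sigma     # step 5: convert the boundary to a z value
    return 1.0 - NormalDist().cdf(z)   # step 6: area to the right of z


# Hypothetical question: at least 60 heads in 100 flips of a fair coin.
p_at_least_60_heads = normal_approx_at_least(n=100, p=0.5, a=60)
print(f"P(X >= 60) is approximately {p_at_least_60_heads:.4f}")
```

Here μ = 50 and σ = 5, so the corrected boundary 59.5 corresponds to z = 1.90 and the area to its right is about 0.0287. The other rows of Table 6.2 would only change the `boundary` line and the direction of the area.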
Examples:
15. Ciara works in a production plant. Due to the balance of speed and accuracy in production, each
part off the line has a 98.8% probability of defect free production. What is the probability that
Ciara will produce at least 990 parts without a defect in a 1000-part run?
Solution:
We are to find the probability that Ciara will produce at least 990 parts without a defect in a 1000-
part run. Thus, our event of success is “without a defect”, so we have the following given:
p = 98.8% = 0.988
q = 1 – 0.988 = 0.012
n = 1000
x = at least 990
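1. Using the rule of thumb, check to see whether the normal approximation can be
utilized.
$n \cdot p = 1000(0.988) = 988 > 5$
$n \cdot q = 1000(0.012) = 12 > 5$
Since both conditions are satisfied, we can therefore use the approximation.
2. Find the mean μ and the standard deviation σ.
$\mu = n \cdot p = 1000(0.988) = 988$
$\sigma = \sqrt{n \cdot p \cdot q} = \sqrt{1000(0.988)(0.012)} = 3.44$
3. Write the problem in probability notation, using X.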
The problem states that we are to find “the probability that Ciara will produce at least
990 parts without a defect in a 1000-part run”. Writing it in a probability notation we
have,
𝑃(𝑋 ≥ 990)
4. Rewrite the notation by using the continuity correction factor and show the
corresponding area under the normal distribution.
P(X > 989.5)
Figure 6.48 Probability Density Curve for Example 15 [15] (horizontal axis marked at 988 and 989.5)
Substituting we have,
$z = \dfrac{X - \mu}{\sigma} = \dfrac{989.5 - 988}{3.44} = 0.4360$
Since the area required is to the right, we use Procedure 2 (refer to Figure 6.19), which
means that we obtain the desired area from the table and subtract it from 1.
Thus, the probability that Ciara will produce at least 990 parts without a defect in a 1000-part run is
approximately 33%.
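The six steps of the procedure can also be checked numerically. The following is a minimal Python sketch, not part of the original module, for probabilities of the form P(X ≥ a). The function names `normal_cdf` and `binomial_at_least_normal` are illustrative choices, and `math.erf` from the standard library stands in for the reference table.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z (replaces the z-table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_at_least_normal(n, p, a):
    """Approximate P(X >= a) for X ~ Binomial(n, p) using P(X > a - 0.5)."""
    q = 1.0 - p
    # Step 1: rule of thumb used in this module (n*p > 5 and n*q > 5).
    if n * p <= 5 or n * q <= 5:
        raise ValueError("rule of thumb not satisfied")
    # Step 2: mean and standard deviation.
    mu = n * p
    sigma = math.sqrt(n * p * q)
    # Steps 4 and 5: apply the continuity correction, then convert to a z value.
    z = (a - 0.5 - mu) / sigma
    # Step 6: area to the right of z.
    return 1.0 - normal_cdf(z)


# Example 15: n = 1000, p = 0.988, at least 990 defect-free parts.
example_15 = binomial_at_least_normal(1000, 0.988, 990)
print(f"P(X >= 990) is approximately {example_15:.4f}")
```

Because the sketch does not round z to two decimal places, its result can differ slightly from the table-based answer of about 33%.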
16. Suppose you were completing a multiple-choice test, and you are worried that you don’t know
the information well enough. If there are 75 questions, each with 4 answers, what is the
probability that you would get at least 60 correct just by guessing randomly?
Solution:
We are to find the probability that you would get at least 60 correct just by guessing randomly
Thus, our event of success is “getting a correct answer”, so we have the following given:
p = 1/4 = 0.25 ---- “each with 4 answers”
q = 1 – 0.25 = 0.75
n = 75
x = at least 60
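1. Using the rule of thumb, check to see whether the normal approximation can be utilized.
$n \cdot p = 75(0.25) = 18.75 > 5$
$n \cdot q = 75(0.75) = 56.25 > 5$
Since both conditions are satisfied, we can therefore use the approximation.
2. Find the mean μ and the standard deviation σ.
$\mu = n \cdot p = 75(0.25) = 18.75$
$\sigma = \sqrt{n \cdot p \cdot q} = \sqrt{75(0.25)(0.75)} = 3.75$
3. Write the problem in probability notation, using X.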
The problem states that we are to find “the probability that you would get at least 60
correct just by guessing randomly”. Writing it in a probability notation we have,
𝑃(𝑋 ≥ 60)
4. Rewrite the notation by using the continuity correction factor and show the
corresponding area under the normal distribution.
P(X > 59.5)
Figure 6.49 Probability Density Curve for Example 16 [15] (horizontal axis marked at 18.75 and 59.5)
Substituting we have,
$z = \dfrac{X - \mu}{\sigma} = \dfrac{59.5 - 18.75}{3.75} = 10.8667$
Since the area required is to the right, we use Procedure 2 (refer to Figure 6.19), which
means that we obtain the desired area from the table and subtract it from 1.
Because the table has no z value greater than 3.99, we can take the area to the left of
z = 10.8667 to be so close to 1 that it is effectively 1. Thus,
P(X > 59.5) = 1 – 1 ≈ 0.
Thus, the probability that you would get at least 60 correct just by guessing randomly is
approximately 0%. This tells us that we should not rely on guessing, we must study! :D
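As a cross-check on the "approximately 0%" conclusion, the exact binomial probability can be summed term by term. This is a minimal sketch that is not part of the original module; the function name `binomial_tail_exact` is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_tail_exact(n, p, a):
    """Exact P(X >= a) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(a, n + 1)
    )


# Example 16: 75 questions, 4 choices each, at least 60 correct by guessing.
exact_16 = binomial_tail_exact(75, 0.25, 60)
print(f"Exact P(X >= 60) = {exact_16:.3e}")
```

The exact value is a tiny positive number, which agrees with the normal approximation's answer of essentially zero.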
17. A magazine reported that 6% of American drivers read the newspaper while driving. If 300
drivers are selected at random, find the probability that exactly 25 say they read the newspaper
while driving.
Solution:
We are to find the “probability that exactly 25 say they read the newspaper while driving”.
Thus, our event of success is “read the newspaper while driving”, so
we have the following given:
p = 6% = 0.06
q = 1 – 0.06 = 0.94
n = 300
x = 25
1. Using the rule of thumb, check to see whether the normal approximation can be utilized.
𝑛 ∙ 𝑝 = 300(0.06) = 18 > 5
𝑛 ∙ 𝑞 = 300(0.94) = 282 > 5
Since both conditions are satisfied, we can therefore use the approximation.
𝜇 = 𝑛 ∙ 𝑝 = 300(0.06) = 18
The problem states that we are to find “probability that exactly 25 say they read the
newspaper while driving”. Writing it in a probability notation we have,
𝑃(𝑋 = 25)
4. Rewrite the notation by using the continuity correction factor and show the
corresponding area under the normal distribution.
Referring to table 6.2, the binomial probability of 𝑃(𝑋 = 𝑎) is corrected to P (a ─ 0.5 < X
< a + 0.5) . Thus, we will be utilizing,
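P(24.5 < X < 25.5)
[Figure: probability density curve for Example 17, horizontal axis marked at 24.5 and 25.5]
Converting both boundaries to z values, with $\sigma = \sqrt{n \cdot p \cdot q} = \sqrt{300(0.06)(0.94)} = 4.11$, we have,
$z_1 = \dfrac{24.5 - 18}{4.11} = 1.58 \qquad z_2 = \dfrac{25.5 - 18}{4.11} = 1.82$
Since the area required is between two z-scores, we use Procedure 3 (refer to Figure
6.19), which means that we obtain the area from the table for each z-score
and subtract the smaller area from the larger one: 0.9656 − 0.9429 = 0.0227.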
Thus, the probability that exactly 25 drivers read the newspaper while driving is 2.27%.
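The "exactly a" case of Table 6.2 can be sketched in the same way. The snippet below is an illustration rather than the module's own method: `binomial_exactly_normal` is a hypothetical helper name, and `math.erf` replaces the table lookup.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_exactly_normal(n, p, a):
    """Approximate P(X = a) for X ~ Binomial(n, p) by P(a - 0.5 < X < a + 0.5)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    z_low = (a - 0.5 - mu) / sigma
    z_high = (a + 0.5 - mu) / sigma
    # Procedure 3: area between two z-scores.
    return normal_cdf(z_high) - normal_cdf(z_low)


# Example 17: n = 300, p = 0.06, exactly 25 drivers.
example_17 = binomial_exactly_normal(300, 0.06, 25)
print(f"P(X = 25) is approximately {example_17:.4f}")
```

The unrounded z values give a result very close to the table-based 2.27%.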
Just as with the binomial distribution, we can use the normal distribution to approximate the
Poisson distribution. When is the normal distribution a good approximation? As a rule of thumb, we can
use it when λ is greater than about 5, provided that an appropriate continuity correction is applied.
The approximating normal distribution has mean μ = λ and standard deviation σ = √λ.
Examples:
18. The annual number of earthquakes registering at least 2.5 on the Richter Scale and having an
epicenter within 40 miles of downtown Memphis follows a Poisson distribution with mean 6.5.
What is the probability that at least 9 such earthquakes will strike next year? (Adapted from An
Introduction to Mathematical Statistics, by Richard J. Larsen and Morris L. Marx.)
Solution:
We are to find the “probability that at least 9 such earthquakes will strike
next year”. We have the following given:
λ = 6.5
1. Using the rule of thumb, check to see whether the normal approximation can be utilized.
𝜆 = 6.5 > 5
Since the condition is satisfied, we can therefore use the approximation.
𝜇 = 𝜆 = 6.5
𝜎 = √𝜆 = √6.5 = 2.55
3. Write the problem in probability notation, using X.
The problem states that we are to find “probability that at least 9 such earthquakes will
strike next year”. Writing it in a probability notation we have,
𝑃(𝑋 ≥ 9)
4. Rewrite the notation by using the continuity correction factor and show the
corresponding area under the normal distribution.
Referring to table 6.2, the Poisson probability of 𝑃(𝑋 ≥ 𝑎) is corrected to P (X > a ─ 0.5).
Thus, we will be utilizing,
P(X > 8.5)
Figure 6.51 Probability Density Curve for Example 18 [15] (horizontal axis marked at 6.5 and 8.5)
Substituting, we have,
$z = \dfrac{X - \mu}{\sigma} = \dfrac{8.5 - 6.5}{2.55} = 0.78$
Since the area required is to the right of the z-score, we use Procedure 2 (refer to Figure
6.19), which means that we obtain the desired area from the table and subtract it from 1.
Thus, the probability that at least 9 such earthquakes will strike next year is 21.77%.
19. Suppose that at a certain automobile plant the average number of work stoppages per day due
to equipment problems during the production process is 12.0. What is the approximate
probability of having 15 or fewer work stoppages due to equipment problems on any given day?
Solution:
We are to find the “probability of having 15 or fewer work stoppages due to
equipment problems on any given day”. We have the following given:
λ = 12
1. Using the rule of thumb, check to see whether the normal approximation can be utilized.
𝜆 = 12 > 5
Since the condition is satisfied, we can therefore use the approximation.
𝜇 = 𝜆 = 12
𝜎 = √𝜆 = √12 = 3.46
3. Write the problem in probability notation, using X.
The problem states that we are to find “the probability that 15 or fewer work stoppages due
to equipment problems on any given day”. The terms “15 or fewer” signifies that we need to
find probabilities less than or equal to 15. Writing it in a probability notation we have,
𝑃(𝑋 ≤ 15)
4. Rewrite the notation by using the continuity correction factor and show the corresponding
area under the normal distribution.
Referring to Table 6.2, the Poisson probability of P(X ≤ a) is corrected to P(X < a + 0.5).
Thus, we will be utilizing,
P(X < 15.5)
Figure 6.52 Probability Density Curve for Example 19 [15] (horizontal axis marked at 12 and 15.5)
Substituting, we have,
$z = \dfrac{X - \mu}{\sigma} = \dfrac{15.5 - 12}{3.46} = 1.01$
Since the area required is to the left of the z -score, we use Procedure 1 (refer to Figure
6.19), which means that we obtain the desired area from the table.
Thus, the approximate probability of having 15 or fewer work stoppages due to equipment problems
on any given day is 84.38%.
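To see how close the normal approximation comes to the Poisson distribution it replaces, both can be computed side by side. This is a minimal sketch under the assumptions stated in the comments; the helper names `poisson_cdf_exact` and `poisson_cdf_normal` are illustrative and do not come from the module.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_cdf_exact(lam, k):
    """Exact P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def poisson_cdf_normal(lam, k):
    """Normal approximation to P(X <= k), continuity-corrected to P(X < k + 0.5)."""
    return normal_cdf((k + 0.5 - lam) / math.sqrt(lam))


# Example 18: P(X >= 9) = 1 - P(X <= 8), with lambda = 6.5.
approx_18 = 1.0 - poisson_cdf_normal(6.5, 8)
exact_18 = 1.0 - poisson_cdf_exact(6.5, 8)

# Example 19: P(X <= 15), with lambda = 12.
approx_19 = poisson_cdf_normal(12.0, 15)
exact_19 = poisson_cdf_exact(12.0, 15)

print(f"Example 18: normal {approx_18:.4f}, exact {exact_18:.4f}")
print(f"Example 19: normal {approx_19:.4f}, exact {exact_19:.4f}")
```

Note that 1 − P(X ≤ 8) with the correction is the same as P(X > 8.5), which matches the row for P(X ≥ a) in Table 6.2.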
XI. Summary
The probability density function (pdf) is used to describe probabilities for continuous random
variables. The area under the density curve between two points corresponds to the probability
that the variable falls between those two values. In other words, the area under the density
curve between points a and b is equal to P(a<x<b).
The cumulative distribution function (cdf) gives the probability as an area. If X is a continuous
random variable, the probability density function (pdf), f(x), is used to draw the graph of the
probability distribution. The total area under the graph of f(x) is one. The area under the graph
of f(x) and between values a and b gives the probability P(a<x<b).
A normal distribution is a perfectly symmetric, mound-shaped distribution that appears in many
practical and real data sets. It is an especially important foundation for making conclusions, or
inferences, about data.
A standard normal distribution is a normal distribution for which the mean is 0 and the standard
deviation is 1.
A density curve is an idealized representation of a distribution in which the area under the curve
is defined as 1, or in terms of percentages, a probability of 100%. A normal density curve is
simply a density curve for a normal distribution. Normal density curves have two inflection
points, which are the points on the curve where it changes concavity. These points correspond
to the points in the normal distribution that are exactly 1 standard deviation away from the
mean.
The Empirical Rule is the name given to the observation that approximately 68% of a normally
distributed data set is within 1 standard deviation of the mean, about 95% is within 2 standard
deviations of the mean, and about 99.7% is within 3 standard deviations of the mean. Some
refer to this as the 68-95-99.7 Rule.
A z-table often provides the area under the standard normal density curve between the mean
and a specific z-score.
A z-score is a measure of the number of standard deviations a specific data value is away from
the mean. It tells you how many standard deviations x is above (greater than) or below (less
than) µ.
Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC
BY: Attribution
Why It Matters: Normal Distribution. Authored by Paul Jones. Provided by Columbia Basin
College. License: CC BY: Attribution
Introductory Statistics. Authored by: Barbara Illowski, Susan Dean. Provided by: OpenStax. Located at:
http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44. License: CC BY:
Attribution. License Terms: Download for free at http://cnx.org/contents/30189442-6998-4686-ac05-
ed152b91b9de@17.44.
XIII. Exercises
1. The grades on a statistics mid-term for a high school are normally distributed, with μ = 81 and σ =
6.3. Calculate the z-scores for each of the following exam grades: 65, 83, 93, 100.
2. Assume that the mean weight of 1-year-old girls in the USA is normally distributed, with a mean
of about 9.5 kilograms and a standard deviation of approximately 1.1 kilograms. Calculate the z-
scores for each of the following weights in kilograms: 8.4, 7.3, and 11.7.
3. Find the area under the curve that lies
a. to the right of z = 1.84
b. between z = −1.97 and z = 0.86
c. between z = 1.62 and z = -1.35
d. to the right of z= -1.14
e. to the left of z = 2.09
4. Find the z value to the left of the mean so that 82.12% of the area under the distribution curve
lies to the right of it.
5. Find the z value to the right of the mean so that 88.10% of the area under the distribution curve
lies to the left of it.
6. Find two z values so that 48% of the middle area is bounded by them.
7. The 2007 AP Statistics examination scores were not normally distributed, with μ=2.8 and σ=1.34.
What is the approximate z-score that corresponds to an exam score of 5?
8. Suppose that the wrapper of a certain candy bar lists its weight as 2.13 ounces. Naturally, the
weights of individual bars vary somewhat. Suppose that the weights of these candy bars vary
according to a normal distribution, with μ=2.2 ounces and σ=0.04 ounces.
a. What proportion of the candy bars weigh less than the advertised weight?
b. A candy bar of what weight would be heavier than all but 1% of the candy bars out there.
9. The heights of women ages 18 to 24 are approximately normally distributed with mean 64.5
inches and standard deviation 2.5 inches. What percent of women in this age group are taller than
62 inches?
10. For a medical study, a researcher wishes to select people in the middle 60% of the population
based on blood pressure. If blood pressure readings are normally distributed, and the mean
systolic blood pressure is 120 and the standard deviation is 8, find the upper and lower readings
that would qualify people to participate in the study.
11. A certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard
deviation of 2 ohms. If the resistance follows a normal distribution and can be measured to any
degree of accuracy, what percentage of resistors will have a resistance exceeding 43 ohms?
12. Scores on an intelligence test for the age group 20 to 34 are approximately normally distributed
with mean 110 and standard deviation 25. About what percent of people in this age group have
scores
a. Above 110?
b. Below 85?
c. What percent of people ages 20 – 34 have IQs below 100?
d. If only 1% of people in this age group have IQs higher than Elizabeth, what is Elizabeth’s
IQ?
13. Karen is playing a game of chance with a probability of success of 33%. If she plays the game 43
times, what is the probability that she will win more than 19 times?
14. Sharon can’t decide between two guys that she likes. She picks a daisy from the garden and
decides to play “I like Greg more; I like Stan more” with the petals. The chance of the last petal
being “I like Greg more” is 67%. She decides to go through this process with 48 daisies. What is
the probability that she will select Greg more than 36 times?
15. Steve has created a “grab the marble” game. If you grab a green marble you get a dollar, if you
grab a yellow marble you get nothing. There are 31 green marbles, and 69 yellow ones. You decide
to reach into the bag and grab a marble 39 times, replacing the marble you grab each time. What
is the probability that you will win more than 9 dollars?
16. Cars arrive at Kenny’s Car Wash at a rate of nine per half-hour. What is the approximate
probability that in any given half-hour period at least three cars arrive?
17. On average, 10.0 persons per minute are waiting for an elevator in the lobby of a large office
building between the hours of 8 A.M. and 9 A.M. What is the approximate probability that in any
1-minute period at most four persons are waiting?
I. Introduction
In real life, we are often interested in several random variables that are related to each other. For
example, suppose that we choose a random family, and we would like to study the number of people in
the family, the household income, the ages of the family members, etc. Each of these is a random variable,
and we suspect that they are dependent. In this chapter, we develop tools to study joint distributions of
random variables. The concepts are similar to what we have seen so far. The only difference is that instead
of one random variable, we consider two or more. In this chapter, we will focus on two random variables.
We will first discuss joint distributions of discrete random variables and then extend the results to
continuous random variables.
II. Objectives
1. Explain the joint probability mass function, probability density function, and cumulative
distribution of two random variables.
2. Calculate the probabilities and marginals from a joint probability mass function or probability
density function.
3. Calculate the conditional probability distribution of joint probability mass function and
probability density function.
In science and in real life, we are often interested in two (or more) random variables at the same
time. For example, we might measure the height and weight of giraffes, or the IQ and birthweight of
children, or the frequency of exercise and the rate of heart disease in adults, or the level of air pollution
and rate of respiratory illness in cities, or the number of Facebook friends and the age of Facebook
members.
Think: What relationship would you expect in each of the five examples above? Why?
In such situations the random variables have a joint probability distribution that allows us to
compute probabilities of events involving both variables and understand the relationship between the
variables. This is simplest when the variables are independent. There are two types of joint probability
distribution: Joint Probability Mass function, and the Joint Probability Density Function.
If X and Y are discrete random variables, this distribution can be described with a joint probability
mass function.
If X and Y are continuous random variables, this distribution can be described with a joint
probability density function.
If we have two discrete random variables X and Y, and we would like to study them jointly, we
define the joint probability mass function (pmf) as follows:
Suppose X and Y are two discrete random variables and that X takes values $\{x_1, x_2, \ldots, x_n\}$,
Y takes values $\{y_1, y_2, \ldots, y_m\}$ and the ordered pair (X, Y) takes values in the product
$\{(x_1, y_1), (x_1, y_2), \ldots, (x_n, y_m)\}$. The joint probability mass function of X and Y is
$p(x, y) = P(X = x, Y = y) = P((X = x) \text{ and } (Y = y))$
X \ Y   y1           y2           …   ym
x1      p(x1, y1)    p(x1, y2)    …   p(x1, ym)
x2      p(x2, y1)    p(x2, y2)    …   p(x2, ym)
…       …            …            …   …
xn      p(xn, y1)    p(xn, y2)    …   p(xn, ym)
When there are two random variables of interest, we also use the term bivariate probability
distribution or bivariate distribution to refer to the joint distribution. Here are some examples of joint
probability mass function:
Examples:
1. Roll two dice. Let X be the value on the first die and let Y be the value on the second die. Construct
the joint probability distribution table.
Solution:
Let X be the value on the first die. The sample space is {1, 2, 3, 4, 5, 6}
Let Y be the value on the second die. The sample space is {1, 2, 3, 4, 5, 6}
Thus, the sample space of the joint event for random variable (X,Y) is {11, 12, 13, 14, 15,
16, 21, 22, 23, 24, 25, 26, 31, 32, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 51, 52, 53, 54, 55,
56, 61, 62, 63, 64, 65, 66}.
The total number of outcomes is 36, thus each will have a probability of 1/36.
Plot in a Joint Probability table choosing X as the first column, and Y as the first row.
X \ Y   1      2      3      4      5      6
1       1/36   1/36   1/36   1/36   1/36   1/36
2       1/36   1/36   1/36   1/36   1/36   1/36
3       1/36   1/36   1/36   1/36   1/36   1/36
4       1/36   1/36   1/36   1/36   1/36   1/36
5       1/36   1/36   1/36   1/36   1/36   1/36
6       1/36   1/36   1/36   1/36   1/36   1/36
2. Roll two dice. Let X be the value on the first die and let Y be the total on both dice. Construct the
probability distribution table.
Solution:
Let X be the value on the first die. The sample space is {1, 2, 3, 4, 5, 6}.
Let Y be the total on both dice. The sample space is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Now for the possible outcomes of the joint event for random variable we have the
following:
If the first value of the die is {1}, the possible totals are {2, 3, 4, 5, 6, 7}.
If the first value of the die is {2}, the possible totals are {3, 4, 5, 6, 7, 8}.
If the first value of the die is {3}, the possible totals are {4, 5, 6, 7, 8, 9}.
If the first value of the die is {4}, the possible totals are {5, 6, 7, 8, 9, 10}.
If the first value of the die is {5}, the possible totals are {6, 7, 8, 9, 10, 11}.
If the first value of the die is {6}, the possible totals are {7, 8, 9, 10, 11, 12}.
The total number of outcomes is 36, thus each will have a probability of 1/36.
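Plot in a Joint Probability table choosing X as the first column, and Y as the first row.
Combinations that cannot occur have probability 0.
X \ Y   2      3      4      5      6      7      8      9      10     11     12
1       1/36   1/36   1/36   1/36   1/36   1/36   0      0      0      0      0
2       0      1/36   1/36   1/36   1/36   1/36   1/36   0      0      0      0
3       0      0      1/36   1/36   1/36   1/36   1/36   1/36   0      0      0
4       0      0      0      1/36   1/36   1/36   1/36   1/36   1/36   0      0
5       0      0      0      0      1/36   1/36   1/36   1/36   1/36   1/36   0
6       0      0      0      0      0      1/36   1/36   1/36   1/36   1/36   1/36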
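The joint probability tables of Examples 1 and 2 can be generated by enumerating the 36 equally likely outcomes. The following is a minimal sketch, not taken from the module; the dictionary names `joint_example_1` and `joint_example_2` are illustrative, and exact fractions are used so the entries read as 1/36.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

joint_example_1 = {}  # (X, Y) = (first die, second die)
joint_example_2 = {}  # (X, Y) = (first die, total on both dice)
for first, second in outcomes:
    key_1 = (first, second)
    key_2 = (first, first + second)
    joint_example_1[key_1] = joint_example_1.get(key_1, Fraction(0)) + prob
    joint_example_2[key_2] = joint_example_2.get(key_2, Fraction(0)) + prob

# Pairs that never occur are simply absent, which means probability 0.
print(joint_example_1[(3, 4)])
print(joint_example_2[(3, 7)])
print(joint_example_2.get((1, 8), Fraction(0)))
```

Summing the values of either dictionary gives 1, which is the second property of a joint pmf.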
1. The joint pmf is a number (either a fraction or decimal) between and including 0 and 1.
0 ≤ 𝑃(𝑥, 𝑦) ≤ 1
2. The sum of the probabilities of all the outcomes in a sample space is 1.
$\sum_{i} \sum_{j} p(x_i, y_j) = 1$
Examples:
3. For Example 1, find the joint cumulative distribution function F(2, 3).
Solution:
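Since we are to calculate the joint cumulative distribution F(2, 3) = P(X ≤ 2, Y ≤ 3), we consider the values of x
that are less than or equal to 2, that is {1, 2}, and the values of y that are less than or equal to 3, that is {1, 2, 3}. These
give us the following possible combinations of (X, Y): (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3).
Substituting, and recalling that each outcome has probability 1/36, we have,
$F(2, 3) = 6 \left(\frac{1}{36}\right) = \frac{1}{6}$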
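The joint cumulative distribution function adds up every joint probability whose coordinates do not exceed the given point. A minimal sketch for the two-dice table of Example 1 follows; the function name `joint_cdf` is an illustrative choice, not something defined in the module.

```python
from fractions import Fraction

# Joint pmf of Example 1: two fair dice, every pair has probability 1/36.
joint_pmf = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}


def joint_cdf(pmf, x, y):
    """F(x, y) = P(X <= x, Y <= y): add every p(xi, yj) with xi <= x and yj <= y."""
    return sum(
        (p for (xi, yj), p in pmf.items() if xi <= x and yj <= y),
        Fraction(0),
    )


f_2_3 = joint_cdf(joint_pmf, 2, 3)
print(f_2_3)
```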
$p_X(x) = \sum_{j} p(x, y_j)$
$p_Y(y) = \sum_{i} p(x_i, y)$
Examples:
4. Consider the probability experiment where we toss a fair coin three times and record the
sequence of heads and tails. We let random variable X denote the number of heads obtained. We
also let random variable Y denote the winnings earned in a single play of a game with the following
rules, based on the outcomes of the probability experiment:
player wins $1 if first H occurs on the first toss
player wins $2 if first H occurs on the second toss
player wins $3 if first H occurs on the third toss
player loses $1 if no H occur
The following joint pmf is represented using a table:
Y
X -1 1 2 3
0 1/8 0 0 0
1 0 1/8 1/8 1/8
2 0 2/8 1/8 0
3 0 1/8 0 0
a. Find P(1, 2).
b. Find F(1,1).
c. Find the marginal probability mass function of X and Y.
Solution:
a. For a, we are to find the probability p(1, 2). To find this probability we simply locate the
intersection of X and Y. The first coordinate is always X and the second is Y. So, we have x = 1,
and y = 2.
Table 7. 5 Finding a Joint Probability Mass function
Y
X -1 1 2 3
0 1/8 0 0 0
1 0 1/8 1/8 1/8
2 0 2/8 1/8 0
3 0 1/8 0 0
P(1, 2) = 1/8.
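b. For b, we are asked to find the joint cumulative distribution function F(1, 1). We are to use the
formula which is,
$F(x, y) = P(X \le x, Y \le y) = \sum_{x_i \le x} \sum_{y_j \le y} p(x_i, y_j)$
Since we are to calculate the cumulative distribution F(1, 1), we consider the values of x
that are less than or equal to 1, that is {0, 1}, and the values of y that are less than or equal to 1, that is {-1, 1}. These
give us the following possible combinations of (X, Y): (0, -1), (0, 1), (1, -1), (1, 1).
Substituting we have,
$F(1, 1) = p(0, -1) + p(0, 1) + p(1, -1) + p(1, 1) = \frac{1}{8} + 0 + 0 + \frac{1}{8} = \frac{2}{8} = \frac{1}{4}$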
To get the marginal pmf for X we use the formula $p_X(x) = \sum_{j} p(x, y_j)$; note that you are to
fix a value of X and sum over the possible values of Y. For x = 0, we have,
$p_X(0) = \frac{1}{8} + 0 + 0 + 0 = \frac{1}{8}$
In other words, simply get the sum for each row. Table 7.6 presents these sums,
Table 7. 6 Marginal pmf of X
Y Marginal
X -1 1 2 3 pmf of X
0 1/8 0 0 0 1/8
1 0 1/8 1/8 1/8 3/8
2 0 2/8 1/8 0 3/8
3 0 1/8 0 0 1/8
To get the marginal pmf for Y we use the formula $p_Y(y) = \sum_{i} p(x_i, y)$; note that you are to
fix a value of Y and sum over the possible values of X. For y = 1, we have,
$p_Y(1) = 0 + \frac{1}{8} + \frac{2}{8} + \frac{1}{8} = \frac{4}{8} = \frac{1}{2}$
In other words, simply get the sum for each column. Table 7.7 presents these sums,
Table 7.7 Marginal pmf of Y
                    Y
X                   -1    1     2     3
0                   1/8   0     0     0
1                   0     1/8   1/8   1/8
2                   0     2/8   1/8   0
3                   0     1/8   0     0
Marginal pmf of Y   1/8   4/8   2/8   1/8
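The joint pmf table of Example 4 and both marginals can be rebuilt directly from the rules of the coin game. This is a minimal sketch, not the module's own procedure; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Build the joint pmf by listing the 8 equally likely toss sequences.
joint = {}
for tosses in product("HT", repeat=3):
    x = tosses.count("H")                        # X: number of heads
    y = tosses.index("H") + 1 if x > 0 else -1    # Y: toss of the first H, or -1 if none
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

marginal_x = {}
marginal_y = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, Fraction(0)) + p  # row sums
    marginal_y[y] = marginal_y.get(y, Fraction(0)) + p  # column sums

print(sorted(marginal_x.items()))
print(sorted(marginal_y.items()))
```

The row sums reproduce the marginal pmf of X and the column sums reproduce the marginal pmf of Y.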
Suppose that X and Y are jointly distributed discrete random variables with joint pmf p(x, y). If g(X, Y)
is a function of these two random variables, then its expected value is given by the following:
$E[g(X, Y)] = \sum_{i} \sum_{j} g(x_i, y_j) \, p(x_i, y_j)$
Example:
5. Consider again the discrete random variables we defined in Example 4 with joint pmf given
in Table 7.4. We will find the expected value of three different functions applied to (X, Y).
a. Calculate the expected value of XY if g(x, y) = xy.
b. Calculate the expected value of X if g(x, y) = x.
c. Calculate the expected value of Y if g(x, y) = y.
Solution:
a. For a, we are to find the expected value given that g(x, y) = xy. Thus, we must multiply the
probabilities with the corresponding values of the random variable x and y. We discard those
that contain zero probabilities. Afterwards we get the summation of these products.
$E[XY] = (0)(-1)\frac{1}{8} + (1)(1)\frac{1}{8} + (2)(1)\frac{2}{8} + (3)(1)\frac{1}{8} + (1)(2)\frac{1}{8} + (2)(2)\frac{1}{8} + (1)(3)\frac{1}{8}$
$E[XY] = \frac{17}{8} = 2.125$
b. For b, we are to find the expected value given that g(x, y) = x. Thus, we must multiply the
probabilities with the corresponding values of the random variable x. We discard those that
contain zero probabilities. Afterwards we get the summation of these products.
$E[X] = (0)\frac{1}{8} + (1)\frac{1}{8} + (2)\frac{2}{8} + (3)\frac{1}{8} + (1)\frac{1}{8} + (2)\frac{1}{8} + (1)\frac{1}{8}$
$E[X] = \frac{12}{8} = 1.5$
c. For c, we are to find the expected value given that g(x, y) = y. Thus, we must multiply the
values of the random variable y with the corresponding probabilities. Afterwards we get
the summation of these products.
$E[Y] = (-1)\frac{1}{8} + (1)\frac{1}{8} + (1)\frac{2}{8} + (1)\frac{1}{8} + (2)\frac{1}{8} + (2)\frac{1}{8} + (3)\frac{1}{8}$
$E[Y] = \frac{10}{8} = 1.25$
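The three expected values of Example 5 follow from one helper that multiplies g(x, y) by p(x, y) and adds the products. The sketch below is illustrative; `expected_value` is a hypothetical name, and only the nonzero entries of the joint pmf table are listed.

```python
from fractions import Fraction

eighth = Fraction(1, 8)
# Nonzero entries of the joint pmf from Example 4, keyed by (x, y).
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (1, 2): eighth, (1, 3): eighth,
    (2, 1): 2 * eighth, (2, 2): eighth,
    (3, 1): eighth,
}


def expected_value(pmf, g):
    """E[g(X, Y)]: sum of g(x, y) * p(x, y) over all pairs in the table."""
    return sum((g(x, y) * p for (x, y), p in pmf.items()), Fraction(0))


e_xy = expected_value(joint, lambda x, y: x * y)
e_x = expected_value(joint, lambda x, y: x)
e_y = expected_value(joint, lambda x, y: y)
print(e_xy, e_x, e_y)
```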
Having considered the discrete case, we now look at joint distributions for continuous random
variables. The continuous case is essentially the same as the discrete case: we just replace discrete sets of
values by continuous intervals, the joint probability mass function by a joint probability density function,
and the sums by integrals.
Time when bus driver picks you up and the Quantity of caffeine in bus driver’s system
Dosage of a drug (ml) and Blood compound measure (percentage)
If X takes values in [a, b] and Y takes values in [c, d] then the pair (X, Y) takes values in the product [a,
b] × [c, d]. The joint probability density function (joint pdf) of X and Y is a function f(x, y) giving the
probability density at (x, y). That is, the probability that (X, Y) is in a small rectangle of width dx and height
dy around (x, y) is f(x, y) dx dy.
1. For coordinates of (x, y), the joint pdf is always greater than or equal to 0, that is, all values
are nonnegative.
$f(x, y) \ge 0$
2. The total probability is always equal to 1.
$\int_{c}^{d} \int_{a}^{b} f(x, y) \, dx \, dy = 1$
where X takes values in the interval [a, b] and Y takes values in the interval [c, d].
This is used to find the probability of any given event of continuous random variables.
Example:
6. Suppose X and Y both take values in [0,1] with density f(x, y) = 4xy.
a. Show f(x, y) is a valid joint pdf.
b. Find the probability of the event A = {X < 0.5 and Y > 0.5}.
Solution:
a. To show f(x, y) is a valid joint pdf we must check that it is nonnegative (which it clearly is
on [0, 1] × [0, 1]) and that the total probability is 1. Since both X and Y take values in [0, 1], then a = 0,
b = 1, c = 0, and d = 1.
Using your scientific calculator to find this definite integral, we can separate x and y,
$\int_{0}^{1} \int_{0}^{1} 4xy \, dx \, dy = \int_{0}^{1} 4x \, dx \cdot \int_{0}^{1} y \, dy = (2)\left(\frac{1}{2}\right) = 1$
Note: your calculator has only x for its integral, so we just input the variable y as x as shown
in Figure 7.1.
Figure 7.1 Calculator Display and the corresponding Key Log for Example 6
Since the total probability is 1, therefore f(x, y) = 4xy is a valid joint probability density function.
b. For b, we are to find the probability of the event A = X < 0.5 and Y > 0.5. Thus, for X,
our interval will be all values of X less than 0.5, that is from the starting value of X
which is 0 to 0.5. Also, our interval for Y will be all values of Y greater than 0.5, that is
from 0.5 to the end value of Y which is 1. Substituting,
Using your scientific calculator to find this definite integral, we can separate x and y,
$P(A) = \int_{0}^{0.5} 4x \, dx \cdot \int_{0.5}^{1} y \, dy = (0.5)(0.375) = 0.1875$
Note: Your calculator has only x for its integral, so we just input the variable y as x as shown
in Figure 7.2.
Figure 7.2 Calculator Display and the corresponding Key Log for Example 6
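Both integrals of Example 6 can be checked without a calculator by a simple numerical double integral. The sketch below uses the midpoint rule, which is an assumption of this illustration and not the module's method; `double_integral` and its `steps` parameter are hypothetical names.

```python
def double_integral(f, x_low, x_high, y_low, y_high, steps=200):
    """Midpoint-rule estimate of the integral of f(x, y) over a rectangle."""
    dx = (x_high - x_low) / steps
    dy = (y_high - y_low) / steps
    total = 0.0
    for i in range(steps):
        x = x_low + (i + 0.5) * dx
        for j in range(steps):
            y = y_low + (j + 0.5) * dy
            total += f(x, y) * dx * dy
    return total


def joint_pdf(x, y):
    """Joint density of Example 6 on the unit square."""
    return 4.0 * x * y


# Part a: total probability over [0, 1] x [0, 1].
total_probability = double_integral(joint_pdf, 0.0, 1.0, 0.0, 1.0)
# Part b: event A = {X < 0.5 and Y > 0.5}.
prob_a = double_integral(joint_pdf, 0.0, 0.5, 0.5, 1.0)
print(round(total_probability, 6), round(prob_a, 6))
```

The first value confirms that f(x, y) = 4xy is a valid joint pdf, and the second is the probability of event A.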
Examples:
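7. An article describes a model for the movement of a particle. Assume that a particle moves within the
region A bounded by the x-axis, the line x = 1, and the line y = x. Let (X, Y) denote the position of the
particle at a given time. The joint density of X and Y is given by f(x, y) = 8xy for (x, y) in A, and 0
otherwise. Find the marginal pdfs of X and Y.
Solution:
Since we are not given specific values of X and Y, each marginal will be a function. For the marginal pdf
of X we integrate out y. Within region A, for a fixed x the variable y runs from 0 to x. Substituting we have,
$f_X(x) = \int_{0}^{x} 8xy \, dy = 8x \int_{0}^{x} y \, dy$
$f_X(x) = 8x \left[\frac{y^2}{2}\right]_{0}^{x} = 4x^3 \quad \text{for } 0 \le x \le 1$
For the marginal pdf of Y we integrate out x. Within region A, for a fixed y the variable x runs from y to 1.
Substituting we have,
$f_Y(y) = \int_{y}^{1} 8xy \, dx = 8y \int_{y}^{1} x \, dx$
$f_Y(y) = 8y \left[\frac{x^2}{2}\right]_{y}^{1} = 4y(1 - y^2) \quad \text{for } 0 \le y \le 1$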
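8. Suppose (X, Y) takes values on the square [0, 1] × [0, 1] with joint pdf
$f(x, y) = \frac{3}{2}(x^2 + y^2)$
Find the marginal pdf of X and Y.
Solution:
For the marginal pdf of X, we use the interval of y, which is [0, 1], and integrate out y. Substituting we have,
$f_X(x) = \int_{0}^{1} \frac{3}{2}(x^2 + y^2) \, dy = \frac{3}{2} \int_{0}^{1} x^2 \, dy + \frac{3}{2} \int_{0}^{1} y^2 \, dy$
$f_X(x) = \left[\frac{3}{2} x^2 y + \frac{y^3}{2}\right]_{0}^{1}$
$f_X(x) = \frac{3}{2} x^2 + \frac{1}{2}$
For the marginal pdf of Y, we use the interval of x, which is [0, 1], and integrate out x. Substituting we have,
$f_Y(y) = \int_{0}^{1} \frac{3}{2}(x^2 + y^2) \, dx = \frac{3}{2} \int_{0}^{1} x^2 \, dx + \frac{3}{2} \int_{0}^{1} y^2 \, dx$
$f_Y(y) = \left[\frac{x^3}{2} + \frac{3}{2} x y^2\right]_{0}^{1}$
$f_Y(y) = \frac{1}{2} + \frac{3}{2} y^2$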
In this section, we consider the probability distribution of one random variable given information
about the value of another random variable. As we will see in the formal definition, this kind of conditional
distribution will involve the joint distribution of the two random variables under consideration, which we
introduced in the previous two sections. We begin with discrete random variables, and then consider the
continuous case.
Note that if the marginal probability of y, 𝑝 (𝑦) = 0 , then for that value of Y the conditional pmf
of X does not exist.
Note that if the marginal probability of x, 𝑝 (𝑥) = 0 , then for that value of X the conditional pmf
of Y does not exist.
1. Conditional pmf's are valid pmf's. In other words, the conditional pmf for X given Y = y, for
a fixed y, is a valid pmf satisfying the following:
$0 \le p_{X|Y}(x|y) \le 1 \quad \text{and} \quad \sum_{x} p_{X|Y}(x|y) = 1$
Similarly, for a fixed x, we also have the following for the conditional pmf of Y given X = x:
$0 \le p_{Y|X}(y|x) \le 1 \quad \text{and} \quad \sum_{y} p_{Y|X}(y|x) = 1$
2. In general, the conditional distribution of X given Y does not equal the conditional
distribution of Y given X, i.e.,
$p_{X|Y}(x|y) \ne p_{Y|X}(y|x)$
Examples:
9. Consider the probability experiment where we toss a fair coin three times and record the sequence
of heads and tails. We let random variable X denote the number of heads obtained. We also let
random variable Y denote the winnings earned in a single play of a game with the following rules,
based on the outcomes of the probability experiment:
player wins $1 if first H occurs on the first toss
player wins $2 if first H occurs on the second toss
player wins $3 if first H occurs on the third toss
player loses $1 if no H occur
The following joint pmf is represented using a table:
Y
X -1 1 2 3
0 1/8 0 0 0
1 0 1/8 1/8 1/8
2 0 2/8 1/8 0
3 0 1/8 0 0
Solution:
To find the conditional probability we must first calculate the marginal probabilities, which we already
have from Example 4.
Table 7. 9 Marginal Probability Mass Function for Example 9
Y Marginal
X -1 1 2 3 pmf of X
0 1/8 0 0 0 1/8
1 0 1/8 1/8 1/8 3/8
2 0 2/8 1/8 0 3/8
3 0 1/8 0 0 1/8
Marginal
1/8 4/8 2/8 1/8
pmf of Y
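To find the conditional distribution of X given Y, we must find the conditional probability for each value of X.
We use the formula,
$p_{X|Y}(x|y) = \dfrac{p(x, y)}{p_Y(y)}$
For x = 0, the only nonzero entry is at y = -1:
$\text{if } y = -1, \quad p_{X|Y}(0|-1) = \dfrac{p(0, -1)}{p_Y(-1)} = \dfrac{1/8}{1/8} = 1$
For x = 1, we have
$\text{if } y = 1, \quad p_{X|Y}(1|1) = \dfrac{p(1, 1)}{p_Y(1)} = \dfrac{1/8}{4/8} = \dfrac{1}{4}$
$\text{if } y = 2, \quad p_{X|Y}(1|2) = \dfrac{p(1, 2)}{p_Y(2)} = \dfrac{1/8}{2/8} = \dfrac{1}{2}$
$\text{if } y = 3, \quad p_{X|Y}(1|3) = \dfrac{p(1, 3)}{p_Y(3)} = \dfrac{1/8}{1/8} = 1$
For x = 2, we have
$\text{if } y = 1, \quad p_{X|Y}(2|1) = \dfrac{p(2, 1)}{p_Y(1)} = \dfrac{2/8}{4/8} = \dfrac{1}{2}$
$\text{if } y = 2, \quad p_{X|Y}(2|2) = \dfrac{p(2, 2)}{p_Y(2)} = \dfrac{1/8}{2/8} = \dfrac{1}{2}$
For x = 3, we have
$\text{if } y = 1, \quad p_{X|Y}(3|1) = \dfrac{p(3, 1)}{p_Y(1)} = \dfrac{1/8}{4/8} = \dfrac{1}{4}$
Summarizing the results, we have
Table 7.10 Conditional Probability Mass Function of X given Y for Example 9
p(x|y)   Y
X        -1    1     2     3
0        1     0     0     0
1        0     1/4   1/2   1
2        0     1/2   1/2   0
3        0     1/4   0     0
Each column of Table 7.10 sums to 1. Repeating the process with
$p_{Y|X}(y|x) = \dfrac{p(x, y)}{p_X(x)}$
gives the conditional pmf of Y given X, in which each row sums to 1:
p(y|x)   Y
X        -1    1     2     3
0        1     0     0     0
1        0     1/3   1/3   1/3
2        0     2/3   1/3   0
3        0     1     0     0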
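Both conditional pmf tables of Example 9 come from dividing each joint probability by the appropriate marginal. The following minimal sketch illustrates that division; the dictionary names `x_given_y` and `y_given_x` are assumptions of this illustration, with keys ordered as (value, given value).

```python
from fractions import Fraction

eighth = Fraction(1, 8)
# Nonzero entries of the joint pmf from Example 9, keyed by (x, y).
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (1, 2): eighth, (1, 3): eighth,
    (2, 1): 2 * eighth, (2, 2): eighth,
    (3, 1): eighth,
}
x_values = (0, 1, 2, 3)
y_values = (-1, 1, 2, 3)


def p(x, y):
    """Joint probability p(x, y); pairs missing from the table have probability 0."""
    return joint.get((x, y), Fraction(0))


p_x = {x: sum((p(x, y) for y in y_values), Fraction(0)) for x in x_values}
p_y = {y: sum((p(x, y) for x in x_values), Fraction(0)) for y in y_values}

# x_given_y[(x, y)] = p(x, y) / p_Y(y); y_given_x[(y, x)] = p(x, y) / p_X(x).
x_given_y = {(x, y): p(x, y) / p_y[y] for x in x_values for y in y_values}
y_given_x = {(y, x): p(x, y) / p_x[x] for x in x_values for y in y_values}

print(x_given_y[(1, 2)], y_given_x[(1, 2)])
```

The two printed values differ, which illustrates the second property: the conditional pmf of X given Y is generally not the same as that of Y given X.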
10. Suppose we are interested in the relationship between an individual's hair and eye color. Based on a
random sample of Saint Mary's students, we have the following joint pmf:
Y (Eye Color)
a. $p_{X|Y}(2|1)$
b. $p_{Y|X}(2|1)$
Solution:
To find the conditional probability we must find first the marginal probability, following the steps in
calculating the marginal probability, we have the following:
To find the conditional distribution of X, we must find the conditional probability for each value of X.
We use the formula,
$p_{X|Y}(x|y) = \dfrac{p(x, y)}{p_Y(y)}$
For a, we are finding the probability that an individual in the sub-population of individuals with blue
eyes has red hair.
$p_{X|Y}(2|1) = \dfrac{p(2, 1)}{p_Y(1)} = \dfrac{0.05}{0.3} = \dfrac{1}{6} = 0.167$
Thus, 1/6 or approximately 16.7% of SMC students with blue eyes have red hair.
For b, we are finding the probability that an individual in the sub-population of individuals with blonde
hair has green eyes.
$p_{Y|X}(2|1) = \dfrac{p(1, 2)}{p_X(1)} = \dfrac{0.12}{0.40} = 0.30$
Thus, 30% of SMC students with blonde hair have green eyes.
If X and Y are continuous random variables with joint pdf given by f(x, y), then the conditional
probability density function (pdf) of X given that Y = y is given by
$f_{X|Y}(x|y) = \dfrac{f(x, y)}{f_Y(y)}$
Similarly, the conditional probability density function (pdf) of Y given that X = x is given by
$f_{Y|X}(y|x) = \dfrac{f(x, y)}{f_X(x)}$
1. Conditional pdf's are valid pdf's. In other words, the conditional pdf for X given Y = y, for a
fixed y, is a valid pdf satisfying the following:
$f_{X|Y}(x|y) \ge 0 \quad \text{and} \quad \int f_{X|Y}(x|y) \, dx = 1$
Similarly, for a fixed x, we also have the following for the conditional pdf of Y given X = x:
$f_{Y|X}(y|x) \ge 0 \quad \text{and} \quad \int f_{Y|X}(y|x) \, dy = 1$
2. In general, the conditional distribution of X given Y does not equal the conditional distribution
of Y given X, i.e.,
$f_{X|Y}(x|y) \ne f_{Y|X}(y|x)$
Example:
11. At a gas station, gasoline is stocked in a bulk tank each week. Let random variable X denote the
proportion of the tank's capacity that is stocked in each week, and let Y denote the proportion of the
tank's capacity that is sold in the same week. Note that the gas station cannot sell more than what
was stocked in each week, which implies that the value of Y cannot exceed the value of X. A possible
joint pdf of X and Y is given by
\[ f(x, y) = \begin{cases} 3x, & \text{if } 0 \le y \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]
Find the conditional probability distribution for gas sold in each week, when only half of the tank was
stocked.
Solution:
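We are to find the conditional probability distribution for gas sold in a week when only half of the
tank was stocked; that is, we find the conditional pdf of Y given that X = 0.50. First, the marginal pdf
of X is obtained by integrating the joint pdf over y:
\[ f_X(x) = \int_0^x 3x \, dy = 3x^2, \quad 0 \le x \le 1 \]
\[ f_X(0.5) = 3(0.5)^2 = 0.75 \]
Thus, the conditional pdf of Y is
\[ f_{Y|X}(y|0.5) = \frac{f(0.5, y)}{f_X(0.5)} = \frac{3(0.5)}{0.75} = 2, \quad 0 \le y \le 0.5 \]
Therefore, given that only half of the tank is stocked, the amount of gas sold in a week is uniformly
distributed on the interval [0, 0.5], with constant density 2.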
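The result of Example 11 can be checked numerically. The sketch below uses the joint pdf f(x, y) = 3x on 0 ≤ y ≤ x ≤ 1 from the example; the helper names and the midpoint-rule integrator are illustrative choices, not part of the original solution.

```python
def joint_pdf(x, y):
    """Joint pdf of Example 11: 3x on the region 0 <= y <= x <= 1, else 0."""
    return 3.0 * x if 0.0 <= y <= x <= 1.0 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def marginal_x(x):
    """Marginal pdf of X: integrate the joint pdf over y from 0 to x."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, x)

def conditional_y_given_x(y, x):
    """Conditional pdf of Y given X = x: joint pdf divided by marginal of X."""
    return joint_pdf(x, y) / marginal_x(x)

print(marginal_x(0.5))                  # marginal density of X at 0.5
print(conditional_y_given_x(0.2, 0.5))  # density at an amount sold below 0.5
print(conditional_y_given_x(0.7, 0.5))  # selling more than was stocked
```

The conditional density is constant for every y between 0 and 0.5 and zero elsewhere, so "2" should be read as the height of a uniform density on [0, 0.5], not as a probability.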
VII. Summary
There are two types of joint probability distribution: Joint Probability Mass function, and the
Joint Probability Density Function.
o If X and Y are discrete random variables, this distribution can be described with a joint
probability mass function.
o If X and Y are continuous random variables, this distribution can be described with a
joint probability density function.
The joint probability mass function of X and Y is p(x, y) = P(X = x, Y = y).
The joint probability mass function of a pair of random variables (X, Y) must satisfy the following two
conditions in order to be a valid joint pmf:
1. Each value of the joint pmf is a number (either a fraction or a decimal) between 0 and 1, inclusive.
2. The sum of the probabilities of all the outcomes in the sample space is 1.
The joint probability density function of a pair of random variables (X, Y) must satisfy the following two
conditions in order to be a valid joint pdf:
o For all coordinates (x, y), the joint pdf is greater than or equal to 0; that is, it is never
negative.
o The total probability, that is, the integral of the joint pdf over all (x, y), is equal to 1.
Joint Distributions, Independence. Authored by: Jeremy Orloff and Jonathan Bloom. Provided by: MIT
OpenCourseWare. Located at: https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-
probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading7a.pdf, License: CC BY:
Attribution
Joint Distributions of Continuous Random Variables. Authored by: Kristin Kuter. Provided by: Open
Education Resource (OER) LibreTexts Project. Located at:
https://stats.libretexts.org/Courses/Saint_Mary's_College%2C_Notre_Dame/MATH_345__-
_Probability_(Kuter)/5%3A_Probability_Distributions_for_Combinations_of_Random_Variables.
License: CC BY: Attribution
IX. Exercises
3. The random variable X has a range of {0, 1, 2} and the random variable Y has a range of {1, 2}. The
joint distribution of X and Y is given by Table 7.14.
Table 7. 14 Joint Probability Table for Exercise 3
Y
X 1 2
0 0.10 0.30
1 0.05 0.25
2 0.13 0.17
4. You roll one red die and one green die. Let the random variable X be the number showing on
the red die and Y be the number of dice that show the number two.
a. Write down a table showing the joint probability mass function for X and Y.
b. Find the marginal distribution for Y.
c. Find E(Y).
5. The joint probabilities P(X = a, Y = b) of discrete random variables X and Y are given in the following
table. Determine the marginal probability distributions of X and Y.
Table 7. 15 Probability Mass function of Exercise 5
Y
X 1 2 3 4
1 16/136 3/136 2/136 13/136
2 5/136 10/136 11/136 8/136
3 9/136 6/136 7/136 12/136
4 4/136 15/136 14/136 1/136
6. Suppose X and Y are continuous random variables with joint probability density function
\[ f(x, y) = x + y \]
for 0 < x < 1 and 0 < y < 1. Determine the marginal probability density functions of X and Y.
7. To investigate the relation between hair color and eye color, the hair color and eye color of 5383
persons were recorded. The data are given in the following table:
                    Hair Color
Eye Color       1        2        3
Light        1168      825      305
Dark          573     1312     1200
8. Let X and Y be two continuous random variables with joint probability density function
\[ f(x, y) = \frac{12}{5} xy(1 + y) \quad \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1 \]
a. Find the probability P (1/4 ≤ X ≤ 1/2, 1/3 ≤ Y ≤ 2/3).
b. Determine the marginal distribution functions of X and Y.
c. Determine the conditional probability density function of X and Y
I. Introduction
This chapter deals with survey analysis by considering probability sampling and the Central Limit
Theorem. The concept of degrees of freedom and its relationship to estimation is also discussed, including
its two important concepts: bias and precision. We will also analyze how sampling
We will also discuss the concept of a sampling distribution, which is perhaps the most basic concept
in inferential statistics, and how sampling distributions are used in inferential statistics. We will
investigate how collecting data by random sampling helps us to draw more rigorous conclusions and
obtain reliable estimates from the data. We will also define some properties of a sampling distribution of
sample means and examine what we can conclude about the entire population based on these properties.
The last section of the chapter concerns the sampling distribution of an important statistic: the sampling
distribution of the mean.
II. Objectives
[1] Inferential statistics is the branch of statistics that focuses on making decisions or drawing
conclusions about a specified population. [3] Data gathering is conducted through random sampling, which
aims to select a set of units, or elements, from a population that we can use to estimate the parameters
of the population. Random sampling guards against the danger of a researcher consciously or unconsciously
introducing bias when selecting a sample. In addition, it allows us to use tools from probability theory
that provide the basis for estimating the characteristics of the population, as well as for estimating the
accuracy of the samples.
Probability theory is the branch of mathematics that provides the tools researchers need to make
statistical conclusions about sets of data based on samples. As previously stated, it also helps statisticians
estimate the parameters of a population. A parameter is a summary description of a given variable in a
population. A population mean is an example of a parameter. When researchers generalize from a sample,
they’re using sample observations to estimate population parameters. Probability theory enables them
to both make these estimates and to judge how likely it is that the estimates accurately represent the
actual parameters of the population.
Probability theory accomplishes this by way of the concept of sampling distributions. A single sample
selected from a population will give an estimate of the population parameters. Other samples would give
the same, or slightly different, estimates. Probability theory helps us understand how to make estimates
of the actual population parameters based on such samples.
1. Parameter Estimation
To illustrate the use of parameter estimation, consider a structural engineer analyzing the tensile
strength of a component used in an automobile chassis. Because there are many factors that affect
the tensile strength of a component such as differences in raw material batches, manufacturing
processes, and measurement procedures, the engineer may just opt to estimate the mean tensile
strength. In practice, the engineer will use sample data to calculate a quantity that is in some sense
a realistic value of the exact mean.
2. Hypothesis Testing
To illustrate the use of hypothesis testing, consider a situation in which two different reaction
temperatures can be used in a chemical process. The engineer conjectures that one would result in
higher yields than the other. For this situation, the hypothesis would be that the mean yield using
temperature 1 is greater than the mean yield using temperature 2. Notice that there is no emphasis
on estimating yields; instead, the focus is on drawing conclusions about a stated hypothesis.
Suppose that we want to describe a specific characteristic of a population, which we call the
parameter. We may use point estimation, which refers to the process of estimating a parameter of
a probability distribution based on observed data from that distribution. Before any data are collected,
the observations are regarded as random variables, which means that any function of the
observations, that is, any statistic, is also a random variable. For example, the sample mean and the sample
variance are statistics, and they are also random variables. All random variables have a probability
distribution. The probability distribution of a statistic is called a sampling distribution.
Point estimation of a population parameter θ chooses a single number, computed from sample data, to
describe the population. This numerical value of a sample statistic is then used to
describe the population.
A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂. The
statistic Θ̂ is called the point estimator.
We deal with a lot of estimation problems in engineering. Table 9.1 summarizes them including
their corresponding reasonable point estimates.
Unknown parameter: the proportion p of items in a population that belong to a class of interest
Reasonable point estimate: the sample proportion
\[ \hat{p} = \frac{x}{n} \]
where x is the number of items in a random sample of size n that belong to the class of interest
To choose the best point estimator for every specific parameter to utilize in any situation, it is
necessary to investigate their statistical properties and utilize the criteria for estimator comparison.
Bias refers to whether an estimator tends to either over or underestimate the parameter.
Sampling variability refers to how much the estimate varies from sample to sample.
To illustrate these two concepts, consider a bathroom scale. Each time you weigh, it may give you
different measurements. With this in mind, let's compare two scales. Scale 1 is a very high-tech digital
scale and gives essentially the same weight each time you weigh yourself; it varies by at most 0.02 pounds
from weighing to weighing. Although this scale has the potential to be very accurate, it may have been
calibrated incorrectly and, on average, overstates your weight by one pound. Scale 2 is a cheap scale and
gives very different results from weighing to weighing. However, it is just as likely to underestimate as
overestimate your weight. Sometimes it vastly overestimates it and sometimes it vastly underestimates
it. However, the average of many measurements would be your actual weight. Scale 1 is biased since, on
average, its measurements are one pound higher than your actual weight. Scale 2, by contrast, gives
unbiased estimates of your weight. However, Scale 2 is highly variable, and its measurements are often
very far from your true weight. Scale 1, despite being biased, is accurate. Its measurements are never
more than 1.02 pounds from your actual weight.
Unbiased Estimators
One important requirement of an estimator is that it should be close to the true value of
the unknown parameter. An estimator Θ̂ is unbiased when its expected value is equal to the population
parameter θ. In other words, the mean of the sampling distribution of
Θ̂ is equal to the population parameter θ. An unbiased estimator has a bias of zero: E(Θ̂) - θ = 0.
For large samples, the bias is very small. [14] Any given sample mean may underestimate or overestimate
μ, but there is no systematic tendency for sample means to either under or overestimate μ.
So how do we avoid biased estimation? Let us first consider the formula for the variance in a
population, which is
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
whereas the formula used to estimate the variance from a sample is
\[ s^2 = \frac{\sum (X - M)^2}{N - 1} \]
where M is the sample mean.
Notice that the denominators of the formulas are different: N for the population and N-1 for the sample.
The reason for this is that if N is used in the formula for s², then the estimates tend to be too low and
therefore biased. The formula with N-1 in the denominator gives an unbiased estimate of the population
variance. Note that N-1 is the degrees of freedom.
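The effect of the denominator can be verified exactly on a small case. The sketch below is an illustration under stated assumptions: the four-value population is hypothetical, samples of size 2 are drawn with replacement, and every possible sample is treated as equally likely. It averages both versions of the sample variance over all possible samples and compares them with the population variance.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical values, chosen only for illustration
n = 2                      # sample size (samples drawn with replacement)

def variance(values, denominator):
    """Sum of squared deviations from the mean, divided by `denominator`."""
    m = Fraction(sum(values), len(values))
    return sum((Fraction(v) - m) ** 2 for v in values) / denominator

# Population variance: divide by the population size.
sigma2 = variance(population, len(population))

# All equally likely ordered samples of size n.
samples = list(product(population, repeat=n))

# Average of the sample variance over every possible sample,
# once dividing by n and once dividing by n - 1.
mean_div_n = sum(variance(s, n) for s in samples) / len(samples)
mean_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print("population variance:", sigma2)
print("average estimate, denominator n:", mean_div_n)
print("average estimate, denominator n - 1:", mean_div_n_minus_1)
```

Dividing by n gives estimates that are too low on average, while dividing by n - 1 recovers the population variance on average, which is what "unbiased" means here.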
Degrees of Freedom
Some estimates are based on more information than others. For example, an estimate of the variance
based on a sample size of 100 is based on more information than an estimate of the variance based on a
sample size of 5. The degrees of freedom (df) of an estimate is the number of independent pieces of
information on which the estimate is based.
In general, the degrees of freedom for an estimate is equal to the number of values minus the number
of parameters estimated en route to the estimate in question. Therefore, the degrees of freedom of an
estimate of variance is equal to N - 1, where N is the number of observations.
produce an estimate close to the true value. A logical principle of estimation, when selecting among
several estimators, is to choose the estimator that has minimum variance which is called the minimum
variance unbiased estimator (MVUE).
The standard error of an estimator is its standard deviation. When the estimator follows a normal
distribution, we can be reasonably confident that the true value of the parameter lies within two standard
errors of the estimate. Since many point estimators are normally distributed (or approximately so) for
large n, this is a very useful result. The larger the sample size, the smaller the standard error of the mean
and therefore the lower the sampling variability. The smaller the standard error of a statistic, the more
efficient the statistic.
Even in cases in which the point estimator is not normally distributed, we can state that so long as
the estimator is unbiased, the estimate of the parameter will deviate from the true value by as much as
four standard errors at most 6 percent of the time.
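The 6 percent figure is consistent with Chebyshev's inequality, which the text does not name explicitly; the symbols below (k for the number of standard errors, σ_Θ̂ for the standard error of the estimator) are introduced here only for illustration. For an unbiased estimator with finite variance,

\[ P\left( |\hat{\Theta} - \theta| \ge k \, \sigma_{\hat{\Theta}} \right) \le \frac{1}{k^2} \]

so that with k = 4,

\[ P\left( |\hat{\Theta} - \theta| \ge 4 \, \sigma_{\hat{\Theta}} \right) \le \frac{1}{16} = 0.0625 \approx 6\% \]

This bound holds regardless of the shape of the sampling distribution, which is why it is much looser than the two-standard-error statement for normally distributed estimators.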
If the relative efficiency is less than 1, we would conclude that the first estimator is a more efficient
estimator than the second, in the sense that it has a smaller mean square error. Sometimes we find that
biased estimators are preferable to unbiased estimators because they have smaller mean square error.
That is, we may be able to reduce the variance of the estimator considerably by introducing a relatively
small amount of bias.
Have you ever wondered how the mean, or average, amount of money per person in a population is
determined? It would be impossible to contact 100% of the population, so there must be a statistical way
to estimate the mean number of dollars per person in the population.
Suppose, more simply, that we are interested in the mean number of dollars that are in each of the
pockets of ten people on a busy street corner. Figure 9.2 reveals the amount of money that each person
in the group of ten has in his/her pocket.
The assumption was made that in the case of a population of size ten, one person had no money,
another had $1.00, another had $2.00, and so on, up to the person who had $9.00. The
purpose of the task was to determine the average amount of money per person in this population. If you
total the money of the ten people, you will find that the sum is $45.00, thus yielding a mean of $4.50.
However, suppose you couldn't count the money of all ten people at once. In this case, to complete
the task of determining the mean number of dollars per person of this population, it is necessary to select
random samples from the population and to use the means of these samples to estimate the mean of the
whole population. To start, suppose you were to randomly select a sample of only one person from the
ten. The ten possible samples are represented in the diagram in the introduction, which shows the dollar
bills possessed by each sample. Since samples of one are being taken, they also represent the means you
would get as estimates of the population. The probability distribution is shown in Figure 9.3.
The distribution of the dots on the graph is an example of a sampling distribution. As can be seen,
selecting a sample of one is not very good, since the group’s mean can be estimated to be anywhere from
$0.00 to $9.00, and the true mean of $4.50 could be missed by quite a bit. So, we can increase the sample
size. Table 9.2 summarizes the results if we increase the sample size and its corresponding sampling
distribution.
Table 9.2 Sampling Distribution for Varying Sample Size
Sample Size    Number of Samples    Sampling Distribution (graph)
2              C(10, 2) = 45
3              C(10, 3) = 120
4              C(10, 4) = 210
5              C(10, 5) = 252
6              C(10, 6) = 210
From the graphs above, it is obvious that increasing the size of the samples chosen from the
population of size 10 resulted in a distribution of the means that was more closely clustered around the
true mean. If a sample of size 10 were selected, there would be only one possible sample, and it would
yield the true mean of $4.50. Also, the sampling distribution of the sample means is approximately normal,
as can be seen by the bell shape in each of the graphs.
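The counts and the clustering described above can be reproduced by listing every possible sample. The sketch below assumes, as in the example, a population of ten people holding $0 through $9 and samples drawn without replacement; the function name is hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import mean, pstdev

dollars = list(range(10))  # $0, $1, ..., $9: one amount per person

def sample_means(size):
    """Means of all possible samples of the given size (without replacement)."""
    return [mean(sample) for sample in combinations(dollars, size)]

for size in (1, 2, 3, 4, 5, 6, 10):
    means = sample_means(size)
    # Number of samples, centre of the sampling distribution, and its spread.
    print(size, len(means), comb(10, size),
          round(mean(means), 3), round(pstdev(means), 3))
```

For every sample size the sample means are centred on the population mean of $4.50, while their spread shrinks as the sample size grows and reaches zero for the single sample of size 10.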
Figure 9.4 shows the range of possible sample study results. It presents all possible values of the
parameter in question by representing a range of 0 percent to 100 percent of students approving of the
dress code. The number 50 represents the midpoint, or 50 percent of the students approving of the
dress code and 50 percent disapproving. Since the sample size is 100, at the midpoint, half of the
students would be approving of the dress code, and the other half would be disapproving.
To randomly select the sample of 100 students, we use random sampling. Each member of the
sample is then asked whether he or she approves or disapproves of the dress code. If this procedure
gives 48 students who approve of the dress code and 52 who disapprove, the result would be recorded
on the figure by placing a dot at 48%. This statistic is the sample proportion. Let’s assume that the
process was repeated, and it resulted in 52 students approving of the dress code. Let's also assume that
a third sample of 100 resulted in 51 students approving of the dress code. The results are shown in
Figure 9.5.
In this figure, the three different sample statistics representing the percentages of students who
approved of the dress code are shown. The three random samples chosen from the population give
estimates of the parameter that exists for the entire population. Each of the random samples gives an
estimate of the percentage of students in the total student body of 18,000 who approve of the dress
code. Assume for simplicity's sake that the true proportion for the population is 50%. This would mean
that the estimates are close to the true proportion. To more precisely estimate the true proportion, it
would be necessary to continue choosing samples of 100 students. Figure 9.6 summarizes all the possible
results.
Sampling Error
Notice that the statistics resulting from the samples are distributed around the population
parameter. Although there is a wide range of estimates, most of them lie close to the 50% area of the
graph. Therefore, the true value is likely to be in the vicinity of 50%. In addition, probability theory gives
a formula for estimating how closely the sample statistics are clustered around the true value. In other
words, it is possible to estimate the sampling error, or the degree of error expected for a given sample
design.
\[ s = \sqrt{\frac{p(1 - p)}{n}} \]
where:
p – population parameter
n – sample size
s – standard error
The square root of the product of p and 1−p is the population standard deviation.
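As a quick illustration of the formula, the sketch below evaluates the standard error for the dress code example, assuming a true proportion of 50% and samples of 100 students. The function name is hypothetical.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sqrt(p * (1.0 - p) / n)

se_100 = standard_error(0.5, 100)
se_400 = standard_error(0.5, 400)
print(se_100, se_400)
```

Quadrupling the sample size from 100 to 400 halves the standard error, because n appears under the square root.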
The Central Limit Theorem is a very important theorem in statistics. It basically confirms that as you
increase the sample size for a random variable, the distribution of the sample means better
approximates a normal distribution. Why are we so concerned with means? Two reasons are: they give
us a middle ground for comparison, and they are easy to calculate.
There are two alternative forms of the theorem, and both are concerned with drawing
finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ.
1. if we collect samples of size n with a “large enough n,” calculate each sample’s mean, and create
a histogram of those means, then the resulting histogram will tend to have an approximate
normal bell shape
2. if we again collect samples of size n that are “large enough,” calculate the sum of each sample
and create a histogram, then the resulting histogram will again tend to have a normal bell-
shape.
In either case, it does not matter what the distribution of the original population is, or whether you
even know it. The important fact is that the distributions of the sample means and of the sample sums
tend to follow the normal distribution.
The size of the sample, n, that is required in order to be “large enough” depends on the original
population from which the samples are drawn (the sample size should be at least 30 or the data should
come from a normal distribution). If the original population is far from normal, then more observations
are needed for the sample means or sums to be normal. Sampling is done with replacement.
If samples of size n are drawn at random from any population with a finite mean and standard
deviation, then the sampling distribution of the sample means, x̄, approximates a normal
distribution as the sample size increases beyond 30.
The properties associated with the Central Limit Theorem are presented in Figure 9.7.
The vertical axis now reads probability density, rather than frequency, since frequency can only be
used when you are dealing with a finite number of sample means. Sampling distributions, on the other
hand, are theoretical depictions of an infinite number of sample means, and probability density is the
relative density of the selections from within this set.
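The Central Limit Theorem can be illustrated by simulation. The sketch below is only a demonstration under assumptions that are not in the text: the population is exponential with mean 1 and standard deviation 1 (strongly skewed), the sample size is 30, and 2000 samples are drawn with a fixed random seed.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so that repeated runs give the same numbers

def simulate_sample_means(n, repetitions=2000):
    """Draw `repetitions` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]

means_30 = simulate_sample_means(30)

print("mean of sample means:", round(mean(means_30), 3))
print("std. dev. of sample means:", round(pstdev(means_30), 3))
print("sigma / sqrt(n):", round(1.0 / sqrt(30), 3))
```

Even though the population is far from normal, the sample means centre on the population mean and their spread is close to σ/√n; a histogram of `means_30` would show the approximate bell shape.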
Examples:
1. What is the probability that a random sample of 20 families in Canada will have an average of 1.5 pets
or fewer? Assume that the mean of the population is 0.8 and the standard deviation of the population
is 1.2.
Solution:
\[ \mu = 0.8, \quad \sigma = 1.2, \quad n = 20 \]
\[ \mu_{\bar{x}} = \mu = 0.8 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.2}{\sqrt{20}} = 0.26833 \]
From the problem, we are to find "the probability that a random sample of 20 families in
Canada will have an average of 1.5 pets or fewer". Thus, x̄ ≤ 1.5. The corresponding probability
density curve is shown in Figure 9.8.
Figure 9.8 Probability Density Curve for Example 1 [15] (shaded area P(x̄ < 1.5), to the left of 1.5; mean at 0.8)
Substituting, we have
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{1.5 - 0.8}{0.26833} \approx 2.6 \]
Using z = 2.60, and since the area required is to the left, we use Procedure 1 (refer to Figure 6.19),
which means that we simply obtain the desired area from the table.
\[ P(\bar{X} < 1.5) = 0.99534 \]
Thus, the probability that the average number of pets is 1.5 or fewer is 99.53%. It is almost certain that
the average number of pets in the sample is less than 1.5.
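The table lookup in Example 1 can be cross-checked with Python's standard library. This is a sketch only; the function name is hypothetical, and `statistics.NormalDist` (Python 3.8 or later) stands in for the printed z table.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_most(x_bar, mu, sigma, n):
    """Return (z, P(sample mean <= x_bar)) using the normal approximation."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    return z, NormalDist().cdf(z)

z, p = prob_sample_mean_at_most(1.5, 0.8, 1.2, 20)
p_table = NormalDist().cdf(2.60)  # area for the rounded z-score used above

print(round(z, 4), round(p, 5), round(p_table, 5))
```

The unrounded z-score is slightly larger than 2.60, so the computed probability differs from the table value only in the fourth decimal place.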
2. The length of time, in hours, it takes an over 40 group of people to play one soccer match is normally
distributed with a mean of two hours and a standard deviation of 0.5 hours. A sample of size 50 is
drawn randomly from the population. Find the probability that the sample mean is between 1.8 hours
and 2.3 hours.
Solution:
\[ \mu_{\bar{x}} = \mu = 2.0 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{50}} = 0.07071 \]
From the problem, we are to find "the probability that the sample mean is between 1.8
hours and 2.3 hours". Thus, 1.8 ≤ x̄ ≤ 2.3. The corresponding probability density curve is shown in
Figure 9.9.
The z-scores of the two bounds are
\[ z_1 = \frac{1.8 - 2.0}{0.07071} = -2.83, \qquad z_2 = \frac{2.3 - 2.0}{0.07071} = 4.24 \]
Since the area required is between two z-scores, we use Procedure 3 (refer to Figure 6.19), which
means that we obtain the area from the table for each z-score and subtract the smaller area from
the larger.
From the table, A(z = -2.83) = 0.00233 and A(z = 4.24) = 0.99999. Thus,
\[ P(1.8 \le \bar{X} \le 2.3) = 0.99999 - 0.00233 = 0.99766 \]
Thus, the probability that the mean time is between 1.8 hours and 2.3 hours is approximately 0.9977.
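The same cross-check works for an interval. The sketch below, again using `statistics.NormalDist` in place of the printed table and a hypothetical function name, models the sample mean of Example 2 as normal with mean μ and standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(low, high, mu, sigma, n):
    """P(low <= sample mean <= high) when the sample mean is normal
    with mean mu and standard deviation sigma / sqrt(n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

p_between = prob_sample_mean_between(1.8, 2.3, 2.0, 0.5, 50)
print(round(p_between, 4))
```

Working with the sampling distribution directly avoids rounding the z-scores, so small differences from the table-based answer in the last decimal places are expected.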
VIII. Summary
In this lesson, we have learned about probability sampling, which is the key sampling method used
in survey research. In the example presented above, the elements were chosen for study from a
population by random sampling. The sample size had a direct effect on the distribution of estimates of
the population parameter. The larger the sample size, the closer the sampling distribution was to a normal
distribution.
The Central Limit Theorem confirms the intuitive notion that as the sample size increases for a
random variable, the distribution of the sample means will begin to approximate a normal distribution,
with the mean equal to the mean of the underlying population and the standard deviation equal to the
standard deviation of the population divided by the square root of the sample size, n.
In a population whose distribution may be known or unknown, if the size ( n) of samples is sufficiently
large, the distribution of the sample means will be approximately normal. The mean of the sample means
will equal the population mean. The standard deviation of the distribution of the sample means, called
the standard error of the mean, is equal to the population standard deviation divided by the square root
of the sample size (n).
IX. Exercises
1. The scores of students on a college entrance exam were normally distributed with a mean of 19.4
and a standard deviation of 6.3.
a. If a sample of 70 students who took the test (who have the same distribution as all scores)
is collected, what are the mean and standard deviation of the sample mean for the 70
students?
b. What is the probability that a random sample of 50 students will have an average score
of 22 or higher?
2. The lifetimes of a certain type of calculator battery are normally distributed. The mean lifetime is
400 days, with a standard deviation of 50 days. For a sample of 6000 new batteries, determine
how many batteries will last:
a. between 360 and 460 days.
b. more than 320 days.
c. less than 280 days.
3. NeverReady batteries has engineered a newer, longer lasting AAA battery. The company claims
this battery has an average life span of 17 hours with a standard deviation of 0.8 hours. Your
statistics class questions this claim. As a class, you randomly select 30 batteries and find that the
sample mean life span is 16.7 hours. If the process is working properly, what is the probability of
getting a random sample of 30 batteries in which the sample mean lifetime is 16.7 hours or less?
5. For each of the following situations determine if the Central Limit Theorem can be applied:
a. In the world populations, normal body temperature follows a normal distribution with
mean = 98.6 degrees F and a standard deviation of 0.6. The mean body temperature will
be determined for a randomly selected group of 14 individuals.
b. The mean number of songs on a student's iPod will be determined for a randomly selected
group of 10 students. In the college population, it is known that the number of songs on a
student's iPod is skewed to the left.
c. Now assume that you are randomly selecting 800 students to determine the mean
number of songs on the iPod.
6. Suppose we compare 2 random samples taken from the same populations. Sample A is a random
sample of 100 subjects and sample B is a random sample of 1000 subjects. What can be said about
the relationship between the sample standard deviations in sample A relative to the sample
standard deviation of sample B?
7. Two graduate students are each doing a study and are pulling their samples from the same
population. The first investigator takes a sample of 100 and the second takes a sample of 2,000.
a. Which student will tend to get the larger standard deviation in his/her sample?
b. Which student will get a larger standard error of the mean? Or can it not be determined?
8. The average life of an electric rice cooker is 5 years, with a standard deviation of 1 year. Assume
the lives of these cookers follow a normal distribution. Find
a. The probability that the mean life of a random sample of 9 machines falls between 5.7
and 8.1 years.
b. The value of x̄ to the left of which 85% of the means computed from random samples of
size 9 would fall.
9. Suppose you select a random sample of 200 student responses to the question, “how many hours
did you study last night?” Suppose that in a large population of students the mean number of
hours of study the previous night was hours with a standard deviation of hours.
a. What is the value of the mean of the sampling distribution of possible sample means?
b. Calculate the standard deviation of the sampling distribution of possible sample means.
10. The amount of soda a dispensing machine pours into a 24-ounce can of soda follows a normal
distribution with a mean of 24.05 ounces and a standard deviation of .02 ounces. Suppose the
quality control department at the soda plant sampled 100 sodas and found the average amount
of soda in the cans was 24 ounces of soda. What should the quality control department
recommend to the management of the plant?
I. Introduction
The objective of inferential statistics is to use sample data to increase knowledge about the entire
population. As we have learned in the previous chapter, we use sample data to generalize about an
unknown population. The sample data help us to make an estimate of a population parameter. We realize
that the point estimate is most likely not the exact value of the population parameter, but close to it. After
calculating point estimates, we construct interval estimates, called confidence intervals. In this lesson,
we will examine how to use samples to make estimates about the populations from which they came.
We will also see how to determine how wide these estimates should be and how confident we should
be about them.
In this chapter, you will also learn to construct and interpret confidence intervals. You will also learn
a new distribution, the Student’s-t, and how it is used with these intervals. Throughout the chapter, it is
important to keep in mind that the confidence interval is a random variable. It is the population parameter
that is fixed.
II. Objectives
1. Construct a confidence interval for estimating a population mean and a population proportion.
2. Interpret confidence intervals for estimating a population mean and a population proportion.
3. Discriminate between confidence level and confidence intervals.
4. Discriminate between problems applying the normal and the Student’s t distributions
Figure 10. 1 Have you ever wondered what the average number of M&Ms in a bag at the grocery store is? You can use
confidence intervals to answer this question. (credit: comedy_nose/flickr)
Suppose you were trying to determine the mean rent of a two-bedroom apartment in your town.
You might look in the classified section of the newspaper, write down several rents listed, and average
them together. You would have obtained a point estimate of the true mean. If you are trying to determine
the percentage of times you make a basket when shooting a basketball, you might count the number of
shots you make and divide that by the number of shots you attempted. In this case, you would have
obtained a point estimate for the true proportion.
Sampling distributions are the connecting link between the collection of data by unbiased random
sampling and the process of drawing conclusions from the collected data. Results obtained from a survey
can be reported as a point estimate. For example, a single sample mean is a point estimate, because this
single number is used as a plausible value of the population mean. Keep in mind that some error is
associated with this estimate – the true population mean may be larger or smaller than the sample mean.
A confidence interval is another type of estimate but, instead of being just one number, it is a range of
values calculated from a given set of sample data. The confidence interval is likely to include the unknown
population parameter.
A confidence interval is a range of possible values the parameter might take, controlling the
probability that the parameter is not lower than the lowest value in this range and not higher than
the largest value.
Associated with each confidence interval is a confidence level which indicates the level of assurance
you have that the resulting confidence interval encloses the unknown population mean. In a normal
distribution, we know from the empirical rule that about 95% of the data fall within two standard
deviations of the mean. Another way of stating this is to say that in about 95% of samples
taken, the sample statistic is within plus or minus two standard errors of the population parameter. As
the confidence interval for a given statistic increases in length, the confidence level increases.
The selection of a confidence level for an interval determines the probability that the confidence
interval produced will contain the true parameter value. Common choices for the confidence level are
90%, 95%, and 99%. These levels correspond to percentages of the area under the normal density curve.
For example, a 95% confidence interval covers 95% of the normal curve, so the probability of observing a
value outside of this area is 5%. Because the normal curve is symmetric, half of the 5% is in the
left tail of the curve, and the other half is in the right tail of the curve. This means that 2.5% is in each tail
as illustrated in Figure 10.2.
Figure 10.2 Probability Density of a 95% Confidence Interval
A 95% confidence interval for the standard normal distribution is the interval (−1.96, 1.96), since
95% of the area under the curve falls within this interval. The ±1.96 are the z-scores that enclose the
given area under the curve. For a normal distribution, the margin of error is the amount that is added to
and subtracted from the mean to construct the confidence interval. For a 95% confidence interval, the
margin of error is 1.96 standard deviations.
This simply means that a 95% confidence interval implies two possibilities:
1. The confidence interval contains the true mean μ, which happens for about 95% of all the samples.
2. Our sample produced a mean far enough from the true mean μ that the true mean is not within the
confidence interval, which can happen for only about 5% of all the samples.
Remember that a confidence interval is created for an unknown population parameter like the
population mean, μ. Confidence intervals for some parameters have the form:
(point estimate − margin of error, point estimate + margin of error)
The margin of error depends on the confidence level or percentage of confidence and the standard
error of the mean.
A confidence interval for a population mean with a known standard deviation is based on the fact
that the sample means follow an approximately normal distribution. To construct the confidence interval
for a single unknown population mean μ, where the population standard deviation is known, we need x̄,
the point estimate of the unknown population mean μ.
The margin of error (EBM) depends on the confidence level (CL). The confidence level is often
considered the probability that the calculated confidence interval estimate will contain the true
population parameter. However, it is more accurate to state that the confidence level is the percent of
confidence intervals that contain the true population parameter when repeated samples are taken. Most
often, it is the choice of the person constructing the confidence interval to choose a confidence level of
90% or higher because that person wants to be reasonably certain of his or her conclusions.
There is another probability called alpha (α) which is related to the confidence level. α is the
probability that the interval does not contain the unknown population parameter.
Mathematically, α + CL = 1.
The area of α/2 will then be used to find the z-score in the standard normal table in Appendix A.
We will be using the positive z-score. You may refer to Week 6 Section VII as a review.
Here are the z-scores for some commonly used confidence levels:
For a 90% confidence interval, 𝑧 = 1.65
For a 95% confidence interval, 𝑧 = 1.96
For a 99% confidence interval, 𝑧 = 2.58
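The tabulated z-scores above can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` is used in place of the table in Appendix A; the function name `z_critical` is only illustrative. It shows that the table values 1.65 and 2.58 are rounded forms of roughly 1.645 and 2.576.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Positive z-score that leaves an area of alpha/2 in the right tail."""
    alpha = 1 - confidence_level          # alpha + CL = 1
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    alpha_half = (1 - cl) / 2
    print(f"CL = {cl:.2f}  alpha/2 = {alpha_half:.3f}  z = {z_critical(cl):.3f}")
```

The worked examples in this module keep the rounded table values, so results computed with the unrounded z-scores may differ slightly in the last decimal place.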
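x̄ − z(σ/√n) < μ < x̄ + z(σ/√n)
where:
x̄ = sample mean
z = z-score for the chosen confidence level
σ = population standard deviation
n = sample size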
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
The interpretation should clearly state the confidence level, explain what population parameter is
being estimated, and state the confidence interval.
“We estimate with ___% confidence that the true population _______ (include the context of the
problem) is between ___ and ___ (include appropriate units).”
The most common mistake made by persons interpreting a confidence interval is claiming that once
the interval has been constructed, there is a CL% probability that the population mean is found within the
confidence interval. Even though the population mean is unknown, once the confidence interval is
constructed, either the mean is within the confidence interval, or it is not. Hence, any probability
statement about this particular confidence interval is inappropriate. The appropriate statement should
refer to the method used to produce the confidence interval. For example, if the confidence level is
95%, this means that if you repeated the sampling experiment 100 times, about 95 of the intervals
produced would be expected to contain the population mean. The probability is attributed to the
method, not to any particular confidence interval. Figure 10._ demonstrates how the confidence
interval provides a range of plausible values for the population mean and that this interval may or
may not capture the true population mean.
Figure 10.4 Some Possible Confidence Intervals that may or may not capture the population mean.
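The idea that the confidence level describes the method, not any single interval, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" mean of 68, σ = 3, and n = 36 are chosen for illustration, the samples are drawn from a normal population, and the function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)             # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)          # same margin of error for every sample
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin < mu < sample_mean + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=68, sigma=3, n=36, z=1.96, trials=10_000)
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

Each individual interval either contains the true mean or it does not; only the long-run fraction is close to 95%.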
Examples:
1. Suppose scores on exams in statistics are normally distributed with an unknown population mean and
a population standard deviation of three points. A random sample of 36 scores is taken and gives a
sample mean score of 68.
a. Find a 90% confidence interval estimate for the true (population) mean of statistics exam
scores.
b. Find a 95% confidence interval for the true (population) mean statistics exam score.
Solution:
For (a):
“…a population standard deviation of three points. A random sample of 36 scores is taken and
gives a sample mean score of 68.”
σ = 3 n = 36 x̄ = 68
Based on the problem, “Find a 90% confidence interval estimate…”
thus, CL = 0.90. This will also give us the value of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
Figure 10.6
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the true population mean exam score for all statistics
students is between 67.17 and 68.83.
Ninety percent of all confidence intervals constructed in this way contain the true mean statistics
exam score. For example, if we constructed 100 of these confidence intervals, we would expect 90
of them to contain the true population mean exam score.
For (b):
“…a population standard deviation of three points. A random sample of 36 scores is taken and
gives a sample mean score of 68.”
σ = 3 n = 36 x̄ = 68
Based on the problem, “Find a 95% confidence interval…”
thus, CL = 0.95. This will also give us the value of
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
Figure 10.7
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 95% confidence that the true population mean exam score for all statistics
students is between 67.02 and 68.98.
Ninety-five percent of all confidence intervals constructed in this way contain the true value of
the population mean statistics exam score.
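The arithmetic behind both parts of this example can be sketched in a few lines. This is a minimal illustration, assuming the table z-scores 1.65 and 1.96 used in this module; the function name `z_interval` is hypothetical.

```python
from math import sqrt

def z_interval(sample_mean, sigma, n, z):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z * sigma / sqrt(n)          # margin of error (EBM)
    return sample_mean - margin, sample_mean + margin

low90, high90 = z_interval(68, 3, 36, 1.65)   # part (a), 90% level
low95, high95 = z_interval(68, 3, 36, 1.96)   # part (b), 95% level

print(f"90% interval: {low90:.3f} to {high90:.3f}")
print(f"95% interval: {low95:.3f} to {high95:.3f}")
```

Up to rounding, the printed bounds agree with the intervals quoted in the solution.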
As you have noticed in the two examples, the 90% confidence interval is (67.17, 68.83). The 95%
confidence interval is (67.02, 68.98). The 95% confidence interval is wider. If you look at the graphs as
shown in Figure 10.8, because the area 0.95 is larger than the area 0.90, it makes sense that the 95%
confidence interval is wider. To be more confident that the confidence interval does contain the true value
of the population mean for all statistics exam scores, the confidence interval necessarily needs to be wider.
Figure 10.8 Comparing the Probability Density of 90% and 95% Confidence Interval
Increasing the confidence level increases the margin of error, making the confidence interval
wider.
Decreasing the confidence level decreases the margin of error, making the confidence interval
narrower.
Increasing the sample size causes the margin of error to decrease, making the confidence
interval narrower.
Decreasing the sample size causes the margin of error to increase, making the confidence
interval wider.
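The four statements above can be checked numerically. The sketch below assumes σ = 3 from the exam-score example; the sample sizes 9 and 144 are chosen only for illustration, and `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence_level, sigma, n):
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / sqrt(n)

sigma = 3

# Fixed sample size: a higher confidence level gives a larger margin.
for cl in (0.90, 0.95, 0.99):
    print(f"n = 36, CL = {cl:.2f}: margin = {margin_of_error(cl, sigma, 36):.3f}")

# Fixed confidence level: a larger sample gives a smaller margin.
for n in (9, 36, 144):
    print(f"CL = 0.90, n = {n}: margin = {margin_of_error(0.90, sigma, n):.3f}")
```

Because n appears under a square root, quadrupling the sample size only halves the margin of error.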
2. Suppose average pizza delivery times are normally distributed with an unknown population mean
and a population standard deviation of six minutes. A random sample of 28 pizza delivery
restaurants is taken and has a sample mean delivery time of 36 minutes.
a. Find a 90% confidence interval estimate for the population mean delivery time.
b. Find a 95% confidence interval estimate for the true mean pizza delivery time.
c. Assume the sample size is changed to 50 restaurants with the same sample mean. Find a
90% confidence interval estimate for the population mean delivery time.
Solution:
For (a):
“…a population standard deviation of six minutes. A random sample of 28 pizza delivery
restaurants is taken and has a sample mean delivery time of 36 minutes.”
σ = 6 minutes n = 28 x̄ = 36 minutes
Based on the problem, “Find a 90% confidence interval estimate for the population mean
delivery time” thus, CL = 0.90. This will also give us the value of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
Figure 10.10
36 − (1.65)(6/√28) < μ < 36 + (1.65)(6/√28)
34.129 < μ < 37.871
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the true population mean delivery time is between 34.13
and 37.87 minutes.
Ninety percent of all confidence intervals constructed in this way contain the true mean delivery
time. For example, if we constructed 100 of these confidence intervals, we would expect 90 of
them to contain the true population mean delivery time.
For (b):
“…a population standard deviation of six minutes. A random sample of 28 pizza delivery
restaurants is taken and has a sample mean delivery time of 36 minutes.”
σ = 6 minutes n = 28 x̄ = 36 minutes
Based on the problem, “Find a 95% confidence interval estimate for the true mean pizza
delivery time…” thus, CL = 0.95. This will also give us the value of
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 95% confidence that the true population mean delivery time is between
33.78 and 38.22 minutes.
Ninety-five percent of all confidence intervals constructed in this way contain the true
mean delivery time. For example, if we constructed 100 of these confidence intervals, we would
expect 95 of them to contain the true population mean delivery time.
For (c):
“Assume the sample size is changed to 50 restaurants with the same sample mean. Find a 90%
confidence interval estimate for the population mean delivery time.”
σ = 6 minutes n = 50 x̄ = 36 minutes
Based on the problem, “Find a 90% confidence interval estimate for the population mean
delivery time…” thus, CL = 0.90. This will also give us the value of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the true population mean delivery time is between
34.60 and 37.40 minutes.
Ninety percent of all confidence intervals constructed in this way contain the true mean
delivery time. For example, if we constructed 100 of these confidence intervals, we would expect
90 of them to contain the true population mean delivery time.
In practice, we rarely know the population standard deviation. In the past, when the sample size was
large, this did not present a problem to statisticians. They used the sample standard deviation s as an
estimate for σ and proceeded as before to calculate a confidence interval with close enough results.
However, statisticians ran into problems when the sample size was small. A small sample size caused
inaccuracies in the confidence interval.
William S. Gosset (1876–1937) of the Guinness brewery in Dublin, Ireland ran into this problem. His
experiments with hops and barley produced very few samples. Just replacing σ with s did not produce
accurate results when he tried to calculate a confidence interval. He realized that he could not use a
normal distribution for the calculation; he found that the actual distribution depends on the sample size.
This problem led him to “discover” what is called the Student’s t-distribution. The name comes from the
fact that Gosset wrote under the pen name “Student.”
Up until the mid-1970s, some statisticians used the normal distribution approximation for large
sample sizes and used the Student’s t-distribution only for sample sizes of at most 30. With graphing
calculators and computers, the practice now is to use the Student’s t-distribution whenever s is used as
an estimate for σ.
If you draw a simple random sample of size n from a population that has an approximately normal
distribution with mean μ and unknown population standard deviation σ and calculate the t-score:
t = (x̄ − μ) / (s/√n)
then the t-scores follow a Student’s t-distribution with n – 1 degrees of freedom. The t-score
measures how far x̄ is from its mean μ. For each sample size n, there is a different Student’s
t-distribution.
1. It is bell-shaped.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
Figure 10.12 Student’s t Distribution in comparison with Normal Distribution
The t distribution differs from the standard normal distribution in the following ways:
2. The t distribution is a family of curves based on the concept of degrees of freedom, which is
related to sample size.
3. As the sample size increases, the t distribution approaches the standard normal distribution.
4. The Student’s t-distribution has more probability in its tails than the standard normal
distribution because the spread of the t-distribution is greater than the spread of the standard
normal. So, the graph of the Student’s t-distribution will be thicker in the tails and shorter in the
center than the graph of the standard normal distribution.
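The comparison between the two curves can be made concrete by evaluating their densities. The following is a rough sketch using only the Python standard library; it assumes the standard density formula for the Student's t-distribution, and the helper name `t_pdf` is illustrative.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    # lgamma is used instead of gamma so that large df does not overflow.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()                      # standard normal: mean 0, sd 1

for df in (5, 30, 1000):
    print(f"df = {df:4d}: center = {t_pdf(0, df):.4f}, tail at 3 = {t_pdf(3, df):.5f}")
print(f"normal   : center = {normal.pdf(0):.4f}, tail at 3 = {normal.pdf(3):.5f}")
```

For small degrees of freedom the t curve is lower at the center and higher in the tails; as the degrees of freedom grow, its values approach those of the standard normal curve.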
To calculate any Student’s t-probabilities, a probability table for the Student’s t-distribution can
also be used. A Student’s t table gives t-scores given the degrees of freedom and the right-tailed
probability such as the one found in Appendix B. The table gives t-scores that correspond to the
confidence level (column) and degrees of freedom (row). When using a t-table, note that some tables are
formatted to show the confidence level in the column headings, while the column headings in some tables
may show only corresponding area in one or both tails.
The area of α/2 will then be used to find the t-score in the Student’s t table in Appendix B. We
will be using the positive t-score. This process is like the process of finding a z-score.
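As a cross-check on table lookups, the t-score for a given right-tail area can also be approximated numerically. This is only a sketch, not the method used in this module: it assumes the standard t density, integrates it with Simpson's rule, and searches by bisection. The names `t_right_tail` and `t_critical` are hypothetical.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    if x == 0:
        return 0.5
    h = x / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3             # the curve is symmetric about 0

def t_critical(alpha_over_2, df):
    """Positive t-score with right-tail area alpha/2, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha_over_2:
            lo = mid                       # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"df = 14, alpha/2 = 0.025: t = {t_critical(0.025, 14):.3f}")
print(f"df = 19, alpha/2 = 0.05 : t = {t_critical(0.05, 19):.3f}")
```

The values should agree with a standard Student's t table to about three decimal places.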
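x̄ − t(s/√n) < μ < x̄ + t(s/√n)
where:
x̄ = sample mean
t = t-score for the chosen confidence level with n – 1 degrees of freedom
s = sample standard deviation
n = sample size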
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
The interpretation should clearly state the confidence level, explain what population parameter is
being estimated, and state the confidence interval.
“We estimate with ___% confidence that the true population _______ (include the context of the
problem) is between ___ and ___ (include appropriate units).”
Examples:
3. Suppose you do a study of acupuncture to determine how effective it is in relieving pain. You
measure sensory rates for 15 subjects with the results given. Use the sample data to construct a
95% confidence interval for the mean sensory rate for the population (assumed normal) from
which you took the data.
Solution:
Long Method:
This gives the same result as the long method:
x̄ = 8.2267
s = 1.6722
Based on the problem, “Use the sample data to construct a 95% confidence interval for the
mean sensory rate…” thus, CL = 0.95. This will also give us the value of
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
df = n – 1 = 15 – 1 =14
As shown in Figure 10.12, we use the t – distribution table to locate the t – score.
We estimate with 95% confidence that the true population mean sensory rate is between 7.30 and
9.15.
4. The Human Toxome Project (HTP) is working to understand the scope of
industrial pollution in the human body. Industrial chemicals may enter the body through pollution or
as ingredients in consumer products. In October 2008, the scientists at HTP tested cord blood samples
for 20 newborn infants in the United States. The cord blood of the “In utero/newborn” group was
tested for 430 industrial compounds, pollutants, and other chemicals, including chemicals linked to
brain and nervous system toxicity, immune system toxicity, and reproductive toxicity, and fertility
problems. There are health concerns about the effects of some chemicals on the brain and nervous
system. The data below shows how many of the targeted chemicals were found in each infant’s cord
blood.
Use this sample data to construct a 90% confidence interval for the mean number of targeted
industrial chemicals to be found in an infant’s blood.
Solution:
Whether we use the long method or the calculator method, this will give us the same result:
x̄ = 127.45
s = 25.965
Based on the problem, “…construct a 90% confidence interval for the mean number of
targeted industrial chemicals to be found in an infant’s blood.” thus, CL = 0.90. This will also give
us the value of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
df = n – 1 = 20 – 1 =19
As shown in Figure 10.13, we use the t – distribution table to locate the t – score.
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the mean number of all targeted industrial chemicals
found in cord blood in the United States is between 117.412 and 137.488.
5. A random sample of statistics students were asked to estimate the total number of hours they
spend watching television in an average week. The responses are recorded below. Use this sample
data to construct a 98% confidence interval for the mean number of hours statistics students will
spend watching television in one week.
0 3 1 20 9
5 10 1 10 4
14 2 4 4 5
Solution:
Whether we use the long method or the calculator method, this will give us the same result:
x̄ = 6.133
s = 5.514
Based on the problem, “Use this sample data to construct a 98% confidence interval for the
mean…” thus, CL = 0.98. This will also give us the value of
α = 1 – CL = 1 – 0.98 = 0.02
α/2 = 0.01
df = n – 1 = 15 – 1 =14
As shown in Figure 10.14, we use the t – distribution table to locate the t – score.
x̄ − t(s/√n) < μ < x̄ + t(s/√n)
6.133 − (2.624)(5.514/√15) < μ < 6.133 + (2.624)(5.514/√15)
2.397 < μ < 9.869
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 98% confidence that the mean number of all hours that statistics students
spend watching television in one week is between 2.397 and 9.869.
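The whole calculation for this example can be reproduced from the raw responses. This is a minimal sketch, assuming the t-score 2.624 (df = 14, α/2 = 0.01) is taken from the table as in the solution above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hours of television per week reported by the 15 sampled students.
hours = [0, 3, 1, 20, 9, 5, 10, 1, 10, 4, 14, 2, 4, 4, 5]

n = len(hours)
x_bar = mean(hours)                        # sample mean
s = stdev(hours)                           # sample standard deviation (n - 1 divisor)
t = 2.624                                  # table value for df = 14, alpha/2 = 0.01

margin = t * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}")
print(f"98% interval: {low:.3f} to {high:.3f}")
```

Note that `statistics.stdev` uses the n – 1 divisor, which matches the sample standard deviation s used with the Student's t-distribution.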
During an election year, we see articles in the newspaper that state confidence intervals in terms of
proportions or percentages. For example, a poll for a candidate running for president might show that the
candidate has 40% of the vote within three percentage points (if the sample is large enough). Often,
election polls are calculated with 95% confidence, so, the pollsters would be 95% confident that the true
proportion of voters who favored the candidate would be between 0.37 and 0.43.
Investors in the stock market are interested in the true proportion of stocks that go up and down
each week. Businesses that sell personal computers are interested in the proportion of households in the
United States that own personal computers. Confidence intervals can be calculated for the true
proportion of stocks that go up or down each week and for the true proportion of households in the
United States that own personal computers.
The procedure to find the confidence interval, the sample size, the error bound, and the confidence
level for a proportion is similar to that for the population mean, but the formulas are different.
μ = np and σ = √(npq)
3. The number of successes np̂ and the number of failures nq̂ are both at least five:
np̂ ≥ 5 and nq̂ ≥ 5
The area of α/2 will then be used to find the z-score in the standard normal table in Appendix A.
We will be using the positive z-score. You may refer to Week 6 Section VII as a review.
Here are the z-scores for some commonly used confidence levels:
For a 90% confidence interval, 𝑧 = 1.65
For a 95% confidence interval, 𝑧 = 1.96
For a 99% confidence interval, 𝑧 = 2.58
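p̂ − z√(p̂q̂/n) < p < p̂ + z√(p̂q̂/n)
where:
p̂ = X/n = sample proportion (X = number of successes)
q̂ = 1 − p̂
n = sample size
z = z-score for the chosen confidence level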
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
The interpretation should clearly state the confidence level, explain what population parameter is
being estimated, and state the confidence interval.
“We estimate with ___% confidence that the true population _______ (include the context of the
problem) is between ___ and ___ (include appropriate units).”
Examples:
6. Suppose that a market research firm is hired to estimate the percent of adults living in a large city
who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to
determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes – they own
cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true
proportion of adult residents of this city who have cell phones.
Solution:
Also, according to the problem, “Of the 500 people surveyed, 421 responded yes – they own cell
phones.”, therefore, we have the following given:
n = 500 X = 421
Based on the problem, “Using a 95% confidence level…” thus, CL = 0.95. This will also give us
the value of
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
p̂ = X/n = 421/500 = 0.842 and q̂ = 1 − p̂ = 1 − 0.842 = 0.158
0.842 − (1.96)√((0.842)(0.158)/500) < p < 0.842 + (1.96)√((0.842)(0.158)/500)
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city
have cell phones.
Ninety-five percent of the confidence intervals constructed in this way would contain the true value
for the population proportion of all adult residents of this city who have cell phones.
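The steps of this example can be collected into one small function. This is a sketch rather than a definitive implementation: the name `proportion_interval` is hypothetical, and the check on the number of successes and failures mirrors the "at least five" condition stated earlier in this section.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n                  # sample proportion
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("need at least five successes and five failures")
    margin = z * sqrt(p_hat * q_hat / n)   # error bound for the proportion
    return p_hat - margin, p_hat + margin

# 421 of 500 surveyed adults own cell phones; z = 1.96 for 95% confidence.
low, high = proportion_interval(421, 500, 1.96)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

The bounds correspond to the 81% and 87.4% quoted in the interpretation.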
7. For a class project, a political science student at a large university wants to estimate the percent of
students who are registered voters. He surveys 500 students and finds that 300 are registered voters.
Compute a 90% confidence interval for the true percent of students who are registered voters, and
interpret the confidence interval.
Solution:
Also, according to the problem, “He surveys 500 students and finds that 300 are registered
voters.”, therefore, we have the following given:
n = 500 X = 300
Based on the problem, “Compute a 90% confidence interval…” thus, CL = 0.90. This will also give
us the value of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
p̂ = X/n = 300/500 = 0.60 and q̂ = 1 − p̂ = 1 − 0.60 = 0.40
0.60 − (1.65)√((0.6)(0.4)/500) < p < 0.60 + (1.65)√((0.6)(0.4)/500)
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the true percent of all students that are registered voters
is between 56.4% and 63.6%. That is, we estimate with 90% confidence that between 56.4% and
63.6% of ALL students are registered voters.
Ninety percent of all confidence intervals constructed in this way contain the true value for the
population percent of students that are registered voters.
8. In a sample of 300 students, 68% said they own an iPod and a smart phone. Compute a 97% confidence
interval for the true percent of students who own an iPod and a smartphone.
Solution:
Also, according to the problem, “In a sample of 300 students, 68% said they own an iPod and a
smart phone.”, therefore, we have the following given:
n = 300 p̂ = 0.68
Based on the problem, “Compute a 97% confidence interval…” thus, CL = 0.97. This will also give
us the value of
α = 1 – CL = 1 – 0.97 = 0.03
α/2 = 0.015
0.68 − (2.17)√((0.68)(0.32)/300) < p < 0.68 + (2.17)√((0.68)(0.32)/300)
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We are 97% confident that the true proportion of all students who own an iPod and a smart
phone is between 0.6216 and 0.7384.
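Because 97% is not one of the commonly tabulated confidence levels, it can be useful to check the arithmetic for this example directly. The sketch below assumes the z-score is computed with the standard-library `statistics.NormalDist` instead of being read from Appendix A, so it agrees with 2.17 only to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

n = 300
p_hat = 0.68                               # sample proportion from the problem
q_hat = 1 - p_hat
cl = 0.97

alpha = 1 - cl
z = NormalDist().inv_cdf(1 - alpha / 2)    # area of alpha/2 in the right tail
standard_error = sqrt(p_hat * q_hat / n)
margin = z * standard_error                # error bound = z times the standard error

low, high = p_hat - margin, p_hat + margin
print(f"z = {z:.2f}, standard error = {standard_error:.4f}, margin = {margin:.4f}")
print(f"97% interval: {low:.4f} to {high:.4f}")
```

Note that the error bound is the standard error multiplied by z; using the standard error alone would give an interval that is too narrow.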
VII. Summary
In this chapter, we learned how to calculate the confidence interval for a single population mean
where the population standard deviation is known. When estimating a population mean, the margin of
error is called the error bound for a population mean (EBM). A confidence interval has the general form:
(lower bound, upper bound) = (point estimate – EBM, point estimate + EBM)
The calculation of EBM depends on the size of the sample and the level of confidence desired. The
confidence level is the percent of all possible samples that can be expected to include the true population
parameter. As the confidence level increases, the corresponding EBM increases as well. As the sample size
increases, the EBM decreases.
In many cases, the researcher does not know the population standard deviation, σ, of the measure
being studied. In these cases, it is common to use the sample standard deviation, s, as an estimate of σ.
The normal distribution creates accurate confidence intervals when σ is known, but it is not as accurate
when s is used as an estimate. In this case, the Student’s t-distribution is much better. The t-score follows
the Student’s t-distribution with n – 1 degrees of freedom.
Some statistical measures, like many survey questions, measure qualitative rather than quantitative
data. In this case, the population parameter being estimated is a proportion. It is possible to create a
confidence interval for the true population proportion following procedures similar to those used in
creating confidence intervals for population means. The formulas are slightly different, but they follow
the same reasoning.
Introductory Statistics. Authored by: Barbara Illowsky, Susan Dean. Provided by: OpenStax. Located
at: http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44. License: CC BY:
Attribution. License Terms: Download for free at
http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44
IX. Exercises
1. In a local teaching district, a technology grant is available to teachers in order to install a cluster of
four computers in their classrooms. From the 6,250 teachers in the district, 250 were randomly
selected and asked if they felt that computers were an essential teaching tool for their classroom. Of
those selected, 142 teachers felt that computers were an essential teaching tool. Calculate a 99%
confidence interval for the proportion of teachers who felt that computers are an essential teaching
tool.
2. Josie followed the guidelines presented to her and conducted a binomial experiment. She did 300
trials and reported a sample proportion of 0.61. Calculate the 90%, 95%, and 99% confidence intervals
for this sample. What did you notice about the confidence intervals as the confidence level increased?
3. A study was conducted to determine the mean birth weight of a certain breed of kittens. Consider the
birth weights of kittens to be normally distributed. A sample of 45 kittens was randomly selected from
all kittens of this breed at a large veterinary hospital. The birth weight of each kitten in the sample
was recorded. The sample mean was 3.56 ounces, and the sample standard deviation was 0.2 ounces.
Set a 90% confidence interval on the mean birth weight of all kittens of this breed.
4. In a study of seventh grade students, the mean number of hours per week that they watched
television was 18.7 with a standard deviation of 4.5 hours. Assume the population has a normal
distribution. Construct a 95% confidence interval for the mean number of hours of tv watched by
seventh grade students.
5. A random sample of 40 college students has mean annual earnings of $3,245 and a standard deviation
of $567. Construct a 99% confidence interval for the population mean.
6. A random sample of 16 light bulbs has a mean life of 650 hours and a standard deviation of 32 hours.
Assume the population has a normal distribution. Construct a 90% confidence interval for the
population mean.
7. A random survey of enrollment at 35 community colleges across the United States yielded the
following figures:
6,414 1,550 2,109 9,350 21,828 4,300 5,944 5,722 2,825 2,044 5,481 5,200
5,853 2,750 10,012 6,357 27,000 9,414 7,681 3,200 17,500 9,200 7,380 18,314
6,557 13,713 17,768 7,493 2,771 2,861 1,263 7,285 28,165 5,080 11,622
Assume the underlying population is normal. Construct a 95% confidence interval for the population
mean enrollment at community colleges in the United States.
8. The standard deviation of the weights of elephants is known to be approximately 15 pounds. We wish
to construct a 95% confidence interval for the mean weight of newborn elephant calves. Fifty
newborn elephants are weighed. The sample mean is 244 pounds. The sample standard deviation is
11 pounds. Construct a 95% confidence interval for the population mean weight of newborn
elephants. State the confidence interval, sketch the graph, and calculate the error bound.
9. Five hundred and eleven (511) homes in a certain southern California community are randomly
surveyed to determine if they meet minimal earthquake preparedness recommendations. One
hundred seventy-three (173) of the homes surveyed met the minimum recommendations for
earthquake preparedness, and 338 did not. Find the confidence interval at the 90% Confidence Level
for the true population proportion of southern California community homes meeting at least the
minimum recommendations for earthquake preparedness.
10. The U.S. Census Bureau conducts a study to determine the time needed to complete the short form.
The Bureau surveys 200 people. The sample mean is 8.2 minutes. There is a known standard deviation
of 2.2 minutes. The population distribution is assumed to be normal. Construct a 90% confidence
interval for the population mean time to complete the forms. State the confidence interval, sketch
the graph, and calculate the error bound.
I. Introduction
We have seen that the sampling distribution of the sample mean, when the data come from a normal
distribution (and even, in large samples, when they do not) is itself a normal distribution. This allowed us
to find a confidence interval for the population mean. It is also often useful to find a confidence interval
for the population variance. This is important, for example, in quality control. However, the distribution
of the sample variance is not normal. To find a confidence interval for the population variance, we need
to use another distribution, the chi-square (χ²) distribution.
In this section, we will also show how to obtain a prediction interval on a future value of a normal
random variable. Finally, we will also investigate the use and application of tolerance intervals.
II. Objectives
1. Construct a confidence interval for the variance and standard deviation of a normal
distribution.
2. Interpret a confidence interval for the variance and standard deviation of a normal
distribution.
3. Distinguish the chi-square distribution from other distributions.
4. Construct a prediction interval for a future observation and a tolerance interval for a random
variable.
5. Interpret a prediction interval for a future observation and a tolerance interval for a random
variable.
Just as there is variability in a sample mean, there is also variability in a sample standard deviation.
The chi-square distribution can be used to find a confidence interval for the standard deviation or variance.
We know that if we take samples from a population, then each sample will have a mean and a variance
associated with it. Just as the means form a distribution, so do the values of the variance and it is to this
distribution that we turn in order to find an interval estimate for the value of the variance of the
population. Note that if the original population is normal, samples taken from this population have means
which are normally distributed. When we consider the distribution of variances calculated from the
samples, we need the chi-squared (χ2) distribution in order to calculate the confidence intervals.
\[ \chi^2 = \frac{(n-1)S^2}{\sigma^2} \]
Note: "chi-square" is pronounced "kai" as in sky, not "chai" like the tea.
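As an informal check on the statistic above, the following Python sketch simulates many normal samples and computes (n − 1)S²/σ² for each one. The function name, sample size, σ, and seed are illustrative assumptions, not values from this module. If the statistic follows a chi-square distribution with n − 1 degrees of freedom, its simulated values should never be negative and their average should be close to n − 1.

```python
import random
import statistics


def simulate_chi_square_statistic(n, sigma, reps, seed=0):
    """Return `reps` simulated values of (n - 1) * S^2 / sigma^2 for normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        values.append((n - 1) * s2 / sigma ** 2)
    return values


# Hypothetical settings: samples of size 10 from a normal population with sigma = 2.
values = simulate_chi_square_statistic(n=10, sigma=2.0, reps=20000)
print(round(statistics.mean(values), 2))  # should be close to df = n - 1 = 9
print(min(values) >= 0)                   # chi-square values are never negative
```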
Figure 11.1 Chi-square distribution plots for varying degrees of freedom.
3. As the number of degrees of freedom increases, the distribution becomes more symmetric.
4. All chi-square values are greater than or equal to 0, that is, χ² ≥ 0.
5. The area under each chi-square distribution curve is equal to 1.
Figure 11.2 Confidence Interval for a Chi-Square Distribution
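Confidence interval for the population variance:
\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \]
Confidence interval for the population standard deviation:
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} \]
where:
n is the sample size
s² is the sample variance
χ²(α/2) and χ²(1 − α/2) are the chi-square table values, with df = n − 1, that have areas α/2 and
1 − α/2 to their right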
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
The interpretation should clearly state the confidence level, explain what population parameter is
being estimated, and state the confidence interval.
“We estimate with ___% confidence that the true population _______ (include the context of the
problem) is between ___ and ___ (include appropriate units).”
Examples:
1. Suppose a sample of 30 ECC students are given an IQ test. If the sample has a standard deviation of
12.23 points, find a 90% confidence interval for the population standard deviation.
Solution:
Based on the problem, “Suppose a sample of 30 ECC students are given an IQ test. If the sample
has a standard deviation of 12.23 points…”, we are given
n = 30 students s = 12.23 points
Based on the problem, “…find a 90% confidence interval for the population standard deviation”,
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
df = n – 1 = 30 – 1 = 29
Because the problem asks for a confidence interval for the population standard deviation, we use
the formula for the confidence interval of the standard deviation. From the chi-square table with
df = 29, χ²(0.05) = 42.557 and χ²(0.95) = 17.708.
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} \]
\[ \sqrt{\frac{(29)(12.23)^2}{42.557}} < \sigma < \sqrt{\frac{(29)(12.23)^2}{17.708}} \]
\[ 10.10 < \sigma < 15.65 \]
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that the true population standard deviation of the IQ scores of ECC
students is between 10.10 and 15.65 points.
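The arithmetic in Example 1 can be reproduced with a short Python sketch. The function names are illustrative, and the two chi-square critical values (42.557 and 17.708 for df = 29) are typed in by hand as assumed table values; the code does not look them up.

```python
import math


def variance_interval(n, s, chi2_right, chi2_left):
    """Confidence interval for the population variance.

    chi2_right: table value with area alpha/2 to its right (the larger value)
    chi2_left:  table value with area 1 - alpha/2 to its right (the smaller value)
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_right, numerator / chi2_left


def std_dev_interval(n, s, chi2_right, chi2_left):
    """Confidence interval for the population standard deviation."""
    low, high = variance_interval(n, s, chi2_right, chi2_left)
    return math.sqrt(low), math.sqrt(high)


# Example 1: n = 30, s = 12.23, 90% confidence, df = 29.
# Critical values are assumed to be read from a chi-square table.
low, high = std_dev_interval(30, 12.23, chi2_right=42.557, chi2_left=17.708)
print(round(low, 2), round(high, 2))  # approximately 10.10 and 15.65
```

Dividing by the larger critical value gives the lower limit, which is why the two table values appear in what may look like reversed order.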
2. A large candy manufacturer produces, packages and sells packs of candy targeted to weigh 52 grams.
A quality control manager working for the company was concerned that the variation in the actual
weights of the targeted 52-gram packs was larger than acceptable. That is, he was concerned that
some packs weighed significantly less than 52-grams and some weighed significantly more than 52
grams. In an attempt to estimate σ, the standard deviation of the weights of all of the 52-gram packs
the manufacturer makes, he took a random sample of n = 10 packs off of the factory line. The random
sample yielded a sample variance of 4.2 grams². Use the random sample to derive a 95% confidence
interval for σ.
Solution:
Based on the problem, “…he took a random sample of n = 10 packs off of the factory line. The
random sample yielded a sample variance of 4.2 grams².”, we are given
n = 10 packs s² = 4.2 grams²
Based on the problem, “Use the random sample to derive a 95% confidence interval for σ”,
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
df = n – 1 = 10 – 1 = 9
Because the problem asks us to estimate σ, we use the formula for the confidence interval of the
standard deviation. From the chi-square table with df = 9, χ²(0.025) = 19.023 and χ²(0.975) = 2.700.
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} \]
\[ \sqrt{\frac{(9)(4.2)}{19.023}} < \sigma < \sqrt{\frac{(9)(4.2)}{2.700}} \]
\[ 1.41 < \sigma < 3.74 \]
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 95% confidence that the true population standard deviation of the weights of
all the packs of candy coming off of the factory line is between 1.41 and 3.74 grams.
3. Imagine you randomly select and weigh 30 samples of an allergy medication. The sample standard
deviation is 1.2 milligrams. Assuming the weights are normally distributed, construct 99%
confidence intervals for the population variance and standard deviation.
Solution:
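Based on the problem, “You randomly select and weigh 30 samples of an allergy medication.
The sample standard deviation is 1.2 milligrams”, we are given
n = 30 s = 1.2 milligrams
Based on the problem, “…construct 99% confidence intervals for the population variance and
standard deviation”,
α = 1 – CL = 1 – 0.99 = 0.01
α/2 = 0.005
df = n – 1 = 30 – 1 = 29
Because the problem asks for confidence intervals for both the population variance and the
standard deviation, we use both formulas. From the chi-square table with df = 29,
χ²(0.005) = 52.336 and χ²(0.995) = 13.121.
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} \]
\[ \sqrt{\frac{(29)(1.2)^2}{52.336}} < \sigma < \sqrt{\frac{(29)(1.2)^2}{13.121}} \]
\[ 0.89 < \sigma < 1.78 \]
\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \]
\[ \frac{(29)(1.2)^2}{52.336} < \sigma^2 < \frac{(29)(1.2)^2}{13.121} \]
\[ 0.80 < \sigma^2 < 3.18 \]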
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 99% confidence that the true population variance of the weights of the allergy
medications is between 0.80 and 3.18 milligrams², and that the true population standard deviation
is between 0.89 and 1.78 milligrams.
Figure 11.2 shows a guide for deciding which confidence interval is the most appropriate for the
population parameter that a problem asks about.
What parameter is the problem referring to?
   Mean, μ: Is σ known or unknown?
      Known: z-score
      Unknown: t-score
   Proportion, p: z-score
   Standard deviation, σ, or variance, σ²: chi-square
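The decision guide above can also be written as a small lookup function. This is only a sketch of the guide; the function name and the parameter labels ("mean", "proportion", "std_dev", "variance") are hypothetical choices made for illustration.

```python
def choose_statistic(parameter, sigma_known=None):
    """Return the distribution to use for a confidence interval.

    parameter:   "mean", "proportion", "std_dev", or "variance"
    sigma_known: only consulted when parameter == "mean"
    """
    if parameter == "mean":
        if sigma_known is None:
            raise ValueError("for a mean, state whether sigma is known")
        return "z" if sigma_known else "t"
    if parameter == "proportion":
        return "z"
    if parameter in ("std_dev", "variance"):
        return "chi-square"
    raise ValueError(f"unknown parameter: {parameter}")


print(choose_statistic("mean", sigma_known=False))  # t
print(choose_statistic("variance"))                 # chi-square
```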
V. Prediction Interval
[1] There are some instances where we may want to predict future observations of a random variable.
Because we are not estimating a mean or a variance, we cannot use a confidence interval. Instead, we use
another type of statistical interval, the prediction interval. If we want to predict a single future
observation, the value Xn+1, we must consider the following characteristics:
You may refer to Week 10 Section V as a review and guide in locating the t – score.
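\[ \bar{x} - t_{\alpha/2,\,n-1}\, s\sqrt{1+\frac{1}{n}} \le X_{n+1} \le \bar{x} + t_{\alpha/2,\,n-1}\, s\sqrt{1+\frac{1}{n}} \]
where:
x̄ is the sample mean
t is the t-score for α/2 with n − 1 degrees of freedom
s is the sample standard deviation
n is the sample size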
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
The interpretation should clearly state the confidence level, explain what is being predicted, and
state the prediction interval.
“We estimate with ___% confidence that the next observation of _______ (include the context of
the problem) will be between ___ and ___ (include appropriate units).”
Examples:
4. An LCD television brightness is determined by measuring the required current to achieve the specified
level of brightness. It was found that a random sample of 15 tubes has a mean of 312.7 amperes and
a standard deviation of 17.5 amperes. Calculate a 90% prediction interval for the brightness level of
the next test tube.
Solution:
Based on the problem, “It was found that a random sample of 15 tubes has a mean of 312.7
amperes and a standard deviation of 17.5 amperes.”, we are given
n = 15 tubes x̄ = 312.7 amperes s = 17.5 amperes
Based on the problem, “Calculate a 90% prediction interval for the brightness level of the next
test tube.”, CL = 0.90. This also gives us the values of
α = 1 – CL = 1 – 0.90 = 0.10
α/2 = 0.05
df = n – 1 = 15 – 1 = 14
As shown in Figure 11.10, we use the t-distribution table to locate the t-score, t = 1.761.
\[ \bar{x} - t\, s\sqrt{1+\frac{1}{n}} \le X_{n+1} \le \bar{x} + t\, s\sqrt{1+\frac{1}{n}} \]
\[ 312.7 - (1.761)(17.5)\sqrt{1+\frac{1}{15}} \le X_{n+1} \le 312.7 + (1.761)(17.5)\sqrt{1+\frac{1}{15}} \]
\[ 280.8718 \le X_{n+1} \le 344.5282 \]
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
We estimate with 90% confidence that the brightness level of the next tube tested will be between
280.87 and 344.53 amperes.
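The prediction interval in Example 4 can be checked with a few lines of Python. The function name is an illustrative assumption, and the t-score 1.761 is passed in as a value read from the t-table rather than computed.

```python
import math


def prediction_interval(x_bar, s, n, t_score):
    """Prediction interval for one future observation from a normal population."""
    margin = t_score * s * math.sqrt(1 + 1 / n)
    return x_bar - margin, x_bar + margin


# Example 4: n = 15, mean 312.7, standard deviation 17.5, 90% prediction interval.
low, high = prediction_interval(312.7, 17.5, 15, t_score=1.761)
print(round(low, 4), round(high, 4))
```

Note the factor sqrt(1 + 1/n): it is always larger than 1, so a prediction interval is wider than t·s alone would suggest, reflecting both the spread of individual values and the uncertainty in the sample mean.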
5. The level of polyunsaturated fatty acid is determined for a brand of margarine. A sample of 6
packages was analyzed and yielded the following data:
Calculate the level of the polyunsaturated fatty acid in the next package of margarine. Use a 99%
prediction interval.
Solution:
Whether we use the long method or the calculator method, we get the same result:
x̄ = 16.9833
s = 0.3189
Based on the problem, “…99% prediction interval.”, CL = 0.99. This also gives us the values of
α = 1 – CL = 1 – 0.99 = 0.01
α/2 = 0.005
df = n – 1 = 6 – 1 = 5
As shown in Figure 11.11, we use the t-distribution table to locate the t-score, t = 4.032.
\[ \bar{x} - t\, s\sqrt{1+\frac{1}{n}} \le X_{n+1} \le \bar{x} + t\, s\sqrt{1+\frac{1}{n}} \]
\[ 16.9833 - (4.032)(0.3189)\sqrt{1+\frac{1}{6}} \le X_{n+1} \le 16.9833 + (4.032)(0.3189)\sqrt{1+\frac{1}{6}} \]
\[ 15.5945 \le X_{n+1} \le 18.3721 \]
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
We estimate with 99% confidence that the level of polyunsaturated fatty acid in the next package
of margarine will be between 15.59 and 18.37.
6. A civil engineer tests concrete for its compressive strength. He tested a random sample of 12
specimens. The following are the measurements:
2216 2237 2249 2204 2225 2301 2281 2263 2318 2255 2275 2295
Find the compressive strength of the next specimen of concrete tested using a 95% prediction
interval.
Solution:
Whether we use the long method or the calculator method, we get the same result:
x̄ = 2259.9167
s = 35.5693
Based on the problem, “…using a 95% prediction interval.”, CL = 0.95. This also gives us
the values of
α = 1 – CL = 1 – 0.95 = 0.05
α/2 = 0.025
df = n – 1 = 12 – 1 = 11
As shown in Figure 11.5, we use the t-distribution table to locate the t-score, t = 2.201.
\[ \bar{x} - t\, s\sqrt{1+\frac{1}{n}} \le X_{n+1} \le \bar{x} + t\, s\sqrt{1+\frac{1}{n}} \]
\[ 2259.9167 - (2.201)(35.5693)\sqrt{1+\frac{1}{12}} \le X_{n+1} \le 2259.9167 + (2.201)(35.5693)\sqrt{1+\frac{1}{12}} \]
\[ 2178.4319 \le X_{n+1} \le 2341.4015 \]
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
We estimate with 95% confidence that the compressive strength of the next specimen of concrete
tested will be between 2178.43 and 2341.40.
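When the problem gives raw data, as in Example 6, the sample mean and standard deviation can be computed first and then substituted into the prediction interval formula. The sketch below uses Python's standard `statistics` module for that step; the variable names are illustrative, and the t-score 2.201 is entered by hand as a t-table value.

```python
import math
import statistics

# Compressive strength measurements from Example 6.
strengths = [2216, 2237, 2249, 2204, 2225, 2301,
             2281, 2263, 2318, 2255, 2275, 2295]

n = len(strengths)
x_bar = statistics.mean(strengths)   # sample mean
s = statistics.stdev(strengths)      # sample standard deviation (divisor n - 1)
t_score = 2.201                      # t-table value for alpha/2 = 0.025, df = 11

margin = t_score * s * math.sqrt(1 + 1 / n)
low, high = x_bar - margin, x_bar + margin
print(round(x_bar, 4), round(s, 4))
print(round(low, 2), round(high, 2))
```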
VI. Tolerance Interval
[17] Young (2010) defines tolerance intervals as statistical design limits within which at least a certain
proportion of the population falls for a given confidence level. [18] It is a statistical method which provides
a way to cover a fixed proportion of the population with a stated confidence. The boundaries of a
tolerance interval are called tolerance limits. [17] Tolerance intervals are commonly used in quality
control, manufacturing, and engineering.
How do tolerance intervals differ from confidence intervals and prediction intervals?
[19] Confidence intervals (CI), prediction intervals (PI), and tolerance intervals (TI) are commonly used
intervals derived from sample statistics.
A confidence interval is a range of values that is likely to contain the value of an unknown population
parameter, such as the mean, with a specified degree of confidence. For example, if the 95% CI of the
average fill volume of 375 ml bottles is 368–372 ml, you can be 95% confident that the true value of the
process mean is within this interval.
A prediction interval is a range of values for a product's characteristic that represents where the value
of a single new observation is likely to fall with a specified degree of confidence. For example, if the 95%
PI of the fill volume of 375 ml bottles is 360–379 ml, you can be 95% confident that the next
sampled bottle will have a fill volume that is within this interval.
A tolerance interval is a range of values for a product's characteristic that likely covers a
specified proportion of the population with a specified degree of confidence. For example, if the 95%
tolerance interval for 99% of the population for the fill volume of 375 ml bottles is 358–381 ml, you can
be 95% confident that 99% of the bottles to be filled in the future will have volumes that are within this
interval.
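To see how the three intervals compare on one sample, the sketch below computes the half-width of each for the tire-life data used in Example 7 (n = 16, mean 60,139.7, standard deviation 3645.94). It assumes the usual t-based confidence interval for a mean, x̄ ± t·s/√n, a hand-entered t-table value of 2.131 (α/2 = 0.025, df = 15), and the tolerance factor k = 2.903 quoted in Example 7. Variable names are illustrative.

```python
import math

n, x_bar, s = 16, 60139.7, 3645.94  # tire-life sample from Example 7
t_score = 2.131                     # assumed t-table value: alpha/2 = 0.025, df = 15
k = 2.903                           # tolerance factor: 95% confidence, 95% coverage

ci_half = t_score * s / math.sqrt(n)          # confidence interval for the mean
pi_half = t_score * s * math.sqrt(1 + 1 / n)  # prediction interval, next observation
ti_half = k * s                               # tolerance interval, 95% of population

for name, half in (("CI", ci_half), ("PI", pi_half), ("TI", ti_half)):
    print(name, round(x_bar - half, 2), round(x_bar + half, 2))
```

For this sample the confidence interval is the narrowest and the tolerance interval the widest, which matches the ordering in the bottle-filling illustration above.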
Figure 11. 13 Comparison between Tolerance Interval and Confidence Interval [20]
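\[ \bar{x} \pm ks \]
where:
x̄ is the sample mean
s is the sample standard deviation
k is the tolerance interval factor for the given sample size, confidence level, and percent coverage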
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the tolerance interval means, in the words of the problem.
The interpretation should clearly state the confidence level and the percent coverage, explain what is
being estimated, and state the tolerance interval.
“We estimate with ___% confidence that at least __% of the values of _____ (include the context
of the problem) are between ___ and ___ (include appropriate units).”
Examples:
7. A research engineer for a tire manufacturer is investigating tire life for a new rubber compound and
has built 16 tires and tested them to end-of-life in a road test. The sample mean and standard
deviation are 60,139.7 and 3645.94 kilometers. Compute a 95% tolerance interval on the life of the
tires that has confidence level 95%.
Solution:
Based on the problem, “…has built 16 tires and tested them to end-of-life in a road test. The
sample mean and standard deviation are 60,139.7 and 3645.94 kilometers.”, we are given
n = 16 tires x̄ = 60,139.7 kilometers s = 3645.94 kilometers
2. Find the tolerance interval factor k that corresponds to the specific confidence level.
Based on the problem, “Compute a 95% tolerance interval on the life of the tires that has confidence
level 95%.”,
Confidence Level – 95% = 0.95
Percent Coverage – 95% = 0.95
Thus, k = 2.903.
\[ \bar{x} \pm ks \]
60139.7 + (2.903)(3645.94) = 70,723.86
60139.7 − (2.903)(3645.94) = 49,555.54
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 95% confidence that at least 95% of the values of the tire life are between
49,555.54 and 70,723.86 kilometers.
8. An Izod impact test was performed on 20 specimens of PVC pipe. The sample mean is 1.25 and the
sample standard deviation is 0.25. Compute a 99% tolerance interval on the impact strength of PVC
pipe that has confidence level 90%.
Solution:
Based on the problem, “An Izod impact test was performed on 20 specimens of PVC pipe. The
sample mean is 1.25 and the sample standard deviation is 0.25.”, we are given
n = 20 specimens x̄ = 1.25 s = 0.25
2. Find the tolerance interval factor k that corresponds to the specific confidence level.
Based on the problem, “Compute a 99% tolerance interval on the impact strength of PVC pipe
that has confidence level 90%.”,
Confidence Level – 90% = 0.90
Percent Coverage – 99% = 0.99
Thus, k = 3.368.
\[ \bar{x} \pm ks \]
1.25 + (3.368)(0.25) = 2.092
1.25 − (3.368)(0.25) = 0.408
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the confidence interval means, in the words of the problem.
We estimate with 90% confidence that at least 99% of the values of the impact strength of PVC
pipe are between 0.408 and 2.092.
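A tolerance interval needs only the sample mean, the sample standard deviation, and the factor k. The sketch below stores the three k values quoted in the examples of this section in a small dictionary keyed by (sample size, confidence level, percent coverage) and applies x̄ ± ks to Example 8. The dictionary and function names are hypothetical; in practice k is read from a tolerance factor table.

```python
# k values quoted in this section, keyed by (n, confidence level, percent coverage).
K_FACTORS = {
    (16, 0.95, 0.95): 2.903,
    (20, 0.90, 0.99): 3.368,
    (25, 0.90, 0.90): 2.077,
}


def tolerance_interval(x_bar, s, k):
    """Two-sided tolerance interval x_bar - k*s to x_bar + k*s."""
    return x_bar - k * s, x_bar + k * s


# Example 8: n = 20, mean 1.25, standard deviation 0.25,
# 90% confidence with 99% coverage.
k = K_FACTORS[(20, 0.90, 0.99)]
low, high = tolerance_interval(1.25, 0.25, k)
print(round(low, 3), round(high, 3))  # 0.408 2.092
```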
9. The wall thickness of 25 glass 2-liter bottles was measured by a quality-control engineer. The sample
mean was 4.05 millimeters, and the sample standard deviation was 0.08 millimeter. Compute a 90%
tolerance interval on bottle wall thickness that has confidence level 90%.
Solution:
Based on the problem, “The sample mean was 4.05 millimeters, and the sample standard
deviation was 0.08 millimeter.”, we are given
n = 25 bottles x̄ = 4.05 millimeters s = 0.08 millimeter
2. Find the tolerance interval factor k that corresponds to the specific confidence level.
Based on the problem, “Compute a 90% tolerance interval on bottle wall thickness that has
confidence level 90%.”,
Confidence Level – 90% = 0.90
Percent Coverage – 90% = 0.90
Thus, k = 2.077.
\[ \bar{x} \pm ks \]
4.05 + (2.077)(0.08) = 4.2162
4.05 − (2.077)(0.08) = 3.8838
4. Write a sentence that interprets the estimate in the context of the situation in the problem.
Explain what the tolerance interval means, in the words of the problem.
We estimate with 90% confidence that at least 90% of the values of the bottle wall thickness are
between 3.8838 and 4.2162 millimeters.
VII. Summary
The chi-square distribution with k degrees of freedom is the distribution of a sum of the squares
of k independent standard normal random variables.
To form a confidence interval for the population variance, use the chi-square distribution with
degrees of freedom equal to one less than the sample size: df = n −1.
To predict the value of a future observation, we use the prediction interval.
To identify a fixed proportion of the population with a stated confidence, we use the tolerance
interval.
VIII. Exercises
1. A rivet is to be inserted into a hole. A random sample of 15 parts is selected, and the hole diameter
is measured. The sample standard deviation of the hole diameter measurements is 0.008 millimeters.
Construct a 99% confidence interval for the variance.
2. The sugar content of the syrup in canned peaches is normally distributed. A random sample of 10 cans
yields a sample standard deviation of 4.8 milligrams.
a. Find a 95% confidence interval for the standard deviation.
b. Compute a 95% tolerance interval on the syrup volume that has confidence level 90%.
3. A research engineer for a tire manufacturer is investigating tire life for a new rubber compound and
has built 16 tires and tested them to end-of-life in a road test. The sample mean and standard
deviation are 60,139.7 and 3645.94 kilometers.
a. Find an 88% confidence interval for the population variance and standard
deviation.
b. Compute a 98% prediction interval on the life of the next tire of this type tested under
conditions that are similar to those employed in the original test.
4. An Izod impact test was performed on 20 specimens of PVC pipe. The sample mean is 1.25 and the
sample standard deviation is 0.25. Compute a 99% prediction interval on the impact strength of the
next specimen of PVC pipe tested.
6. A postmix beverage machine is adjusted to release a certain amount of syrup into a chamber where
it is mixed with carbonated water. A random sample of 25 beverages was found to have a mean
syrup content of 1.10 fluid ounces and a standard deviation of 0.015 fluid ounces. Compute a 90%
prediction interval on the syrup volume in the next beverage dispensed.
7. A civil engineer tests concrete for its compressive strength. He tested a random sample of 12
specimens. The following are the measurements:
2216 2237 2249 2204 2225 2301 2281 2263 2318 2255 2275 2295
Compute a 90% tolerance interval on the compressive strength of the concrete that has 90%
confidence.
8. A machine produces metal rods used in an automobile suspension system. A random sample of 15
rods is selected, and the diameter is measured. The resulting data (in millimeters) are as follows:
Compute a 95% tolerance interval on the diameter of the rods that has 90% confidence.
I. Introduction
In this chapter we will explore hypothesis testing, which involves making conjectures about a
population based on a sample drawn from the population. Hypothesis tests are often used in statistics
to analyze the likelihood that a population has certain characteristics. For example, we can use
hypothesis testing to analyze whether a senior class has a certain average SAT score or whether a
prescription drug has a certain proportion of the active ingredient. A hypothesis is simply a conjecture
about a characteristic or set of facts. When performing statistical analyses, our hypotheses provide the
general framework of what we are testing and how to perform the test. These tests are never certain,
and we can never prove or disprove hypotheses with statistics, but the outcomes of these tests provide
evidence that either supports or counts against the hypothesis.
We will also learn about different hypothesis tests, how to develop hypotheses, how to calculate
statistics to help support or refute the hypotheses and understand the errors associated with hypothesis
testing.
II. Objectives
1. Apply the appropriate hypothesis test for any given statistical situation.
2. Interpret hypothesis tests.
3. Distinguish statistical significance from practical importance.
Figure 12. 1 You can use a hypothesis test to decide if a dog breeder’s claim that every Dalmatian has 35
spots is statistically sound. (Credit: Robert Neff)
One job of a statistician is to make statistical inferences about populations based on samples taken
from the population. Confidence intervals are one way to estimate a population parameter. Another way
to make a statistical inference is to make a decision about a parameter. For instance, a car dealer
advertises that its new small truck gets 35 miles per gallon, on average. A tutoring service claims that its
method of tutoring helps 90% of its students get an A or a B. A company says that women managers in
their company earn an average of $60,000 per year.
A statistician will make a decision about these claims. This process is called “hypothesis testing.” A
hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician
makes a decision as to whether or not there is sufficient evidence, based upon analyses of the data, to
reject the null hypothesis.
In this chapter, you will conduct hypothesis tests on single means and single proportions. You will
also learn about the errors associated with these tests. Hypothesis testing consists of two contradictory
hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test,
a statistician will:
The actual test begins by considering two hypotheses. They are called the null hypothesis and the
alternative hypothesis. These hypotheses contain opposing viewpoints.
Null Hypothesis, H0
Alternative Hypothesis, Ha
It is a claim about the population that is contradictory to H0 and is what we conclude when we reject
H0.
[4] It is a statistical hypothesis that states the existence of a difference between a parameter and a
specific value, or states that there is a difference between two parameters.
We test the null hypothesis against the alternative hypothesis, which includes the outcomes not
covered by the null hypothesis.
2. Directional Hypothesis
An assertion that one measure is less than or greater than another measure of a
similar nature
Involves one of the order relations, “less than” or “greater than”
One-sided hypothesis: one-tailed test
Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if
you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.
After you have determined which hypothesis the sample supports, you make a decision. There are
two options for a decision. They are “reject H0” if the sample information favors the alternative hypothesis
or “do not reject H0” or “decline to reject H0” if the sample information is insufficient to reject the null
hypothesis.
H0 Ha
H0 always has a symbol with an equal in it. Ha never has a symbol with an equal in it. The choice of
symbol depends on the wording of the hypothesis test. However, be aware that many researchers use =
in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is
acceptable because we only make the decision to reject or not reject the null hypothesis.
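The pairing rule for the symbols can be captured in a small helper that takes the symbol used in Ha and returns the complementary symbol for H0. This is a sketch only; the function and dictionary names are hypothetical, and it always uses the complementary symbol (≥ or ≤) in H0 rather than the plain "=" that many researchers prefer.

```python
# Symbol used in Ha mapped to the complementary symbol used in H0.
NULL_SYMBOL_FOR = {"≠": "=", "<": "≥", ">": "≤"}


def state_hypotheses(parameter, value, alternative_symbol):
    """Return the (H0, Ha) pair as strings for a parameter such as 'μ' or 'p'."""
    if alternative_symbol not in NULL_SYMBOL_FOR:
        raise ValueError("Ha must use one of: ≠, <, >")
    null_symbol = NULL_SYMBOL_FOR[alternative_symbol]
    return (f"H0: {parameter} {null_symbol} {value}",
            f"Ha: {parameter} {alternative_symbol} {value}")


# "College students take less than five years to graduate, on the average."
print(state_hypotheses("μ", 5, "<"))  # ('H0: μ ≥ 5', 'Ha: μ < 5')
```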
For each of the following, state the null and alternative hypothesis.
1. We have a medicine that is being manufactured and each pill is supposed to have 14 milligrams of the
active ingredient.
parameter: mean
H0: μ = 14 mg
Ha: μ ≠ 14 mg
Our null hypothesis states that the population has a mean equal to 14 milligrams.
Our alternative hypothesis states that the population has a mean that is different from 14
milligrams.
2. The school principal wants to test if it is true what teachers say -- that high school juniors use the
computer an average of 3.2 hours a day.
parameter: “average”, thus mean
H0: μ = 3.2 hours
Ha: μ ≠ 3.2 hours
Our null hypothesis states that the population of high school juniors uses the computer an average
of 3.2 hours a day.
Our alternative hypothesis states that the population of high school juniors uses the computer an
average that is different from 3.2 hours a day.
3. A medical trial is conducted to test whether a new medicine reduces cholesterol by 25%.
H0: p = 0.25
Ha: p ≠ 0.25
Our null hypothesis states that the new medicine reduces cholesterol by 25%.
Our alternative hypothesis states that the new medicine reduces cholesterol by an amount different
from 25%.
4. We want to test if college students take less than five years to graduate from college, on the average.
H0: μ ≥ 5
Ha: μ < 5
5. We want to test if it takes fewer than 45 minutes to teach a lesson plan.
H0: μ ≥ 45
Ha: μ < 45
6. In an issue of U.S. News and World Report, an article on school standards stated that about half of all
students in France, Germany, and Israel take advanced placement exams and a third pass. The same
article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the
percentage of U.S. students who take advanced placement exams is more than 6.6%.
H0: p ≤ 0.066
Ha: p > 0.066
7. On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40%
pass on the first try.
H0: p = 0.40
Ha: p > 0.40
A random survey of 75 death row inmates revealed that the mean length of time on death row is 17.4
years with a standard deviation of 6.3 years. If you were conducting a hypothesis test to determine if
the population mean time on death row could likely be 15 years, what would the null and alternative
hypotheses be?
H0: _______________________________________________________________________________
Ha: _______________________________________________________________________________
The National Institute of Mental Health published an article stating that in any one-year period,
approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose
that in a survey of 100 people in a certain town, seven of them suffered from depression or a
depressive illness. If you were conducting a hypothesis test to determine if the true proportion of
people in that town suffering from depression or a depressive illness is lower than the percent in the
general adult American population, what would the null and alternative hypotheses be?
H0: _______________________________________________________________________________
Ha: _______________________________________________________________________________
Some of the following statements refer to the null hypothesis, some to the alternate hypothesis. State
the null hypothesis, H0, and the alternative hypothesis, Ha, in terms of the appropriate parameter (μ or
p).
Hypothesis H0 or Ha μ or p
The mean number of years Americans work before retiring is 34.
At most 60% of Americans vote in presidential elections.
The mean starting salary for San Jose State University graduates is at least
$100,000 per year.
Twenty-nine percent of high school seniors get drunk each month.
Fewer than 5% of adults ride the bus to work in Los Angeles.
The mean number of cars a person owns in her lifetime is not more than ten.
About half of Americans prefer to live away from cities, given the choice.
Europeans have a mean paid vacation each year of six weeks.
The chance of developing breast cancer is under 11% for women.
Private universities’ mean tuition cost is more than $20,000 per year.
Deciding Whether to Reject the Null Hypothesis: One-Tailed and Two-Tailed Hypothesis Tests
When a hypothesis is tested, a statistician must decide on how much evidence is necessary in order
to reject the null hypothesis. For example, if the null hypothesis is that the average height of a population
is 64 inches, a statistician wouldn't measure one person who is 66 inches and reject the hypothesis based
on that one trial. It is too likely that the discrepancy was merely due to chance.
We use statistical tests to determine if the sample data give good evidence against the claim (H0).
The level of significance, denoted by α, is the numerical measure that we use to determine the strength
of the sample evidence we are willing to consider strong enough to reject H0. If we choose, for example, α
= 0.01 we are saying that we would get data at least as unusual as the data we have collected no more
than 1% of the time when H0 is true. The most frequently used levels of significance are 0.05 and 0.01.
If our data results in a statistic that falls within the region determined by the level of significance,
then we reject H0. The region is therefore called the critical region. When choosing the level of
significance, we need to consider the consequences of rejecting or failing to reject the null hypothesis. If
there is the potential for health consequences (as in the case of active ingredients in prescription
medications) or great cost (as in the case of manufacturing machine parts), we should use a more
‘conservative’ critical region with levels of significance such as .005 or .001.
When determining the critical regions for a two-tailed hypothesis test, the level of significance
represents the extreme areas under the normal density curve. We call this a two-tailed hypothesis test
because the critical region is in both ends of the distribution. For example, if the significance level
is 0.05 (a confidence level of 0.95), the critical region would be the most extreme 5 percent under the
curve, with 2.5 percent in each tail of the distribution, as shown in Figure 12.3.
Therefore, if the mean from the sample taken from the population falls within one of these critical
regions, we would conclude that there was too much of a difference between our sample mean and the
hypothesized population mean, and we would reject the null hypothesis. However, if the mean from the
sample falls in the middle of the distribution (in between the critical regions) we would fail to reject the
null hypothesis. This is illustrated in Figure 12.4.
Figure 12.4 Two-Tailed Test Critical Region
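The two-tailed decision rule described above can be sketched in Python. This sketch assumes the population standard deviation σ is known and that the sample mean is approximately normal, so the cut-offs are the hypothesized mean plus or minus z·σ/√n. The function name and the numbers in the usage lines (σ = 3 inches, n = 36) are hypothetical; only the hypothesized mean of 64 inches comes from the text.

```python
import math
from statistics import NormalDist


def two_tailed_decision(sample_mean, mu0, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when the sample mean lands in either critical region."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    half_width = z_crit * sigma / math.sqrt(n)
    lower, upper = mu0 - half_width, mu0 + half_width
    if sample_mean < lower or sample_mean > upper:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical data: H0 says the mean height is 64 inches; sigma = 3, n = 36.
print(two_tailed_decision(66.0, mu0=64, sigma=3, n=36))  # reject H0
print(two_tailed_decision(64.5, mu0=64, sigma=3, n=36))  # fail to reject H0
```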
We calculate the critical region for the single-tail hypothesis test a bit differently. We would use a
single-tail hypothesis test when the direction of the results is anticipated, or we are only interested in one
direction of the results. For example, a single-tail hypothesis test may be used when evaluating whether
to adopt a new textbook. We would only decide to adopt the textbook if it improved student
achievement relative to the old textbook. A single-tail hypothesis simply states that the mean is greater
or less than the hypothesized value.
A single-tail hypothesis test also means that we have only one critical region because we put the
entire region of rejection into just one side of the distribution. When the alternative hypothesis is that the
population mean is greater than the hypothesized value, the critical region is on the right side of the
distribution. When the alternative hypothesis is that the population mean is smaller than the hypothesized
value, the critical region is on the left side of the distribution, as shown in Figure 12.5.
Figure 12.5 One-Tailed Test Critical Region
To calculate the critical regions, we must first find the critical values, the cut-offs where the critical
regions start. These values are specified by the z-distribution or the t-distribution,
depending on the statistical test. They can be found in a table that lists the areas of
each of the tails under the corresponding distribution.
Table 12.2 outlines the possible outcomes in hypothesis testing. Which type of error is more serious
depends on the specific research situation, but ideally both types of errors should be minimized during
the analysis.
Table 12.2 Possible Outcomes of Hypothesis Testing
Decision                H0 is true          H0 is false
Reject H0               Type I error        Correct decision
Fail to reject H0       Correct decision    Type II error
The general approach to hypothesis testing focuses on the Type I error: rejecting the null hypothesis
when it is true. The level of significance, also known as the alpha level, is defined as the probability
of making a Type I error when testing a null hypothesis. For example, at the 0.05 level, a true null
hypothesis will be rejected incorrectly about 5 percent of the time.
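The meaning of the alpha level can be checked with a small simulation. The sketch below is only an illustration: the population (mean 0, standard deviation 1), the sample size, the number of trials, and the function name are all assumptions, not values from this module. It repeatedly draws samples from a population for which H0 is true and counts how often a two-tailed z-test rejects H0 anyway.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    mu0, sigma = 0.0, 1.0                # hypothetical population; H0 is true by construction
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:              # sample mean landed in a critical region
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should come out close to the chosen alpha, since every rejection in this simulation is a Type I error.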
Often, we establish the alpha level based on the severity of the consequences of making a Type I
error. If the consequences are not that serious, we could set an alpha level at 0.10 or 0.20. However, in a
field like medical research we would set the alpha level very low (at 0.001 for example) if there was
potential bodily harm to patients.
Calculating the probability of making a Type II error is not as straightforward as calculating the
probability of making a Type I error. The probability of making a Type II error can only be determined
when values have been specified for the alternative hypothesis. The probability of making a type II error
is denoted by β.
β = probability of a Type II error = P(Type II error) = probability of not rejecting H0 when H0 is false
Once the value for the alternative hypothesis has been specified, it is possible to determine the
probability of making a correct decision (1−β). This quantity, 1−β, is called the power of the test. Ideally,
we want a high power that is as close to one as possible. Increasing the sample size increases the power
of the test. We can also attempt to minimize Type II errors by setting higher alpha levels in situations
that do not have grave or costly consequences. The type of alternative hypothesis also affects the power
of the test.
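As a rough sketch of how β and the power can be computed once a specific alternative value is chosen, consider a right-tailed z-test with known σ. The function name and all numeric values below (μ0 = 145, alternative mean 148, σ = 20) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def power_right_tailed(mu0, mu1, sigma, n, alpha):
    """Power (1 - beta) of a right-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # cut-off where the critical region starts
    shift = (mu1 - mu0) / (sigma / n ** 0.5)     # distance between the means in standard errors
    beta = std_normal.cdf(z_crit - shift)        # P(fail to reject H0 | true mean is mu1)
    return 1 - beta

# Hypothetical values: a larger sample size gives a higher power.
for n in (25, 100, 400):
    print(n, round(power_right_tailed(mu0=145, mu1=148, sigma=20, n=n, alpha=0.05), 3))

# A larger alpha also raises the power (that is, lowers beta) for the same n.
print(round(power_right_tailed(mu0=145, mu1=148, sigma=20, n=100, alpha=0.10), 3))
```

Under these assumptions the printed powers grow with n and with α, which matches the relationship between α and β described in the text.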
Type I and type II errors are not caused by mistakes. These errors are the result of random chance.
The data provide evidence for a conclusion that is false. The goal in hypothesis testing is to minimize the
potential of both Type I and Type II errors. However, there is a relationship between these two types of
errors. As the level of significance or alpha level increases, the probability of making a Type II
error (β) decreases and vice versa.
If the consequences of a type I error are more serious, choose a small level of significance (α).
If the consequences of a type II error are more serious, choose a larger level of significance (α).
But remember that the level of significance is the probability of committing a type I error.
In general, we pick the largest level of significance that we can tolerate as the chance of a type I
error.
Examples:
16. Suppose the null hypothesis, H0, is: Frank’s rock-climbing equipment is safe.
Type I error: Frank thinks that his rock-climbing equipment may not be safe when, in fact, it
really is safe.
Type II error: Frank thinks that his rock-climbing equipment may be safe when, in fact, it is
not safe.
α = probability that Frank thinks his rock-climbing equipment may not be safe when, in fact, it really is
safe.
β = probability that Frank thinks his rock-climbing equipment may be safe when, in fact, it is not safe.
Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his
rock-climbing equipment is safe, he will go ahead and use it.)
17. Suppose the null hypothesis is: the blood cultures contain no traces of pathogen X. State the Type I
and Type II errors.
Type I error: The researcher thinks the blood cultures do contain traces of pathogen X,
when in fact, they do not.
Type II error: The researcher thinks the blood cultures do not contain traces of pathogen X,
when in fact, they do.
18. Suppose the null hypothesis is: The victim of an automobile accident is alive when he arrives at the
emergency room of a hospital.
Type I error: The emergency crew thinks that the victim is dead when, in fact, the victim is
alive.
Type II error: The emergency crew does not know if the victim is alive when, in fact, the
victim is dead.
α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive.
β = probability that the emergency crew does not know if the victim is alive when, in fact, the victim is
dead.
The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is
dead, they will not treat him.)
19. Suppose the null hypothesis is: a patient is not sick. Which type of error has the greater
consequence, Type I or Type II?
The error with the greater consequence is the Type II error: the patient will be thought well
when, in fact, he is sick, so he will not get treatment.
20. It’s a Boy Genetic Labs claim to be able to increase the likelihood that a pregnancy will result in a
boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, H0, is: It’s a
Boy Genetic Labs has no effect on gender outcome.
Type I error: This result when a true null hypothesis is rejected. In the context of this
scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender
outcome, when in fact it has no effect. The probability of this error occurring is denoted by
the Greek letter alpha, α.
Type II error: This result when we fail to reject a false null hypothesis. In context, we would
state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy
when, in fact, it does. The probability of this error occurring is denoted by the Greek letter
beta, β.
The error of greater consequence would be the Type I error since couples would use the It’s a Boy
Genetic Labs product in hopes of increasing the chances of having a boy.
21. “Red tide” is a bloom of poison-producing algae–a few different species of a class of plankton called
dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams
living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the
Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of
shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin
per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels
of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which
error has the greater consequence.
H0: the mean level of toxins is at most 800 μg, μ ≤ 800 μg.
Type I error: The DMF believes that toxin levels are still too high when, in fact, toxin levels
are at most 800 μg. The DMF continues the harvesting ban.
Type II error: The DMF believes that toxin levels are within acceptable levels (are at most
800 μg) when, in fact, toxin levels are still too high (more than 800 μg). The DMF lifts the
harvesting ban. This error could be the most serious. If the ban is lifted and clams are still
toxic, consumers could possibly eat tainted food.
In summary, the more dangerous error would be to commit a Type II error, because this error
involves the availability of tainted clams for consumption.
22. A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer.
Describe both the Type I and Type II errors in context. Which error is the more serious?
Type I: A cancer patient believes the cure rate for the drug is less than 75% when it actually
is at least 75%.
Type II: A cancer patient believes the experimental drug has at least a 75% cure rate when it
has a cure rate that is less than 75%.
In this scenario, the Type II error contains the more severe consequence. If a patient believes the
drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s)
choice about whether to use the drug as a treatment option.
A sleeping bag is tested to withstand temperatures of –15 °F. You think the bag cannot stand
temperatures that low. State the Type I and Type II errors in complete sentences.
Type I Error:________________________________________________________________________
__________________________________________________________________________________
Type II Error:_______________________________________________________________________
__________________________________________________________________________________
A group of doctors is deciding whether or not to perform an operation. Suppose the null hypothesis,
H0, is: the surgical procedure will go well. State the Type I and Type II errors in complete sentences.
Type I Error:________________________________________________________________________
__________________________________________________________________________________
Type II Error:_______________________________________________________________________
__________________________________________________________________________________
A group of doctors is deciding whether or not to perform an operation. Suppose the null hypothesis,
H0, is: the surgical procedure will go well. Which is the error with the greater consequence?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
A group of divers is exploring an old sunken ship. Suppose the null hypothesis, H0, is: the sunken ship
does not contain buried treasure. State the Type I and Type II errors in complete sentences.
Type I Error:________________________________________________________________________
__________________________________________________________________________________
Type II Error:_______________________________________________________________________
__________________________________________________________________________________
Guilty: The jury concludes that there is enough evidence to convict the defendant. The
evidence is so strong that there is not a reasonable doubt that the defendant is guilty.
Not Guilty: The jury concludes that there is not enough evidence to conclude beyond a
reasonable doubt that the person is guilty. Notice that they do not conclude that the person is
innocent. This verdict says only that there is not enough evidence to return a guilty verdict.
The null hypothesis is “The person is innocent.” The alternative hypothesis is “The person is guilty.”
The evidence is the data. In a courtroom, the person is assumed innocent until proven guilty. In a
hypothesis test, we assume the null hypothesis is true until the data proves otherwise.
The two possible verdicts are like the two conclusions that are possible in a hypothesis test.
Reject the null hypothesis: When we reject a null hypothesis, we accept the alternative
hypothesis. This is like a guilty verdict. The evidence is strong enough for the jury to reject the
assumption of innocence. In a hypothesis test, the data is strong enough for us to reject the
assumption that the null hypothesis is true.
Fail to reject the null hypothesis: When we fail to reject the null hypothesis, we are delivering
a “not guilty” verdict. The jury concludes that the evidence is not strong enough to reject the
assumption of innocence, so the evidence is too weak to support a guilty verdict. We conclude
the data is not strong enough to reject the null hypothesis, so the data is too weak to accept
the alternative hypothesis.
How does the courtroom analogy relate to type I and type II errors?
Type I error: The jury convicts an innocent person. By analogy, we reject a true null hypothesis
and accept a false alternative hypothesis.
Type II error: The jury says a person is not guilty when he or she really is. By analogy, we fail to
reject a null hypothesis that is false. In other words, we do not accept an alternative hypothesis
when it is true.
The P-value is the probability, computed under the assumption that the null hypothesis is true, of
observing sample data at least as extreme as the obtained test statistic. Small P-values provide evidence
against the null hypothesis. The smaller (closer to 0) the P-value, the stronger the evidence against the
null hypothesis. [12] A small P-value indicates that
it is unlikely that the actual sample data came from the population described by the null hypothesis. More
specifically, a small P-value says that there is only a small chance that we will randomly select a sample
with results at least as extreme as the data if H0 is true.
When the P-value is less than (or equal to) α, we also say that the difference between the actual
sample statistic and the assumed parameter value is statistically significant. A significant difference is an
observed difference that is too large to attribute to chance. In other words, it is a difference that is unlikely
when we consider sampling variability alone. If the difference is statistically significant, we reject H0.
If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
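The decision rule can be sketched in a few lines of code. This is a minimal illustration for a z test statistic only; the function names `p_value` and `decide` and the sample value z = 2.0 are assumptions made for the example.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                     # area to the left of z
    if tail == "right":
        return 1 - cdf(z)                 # area to the right of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))      # both tails, using symmetry
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p, alpha):
    """Apply the rule: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

z = 2.0                                   # hypothetical test statistic
for tail in ("right", "two"):
    p = p_value(z, tail)
    print(tail, round(p, 4), decide(p, alpha=0.05))
```

The same test statistic gives a two-tailed P-value twice as large as the one-tailed value, so the choice of alternative hypothesis should be made before looking at the data.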
When testing a hypothesis for the mean of a normal distribution, we follow a series of basic steps:
Writing Conclusion
We must be very careful in how we state the conclusion. There are only two possibilities.
We have enough evidence to reject the null hypothesis and support the alternative hypothesis.
The results of the sample data are significant. There is sufficient evidence to conclude that H0 is an
incorrect belief and that the alternative hypothesis, Ha, may be correct.
We do not have enough evidence to reject the null hypothesis, so there is not enough evidence
to support the alternative hypothesis.
The results of the sample data are not significant. There is not sufficient evidence to conclude that
the alternative hypothesis may be correct.
Notice that both conclusions focus on the alternative hypothesis. Neither says “the null
hypothesis is true.” We never accept the null hypothesis or state that it is true. When there is not enough
evidence to reject H0, the conclusion will say that “there is not enough evidence to support Ha.”
But of course, we will state the conclusion in the specific context of the situation we are investigating.
This is a statistical test for the mean of a population. It can be used when the population standard
deviation is known and either n ≥ 30 or the population is normally distributed.
If we reject the null hypothesis, we are saying that the difference between the observed sample
mean and the hypothesized population mean is too great to be attributed to chance. When we fail to
reject the null hypothesis, we are saying that the difference between the observed sample mean and the
hypothesized population mean is probable if the null hypothesis is true. Essentially, we are willing to
attribute this difference to sampling error.
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where:
$\bar{X}$ = sample mean
$\mu$ = hypothesized population mean
$\sigma$ = population standard deviation
$n$ = sample size
Table 12.3 summarizes the common critical values used in hypothesis testing of a single mean with
population standard deviation known.
Table 12.3 Common critical values for a single mean (population standard deviation known)
Level of significance, α    Critical value, zα
0.10                        1.28
0.05                        1.645
0.025                       1.96
0.01                        2.33
0.005                       2.58
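The entries in Table 12.3 can be reproduced with the inverse of the standard normal cumulative distribution. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Upper-tail critical value z_alpha: the area to its right is alpha."""
    return NormalDist().inv_cdf(1 - alpha)

print("alpha   z_alpha")
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(f"{alpha:<7} {z_critical(alpha):.3f}")
```

For a two-tailed test the significance level is split between the tails, so `z_critical(alpha / 2)` gives the cut-off; at α = 0.05 this is the 1.96 listed in the table for 0.025.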
Examples:
23. The school nurse was wondering if the average height of 7th graders has been increasing. Over the
last 5 years, the average height of a 7th grader was 145 cm with a standard deviation of 20 cm. The
school nurse takes a random sample of 200 students and finds that the average height this year is 147
cm. Conduct a single-tailed hypothesis test using a 0.05 significance level to evaluate the null and
alternative hypotheses.
Solution:
H0: μ = 145 cm
Ha: μ > 145 cm
We use this alternative hypothesis because the school nurse wants to know whether the average
height has increased; the sample average of 147 cm is greater than 145 cm.
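2. Choose an α level
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
For a right-tailed test at α = 0.05, the critical value is z = 1.645, so we reject H0 if the test
statistic is greater than 1.645.
Figure 12.6 Critical Region for Example 15 (α = 0.05; fail to reject H0 to the left of 1.65, reject H0 to the right)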
$\sigma$ = 20 cm
$\bar{X}$ = 147 cm
$\mu$ = 145 cm
$n$ = 200
Substituting we have,
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{147 - 145}{20 / \sqrt{200}} = 1.41$$
Figure: right-tailed critical region (α = 0.05) with the test statistic 1.41 to the left of the critical value 1.64.
The test statistic is within the non-rejection region; thus, we fail to reject H0.
A sample mean of 147 cm is likely to have occurred by chance if the mean of the population is
145 cm.
24. College A has an average SAT score of 1500. From a random sample of 125 freshman psychology
students we find the average SAT score to be 1450 with a standard deviation of 100. We want to know
if these freshman psychology students are representative of the overall population. What are our
hypotheses and the test statistic at 0.05 significance level?
Solution:
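H0: μ = 1500
Ha: μ ≠ 1500
We use this alternative hypothesis because we want to know if these freshman psychology
students are representative of the overall population, so a difference in either direction matters.
2. Choose an α level
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
For a two-tailed test at α = 0.05, each tail has an area of 0.025, so the critical values are z = ±1.96.
Figure: two-tailed critical regions (reject H0 in either tail, fail to reject H0 in between).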
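$\sigma$ = 100
$\bar{X}$ = 1450
$\mu$ = 1500
$n$ = 125
Substituting we have,
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1450 - 1500}{100 / \sqrt{125}} = -5.59$$
The test statistic falls in the left critical region; thus, we reject H0.
Based on this sample we believe that the mean is not equal to 1500.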
25. A farmer is trying out a planting technique that he hopes will increase the yield on his pea plants. Over
the last 5 years the average number of pods on one of his pea plants was 145 pods with a standard
deviation of 100 pods. This year, after trying his new planting technique, he takes a random sample
of 144 of his plants and finds the average number of pods to be 147. He wonders whether this is a
statistically significant increase. What are his hypotheses and the test statistic using 0.10 level of
significance?
Solution:
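H0: μ = 145
Ha: μ > 145
This alternative hypothesis uses > since he believes that there might be a gain in the number
of pods.
2. Choose an α level
α = 0.10
3. Set the criterion (critical values) for rejecting the null hypothesis.
For a right-tailed test at α = 0.10, the critical value is z = 1.28.
Figure: right-tailed critical region (α = 0.10; fail to reject H0 to the left of 1.28, reject H0 to the right).
Based on the problem, “…after trying his new planting technique, he takes a random sample of 144 of his
plants and finds the average number of pods to be 147.”, we have the following given,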
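$\sigma$ = 100
$\bar{X}$ = 147
$\mu$ = 145
$n$ = 144
Substituting we have,
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{147 - 145}{100 / \sqrt{144}} = 0.24$$
Figure: right-tailed critical region (α = 0.10) with the test statistic 0.24 to the left of the critical value 1.28.
The test statistic is within the non-rejection region; thus, we fail to reject H0.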
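The three z-test examples above follow the same arithmetic, which can be sketched as a small function. The function name `z_test_mean` and its `tail` argument are assumptions made for this illustration; the input numbers are the ones given in the problems, and the critical values come from the standard normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n, alpha, tail="right"):
    """Return (z, reject) for a z-test of a single mean with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    else:                                  # two-tailed: alpha is split between the tails
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject

cases = {
    "7th-grade heights": dict(x_bar=147, mu0=145, sigma=20, n=200, alpha=0.05, tail="right"),
    "SAT scores": dict(x_bar=1450, mu0=1500, sigma=100, n=125, alpha=0.05, tail="two"),
    "pea pods": dict(x_bar=147, mu0=145, sigma=100, n=144, alpha=0.10, tail="right"),
}
for name, given in cases.items():
    z, reject = z_test_mean(**given)
    print(f"{name}: z = {z:.2f} ->", "reject H0" if reject else "fail to reject H0")
```

Only the SAT example leads to rejecting H0; in the other two, the sample mean is too close to the hypothesized mean relative to the standard error.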
Back in the early 1900s a chemist at a brewery in Ireland discovered that when he was working with
very small samples, the distributions of the mean differed significantly from the normal distribution. He
noticed that as his sample sizes changed, the shape of the distribution changed as well. He published his
results under the pseudonym ‘Student’ and this concept and the distributions for small sample sizes are
now known as “Student’s t-distributions.”
T-distributions are a family of distributions that, like the normal distribution, are symmetrical and
bell-shaped and centered on a mean. However, the distribution shape changes as the sample size changes.
Therefore, there is a specific shape or distribution for every sample of a given size; each distribution has
a different value of k, the number of degrees of freedom, which is 1 less than the size of the sample.
As the number of observations gets larger, the t-distribution approaches the shape of the normal
distribution. In general, once the sample size is large enough (usually about 30) we would use the normal
distribution or the z-table instead. Note that usually in practice, if the standard deviation is known then
the normal distribution is used regardless of the sample size.
The t-distribution can be used with any statistic having a bell-shaped distribution. The Central Limit
Theorem states the sampling distribution of a statistic will be close to normal with a large enough sample
size. As a rough estimate, the Central Limit Theorem predicts a roughly normal distribution under the
following conditions:
The t-distribution also has some unique properties. These properties are:
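$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
where:
$\bar{X}$ = sample mean
$\mu$ = hypothesized population mean
$s$ = sample standard deviation
$n$ = sample size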
Examples:
26. The high school athletic director is asked if football players are doing as well academically as the other
student athletes. We know from a previous study that the average GPA for the student athletes is
3.10. After an initiative to help improve the GPA of student athletes, the athletic director samples 20
football players and finds that the average GPA of the sample is 3.18 with a sample standard deviation
of 0.54. Is there a significant improvement? Use a 0.05 significance level.
Solution:
H0: μ = 3.10
Ha: μ ≠ 3.10
2. Choose an α level
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
We use the t-distribution table in Appendix B to find our critical values.
For a two-tailed test with 19 degrees of freedom and a 0.05 level of significance, our critical
values are equal to ±2.093.
Figure: two-tailed critical regions (reject H0 below −2.093 or above +2.093, fail to reject H0 in between).
Based on the problem, “We know from a previous study that the average GPA for the student
athletes is 3.10. After an initiative to help improve the GPA of student athletes, the athletic
director samples 20 football players and finds that the average GPA of the sample is 3.18 with a
sample standard deviation of 0.54.”, we have the following given,
$s$ = 0.54
$\bar{X}$ = 3.18
$\mu$ = 3.10
$n$ = 20
We use the Student’s t-distribution for two reasons: the sample is small (n = 20, which is less than 30),
and the population standard deviation is unknown, so only the sample standard deviation s is available.
Substituting we have,
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{3.18 - 3.10}{0.54 / \sqrt{20}} = 0.66$$
Figure: two-tailed critical regions with the test statistic 0.66 between −2.093 and +2.093.
The test statistic is within the non-rejection region; thus, we fail to reject H0.
The difference between the sample mean and the hypothesized value is not sufficient to attribute it
to anything other than sampling error. Thus, the athletic director does not have enough evidence to
conclude that the mean academic performance of football players differs from the mean performance
of other student athletes.
27. Statistics students believe that the mean score on the first statistics test is 65. A statistics
instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains
the scores 65 65 70 67 66 63 63 68 72 71. He performs a hypothesis test using a 5% level of
significance. The data are assumed to be from a normal distribution.
Solution:
H0: μ = 65
Ha: μ > 65
We use this alternative hypothesis since the instructor thinks the average score is higher, so we
use a “>”. The “>” means the test is right-tailed.
2. Choose an α level
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
We use the t-distribution table in Appendix B to find our critical value.
For a one-tailed test with 9 degrees of freedom and a 0.05 level of significance, our critical
value is equal to 1.833. We use the positive value because the alternative hypothesis is “greater than.”
Figure: right-tailed critical region (α = 0.05; fail to reject H0 to the left of 1.833, reject H0 to the right).
Based on the problem, “…believe that the mean score on the first statistics test is 65.”, we have
the following given,
$\mu$ = 65
To find the sample mean and sample standard deviation we may use either the long method or
calculator method which will give us,
$s$ = 3.1972
$\bar{X}$ = 67
$n$ = 10
If you read the problem carefully, you will notice that there is no population standard deviation
given. You are only given n = 10 sample data values. Notice also that the data come from a
normal distribution. This means that the distribution for the test is a Student’s t. Substituting we
have,
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{67 - 65}{3.1972 / \sqrt{10}} = 1.98$$
5. Make a decision (reject or fail to reject the null hypothesis)
Figure: right-tailed critical region with the test statistic 1.98 to the right of the critical value 1.833.
The test statistic is within the rejection region; thus, we reject H0.
At a 5% level of significance, the sample data show sufficient evidence that the mean
(average) test score is more than 65, just as the statistics instructor thinks.
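The “calculator method” for the statistics-test example can be sketched with Python's standard `statistics` module, which computes the sample mean and the sample standard deviation (dividing by n − 1) directly from the raw scores. The critical value 1.833 is taken from the example rather than computed, since the standard library has no t-distribution; variable names are assumptions made for this sketch, and unrounded arithmetic may differ in the last digits from a hand calculation with rounded values.

```python
from math import sqrt
from statistics import mean, stdev

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]   # the ten sampled scores
mu0 = 65                 # hypothesized mean under H0
t_critical = 1.833       # one-tailed, 9 degrees of freedom, alpha = 0.05 (table value)

n = len(scores)
x_bar = mean(scores)
s = stdev(scores)        # sample standard deviation, divides by n - 1
t = (x_bar - mu0) / (s / sqrt(n))

print(f"n = {n}, mean = {x_bar}, s = {s:.4f}, t = {t:.2f}")
print("reject H0" if t > t_critical else "fail to reject H0")
```

Because the computed t exceeds the critical value, the sketch reaches the same decision as the worked solution.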
Often statisticians are interested in making inferences about a population proportion. For example,
when we look at election results, we often look at the proportion of people that vote and who this
proportion of voters choose. Typically, we call these proportions percentages and we would say
something like “Approximately 68 percent of the population voted in this election and 48 percent of
these voters voted for candidate A.”
So how do we test hypotheses about proportions? We use the same process as we did when testing
hypotheses about population means, but we must include sample proportions as part of the analysis.
To determine the test statistic, we need to know the sampling distribution of the sample
proportion. We use the binomial distribution, which describes situations in which two outcomes are
possible, remembering that when the sample size is relatively large, we can use the normal distribution
to approximate the binomial distribution. The test statistic is
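$$z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$
where:
$\hat{p}$ = sample proportion = $x / n$, with $x$ the number of sample members having the characteristic of interest
$p$ = population proportion
$n$ = sample size
$q = 1 - p$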
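The null hypothesis is always a statement of equality. The alternative hypothesis is always a
statement of inequality, using <, >, or ≠. So, hypotheses take the form:
H0: p = p0
Ha: p < p0, or p > p0, or p ≠ p0
The normal model can be used for the sampling distribution when both of the following conditions hold:
1. np ≥ 10
2. n(1 − p) ≥ 10.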
Examples:
28. According to the Government Accountability Office, 80% of all college students (ages 18 to 23) had
health insurance in 2006. The Patient Protection and Affordable Care Act of 2010 allowed young
people under age 26 to stay on their parents’ health insurance policy. Has the proportion of college
students (ages 18 to 23) who have health insurance increased since 2006? A survey of 800 randomly
selected college students (ages 18 to 23) indicated that 83% of them had health insurance. Use a 0.05
level of significance.
Solution:
H0: p = 0.80
Ha: p > 0.80
where p is the proportion of college students ages 18 to 23 who have health insurance now.
2. Choose an α level.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
First note that the data are from a random sample. That is essential. Now we need to determine
if a normal model is a good fit for the sampling distribution. Since we assume that the null
hypothesis is true, we build the sampling distribution with the assumption that 0.80 is the
population proportion. We check the following conditions, using 0.80 for p:
np = (800)(0.80) = 640
n(1 − p) = (800)(1 − 0.80) = 160
Because these are both more than 10, we can use the normal model to find the P-value.
With a sample proportion of 0.83, the test statistic is
$$z = \frac{\hat{p} - p}{\sqrt{pq / n}} = \frac{0.83 - 0.80}{\sqrt{(0.80)(0.20) / 800}} = 2.12$$
The P-value is the area of the shaded region to the right of the test statistic because the
alternative hypothesis is a “greater-than” statement. To find the P-value, we use the standard
normal distribution table in Appendix A. Because we are looking for an area to the right, we use
Procedure B, which will give us an area of 0.017.
Because the P-value of 0.017 is less than α = 0.05, we reject the null hypothesis.
The data from this study provide strong evidence that the percentage of all college
students who have health insurance is now greater than 80% (P-value = 0.017). The 3% increase
in the percentage who have health insurance since 2006 is statistically significant at the 5% level.
29. According to the Kaiser Family Foundation, 84% of U.S. children ages 8 to 18 had Internet access at
home as of August 2009. Researchers wonder if this percentage has changed since then. They survey
500 randomly selected children (ages 8 to 18) and find that 430 of them have Internet access at home.
Use a level of significance of α = 0.05 for this hypothesis test.
Solution:
H0: p = 0.84
Ha: p ≠ 0.84
where p is the proportion of children ages 8 to 18 with Internet access at home now.
2. Choose an α level.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
Our sample is random, so there is no problem there. Again, we want to determine whether
the normal model is a good fit for the sampling distribution of sample proportions. Based on the
null hypothesis, we will use 0.84 as our population proportion to check the conditions.
np = (500)(0.84) = 420
n(1-p) = (500)(1-0.84) = 80
Because these are both more than 10, we can use the normal model to find the P-value.
To calculate the test statistic, we first calculate the sample proportion.
$$\hat{p} = \frac{x}{n} = \frac{430}{500} = 0.86$$
Substituting,
$$z = \frac{\hat{p} - p}{\sqrt{pq / n}} = \frac{0.86 - 0.84}{\sqrt{(0.84)(0.16) / 500}} = 1.22$$
Because the normal distribution is symmetrical, we can simply consider one side of the curve
to find the area.
To find the P-value, we use the standard normal distribution table in Appendix A. Because
we are looking for an area to the right, we use Procedure B, which will give us an area of
0.1112. However, because researchers ask only whether the percentage has changed, the P-value must
count sample results at least as extreme as the difference we see in the data. We want to determine the probability that the
difference in either direction (above or below 0.84) is at least as large as the difference seen in
the data, so we include sample proportions at or above 0.86 and sample proportions at or below
0.82. For this reason, we look at the area in both tails, so we have to double this area, which gives
a P-value of about 0.22.
We do not have enough evidence to reject the null hypothesis. A sample result that could
occur 22% of the time by chance alone is not statistically significant.
The data from this study does not provide evidence that is strong enough to conclude that
the proportion of all children ages 8 to 18 who have Internet access at home has changed since
2009 (P-value = 0.22). The 2% change observed in the data is not statistically significant. These
results can be explained by predictable variation in random samples.
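The two proportion examples can be reproduced with a short function. This is a sketch under stated assumptions: the function name `proportion_z_test` and its `tail` argument are made up for the illustration, the count of 664 insured students is derived from 83% of 800, and P-values come from the standard normal distribution in Python's `statistics` module instead of the Appendix A table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, tail="two"):
    """Return (p_hat, z, p_value) for a one-sample test of a proportion."""
    # Normal model is only appropriate when np >= 10 and n(1 - p) >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation conditions are not met")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "left":
        p_value = cdf(z)
    else:                                  # two-tailed: double the one-tail area
        p_value = 2 * (1 - cdf(abs(z)))
    return p_hat, z, p_value

# Health insurance example: 83% of 800 students is 664 (derived count), right-tailed.
print(proportion_z_test(664, 800, 0.80, tail="right"))
# Internet access example: 430 of 500 children, two-tailed.
print(proportion_z_test(430, 500, 0.84, tail="two"))
```

With α = 0.05, the first P-value falls below α and the second does not, in agreement with the conclusions reached in the two worked solutions.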
VIII. Summary
In a hypothesis test, sample data is evaluated in order to arrive at a decision about some type of
claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a
population. In a hypothesis test, we:
Evaluate the null hypothesis, typically denoted with H0. The null is not rejected unless the
hypothesis test shows otherwise. The null statement must always contain some form of
equality (=, ≤ or ≥).
Always write the alternative hypothesis, typically denoted with Ha or H1, using less than,
greater than, or not equals symbols, i.e., (≠, >, or <).
If we reject the null hypothesis, then we can assume there is enough evidence to support
the alternative hypothesis.
Never state that a claim is proven true or false. Keep in mind the underlying fact that
hypothesis testing is based on probability laws; therefore, we can talk only in terms of
non-absolute certainties.
Here are some general observations about null and alternative hypotheses.
The hypotheses are competing claims about the parameter or about the comparison of
parameters.
Both hypotheses are statements about the same population parameter or same two
population parameters.
The null hypothesis contains an equal sign.
The alternative hypothesis is always an inequality statement. It contains a “less than” or
a “greater than” or a “not equal to” symbol.
In a statistical investigation, we determine the research question, and thus the
hypotheses, before we collect data.
In every hypothesis test, the outcomes are dependent on a correct interpretation of the data.
Incorrect calculations or misunderstood summary statistics can yield errors that affect the
results. A Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when
a false null hypothesis is not rejected. The probabilities of these errors are denoted by the Greek
letters α and β, for a Type I and a Type II error respectively. The power of the test, 1 – β,
quantifies the likelihood that a test will yield the correct result of a true alternative hypothesis
being accepted. A high power is desirable.
We establish critical regions based on level of significance or alpha (α) level. If the value of the
test statistic falls in these critical regions, we make the decision to reject the null hypothesis.
To evaluate the sample mean against the hypothesized population mean, we use the concept
of z−scores to determine how different the two means are.
A test of significance is done when a claim is made about the value of a population parameter.
The test can only be conducted if the random sample taken from the population came from a
distribution that is normal or approximately normal. When you use s to estimate σ, you must
use t instead of z to complete the significance test for a mean.
Remember that the P-value is the probability of seeing a sample mean at least as extreme as the
one from the data if the null hypothesis is true. The probability is about the random sample; it is
not a “chance” statement about the null or alternative hypothesis.
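The P-value idea can be sketched in a few lines of code. The example below is an illustration only: the function name and the sample values are hypothetical, and it assumes a one-sample z-test with a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)          # area to the left of z
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:                              # two-tailed: double the smaller tail
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical sample: n = 36, sample mean 10.4, H0: mu = 10.0, sigma = 1.2.
z, p_value = z_test(sample_mean=10.4, mu0=10.0, sigma=1.2, n=36, tail="right")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The P-value is then compared with α: if it is smaller than α, the sample mean would be unusual under the null hypothesis, and we reject H0.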
In statistics, we also make inferences about proportions of a population. We use the same
process as in testing hypotheses about population means, but the hypotheses are stated in terms
of proportions and the sample proportion is used in the analysis.
Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC
BY: Attribution
Introductory Statistics . Authored by: Barbara Illowski, Susan Dean. Provided by: Open Stax. Located
at: http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44. License: CC BY:
Attribution. License Terms: Download for free at http://cnx.org/contents/30189442-6998-4686-ac05-
ed152b91b9de@17.44
X. Exercises
Solve the following problems using the steps in conducting hypothesis tests.
1. A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it needs to
be replaced. From past studies of this tire, the standard deviation is known to be 8,000. A survey of
owners of that tire design is conducted. From the 28 tires surveyed, the mean lifespan was 46,500
miles with a standard deviation of 9,800 miles. Using alpha = 0.05, is the data highly inconsistent with
the claim?
2. From generation to generation, the mean age when smokers first start to smoke varies. However, the
standard deviation of that age remains constant of around 2.1 years. A survey of 40 smokers of this
generation was done to see if the mean starting age is at least 19. The sample mean was 18.1 with a
sample standard deviation of 1.3. Do the data support the claim at the 5% level?
3. The cost of a daily newspaper varies from city to city. However, the variation among prices remains
steady with a standard deviation of 20¢. A study was done to test the claim that the mean cost of a
daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation of 18¢. Do
the data support the claim at the 1% level?
4. An article in the San Jose Mercury News stated that students in the California state university system
take 4.5 years, on average, to finish their undergraduate degrees. Suppose you believe that the mean
time is longer. You conduct a survey of 49 students and obtain a sample mean of 5.1 with a sample
standard deviation of 1.2. Do the data support your claim at the 1% level?
5. In 1955, Life Magazine reported that the 25 year-old mother of three worked, on average, an 80 hour
week. Recently, many groups have been studying whether or not the women’s movement has, in fact,
resulted in an increase in the average work week for women (combining employment and at-home
work). Suppose a study was done to determine if the mean work week has increased. 81 women were
surveyed with the following results. The sample mean was 83; the sample standard deviation was ten.
Does it appear that the mean work week has increased for women at the 5% level?
6. Your statistics instructor claims that 60 percent of the students who take her Elementary Statistics
class go through life feeling more enriched. For some reason that she can’t quite figure out, most
people don’t believe her. You decide to check this out on your own. You randomly survey 64 of her
past Elementary Statistics students and find that 34 feel more enriched as a result of her class. Now,
what do you think?
7. A Nissan Motor Corporation advertisement read, “The average man’s I.Q. is 107. The average brown
trout’s I.Q. is 4. So why can’t man catch brown trout?” Suppose you believe that the brown trout’s
mean I.Q. is greater than four. You catch 12 brown trout. A fish psychologist determines the I.Q.s as
follows: 5; 4; 7; 3; 6; 4; 5; 3; 6; 3; 8; 5. Conduct a hypothesis test of your belief.
8. A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students
attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her
students and finds that 11 of them attended the midnight showing. At a 1% level of significance, what
is an appropriate conclusion?
9. Over the past few decades, public health officials have examined the link between weight concerns
and teen girls’ smoking. Researchers surveyed a group of 273 randomly selected teen girls living in
Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-
three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen
girls smoke to stay thin?
10. According to the Center for Disease Control website, in 2011 at least 18% of high school students have
smoked a cigarette. An Introduction to Statistics class in Davies County, KY conducted a hypothesis
test at the local high school (a medium sized–approximately 1,200 students–small city demographic)
to determine if the local high school’s percentage was lower. One hundred fifty students were chosen
at random and surveyed. Of the 150 students surveyed, 82 have smoked. Use a significance level of
0.05 and using appropriate statistical evidence, conduct a hypothesis test and state the conclusions.
I. Introduction
Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred
with a greater frequency? How about if the types of movies people preferred were different across
different age groups? What about if a coffee machine was dispensing approximately the same amount
of coffee each time? You could answer these questions by conducting a hypothesis test.
You will now study a new distribution, one that is used to determine the answers to such questions.
This distribution is called the chi-square distribution.
In this chapter, you will learn the three major applications of the chi-square distribution:
1. the goodness-of-fit test, which determines if data fit a particular distribution, such as in
the lottery example
2. the test of independence, which determines if events are independent, such as in the
movie example
3. the test of a single variance, which tests variability, such as in the coffee example
II. Objectives
Figure 13.1 The chi-square distribution can be used to find relationships between two things, like grocery prices at different
stores. (credit: Pete/flickr)
In previous lessons, we learned that there are several different tests that we can use to analyze data
and test hypotheses. The type of test that we choose depends on the data available and what question
we are trying to answer. We analyze simple descriptive statistics, such as the mean, median, mode, and
standard deviation to give us an idea of the distribution and to remove outliers, if necessary. We calculate
probabilities to determine the likelihood of something happening.
However, there is another test that we have yet to cover. To analyze patterns between distinct
categories, such as genders, political candidates, locations, or preferences, we use the chi-square test.
This test is used when estimating how closely a sample matches the expected distribution (also known as
the goodness-of-fit test) and when estimating if two random variables are independent of one another
(also known as the test of independence).
Random Variable: χ²
The random variable for a chi-square distribution with k degrees of freedom is the sum of k
independent, squared standard normal variables.
$$\chi^2 = (Z_1)^2 + (Z_2)^2 + \cdots + (Z_k)^2$$
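The definition can be checked with a small simulation. The sketch below is only an illustration: the seed, the choice k = 4, and the number of draws are arbitrary assumptions. It relies on the standard fact that a chi-square variable with k degrees of freedom has mean k.

```python
import random


def chi_square_draw(k, rng):
    """One draw of a chi-square variable: the sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(13)        # fixed seed so the run is repeatable
k = 4                          # degrees of freedom (arbitrary choice)
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
print(f"smallest draw = {min(draws):.4f}")    # never negative
print(f"sample mean   = {sample_mean:.3f}")   # should be close to k
```

Because every term is a square, no draw can be negative, and the long right tail of the draws reflects the right skew of the curve.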
The distribution curve of a chi-square distribution has the following characteristics:
1. The curve is skewed to the right.
2. The shape of the curve depends on the degrees of freedom df.
3. The test statistic for any test is always greater than or equal to zero.
4. When df > 90, the chi-square curve approximates the normal distribution.
The goodness-of-fit test compares the observed values of a categorical variable with the expected
values of that same variable. The value that indicates the comparison between the observed and
expected frequency is called the chi-square statistic.
If the observed frequency is close to the expected frequency, then the chi-square statistic will be
small. If there is a substantial difference between the two frequencies, then we would expect the chi-
square statistic to be large.
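$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency of each category.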
The observed values are the data values and the expected values are the values you would expect to get
if the null hypothesis were true. We compare the value of the test statistic to a tabled chi-square value
(Appendix C) to determine the probability that a sample fits an expected pattern.
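As a minimal sketch, the chi-square statistic can be computed directly from two lists of counts. The function name and the die-rolling data below are hypothetical and serve only to illustrate the formula.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a six-sided die.
# If the die is fair, each face is expected 10 times.
observed_rolls = [8, 9, 12, 11, 6, 14]
expected_rolls = [10] * 6

statistic = chi_square_statistic(observed_rolls, expected_rolls)
print(f"chi-square statistic = {statistic:.2f}")
```

If the observed counts matched the expected counts exactly, every term would be zero and the statistic would be zero; the further the counts drift apart, the larger the statistic becomes.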
There are many situations that use the goodness-of-fit test, including surveys, taste tests, and
analysis of behaviors. Interestingly, goodness-of-fit tests are also used in casinos to determine if there is
cheating in games of chance, such as cards or dice. For example, if a certain card or number on a die shows
up more than expected (a high observed frequency compared to the expected frequency), officials use
the goodness-of-fit test to determine the likelihood that the player may be cheating or that the game may
not be fair.
The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding
expected values are not close to each other, then the test statistic can get very large and will be way out
in the right tail of the chi-square curve.
A chi-square model is a good fit for the distribution of the chi-square test statistic only if the
following conditions are met:
Examples:
1. Employers want to know which days of the week employees are absent in a five-day work week. Most
employers would like to believe that employees are absent equally during the week. Suppose a
random sample of 60 managers were asked on which day of the week they had the highest number
of employee absences. The results were distributed as in the table below. For the population of
employees, do the days for the highest number of absences occur with equal frequencies during a
five-day work week? Test at a 5% significance level.
Table 13.1 Day of the Week Employees were Most Absent
                      Monday   Tuesday   Wednesday   Thursday   Friday
Number of Absences    15       12        9           9          15
Solution:
H0: The absent days occur with equal frequencies, that is, they fit a uniform distribution.
Ha: The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
df = c – 1 = 5 – 1 = 4
If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample:
15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 12
on Thursday, and 12 on Friday. These numbers are the expected (E) values. The values in the table are
the observed (O) values or data.
Make a chart with the following headings and fill in the columns:
             O     E     (O-E)^2            (O-E)^2/E
Monday       15    12    (15-12)^2 = 9      9/12
Tuesday      12    12    (12-12)^2 = 0      0/12
Wednesday    9     12    (9-12)^2 = 9       9/12
Thursday     9     12    (9-12)^2 = 9       9/12
Friday       15    12    (15-12)^2 = 9      9/12
Total:                                      3
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 3$$
The calculated test statistic is within the non-rejection region; thus, we fail to reject H0.
At a 5% level of significance, from the sample data, there is not enough evidence to conclude that
the absent days do not occur with equal frequencies.
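The calculation in this example can be reproduced with a short script. This is a sketch, not part of the original solution: instead of the table lookup it uses a P-value computed from a closed-form expression for the right-tail area that is valid only when df is even (here df = 4). The function names are hypothetical.

```python
from math import exp


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Right-tail area P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut needs a positive, even df")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return exp(-x / 2) * total


observed = [15, 12, 9, 9, 15]                      # Monday to Friday
expected = [sum(observed) / len(observed)] * 5     # 12 per day under H0
df = len(observed) - 1

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_sf_even_df(statistic, df)
alpha = 0.05

print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

A P-value larger than α leads to the same decision as a test statistic that falls in the non-rejection region.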
2. One study indicates that the number of televisions that American families have is distributed (this is
the given distribution for the American population) as in Column 2 of Table 13.2. A random sample of
600 families in the far western United States resulted in the data in Table 13.2.
At the 1% significance level, does it appear that the distribution “number of televisions” of far
western United States families is different from the distribution for the American population as a whole?
Solution:
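H0: The “number of televisions” distribution of far western United States families is the same as
the “number of televisions” distribution of the American population.
Ha: The “number of televisions” distribution of far western United States families is different from
the “number of televisions” distribution of the American population.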
α = 0.01
3. Set the criterion (critical values) for rejecting the null hypothesis.
df = c – 1 = 5 – 1 = 4
This problem asks you to test whether the far western United States families distribution fits the
distribution of the American families. This test is always right tailed.
Column 2 contains expected (E) percent because it is the “given” value. It contains expected
percentages. To get expected (E) frequencies, multiply the percentage by 600.
Number of Televisions   Percent   E                    O     (O-E)^2                (O-E)^2/E
0                       10        (0.10)(600) = 60     66    (66-60)^2 = 36         36/60
1                       16        (0.16)(600) = 96     119   (119-96)^2 = 529       529/96
2                       55        (0.55)(600) = 330    340   (340-330)^2 = 100      100/330
3                       11        (0.11)(600) = 66     60    (60-66)^2 = 36         36/66
4+                      8         (0.08)(600) = 48     15    (15-48)^2 = 1089       1089/48
Total:                                                                              29.646
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 29.646$$
The calculated test statistic is within the rejection region; thus, we reject H0. This means you
reject the belief that the distribution for the far western states is the same as that of the American
population.
At the 1% significance level, from the data, there is sufficient evidence to conclude that the “number
of televisions” distribution for the far western United States is different from the “number of
televisions” distribution for the American population.
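The arithmetic for this example can be verified with the sketch below, which builds the expected frequencies from the given percentages. The P-value line is an addition for illustration and uses a closed form for the right-tail area that holds only for df = 4.

```python
from math import exp

# Given percentages for the American population and observed counts
# for the 600 far western families, by number of televisions (0, 1, 2, 3, 4+).
percent = [0.10, 0.16, 0.55, 0.11, 0.08]
observed = [66, 119, 340, 60, 15]

n = sum(observed)                                  # 600 families
expected = [p * n for p in percent]                # expected frequencies under H0

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = exp(-statistic / 2) * (1 + statistic / 2)
alpha = 0.01

print(f"chi-square = {statistic:.3f}, df = {df}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Most of the statistic comes from the last category (4+ televisions), where the observed count of 15 is far below the expected count of 48.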
V. Test of Independence
The chi-square test of independence is used to assess if two factors are related. This test is often used
in social science research to determine if factors are independent of each other. For example, we would
use this test to determine relationships between voting patterns and race, income and gender, and
behavior and education. This test determines if there is a relationship between two categorical variables
in the population. It is called a test of independence because “no relationship” means “independent.” If
there is a relationship between the two variables in the population, then they are dependent.
In general, when running the test of independence, we ask, “Is Variable X independent of Variable Y?”
It is important to note that this test does not test how the variables are related, just simply whether or not
they are independent of one another. For example, while the test of independence can help us determine
if income and gender are independent, it cannot help us assess how one category might affect the other.
When running the test of independence, we use similar steps as when running the goodness-of-fit
test described earlier. Our hypotheses can be stated as follows:
H0: There is no statistically significant difference between the observed and expected frequencies.
The two variables (factors) are independent.
Ha: There is a statistically significant difference between the observed and expected frequencies.
The two variables (factors) are dependent.
Contingency tables can help us frame our hypotheses and solve problems. Often, we use contingency
tables to list the variables and observational patterns that will help us to run a chi-square test. For
example, we could use a contingency table to record the answers to phone surveys or observed behavioral
patterns.
Like the chi-square goodness-of-fit test, the test of independence is a comparison of the differences
between observed and expected values. However, in this test, we need to calculate the expected value
using the row and column totals from the table. The expected value for each of the potential outcomes in
the table can be calculated using the following formula:
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
The test of independence is always right tailed because of the calculation of the test statistic. If the
expected and observed values are not close together, then the test statistic is very large and way out in
the right tail of the chi-square curve, as it is in a goodness-of-fit.
The test statistic for a test of independence is like that of a goodness-of-fit test:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The conditions for use of the chi-square distribution are the same as we learned previously:
Examples:
3. In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time
with a disabled senior citizen. The program recruits among community college students, four-year
college students, and nonstudents. Table 13.3 is a sample of the adult volunteers and the number of
hours they volunteer per week. Is the number of hours volunteered independent of the type of
volunteer?
Table 13.3 Number of Hours Worked Per Week by Volunteer Type (Observed)
Community College Students    111    96     48    255
Four-Year College Students    96     133    61    290
Nonstudents                   91     150    53    294
Solution:
The observed table and the question at the end of the problem, “Is the number of hours volunteered
independent of the type of volunteer?” tell you this is a test of independence. The two factors are
number of hours volunteered and type of volunteer. This test is always right tailed.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The contingency table for the expected results is summarized in Table 13.4. For example, the
calculation for the expected frequency for the top left cell is
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{255(298)}{839} = 90.57$$
Table 13.4 Number of Hours Worked Per Week by Volunteer Type (Expected)
To find the test statistic, we need to combine Table 13.3 and Table 13.4:
$$\chi^2 = \sum_{i \cdot j} \frac{(O - E)^2}{E} = 6.809 + 5.452 + 0.728 = 12.989$$
The calculated test statistic is within the rejection region; thus, we reject H0.
This means that the factors are not independent. At a 5% level of significance, from the data,
there is sufficient evidence to conclude that the number of hours volunteered, and the type of
volunteer are dependent on one another.
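The expected frequencies and the test statistic for this example can be reproduced with the sketch below. The function names are hypothetical, the observed counts are those of Table 13.3 without the row totals, and the P-value line is an illustrative addition that is valid only because df = 4 here.

```python
from math import exp


def expected_counts(table):
    """Expected frequency of each cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, df, expected) for a test of independence."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: community college students, four-year college students, nonstudents.
observed = [
    [111, 96, 48],
    [96, 133, 61],
    [91, 150, 53],
]

statistic, df, expected = chi_square_independence(observed)
p_value = exp(-statistic / 2) * (1 + statistic / 2)   # right-tail area, df = 4 only

print(f"top left expected count = {expected[0][0]:.2f}")
print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.4f}")
```

The degrees of freedom follow the usual rule for a contingency table, (number of rows − 1)(number of columns − 1).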
4. De Anza College is interested in the relationship between anxiety level and the need to succeed in
school. A random sample of 400 students took a test that measured anxiety level and need to succeed
in school. This table shows the results. De Anza College wants to know if anxiety level and need to
succeed in school are independent events.
Need to Succeed   High      Med-high   Medium    Med-low   Low       Row
in School         Anxiety   Anxiety    Anxiety   Anxiety   Anxiety   Total
High Need         35        42         53        15        10        155
Medium Need       18        48         63        33        31        193
Low Need          4         5          11        15        17        52
Column Total      57        95         127       63        58        400
Solution:
The observed table and the statement at the end of the problem, “De Anza College wants to know
if anxiety level and need to succeed in school are independent events.” tell you this is a test of
independence. The two factors are anxiety level and need to succeed in school. This test is always
right tailed.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The contingency table for the expected results is summarized in Table 13.6. For example, the
calculation for the expected frequency for the top left cell is
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{155(57)}{400} = 22.09$$
To find the test statistic, we need to combine Table 13.5 and Table 13.6:
        HA (O)   HA (E)   (O-E)^2/E   MHA (O)   MHA (E)   (O-E)^2/E   MA (O)   MA (E)   (O-E)^2/E   MLA (O)   MLA (E)   (O-E)^2/E   LA (O)   LA (E)   (O-E)^2/E
HN      35.00    22.09    7.55        42.00     36.81     0.73        53.00    49.21    0.29        15.00     24.41     3.63        10.00    22.48    6.92
MN      18.00    27.50    3.28        48.00     45.84     0.10        63.00    61.28    0.05        33.00     30.40     0.22        31.00    27.99    0.32
LN      4.00     7.41     1.57        5.00      12.35     4.37        11.00    16.51    1.84        15.00     8.19      5.66        17.00    7.54     11.87
Total                     12.40                           5.21                          2.18                            9.51                          19.12
$$\chi^2 = \sum_{i \cdot j} \frac{(O - E)^2}{E} = 12.40 + 5.21 + 2.18 + 9.51 + 19.12 = 48.42$$
Figure 13.11 Location of the Test Statistic (critical value 15.507; test statistic 48.42)
The calculated test statistic is within the rejection region; thus, we reject H0.
This means that the factors are not independent. At a 5% level of significance, from the data,
there is sufficient evidence to conclude that the anxiety level and the need to succeed in school are
dependent on one another.
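The column-by-column sums used in this example can be checked with the following sketch. It recomputes each expected frequency from the row and column totals and then adds the contributions per anxiety level. The critical value 15.507 is the one shown in Figure 13.11; the variable names are illustrative.

```python
# Observed counts. Rows: high, medium, low need to succeed.
# Columns: high, med-high, medium, med-low, low anxiety.
observed = [
    [35, 42, 53, 15, 10],
    [18, 48, 63, 33, 31],
    [4, 5, 11, 15, 17],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                                   # 400 students

# Sum of (O - E)^2 / E within each column (anxiety level).
col_contributions = []
for j, col_total in enumerate(col_totals):
    total = 0.0
    for i, row_total in enumerate(row_totals):
        e = row_total * col_total / n                 # expected frequency of cell (i, j)
        total += (observed[i][j] - e) ** 2 / e
    col_contributions.append(total)

statistic = sum(col_contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 15.507                               # df = 8, alpha = 0.05 (Figure 13.11)
reject_h0 = statistic > critical_value

print([round(c, 2) for c in col_contributions])
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

The largest contributions come from the high-anxiety and low-anxiety columns, which is where the observed counts differ most from what independence would predict.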
The chi-square goodness-of-fit test and the test of independence are two ways to examine the
relationships between categorical variables. To determine whether or not the assignment of categorical
variables is random (that is, to examine the randomness of a sample), we perform the test of
homogeneity. In other words, the test of homogeneity tests whether samples from populations have the
same proportion of observations with a common characteristic.
The goodness–of–fit test can be used to decide whether a population fits a given distribution, but it
will not suffice to decide whether two populations follow the same unknown distribution. A different test,
called the test for homogeneity, can be used to draw a conclusion about whether two populations have
the same distribution. We use the test of homogeneity if the response variable has two or more categories
and we wish to compare two or more populations or subgroups. To calculate the test statistic for a test
for homogeneity, follow the same procedure as with the test of independence.
Hypotheses
Note: Homogeneous means the same in structure or composition. This test gets its name from the null
hypothesis, where we claim that the distribution of the responses is the same (homogeneous) across
groups.
To test our hypotheses, we select a random sample from each population and gather data on one
categorical variable. As with all chi-square tests, the expected counts reflect the null hypothesis. We must
determine what we expect to see in each sample if the distributions are identical. As before, the chi-
square test statistic measures the amount that the observed counts in the samples deviate from the
expected counts.
Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
df = number of columns – 1
Requirements
Common Uses
Comparing two populations. For example: men vs. women, before vs. after, east vs. west. The
variable is categorical with more than two possible response values.
Examples:
5. Do male and female college students have the same distribution of living arrangements? Use a level
of significance of 0.05. Suppose that 250 randomly selected male college students and 300 randomly
selected female college students were asked about their living arrangements: dormitory, apartment,
with parents, other. The results are shown in Table 13.7. Do male and female college students have the
same distribution of living arrangements?
Table 13.7 Distribution of Living Arrangements for College Males and College Females
           Dormitory   Apartment   With Parents   Other
Males      72          84          49             45
Females    91          86          88             35
Solution:
The observed table and the question at the end of the problem, “Do male and female college students
have the same distribution of living arrangements?” tell you this is a test of homogeneity. This test is
always right tailed.
H0: The distribution of living arrangements for male college students is the same as the
distribution of living arrangements for female college students.
Ha: The distribution of living arrangements for male college students is not the same as the
distribution of living arrangements for female college students.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
           Dormitory   Apartment   With Parents   Other   Row Total
Males      72          84          49             45      250
Females    91          86          88             35      300
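The contingency table for the expected results is summarized in Table 13.8. For example, the
calculation for the expected frequency for the top left cell is
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{250(163)}{550} = 74.09$$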
Table 13.8 Distribution of Living Arrangements for College Males and College Females (expected)
           Dormitory   Apartment   With Parents   Other
Males      74.09       77.27       62.27          36.36
Females    88.91       92.73       74.73          43.64
To find the test statistic, we need to combine Table 13.7 and Table 13.8:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.11 + 1.07 + 5.19 + 3.76 = 10.13$$
Figure 13.13 Location of the Test Statistic (critical value 7.815; test statistic 10.13)
The calculated test statistic is within the rejection region; thus, we reject H0.
Note: Notice that the conclusion is only that the distributions are not the same. We cannot use the test
for homogeneity to draw any conclusions about how they differ.
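Because the test for homogeneity follows the same procedure as the test of independence, the numbers in this example can be checked with the same kind of script. In the sketch below the function name is hypothetical, and the critical value 7.815 is the one shown in Figure 13.13.

```python
def chi_square_homogeneity(table):
    """Return (statistic, expected) for a table with one row per population."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Columns: dormitory, apartment, with parents, other.
observed = [
    [72, 84, 49, 45],    # males (250 students)
    [91, 86, 88, 35],    # females (300 students)
]

statistic, expected = chi_square_homogeneity(observed)
df = len(observed[0]) - 1            # number of columns - 1
critical_value = 7.815               # df = 3, alpha = 0.05 (Figure 13.13)
reject_h0 = statistic > critical_value

for label, row in zip(["Males", "Females"], expected):
    print(label, [round(e, 2) for e in row])
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

The expected counts printed by the script should match Table 13.8, and the decision matches the one reached above.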
6. Both before and after a recent earthquake, surveys were conducted asking voters which of the three
candidates they planned on voting for in the upcoming city council election. Has there been a change
since the earthquake? Use a level of significance of 0.05. Table 13.9 shows the results of the survey.
Has there been a change in the distribution of voter preferences since the earthquake?
Table 13.9 Voters Preference since the Earthquake
The observed table and the question at the end of the problem, “Has there been a change in the
distribution of voter preferences since the earthquake?” tell you this is a test of homogeneity.
H0: The distribution of voter preferences was the same before and after the
earthquake.
Ha: The distribution of voter preferences was not the same before and after the
earthquake.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The contingency table for the expected results is summarized in Table 13.10. For example, the
calculation for the expected frequency for the top left cell is
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{430(381)}{1066} = 153.69$$
Table 13.10 Voters Preference since the Earthquake (expected)
To find the test statistic, we need to combine Table 13.9 and Table 13.10:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.93 + 0.12 + 1.20 = 3.25$$
(Figure: location of the test statistic; test statistic 3.25, critical value 5.991)
The calculated test statistic is within the non-rejection region; thus, we fail to reject H0.
At a 5% level of significance, from the data, there is insufficient evidence to conclude that the
distribution of voter preferences was not the same before and after the earthquake.
In the previous sections, we learned how the chi-square test can help us assess the relationships between
two variables. In addition to assessing these relationships, the chi-square test can also help us test
hypotheses surrounding variance, which is the measure of the variation, or scattering, of scores in a
distribution. There are several different tests that we can use to assess the variance of a sample. The most
common test used to assess variance is the chi-square test for one variance.
Suppose that we want to test two samples to determine if they belong to the same population. The test
of variance between samples is used quite frequently in the manufacturing of food, parts, and
medications, since it is necessary for individual products of each of these types to be very similar in size
and chemical make-up. In almost all production processes quality is measured not only by how closely
the machine matches the target, but also the variability of the process. If one were filling bags with potato
chips not only would there be interest in the average weight of the bag, but also how much variation there
was in the weights. No one wants to be assured that the average weight is accurate when their bag has
no chips. Electricity voltage may meet some average level, but great variability, spikes, can cause serious
damage to electrical machines, especially computers. In short, statistical tests concerning the variance of
a distribution have great value and many applications.
To perform the test for one variance using the chi-square distribution, we need several pieces of
information.
A test of a single variance assumes that the underlying distribution is normal. The null and alternative
hypotheses are stated in terms of the population variance. The test statistic is:
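$$\chi^2_c = \frac{(n - 1)s^2}{\sigma_0^2}$$
where n is the sample size, s² is the sample variance, and σ0² is the hypothesized population
variance (the value stated in H0).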
We want to test the hypothesis that the sample comes from a population with a variance greater than the
observed variance. You may think of s as the random variable in this test. The number of degrees of
freedom is df = n – 1. A test of a single variance may be right-tailed, left-tailed, or two-tailed. The null and
alternative hypotheses contain statements about the population variance. The null and alternative
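hypotheses are:
$$H_0: \sigma^2 = \sigma_0^2$$
$$H_a: \sigma^2 \neq \sigma_0^2 \rightarrow \text{two-tailed}$$
$$H_a: \sigma^2 > \sigma_0^2 \rightarrow \text{right-tailed}$$
$$H_a: \sigma^2 < \sigma_0^2 \rightarrow \text{left-tailed}$$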
Examples:
7. With individual lines at its various windows, a post office finds that the standard deviation for waiting
times for customers on Friday afternoon is 7.2 minutes. The post office experiments with a single,
main waiting line and finds that for a random sample of 25 customers, the waiting times for customers
have a standard deviation of 3.5 minutes on a Friday afternoon. With a significance level of 5%, test
the claim that a single line causes lower variation among waiting times for customers.
Solution:
Since the claim is that a single line causes less variation, this is a test of a single variance. The
parameter is the population variance, σ².
Note: The hypotheses must be stated in terms of the variance; thus, we need to square the standard
deviation: (7.2)^2 = 51.84.
H0: σ² = 51.84
Ha: σ² < 51.84
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
df = n – 1 = 25 – 1 = 24
Since we need the area to the left, we subtract α from 1, which gives 0.95. We locate this area,
with df = 24, on the chi-square distribution table to find the critical value.
The calculated test statistic is within the rejection region; thus, we reject H0. This means that
we reject σ² = 51.84. In other words, we do not think the standard deviation of waiting times is
7.2 minutes; we think the variation in waiting times is less.
At a 5% level of significance, from the data, there is sufficient evidence to conclude that a single
line causes a lower variation among the waiting times or with a single line, the customer waiting
times vary less than 7.2 minutes.
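The test statistic for this example can be computed as in the sketch below. Instead of the table lookup, the sketch computes a left-tail P-value with a closed-form expression that is valid only when df is even (here df = 24); this swap, and the function name, are assumptions made for illustration.

```python
from math import exp


def chi_square_cdf_even_df(x, df):
    """Left-tail area P(X <= x) for a chi-square variable with even df.

    Uses 1 - exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut needs a positive, even df")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return 1.0 - exp(-x / 2) * total


n = 25            # customers sampled
s = 3.5           # sample standard deviation (minutes)
sigma0 = 7.2      # standard deviation under H0 (minutes)
alpha = 0.05

df = n - 1
statistic = df * s**2 / sigma0**2              # (n - 1) s^2 / sigma0^2
p_value = chi_square_cdf_even_df(statistic, df)   # left-tailed test
reject_h0 = p_value < alpha

print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

A very small left-tail P-value means that a sample variance this low would be unusual if the population standard deviation were still 7.2 minutes.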
8. Professor Hadley has a weakness for cream filled donuts, but he believes that some bakeries are not
properly filling the donuts. A sample of 24 donuts reveals a mean amount of filling equal to 0.04 cups,
and the sample standard deviation is 0.11 cups. Professor Hadley has an interest in the average
quantity of filling, of course, but he is particularly distressed if one donut is radically different from
another. Professor Hadley does not like surprises. Test at 5% level of significance the null hypothesis
that the population variance of donut filling is significantly different from the average amount of
filling.
Solution:
Since we are to test the population variance of the donut filling, this is a test of a single variance.
The parameter is the population variance, σ².
H0: σ² = 0.04
Ha: σ² ≠ 0.04
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The test is set up as a two-tailed test because Professor Hadley is concerned with any
significant difference, whether too much variation in filling or too little.
df = n – 1 = 24 – 1 = 23
Since we are dealing with a two tailed test, we will be locating 2 critical values.
For the right tail value, simply divide the level of significance by 2: α/2 = 0.025
The calculated test statistic is within the rejection region; thus, we reject H0. In other words, we
think the variation in donut filling is not 0.04.
At a 5% level of significance, from the data, there is enough evidence to conclude that the
population variance of donut filling is significantly different from the average amount of filling.
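The two-tailed decision in this example can be checked with the sketch below. Because df = 23 is odd, the sketch replaces the table lookup with a numerical left-tail area computed from the series for the regularized lower incomplete gamma function; this is a swapped-in technique for illustration, and the function name is hypothetical.

```python
from math import exp, lgamma, log


def chi_square_cdf(x, df):
    """Left-tail area P(X <= x) for a chi-square variable with df degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(df/2, x/2)
    with its power series.
    """
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term, total = 1.0, 1.0
    for n in range(1, 10000):
        term *= half_x / (a + n)
        total += term
        if term < 1e-15 * total:       # series has converged
            break
    return total * exp(a * log(half_x) - half_x - lgamma(a + 1))


n = 24                # donuts sampled
s = 0.11              # sample standard deviation (cups)
sigma0_sq = 0.04      # variance under H0
alpha = 0.05

df = n - 1
statistic = df * s**2 / sigma0_sq               # (n - 1) s^2 / sigma0^2
lower_tail = chi_square_cdf(statistic, df)
p_value = 2 * min(lower_tail, 1 - lower_tail)   # two-tailed test
reject_h0 = p_value < alpha

print(f"chi-square = {statistic:.4f}, df = {df}, reject H0: {reject_h0}")
```

For a two-tailed test, the smaller of the two tail areas is doubled before it is compared with α, which is equivalent to comparing the test statistic with the two critical values at α/2.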
VIII. Summary
The chi-square distribution is a useful tool for assessment in a series of problem categories.
These problem categories include primarily (i) whether a data set fits a particular distribution,
(ii) whether the distributions of two populations are the same, (iii) whether two events might be
independent, and (iv) whether there is a different variability than expected within a population.
An important parameter in a chi-square distribution is the degrees of freedom df in a given
problem. The random variable in the chi-square distribution is the sum of squares of df standard
normal variables, which must be independent. The key characteristics of the chi-square
distribution also depend directly on the degrees of freedom.
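The description of the chi-square random variable as a sum of squares of df independent standard normal variables can be illustrated by simulation. This is a rough sketch under stated assumptions: the function name `simulate_chi_square`, the seed, and the number of draws are illustrative choices, and the check relies on the known fact that the mean of a chi-square distribution equals df.

```python
import random


def simulate_chi_square(df, rng):
    """Draw one chi-square value as the sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(0)   # fixed seed so the run is repeatable
df = 5                   # illustrative degrees of freedom
samples = [simulate_chi_square(df, rng) for _ in range(20000)]

sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (theoretical mean = {df})")
print(f"smallest simulated value = {min(samples):.4f} (never negative)")
```

The simulated values are never negative and their average settles near df, which matches the characteristics listed in this summary.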
The chi-square distribution curve is skewed to the right, and its shape depends on the degrees
of freedom df. For df > 90, the curve approximates the normal distribution. Test statistics based
on the chi-square distribution are always greater than or equal to zero. Such application tests
are almost always right-tailed tests.
You have seen the χ² test statistic used in three different circumstances. The following bulleted
list is a summary that will help you decide which χ² test is the appropriate one to use.
o Goodness-of-Fit: Use the goodness-of-fit test to decide whether a population with an
unknown distribution “fits” a known distribution. In this case there will be a single
qualitative survey question or a single outcome of an experiment from a single
population. Goodness-of-Fit is typically used to see if the population is uniform (all
outcomes occur with equal frequency), the population is normal, or the population is
the same as another population with a known distribution. The null and alternative
hypotheses are:
H0: The population fits the given distribution.
Ha: The population does not fit the given distribution.
o Independence: Use the test for independence to decide whether two variables (factors)
are independent or dependent. In this case there will be two qualitative survey
questions or experiments and a contingency table will be constructed. The goal is to see
if the two variables are unrelated (independent) or related (dependent). The null and
alternative hypotheses are:
H0: The two variables (factors) are independent.
Ha: The two variables (factors) are dependent.
o Homogeneity: Use the test for homogeneity to decide if two populations with unknown
distributions have the same distribution as each other. In this case there will be a single
qualitative survey question or experiment given to two different populations. The null
and alternative hypotheses are:
H0: The two populations follow the same distribution.
Ha: The two populations have different distributions.
To test variability, use the chi-square test of a single variance. The test may be left-, right-, or
two-tailed, and its hypotheses are always expressed in terms of the variance (or standard
deviation).
Introductory Statistics. Authored by: Barbara Illowski, Susan Dean. Provided by: Open Stax. Located
at: http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44. License: CC BY:
Attribution. License Terms: Download for free at
http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44
X. Exercises
1. A teacher predicts the distribution of grades on the final exam. The predicted proportions and the
actual frequencies are recorded in the table. At the 5% significance level, what can you conclude?
Grade   Proportion   Actual Frequency
A       0.25         7
B       0.30         7
C       0.35         5
D       0.10         1
2. Conduct a goodness-of-fit test to determine if the actual college majors of graduating males fit
the distribution of their expected majors.
3. Transit Railroads is interested in the relationship between travel distance and the ticket class
purchased. A random sample of 200 passengers is taken. The table below shows the results. The
railroad wants to know if a passenger’s choice in ticket class is independent of the distance they
must travel. Use 5% level of significance.
1–100 miles 21 14 6 41
101–200 miles 18 16 8 42
201–300 miles 16 17 15 48
301–400 miles 12 14 21 47
401–500 miles 6 6 10 22
Total 73 67 60 200
4. A recent debate about where in the United States skiers believe the skiing is best prompted the
following survey. Test to see if the best ski area is independent of the level of the skier.
5. Car manufacturers are interested in whether there is a relationship between the size of car an
individual drives and the number of people in the driver’s family (that is, whether car size and
family size are independent). To test this, suppose that 800 car owners were randomly surveyed
with the results in the table. Conduct a test of independence.
Family Size Sub & Compact Mid-size Full-size Van & Truck
1 20 35 40 35
2 20 50 70 80
3–4 20 50 100 90
5+ 20 30 70 70
Men 47 35 28 53
Women 65 59 55 60
8. Suppose an airline claims that its flights are consistently on time with an average delay of at
most 15 minutes. It claims that the average delay is so consistent that the variance is no more
than 150 minutes. Doubting the consistency part of the claim, a disgruntled traveler calculates
the delays for his next 25 flights. The average delay for those 25 flights is 22 minutes with a
standard deviation of 15 minutes. Test this claim using 0.10 level of significance.
9. A plant manager is concerned her equipment may need recalibrating. It seems that the actual
weight of the 15 oz. cereal boxes it fills has been fluctuating. The standard deviation should be
at most 0.5 oz. In order to determine if the machine needs to be recalibrated, 84 randomly
selected boxes of cereal from the next day’s production were weighed. The standard deviation
of the 84 boxes was 0.54. Does the machine need to be recalibrated?
10. Airline companies are interested in the consistency of the number of babies on each flight, so
that they have adequate safety equipment. They are also interested in the variation of the
number of babies. Suppose that an airline executive believes the average number of babies on
flights is six with a variance of nine at most. The airline conducts a survey. The results of the 18
flights surveyed give a sample average of 6.4 with a sample standard deviation of 3.9. Conduct a
hypothesis test of the airline executive’s belief.
I. Introduction
You have learned to conduct hypothesis tests on single means and single proportions. You will expand
upon that in this chapter. You will compare two means or two proportions to each other. The general
procedure is still the same, just expanded.
To compare statistical parameters, you work with two groups. The groups are classified either
as independent or matched pairs. Independent groups consist of two samples that are independent, that
is, sample values selected from one population are not related in any way to sample values selected from
the other population. Matched pairs consist of two samples that are dependent. The parameters tested
using independent groups are either population means or population proportions.
II. Objectives
1. Conduct hypothesis tests for two population means, population standard deviations known
2. Conduct hypothesis tests for two population means, population standard deviations unknown
3. Under appropriate conditions, conduct a hypothesis test for comparing two population
proportions or two treatments.
4. Use the F-test to compare two population variances.
Figure 14.1 If you want to test a claim that involves two groups (the types of breakfasts eaten east and
west of the Mississippi River), you can use a slightly different technique when conducting a hypothesis
test. (credit: Chloe Lim)
Studies often compare two groups. For example, researchers are interested in the effect aspirin has
in preventing heart attacks. Over the last few years, newspapers and magazines have reported various
aspirin studies involving two groups. Typically, one group is given aspirin and the other group is given a
placebo. Then, the heart attack rate is studied over several years.
There are other situations that deal with the comparison of two groups. For example, studies
compare various diet and exercise programs. Politicians compare the proportion of individuals from
different income brackets who might vote for them. Students are interested in whether SAT or GRE
preparatory courses really help raise their scores.
In this section, we learn to make inferences about a difference between two population means. Our
work here focuses on the following slogan:
It’s Not about the Values – It’s about How They Are Related!
This signifies that the value of the population means is not the focus of inference. Instead, we want
to develop tools for determining the relationship between two unknown population means. We select
independent random samples from two different populations and find the difference in the sample
means. We use the sample difference by conducting a hypothesis test about the difference in
population means.
The general steps of this hypothesis test are the same as always. As expected, the details of the
conditions for use of the test and the test statistic are unique to this test.
The null hypothesis, H0, is again a statement of “no effect” or “no difference.”
As always, we state our conclusion in context, usually by referring to the alternative hypothesis.
We use this hypothesis test when the data meets the following conditions.
Even though knowing the population standard deviations is not likely, the following example
illustrates hypothesis testing for independent means, known population standard
deviations. The sampling distribution for the difference between the means is normal and both
populations must be normal. The random variable is $\bar{X}_1 - \bar{X}_2$. The normal distribution has the following
standard deviation:
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
The test statistic (z-score) is:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Examples:
1. The mean lasting time of two competing floor waxes is to be compared. Twenty floors are randomly
assigned to test each wax. Both populations have a normal distribution. The data are recorded in
Table 14.1. Does the data indicate that wax 1 is more effective than wax 2? Test at a 5% level of
significance.
Table 14.1 Data Table
Wax   Sample Mean Number of Months Floor Wax Lasts   Population Standard Deviation
1     3                                              0.33
2     2.9                                            0.36
This is a test of two independent groups, two population means, population standard deviations known.
Random Variable: $\bar{X}_1 - \bar{X}_2$ = difference in the mean number of months the competing floor waxes last.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The words "is more effective" say that wax 1 lasts longer than wax 2, on average. "Longer" is a
“>” symbol and goes into Ha. Therefore, this is a right-tailed test.
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{3 - 2.9}{\sqrt{\frac{(0.33)^2}{20} + \frac{(0.36)^2}{20}}} = 0.916$$
5. Make a decision (reject or fail to reject the null hypothesis)
The test statistic is within the non-rejection region; thus, we fail to reject H0.
At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude
that the mean time wax 1 lasts longer (wax 1 is more effective) than the mean time wax 2 lasts.
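The floor wax calculation can be reproduced with Python's standard library. This is a minimal sketch, assuming the z statistic used in this example; the function name `two_sample_z` is a hypothetical helper, and `statistics.NormalDist` stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic for two independent means with known population SDs."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error


alpha = 0.05
z = two_sample_z(3, 2.9, 0.33, 0.36, 20, 20)

# Right-tailed test: the rejection region lies above the critical value.
z_critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > z_critical

print(f"z = {z:.3f}, critical value = {z_critical:.3f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Because the test statistic is below the critical value, the script reaches the same decision as the worked solution: fail to reject H0.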
2. An interested citizen wanted to know if Democratic U. S. senators are older than Republican U.S.
senators, on average. On May 26, 2013, the mean age of 30 randomly selected Republican Senators
was 61 years 247 days old (61.675 years) with a standard deviation of 10.17 years. The mean age of
30 randomly selected Democratic senators was 61 years 257 days old (61.704 years) with a standard
deviation of 9.55 years. Do the data indicate that Democratic senators are older than Republican
senators, on average? Test at a 5% level of significance.
Solution:
This is a test of two independent groups, two population means. The population standard deviations
are unknown, but the sum of the sample sizes is 30 + 30 = 60, which is greater than 30, so we can use
the normal approximation to the Student’s-t distribution.
Random variable: $\bar{X}_1 - \bar{X}_2$ = difference in the mean age of Democratic and Republican U.S. senators.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
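The words "older than" translate as a “>” symbol and go into Ha. Therefore, this is a right-
tailed test.
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{61.704 - 61.675}{\sqrt{\frac{(9.55)^2}{30} + \frac{(10.17)^2}{30}}} = 0.011$$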
5. Make a decision (reject or fail to reject the null hypothesis)
The test statistic is within the non-rejection region; thus, we fail to reject H0.
At the 5% level of significance, from the sample data, there is not enough evidence to
conclude that the mean age of Democratic senators is greater than the mean age of the
Republican senators.
The test comparing two independent population means with unknown and possibly unequal population
standard deviations is called the Aspin-Welch t-test. The degrees of freedom formula was developed by
Aspin and Welch.
The comparison of two population means is very common. A difference between the two samples
depends on both the means and the standard deviations. Very different means can occur by chance if
there is great variation among the individual samples. In order to account for the variation, we take the
difference of the sample means, $\bar{X}_1 - \bar{X}_2$, and divide by the standard error in order to standardize the
difference. The result is a t-score test statistic and is given by:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
where:
s1 and s2, the sample standard deviations, are estimates of σ1 and σ2, respectively
σ1 and σ2 are the unknown population standard deviations
x̄1 and x̄2 are the sample means
μ1 and μ2 are the unknown population means
Because we do not know the population standard deviations, we estimate them using the two sample
standard deviations from our independent samples. For the hypothesis test, we calculate the estimated
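standard deviation, or standard error, of the difference in sample means, $\bar{X}_1 - \bar{X}_2$:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
In place of the Aspin-Welch degrees of freedom formula, the examples in this module use
df = n1 + n2 – 2.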
Examples:
3. The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed
to be the same. A study is done, and data are collected, resulting in the data in Table 14.2. Each
population has a normal distribution. Is there a difference in the mean amount of time boys and girls
aged seven to 11 play sports each day? Test at the 5% level of significance.
        Sample Size   Average Number of Hours Playing Sports Per Day   Sample Standard Deviation
Girls   9             2                                                0.866
Boys    16            3.2                                              1.00
This is a test of two independent groups, two population means. The population standard deviations
are not known.
Let the subscripts be g for girls and b for boys. Then, μg is the population mean for girls and μb is the
population mean for boys.
Random variable: $\bar{X}_g - \bar{X}_b$ = difference in the sample mean amount of time girls and boys play sports
each day.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
The words “the same” tell you H0 has an “=”. Since there are no other words to indicate a direction,
Ha states that the means are different (“≠”). This is a two-tailed test.
df = n1 + n2 – 2 = 9 + 16 – 2 = 23
$$t = \frac{\bar{X}_g - \bar{X}_b}{\sqrt{\frac{s_g^2}{n_g} + \frac{s_b^2}{n_b}}} = \frac{2 - 3.2}{\sqrt{\frac{(0.866)^2}{9} + \frac{(1.00)^2}{16}}} = -3.14$$
At the 5% level of significance, the sample data show there is sufficient evidence to conclude
that the mean number of hours that girls and boys aged seven to 11 play sports per day is different.
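The t statistic for the girls and boys example can be verified in a few lines. This is a sketch under stated assumptions: `two_sample_t` is a hypothetical helper name, the degrees of freedom follow the df = n1 + n2 – 2 convention used in this example, and the critical value 2.069 is an assumed reading from a Student's t table (df = 23, two-tailed, α = 0.05).

```python
from math import sqrt


def two_sample_t(mean1, mean2, s1, s2, n1, n2):
    """t statistic for two independent means with unknown population SDs."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error


# Girls: n = 9, mean = 2, s = 0.866; boys: n = 16, mean = 3.2, s = 1.00
t = two_sample_t(2, 3.2, 0.866, 1.00, 9, 16)
df = 9 + 16 - 2

# Assumed table value: Student's t, df = 23, area 0.025 in each tail.
t_critical = 2.069

# Two-tailed rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t) > t_critical
print(f"t = {t:.2f}, df = {df}, critical values = ±{t_critical}")
print(f"reject H0: {reject_h0}")
```

The statistic lies beyond the lower critical value, so the script agrees with the conclusion that the two means differ.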
4. A study is done by a community group in two neighboring colleges to determine which one graduates
students with more math classes. College A samples 11 graduates. Their average is four math classes
with a standard deviation of 1.5 math classes. College B samples nine graduates. Their average is 3.5
math classes with a standard deviation of one math class. The community group believes that a
student who graduates from college A has taken more math classes, on the average. Both populations
have a normal distribution. Test at a 1% significance level.
Solution:
This is a test of two independent groups, two population means. The population standard
deviations are not known.
Random variable: $\bar{X}_A - \bar{X}_B$ = difference in the sample mean number of math classes of College A
and College B.
α = 0.01
3. Set the criterion (critical values) for rejecting the null hypothesis.
The word “more” translates as a “>” symbol and goes into Ha. Therefore, this is a right-tailed
test.
df = n1 + n2 – 2 = 11 + 9 – 2 = 18
$$t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} = \frac{4 - 3.5}{\sqrt{\frac{(1.5)^2}{11} + \frac{(1.00)^2}{9}}} = 0.89$$
The test statistic is within the non-rejection region; thus, we fail to reject H0.
At the 1% level of significance, from the sample data, there is not sufficient evidence to
conclude that a student who graduates from college A has taken more math classes, on the
average, than a student who graduates from college B.
Comparing two proportions, like comparing two means, is common. If two estimated proportions are
different, it may be due to a difference in the populations or it may be due to chance in the sampling. A
hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the
two population proportions.
This section focuses on the difference in sample proportions to test a hypothesis about a treatment
effect or a hypothesis that compares two population proportions. We will test claims about a treatment
effect or about a difference in population proportions, and we’ll see that the steps and the logic of the
hypothesis test are the same.
Hypothesis
The difference of two proportions follows an approximate normal distribution. Generally, the null
hypothesis is a statement of “no effect” or “no difference,” so the null hypothesis for all hypothesis tests
about two population proportions is
H0: p1 − p2 = 0 or p1 = p2
Ha: p1 − p2 ≠ 0 or p1 ≠ p2
When conducting a hypothesis test that compares two independent population proportions, the
following characteristics should be present:
1. The two samples are independent random samples.
2. The number of successes is at least five, and the number of failures is at least five, for each of
the samples.
3. Growing literature states that the population must be at least ten or even perhaps 20 times the
size of the sample. This keeps each population from being over-sampled and causing biased
results.
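Test Statistic
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_c(1 - p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where the pooled proportion is
$$p_c = \frac{x_1 + x_2}{n_1 + n_2}$$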
Examples:
5. Two types of medication for hives are being tested to determine if there is a difference in the
proportions of adult patient reactions. Twenty out of a random sample of 200 adults given medication
A still had hives 30 minutes after taking the medication. Twelve out of another random sample of 200
adults given medication B still had hives 30 minutes after taking the medication. Test at a 1% level of
significance.
Solution:
The problem asks for a difference in proportions, making it a test of two proportions. Let A and B be
the subscripts for medication A and medication B, respectively. Then pA and pB are the desired population
proportions.
Random Variable: $\hat{p}_A - \hat{p}_B$ = difference in the proportions of adult patients who did not react after
30 minutes to medication A and to medication B.
α = 0.01
3. Set the criterion (critical values) for rejecting the null hypothesis.
The words "is a difference" tell you the test is two-tailed. Thus, the area in each tail is α/2 = 0.005.
The test statistic is within the non-rejection region; thus, we fail to reject H0.
At a 1% level of significance, from the sample data, there is not sufficient evidence to conclude
that there is a difference in the proportions of adult patients who did not react after 30 minutes
to medication A and medication B.
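The hives example can be worked through numerically as follows. This is a minimal sketch assuming the pooled two-proportion test statistic given in this section; `two_proportion_z` is a hypothetical helper name, and `statistics.NormalDist` replaces the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two independent proportions using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


alpha = 0.01
# Medication A: 20 of 200 still had hives; medication B: 12 of 200.
z = two_proportion_z(20, 200, 12, 200)

# Two-tailed test: area alpha / 2 in each tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_critical

print(f"z = {z:.3f}, critical values = ±{z_critical:.3f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The statistic falls between the two critical values, which is consistent with failing to reject H0 at the 1% level.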
6. Researchers conducted a study of smartphone use among adults. A cell phone company claimed that
iPhone smartphones are more popular with whites (non-Hispanic) than with African Americans. The
results of the survey indicate that of the 232 African American cell phone owners randomly sampled,
5% have an iPhone. Of the 1,343 white cell phone owners randomly sampled, 10% own an iPhone.
Test at the 5% level of significance. Is the proportion of white iPhone owners greater than the
proportion of African American iPhone owners?
Solution:
This is a test of two population proportions. Let W and A be the subscripts for the whites and African
Americans. Then pW and pA are the desired population proportions.
α = 0.05
3. Set the criterion (critical values) for rejecting the null hypothesis.
Figure 14.7 Location of the Test Statistic (critical value z = 1.645; test statistic z = 2.42)
At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that
a larger proportion of white cell phone owners use iPhones than African Americans.
So far, we considered inference to compare two proportions and inference to compare two means.
In this section, we will present how to compare two population variances. Why would we want to
compare two population variances? There are many situations, such as in quality control problems,
where you may want to choose the process with smaller variability for a variable of interest.
One essential use of a test comparing two population variances is to check the equal-variances
assumption before using pooled variances. Many people use this test as a guide to
see if there are any clear violations, much like a rule of thumb.
Hypothesis
Generally, the null hypothesis is a statement of “no effect” or “no difference,” so the null hypothesis
for all hypothesis tests about two population variances is
$H_0: \sigma_1^2 - \sigma_2^2 = 0$ or $\sigma_1^2 = \sigma_2^2$
The alternative hypothesis is one of the following:
$H_a: \sigma_1^2 - \sigma_2^2 > 0$ or $\sigma_1^2 > \sigma_2^2$
$H_a: \sigma_1^2 - \sigma_2^2 < 0$ or $\sigma_1^2 < \sigma_2^2$
$H_a: \sigma_1^2 - \sigma_2^2 \neq 0$ or $\sigma_1^2 \neq \sigma_2^2$
A test of two variances may be left, right, or two-tailed.
F - Distribution
It is often desirable to compare two variances, rather than two means or two proportions. For
instance, college administrators would like two college professors grading exams to have the same
variation in their grading. In order for a lid to fit a container, the variation in the lid and the container
should be the same. A supermarket might be interested in the variability of check-out times for two
checkers. In order to compare two variances, we must use the F distribution.
The F distribution is the probability distribution of the ratio of two independent variables, each with a
chi-square distribution and each divided by its degrees of freedom; it is used in analysis of variance and
in the significance testing of a correlation coefficient.
In order to perform an F test of two variances, it is important that the following are true:
1. The populations from which the two samples are drawn are normally distributed.
2. The two populations are independent of each other.
F – Statistic
$$F = \frac{s_1^2}{s_2^2}$$
If F is close to 1, the evidence favors the null hypothesis (the two population variances are equal);
but if F is much larger than 1, then the evidence is against the null hypothesis.
Note that the F ratio could also be $F = \frac{s_2^2}{s_1^2}$. It depends on Ha and on which sample variance is larger.
When you are finding the F test value, the larger of the variances is placed in the numerator of the
F formula; this is not necessarily the variance from the larger of the two samples. The F – Distribution
Table found in Appendix E gives the F critical values for α = 0.005, 0.01, 0.025, 0.05, and 0.10. These
are one-tailed values; if a two-tailed test is being conducted, then the α/2 value must be used.
Note: Be sure that the larger of the two sample variances is placed in the numerator to calculate
the test statistic. This will mean that only the right-hand tail critical value will have to be found in the
F table.
Examples:
7. Two college instructors are interested in whether or not there is any variation in the way they grade
math exams. They each grade the same set of 10 exams. The first instructor’s grades have a variance
of 52.3. The second instructor’s grades have a variance of 89.9. Test the claim that the first
instructor’s variance is smaller. The level of significance is 10%.
Solution:
α = 0.10
3. Set the criterion (critical values) for rejecting the null hypothesis.
To locate the critical value, we use the F – Distribution Table found in Appendix E. Also, the
higher variance should be the numerator based on the alternative hypothesis, thus our
numerator is instructor 2.
df1 = n1 – 1 = 10 – 1 = 9 → denominator
df2 = n2 – 1 = 10 – 1 = 9 → numerator
Figure 14.8 Location of the Test Statistic (test statistic F = 1.719; critical value F = 2.44)
The test statistic is within the non-rejection region; thus, we fail to reject H0.
With a 10% level of significance, from the data, there is insufficient evidence to conclude that the
variance in grades for the first instructor is smaller.
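The F statistic for the two instructors can be recomputed as shown below. This is a sketch only: the variable names are illustrative, and the critical value 2.44 is taken as given from this example (α = 0.10, right tail, 9 and 9 degrees of freedom) rather than computed.

```python
# F test of two variances for the instructors' grading example.
var_instructor1 = 52.3   # sample variance, first instructor
var_instructor2 = 89.9   # sample variance, second instructor
n1 = n2 = 10             # both instructors graded the same 10 exams

# The larger sample variance goes in the numerator.
f_stat = var_instructor2 / var_instructor1
df_numerator = n2 - 1
df_denominator = n1 - 1

# Critical value as used in this example (assumed F table reading
# for alpha = 0.10 with 9 and 9 degrees of freedom).
f_critical = 2.44

reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.3f}, df = ({df_numerator}, {df_denominator})")
print(f"critical value = {f_critical}, reject H0: {reject_h0}")
```

Placing the larger variance in the numerator keeps the comparison in the right tail, so only one critical value is needed.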
Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC
BY: Attribution
Introductory Statistics. Authored by: Barbara Illowski, Susan Dean. Provided by: Open Stax. Located
at: http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44. License: CC BY:
Attribution. License Terms: Download for free at
http://cnx.org/contents/30189442-6998-4686-ac05-ed152b91b9de@17.44
CK-12 Basic Probability and Statistics. Authored by: Brenda Meery. Provided by: CK-12. Located at:
https://www.ck12.org/book/ck-12-basic-probability-and-statistics-concepts. License: CC BY: Attribution
CK-12 Probability and Statistics - Advanced (Second Edition). Authored by: Ellen Lawsky, Larry Ottman,
Raja Almukkahal, Brenda Meery, Danielle DeLancey. Provided by: CK-12. Located at:
https://www.ck12.org/book/ck-12-probability-and-statistics-advanced-second-edition. License: CC BY:
Attribution
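X. Summary
Population Variance Known
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Population Variance Unknown
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$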
XI. Exercises
Solve the following problems. Use the steps in conducting hypothesis tests.
1. The U.S. Center for Disease Control reports that the mean life expectancy was 47.6 years for whites
born in 1900 and 33.0 years for nonwhites. Suppose that you randomly survey death records for
people born in 1900 in a certain county. Of the 124 whites, the mean life span was 45.3 years with a
standard deviation of 12.7 years. Of the 82 nonwhites, the mean life span was 34.1 years with a
standard deviation of 15.6 years. Conduct a hypothesis test to see if the mean life spans in the county
were the same for whites and nonwhites.
2. Mean entry-level salaries for college graduates with mechanical engineering degrees and electrical
engineering degrees are believed to be approximately the same. A recruiting office thinks that the
mean mechanical engineering salary is actually lower than the mean electrical engineering salary. The
recruiting office randomly surveys 50 entry level mechanical engineers and 60 entry level electrical
engineers. Their mean salaries were $46,100 and $46,700, respectively. Their standard deviations
were $3,450 and $4,210, respectively. Conduct a hypothesis test to determine if you agree that the
mean entry-level mechanical engineering salary is lower than the mean entry-level electrical
engineering salary.
3. A researcher is testing the effects of plant food on plant growth. Nine plants have been given the plant
food. Another nine plants have not been given the plant food. The heights of the plants are recorded
after eight weeks. The populations have normal distributions. The following table is the result. The
researcher thinks the food makes the plants grow taller.
4. A study is done to determine if students in the California state university system take longer to
graduate, on average, than students enrolled in private universities. One hundred students from both
the California state university system and private universities are surveyed. Suppose that from years
of research, it is known that the population standard deviations are 1.5811 years and 1 year,
respectively. The following data are collected. The California state university system students took on
average 4.5 years with a standard deviation of 0.8. The private university students took on average
4.1 years with a standard deviation of 0.3.
5. Researchers interviewed street prostitutes in Canada and the United States. The mean age of the 100
Canadian prostitutes upon entering prostitution was 18 with a standard deviation of six. The mean
age of the 130 United States prostitutes upon entering prostitution was 20 with a standard deviation
of eight. Is the mean age of entering prostitution in Canada lower than the mean age in the United
States? Test at a 1% significance level.
6. In the recent Census, three percent of the U.S. population reported being of two or more races.
However, the percent varies tremendously from state to state. Suppose that two random surveys are
conducted. In the first random survey, out of 1,000 North Dakotans, only nine people reported being
of two or more races. In the second random survey, out of 500 Nevadans, 17 people reported being
of two or more races. Conduct a hypothesis test to determine if the population percents are the same
for the two states or if the percent for Nevada is statistically higher than for North Dakota.
7. We are interested in whether the proportions of female suicide victims for ages 15 to 24 are the same
for the white and black races in the United States. We randomly pick one year, 1992, to compare
the races. The number of suicides estimated in the United States in 1992 for white females is 4,930.
Five hundred eighty were aged 15 to 24. The estimate for black females is 330. Forty were aged 15 to
24. We will let female suicide victims be our population.
8. Adults aged 18 years old and older were randomly selected for a survey on obesity. Adults are
considered obese if their body mass index (BMI) is at least 30. The researchers wanted to determine
if the proportion of women who are obese in the south is less than the proportion of southern men
who are obese. The results are shown in the table. Test at the 1% level of significance.
9. Two coworkers commute from the same building. They are interested in whether or not there is any
variation in the time it takes them to drive to work. They each record their times for 20 commutes.
The first worker’s times have a variance of 12.1. The second worker’s times have a variance of 16.9.
The first worker thinks that he is more consistent with his commute times and that his commute time
is shorter. Test the claim at the 10% level.
10. Two cyclists are comparing the variances of their overall paces going uphill. Each cyclist records his or
her speeds going up 35 hills. The first cyclist has a variance of 23.8 and the second cyclist has a variance
of 32.1. The cyclists want to see if their variances are the same or different. At the 5% significance
level, what can we say about the cyclists’ variances?
References
[1] D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 3rd ed., New York, NY: John Wil
[2] R. Lasser, "Engineering Method," Electrical and Computer Engineering Handbook, 2020. [Online]. Available:
https://sites.tufts.edu/eeseniordesignhandbook/2013/engineering-method/.
[4] A. G. Bluman, Elementary Statistics: A Step by Step Approach, 9th ed., New York: McGraw-Hill Education, 2014.
[6] "2019-20 NBA Predictions," ABC News Internet Ventures, 12 March 2020. [Online]. Available: https://projects.fiveth
predictions/.
[8] F. Newport, "Americans Still Enjoy Saving Rather than Spending: Few demographic differences seen in these views o
GALLUP Economy, 2013. [Online]. Available: http://www.gallup.com/poll/162368/americans-enjoy-saving-rathe
[Accessed 15 May 2013].
[9] "What are the key statistics about pancreatic cancer?," American Cancer Society, 2013. [Online]. Available:
http://www.cancer.org/cancer/pancreaticcancer/detailedguide/pancreatic-cancer-key-statistics . [Accessed 15 M
[10] "NBA Statistics," ESPN NBA, 2013. [Online]. Available: http://espn.go.com/nba/statistics/_/seasontype/2 . [Accessed
[11] L. Vanderkam, "Stop Checking Your Email, Now," CNNMoney, 2014. [Online]. Available: http://management.fortune
now/. [Accessed 15 May 2013].
[12] Open Learning Initiative, "Concepts in Statistics," Open Learning Initiative, [Online]. Available: https://s3-us-west-
2.amazonaws.com/oerfiles/Concepts+in+Statistics/interactives/continuousprobabilitydistribution/ContinuousPr
[13] Department of Mathematics - The University of Arizona, "Standard Normal Distribution Table," 4 April 2016. [Online
https://www.math.arizona.edu/~rsims/ma464/standardnormaltable.pdf. [Accessed June 2020].
[14] D. M. Lane, "Online Statistics Education: A Multimedia Course of Study," Rice University , University of Houston Clea
University, [Online]. Available: http://onlinestatbook.com/. [Accessed June 2020].
[15] Lund Research Ltd, "How to do Normal Distributions Calculations," Laerd Statistics, 2018. [Online]. Available:
https://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php. [Accessed June 2020].
[16] NHS Digital, "Data dashboards," NHS Digital, [Online]. Available: https://digital.nhs.uk/.
[17] D. S. Young, "tolerance: An R Package for Estimating Tolerance Intervals," Journal of Statistical Software, vol. 36, no.
[18] NIST/SEMATECH, "7.2.6.3. Tolerance intervals for a normal distribution," in e-Handbook of Statistical Methods,, 2012
[19] Minitab, LLC, "Tolerance interval basics," 2019. [Online]. Available: https://support.minitab.com/en-us/minitab/18/h
and-process-improvement/quality-tools/supporting-topics/tolerance-interval-basics/. [Accessed July 2020].
[20] S. Glen, "Tolerance Intervals (Enclose Intervals) & Factors," StatisticsHowTo.com: Elementary Statistics for the rest o
[Online]. Available: https://www.statisticshowto.com/tolerance-intervals/.
APPENDIX
I. Statistical Tables
B: Student’s t-Distribution
E: F-Distribution Table
Week 2
Exercises
1.
2.
3. 0.92
4.
a. 0.8913
b. 0.3696
c. 0.2174
5.
a. 0.4737
b. 0.2368
c. 5/38 = 0.1316
6.
a. 7400/15000
b. 4600/15000
c. 3000/15000
7.
a. 13/36
b. 17/36
8. 0.292
9. 4096
10. 151,200
Week 3
Exercises
1.
a. 0.10
b. 0.60
2. 0.50
3. 0.70
4. 0.3846
5. 0.7297
6.
a. 99.95%
b. 96.95%
c. 0.0005%
d. 5 × 10^-6
7. 27%
8. 13.05%
9. 56.92%
10. 85%
Week 4
Exercises
1. No, Sum is 2
2. Yes
3.
X 2 3 4 5 6 7 8 9 10 11 12
P(x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
4.
M 0 1 3
P(M) 2/6 3/6 1/6
5. X is the number of days Jeremiah attends basketball practice per week. X takes on the
values 0, 1, and 2.
6. 2.32 & 1.22
7. 0.0023
8. -$3.80
9.
a. -$5.006
b. 28.718
10. 22.5 & 4.33
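The table in answer 3 can be checked by direct enumeration. The sketch below assumes that exercise 3 asks for the distribution of the sum of two fair six-sided dice, which the tabulated values suggest; the function name two_dice_sum_pmf is illustrative and not part of the module.

```python
from fractions import Fraction
from itertools import product


def two_dice_sum_pmf():
    """Return {sum: probability} for the sum of two fair six-sided dice (assumed setup)."""
    pmf = {}
    # Each of the 36 ordered outcomes (a, b) is equally likely.
    for a, b in product(range(1, 7), repeat=2):
        pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)
    return pmf


pmf = two_dice_sum_pmf()
for total in sorted(pmf):
    # Print each probability over the common denominator 36, as in the answer table.
    print(total, f"{pmf[total] * 36}/36")
```

Exact fractions are used instead of floats so that the result can be compared with the table entry by entry and the probabilities can be confirmed to sum to exactly 1.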
Week 5
Exercises
1.
a. 0.03486
b. 0.6513
2.
a. 12
b. 2.25
c. 0.011
3.
a. 10
b. 0.0442
4.
a. 0.0183
b. 0.2149
5.
a. 2.35, 1.53
b. 0.6860
6.
a. 2.5, 1.58
b. 0.2138
c. 0.0420
7.
a. 1.33
b. 6.26
8.
a. 0.077
b. 0.7160
9.
a. 0.0128
b. 9.75, 1.3653
10.
a. 13.5, 3.5
b. 0.0988
c. 0.1987
Week 6
Exercises
1.
a. 0.0401
b. 2.29
2. 0.841
3. 113.28 – 126.72
4. 6.68%
5.
a. 0.50
b. 0.15866
c. 0.34458
d. 168.16
6. 0.042
7. 0.091
8. 0.999
9. 0.9849
10. 0.0401
Week 7
Exercises
1.
a. 2𝑥 + 𝑥
b. + 𝑦
2.
a. 0.0109
b. 1.2x + 0.4
c. 0.6 + 1.2 y2
3.
a. Marginal pmf of X: 0.4, 0.3, 0.3
Marginal pmf of Y: 0.28, 0.72
b. P(X = x | Y = 2): 0.416666667, 0.347222222, 0.236111111
4.
b. Marginal pmf of Y: 25/36, 10/36, 1/36
c. 1/3
5.
6. Marginal of x: 𝑥 +
Marginal of y: 𝑦 +
7.
a. Marginal pmf of Y: 0.323425599, 0.396990526, 0.279583875
Marginal pmf of X: 0.426899, 0.573101
b. 0.222
c. 0.523314
8.
a. 41/720 or 0.0569
b. Marginal of x: +
( )
Marginal of y: +
c. Conditional probability density function of X:
( )
Conditional probability density function of Y:
Week 9
Exercises
1.
a.
b.
2.
a. 4038
b. 5671
c. 49
3. 0.0202
4.
5.
a.
b.
c.
6.
7.
a.
b.
8.
a. 1.5%
b. 5.35%
9.
a.
b.
10.
Week 10
Exercises
1. 48.73% – 64.87%
2. 0.564 – 0.656, 0.538 – 0.682, 0.555 – 0.665
3. 3.511 – 3.609
4. 9.7 – 27.7
5. 3,244.06 – 3,475.94
6. 636.84 – 663.16
7. 6,244 – 11,014
8. 239.84 – 248.16; 4.16
9. 0.3041 – 0.3730
10. 7.9441 – 8.4559
Week 11
Exercises
1. 0.0000307
2.
a. 3.30185 – 8.7363
b. 1.06 – 1.14
3.
a. 7,975,727.09
b. 54,291.75 – 68,692.25
4. 0.517 – 1.983
5.
a. 2.7161 – 3.0873
b. 2.488 – 3.312
6. 1.068 – 1.132
7. 2,174.41 – 2,345.425
8. 8.165 – 8.303
Week 12
Exercise 11
1. Reject H0.
2. Reject H0.
3. Fail to reject H0.
4. Reject H0.
5. Reject H0.
6. Fail to reject H0.
7. Reject H0.
8. Fail to reject H0.
9. Fail to reject H0.
10. Fail to reject H0.
Week 13
Exercise 12
1. Fail to reject H0.
2. Fail to reject H0.
3. Reject H0.
4. Reject H0.
5. Fail to reject H0.
6. Fail to reject H0.
7. Fail to reject H0.
8. Fail to reject H0.
9. Fail to reject H0.
10. Reject H0.
Week 14
Exercise 13
1. Reject H0.
2. Fail to reject H0.
3. Fail to reject H0.
4. Reject H0.
5. Fail to reject H0.
6. Reject H0.
7. Reject H0.
8. Reject H0.
9. Fail to reject H0.
10. Fail to reject H0.
Syllabus
UNIVERSITY
VISION STATEMENT
The University of Nueva Caceres, a private non-sectarian institution, is Bicol’s first university. Guided by its motto, “Non Scholae Sed Vitae” (Not of school but of life), and attuned to the demands of a highly dynamic global environment, the University commits itself to quality and excellent education for all to transform the youth into entrepreneurial, productive, morally upright, socially responsible professionals for a just, humane and progressive society.
MISSION STATEMENT
The University of Nueva Caceres shall be a leading exponent of academic excellence, research, extension, and innovative technology for sustainable development. It creates a nurturing academic environment and provides equal opportunities in the formation of individuals into empowered leaders, competent professionals and proactive entrepreneurs who are cognizant of our cultural heritage.
GRADUATES ATTRIBUTES
1. Culturally-rooted with multi-cultural understanding. Preserves his or her cultural roots and manifests pride in his or her language, practices and traditions; shows appreciation of the culture of other peoples.
2. Collaborative. Works with others effectively as a member of a team, a group, an organization or a community.
3. Creative and critical thinker. Applies creative, imaginative and innovative thinking and ideas to problem solving.
4. Effective communicator. Communicates effectively and confidently in a range of contexts and for a variety of purposes.
5. Life-long learner. Demonstrates an attitude of continuous learning to succeed in changing times.
6. Ethically and socially responsible. Demonstrates an understanding of ethical, social, and cultural issues and makes personal, professional and leadership decisions in accordance with these principles.
7. Great leader. Demonstrates complete (accomplished, distinguished, expert) leadership traits and capabilities to influence and enable others to achieve common goals and visions.
8. Excellence-driven. Demonstrates mastery of the fundamental and evolving technical and technological knowledge and skills relating to their discipline.
COLLEGE
VISION STATEMENT
Guided by value-centered instruction and service, this college seeks to become the country’s leading college in Engineering and Architecture where theory and ethical practice foster professional excellence.
PROGRAM EDUCATIONAL OBJECTIVES
Within three to five years after graduation, graduates of BSECE shall:
1. Apply technical expertise in professional engineering practices, research, or allied fields, locally or globally, while upholding the code of ethics for engineers.
2. Demonstrate life-long learning through a graduate education program or professional advancement.
3. Contribute to the growth/development of society by integrating social, economic, cultural and environmental aspects in nation building.
III. PROGRAM EDUCATIONAL OBJECTIVES (PEOs) AND THEIR RELATIONSHIP TO COLLEGE MISSION STATEMENT
COLLEGE MISSION STATEMENT
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
1 2
Within three to five years after graduation, graduates of BSECE shall:
3. Contribute to the growth/development of society by integrating social, economic, cultural and environmental aspects in nation building.
II. PROGRAM OUTCOMES (POs) AND THEIR RELATIONSHIP TO PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PROGRAM OUTCOMES (PO)
INSTITUTIONAL GOALS (IG): Culturally-rooted with multi-cultural understanding; Collaborative; Creative and critical thinker; Effective communicator; Life-long learner; Ethically and socially responsible; Great leader; Excellence-driven
PEO
By the time of graduation, the students of the program shall have the ability to:
CO2 Compute the probability distribution of a random variable for both discrete and
continuous data. D D E I I D
Note:
This course is implemented using the Blended-Flipped approach, wherein the majority of individual learning tasks are performed online using Google Classroom, while learning synthesis through discussions and collaborative tasks is done in face-to-face sessions with the assigned course instructor and other learners enrolled in the course.
Week 1
Orientation to the Course
Discussion of:
1. UNC and College VMO
2. Core Values
3. OBE Framework
4. Objectives of the Course
5. Course outline
6. Course Requirements & Grading System
DL Copy of Syllabus
1. Obtaining Data
1.1 Methods of Data Collection
1.2 Planning and Conducting Surveys
1.3 Planning and Conducting Experiments: Introduction to Design of Experiments
Week 2
2. Probability
2.1 Sample Space and Relationships among Events
2.2 Counting Rules Useful in Probability
2.3 Basic Interpretations of Probability
Week 3
3. Rules of Probability
3.1 Terms in Probability
3.2 Additive Rules of Probability
3.3 Multiplicative Rules of Probability
Week 4
4. Introduction to Probability Distribution
4.1 Random Variables and Their Probability Distributions
4.2 Cumulative Distribution Functions
4.3 Expected Values of Random Variables
PRELIM EXAM
Week 6
5. Discrete Probability Distribution
5.1 Binomial Probability Distribution
5.2 Poisson Probability Distribution
5.3 Hypergeometric Probability Distribution
Week 7
6. Continuous Probability Distribution
6.1 Introduction
6.2 Normal Probability Distribution
6.3 The Normal Approximation to Binomial
6.4 The Normal Approximation to Poisson
Week 8
7. Joint Probability Distribution
7.1 Joint Probability Mass Function
7.2 Joint Probability Density Function
7.3 Conditional Probability Distribution
MIDTERM EXAM
Week 10
8. Point Estimation of Parameters and Sampling Distribution
8.1 Point Estimation
8.2 General Concepts of Point Estimation
8.3 Sampling Distribution
8.4 The Central Limit Theorem
Week 11
9. Confidence Intervals
9.1 Introduction
9.2 Single Population Mean
9.3 Population Proportion
FINAL EXAM
*All online learning activities / assessments are scheduled on the 2nd meeting of the week. For example, for an MTh schedule, the online component is on Thursday; for TF, it is on Friday; and for WS, it is on Saturday.
IX. RUBRICS
RECITATION / BOARDWORK
CRITERIA PERCENTAGE
Accuracy 60%
Accuracy (60%): Complete response with detailed explanation | Good solid response with clear explanation | Explanation is unclear/misses key points | Wrong answer. | No answer.
Accuracy (60%): Detailed solution with correct answer | Generally correct and complete but may contain minor flaws | Generally incorrect/incomplete | Wrong answer. | No answer.
CLAZE THERESE DE VERA; EDGARDO N. MARTINEZ JR.; MA. LOURDES REQUINTA; CHRISTINE C. BAUTISTA
Revision Number 5, June 2020