Continuous Random Variables

The histogram shows a bell-shaped curve, indicating that the data are approximately normally distributed. Some key properties of a normal distribution include: - It is symmetric about the mean. - It has a characteristic bell shape. - Nearly all the values lie within 3 standard deviations of the mean. The normal distribution is completely described by just two parameters: 1) The mean (μ) - the average or expected value of the distribution. This appears to be around 170-175 cm based on the center of the bell curve. 2) The standard deviation (σ) - a measure of how spread out the data are from the mean. The data become increasingly less frequent as the distance from the mean increases.
Capital University of Science and Technology

Subject: Statistics

Instructor: Azhar Rauf


Random Variable
A random variable is a rule, or mapping, from the original sample space
(numerical or non-numerical) to a numerical sample space, subject to
certain constraints.
In general, each outcome of an experiment can be associated with a number by
specifying a rule of association (e.g., the number among the sample of ten
components that fail to last 1000 hours or the total weight of baggage for a sample
of 25 airline passengers). Such a rule of association is called a random variable:
a variable because different numerical values are possible, and random because the
observed value depends on which of the possible experimental outcomes results.
Random Variable
Probability Distribution

❖ Random variable
a variable (typically represented by x)
that has a single numerical value,
determined by chance, for each
outcome of a procedure
❖ Probability distribution
a description that gives the probability
for each value of the random variable;
often expressed in the format of a
graph, table, or formula
TWO TYPES OF RANDOM
VARIABLES

Discrete Random Variables
A random variable is discrete if its set of possible values consists of
discrete points on the number line.

Continuous Random Variables
A random variable is continuous if its set of possible values consists of an
entire interval on the number line.
TWO TYPES OF RANDOM
VARIABLES

A discrete random variable:


⚫ has a countable number of possible values
⚫ has discrete jumps (or gaps) between successive values
⚫ has measurable probability associated with individual values
⚫ counts

A continuous random variable:


⚫ has an uncountably infinite number of possible values
⚫ moves continuously from value to value
⚫ has no measurable probability associated with each value
⚫ measures (e.g.: height, weight, speed, value, duration, length)
EXAMPLES
Continuous Random
Variables
OR
Continuous Probability
Distributions
Continuous Probability Distributions
• So far we considered a discrete random variable
• Random variable will be continuous if it represents a
continuous quantity such as time, distance, temperature
etc.
• Example
• Let random variable X be the amount of time a customer has to wait
before a call is received by a call center representative
• Assume that maximum wait time is 5 minutes, after which the call
automatically disconnects
• Now X is a continuous random variable whose value is in the range 0-5
Continuous Probability Distributions
• The waiting time can be in minutes

• But it can also be a fraction


Continuous Probability Distributions
• We can make the time units more granular
• But assigning a probability to each exact value does not work
• For example, what is the probability that a customer has to wait
exactly 1 minute?
Continuous Probability Distributions
• However, we can talk about intervals
• For example, what is the probability that a customer has to wait
between 1 and 2 minutes?
Continuous Probability Distributions
• We can make intervals more and more granular
Continuous Probability Distributions
• With very small intervals, the distribution becomes continuous
• Probability of waiting time in an interval is calculated as area under
the curve for that interval
Probability Density Function
• Probability Density Function (PDF) is the equivalent of the PMF for
continuous distributions
• It is defined on an interval, and the probability of an interval is the area below the curve
Probability Density Function
• At any given point, the area below the curve is zero
Probability Density Function
• Formal definition
Probability Density Function
• Discrete vs Continuous distributions
Discrete Continuous
Cumulative Distribution Function
• Cumulative distribution function (CDF) for discrete distribution
Cumulative Distribution Function
• Cumulative distribution function (CDF) for continuous distribution
Cumulative Distribution Function
• PDF vs CDF
Probability Distributions for Continuous Variables
Suppose the variable X of interest is the depth of a lake at a randomly chosen point on the
surface. Let M be the maximum depth (in meters), so that any number in the interval [0, M] is a
possible value of X. If we “discretize” X by measuring depth to the nearest meter, then
possible values are nonnegative integers less than or equal to M. The resulting discrete
distribution of depth can be pictured using a probability histogram. If we draw the
histogram so that the area of the rectangle above any possible integer k is the proportion of
the lake whose depth is (to the nearest meter) k, then the total area of all rectangles is 1. A
possible histogram appears in Figure 4.1(a). If depth is measured much more accurately
and the same measurement axis as in Figure 4.1(a) is used, each rectangle in the resulting
probability histogram is much narrower, though the total area of all rectangles is still 1. A
possible histogram is pictured in Figure 4.1(b); it has a much smoother appearance than
the histogram in Figure 4.1(a). If we continue in this way to measure depth more and more
finely, the resulting sequence of histograms approaches a smooth curve, such as is pictured
in Figure 4.1(c). Because for each histogram the total area of all rectangles equals 1, the
total area under the smooth curve is also 1. The probability that the depth at a randomly
chosen point is between a and b is just the area under the smooth curve
between a and b. It is exactly a smooth curve of the type pictured in Figure 4.1(c) that
specifies a continuous probability distribution.
Continuous Random Variables

Probability Density Function


Let X be a continuous rv. Then a probability distribution or probability density
function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]

That is, the probability that X takes on a value in the interval [a, b] is the area above
this interval and under the graph of the density function, as illustrated in Figure. The
graph of 𝑓(𝑥) is often referred to as the density curve.
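
The definition above can be tried out numerically. The following is a minimal sketch, not part of the original slides: it assumes, purely for illustration, that the call-centre waiting time is spread evenly over 0 to 5 minutes (the slides do not specify the shape of that density), and it approximates the area under the curve with a simple midpoint rule. The names `integrate` and `wait_density` are hypothetical.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def wait_density(x):
    # Assumed density (illustration only): waiting time evenly spread on [0, 5].
    return 0.2 if 0 <= x <= 5 else 0.0


p_1_to_2 = integrate(wait_density, 1, 2)     # P(1 <= X <= 2): area over an interval
p_exactly_1 = integrate(wait_density, 1, 1)  # zero-width interval: no area
print(round(p_1_to_2, 4), p_exactly_1)
```

Under this assumed density the interval from 1 to 2 minutes carries one fifth of the probability, while a single point carries none, which matches the idea that only intervals have measurable probability.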
Continuous Random Variables
Probability Density Function

For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
1. \( f(x) \ge 0 \) for all x
2. \( \int_{-\infty}^{\infty} f(x)\,dx \) = area under the entire graph of f(x) = 1
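
As a rough illustration of the two conditions, the sketch below checks a candidate density numerically. It is an assumption-laden example rather than a general test: the helper `is_legitimate_pdf` and the two candidate functions are hypothetical, the check only samples the function on a grid, and it assumes all of the probability lies inside the interval [lo, hi] that is passed in.

```python
def is_legitimate_pdf(f, lo, hi, n=10_000, tol=1e-6):
    """Numerically check f(x) >= 0 and total area = 1 on [lo, hi]."""
    width = (hi - lo) / n
    values = [f(lo + (i + 0.5) * width) for i in range(n)]  # midpoint samples
    nonnegative = all(v >= 0 for v in values)
    total_area = sum(values) * width
    return nonnegative and abs(total_area - 1.0) < tol


def triangle(x):
    # Hypothetical candidate: f(x) = 2x on [0, 1], total area 1.
    return 2 * x if 0 <= x <= 1 else 0.0


def too_much_area(x):
    # Hypothetical candidate: f(x) = x on [0, 2], total area 2.
    return x if 0 <= x <= 2 else 0.0


print(is_legitimate_pdf(triangle, 0, 1))       # satisfies both conditions
print(is_legitimate_pdf(too_much_area, 0, 2))  # fails condition 2
```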
Continuous Random Variables

Mean and Variance


Continuous Random Variables
Uniform Distribution
Example 2: Uniform distribution

The uniform distribution: all values are equally likely.

f(x) = 1, for 0 ≤ x ≤ 1

[Figure: density p(x) of constant height 1 over 0 ≤ x ≤ 1]

We can see it’s a probability distribution because it integrates
to 1 (the area under the curve is 1):

\[ \int_0^1 1\,dx = x \Big|_0^1 = 1 - 0 = 1 \]
Example: Uniform distribution

What’s the probability that x is between ¼ and ½?

[Figure: density p(x) of height 1 with the region between x = ¼ and x = ½ shaded]

P(¼ ≤ x ≤ ½) = ½ − ¼ = ¼
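
Because the density is flat, the area over an interval is just its width times the height. A minimal sketch of that calculation follows; the function name `uniform_prob` and its parameters are hypothetical and introduced only for this illustration.

```python
def uniform_prob(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniformly distributed on [low, high]."""
    left, right = max(a, low), min(b, high)  # clip the interval to the support
    if right <= left:
        return 0.0
    return (right - left) / (high - low)     # width times height 1/(high - low)


print(uniform_prob(0.25, 0.5))  # 0.25
print(uniform_prob(0.0, 1.0))   # 1.0
```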
Normal Distribution
Introduction: Normal distribution

Introduction: Normal distribution


The standard normal distribution
More general normal distributions
Solving problems by working
backwards.
Normal Distribution (Gaussian Distribution)
• Consider the following probability distribution
• The probability distribution looks like a bell shaped curve
Normal Distribution
• Normal distribution curve (bell shaped curve)
Normal Distribution
• Normal distribution formula
Normal Distribution
• Scaling normal distribution
Normal distribution
A sample of heights of 10,000 adult males gave rise to the following histogram:

[Figure: Histogram showing the heights of 10000 males. x-axis: Height (cm), 140 to 188 and more; y-axis: Frequency, 0 to 1400]

Notice that this histogram is symmetrical and bell-shaped.


This is the characteristic shape of a normal distribution.
The Normal Distribution

• Bell shaped
• Symmetrical
• Mean, median and mode are equal

Location is determined by the mean, μ.
Spread is determined by the standard deviation, σ.
The random variable has an infinite theoretical range: −∞ to +∞.

[Figure: bell-shaped density curve f(X) centred at μ, where mean = median = mode, with spread σ]
Introduction: Normal distribution
If we were to draw a smooth curve through the mid-points of the bars in the
histogram of these heights, it would have the following shape:

[Figure: smooth bell-shaped curve] This is called the normal curve.

The normal distribution is an appropriate model for many common


continuous distributions, for example:

The masses of new-born babies;


The IQs of school students;
The hand span of adult females;
The heights of plants growing in a field;
etc.
The normal distribution is a theoretical probability distribution:
the area under the curve adds up to one.
Introduction: Normal distribution
If X has a normal distribution with mean μ and variance σ², we write
X ~ N[μ, σ²]

[Figure: normal curve with the area between μ – σ and μ + σ shaded]

68% of the distribution lies within 1 standard deviation of the mean.
The X axis is divided up into deviations from the mean. Below, the shaded area
is one deviation from the mean.
As well as the mean, the standard deviation (σ) must also be known.

[Figure: standard normal curve, x-axis from −5 to 5, y-axis from 0 to 0.5, shaded within one standard deviation of the mean]
Introduction: Normal distribution
If X has a normal distribution with mean μ and variance σ², we write
X ~ N[μ, σ²]

[Figure: normal curve with the area between μ – 2σ and μ + 2σ shaded]

95% of the distribution lies within 2 standard deviations of the mean.
Two standard deviations from the mean

[Figure: standard normal curve, x-axis from −5 to 5, y-axis from 0 to 0.5, shaded within two standard deviations of the mean]

Three standard deviations from the mean

[Figure: standard normal curve, x-axis from −5 to 5, y-axis from 0 to 0.5, shaded within three standard deviations of the mean]
Introduction: Normal distribution
If X has a normal distribution with mean μ and variance σ², we write
X ~ N[μ, σ²]

[Figure: normal curve with the area between μ – 3σ and μ + 3σ shaded]

99.7% of the distribution lies within 3 standard deviations of the mean.
A handy estimate, known as the Empirical Rule, for a set of normal data:

68% of data will fall within 1σ of μ
P( -1 < z < 1 ) = 0.683 = 68.3%

95% of data will fall within 2σ of μ
P( -2 < z < 2 ) = 0.954 = 95.4%

99.7% of data will fall within 3σ of μ
P( -3 < z < 3 ) = 0.997 = 99.7%

[Figures: standard normal curves, x-axis from −5 to 5, y-axis from 0 to 0.5, shaded within 1σ, 2σ and 3σ of the mean]
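
These three percentages can be checked without a table. The sketch below is an illustration, not part of the original slides: it builds the standard normal cumulative probability from Python's `math.erf`, using the standard identity Φ(z) = ½(1 + erf(z/√2)); the helper name `phi` is an arbitrary choice.

```python
import math


def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probability of falling within k standard deviations of the mean.
within = {k: phi(k) - phi(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"P(-{k} < Z < {k}) = {p:.3f}")
```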


Normal distribution
As normal distributions always represent continuous data, it
only makes sense to find the probability that X takes a value in
a particular interval. For example, we could find:

P(X ≥ 20);
P(–5 < X < 9);
P(X = 19 to the nearest whole number),
i.e. P(18.5 ≤ X < 19.5).
Probabilities correspond to areas underneath the normal curve.
There is no simple formula that can be used to find the
probabilities. Instead, the probabilities are found from
tables.
The standard normal distribution

The standard normal distribution
[Figure: standard normal curve, x-axis from −3 to 3]

The normal distribution with mean 0 and standard deviation 1
is called the standard normal distribution; it is denoted Z.
So, Z ~ N[0, 1].

Probabilities for this distribution are given in tables.


The standard normal distribution
Here is an extract from a standard normal distribution table:

z    0     1     2     3     4     5     6     7     8     9
0.0  .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1  .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2  .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3  .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4  .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5  .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6  .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7  .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8  .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133

The top row gives the next decimal place of the z value.
The first column gives the first part of the z value.

The tables are cumulative, i.e. they give P(Z ≤ z).



The standard normal distribution
So, P(Z ≤ 0.54) = 0.7054.


The standard normal distribution
P(Z > 0.6) = 1 – P(Z ≤ 0.6)
= 1 – 0.7257
= 0.2743
The standard normal distribution
P(0.25 ≤ Z < 0.78) = P(Z ≤ 0.78) – P(Z ≤ 0.25)


= 0.7823 – 0.5987
= 0.1836
The standard normal distribution
P(Z > –0.3) = P(Z < 0.3)
= 0.6179

Remember that the standard normal distribution is symmetrical around 0.
The standard normal distribution
P(Z ≤ –0.28) = 1 – P(Z ≤ 0.28)


= 1 – 0.6103
= 0.3897
The standard normal distribution
P(–0.08 < Z ≤ 0.85) = P(Z ≤ 0.85) – P(Z ≤ –0.08)


= 0.8023 – (1 – 0.5319)
= 0.3342
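
The table look-ups in the preceding examples can be cross-checked in code. This is a minimal sketch under the assumption that a computed cumulative probability is an acceptable stand-in for the printed table; `phi` is a hypothetical helper built on `math.erf`, and small differences in the fourth decimal place against the table are possible because the table values are rounded.

```python
import math


def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


lookups = {
    "P(Z <= 0.54)": phi(0.54),
    "P(Z > 0.6)": 1.0 - phi(0.6),                    # complement rule
    "P(0.25 <= Z < 0.78)": phi(0.78) - phi(0.25),    # difference of two areas
    "P(Z > -0.3)": 1.0 - phi(-0.3),                  # equals P(Z < 0.3) by symmetry
    "P(Z <= -0.28)": phi(-0.28),                     # equals 1 - P(Z <= 0.28)
    "P(-0.08 < Z <= 0.85)": phi(0.85) - phi(-0.08),
}
for label, value in lookups.items():
    print(f"{label} = {value:.4f}")
```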
The standard normal distribution
Find a such that P(Z < a) = 0.6950.


We search in the table to find the probability 0.6950.
We see that a = 0.51.
The standard normal distribution
Find b such that P(Z > b) = 0.242.


i.e. such that P(Z ≤ b) = 1 – 0.242 = 0.758.
We see that b = 0.7.
The standard normal distribution
Find c such that P(Z < c) = 0.352.


c must be negative because P(Z < c) is less than 0.5000.

By symmetry, P(Z > |c|) = 0.352 and


P(Z ≤ |c|) = 0.648. Therefore c = –0.38.
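
Working backwards from a probability to a z value is an inverse look-up. As a sketch only, the same three questions can be answered with `NormalDist.inv_cdf` from Python's standard `statistics` module (available from Python 3.8); the variable names are arbitrary, and the results agree with the table answers only to about two decimal places because the table is rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = Z.inv_cdf(0.6950)     # P(Z < a) = 0.6950
b = Z.inv_cdf(1 - 0.242)  # P(Z > b) = 0.242 means P(Z <= b) = 0.758
c = Z.inv_cdf(0.352)      # probability below 0.5, so c is negative

print(round(a, 2), round(b, 2), round(c, 2))
```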
© Boardworks Ltd 2005


More general normal distributions
It would of course be impractical to publish tables of
probabilities for every possible normal distribution.

Fortunately, it is possible and easy to transform any
normal distribution to a standard normal:

If X ~ N[μ, σ²], then

\[ Z = \frac{X - \mu}{\sigma} \sim N[0, 1]. \]

[Figure: standardising maps the N[μ, σ²] curve onto the N[0, 1] curve, whose x-axis runs from −3 to 3]
More general normal distributions
Example: If X ~ N[20, 16] , find
a) P(X < 23);
b) P(X > 14);
c) P(16 < X < 24.8).

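a) If σ² = 16, then σ = 4.

Standardise:

\[ z = \frac{23 - 20}{4} = 0.75 \]

[Figure: the area below x = 23 on the N[20, 16] curve equals the area below z = 0.75 on the N[0, 1] curve]

P(X < 23) = P(Z < 0.75) = 0.7734
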


More general normal distributions
b) Standardise:

\[ z = \frac{14 - 20}{4} = -1.5 \]

[Figure: the area above x = 14 on the N[20, 16] curve equals the area above z = –1.5 on the N[0, 1] curve]

P(X > 14) = P(Z > −1.5) = P(Z < 1.5) = 0.9332


More general normal distributions
c) Standardise:

\[ P(16 < X < 24.8) = P\left(\frac{16 - 20}{4} < Z < \frac{24.8 - 20}{4}\right) = P(-1 < Z < 1.2) \]

[Figure: the area between x = 16 and x = 24.8 on the N[20, 16] curve equals the area between z = −1 and z = 1.2 on the N[0, 1] curve]

P(Z < 1.2) = 0.8849
and P(Z < –1) = 1 – P(Z < 1) = 1 – 0.8413 = 0.1587.
So, P(–1 < Z < 1.2) = 0.8849 – 0.1587 = 0.7262.
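
The three parts of this example can be reproduced directly from the distribution of X, which avoids the standardising step. The sketch below is illustrative only and uses `NormalDist` from Python's standard `statistics` module; note that it takes the standard deviation (4) rather than the variance (16). Results can differ from the table-based answers in the fourth decimal place because the table values are rounded.

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=4)  # X ~ N[20, 16]: variance 16, so sigma = 4

p_a = X.cdf(23)                # a) P(X < 23)
p_b = 1 - X.cdf(14)            # b) P(X > 14)
p_c = X.cdf(24.8) - X.cdf(16)  # c) P(16 < X < 24.8)

for label, p in (("a", p_a), ("b", p_b), ("c", p_c)):
    print(f"{label}) {p:.4f}")
```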
More general normal distributions
Examination style question: IQs are normally distributed with
mean 100 and standard deviation 15. What proportion of the
population have an IQ of at least 124?

Let X be the random variable for the IQ of an individual.


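X ~ N[100, 225].

Standardise:

\[ z = \frac{124 - 100}{15} = 1.6 \]

[Figure: the area above x = 124 on the N[100, 225] curve equals the area above z = 1.6 on the N[0, 1] curve]

So, we want P(X > 124) = P(Z > 1.6)
= 1 – P(Z ≤ 1.6) = 1 – 0.9452
= 0.0548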
The Normal Distribution
Density Function
◼ The formula for the normal probability density function is

\[ f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^{2}} \]

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
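
To make the formula concrete, here is a short sketch that evaluates it directly. It is an illustration rather than part of the slides; the function name `normal_pdf` is hypothetical, and the comparison against `statistics.NormalDist.pdf` is included only as a sanity check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x, for mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


peak = normal_pdf(0.0, 0.0, 1.0)         # highest point of the standard normal curve
left = normal_pdf(15.0, 20.0, 4.0)       # 5 units below the mean of N[20, 16]
right = normal_pdf(25.0, 20.0, 4.0)      # 5 units above the mean: same height
library = NormalDist(20.0, 4.0).pdf(25.0)
print(round(peak, 4), round(left, 4), round(right, 4), round(library, 4))
```

Note that these are densities (curve heights), not probabilities; a probability is still obtained as an area under the curve.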
By varying the parameters μ and σ, we
obtain different normal distributions

[Figure: three normal curves labelled A, B and C]

A and B have the same mean but different standard deviations.


B and C have different means and different standard deviations.
The Normal Distribution Shape

Changing μ shifts the distribution left or right.
Changing σ increases or decreases the spread.

[Figure: normal density curve f(X) centred at μ with spread σ]
The Standardized Normal

• Any normal distribution (with any mean and standard deviation
combination) can be transformed into the standardized normal
distribution (Z)
• To compute normal probabilities, we need to transform X units into Z units
• The standardized normal distribution (Z) has a mean of 0 and a
standard deviation of 1
Translation to the Standardized Normal
Distribution

• Translate from X to the standardized normal (the “Z” distribution) by
subtracting the mean of X and dividing by its standard deviation:

\[ Z = \frac{X - \mu}{\sigma} \]

The Z distribution always has mean = 0 and
standard deviation = 1
The Standardized Normal Probability
Density Function

• The formula for the standardized normal probability density function is

\[ f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-(1/2)Z^{2}} \]

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
Z = any value of the standardized normal distribution
The Standardized
Normal Distribution
• Also known as the “Z” distribution
• Mean is 0
• Standard Deviation is 1

[Figure: standardized normal density f(Z) centred at 0 with standard deviation 1]

Values above the mean have positive Z-values.
Values below the mean have negative Z-values.
Example

• If X is distributed normally with mean of $100 and standard deviation of
$50, the Z value for X = $200 is

\[ Z = \frac{X - \mu}{\sigma} = \frac{\$200 - \$100}{\$50} = 2.0 \]

• This says that X = $200 is two standard deviations (2 increments of $50
units) above the mean of $100.
Comparing X and Z units

[Figure: one normal curve with two scales. X scale (μ = $100, σ = $50): $100 and $200. Z scale (μ = 0, σ = 1): 0 and 2.0]

Note that the shape of the distribution is the same,
only the scale has changed. We can express the
problem in the original units (X in dollars) or in
standardized units (Z).
Finding Normal Probabilities

Probability is measured by the area under the curve.

P(a ≤ X ≤ b) = P(a < X < b)
(Note that the probability of any individual value is zero)

[Figure: normal density f(X) with the area between a and b shaded]
Probability as
Area Under the Curve
The total area under the curve is 1.0, and the curve is
symmetric, so half is above the mean, half is below

P(−∞ < X < μ) = 0.5
P(μ < X < ∞) = 0.5
P(−∞ < X < ∞) = 1.0

[Figure: normal density f(X) centred at μ, with area 0.5 on each side of the mean]
The Standardized Normal Table

• The Cumulative Standardized Normal table in the textbook (Appendix table
E.2) gives the probability less than a desired value of Z (i.e., from negative
infinity to Z)

Example:
P(Z < 2.00) = 0.9772

[Figure: standardized normal curve with the area 0.9772 below Z = 2.00 shaded]
The Standardized Normal Table (continued)

The column gives the value of Z to the second decimal point.
The row shows the value of Z to the first decimal point.
The value within the table gives the probability from Z = −∞ up to the
desired Z value.

Z     0.00    0.01    0.02 …
0.0
0.1
.
.
.
2.0   .9772

P(Z < 2.00) = 0.9772
General Procedure for Finding Normal
Probabilities
To find P(a < X < b) when X is
distributed normally:
• Draw the normal curve for the problem in terms of X
• Translate X-values to Z-values
• Use the Standardized Normal Table


Finding Normal Probabilities
• Let X represent the time it takes (in seconds) to download an image
file from the internet.
• Suppose X is normal with a mean of 18.0 seconds and a standard
deviation of 5.0 seconds. Find P(X < 18.6)

[Figure: normal curve centred at X = 18.0 with the area below 18.6 shaded]
Finding Normal Probabilities
(continued)

\[ Z = \frac{X - \mu}{\sigma} = \frac{18.6 - 18.0}{5.0} = 0.12 \]

[Figure: the X curve (μ = 18, σ = 5) with the area below 18.6 shaded, beside the Z curve (μ = 0, σ = 1) with the area below 0.12 shaded]

P(X < 18.6) = P(Z < 0.12)

Solution: Finding P(Z < 0.12)

Standardized Normal Probability Table (Portion)

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

P(X < 18.6) = P(Z < 0.12) = 0.5478

[Figure: standardized normal curve with the area 0.5478 below Z = 0.12 shaded]
Finding Normal
Upper Tail Probabilities
Suppose X is normal with mean 18.0 and
standard deviation 5.0.
Now find P(X > 18.6).

P(X > 18.6) = P(Z > 0.12) = 1.0 - P(Z ≤ 0.12)
= 1.0 - 0.5478 = 0.4522

[Figure: standardized normal curve; the area below Z = 0.12 is 0.5478, so the upper tail is 1.0 - 0.5478 = 0.4522]
Finding a Normal Probability Between
Two Values

• Suppose X is normal with mean 18.0 and standard deviation 5.0. Find
P(18 < X < 18.6)

Calculate Z-values:

\[ Z = \frac{X - \mu}{\sigma} = \frac{18 - 18}{5} = 0 \]

\[ Z = \frac{X - \mu}{\sigma} = \frac{18.6 - 18}{5} = 0.12 \]

P(18 < X < 18.6)
= P(0 < Z < 0.12)
Solution: Finding P(0 < Z < 0.12)

Standardized Normal Probability Table (Portion)

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

P(18 < X < 18.6)
= P(0 < Z < 0.12)
= P(Z < 0.12) – P(Z ≤ 0)
= 0.5478 - 0.5000 = 0.0478

[Figure: standardized normal curve with the area 0.0478 between Z = 0.00 and Z = 0.12 shaded]
Probabilities in the Lower Tail

Suppose X is normal with mean 18.0 and
standard deviation 5.0.
Now find P(17.4 < X < 18).

P(17.4 < X < 18)
= P(-0.12 < Z < 0)
= P(Z < 0) – P(Z ≤ -0.12)
= 0.5000 - 0.4522 = 0.0478

The Normal distribution is symmetric, so this probability
is the same as P(0 < Z < 0.12).

[Figure: normal curve with the area 0.0478 between X = 17.4 and X = 18.0 (Z = -0.12 and Z = 0) shaded]
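
The download-time probabilities worked through above can be collected in one place. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are hypothetical, and the computed values may differ from the table answers in the fourth decimal place because the slides round Z to 0.12 and read rounded table entries.

```python
from statistics import NormalDist

X = NormalDist(mu=18.0, sigma=5.0)  # download time in seconds

p_below = X.cdf(18.6)                   # P(X < 18.6)
p_above = 1 - X.cdf(18.6)               # P(X > 18.6), upper tail
p_between = X.cdf(18.6) - X.cdf(18.0)   # P(18 < X < 18.6)
p_lower = X.cdf(18.0) - X.cdf(17.4)     # P(17.4 < X < 18), same by symmetry

print(f"{p_below:.4f} {p_above:.4f} {p_between:.4f} {p_lower:.4f}")
```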
Determining Normal Probabilities
When values do not fall directly on σ landmarks:

1. State the problem
2. Standardize the value(s) (z score)
3. Sketch, label, and shade the curve
4. Use Table B
Step 1: State the Problem
What percentage of gestations are
less than 40 weeks?
Let X ≡ gestational length
We know from prior research:
X ~ N(39, 2) weeks (here the second parameter is the standard deviation, σ = 2)
Pr(X ≤ 40) = ?
Step 2: Standardize
Standard Normal variable ≡ “Z” ≡ a Normal random variable with μ = 0
and σ = 1,
Z ~ N(0,1)
Use Table B to look up cumulative probabilities for Z

Example: A Z variable of 1.96 has cumulative probability 0.9750.

7: Normal Probability Distributions
Step 2 (cont.)
Turn value into z score:

\[ z = \frac{x - \mu}{\sigma} \]

z-score = no. of σ-units above (positive z) or below
(negative z) distribution mean μ

For example, the value 40 from X ~ N(39, 2) has

\[ z = \frac{40 - 39}{2} = 0.5 \]
Steps 3 & 4: Sketch & Table B
3. Sketch
4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915
Example
The life of light bulbs has a normal distribution with
mean µ = 2250 hours and standard deviation σ of
220 hours. What is the probability that a bulb will
last:
i) between 1900 & 2300 hours?
ii) less than 1500 hours?
Example

To calculate the probabilities for the lifespan of light bulbs, we can
use the properties of the normal distribution.
i) Between 1900 and 2300 hours: First, we need to standardize the
values using the z-score formula:

\[ z = \frac{x - \mu}{\sigma} \]
Example

For the lower limit (1900 hours):

\[ z_1 = \frac{1900 - 2250}{220} = \frac{-350}{220} = -1.59 \]

For the upper limit (2300 hours):

\[ z_2 = \frac{2300 - 2250}{220} = \frac{50}{220} = 0.23 \]

Using a standard normal distribution table or a calculator, we
can find the probabilities associated with these z-scores.
P(1900 < x < 2300) = P(−1.59 < z < 0.23)
Example
Looking up the values in a standard normal distribution
table, we find that P(z < −1.59) ≈ 0.0559 and
P(z < 0.23) ≈ 0.5910.
Therefore,

P(−1.59 < z < 0.23) ≈ P(z < 0.23) − P(z < −1.59)
≈ 0.5910 − 0.0559 ≈ 0.5351

So, the probability that a bulb will last between 1900
and 2300 hours is approximately 0.5351.
Example

ii) Less than 1500 hours: Similarly, we need to standardize the value:

\[ z = \frac{x - \mu}{\sigma} = \frac{1500 - 2250}{220} = \frac{-750}{220} = -3.41 \]

Using a standard normal distribution table or a
calculator, we can find
P(z < −3.41) ≈ 0.0003.
Therefore, the probability that a bulb will last less than
1500 hours is approximately 0.0003.
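
The light-bulb calculation can be retraced in code as a check on the table look-ups. The sketch below is illustrative: it rounds each z-score to two decimal places, as a table look-up does, and then evaluates the standard normal cumulative probability with `NormalDist` from Python's standard `statistics` module. The helper `z_score` and the constant names are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 2250, 220  # mean and standard deviation of bulb life, in hours
Z = NormalDist()       # standard normal


def z_score(x, mu=MU, sigma=SIGMA):
    """Standardise x and round to two decimals, as when using a table."""
    return round((x - mu) / sigma, 2)


z1, z2 = z_score(1900), z_score(2300)
p_between = Z.cdf(z2) - Z.cdf(z1)  # i) between 1900 and 2300 hours

z3 = z_score(1500)
p_below = Z.cdf(z3)                # ii) less than 1500 hours

print(z1, z2, round(p_between, 4))
print(z3, round(p_below, 4))
```

Skipping the rounding of z (that is, using the exact ratio) shifts the first answer slightly, which is one reason table-based and software-based results can differ in the third decimal place.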
Example
The time that it takes a driver to react to the brake lights on a
decelerating vehicle is critical in helping to avoid rear-end
collisions. The experimental data suggests that reaction time
for an in-traffic response to a brake signal from standard
brake lights can be modeled with a normal distribution having
mean value 1.25 sec and standard deviation of 0.46 sec.
Find the probability that the in-traffic reaction time is between
1.00 sec and 1.75 sec.
Example
An important quality characteristic for soft-drink
bottlers is the amount of soft drink dispensed into
each bottle. In a filling process, the amount in
millilitre dispensed into 500 ml bottles is
approximately normally distributed with mean = 500
ml and standard deviation = 5 ml. Bottles that
contain less than 490 ml do not meet the bottlers
quality standard and are sold at a substantial
discount.
i) If 25,000 bottles are filled, approximately how
many will fail to meet the quality standard?
ii) Suppose that, due to the failure of one of the filling
system's components, the mean of the filling process
shifts to 495 ml (assume that the standard deviation
remains 5 ml). If 25,000 bottles are filled, approximately
how many will fail to meet the quality standard?
iii) Suppose that a different component fails and,
although the mean of the filling process remains 500
ml, the standard deviation increases to 10 ml. If
25,000 bottles are filled, approximately how many
will fail to meet the quality standard?
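No solution is given for the three parts, so the sketch below shows one way they might be evaluated: compute P(X < 490) under each scenario and multiply by the 25,000 bottles filled. The helper `normal_cdf` and the scenario labels are names introduced here for illustration only.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

N_BOTTLES = 25_000
LIMIT_ML = 490.0  # bottles below this volume fail the quality standard

# (mean, standard deviation) in ml for parts i), ii) and iii)
scenarios = {
    "i) mean 500, sd 5": (500.0, 5.0),
    "ii) mean 495, sd 5": (495.0, 5.0),
    "iii) mean 500, sd 10": (500.0, 10.0),
}

expected_fails = {}
for label, (mu, sigma) in scenarios.items():
    p_fail = normal_cdf(LIMIT_ML, mu, sigma)
    expected_fails[label] = N_BOTTLES * p_fail
    print(f"{label}: P(X < 490) = {p_fail:.4f}, about {expected_fails[label]:.0f} bottles")
```

Parts ii) and iii) give the same count because in both cases the 490 ml limit sits exactly one standard deviation below the mean.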
Exponential Distribution
Summary From Last Time
Discrete Random Variables

Binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
Probability of $k$ successes in $n$ Bernoulli trials
Mean and variance: $\mu = np$, $\sigma^2 = np(1 - p)$

Poisson distribution: $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$
Probability of $k$ randomly occurring events, given that the average number is $\lambda$
Mean and variance: $\mu = \lambda$, $\sigma^2 = \mathrm{var}(X) = \lambda$
Is an approximation to the Binomial when $n$ is large and $p$ is small

Continuous Random Variables

Probability Density Function (PDF) $f(x)$: $P(a \le X \le b) = \int_a^b f(x)\,dx$
Uniform distribution: $f(x) = \frac{1}{\theta_2 - \theta_1}$ for $\theta_1 \le x \le \theta_2$, and $f(x) = 0$ otherwise
Poisson or not?

Which of the following is most likely to be well
modelled by a Poisson distribution?

1. Number of trains arriving at Falmer every hour
2. Number of lottery winners each year that live in Brighton
3. Number of days between solar eclipses
4. Number of days until a component fails
Are they Poisson? Answers:

1. Number of trains arriving at Falmer every hour
NO: trains (are supposed to) arrive regularly on a timetable, not at random

2. Number of lottery winners each year that live in Brighton
YES: this is the number of random events in a fixed interval

3. Number of days between solar eclipses
NO: solar eclipses are not random events, and this is a time
between events, not the number in some fixed interval

4. Number of days until a component fails
NO: these are random events, but this is the time until a random event, not the
number of random events
Time between random events / time till first random event ?

If a Poisson process has constant average rate 𝜈, the mean number of events in a time
𝑡 is 𝜆 = 𝜈𝑡.

What is the probability distribution for the time to the first event?

⇒ Exponential distribution

Poisson - Discrete distribution: P(number of events)

Exponential - Continuous distribution: P(time till first event)


What is Exponential Distribution?
In probability theory and statistics, the
exponential distribution is a continuous
probability distribution that often describes
the amount of time until some specific event
happens. It arises from a process in which events
happen continuously and independently at a
constant average rate.
• Definition: The exponential distribution
models the time between events in a
Poisson process.
• Key Characteristics:
• Continuous and non-negative outcomes
• Memoryless property
• Exponential decay
The continuous random variable X has an exponential distribution,
with parameter λ and denoted by exp(λ), if its density function is
given by:
$f(x) = \begin{cases} 0, & x < 0 \\ \lambda e^{-\lambda x}, & x \ge 0 \end{cases}$

Cumulative Distribution Function (CDF):
The CDF of the exponential distribution is given by:
$F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$
The mean and variance of the exponential distribution
are $\mu = \frac{1}{\lambda}$ and $\sigma^2 = \frac{1}{\lambda^2}$.
NOTE
If X is the time of arrival of the first customer and
the average time is 30 minutes, then λ = 1/30 per minute.
This distribution is commonly used to model waiting
times between occurrences of rare events and lifetimes of
electrical or mechanical devices.
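To make the definitions concrete, here is a small sketch of the density and CDF as plain functions, applied to the customer-arrival note with an average of 30 minutes (so a rate of 1/30 per minute). The function names `exp_pdf` and `exp_cdf` and the 10-minute query are illustrative choices made here, not part of the slides.

```python
import math

def exp_pdf(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """P(X <= x) for the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 1.0 / 30.0          # rate per minute when the average wait is 30 minutes
mean = 1.0 / lam          # mu = 1 / lambda
variance = 1.0 / lam**2   # sigma^2 = 1 / lambda^2

# Illustrative query: chance the first customer arrives within 10 minutes
p_within_10 = exp_cdf(10.0, lam)

print(f"mean = {mean:.1f} min, variance = {variance:.1f} min^2")
print(f"P(X <= 10) = {p_within_10:.4f}")
```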
Exponential distribution
The continuous random variable $Y$ has the Exponential distribution, with constant
rate parameter $\nu$, if:
$f(y) = \begin{cases} \nu e^{-\nu y}, & y > 0 \\ 0, & y < 0 \end{cases}$
[Plot: $f(y)$ against $y$ for $\nu = 1$]
Occurrence

1) Time until the failure of a part.

2) Separation between randomly happening events

- Assuming the probability of the events is constant in time: 𝜈 = const


Relation to Poisson distribution
If a Poisson process has constant average rate $\nu$, the mean number of events in a time
$t$ is $\lambda = \nu t$.
The probability of no occurrences in time $t$ is
$P(k = 0) = \frac{e^{-\lambda} \lambda^0}{0!} = e^{-\lambda} = e^{-\nu t}$.
If $f(t)$ is the pdf for the time of the first occurrence, then the probability of no
occurrences is
$P(\text{no occurrence by } t) = 1 - P(\text{first occurrence has happened by } t) = 1 - \int_0^t f(t')\,dt'$
$\Rightarrow 1 - \int_0^t f(t')\,dt' = e^{-\nu t} \Rightarrow \int_0^t f(t')\,dt' = 1 - e^{-\nu t}$
Solve by differentiating both sides with respect to $t$, assuming constant $\nu$:
$\frac{d}{dt} \int_0^t f(t')\,dt' = \frac{d}{dt}\left(1 - e^{-\nu t}\right) \Rightarrow f(t) = \nu e^{-\nu t}$
The time until the first occurrence (and between subsequent
occurrences) has the Exponential distribution, parameter $\nu$.
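The link between the two distributions can also be checked by simulation. The sketch below, with an assumed rate and window length chosen purely for illustration, generates events whose gaps are exponential and then counts how many fall in a window of length t. If the derivation holds, the counts should average about νt and the fraction of windows with no events should be close to e^(−νt).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

NU = 2.0         # assumed event rate per unit time (illustrative)
T = 1.5          # assumed length of the observation window (illustrative)
TRIALS = 50_000

def count_events(nu, t):
    """Count events in [0, t] when gaps between events are Exponential(nu)."""
    elapsed = random.expovariate(nu)
    count = 0
    while elapsed <= t:
        count += 1
        elapsed += random.expovariate(nu)
    return count

counts = [count_events(NU, T) for _ in range(TRIALS)]
mean_count = sum(counts) / TRIALS
frac_zero = counts.count(0) / TRIALS

print(f"mean count = {mean_count:.3f} (Poisson mean nu*t = {NU * T:.3f})")
print(f"P(no events) = {frac_zero:.4f} (e^(-nu*t) = {math.exp(-NU * T):.4f})")
```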
Example

On average lightning kills three people each
year in the UK, $\lambda = 3$. So the rate is $\nu = 3$/year.

Assuming strikes occur randomly at any time during
the year so $\nu$ is constant, the time from today until the
next fatality has pdf (using $t$ in years)
$f(t) = \nu e^{-\nu t} = 3e^{-3t}$

E.g. Probability the time till the next death is less than one year?
$\int_0^1 f(t)\,dt = \int_0^1 3e^{-3t}\,dt = \left[\frac{3e^{-3t}}{-3}\right]_0^1 = -e^{-3} + 1 \approx 0.95$
[Plot: $f(t)$ against $t$]
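The integral in this example can be confirmed in two ways: from the closed-form CDF, and by integrating the pdf numerically. This is a rough sketch; the midpoint rule and the step count are arbitrary choices made here for illustration.

```python
import math

NU = 3.0  # fatalities per year, from the example

def pdf(t):
    """Exponential pdf for the time (in years) until the next fatality."""
    return NU * math.exp(-NU * t)

# Closed form: P(T < 1) = 1 - e^(-nu * 1)
closed_form = 1.0 - math.exp(-NU * 1.0)

# Midpoint-rule approximation of the integral of the pdf from 0 to 1
STEPS = 1000
h = 1.0 / STEPS
numeric = sum(pdf((i + 0.5) * h) for i in range(STEPS)) * h

print(f"closed form: {closed_form:.4f}")
print(f"numerical:   {numeric:.4f}")
```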
Exponential distribution
A certain type of component can be
purchased new or used. 50% of all new
components last more than five years,
but only 30% of used components last
more than five years. Is it possible that the
lifetimes of new components are
exponentially distributed?
1. YES
2. NO
The exponential distribution models the time between independent
randomly occurring events, where the frequency of events is
independent of time,
i.e. for a component that is still working, the probability of failing in
the next 5 years has to be the same whatever its age (memoryless
property). Used components, which have already survived some time, would
then be as likely as new ones to last a further five years, but only 30% do.

The observed lifetimes imply instead that the failure rate must increase with time.

NOT exponential
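The memoryless argument can be made explicit with a short calculation. Suppose, hypothetically, that new-component lifetimes were exponential with P(T > 5) = 0.5. The sketch below computes the chance of lasting five more years given survival to several illustrative ages (the ages are arbitrary values chosen here). Every result is 0.5, which cannot be reconciled with the 30% observed for used components.

```python
import math

# If lifetimes were Exponential(nu) with P(T > 5) = e^(-5 nu) = 0.5:
nu = math.log(2.0) / 5.0

def survival(t, rate):
    """P(T > t) for an exponential lifetime with the given rate."""
    return math.exp(-rate * t)

def prob_five_more_years(age, rate):
    """P(T > age + 5 | T > age) for an exponential lifetime."""
    return survival(age + 5.0, rate) / survival(age, rate)

# Ages (in years) at which a used component might be bought; illustrative only
conditional = {age: prob_five_more_years(age, nu) for age in (0.0, 1.0, 3.0, 10.0)}

for age, p in conditional.items():
    print(f"age {age:4.1f} years: P(lasts 5 more years) = {p:.3f}")
```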
Mean and variance of exponential distribution

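$\mu = \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^{\infty} y \nu e^{-\nu y}\,dy = \left[-y e^{-\nu y}\right]_0^{\infty} + \int_0^{\infty} e^{-\nu y}\,dy = \left[-\frac{e^{-\nu y}}{\nu}\right]_0^{\infty} = \frac{1}{\nu}$

$\sigma^2 = \int_{-\infty}^{\infty} y^2 f(y)\,dy - \mu^2 = \int_0^{\infty} y^2 \nu e^{-\nu y}\,dy - \frac{1}{\nu^2}$
$= \left[-y^2 e^{-\nu y}\right]_0^{\infty} + 2\int_0^{\infty} y e^{-\nu y}\,dy - \frac{1}{\nu^2} = 0 + \frac{2\mu}{\nu} - \frac{1}{\nu^2} = \frac{1}{\nu^2}$

[Plot: exponential pdf for $\nu = 3$, with $\mu = 1/3$ and $\sigma$ marked]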
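As a sanity check on the integration by parts, the two moments can be approximated numerically for the ν = 3 case shown in the plot. This is only a sketch: the midpoint rule, the truncation point and the step count are choices made here, not part of the derivation.

```python
import math

NU = 3.0              # rate used in the slide's plot
UPPER = 20.0 / NU     # truncation point; the tail beyond it is negligible
STEPS = 100_000
h = UPPER / STEPS

def pdf(y):
    """Exponential pdf with rate NU."""
    return NU * math.exp(-NU * y)

first_moment = 0.0
second_moment = 0.0
for i in range(STEPS):
    y = (i + 0.5) * h          # midpoint of the i-th strip
    weight = pdf(y) * h        # probability mass of the strip
    first_moment += y * weight
    second_moment += y * y * weight

mean = first_moment
variance = second_moment - mean ** 2

print(f"mean     = {mean:.5f} (1/nu   = {1 / NU:.5f})")
print(f"variance = {variance:.5f} (1/nu^2 = {1 / NU**2:.5f})")
```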
Example: Reliability

The time till failure of an electronic component has an Exponential


distribution and it is known that 10% of components have failed by 1000
hours.

(a) What is the probability that a component is still working after 5000
hours?

(b) Find the mean and standard deviation of the time till failure.
Answer

Let $Y$ = time till failure in hours; $f(y) = \nu e^{-\nu y}$.
(a) First we need to find $\nu$:
$P(Y \le 1000) = \int_0^{1000} \nu e^{-\nu y}\,dy = \left[-e^{-\nu y}\right]_0^{1000} = 1 - e^{-1000\nu}$
$P(Y \le 1000) = 0.1 \Rightarrow 1 - e^{-1000\nu} = 0.1 \Rightarrow e^{-1000\nu} = 0.9$
$\Rightarrow -1000\nu = \ln 0.9 = -0.10536 \Rightarrow \nu \approx 1.05 \times 10^{-4}$
If $Y$ is the time till failure, the question asks for $P(Y > 5000)$:
$P(Y > 5000) = \int_{5000}^{\infty} \nu e^{-\nu y}\,dy = \left[-e^{-\nu y}\right]_{5000}^{\infty} = e^{-5000\nu} \approx 0.59$

(b) Find the mean and standard deviation of the
time till failure.

Answer:
Mean $= 1/\nu = 9491$ hours.
Standard deviation $= \sqrt{\text{Variance}} = \sqrt{\frac{1}{\nu^2}} = 1/\nu = 9491$ hours
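The same steps can be reproduced in a few lines, which is a convenient way to check the arithmetic. A minimal sketch, using variable names introduced here for illustration:

```python
import math

p_failed_by_1000 = 0.10  # given: 10% of components have failed by 1000 hours

# Solve 1 - e^(-1000 nu) = 0.1 for the rate nu (per hour)
nu = -math.log(1.0 - p_failed_by_1000) / 1000.0

# (a) P(Y > 5000) = e^(-5000 nu)
p_working_after_5000 = math.exp(-nu * 5000.0)

# (b) mean and standard deviation are both 1 / nu
mean_hours = 1.0 / nu
std_hours = math.sqrt(1.0 / nu**2)

print(f"nu = {nu:.4e} per hour")
print(f"P(Y > 5000) = {p_working_after_5000:.4f}")
print(f"mean = {mean_hours:.0f} hours, standard deviation = {std_hours:.0f} hours")
```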
Is it exponential?

Which of the following random
variables is best modelled by an
exponential distribution?

Question adapted from Derek Bruff

1. The distance between defects in an optical fibre
2. The number of days between someone winning the National Lottery
3. The number of fuses that blow in the UK today
4. The hours of sunshine in Brighton this week assuming an average of 7.2hrs/day
Is it exponential? Answers:

1. The distance between defects in an optical fibre
- YES: continuous distribution that is the separation between
independent random events (the location of the defects)

2. The number of days between someone winning the National Lottery
- NO: continuous (if you allow fractional days), but draws happen
regularly on a schedule

3. The number of fuses that blow in the UK today
- NO: this is a discrete distribution; the number of events is a Poisson
distribution (exponential is the distribution of times between events)

4. The hours of sunshine in Brighton this week assuming an average of
7.2hrs/day
- NO: This is a continuous variable, but not the time between
Example
Suppose the time between consecutive phone calls at a customer
service center follows an exponential distribution with a rate parameter
of λ = 0.5 calls per minute. Find the probability that a call arrives within
the first 2 minutes.
Solution:
We want to find P(X ≤ 2), where X is the time between calls.
Using the CDF, we can calculate: F(2) = 1 - e^(-0.5*2) = 1 - e^(-1) ≈ 0.6321
Interpretation:
The probability that a call arrives within the first 2 minutes is
approximately 0.6321 or 63.21%.
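The CDF evaluation for this example is easy to verify directly. A minimal sketch, with `exp_cdf` as a helper name introduced here:

```python
import math

def exp_cdf(x, lam):
    """P(X <= x) for the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 0.5        # calls per minute, from the example
p_within_2 = exp_cdf(2.0, lam)   # exponent is -lam * x = -1

print(f"P(X <= 2) = {p_within_2:.4f}")
```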
Example
Suppose the life length of a refrigerator follows the exponential
distribution, and let X represent the life length of a refrigerator.
Suppose the average life length for this type of refrigerator is 15
years. Answer the following:
What is the probability that a refrigerator can be used for less than
6 years?
What is the probability that this refrigerator can be used for more
than 18 years?
What are the variance and the standard deviation of this random
variable?
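The slide poses these questions without a solution. One way they might be approached, sketched here under the assumption that an average life of 15 years means a rate of λ = 1/15 per year, is to use the CDF for the first part, the survival probability for the second, and the formulas μ = 1/λ and σ² = 1/λ² for the third.

```python
import math

lam = 1.0 / 15.0   # assumed rate per year, from the 15-year average life

# P(X < 6) = 1 - e^(-6 lam)
p_less_than_6 = 1.0 - math.exp(-lam * 6.0)

# P(X > 18) = e^(-18 lam)
p_more_than_18 = math.exp(-lam * 18.0)

# Variance 1 / lam^2 and standard deviation 1 / lam
variance = 1.0 / lam**2
std_dev = math.sqrt(variance)

print(f"P(X < 6)  = {p_less_than_6:.4f}")
print(f"P(X > 18) = {p_more_than_18:.4f}")
print(f"variance = {variance:.1f} years^2, standard deviation = {std_dev:.1f} years")
```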
