Continuous Random Variables
Table of Contents
PMF
CDF
PDF
Properties of PDFs
Properties of CDFs
A continuous random variable is one whose set of possible values is infinite. For example, if we were to select a random point on a line between 0 and 1, it could take infinitely many possible values; similarly, the exact moment at which a person arrives can be stated with arbitrary precision.
Notice that unlike with discrete random variables, we are not giving a set of specific values, since that is simply not possible. Essentially, we can be infinitely more specific.
Probability Models for Continuous Random Variables
PMF
For discrete random variables, we used PMF and CDF as probability models. However,
PMF is not applicable for continuous random variables. This is because P[X = x] = 0 for every value x.
We can prove this. Consider, for simplicity, that all values of a continuous random variable are equally likely. We know that P[X = x] = 1/n in this case. Since there are infinitely many possible values, n = ∞, meaning P[X = x] is always 0.
More formally, PMF is nothing more than distributing 1 unit of mass on a number line.
If we have an infinite number of points on the number line, then the allocation per
point is 0.
Consider a situation where we have a line AB of length 1. We will pick a random point C on that line. Here, X = length of AC, so X is a continuous random variable.
Now consider the same situation, except that we have discretised the line, dividing it into n equal segments, and let Y = ⌈nX⌉, the index of the segment containing C. For example, if n = 10 and X = 0.6 recurring, Y = 7. The question is, how well does Y approximate X? That depends on how large the value of n is. The larger the value of n, the more accurately X can be approximated by Y.
We can see that whenever the event {X = x} occurs, the event {Y = ⌈nx⌉} must also occur. Thus, {X = x} ⊂ {Y = ⌈nx⌉}. As such,
P[X = x] ≤ P[Y = ⌈nx⌉] = 1/n

Since this holds for every n,

P[X = x] ≤ lim_{n→∞} P[Y = ⌈nx⌉] = lim_{n→∞} 1/n = 0
The very definition of the PMF probability model states that P[X = x] ≥ 0. Thus, the only takeaway we have from this is that for continuous random variables, P[X = x] = 0.
This was the mathematical proof of the same thing we discovered one whole page ago. However, this result is a bit of a problem for us. We stated that one of the goals of using random variables was to facilitate further processing, and PMFs have been helping us a lot with that. Now that they are gone, we need to find something to replace them, which we do in the following sections.
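The discretisation argument above can be sketched in a few lines of Python (our illustration language throughout; the helper name `point_mass_bound` is made up for this sketch):

```python
# Discretisation argument: approximate a uniform pick X on (0, 1) by
# Y = ceil(n * X), which takes one of n equally likely values. Then
# P[X = x] <= P[Y = ceil(n * x)] = 1/n, which shrinks to 0 as n grows.
def point_mass_bound(n: int) -> float:
    """Upper bound on P[X = x] when the line is cut into n segments."""
    return 1.0 / n

for n in (10, 1_000, 1_000_000):
    print(n, point_mass_bound(n))  # the bound heads towards 0
```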
CDF
Say we pick the point 0.5 on the line of 1 metre. Here, P[0 < X ≤ 0.5] = 1/2. This is because, since the probability of picking any of the points is the same, we will be picking one of the points to the left of 0.5 in half of the cases and one of the points to the right in half of the cases. Similarly, P[0 < X ≤ 0.25] = 1/4 and P[−∞ < X ≤ 0.5] = 1/2.
Even if we cannot find the probability for specific values of X, we can find the probability over ranges of values. This is exactly what the CDF gives us. For our line of length 1,

F_X(x) = 0 for x < 0, x for 0 ≤ x < 1, and 1 for x ≥ 1.
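As a quick sketch (the function name `cdf_uniform01` is our own), this CDF can be coded directly, and the interval probabilities quoted above fall out as differences of CDF values:

```python
def cdf_uniform01(x: float) -> float:
    """CDF of a point picked uniformly on a line of length 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return x
    return 1.0

# P[0 < X <= 0.5] = F(0.5) - F(0) = 1/2, P[0 < X <= 0.25] = 1/4
p_half = cdf_uniform01(0.5) - cdf_uniform01(0.0)
p_quarter = cdf_uniform01(0.25) - cdf_uniform01(0.0)
```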
A few points to notice: F_X(−∞) = 0, F_X(+∞) = 1, and P[X ≤ b] = P[X < b]. This last point is correct because P[X ≤ b] = P[X < b] + P[X = b] and we know that P[X = b] = 0. Following the same logic, P[a ≤ X ≤ b] = P[a < X ≤ b] = P[a < X < b].
PDF
Density is mass per unit volume, but it is also possible to define a density per unit area, for example the density of people in a field. We can even define a density per unit length.
The probability density function (PDF) defines the average probability per unit length.
When we first started discussing random variables, we stated that one of their major purposes was to facilitate further processing of outcomes. One of the key ways we achieved this with discrete random variables was by using PMFs. However, as we have seen, PMFs cannot be used with continuous random variables. As such, we need a probability model that can work with continuous random variables in the way that PMFs worked with discrete random variables.
Recall the CDF for the line of length 1, where F_X(x) = x for 0 ≤ x < 1. Now consider a line of length 4; in this case,

F_X(x) = 0 for x < 0, x/4 for 0 ≤ x < 4, and 1 for x ≥ 4.

If we draw the CDF graphs for both these cases, we will find that they both increase linearly, but the graph for the first case has a much steeper slope.
The slope indicates how densely the probability is distributed. Thus, in the first case, where x has values between 0 and 1, a total probability of 1 is distributed over a smaller range, giving a steeper slope. In the second case, the same total probability of 1 is distributed over a larger range, giving a gentler slope.
Now compare the probabilities for two intervals, from x₁ to x₁ + ∆ and from x₂ to x₂ + ∆. Notice that we are taking intervals of the same length, just starting at different points, and that ∆ is a small value.
Even though we are taking intervals of the same length, the probability values we find for those intervals can be greatly different. This is because the density of probability differs between the two regions.
Another indication of the density is the steepness of the CDF curve: a flat region indicates a lower density, while a steep region indicates a higher density.
The slope of the CDF curve is of course given by its first derivative, the rate of change of the CDF. Using it, the probability in an interval of length ∆ starting at x can be written as

P[x < X ≤ x + ∆] = F_X(x + ∆) − F_X(x) = [(F_X(x + ∆) − F_X(x)) / ∆] · ∆
CDF values simply indicate a probability. Unlike PMFs, they indicate the probability over a range, but the value is still a probability. Thus, the first term in the equation above can be thought of as the probability in an interval divided by the length of the interval: the average probability density. From this, we can get back to the probability of the interval by multiplying by the interval's length.
If we use very small values of ∆, we could even go as far as to say the probability density at the single point x is approximately equal to the average probability density of the interval starting at x. Taking the limit,

f_X(x) = lim_{∆→0} (F_X(x + ∆) − F_X(x)) / ∆ = d/dx F_X(x)
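A minimal numeric sketch of this limit, using the length-1 line again (the names `cdf` and `pdf_estimate` are our own):

```python
def cdf(x: float) -> float:
    """CDF of a uniform pick on (0, 1): 0 below, x inside, 1 above."""
    return min(max(x, 0.0), 1.0)

def pdf_estimate(x: float, delta: float = 1e-6) -> float:
    """Difference quotient (F(x + delta) - F(x)) / delta."""
    return (cdf(x + delta) - cdf(x)) / delta

# Inside (0, 1) the estimate approaches the true density f_X(x) = 1
print(pdf_estimate(0.3))
```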
This is the PDF of X: the probability density per unit length for a random variable. Always remember that the PDF is not a probability, but rather a probability density. By itself a PDF value is not a probability; we must integrate it over an interval to obtain one.
PDF Curves
The PDF curve understandably depends on the CDF values. For the case where
probabilities are uniformly distributed over an interval, the PDF curve will be flat.
Of course, the area under the PDF curve over an interval represents the probability of that interval. As such, the total area will be 1. For a flat PDF over an interval (a, b], this means the height of the line is 1/(b − a).
Thus, P[a < X ≤ b] is given by the area under the curve, which, in this case, can be calculated as ∫_a^b f_X(x) dx.
We previously saw an example where F_X(x) = x/4 for 0 ≤ x < 4. Here,

f_X(x) = d/dx (x/4) = 1/4 for 0 < x < 4, and 0 otherwise.

Inversely, ∫_{−∞}^{x} (1/4) dt = x/4 for 0 ≤ x < 4, recovering the CDF.
Of course, the PDF formulas are not always simple, and since integration and
differentiation have become involved, we will soon start facing problems like having
to integrate by parts.
Properties of PDFs
∫_{−∞}^{+∞} f_X(x) dx = 1
∫_{−∞}^{x} f_X(t) dt = P[X ≤ x] = F_X(x)
P[a < X ≤ b] = ∫_a^b f_X(x) dx
f_X(x) ≥ 0
Properties of CDFs
F_X(x) = P[X ≤ x] = ∫_{−∞}^{x} f_X(t) dt
F_X(−∞) = 0
F_X(+∞) = 1
P[a < X ≤ b] = F_X(b) − F_X(a)
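These properties can be sanity-checked numerically. Here is a sketch for the earlier flat PDF f_X(x) = 1/4 on (0, 4), using a midpoint Riemann sum as a stand-in for the integrals (`integrate` is our own helper, not a library function):

```python
def f(x: float) -> float:
    """The flat PDF from the earlier example: 1/4 on (0, 4)."""
    return 0.25 if 0 < x < 4 else 0.0

def integrate(fn, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint Riemann sum of fn over [a, b]."""
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, -1, 5)    # normalisation: should be ~1
F_at_2 = integrate(f, -1, 2)   # F_X(2) = 2/4 = 0.5
p_1_to_3 = integrate(f, 1, 3)  # P[1 < X <= 3] = 0.5
```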
Example
Say X has the PDF

f_X(x) = c x e^{−x/2} for x ≥ 0, and 0 otherwise.

We need to find:
i) the value of c,
ii) F_X(x),
iii) the PDF retrieved from the CDF (although it is already given),
iv) P[2 < X ≤ 5], using the PDF and the CDF separately.
i)
∫_{−∞}^{+∞} f_X(x) dx = 1
∫_0^{+∞} c x e^{−x/2} dx = 1

Integrating by parts,

[c x (−2) e^{−x/2}]_0^{+∞} − ∫_0^{+∞} c (−2) e^{−x/2} dx = 1
0 + 2c [−2 e^{−x/2}]_0^{+∞} = 1
4c = 1
c = 1/4
ii)
F_X(x) = ∫_{−∞}^{x} f_X(t) dt
= ∫_0^x (1/4) t e^{−t/2} dt
= [(1/4) t (−2) e^{−t/2}]_0^x − ∫_0^x (1/4)(−2) e^{−t/2} dt
= −(x/2) e^{−x/2} + (1/2) [−2 e^{−t/2}]_0^x
= −(x/2) e^{−x/2} − e^{−x/2} + 1
= 1 − e^{−x/2} − (x/2) e^{−x/2}
iii)
f_X(x) = d/dx F_X(x)
= d/dx (1 − e^{−x/2} − (x/2) e^{−x/2})
= 0 + (1/2) e^{−x/2} − [(1/2) e^{−x/2} − (x/4) e^{−x/2}]
= (x/4) e^{−x/2}
iv)
Using the CDF,

P[2 < X ≤ 5] = F_X(5) − F_X(2)
= (1 − e^{−5/2} − (5/2) e^{−5/2}) − (1 − e^{−1} − e^{−1})
= 2 e^{−1} − (7/2) e^{−5/2}
Using the PDF,

P[2 < X ≤ 5] = ∫_2^5 f_X(x) dx
= ∫_2^5 (1/4) x e^{−x/2} dx
= [(1/4) x (−2) e^{−x/2}]_2^5 − ∫_2^5 (1/4)(−2) e^{−x/2} dx
= −(5/2) e^{−5/2} + e^{−1} + (1/2) [−2 e^{−x/2}]_2^5
= −(5/2) e^{−5/2} + e^{−1} − e^{−5/2} + e^{−1}
= 2 e^{−1} − (7/2) e^{−5/2}
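The two routes can be cross-checked in Python; both should land on 2e^(-1) − (7/2)e^(-5/2) ≈ 0.449 (all function names here are our own):

```python
import math

def f(x: float) -> float:
    """PDF from the example: (1/4) x e^(-x/2) for x >= 0."""
    return 0.25 * x * math.exp(-x / 2) if x >= 0 else 0.0

def F(x: float) -> float:
    """CDF derived in part ii."""
    return 1 - math.exp(-x / 2) - 0.5 * x * math.exp(-x / 2) if x >= 0 else 0.0

via_cdf = F(5) - F(2)
closed_form = 2 * math.exp(-1) - 3.5 * math.exp(-2.5)

# Midpoint Riemann sum of the PDF over (2, 5]
n = 100_000
h = 3 / n
via_pdf = sum(f(2 + (i + 0.5) * h) for i in range(n)) * h
```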
Expectations and Variances
For discrete random variables,

E[X] = Σ_{x ∈ S_X} x · P_X(x)
Var[X] = Σ_{x ∈ S_X} (x − E[X])² · P_X(x)

For continuous random variables, the sums become integrals:

E[X] = ∫_{−∞}^{+∞} x · f_X(x) dx
Var[X] = ∫_{−∞}^{+∞} (x − E[X])² · f_X(x) dx
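As a sketch, these integrals can be evaluated numerically for the PDF from the previous example, f_X(x) = (x/4)e^(-x/2). That PDF happens to be a gamma PDF with r = 2 and λ = 1/2 (covered later), so the mean and variance should come out near r/λ = 4 and r/λ² = 8:

```python
import math

def f(x: float) -> float:
    """PDF from the worked example: (1/4) x e^(-x/2)."""
    return 0.25 * x * math.exp(-x / 2)

def integrate(fn, a: float, b: float, n: int = 200_000) -> float:
    """Midpoint Riemann sum of fn over [a, b]."""
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

# Truncating the upper limit at 200 loses only a negligible tail
mean = integrate(lambda x: x * f(x), 0, 200)
var = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 200)
```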
Note that the limits of integration are replaced by the given limits for individual distributions.
Uniform Random Variables
We have already seen a uniform random variable for a discrete set of values. In the continuous version of this, the values are given by an interval [a, b]. This interval defines the possible values of X.

X ~ Uniform(a, b)

Note that open and closed brackets do not matter for continuous random variables and can be used interchangeably, since the probability of any single endpoint is 0.
Unlike discrete random variables, we cannot simply define the uniform continuous random variable as one that has an equal probability for every possible value. This is because every individual value of a continuous random variable has probability 0, as we have seen. Instead, we define it as one with a constant probability density, so that equally long intervals within the specified range are equally likely. This definition is formalised as

f_X(x) = c for a < x ≤ b, and 0 otherwise.
We know,

∫_{−∞}^{+∞} f_X(x) dx = 1
∫_a^b c dx = 1
[cx]_a^b = 1
c (b − a) = 1
c = 1/(b − a)

Thus,

f_X(x) = 1/(b − a) for a < x ≤ b, and 0 otherwise.
Using this, we can calculate the CDF as
F_X(x) = ∫_a^x 1/(b − a) dt
= [t/(b − a)]_a^x
= (x − a)/(b − a)

Thus,

F_X(x) = 0 for x < a, (x − a)/(b − a) for a < x ≤ b, and 1 for x ≥ b.
The expected value and variance are

E[X] = (a + b)/2
Var[X] = (b − a)²/12
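A quick numeric check of both formulas, for the arbitrary illustrative choice a = 2, b = 10:

```python
a, b = 2.0, 10.0
n = 200_000
h = (b - a) / n
midpoints = [a + (i + 0.5) * h for i in range(n)]

# E[X] and Var[X] as midpoint Riemann sums against the flat PDF 1/(b - a)
mean = sum(x / (b - a) for x in midpoints) * h
var = sum((x - mean) ** 2 / (b - a) for x in midpoints) * h

# Expect mean ~ (a + b)/2 = 6 and var ~ (b - a)^2 / 12 = 16/3
```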
Example
Say a student requires between 22 and 30 minutes to travel from home to school, and the probability density over this interval is constant. Say the class starts at 8:00 AM and the student leaves home at 7:35 AM. What is the probability that they arrive in time?
Let X be a uniform continuous random variable, where X = amount of delay in minutes relative to 8:00 AM. The best possible case is that 22 minutes are needed, which puts the arrival time at 7:57 AM, meaning X = −3. The worst possible case is that 30 minutes are needed, which puts the arrival time at 8:05 AM, meaning X = 5. Thus,

f_X(x) = 1/8 for −3 ≤ x ≤ 5, and 0 otherwise.

P[−3 ≤ X ≤ 0] = ∫_{−3}^{0} (1/8) dx
= [x/8]_{−3}^{0}
= 3/8
An easier way we could have calculated this probability is to define the 'length' of valid values as A, from 7:57 AM to 8:00 AM, and the 'length' of the total range as B, from 7:57 AM to 8:05 AM; the probability is then simply A/B = 3/8.
Exponential Random Variables
Consider a scenario where events are occurring at a particular Poisson rate, λ.
Let X₁ = time of the first event, X₂ = time between the first and second events, and so on. This can be generalized to X_n = time between the (n−1)-th and n-th events.
Since the events could occur at absolutely any moment, all the X we just defined are continuous random variables. For the first of them,

F_X(x) = 1 − P[X₁ > x] = 1 − F_X^C(x),

where F_X^C(x) is the complementary CDF. The complementary CDF tells us the probability that the first event occurs after the time x. This is the same as the probability that no events occur in the interval (0, x].
Using the Poisson PMF with n = 0 events in that interval,

F_X^C(x) = e^{−λx} (λx)^n / n! with n = 0
= e^{−λx} (λx)^0 / 0!
= e^{−λx}
From this,
F_X(x) = 0 for x < 0, and 1 − e^{−λx} for x ≥ 0.

f_X(x) = d/dx F_X(x) = λ e^{−λx}

f_X(x) = λ e^{−λx} for x ≥ 0, and 0 otherwise.

E[X] = 1/λ
Var[X] = 1/λ²
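A short simulation sketch (Python's `random.expovariate` draws exponential samples): with λ = 5 events per hour, the CDF gives the probability that the first event arrives within 12 minutes, and the sample mean of inter-event times should sit near 1/λ = 0.2 hours:

```python
import math
import random

lam = 5.0  # 5 events per hour
# P[X <= 0.2 hours] = 1 - e^(-lam * 0.2) = 1 - e^(-1)
p_within_12_min = 1 - math.exp(-lam * 0.2)

random.seed(0)  # fixed seed so the sketch is reproducible
samples = [random.expovariate(lam) for _ in range(200_000)]
avg_gap = sum(samples) / len(samples)  # should be close to 1/lam = 0.2
```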
The expected value actually tells us something very obvious in this case. If there are 5 events per hour, λ = 5. Thus, the expected interval time will be 1/5 hours, or 12 minutes, meaning on average, there will be an arrival every 12 minutes. Note that this is the continuous analogue of the geometric random variable: there we counted the experiments before the first success, and here we are interested in the time before the first event.
Example
On average, there is an earthquake every 3 months. Thus, 1/λ = 3. We need to find the probability that the next earthquake will occur after 3 months, but before 7 months.
As a general rule for problems involving some rate: if we have to work with the number of events occurring in an interval, we want a Poisson random variable, and if we have to work with the time interval between events, we want an exponential random variable.

P[3 < X ≤ 7] = F_X(7) − F_X(3)
= (1 − e^{−7/3}) − (1 − e^{−3/3})
= e^{−1} − e^{−7/3}
≈ 0.27
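The arithmetic in a couple of lines of Python:

```python
import math

lam = 1 / 3  # one earthquake every 3 months, so lambda = 1/3 per month
p = (1 - math.exp(-lam * 7)) - (1 - math.exp(-lam * 3))
print(round(p, 2))  # 0.27
```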
Gamma Random Variables
While the exponential random variable models the time until the first event, a gamma random variable models the time until the k-th event. Its PDF is

f_X(x) = λ e^{−λx} (λx)^{k−1} / (k − 1)! for x ≥ 0, and 0 otherwise.

Clearly, this becomes the same as the exponential distribution for k = 1.
There are situations where k is a real number, and in those cases, we need to use the gamma function:

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

Γ(1) = 1
Γ(r + 1) = r! (for non-negative integers r)

and
f_X(x) = λ e^{−λx} (λx)^{r−1} / Γ(r) for x ≥ 0, and 0 otherwise.
The gamma distribution curves take a variety of shapes depending on the parameters r and λ.

E[X] = r/λ
Var[X] = r/λ²

The gamma random variable is the continuous analogue of the Pascal random variable: there we counted the experiments until the k-th success, and here we are interested in the time until the k-th event.
Example
Say customers are arriving at a restaurant at a rate of 12 customers per hour, and the restaurant will start making a profit after 30 customers have arrived. We want the expected time until that happens, i.e., until the 31st customer arrives. With r = 31 and λ = 12,

E[X] = 31/12 hours = 2 hours 35 minutes
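Since the time until the r-th event is the sum of r independent exponential inter-arrival gaps, a simulation sketch can confirm the expectation above (seed and trial count chosen arbitrarily):

```python
import random

lam, r = 12.0, 31  # 12 customers per hour; waiting for the 31st arrival
random.seed(1)
trials = 20_000
total = 0.0
for _ in range(trials):
    # A sum of r exponential gaps has a gamma(r, lam) distribution
    total += sum(random.expovariate(lam) for _ in range(r))
avg_wait = total / trials  # should be near r / lam = 31/12 ~ 2.583 hours
```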
Gaussian Random Variables
Also called the normal random variable, the Gaussian random variable is possibly the most important random variable due to its wide usage in statistical inference. Whenever we are unsure about what distribution a random variable should follow in a situation, it is often reasonable to assume it is Gaussian. Its PDF is

f_X(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}, where −∞ < x < +∞.
The two parameters are σ², which is the variance of the random variable, and μ, which is its mean.
Notice that the curve is symmetric, and if we change the value of μ, the only thing that happens is that the curve shifts; the shape of the curve is unaffected. However, changing σ² changes the shape: a larger variance gives a wider, flatter curve.
The problem with Gaussian random variables is that, if we integrate the PDF, we do not get a closed-form solution. As such, we cannot mathematically define a value for the CDF. This means we cannot find the probability associated with a particular interval analytically.
The probabilities in these scenarios are calculated using numerical techniques that we will not cover in this course. Another option is to use tables, which list the CDF values for Gaussian random variables. Here, we shall be looking into how to use such tables.
X ~ N(μ, σ²), where μ is the mean (−∞ < μ < +∞) and σ² is the variance (σ² > 0).
A table would need to contain the CDF values for each pair of μ and σ². There could be millions of values like this, and it would be unrealistic to expect a table to include all of them. Instead, tables are given for the standard normal random variable Z ~ N(0, 1).
For Z,

f_Z(z) = (1/√(2π)) e^{−z²/2}
Normally, we would denote the CDF of a random variable as F_X(x), but the CDF of Z has its own special notation:

Φ(z) = P[Z ≤ z]
In most cases, the value of z in tables is between 0 and 4. The table is set up in a compact way to save space. Each row covers some value of z up to the first decimal place, and the column specifies the digit in the hundredths place of z. Thus, to find the CDF for z = 1.26, we go to the row marked 1.2 and the column marked 0.06, where we find the value 0.896.
However, there are no probability values available for negative values of z in the table. For negative values, we use the symmetry of the curve.
Each value from the table tells us the area under the curve from the leftmost point of the curve to the specified value of z. Say we are asked to find P[Z > 1.26]. This is just 1 − 0.896 = 0.104: we took the area to the left of z = 1.26 and subtracted it from 1 to get the area to the right.
Now, using the symmetry of the graph, we can tell that the area to the left of −z equals the area to the right of z, i.e. P[Z < −z] = 1 − Φ(z). As such,

P[−z < Z < z]
= P[Z < z] − P[Z < −z]
= Φ(z) − (1 − Φ(z))
= 2Φ(z) − 1
Tables can also be used in reverse. Say we need the value of z for which P[Z > z] = 0.05:

1 − P[Z ≤ z] = 0.05
P[Z ≤ z] = 0.95
z ≈ 1.65

We are approximating from the table, since the exact value is not available in the given table. There are two candidate values (1.64 and 1.65), and the one that felt more accurate has been chosen; a finer table would give z ≈ 1.645.

Similarly, if P[−z < Z < z] = 0.90:

2Φ(z) − 1 = 0.90
Φ(z) = 0.95
z ≈ 1.65
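Instead of a printed table, Φ can be computed from the error function in Python's standard library; this reproduces the lookups above (`phi` is our own helper name):

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(1.26), 3))       # 0.896, the table value used earlier
print(round(1 - phi(1.645), 2))  # 0.05, the inverse-lookup example
```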
Now to get back to non-standard normal random variables, i.e. X ~ N(μ, σ²) where μ ≠ 0 and σ² ≠ 1. For these cases, we can do two things. We could either have a huge table for all possible values of μ and σ², or we could have some formula that allows us to convert a normal random variable into a standard normal random variable. This formula is

Z = (X − μ)/σ
Say Y is a random variable that is derived from X. Any random variable derived linearly from a Gaussian random variable is itself Gaussian:

Y ~ Gaussian(μ₁, σ₁²)
For a derived random variable where Y = aX + b, two formulae we need to remember are:

E[Y] = a · E[X] + b
Var[Y] = a² Var[X]
Applying these to Y = (X − μ)/σ:

μ₁ = E[Y] = E[(X − μ)/σ] = (E[X] − μ)/σ = (μ − μ)/σ = 0
σ₁² = Var[Y] = (1/σ²) Var[X] = σ²/σ² = 1

Thus, we have standardized X using the formula Z = (X − μ)/σ.
Example
Say some data packets we are sending have a delay that follows a normal distribution with μ = 10 and σ² = 4. We want to find the probability that the delay for a particular packet is greater than 13.

P[X > 13]
= P[(X − μ)/σ > (13 − μ)/σ]
= P[Z > 3/2]
= 1 − P[Z ≤ 1.5]
= 1 − 0.933
= 0.067
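The same computation as a sketch, standardising first and then using the error-function form of Φ:

```python
import math

mu, sigma = 10.0, 2.0  # sigma^2 = 4, so sigma = 2
z = (13 - mu) / sigma  # standardise: z = 1.5
p = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P[Z > 1.5]
print(round(p, 3))  # 0.067
```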
Beta Random Variables
f_X(x) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1} for 0 < x < 1

Here,

B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β)
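A sketch checking the gamma-function identity against the defining integral, for the illustrative choice α = 2, β = 3 (where B(2, 3) = Γ(2)Γ(3)/Γ(5) = 2/24 = 1/12):

```python
import math

def beta_const(a: float, b: float) -> float:
    """B(a, b) via the gamma-function identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

alpha, beta = 2.0, 3.0
# Midpoint Riemann sum of x^(alpha-1) (1-x)^(beta-1) over (0, 1)
n = 100_000
h = 1.0 / n
integral = sum(
    ((i + 0.5) * h) ** (alpha - 1) * (1 - (i + 0.5) * h) ** (beta - 1)
    for i in range(n)
) * h
```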
Beta distributions are used for modelling quantities that are themselves fractions. For example, when sending data packets, what is the proportion of packets that are received successfully?
The graphs of beta distributions are complex and change shape depending on the values of α and β.