Random variables and their distributions
Prof. Miloš Stanković
Introduction
• Fundamental concept in probability and statistics
• Random variables isolate the individual characteristics we are interested in – we do not need to describe the whole set of all possible elementary events Ω
• We want to represent random events using numbers, which makes it simpler to apply mathematical tools
• E.g. a coin toss – we assign the number 1 to heads and 0 to tails
Definition
• A random variable is a mapping from the set Ω (the set of all elementary events) to the set of real numbers ℝ
• The mapping does not have to be bijective – two or more elementary events can be mapped to the same number
• E.g. rolling a die: random variable 𝑋 = 1 if the number is even, 𝑋 = 0 if the number is odd
• Typically a random variable is denoted by a capital letter, and a particular value it can take by a small letter
• e.g. {𝑋 = 𝑥} is the event {𝜔 ∈ Ω|𝑋(𝜔) = 𝑥}
Distribution of random variable
• To a random variable 𝑋 we can assign a probability function in the
following way:
𝑃𝑋(𝐵) = 𝑃({𝜔 ∈ Ω | 𝑋(𝜔) ∈ 𝐵}), 𝐵 ⊆ ℝ
• Or just:
𝑃𝑋(𝐵) = 𝑃(𝑋 ∈ 𝐵)
Probability distribution of discrete random
variables
• If the set of all possible values that a RV can take is discrete (finite or
countable) we say that the RV is discrete
• Let 𝑥1 , 𝑥2 , … be all the values that a RV can take
• Then,
{𝑝𝑖 = 𝑃(𝑋 = 𝑥𝑖), 𝑖 = 1, 2, … } is the probability mass function (PMF) of RV 𝑋
• Since the RV maps the whole set Ω, we always have:
∑_{𝑖} 𝑝𝑖 = 1
Bernoulli distribution
• We have two possible outcomes in an experiment:
• Success 𝑋 = 1 , with probability 𝑝
• Failure 𝑋 = 0, with probability 1 − 𝑝
• This completely determines the PMF of the RV 𝑋
• If a RV has this distribution, we use the notation 𝑋~Bern(𝑝)
• Example – indicator of the event 𝐴:
𝐼𝐴(𝜔) = 1 if 𝜔 ∈ 𝐴, 𝐼𝐴(𝜔) = 0 if 𝜔 ∉ 𝐴 ⇒ 𝐼𝐴 ~ Bern(𝑃(𝐴))
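• As a quick illustration (not part of the original slides; plain Python, helper names are ours): a minimal sketch simulating the indicator 𝐼𝐴 of the event 𝐴 = "die shows an even number" – the relative frequency of 𝐼𝐴 = 1 approaches 𝑃(𝐴) = 1/2, consistent with 𝐼𝐴 ~ Bern(𝑃(𝐴)):

import random

def simulate_indicator(n_trials=100_000, seed=1):
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_trials):
        omega = rng.randint(1, 6)            # elementary event: one die roll
        ones += 1 if omega % 2 == 0 else 0   # I_A(omega) = 1 when omega is even
    return ones / n_trials                   # relative frequency of {I_A = 1}

print(simulate_indicator())                  # close to P(A) = 0.5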
Binomial distribution
• We repeat 𝑛 independent experiments with two possible outcomes
• 𝑋 – random variable representing the number of successes
• RV 𝑋 can take values 0, 1, 2, …, 𝑛 and it has the Binomial distribution:
𝑃(𝑋 = 𝑘) = (𝑛 choose 𝑘) 𝑝^𝑘 (1 − 𝑝)^{𝑛−𝑘}, 𝑘 = 0, 1, …, 𝑛
• 𝑋~Bin(𝑛, 𝑝)
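• A minimal sketch (plain Python, not from the slides) of the Binomial PMF, using math.comb for the binomial coefficient:

import math

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 3 heads in 10 fair coin tosses
print(binom_pmf(3, 10, 0.5))                            # 0.1171875
# the PMF sums to 1 over k = 0, ..., n
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))    # ~ 1.0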
Discrete uniform distribution
• Finite discrete RV which, with the same probability, can take any of 𝑛
possible values 𝑥1 , … , 𝑥𝑛
• PMF: 𝑃(𝑋 = 𝑥𝑖) = 1/𝑛
• It can be parameterized using an interval [𝑎, 𝑏] of integers: the values are 𝑎, 𝑎 + 1, …, 𝑏, so 𝑛 = 𝑏 − 𝑎 + 1
Poisson distribution
• A Poisson RV models the number of events that happen in a unit of time (or space), when the events occur independently of each other!
• Examples:
• Number of emails received in one day
• Number of phone calls in one day
• Number of buses in a station in a unit of time
• Number of newborn babies in one day
• Number of radioactive decays in a unit of time
• Number of trees per unit of surface in a forest
• From the given assumptions the following PMF can be derived:
𝑃(𝑋 = 𝑘) = (𝜆^𝑘 / 𝑘!) 𝑒^{−𝜆}, 𝑘 = 0, 1, 2, …, 𝜆 > 0
Poisson distribution
• 𝑋~Poiss (𝜆)
• An arbitrarily large number of
events is allowed (with
probability that goes to zero)!
• 𝜆 – average number of events
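• A minimal sketch (Python, illustrative only, not from the slides) of the Poisson PMF; it also checks numerically that the probabilities sum to 1 and that the average number of events is 𝜆:

import math

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poiss(lam)
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0
probs = [poisson_pmf(k, lam) for k in range(100)]   # the tail beyond k = 99 is negligible
print(sum(probs))                                   # ~ 1.0
print(sum(k * p for k, p in enumerate(probs)))      # ~ 4.0 = lam (average number of events)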
Geometric distribution
• Bernoulli experiments with probability of success 𝑝 are repeated until
the first success
• RV 𝑋 is the number of experiments until the first success
• Probability mass function:
𝑃(𝑋 = 𝑛) = 𝑝 (1 − 𝑝)^{𝑛−1}, 𝑛 = 1, 2, …
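• A minimal Python sketch (not from the slides, helper name is ours) of the geometric PMF; note that 𝑃(𝑋 > 𝑛) = (1 − 𝑝)^𝑛, the probability that the first 𝑛 experiments are all failures:

def geom_pmf(n, p):
    # P(X = n): first success on experiment n
    return p * (1 - p)**(n - 1)

p = 0.3
print(sum(geom_pmf(n, p) for n in range(1, 200)))    # ~ 1.0
print(sum(geom_pmf(n, p) for n in range(1, 6)))      # P(X <= 5)
print(1 - (1 - p)**5)                                # same value: 1 - P(X > 5)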
Hypergeometric distribution
• 𝑚 objects, out of a total of 𝑛 objects, are specially marked. We randomly choose 𝑟 objects, 0 < 𝑟 ≤ 𝑛 − 𝑚
• RV 𝑋 is the number of special objects (minimum is 0, maximum 𝑟 )
• PMF is:
𝑃(𝑋 = 𝑘) = (𝑚 choose 𝑘) (𝑛 − 𝑚 choose 𝑟 − 𝑘) / (𝑛 choose 𝑟), 𝑘 = 0, 1, 2, …, 𝑟
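• A minimal Python sketch (illustrative, not from the slides) of the hypergeometric PMF:

import math

def hypergeom_pmf(k, n, m, r):
    # P(X = k): k marked objects among r drawn, m marked out of n total
    return math.comb(m, k) * math.comb(n - m, r - k) / math.comb(n, r)

# e.g. n = 100 objects, m = 10 marked, draw r = 5
print(hypergeom_pmf(1, 100, 10, 5))                         # probability of exactly one marked object
print(sum(hypergeom_pmf(k, 100, 10, 5) for k in range(6)))  # ~ 1.0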
Negative binomial distribution
• RV 𝑋 is the number of Bernoulli experiments up to the 𝑟-th success (𝑟 ≥ 1)
• If the 𝑟-th success happened in the 𝑘-th experiment, it follows that in the first 𝑘 − 1 experiments there were 𝑟 − 1 successes
• This can happen in (𝑘 − 1 choose 𝑟 − 1) ways, and the probability of each of these outcomes is 𝑝^{𝑟−1} (1 − 𝑝)^{𝑘−𝑟}
• Hence, the PMF is:
𝑃(𝑋 = 𝑘) = (𝑘 − 1 choose 𝑟 − 1) 𝑝^{𝑟−1} (1 − 𝑝)^{𝑘−𝑟} · 𝑝 = (𝑘 − 1 choose 𝑟 − 1) 𝑝^𝑟 (1 − 𝑝)^{𝑘−𝑟}, 𝑘 = 𝑟, 𝑟 + 1, …
• Example: How many times do we need to roll a die in order to claim, with probability 0.99, that we had at least two sixes?
• Answer: 𝑝 = 1/6, 𝑟 = 2; we need the smallest 𝑚 such that 𝑃(𝑋 = 2) + ⋯ + 𝑃(𝑋 = 𝑚) ≥ 0.99, which gives 𝑚 ≥ 37.
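• A minimal Python check of this example (illustrative, not from the slides): find the smallest 𝑚 for which the cumulative negative binomial probability reaches 0.99:

import math

def negbin_pmf(k, r, p):
    # P(X = k): the r-th success occurs exactly on experiment k
    return math.comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

p, r = 1 / 6, 2
m = r
while sum(negbin_pmf(k, r, p) for k in range(r, m + 1)) < 0.99:
    m += 1
print(m)    # 37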
Cumulative distribution function and
continuous random variables
• The distribution of continuous random variables cannot be characterized by a probability mass function!
• E.g. if RV 𝑋 can take any value from the interval [0,1] with the same
probability, then for each separate point 𝑥 ∈ [0,1]
𝑃(𝑋 = 𝑥) = 0
• PMF doesn’t make sense in this case!
• Hence, we first introduce the Cumulative Distribution Function (CDF)!
Cumulative Distribution Function (CDF)
• It can be shown that the distribution of any RV is completely determined by the probabilities of intervals of the form (−∞, 𝑥]
• Definition:
The real function defined by
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = 𝑃({𝜔 ∈ Ω | 𝑋(𝜔) ≤ 𝑥}), −∞ < 𝑥 < +∞
is called the Cumulative Distribution Function of RV 𝑋.
CDF of discrete RV
• If 𝑋 is a discrete RV, its CDF is a step function: piecewise constant, with a jump of size 𝑝𝑖 at each value 𝑥𝑖
Properties of CDF
• 0 ≤ 𝐹 𝑥 ≤ 1 , ∀𝑥 ∈ ℝ
• 𝐹 is monotonically non-decreasing
• 𝐹 is right-continuous at every point 𝑥 ∈ ℝ
• 𝐹 has a left limit, 𝐹(𝑥−), at every point 𝑥 ∈ ℝ
• 𝐹(−∞) = lim_{𝑥→−∞} 𝐹(𝑥) = 0
• 𝐹(+∞) = lim_{𝑥→+∞} 𝐹(𝑥) = 1
For discrete RVs, 𝐹(𝑥−) ≠ 𝐹(𝑥) at the jump points 𝑥𝑖
Finding probabilities of arbitrary events using
CDF
• 𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎)
• 𝑃(𝑎 < 𝑋 < 𝑏) = 𝐹(𝑏−) − 𝐹(𝑎)
• 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = 𝐹(𝑏) − 𝐹(𝑎−)
• 𝑃(𝑎 ≤ 𝑋 < 𝑏) = 𝐹(𝑏−) − 𝐹(𝑎−)
• 𝑃(𝑋 = 𝑏) = 𝐹(𝑏) − 𝐹(𝑏−)   (for continuous RVs this is zero!)
• 𝑃(𝑋 < 𝑏) = 𝐹(𝑏−)
• 𝑃(𝑋 > 𝑎) = 1 − 𝐹(𝑎)
• 𝑃(𝑋 ≥ 𝑎) = 1 − 𝐹(𝑎−)
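• A minimal Python illustration of these rules (not from the slides, helper names are ours) for a discrete RV: for an integer-valued RV such as 𝑋 ~ Bin(10, 0.5), the left limit 𝐹(𝑏−) equals 𝐹(𝑏 − 1), so 𝑃(𝑋 = 𝑏) = 𝐹(𝑏) − 𝐹(𝑏−):

import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    # F(x) = P(X <= x) for X ~ Bin(n, p)
    return sum(binom_pmf(k, n, p) for k in range(0, math.floor(x) + 1))

n, p, b = 10, 0.5, 3
F_b = binom_cdf(b, n, p)
F_b_minus = binom_cdf(b - 1, n, p)       # left limit F(b-) for an integer-valued RV
print(F_b - F_b_minus)                   # equals P(X = 3) ...
print(binom_pmf(b, n, p))                # ... the jump of the CDF at b = 3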
Probability density function of continuous
random variables
• Probability Density Function (PDF) is analogous to PMF of discrete RVs
Definition: If there exists a nonnegative function 𝑓 such that
𝐹(𝑥) = ∫_{−∞}^{𝑥} 𝑓(𝑡) 𝑑𝑡, ∀𝑥 ∈ ℝ,
it is called the probability density function (PDF) of 𝑋.
Calculating probability of events using PDF
• 𝑃(𝑋 = 𝑎) = 0, ∀𝑎 ∈ ℝ
• 𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝑃(𝑎 < 𝑋 < 𝑏) = 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = 𝑃(𝑎 ≤ 𝑋 < 𝑏) = ∫_{𝑎}^{𝑏} 𝑓(𝑡) 𝑑𝑡
• 𝑃(𝑋 < 𝑏) = 𝑃(𝑋 ≤ 𝑏) = ∫_{−∞}^{𝑏} 𝑓(𝑡) 𝑑𝑡
• 𝑃(𝑋 > 𝑎) = 𝑃(𝑋 ≥ 𝑎) = ∫_{𝑎}^{+∞} 𝑓(𝑡) 𝑑𝑡
• ∫_{−∞}^{+∞} 𝑓(𝑡) 𝑑𝑡 = 1
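• A minimal numerical illustration (Python, not from the slides), using the example density 𝑓(𝑥) = 2𝑥 on [0, 1]: a simple Riemann sum reproduces these integral formulas.

def f(x):
    # example PDF: f(x) = 2x on [0, 1], 0 elsewhere
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, steps=100_000):
    # simple midpoint Riemann sum of f over [a, b]
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

print(integrate(f, 0, 1))        # ~ 1.0  (total probability)
print(integrate(f, 0.2, 0.5))    # P(0.2 < X <= 0.5) = 0.5**2 - 0.2**2 = 0.21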
Formal interpretation of PDF
• The following approximate equality can be derived:
𝑓(𝑥) Δ𝑥 ≈ 𝑃(𝑥 ≤ 𝑋 ≤ 𝑥 + Δ𝑥), (Δ𝑥 → 0)
• Hence, the probability that a RV will take values in a small neighborhood of some point is proportional to the PDF at that point
Uniform distribution
• Continuous RV which can take arbitrary values in an interval [𝑎, 𝑏], such that 𝑃(𝑋 ∈ [𝑥, 𝑦]) = (𝑦 − 𝑥)/(𝑏 − 𝑎) for 𝑎 ≤ 𝑥 ≤ 𝑦 ≤ 𝑏
• Hence, the probability depends only on the interval size!
• Generalization of equally likely events in the discrete case
• 𝑋~Unif[𝑎, 𝑏]
Exponential distribution
• PDF: 𝑓(𝑥) = 𝜆𝑒^{−𝜆𝑥}, 𝑥 ≥ 0; 𝑓(𝑥) = 0, 𝑥 < 0
• CDF: 𝐹(𝑥) = 1 − 𝑒^{−𝜆𝑥}, 𝑥 ≥ 0
• 𝑋~Exp(𝜆)
• Very important distribution, it is used to model lifetime of some devices, or time
between two malfunctions
• It has the important property of absence of memory!
𝑃(𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑡)
• If we know that a device has worked without malfunctions for 𝑠 hours, the probability that it will malfunction in the next 𝑡 hours is the same as the probability that it will malfunction within 𝑡 hours after we turn it on!
• Connection with the Poisson distribution
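• A minimal numerical check of the memoryless property (Python, illustrative only, not from the slides), using 𝑃(𝑋 > 𝑥) = 1 − 𝐹(𝑥) = 𝑒^{−𝜆𝑥}:

import math

lam, s, t = 0.5, 3.0, 2.0

def exp_sf(x, lam):
    # P(X > x) = 1 - F(x) = exp(-lam * x)
    return math.exp(-lam * x)

cond = exp_sf(s + t, lam) / exp_sf(s, lam)   # P(X > s + t | X > s)
print(cond)                                  # equals ...
print(exp_sf(t, lam))                        # ... P(X > t)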
Exponential distribution
• 𝑋~Exp(0.5)
Normal (Gaussian) distribution
• The most important distribution in probability and statistics
• RVs which are the result of a large number of random influences, where the effect of each individual influence is negligible with respect to their total sum, have (approximately) a normal distribution!
• Hence, this distribution appears most frequently in natural processes (e.g. measurement noise)
• We will prove this statement later – Central Limit Theorem
Normal (Gaussian) distribution
• PDF: 𝑓(𝑥) = (1/√(2𝜋)) 𝑒^{−𝑥²/2}, −∞ < 𝑥 < +∞
• 𝑋~Norm(0,1) – mathematical expectation is 0, variance is 1
• The CDF is not an elementary function! (it must be calculated numerically)
• For arbitrary expectation and variance:
𝑓(𝑥) = (1/(𝜎√(2𝜋))) 𝑒^{−(𝑥−𝜇)²/(2𝜎²)}, −∞ < 𝑥 < +∞
• 𝑋~Norm(𝜇, 𝜎 2 )
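• Since the normal CDF must be computed numerically, here is a minimal Python sketch (not from the slides, helper names are ours) using math.erf; for 𝑋 ~ Norm(𝜇, 𝜎²), standardize 𝑧 = (𝑥 − 𝜇)/𝜎 and use the standard normal CDF Φ:

import math

def std_normal_pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(x):
    # Phi(x), expressed through the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ Norm(mu, sigma^2), via standardization
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(1.96))                  # ~ 0.975
print(normal_cdf(5.2, 5.2, math.sqrt(3.7)))  # 0.5: half the mass lies below the mean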
Normal (Gaussian) distribution
𝑋~Norm(5.2,3.7)
Random vectors
• In practice we usually observe several RVs defined on the same set of elementary events Ω
• e.g. in machine learning we typically have a very large number of RVs
• An ordered tuple of such RVs is called a random vector
(𝑋1, 𝑋2, …, 𝑋𝑛)
Joint Cumulative Distribution Function for 2D
RV
• A random vector (𝑋, 𝑌) is given
• Joint CDF is defined as a function of two variables:
𝐹𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦), −∞ < 𝑥, 𝑦 < +∞
• CDFs of RVs 𝑋 and 𝑌 are called marginal CDFs and can be obtained in the
following way:
𝐹𝑋(𝑥) = 𝑃(𝑋 ≤ 𝑥) = 𝑃(𝑋 ≤ 𝑥, 𝑌 < +∞) = 𝐹𝑋,𝑌(𝑥, +∞)
Joint PDF
• If there exists a function 𝑓(𝑥, 𝑦) such that
𝐹𝑋,𝑌(𝑥, 𝑦) = ∫_{−∞}^{𝑥} ∫_{−∞}^{𝑦} 𝑓𝑋,𝑌(𝑢, 𝑣) 𝑑𝑣 𝑑𝑢
then, 𝑓𝑋,𝑌 (𝑥, 𝑦) is called joint PDF of the random vector (𝑋, 𝑌)
• If this function is continuous it can be obtained as the derivative of CDF:
𝑓𝑋,𝑌(𝑥, 𝑦) = ∂²𝐹𝑋,𝑌(𝑥, 𝑦) / (∂𝑥 ∂𝑦)
• Also, it directly follows that: 𝑓𝑋(𝑥) = ∫_{−∞}^{+∞} 𝑓𝑋,𝑌(𝑥, 𝑦) 𝑑𝑦
Example – discrete random vector
• Joint PMF of discrete random vector (𝑋, 𝑌) is given:
𝑋\𝑌     1       2       3
1        5/24    1/12    1/6
2        ?       7/24    0
• From the condition that the sum of the probabilities of all the values should be 1, we get 𝑃(𝑋 = 2, 𝑌 = 1) = 1/4
• Marginal PMFs (summing up columns or rows):
• 𝑃(𝑌 = 1) = 𝑃(𝑋 = 2, 𝑌 = 1) + 𝑃(𝑋 = 1, 𝑌 = 1) = 11/24
• 𝑃(𝑌 = 2) = 𝑃(𝑋 = 2, 𝑌 = 2) + 𝑃(𝑋 = 1, 𝑌 = 2) = 3/8
•…
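• A minimal Python check of this example (not from the slides), using exact fractions for the joint PMF table:

from fractions import Fraction as F

joint = {                                   # P(X = i, Y = j)
    (1, 1): F(5, 24), (1, 2): F(1, 12), (1, 3): F(1, 6),
    (2, 2): F(7, 24), (2, 3): F(0),
}
joint[(2, 1)] = 1 - sum(joint.values())     # missing cell from "probabilities sum to 1"
print(joint[(2, 1)])                        # 1/4

p_y1 = joint[(1, 1)] + joint[(2, 1)]        # marginal P(Y = 1)
p_y2 = joint[(1, 2)] + joint[(2, 2)]        # marginal P(Y = 2)
print(p_y1, p_y2)                           # 11/24 and 3/8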
Example – 2D uniform distribution
• Given is a region 𝐷 ⊂ ℝ2 , with area 𝑆
• PDF: 𝑓𝑋,𝑌(𝑥, 𝑦) = 1/𝑆, (𝑥, 𝑦) ∈ 𝐷; 𝑓𝑋,𝑌(𝑥, 𝑦) = 0, (𝑥, 𝑦) ∉ 𝐷
• Then, for 𝐵 ⊆ 𝐷: 𝑃((𝑋, 𝑌) ∈ 𝐵) = ∫∫_{𝐵} (1/𝑆) 𝑑𝑥 𝑑𝑦 = Area(𝐵)/𝑆
Independence of random variables
• One of the fundamental characteristics describing the relationship between RVs is their (in)dependence
• If RVs are not independent, their relationship can be described in
more precise terms (we will see later)
• Definition of independence of RVs:
We say that RVs 𝑋1, …, 𝑋𝑛 are independent if the events {𝑋1 ∈ 𝐵1}, …, {𝑋𝑛 ∈ 𝐵𝑛} are independent for arbitrary sets 𝐵1, …, 𝐵𝑛 ⊆ ℝ
Independence of random variables
• From the definition we directly obtain, for two independent RVs
(𝑋, 𝑌):
𝐹𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦) = 𝑃(𝑋 ≤ 𝑥) 𝑃(𝑌 ≤ 𝑦) = 𝐹𝑋(𝑥) 𝐹𝑌(𝑦)
• Hence, two RVs are independent if and only if:
𝐹𝑋,𝑌(𝑥, 𝑦) = 𝐹𝑋(𝑥) 𝐹𝑌(𝑦)
• If PDFs exist, then it holds that:
𝑓𝑋,𝑌(𝑥, 𝑦) = 𝑓𝑋(𝑥) 𝑓𝑌(𝑦)
Example
• Let the joint PDF of a random vector (𝑋, 𝑌) be:
𝑓𝑋,𝑌(𝑥, 𝑦) = 6𝑒^{−2𝑥−3𝑦}, 𝑥, 𝑦 ≥ 0,
𝑓𝑋,𝑌(𝑥, 𝑦) = 0, otherwise
• Marginal PDFs are calculated by integration (marginalization):
𝑓𝑋(𝑥) = ∫_{−∞}^{+∞} 𝑓𝑋,𝑌(𝑥, 𝑦) 𝑑𝑦 = 6𝑒^{−2𝑥} ∫_{0}^{+∞} 𝑒^{−3𝑦} 𝑑𝑦 = 2𝑒^{−2𝑥}, 𝑥 ≥ 0
𝑓𝑌(𝑦) = 3𝑒^{−3𝑦}, 𝑦 ≥ 0
⇒ 𝑓𝑋,𝑌(𝑥, 𝑦) = 𝑓𝑋(𝑥) 𝑓𝑌(𝑦), so 𝑋 and 𝑌 are independent!
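• A minimal numerical check of the marginalization step (Python, illustrative only, not from the slides): a Riemann sum over 𝑦 at a fixed 𝑥 should reproduce 𝑓𝑋(𝑥) = 2𝑒^{−2𝑥}:

import math

def joint_pdf(x, y):
    return 6 * math.exp(-2 * x - 3 * y) if x >= 0 and y >= 0 else 0.0

x = 1.0
h, upper = 1e-4, 20.0                       # truncate the integral; the tail beyond y = 20 is negligible
marginal = sum(joint_pdf(x, (i + 0.5) * h) for i in range(int(upper / h))) * h
print(marginal)                             # ~ 2 * exp(-2 * x)
print(2 * math.exp(-2 * x))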
Functions of random variables
• Example, PMF is given by the table:
𝑋\𝑌     1       2       3
1        1/12    1/6     1/18
2        1/9     1/12    1/9
3        5/36    1/12    1/6
• Find PMF of RV 𝑈 = 𝑋 + 𝑌
• For each value of the pair (𝑋, 𝑌) we get a value for 𝑈
• We find the PMF of 𝑈 by summing the probabilities of all pairs (𝑋, 𝑌) that give the same value of 𝑈 (see the sketch below)
• For more complicated distributions and functions there is a general procedure/formula for finding the resulting distributions (PDF, CDF)
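• A minimal Python sketch of this procedure (not from the slides), again using exact fractions:

from fractions import Fraction as F
from collections import defaultdict

joint = {                                   # P(X = i, Y = j) from the table above
    (1, 1): F(1, 12), (1, 2): F(1, 6),  (1, 3): F(1, 18),
    (2, 1): F(1, 9),  (2, 2): F(1, 12), (2, 3): F(1, 9),
    (3, 1): F(5, 36), (3, 2): F(1, 12), (3, 3): F(1, 6),
}

pmf_u = defaultdict(F)                      # accumulate P(U = x + y)
for (x, y), p in joint.items():
    pmf_u[x + y] += p

for u in sorted(pmf_u):
    print(u, pmf_u[u])                      # 2: 1/12, 3: 5/18, 4: 5/18, 5: 7/36, 6: 1/6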