Random Variables (Discrete and Continuous)
Consider a random experiment of tossing three coins together. The sample space of this
experiment is,
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Suppose that the experimenter is interested in knowing the total number of heads that turn up. It can
take the values 0, 1, 2 or 3. Suppose X denotes the total number of heads. If the experiment is
performed repeatedly, then the value of X will vary from trial to trial. We therefore call X a
variable. We always use capital letters to denote random variables. Thus here X takes the
values 0, 1, 2, or 3; this set of values is the range of the random variable. In mathematical terms we
say that X is a function defined from S to R and write it as X: S --> R. Notice that
X(HHH) = 3 X(HHT) = 2 X(HTH) = 2 X(THH) = 2
X(HTT) = 1 X(THT) = 1 X(TTH) = 1 X(TTT) = 0
The sample space of this random experiment contains eight points. The random
variable X in this case takes only four possible values (0, 1, 2, or 3). It is possible that you
may define another variable on the same sample space.
For example, define Y as 1, if you observe at least one head and at least one tail, otherwise define
Y as 0. Notice that
Y(HHH) = 0 Y(HHT) = 1 Y(HTH) = 1 Y(THH) = 1
Y(HTT) = 1 Y(THT) = 1 Y(TTH) = 1 Y(TTT) = 0
Here the variables X and Y take only isolated values, and the total number of these values is countable.
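The two mappings above can be checked mechanically. A minimal Python sketch (the helper names X, Y, and sample_space are illustrative, not from the text):

```python
from itertools import product

# Sample space of tossing three coins: all 8 sequences of H/T.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

def X(s):
    # X: total number of heads in an outcome.
    return s.count("H")

def Y(s):
    # Y: 1 if the outcome has at least one head and at least one tail, else 0.
    return 1 if ("H" in s and "T" in s) else 0

for s in sample_space:
    print(s, X(s), Y(s))
```

Running this lists each of the eight outcomes with its X and Y values, matching the tables above.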
Discrete Random Variable
A random variable is discrete if it can assume either a finite or a countably infinite number of values.
Countable means that you can associate the values that the random variable can assume with the
integers 1, 2, 3 and so on. For example, if you count how many times a particular attribute occurs,
its possible values are usually integers such as 0, 1, 2 …; clearly the number of outcomes will
either be finite or countable. This does not, however, mean that a discrete random variable must take
integer values. It takes isolated values, and these values are countable. If you find the proportion of
female children in a family having four children, then the possible values it can assume are 0,
0.25, 0.50, 0.75 and 1.
Typical examples of a discrete random variable are:
Number of peas set in a pod.
Number of accidents on the Mumbai-Agra national highway in a day
Number of misprints per page
Number of male children in a family having n children
Number of TV channels seen by the viewer in a day.
Number of candidates interviewed before the first candidate is selected.
Number of customers in a queue waiting for service during peak hours.
Number of errors reported while compiling a C-program.
Number of defective bolts in a sample of size 20 drawn from a day's production.
Number of goals scored in a football match
Continuous Random Variable
A random variable is continuous if it can assume all values in an interval. A continuous random
variable differs from a discrete random variable in that it takes on an uncountably infinite number of
possible values.
Examples of a continuous random variable :
The height of an individual.
The length of life of a component.
The systolic and diastolic blood pressure of an adult individual.
The residual life of an individual.
The time required for roasting groundnuts.
The amount of sugar in an apple.
The waiting time for receiving service at a nationalized bank.
Probability Distribution
A probability distribution is a table or an equation that links each outcome of a statistical
experiment with its probability of occurrence.
When the value of a variable is the outcome of a statistical experiment, that variable is a random
variable.
Generally, statisticians use a capital letter to represent a random variable and a lowercase letter
to represent one of its values. For example:
X represents the random variable X.
P(X) represents the probability distribution of X.
P(X = x) refers to the probability that the random variable X is equal to a particular value,
denoted by x. As an example, P(X = 1) refers to the probability that the random variable X
is equal to 1.
Example :
Suppose you flip a coin two times. This statistical experiment can have four possible outcomes:
HH, HT, TH, and TT. Now, let the variable X represent the number of heads that result from this
experiment. The variable X can take on the values 0, 1, or 2. In this example, X is a random
variable because its value is determined by the outcome of a statistical experiment.
Consider the coin flip experiment described above. The table below, which associates each
value of X with its probability, is an example of a probability distribution.

Number of heads, x :      0     1     2
Probability, P(X = x) :  1/4   1/2   1/4

The above table represents the probability distribution of the random variable X.
Probability Mass Function
If X is a discrete random variable then its range R_X is a countable set, so we can list the elements
in R_X. In other words, we can write
R_X = {x1, x2, x3, ...}.
Note that here x1, x2, x3, ... are possible values of the random variable X. While random
variables are usually denoted by capital letters, to represent the numbers in the range we usually
use lowercase letters such as x1, x2, x3, etc. For a discrete random variable X, we are
interested in knowing the probabilities of the events {X = x_k}. Note that here, the event {X = x_k}
is defined as the set of outcomes s in the sample space S for which the corresponding value of X
is equal to x_k. In particular,
{X = x_k} = {s in S : X(s) = x_k}.
The probabilities of the events {X = x_k} are formally shown by the probability mass function
(pmf) of X,
P_X(x_k) = P(X = x_k), for k = 1, 2, 3, ...
Thus, the PMF is a probability measure that gives us the probabilities of the possible values for a
random variable. While the above notation is the standard notation for the PMF of X, it might look
confusing at first. The subscript X here indicates that this is the PMF of the random variable X.
Thus, for example, P_X(1) shows the probability that X = 1.
Example :
I toss a fair coin twice, and let X be defined as the number of heads I observe. Find the range of
X, R_X, as well as its probability mass function P_X.
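One way to work out this example is to enumerate the sample space in code. A sketch (the variable names range_X and pmf are illustrative):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Toss a fair coin twice; X = number of heads observed.
outcomes = ["".join(o) for o in product("HT", repeat=2)]  # HH, HT, TH, TT
counts = Counter(o.count("H") for o in outcomes)

range_X = sorted(counts)  # R_X = {0, 1, 2}
pmf = {x: Fraction(counts[x], len(outcomes)) for x in range_X}

print(range_X)  # P_X(0) = 1/4, P_X(1) = 1/2, P_X(2) = 1/4
print(pmf)
```

Since all four outcomes are equally likely, the pmf is just the count of outcomes giving each value of X divided by 4.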
Cumulative Distribution Function (CDF) of Discrete Random Variable
Consider again the example of tossing three (3) coins, with sample space
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} and X = total number of heads, so that
X(HHH) = 3 X(HHT) = 2 X(HTH) = 2 X(THH) = 2
X(HTT) = 1 X(THT) = 1 X(TTH) = 1 X(TTT) = 0
Then,
F(0) = P(X ≤ 0) = 1/8
F(1) = P(X ≤ 1) = 4/8 (= P(0) + P(1))
F(2) = P(X ≤ 2) = 7/8 (= P(0) + P(1) + P(2))
F(3) = P(X ≤ 3) = 8/8 (= P(0) + P(1) + P(2) + P(3))
In general we can define F(x) = P(X ≤ x), where x is any real number. This function F(x) is called the
CDF (cumulative distribution function) of the random variable X. It can be represented by a graph. We
can obtain the probability mass function by differencing F(x). For example, p(2) = F(2) - F(1), p(3)
= F(3) - F(2), and so on. In general, p(x) = F(x) - F(x-1) when X takes consecutive integer values.
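The accumulation and differencing steps can be sketched in a few lines of Python (the dictionary names pmf, cdf, and recovered are illustrative):

```python
from fractions import Fraction

# PMF of X = number of heads in three fair-coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# CDF: F(x) = P(X <= x), accumulated over the support in increasing order.
cdf = {}
total = Fraction(0)
for x in sorted(pmf):
    total += pmf[x]
    cdf[x] = total

print(cdf)  # cumulative values: 1/8, 1/2, 7/8, 1

# Recover the pmf by differencing: p(x) = F(x) - F(x-1).
recovered = {x: cdf[x] - cdf.get(x - 1, Fraction(0)) for x in cdf}
assert recovered == pmf
```

Using exact fractions avoids rounding error and makes the round trip pmf → CDF → pmf exact.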
Following is the CDF of a discrete random variable X (a staircase-shaped step function).
Properties of CDF (discrete random variable)
F(x) is defined for every real number x.
0 ≤ F(x) ≤ 1, since F(x) is a probability.
F(x) is a non-decreasing step function of x. (looks like staircase).
If a and b are any two real numbers such that a ≤ b, then
o P(a < X ≤ b) = F(b) –F(a)
o P(a ≤ X ≤ b) = F(b) – F(a) + P(X = a)
o P(a ≤ X < b) = F(b) – F(a) – P(X = b) + P(X = a)
o P(X > a)= 1 – P( X ≤ a) = 1 – F(a)
The CDF of a discrete random variable is a step function. This can be seen by plotting a
graph of F(x) vs. x. It is the theoretical counterpart of a "less than" cumulative frequency
curve.
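The interval-probability identities above can be checked numerically for the three-coin example. A sketch, assuming equally likely outcomes (the helper names P and F are illustrative):

```python
from fractions import Fraction
from itertools import product

# Three fair coins: each of the 8 outcomes has probability 1/8.
outcomes = ["".join(o) for o in product("HT", repeat=3)]
probs = {s: Fraction(1, 8) for s in outcomes}

def P(event):
    # Probability that X = number of heads satisfies the predicate `event`.
    return sum(probs[s] for s in outcomes if event(s.count("H")))

def F(x):
    # CDF: F(x) = P(X <= x).
    return P(lambda v: v <= x)

a, b = 0, 2
assert P(lambda v: a < v <= b) == F(b) - F(a)
assert P(lambda v: a <= v <= b) == F(b) - F(a) + P(lambda v: v == a)
assert P(lambda v: a <= v < b) == F(b) - F(a) - P(lambda v: v == b) + P(lambda v: v == a)
assert P(lambda v: v > a) == 1 - F(a)
```

Each assertion mirrors one bullet above; all four hold exactly with fractional arithmetic.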
Mean and Variance
Mean
Suppose X is a discrete random variable and p(x) is its probability mass function. Then the expected
value or mean of X is defined as the weighted average of its possible values, the weights being the
respective probabilities of the values of X. It is denoted by μ or E(X). Thus,
mean of X = μ = E(X) = Σ x p(x),
where the sum is over all possible values x of X. If X takes finitely many values, then the existence
of E(X) is guaranteed. If X takes countably infinitely many values, then the existence of E(X) needs
to be established (the series must converge).
Variance
The variance of a random variable X is often written as Var(X) or σ². It is defined by
Var(X) = E[(X - μ)²] = E(X²) - μ²,
where μ = E(X). The square root of the variance is equal to the standard deviation.
Example
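As a worked instance of the formulas above, take X = number of heads in three fair-coin tosses. A minimal Python sketch (the names pmf, mean, and variance are illustrative):

```python
from fractions import Fraction

# PMF of X = number of heads in three fair-coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())      # E(X) = sum of x p(x)
ex2 = sum(x * x * p for x, p in pmf.items())   # E(X^2)
variance = ex2 - mean ** 2                     # Var(X) = E(X^2) - mu^2

print(mean, variance)  # 3/2 3/4
```

So E(X) = 3/2 and Var(X) = 3/4, hence the standard deviation is √3/2.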
Cumulative Distribution Function (CDF) of Continuous Random Variable
Properties
F(x) = P(X ≤ x) is defined for every real number x.
0 ≤ F(x) ≤ 1, i.e. F(x) is bounded below and above.
F(x) is a non-decreasing function of x, i.e. F(x1) ≤ F(x2), whenever x1 ≤ x2.
F(x) → 0 as x → -∞ and F(x) → 1 as x → ∞ (these are limit values).
If a and b are any two real numbers such that a ≤ b, then
o P(a < X ≤ b) = F(b) – F(a)
o P(a ≤ X ≤ b) = F(b) – F(a)
o P(a ≤ X < b) = F(b) – F(a)
o P(X > a)= 1 – P( X ≤ a) = 1 – F(a)
The CDF of a continuous random variable is a continuous function, e.g., study the graph of the CDF
vs. x shown in the following figure.
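These properties can be checked on a concrete continuous CDF. A sketch using the exponential CDF F(x) = 1 - e^(-λx) as an assumed, illustrative example (the rate λ = 2 is arbitrary):

```python
import math

lam = 2.0  # assumed rate parameter for this illustration

def F(x):
    # CDF of an exponential random variable: 0 for x < 0, 1 - exp(-lam*x) otherwise.
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)

xs = [-1.0, 0.0, 0.5, 1.0, 5.0, 50.0]
values = [F(x) for x in xs]

# Bounded in [0, 1] and non-decreasing, with F(x) near 0 and 1 at the extremes.
assert all(0.0 <= v <= 1.0 for v in values)
assert all(v1 <= v2 for v1, v2 in zip(values, values[1:]))

# For a continuous X, P(a < X <= b) = P(a <= X <= b) = F(b) - F(a).
print(F(1.0) - F(0.5))
```

Because a continuous random variable has P(X = a) = 0 for every point a, the four interval probabilities listed above all reduce to F(b) - F(a).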
Mean
The expected value is a measure of location or central tendency.
Let X be a continuous random variable with range [a, b] and probability density function f(x).
The expected value or mean of X is defined by
μ = E(X) = ∫_a^b x f(x) dx.
Variance
The variance of X is defined by
Var(X) = σ² = E[(X - μ)²] = ∫_a^b (x - μ)² f(x) dx,
where μ = E(X). Hence,
Var(X) = E(X²) - [E(X)]².
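The continuous mean and variance formulas can be checked numerically. A sketch using the assumed density f(x) = 2x on [0, 1] (a simple illustrative choice, for which E(X) = 2/3 and Var(X) = 1/18), approximating the integrals with midpoint Riemann sums:

```python
a, b, n = 0.0, 1.0, 100_000
h = (b - a) / n

def f(x):
    # Assumed illustrative density: f(x) = 2x on [0, 1], which integrates to 1.
    return 2.0 * x

# Midpoint Riemann sums for E(X) = integral of x f(x) dx
# and E(X^2) = integral of x^2 f(x) dx over [a, b].
mids = [a + (i + 0.5) * h for i in range(n)]
mean = sum(x * f(x) for x in mids) * h
ex2 = sum(x * x * f(x) for x in mids) * h
variance = ex2 - mean ** 2  # Var(X) = E(X^2) - [E(X)]^2

print(round(mean, 4), round(variance, 4))  # approximately 0.6667 and 0.0556
```

With n = 100,000 subintervals the approximations agree with the exact values 2/3 and 1/18 to well within 1e-4.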