Module 1: Random Variables and Processes
A signal is said to be “random” if it is not possible to predict its precise value in
advance.
Consider, for example, a radio communication system.
The received signal in such a system usually consists of an information-bearing
signal component, a random interference component, and receiver noise. The
information-bearing signal component may represent, for example, a voice signal
that, typically, consists of randomly spaced bursts of energy of random duration.
The interference component may represent spurious electromagnetic waves
produced by other communication systems operating in the vicinity of the radio
receiver. A major source of receiver noise is thermal noise, which is caused by
the random motion of electrons in conductors and devices at the front end of the
receiver. We thus find that the received signal is completely random in nature.
Although it is not possible to predict the precise value of a random signal in
advance, it may be described in terms of its statistical properties such as the
average power in the random signal, or the average spectral distribution of this
power.
The mathematical discipline that deals with the statistical characterization of
random signals is probability theory.
Probability
Definition of some important terms:
1) Random Experiment
An experiment is said to be random if its outcome cannot be predicted precisely.
For example, the experiment may be the observation of the result of tossing a fair
coin. In this experiment, the possible outcomes of a trial are “heads” or “tails.”
2) Sample space
The set of all possible outcomes of the experiment is called the sample space,
which we denote by S.
e.g.: tossing of a coin: S= {H, T}
rolling a die: S= {1,2,3,4,5,6}
3) Sample point
Each of the individual outcomes in a sample space is called a sample point.
A single sample point is called an elementary event.
4) Event
An event corresponds to either a single sample point or a set of sample points in the sample space S.
e.g. Rolling a die
event A: the number 6 turns up, A = {6}
event B: an even number of dots turns up, B = {2, 4, 6}
Event A is an elementary event.
• The entire sample space S is called the sure event, and the null set φ is
called the null or impossible event.
• Two events are mutually exclusive if the occurrence of one event precludes
the occurrence of the other event.
• The sample space S may be discrete with a countable number of outcomes,
such as the outcomes when tossing a die. Alternatively, the sample space
may be continuous, such as the voltage measured at the output of a noise
source.
5) Probability
There are two approaches to the definition of probability.
• The first approach is based on the relative frequency of occurrence:
If, in n trials of a random experiment, an event A occurs m times, then we assign the probability
P[A] = m/n
to the event A (more precisely, the limit of this ratio as the number of trials n becomes large).
• However, there are situations where the experiment is not repeatable; we then use the classical definition of probability:
Probability of event A = P[A] = (number of outcomes favourable to event A) / (total number of possible outcomes of the experiment)
6) A probability measure
A probability measure P is a function that assigns a non-negative number to an event A in the sample space S and satisfies the following three axioms:
1. 0 ≤ P[A] ≤ 1
2. P[S] = 1
3. If A and B are two mutually exclusive events, then
P [A U B] = P[A] + P[B]
Figure 1: Venn diagram presenting a geometric interpretation of the three axioms
of probability
This abstract definition of a probability system is illustrated in Figure 2 below
Figure 2: Illustration of the relationship between sample space, events, and
probability.
• The sample space S is mapped to events via the random experiment.
• The events may be elementary outcomes of the sample space or larger
subsets of the sample space.
• The probability function assigns a value between 0 and 1 to each of these
events.
• The probability value is not unique to the event; mutually exclusive events may be assigned the same probability. However, the probability of the union of all events, that is, the sure event, is always unity.
The following properties of probability measure P may be derived from the above
axioms:
1. P[Ā] = 1 − P[A]
where Ā is the complement of the event A.
Proof: The sample space can be written as S = A ∪ Ā, where A and Ā are mutually exclusive, so
P[S] = P[A] + P[Ā]
1 = P[A] + P[Ā]
P[Ā] = 1 − P[A]
2. When events A and B are not mutually exclusive, then the probability of the
union event “A or B” satisfies
P [A U B] = P[A] + P[B] – P [A ∩ B] where P[A ∩ B] is the probability of the
joint event “ A and B.”
3. If A1, A2, ……..., Am are mutually exclusive events that include all possible
outcomes of the random experiment, then
P[A1] + P[A2] + ... + P[Am] = 1
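As a quick numerical check of property 2, the short Python sketch below (an illustrative addition) enumerates the die-rolling sample space from the earlier example and verifies P[A ∪ B] = P[A] + P[B] − P[A ∩ B].

```python
# Verify P[A U B] = P[A] + P[B] - P[A n B] for a fair die (illustrative sketch).
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space of rolling a fair die
A = {6}                            # event: the number 6 turns up
B = {2, 4, 6}                      # event: an even number turns up

def P(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

lhs = P(A | B)                     # P[A U B]
rhs = P(A) + P(B) - P(A & B)       # P[A] + P[B] - P[A n B]
print(lhs, rhs, lhs == rhs)        # 1/2 1/2 True
```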
Conditional Probability
Suppose we perform an experiment that involves a pair of events A and B.
Let P[B|A] denote the probability of event B, given that event A has occurred.
The probability P[B|A] is called the conditional probability of B given A.
Assuming that A has nonzero probability, the conditional probability P[B|A] is
defined by
P[B|A] = P[A ∩ B] / P[A] --------------------(1)
where P[A ∩ B] is the joint probability of A and B. It follows that
P[A ∩ B] = P[B|A]P[A] -------------(2)
we may also write
P[A ∩ B] = P[A|B]P[B] ------------(3)
From eqns (2) and (3)
P[B|A]P[A]= P[A|B]P[B]
P[B|A] = P[A|B]P[B] / P[A],   P[A] ≠ 0 -------------(4)
This relation eqn(4) is a special form of Bayes’ rule.
If the events A and B are statistically independent
P[B|A] = P[B] and P[A|B]=P[A]
Eqn(2) becomes P[A ∩ B] = P[B]P[A]
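As a quick illustration of eqn (4), take the die example with A = {2, 4, 6} and B = {6}. Then P[A|B] = 1, P[B] = 1/6 and P[A] = 1/2, so P[B|A] = (1 × 1/6)/(1/2) = 1/3, which agrees with direct counting: one of the three equally likely outcomes in A is a 6.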
Example: Binary Symmetric Channel
Consider a discrete memoryless channel used to transmit binary data.
The channel is said to be discrete in that it is designed to handle discrete messages.
It is memoryless in the sense that the channel output at any time depends only on
the channel input at that time.
Owing to the unavoidable presence of noise in the channel, errors are made in the
received binary data stream.
Specifically, when symbol 1 is sent, occasionally an error is made and symbol 0
is received and vice versa.
The channel is assumed to be symmetric, which means that the probability of
receiving symbol 1 when symbol 0 is sent is the same as the probability of
receiving symbol 0 when symbol 1 is sent.
To describe the probabilistic nature of this channel fully, we need two sets of
probabilities.
1. The a priori probabilities of sending binary symbols 0 and 1 are
P[A0] = p0 and P[A1] = p1
where A0 and A1 denote the events of transmitting symbols 0 and 1,
respectively.
Note that p0 + p1 = 1
2. The conditional probabilities of error: they are P[B1|A0] = P[B0|A1] = p
where B0 and B1 denote the events of receiving symbols 0 and 1,
respectively. The conditional probability P[B1|A0 ] is the probability of
receiving symbol 1, given that symbol 0 is sent. The second conditional
probability P[B0|A1] is the probability of receiving symbol 0, given that
symbol 1 is sent.
3. The requirement is to determine the a posteriori probabilities
P[A0|B0] and P[A1|B1].
The conditional probability P[A0|B0] is the probability that symbol 0 was
sent, given that symbol 0 is received. The second conditional probability
P[A1|B1]is the probability that symbol 1 was sent, given that symbol 1 is
received.
Both these conditional probabilities refer to events that are observed “after
the fact”, hence, the name “a posteriori” probabilities.
Since the events B0 and B1 are mutually exclusive and together exhaust the possible received symbols, we have from the axioms of probability
P[B0|A0] + P[B1 |A0] = 1
That is to say, P[B0|A0] = 1 - p
Similarly, we may write P[B1|A1] = 1 – p
Accordingly, we may use the transition probability diagram shown in
Figure 3 to represent the binary communication channel specified in this
example; the term “transition probability” refers to the conditional
probability of error.
Figure 3: Transition probability diagram of binary symmetric channel.
From the figure, the following results are deduced using the total probability theorem and Bayes' rule.
Total probability theorem: the probabilities of receiving symbols 0 and 1 are
P[B0] = P[B0|A0]P[A0] + P[B0|A1]P[A1] = (1 − p)p0 + p·p1
P[B1] = P[B1|A0]P[A0] + P[B1|A1]P[A1] = p·p0 + (1 − p)p1
Bayes' rule then gives the a posteriori probabilities
P[A0|B0] = P[B0|A0]P[A0] / P[B0] = (1 − p)p0 / ((1 − p)p0 + p·p1)
P[A1|B1] = P[B1|A1]P[A1] / P[B1] = (1 − p)p1 / (p·p0 + (1 − p)p1)
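These calculations are easy to check with a short script; a minimal sketch, where the numerical values p0 = 0.6, p1 = 0.4 and p = 0.1 are assumed purely for illustration:

```python
# Binary symmetric channel: a posteriori probabilities via total probability + Bayes.
# The numerical values below are assumed for illustration only.
p0, p1 = 0.6, 0.4        # a priori probabilities of sending 0 and 1 (p0 + p1 = 1)
p = 0.1                  # conditional probability of error

# Total probability theorem: probabilities of receiving 0 and 1
P_B0 = (1 - p) * p0 + p * p1
P_B1 = p * p0 + (1 - p) * p1

# Bayes' rule: a posteriori probabilities
P_A0_given_B0 = (1 - p) * p0 / P_B0
P_A1_given_B1 = (1 - p) * p1 / P_B1

print(P_B0 + P_B1)                      # 1.0 (sanity check)
print(P_A0_given_B0, P_A1_given_B1)     # ≈0.931, ≈0.857
```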
Random Variables:
The outcomes of a random experiment may be non-numerical, but for mathematical analysis it is convenient if the outcomes are numerical in nature.
A random variable is used to assign a number to the outcome of a random
experiment.
The benefit of using random variables is that probability analysis can now be
developed in terms of real-valued quantities regardless of the form or shape of
the underlying events of the random experiment.
Random variables may be discrete, taking only a countable number of values, such as in the coin-tossing experiment. Alternatively, random variables may be continuous, taking values over a range of real numbers.
Statistical Averages of Random Variable
1. Expected Value or Mean value of a random variable
2. Variance
3. Standard Deviation
4. Correlation
5. Covariance
Expected Value or Mean value of a random variable
The expected value, or mean, of a random variable X is defined by
µ_X = E[X] = ∫_{−∞}^{∞} x f_X(x) dx
where E denotes the statistical expectation operator and f_X(x) is the probability density function of X.
That is, the mean µ_X locates the center of gravity of the area under the probability density curve of the random variable X.
FUNCTION OF A RANDOM VARIABLE
Let X denote a random variable, and let g(X) denote a real-valued function defined on the
real line. The quantity obtained by letting the argument of the function g( X ) be a random
variable is also a random variable which we denote as
Y = g(X)
The expected value of the random variable Y is
E[Y] = E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx
EXAMPLE: Cosinusoidal Random Variable
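A minimal numerical sketch of this example, assuming X is uniformly distributed over (−π, π] and Y = cos(X):

```python
# Cosinusoidal random variable (illustrative sketch): Y = cos(X), with X assumed
# uniform on (-pi, pi], so f_X(x) = 1/(2*pi) on that interval.
import numpy as np
from scipy.integrate import quad

f_X = lambda x: 1.0 / (2 * np.pi)          # assumed uniform density
g = np.cos                                  # Y = g(X) = cos(X)

# E[Y] = integral of g(x) f_X(x) dx over the support of X
mean_Y, _ = quad(lambda x: g(x) * f_X(x), -np.pi, np.pi)
# E[Y^2] = integral of g(x)^2 f_X(x) dx (second moment of Y)
mean_Y2, _ = quad(lambda x: g(x)**2 * f_X(x), -np.pi, np.pi)

print(mean_Y)   # ≈ 0.0
print(mean_Y2)  # ≈ 0.5
```

Under these assumptions the numerical values agree with the analytical results E[Y] = 0 and E[Y²] = 1/2 obtained by evaluating the integrals directly.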
MOMENTS:
For the special case of g(X) = X^n, we obtain the nth moment of the probability distribution of the random variable X; that is,
E[X^n] = ∫_{−∞}^{∞} x^n f_X(x) dx
The most important moments of X are the first two moments.
Putting n = 1 gives the mean of the random variable,
E[X] = ∫_{−∞}^{∞} x f_X(x) dx
whereas putting n = 2 gives the mean-square value of X,
E[X²] = ∫_{−∞}^{∞} x² f_X(x) dx
Central Moments:
Central moments are the moments of the difference between a random variable X and its mean µ_X. Thus, the nth central moment is
E[(X − µ_X)^n] = ∫_{−∞}^{∞} (x − µ_X)^n f_X(x) dx
For n = 1, the central moment is zero:
E[(X − µ_X)] = E[X] − E[µ_X] = µ_X − µ_X = 0
For n = 2, the second central moment is referred to as the variance of the random variable X, given by
var[X] = E[(X − µ_X)²] = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx
The variance of a random variable X is commonly denoted as σ_X².
The square root of the variance, namely σ_X, is called the standard deviation of the random variable X.
The variance σ_X² of a random variable X is a measure of the variable's "randomness." By specifying the variance σ_X², we essentially constrain the effective width of the probability density function f_X(x) of the random variable X about the mean µ_X.
σ_X² = E[(X − µ_X)²] = E[X² − 2µ_X X + µ_X²]
     = E[X²] − 2µ_X E[X] + µ_X²
     = E[X²] − 2µ_X² + µ_X²
     = E[X²] − µ_X²
If µ_X = 0, then σ_X² = E[X²]; that is, if the mean is zero, the variance is equal to the mean-square value of X.
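As a numerical illustration of these moment definitions, the sketch below integrates an assumed exponential density f_X(x) = e^(−x), x ≥ 0 (chosen only for illustration) and verifies σ_X² = E[X²] − µ_X²:

```python
# Mean, mean-square value and variance by numerical integration (illustrative sketch).
# The exponential density f_X(x) = exp(-x), x >= 0, is assumed purely for illustration.
import numpy as np
from scipy.integrate import quad

f_X = lambda x: np.exp(-x)                                   # assumed density on [0, inf)

mean, _ = quad(lambda x: x * f_X(x), 0, np.inf)              # E[X]          -> 1.0
mean_sq, _ = quad(lambda x: x**2 * f_X(x), 0, np.inf)        # E[X^2]        -> 2.0
var, _ = quad(lambda x: (x - mean)**2 * f_X(x), 0, np.inf)   # E[(X - mu)^2] -> 1.0

print(mean, mean_sq, var, mean_sq - mean**2)                 # variance = E[X^2] - mu^2
```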
Properties of Expectation operator
1. Linearity:
Let X, Y, and Z be random variables with Z = X + Y. Then
E[Z] = E[X + Y] = E[X] + E[Y]
The expectation of the sum of two random variables is equal to the sum of the individual expectations.
2. If X and Y are two independent random variables, then
E[XY] = E[X] E[Y]
3. E[constant] = constant
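The properties above can be checked by simulation; a minimal Monte Carlo sketch, with Gaussian samples assumed only for illustration:

```python
# Monte Carlo check of the expectation properties (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, 1_000_000)    # independent samples with mean 2
Y = rng.normal(-1.0, 3.0, 1_000_000)   # independent samples with mean -1

print(np.mean(X + Y), np.mean(X) + np.mean(Y))   # linearity: both ≈ 1.0
print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # independence: both ≈ -2.0
print(np.mean(np.full(10, 5.0)))                 # E[constant] = constant -> 5.0
```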
Consider next a pair of random variables X and Y. A set of statistical averages of importance in this case are the joint moments, namely, the expected values of X^i Y^k, where i and k may assume any positive integer values. We may thus write
E[X^i Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^i y^k f_{X,Y}(x, y) dx dy
A joint moment of particular importance is the correlation, defined by E[XY], which corresponds to i = k = 1 in the above equation.
The correlation of the centered random variables X − E[X] and Y − E[Y], that is, the joint moment
cov[X, Y] = E[(X − E[X])(Y − E[Y])]
is called the covariance of X and Y. Expanding the product,
cov[X, Y] = E[XY − Y E[X] − X E[Y] + E[X]E[Y]]
cov[X, Y] = E[XY − Y µ_X − X µ_Y + µ_X µ_Y]
cov[X, Y] = E[XY] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y
cov[X, Y] = E[XY] − µ_X µ_Y − µ_X µ_Y + µ_X µ_Y
cov[X, Y] = E[XY] − µ_X µ_Y
Let σ_X² and σ_Y² denote the variances of X and Y, respectively. Then the covariance of X and Y, normalized with respect to σ_X σ_Y, is called the correlation coefficient of X and Y:
ρ = cov[X, Y] / (σ_X σ_Y)
We say that the two random variables X and Y are uncorrelated if and only if their covariance
is zero, that is, if and only if cov[XY]=0.
We say that they are orthogonal if and only if their correlation is zero, that is, if and only if
E[XY] = 0
We observe that if one or both of the random variables X and Y have zero mean, then if they are orthogonal they are also uncorrelated, and vice versa. Note also that if X and Y are statistically independent, then they are uncorrelated; however, the converse of this statement is not necessarily true.
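The distinction between uncorrelated and independent can be seen numerically; in the sketch below X is a zero-mean Gaussian and Y = X², an assumed pair that is dependent yet uncorrelated:

```python
# Covariance and correlation coefficient from samples; uncorrelated is not independent.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, 1_000_000)   # zero-mean Gaussian samples
Y = X**2                              # Y depends on X, yet cov[X, Y] = E[X^3] = 0

cov_XY = np.mean(X * Y) - np.mean(X) * np.mean(Y)   # cov[X,Y] = E[XY] - mu_X mu_Y
rho = cov_XY / (np.std(X) * np.std(Y))              # correlation coefficient

print(cov_XY, rho)   # both ≈ 0: X and Y are essentially uncorrelated but not independent
```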
Moments of a Bernoulli Random Variable
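A minimal sketch of the moment calculations for this heading, assuming X takes the value 1 with probability p and 0 with probability 1 − p (p = 0.3 is chosen only for illustration):

```python
# Moments of a Bernoulli(p) random variable (illustrative sketch): X takes the
# value 1 with probability p and 0 with probability 1 - p.
p = 0.3                                # assumed success probability for illustration

E_X  = 1 * p + 0 * (1 - p)             # mean: E[X] = p
E_X2 = 1**2 * p + 0**2 * (1 - p)       # mean-square value: E[X^2] = p
var  = E_X2 - E_X**2                   # variance: sigma^2 = p - p^2 = p(1 - p)

print(E_X, E_X2, var)                  # 0.3 0.3 0.21
```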
CHARACTERISTIC FUNCTION
The characteristic function φ_X(v) of a random variable X is defined as the expectation of exp(jvX):
φ_X(v) = E[exp(jvX)] = ∫_{−∞}^{∞} f_X(x) exp(jvx) dx
That is, the characteristic function is (except for a sign change in the exponent) the Fourier transform of the probability density function f_X(x). In this relation we have used exp(jvx) rather than exp(−jvx), so as to conform with the convention adopted in probability theory. Recognizing that v and x play roles analogous to the variables 2πf and t of Fourier transforms, respectively, we deduce the following inverse relation, analogous with the inverse Fourier transform:
f_X(x) = (1/2π) ∫_{−∞}^{∞} φ_X(v) exp(−jvx) dv
EXAMPLE: Gaussian Random Variable
The probability density function of a Gaussian random variable X with mean µ_X and variance σ_X² is
f_X(x) = (1 / (√(2π) σ_X)) exp(−(x − µ_X)² / (2σ_X²))
The characteristic function is given by
φ_X(v) = exp(jvµ_X − (1/2) σ_X² v²)
Differentiating both sides of the above equation with respect to v a total of n
times, and then setting v = 0, we get the result
(d^n/dv^n) φ_X(v) |_{v=0} = (d^n/dv^n) [∫_{−∞}^{∞} f_X(x) exp(jvx) dx] |_{v=0}
                          = ∫_{−∞}^{∞} f_X(x) [(d^n/dv^n) exp(jvx)] |_{v=0} dx
                          = ∫_{−∞}^{∞} f_X(x) (jx)^n dx
                          = j^n ∫_{−∞}^{∞} x^n f_X(x) dx
Hence
(d^n/dv^n) φ_X(v) |_{v=0} = j^n E[X^n]
The nth moment of a random variable is therefore
E[X^n] = (−j)^n (d^n/dv^n) φ_X(v) |_{v=0}
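This moment relation can be verified symbolically; a sketch assuming the Gaussian characteristic function given above:

```python
# nth moments of a Gaussian from its characteristic function (illustrative sketch).
import sympy as sp

v = sp.symbols('v', real=True)
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Characteristic function of a Gaussian random variable (form assumed as above)
phi = sp.exp(sp.I * v * mu - sp.Rational(1, 2) * sigma**2 * v**2)

def moment(n):
    """nth moment: E[X^n] = (-j)^n d^n/dv^n phi_X(v) evaluated at v = 0."""
    return sp.simplify((-sp.I)**n * sp.diff(phi, v, n).subs(v, 0))

print(moment(1))   # mu
print(moment(2))   # mu**2 + sigma**2, i.e. the mean-square value
```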
Random Processes
In the statistical analysis of communication systems, the characterization of
random signals such as voice signals, television signals, computer data, and
electrical noise is of major concern.
These random signals have two properties.
First, the signals are functions of time, defined on some observation interval.
Second, the signals are random in the sense that before conducting an experiment,
it is not possible to describe exactly the waveforms that will be observed.
Accordingly, in describing random signals, we find that each sample point in our sample space is a function of time. The sample space or ensemble composed of functions of time is called a random or stochastic process.
Consider then a random experiment specified by the outcomes s from some sample space S, by the events defined on the sample space S, and by the probabilities of these events. Suppose that we assign to each sample point s a function of time in accordance with the rule
X(t, s),   −T ≤ t ≤ T
where 2T is the total observation interval.
For a fixed sample point s_j, the graph of the function X(t, s_j) versus time t is called a realization or sample function of the random process. To simplify the notation, we denote this sample function as
x_j(t) = X(t, s_j),   −T ≤ t ≤ T
The accompanying figure illustrates a set of sample functions {x_j(t) | j = 1, 2, ..., n}. From this figure, for a fixed time t_k inside the observation interval, the set of numbers
{x_1(t_k), x_2(t_k), ..., x_n(t_k)} = {X(t_k, s_1), X(t_k, s_2), ..., X(t_k, s_n)}
constitutes a random variable.
The family of functions X(t, s), or simply X(t), is called a random process.
For a random process, the outcome of a random experiment is mapped into a
waveform that is a function of time
For a random variable, the outcome of a random experiment is mapped into a
number.
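A minimal sketch of this mapping, assuming (for illustration) the random process X(t) = A cos(2πf0t + Θ) with Θ uniformly distributed over [0, 2π):

```python
# Sketch: an ensemble of sample functions of an assumed random process
# X(t) = A*cos(2*pi*f0*t + Theta), Theta uniform on [0, 2*pi).
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(-1.0, 1.0, 1000)       # observation interval 2T = 2 s
f0, A, n = 5.0, 1.0, 4                 # assumed frequency, amplitude, ensemble size

theta = rng.uniform(0, 2 * np.pi, n)   # one outcome s_j per sample function
ensemble = np.array([A * np.cos(2 * np.pi * f0 * t + th) for th in theta])

# Fixing a time t_k and looking across the ensemble gives a random variable X(t_k):
k = 500
print(ensemble[:, k])                  # n realizations of the random variable X(t_k)
```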
Mean, Correlation and Covariance Functions:
We define the mean of the process X(t) as the expectation of the random variable obtained by observing the process at some time t, as shown by
µ_X(t) = E[X(t)]
and the autocorrelation function of the process as
R_X(t1, t2) = E[X(t1) X(t2)]
The random process X(t) is stationary to second order if its second-order distribution function depends only on the time difference t2 − t1. This implies that the mean is a constant, µ_X(t) = µ_X, and that the autocorrelation function of a stationary (to second order) random process depends only on the time difference t2 − t1, as shown by
R_X(t1, t2) = R_X(t2 − t1)
The autocovariance function of such a process is
C_X(t1, t2) = E[(X(t1) − µ_X)(X(t2) − µ_X)]
            = E[X(t1) X(t2)] − µ_X E[X(t2)] − µ_X E[X(t1)] + µ_X²
            = E[X(t1) X(t2)] − µ_X² − µ_X² + µ_X²
            = R_X(t1, t2) − µ_X²
            = R_X(t2 − t1) − µ_X²
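These ensemble averages can be estimated numerically; a sketch using the same illustrative process X(t) = cos(2πf0t + Θ) with Θ uniform, whose autocorrelation is R_X(τ) = (1/2) cos(2πf0τ):

```python
# Sketch: estimating the autocorrelation R_X(t1, t2) of the assumed stationary
# process X(t) = cos(2*pi*f0*t + Theta), Theta uniform, by ensemble averaging.
import numpy as np

rng = np.random.default_rng(3)
f0, n = 5.0, 200_000
theta = rng.uniform(0, 2 * np.pi, n)

t1, t2 = 0.13, 0.20                       # two observation times, tau = t2 - t1
X_t1 = np.cos(2 * np.pi * f0 * t1 + theta)
X_t2 = np.cos(2 * np.pi * f0 * t2 + theta)

R_est = np.mean(X_t1 * X_t2)                          # ensemble-average estimate
R_theory = 0.5 * np.cos(2 * np.pi * f0 * (t2 - t1))   # depends only on t2 - t1
print(R_est, R_theory)
```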
GAUSSIAN PROCESS
Let us suppose that we observe a random process X(t) for an interval that starts at time t = 0 and lasts until t = T. Suppose also that we weight the random process X(t) by some function g(t) and then integrate the product g(t)X(t) over this observation interval, thereby obtaining a random variable Y defined by
Y = ∫_0^T g(t) X(t) dt
Y is referred to as a linear functional of X(t).
If the weighting function g(t) is such that the mean-square value of the random variable Y is finite, and if the random variable Y is a Gaussian-distributed random variable for every g(t) in this class of functions, then the process X(t) is said to be a Gaussian process. In other words, the process X(t) is a Gaussian process if every linear functional of X(t) is a Gaussian random variable.
The random variable Y has a Gaussian distribution if its probability density function has the form
f_Y(y) = (1 / (√(2π) σ_Y)) exp(−(y − µ_Y)² / (2σ_Y²))
When the mean µ_Y = 0 and the variance σ_Y² = 1, the random variable is said to be normalized.
Such a normalized Gaussian distribution is commonly written as N(0, 1).
A plot of this probability density function is given in the accompanying figure.
CENTRAL LIMIT THEOREM
The central limit theorem provides the mathematical justification for using a Gaussian process as a model for a large number of different physical phenomena in which the observed random variable, at a particular instant of time, is the result of a large number of individual random events. To formulate this important theorem, let X_i, i = 1, 2, ..., N, be a set of random variables that satisfies the following requirements:
1. The X_i are statistically independent.
2. The X_i have the same probability distribution, with finite mean and finite variance.
The central limit theorem then states that, as N becomes large, the probability distribution of the (suitably normalized) sum of these random variables approaches a Gaussian distribution.
PROPERTIES OF A GAUSSIAN PROCESS
PROPERTY 1
If a Gaussian process X(t) is applied to a stable linear filter, then the random process Y(t)
developed at the output of the filter is also Gaussian.
Consider a linear time-invariant filter of impulse response h(t), with the random process X(t) as input and the random process Y(t) as output. We assume that X(t) is a Gaussian process. The random processes Y(t) and X(t) are related by the convolution integral
Y(t) = ∫_{−∞}^{∞} h(τ) X(t − τ) dτ
Any linear functional Z of the output Y(t) can therefore be rewritten, by interchanging the order of integration, as a linear functional of the input X(t). Since X(t) is a Gaussian process by hypothesis, it follows that Z must be a Gaussian random variable. We have thus shown that if the input X(t) to a stable linear filter is a Gaussian process, then the output Y(t) is also a Gaussian process.
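A discrete-time sketch of Property 1, assuming white Gaussian noise as the input and a simple stable first-order filter (the filter coefficients are chosen only for illustration):

```python
# Sketch of Property 1: pass assumed white Gaussian noise through a stable LTI
# filter and check that the output samples still look Gaussian.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 1_000_000)          # Gaussian input process (discrete-time)

# Simple stable first-order IIR filter: y[n] = 0.9*y[n-1] + x[n]
y = lfilter([1.0], [1.0, -0.9], x)

# A Gaussian has zero skewness and zero excess kurtosis; estimate both for the output.
y0 = y - y.mean()
skew = np.mean(y0**3) / np.std(y)**3
excess_kurt = np.mean(y0**4) / np.std(y)**4 - 3.0
print(skew, excess_kurt)                     # both ≈ 0, consistent with Gaussianity
```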