Lecture 1-1 - Review of Probability
Xinwei Ma
Department of Economics
UC San Diego
Spring 2021
Probability theory starts with random variables, which provide a numerical summary of
random outcomes.
• EXAMPLE. Gambling outcome, crop yield, GDP next year, test score, etc.
• We use uppercase letters, such as X , Y and Z to denote random variables, and lowercase letters,
x, y and z for their realizations.
• The cumulative distribution function (CDF) gives the probability that X does not exceed a threshold:

  FX (x) = P[X ≤ x]

• For a discrete random variable, we generally use the probability mass function (PMF) to characterize its distribution:

  fX (x) = P[X = x]

[Figure: CDF of a discrete random variable, a step function over the range −2 to 4.]
EXAMPLE. Consider a discrete random variable X with the PMF below. The mean is E[X] = Σ x · fX (x):

  x             -1      2      3
  fX (x)        0.3     0.5    0.2
  x · fX (x)    -0.3    1.0    0.6

so E[X] = −0.3 + 1.0 + 0.6 = 1.3. The variance is V[X] = Σ (x − E[X])² · fX (x):

  x                     -1      2      3
  fX (x)                0.3     0.5    0.2
  x − E[X]              -2.3    0.7    1.7
  (x − E[X])²           5.29    0.49   2.89
  (x − E[X])² fX (x)    1.587   0.245  0.578

so V[X] = 1.587 + 0.245 + 0.578 = 2.41.
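The table calculation can be reproduced with a short snippet (Python is an illustrative choice here; the PMF values are the ones from the example above):

```python
# PMF of the discrete random variable X from the example above
pmf = {-1: 0.3, 2: 0.5, 3: 0.2}

# Mean: E[X] is the probability-weighted sum of the values
mean = sum(x * p for x, p in pmf.items())

# Variance: V[X] is the probability-weighted sum of squared deviations
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(round(mean, 4))  # 1.3
print(round(var, 4))   # 2.41
```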
Bernoulli distribution
• A Bernoulli(p) random variable takes the value 1 with probability p and 0 with probability 1 − p.
• Mean: E[X] = p
• Variance: V[X] = p(1 − p)
Binomial distribution
• A Binomial(n, p) random variable counts the number of successes in n independent Bernoulli(p) trials, with PMF

  P[X = k] = (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n

• Mean: E[X] = np
• Variance: V[X] = np(1 − p)
EXAMPLE. Let Y denote the number of “heads” that occur when three fair coins are
tossed.
Solution. The random variable, Y, follows the Binomial(3, 1/2) distribution. Then we can
apply the formula to find its distribution (PMF). For example, the probability of
observing two “heads” is

  P[Y = 2] = (3 choose 2) (1/2)² (1 − 1/2)^(3−2) = 3/8.

The mean and variance can also be found by employing the corresponding formulas:

  E[Y] = 3 · (1/2) = 3/2

  V[Y] = 3 · (1/2) · (1 − 1/2) = 3/4

As a by-product, the standard deviation is √V[Y] = √3/2 ≈ 0.866.
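The worked example can be checked with a small Python sketch using only the standard library:

```python
from math import comb, sqrt

n, p = 3, 0.5  # three fair coins

def binom_pmf(k):
    """Binomial PMF: P[Y = k] = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binom_pmf(2))        # 0.375, i.e. 3/8
mean = n * p               # E[Y] = np = 1.5
var = n * p * (1 - p)      # V[Y] = np(1 - p) = 0.75
sd = sqrt(var)             # standard deviation, about 0.866
print(mean, var, sd)
```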
• We generally use the probability density function (PDF) to characterize its distribution:

  fX (x) δ ≈ P[x − δ/2 ≤ X ≤ x + δ/2].

More precisely, for an interval [a, b],

  P[a ≤ X ≤ b] = ∫ from a to b of fX (x) dx.
[Figure: a bell-shaped PDF over the range −4 to 4.]
• The PDF does not represent probabilities. The integral of the PDF does.
Random Variables, Expectation, and Variance
Uniform distribution
• Mean

  E[X] = (a + b)/2

• Variance

  V[X] = (b − a)²/12
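These two formulas can be verified numerically. The sketch below approximates E[X] and V[X] for a Uniform[a, b] variable with a midpoint Riemann sum; a = 2 and b = 5 are arbitrary illustrative endpoints:

```python
# Numerically check the Uniform[a, b] mean and variance formulas
# (a = 2, b = 5 are arbitrary choices for illustration).
a, b = 2.0, 5.0
density = 1.0 / (b - a)          # the PDF is constant on [a, b]

n = 100_000                      # midpoint-rule grid
dx = (b - a) / n
xs = (a + (i + 0.5) * dx for i in range(n))

mean = 0.0
var_raw = 0.0                    # accumulate E[X^2] first
for x in xs:
    mean += x * density * dx
    var_raw += x * x * density * dx
var = var_raw - mean ** 2        # V[X] = E[X^2] - E[X]^2

print(round(mean, 4))            # close to (a + b) / 2 = 3.5
print(round(var, 4))             # close to (b - a)^2 / 12 = 0.75
```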
The normal distribution has a unique role in probability and statistics, due to the central
limit theorem (to be discussed later).
A normal distribution has two parameters: the mean µ and the variance σ².
• We reserve the two notations, φ(x) and Φ(x), for the PDF and the CDF of the standard normal
distribution.
In practice, we use software (such as Stata, R, Matlab) or tabulated values to find values
of φµ,σ² (x) and Φµ,σ² (x).
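As one example of such software, Python's standard library provides `statistics.NormalDist` (used here purely as a stand-in for Stata, R, or Matlab):

```python
from statistics import NormalDist

std = NormalDist(mu=0, sigma=1)      # the standard normal distribution

# CDF values Phi(x), matching the tabulated values
print(round(std.cdf(1.17), 4))       # 0.879
print(round(std.cdf(-2.36), 4))      # 0.0091

# Note: NormalDist takes the standard deviation sigma, not the variance
y = NormalDist(mu=1, sigma=2)        # mean 1, variance 4
print(round(y.cdf(3), 4))            # 0.8413, which equals Phi(1)
```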
[Figure: PDF of the standard normal distribution over the range −4 to 4.]
EXAMPLE. Let X follow the standard normal distribution. What is the probability
P[X ≤ x] = Φ(x)?

Let x be, say, 1.17. This probability is 0.8790, which is the entry for the row labeled
“1.1” and the column labeled “7.”

Let x be, say, −2.36. First go to the row labeled “−2.3,” and then go to the column
labeled “6.” The probability is P[X ≤ −2.36] = Φ(−2.36) = 0.0091.

Table. Cumulative distribution function of the standard normal distribution, Φ(x) = P[X ≤ x].
Note. This table can be used to calculate P[X ≤ x], where X is a standard normal random variable.
[Tabulated values omitted.]
Let x be, say, 0.38. First go to the row labeled “0.3,” and then go to the column labeled
“8.” The probability is P[X ≤ 0.38] = Φ(0.38) = 0.6480.
EXAMPLE. Let Y follow the normal distribution with mean 1 and variance 4. What is the
probability P[Y ≤ 3]?

Solution. First of all, we cannot use the tabulated values directly, as they are for the
standard normal distribution. To proceed, consider

  P[Y ≤ 3] = P[Y − 1 ≤ 3 − 1] = P[(Y − 1)/√4 ≤ (3 − 1)/√4],

where we subtract and divide by the same quantities on the two sides of the inequality.
Define X = (Y − 1)/√4; then X follows the standard normal distribution. That is,

  P[Y ≤ 3] = P[X ≤ 1] = Φ(1).

Finally, we can use the table to find the above probability, which is 0.8413.
(See the section, “Some Important Properties,” if you are unfamiliar with the above
reasoning.)
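The standardization argument can be checked numerically; a small sketch with Python's `statistics.NormalDist` (again a stand-in for the software mentioned earlier):

```python
from statistics import NormalDist
from math import sqrt

# Y ~ Normal with mean 1 and variance 4; NormalDist takes sigma, not sigma^2
y = NormalDist(mu=1, sigma=sqrt(4))
direct = y.cdf(3)

# Standardize by hand: P[Y <= 3] = Phi((3 - 1) / sqrt(4)) = Phi(1)
by_hand = NormalDist().cdf((3 - 1) / sqrt(4))

print(round(direct, 4))   # 0.8413
print(round(by_hand, 4))  # 0.8413, the same value
```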
With two or more random variables, we rely on their joint distribution to characterize
their properties.
• If both X and Y are discrete, the joint probability mass function (PMF) is
fX ,Y (x, y ) = P[X = x, Y = y ].
• If both X and Y are continuous, the joint probability density function (PDF) is

  fX,Y (x, y) δ² ≈ P[x − δ/2 ≤ X ≤ x + δ/2, y − δ/2 ≤ Y ≤ y + δ/2].

More precisely,

  P[a ≤ X ≤ b, c ≤ Y ≤ d] = ∫ from c to d ∫ from a to b of fX,Y (x, y) dx dy.
• EXAMPLE. Assume X follows the standard normal distribution. Clearly X and X² are related,
but Cov[X, X²] = E[X³] − E[X] E[X²] = 0, since every odd moment of the standard normal is zero.
  Corr[X, Y] = Cov[X, Y] / (√V[X] √V[Y]).
• Correlation is unit-free (i.e., it does not depend on how X and Y are measured).
• EXAMPLE. Changing the unit of X from meters to centimeters will make the covariance 100
times larger, but leaves the correlation unchanged.
[Figure: scatterplot with Student-Teacher Ratio (10 to 30) on the horizontal axis.]
Positive, negative, and zero correlation: larger values of X are associated with larger,
smaller, or neither larger nor smaller values of Y, respectively.
Given the joint distribution of two random variables, we can also find the conditional
distribution of one variable after fixing the value of the other.
  fX|Y=y (x|y) = P[X = x|Y = y] = P[X = x, Y = y] / P[Y = y].
We can also compute the conditional expectation and the conditional variance as

  E[Y|X = x] = Σ over y of y · P[Y = y|X = x] = Σ over y of y · fY|X=x (y|x),

and

  V[Y|X = x] = Σ over y of (y − E[Y|X = x])² · P[Y = y|X = x] = Σ over y of (y − E[Y|X = x])² · fY|X=x (y|x).
X =0 X =1 X =3 Total
Y =1 0.16 0.06 0.22 0.44
Y =2 0.13 0.01 0.14 0.28
Y =5 0.06 0.19 0.03 0.28
Total 0.35 0.26 0.39 1.00
We usually treat conditional expectations as random variables. The reason is that the
value of the conditional expectation, E[Y |X ], depends on the value of X , where the latter
is random.
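As a sketch of how these conditional quantities are computed, the joint PMF from the table can be stored as a dictionary and the formulas applied directly (Python is an illustrative choice):

```python
# Joint PMF from the table: keys are (x, y), values are P[X = x, Y = y]
joint = {
    (0, 1): 0.16, (1, 1): 0.06, (3, 1): 0.22,
    (0, 2): 0.13, (1, 2): 0.01, (3, 2): 0.14,
    (0, 5): 0.06, (1, 5): 0.19, (3, 5): 0.03,
}

def marginal_x(x):
    """P[X = x]: sum the joint PMF over all values of y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def cond_exp_y(x):
    """E[Y | X = x] = sum over y of y * P[X = x, Y = y] / P[X = x]."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / marginal_x(x)

def cond_var_y(x):
    """V[Y | X = x]: probability-weighted squared deviations from E[Y | X = x]."""
    m = cond_exp_y(x)
    return sum((y - m) ** 2 * p
               for (xx, y), p in joint.items() if xx == x) / marginal_x(x)

print(round(marginal_x(0), 2))    # 0.35, matching the table's Total row
print(round(cond_exp_y(0), 4))    # (1*0.16 + 2*0.13 + 5*0.06) / 0.35
print(round(cond_var_y(0), 4))
```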
We already computed the values
  fX|Y=y (x|y) = fX,Y (x, y) / fY (y).
We can also compute the conditional expectation and the conditional variance as

  E[Y|X = x] = ∫ y fY|X=x (y|x) dy,

and

  V[Y|X = x] = ∫ (y − E[Y|X = x])² fY|X=x (y|x) dy.
• Expectation is linear
E[aX + bY ] = a · E[X ] + b · E[Y ].
• Important application: variance of the sum of n identically and independently distributed (iid)
variables:

  X1, X2, ..., Xn iid ⇒ V[X1 + X2 + ... + Xn] = n · V[X1].
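Both facts can be verified by exhaustive enumeration over a small discrete distribution. In the sketch below, the PMF and the constants a, b are arbitrary illustrative choices; note that linearity of expectation holds with or without independence, while the variance result does rely on the iid assumption:

```python
from itertools import product

# Toy PMF for illustration (same toy distribution as the earlier example)
pmf = {-1: 0.3, 2: 0.5, 3: 0.2}
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Linearity: E[aX + bY] = a E[X] + b E[Y], with X, Y independent copies here
a, b = 2.0, 3.0
e_lin = sum((a * x + b * y) * px * py
            for (x, px), (y, py) in product(pmf.items(), repeat=2))

# iid sum: V[X1 + X2 + X3] = 3 * V[X1], by enumerating all 27 outcomes
e_sum = 3 * mean
v_sum = sum((x1 + x2 + x3 - e_sum) ** 2 * p1 * p2 * p3
            for (x1, p1), (x2, p2), (x3, p3) in product(pmf.items(), repeat=3))

print(round(e_lin, 4))    # equals (a + b) * E[X] = 6.5
print(round(v_sum, 4))    # equals 3 * V[X] = 7.23
```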
Assume we have three random variables, X, Y and Z. The law of iterated expectations
says

  E[ E[X|Y, Z] | Z ] = E[X|Z].

• Intuition. The inner expectation, E[X|Y, Z], calculates the average value of X for each slice of
(Y, Z). The outer expectation then averages across different values of Y, holding Z fixed.
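The same logic can be checked numerically in its simplest form, the two-variable special case E[E[Y|X]] = E[Y], reusing the joint PMF from the table shown earlier (Python as an illustrative choice):

```python
# Joint PMF from the earlier table: keys are (x, y), values are P[X = x, Y = y]
joint = {
    (0, 1): 0.16, (1, 1): 0.06, (3, 1): 0.22,
    (0, 2): 0.13, (1, 2): 0.01, (3, 2): 0.14,
    (0, 5): 0.06, (1, 5): 0.19, (3, 5): 0.03,
}

xs = {x for x, _ in joint}

def p_x(x):
    """Marginal P[X = x]."""
    return sum(p for (xx, _), p in joint.items() if xx == x)

def e_y_given_x(x):
    """Inner expectation E[Y | X = x]."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x(x)

# Outer expectation: average E[Y | X = x] over the distribution of X
lie = sum(e_y_given_x(x) * p_x(x) for x in xs)

# Direct computation of E[Y] from the joint PMF
e_y = sum(y * p for (_, y), p in joint.items())

print(round(lie, 4))   # 2.4
print(round(e_y, 4))   # 2.4, the same value, as the law requires
```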
x1ma@ucsd.edu