Short Guides to Microeconometrics Kurt Schmidheiny
Fall 2021 Universität Basel
Elements of Probability Theory
Contents
1 Random Variables and Distributions
  1.1 Univariate Random Variables and Distributions
  1.2 Bivariate Random Variables and Distributions
2 Moments
  2.1 Expected Value or Mean
  2.2 Variance and Standard Deviation
  2.3 Higher Order Moments
  2.4 Covariance and Correlation
  2.5 Conditional Expectation and Variance
3 Random Vectors and Random Matrices
4 Important Distributions
  4.1 Univariate Normal Distribution
  4.2 Bivariate Normal Distribution
  4.3 Multivariate Normal Distribution
Version: 21-9-2021, 21:28
1 Random Variables and Distributions
A random variable is a variable whose values are determined by a probability distribution. This informal definition is sufficient for our level of analysis; in more advanced probability theory, a random variable is defined as a real-valued function over some probability space.
In sections 1 to 3, random variables are denoted by capital letters, e.g. X, whereas their realizations are denoted by small letters, e.g. x.
1.1 Univariate Random Variables and Distributions
A univariate discrete random variable is a variable that takes a countable
number K of real numbers with certain probabilities. The probability
that the random variable X takes the value xk among the K possible
realizations is given by the probability distribution
$$P(X = x_k) = P(x_k) = p_k$$

with $k = 1, 2, \ldots, K$, where $K$ may be $\infty$ in some cases. This can also be written as

$$P(x_k) = \begin{cases} p_1 & \text{if } X = x_1 \\ p_2 & \text{if } X = x_2 \\ \;\vdots & \\ p_K & \text{if } X = x_K \end{cases}$$
Note that

$$\sum_{k=1}^{K} p_k = 1.$$
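As a minimal numerical sketch (the distribution below is hypothetical), a discrete probability distribution can be stored as a mapping from realizations $x_k$ to probabilities $p_k$:

```python
# A hypothetical discrete distribution: realizations x_k -> probabilities p_k.
pmf = {0: 0.25, 1: 0.25, 2: 0.5}

# P(X = 2)
print(pmf[2])            # 0.5

# The probabilities must sum to one.
total = sum(pmf.values())
print(total)             # 1.0
```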
A univariate continuous random variable is a variable that takes a
continuum of values in the real line. The distribution of a continuous ran-
dom variable X can be characterized by a density function or probability
density function (pdf ) f (x). The nonnegative function f (x) is such that
$$P(x_1 \leq X \leq x_2) = \int_{x_1}^{x_2} f(x)\,dx$$
defines the probability that X takes a value in the interval $[x_1, x_2]$. Note that X takes any single value x with probability zero: $P(X = x) = 0$.
The probability that X takes any value on the real line is
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
The distribution of a univariate random variable X is alternatively
described by the cumulative distribution function (cdf )
$$F(x) = P(X < x).$$
The cdf of a discrete random variable X is

$$F(x) = \sum_{x_k \leq x} P(X = x_k) = \sum_{x_k \leq x} p_k$$

and of a continuous random variable X

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$
F (x) has the following properties:
• F (x) is monotonically nondecreasing
• F (−∞) = 0 and F (∞) = 1.
• F (x) is left-continuous (with the convention F (x) = P (X < x) used here)
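The link between the density and interval probabilities can be checked numerically; this sketch uses the exponential distribution as an illustrative choice and compares a numerical integral of the pdf with the closed-form cdf:

```python
import math

# Exponential distribution with rate lam (a hypothetical, illustrative choice):
lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)      # f(x) for x >= 0
cdf = lambda x: 1.0 - math.exp(-lam * x)      # F(x) = P(X < x)

# P(x1 <= X <= x2) as the integral of the density (trapezoidal rule).
x1, x2, n = 0.5, 1.5, 10_000
h = (x2 - x1) / n
integral = sum(pdf(x1 + i * h) for i in range(1, n)) * h
integral += (pdf(x1) + pdf(x2)) * h / 2

# The numerical integral agrees with F(x2) - F(x1).
print(abs(integral - (cdf(x2) - cdf(x1))) < 1e-8)  # True
```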
1.2 Bivariate Random Variables and Distributions
A bivariate continuous random variable is a variable that takes a contin-
uum of values in the plane. The distribution of a bivariate continuous
random variable (X, Y ) can be characterized by a joint density function
or joint probability density function, f (x, y). The nonnegative function
f (x, y) is such that
$$P(x_1 \leq X \leq x_2, y_1 \leq Y \leq y_2) = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x, y)\,dy\,dx$$

defines the probability that X and Y take values in the intervals $[x_1, x_2]$ and $[y_1, y_2]$, respectively. Note that

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1.$$
The marginal density function or marginal probability density function is given by

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

such that

$$P(x_1 \leq X \leq x_2) = P(x_1 \leq X \leq x_2, -\infty \leq Y \leq \infty) = \int_{x_1}^{x_2} f(x)\,dx.$$
The conditional density function or conditional probability density function of Y given the event {X = x} is

$$f(y|x) = \frac{f(x, y)}{f(x)}$$

provided that f (x) > 0. Note that

$$\int_{-\infty}^{\infty} f(y|x)\,dy = 1.$$
Two random variables X and Y are called independent if and only if

$$f(x, y) = f(x) \cdot f(y).$$
If X and Y are independent, then:
• f (y|x) = f (y)
• P (x1 ≤ X ≤ x2 , y1 ≤ Y ≤ y2 ) = P (x1 ≤ X ≤ x2 ) · P (y1 ≤ Y ≤ y2 )
More generally, if a finite set of n continuous random variables $X_1, X_2, X_3, \ldots, X_n$ are mutually independent, then

$$f(x_1, x_2, x_3, \ldots, x_n) = f(x_1) \cdot f(x_2) \cdot f(x_3) \cdot \ldots \cdot f(x_n).$$
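A small numerical sketch of the factorization under independence, using hypothetical discrete marginals (the same product form applies to densities):

```python
# Hypothetical marginal distributions of X and Y.
px = {0: 0.25, 1: 0.75}
py = {0: 0.5, 1: 0.5}

# Under independence the joint distribution is the product of the marginals.
joint = {(x, y): px[x] * py[y] for x in px for y in py}

# The joint probabilities again sum to one ...
print(sum(joint.values()))                       # 1.0
# ... and summing out y recovers the marginal of x.
print(sum(joint[(0, y)] for y in py) == px[0])   # True
```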
2 Moments
2.1 Expected Value or Mean
The expected value or mean of a discrete random variable with probability distribution $P(x_k)$, $k = 1, 2, \ldots, K$, is defined as

$$E[X] = \sum_{k=1}^{K} x_k P(x_k)$$

if the series converges absolutely.
The expected value or mean of a continuous univariate random variable
with density function f (x) is defined as
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$
if the integral exists.
For a random variable Z which is a continuous function φ of a discrete random variable X, we have

$$E[Z] = E[\varphi(X)] = \sum_{k=1}^{K} \varphi(x_k) P(x_k).$$

For a random variable Z which is a continuous function φ of one continuous random variable X, or of two continuous random variables X and Y, we have

$$E[Z] = E[\varphi(X)] = \int_{-\infty}^{\infty} \varphi(x) f(x)\,dx$$

$$E[Z] = E[\varphi(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \varphi(x, y) f(x, y)\,dx\,dy,$$

respectively.
The following rules hold in general, i.e. for discrete, continuous and
mixed types of random variables:
• E[α] = α
• E[αX + βY ] = αE[X] + βE[Y ]
• $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$
• E[X Y ] = E[X] E[Y ] if X and Y are independent
where α ∈ R and β ∈ R are constants.
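The linearity rule $E[\alpha X + \beta Y] = \alpha E[X] + \beta E[Y]$ can be checked exactly on a small hypothetical discrete joint distribution (note that no independence is needed):

```python
# A hypothetical joint distribution of (X, Y) and arbitrary constants a, b.
joint = {(0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.25, (1, 1): 0.25}
a, b = 2.0, -3.0

# Expectation of a function g(X, Y) under the joint distribution.
E = lambda g: sum(p * g(x, y) for (x, y), p in joint.items())

lhs = E(lambda x, y: a * x + b * y)
rhs = a * E(lambda x, y: x) + b * E(lambda x, y: y)
print(lhs == rhs)  # True
```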
2.2 Variance and Standard Deviation
The variance of a univariate random variable X is defined as
$$V[X] = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$
The variance has the following properties:
• V[X] ≥ 0
• V[X] = 0 if and only if X = E[X] with probability one
The following rules hold in general, i.e. for discrete, continuous and mixed
types of random variables:
• V[αX + β] = α2 V[X]
• V[X + Y ] = V[X] + V[Y ] + 2Cov[X, Y ]
• V[X − Y ] = V[X] + V[Y ] − 2Cov[X, Y ]
• $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i]$ if $X_i$ and $X_j$ are independent for all $i \neq j$
where α ∈ R and β ∈ R are constants.
Instead of the variance, one often considers the standard deviation

$$\sigma_X = \sqrt{V[X]}.$$
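A quick exact check of the rule $V[\alpha X + \beta] = \alpha^2 V[X]$ on a hypothetical discrete distribution, using the shortcut $V[X] = E[X^2] - (E[X])^2$:

```python
# A hypothetical discrete pmf and constants a, b.
pmf = {0: 0.25, 1: 0.25, 2: 0.5}
a, b = 3.0, 5.0

# Expectation and variance of a function g(X) under the pmf.
E = lambda g: sum(p * g(x) for x, p in pmf.items())
V = lambda g: E(lambda x: g(x) ** 2) - E(g) ** 2

# V[aX + b] = a^2 V[X]: the additive constant b drops out.
print(V(lambda x: a * x + b) == a ** 2 * V(lambda x: x))  # True
```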
2.3 Higher Order Moments
The j-th central moment (the j-th moment around the mean) is defined as

$$E\left[(X - E[X])^j\right].$$
2.4 Covariance and Correlation
The covariance between two random variables X and Y is defined as

$$\begin{aligned}
Cov[X, Y] &= E\left[(X - E[X])(Y - E[Y])\right] \\
&= E[XY] - E[X]E[Y] \\
&= E\left[(X - E[X])\,Y\right] = E\left[X\,(Y - E[Y])\right]
\end{aligned}$$
The following rules hold in general, i.e. for discrete, continuous and
mixed types of random variables:
• Cov[αX + γ, βY + µ] = αβCov[X, Y ]
• Cov[X1 + X2 , Y1 + Y2 ]
= Cov[X1 , Y1 ] + Cov[X1 , Y2 ] + Cov[X2 , Y1 ] + Cov[X2 , Y2 ]
• Cov[X, Y ] = 0 if X and Y are independent
where α ∈ R, β ∈ R, γ ∈ R and µ ∈ R are constants.
The correlation coefficient between two random variables X and Y is
defined as:
$$\rho_{X,Y} = \frac{Cov[X, Y]}{\sigma_X \sigma_Y}$$
where σX and σY denote the corresponding standard deviations. The
correlation coefficient has the following property:
• −1 ≤ ρX,Y ≤ 1
The following rules hold:
• $\rho_{\alpha X + \gamma,\, \beta Y + \mu} = \rho_{X,Y}$ provided $\alpha\beta > 0$
• $\rho_{X,Y} = 0$ if X and Y are independent
where α ∈ R, β ∈ R, γ ∈ R and µ ∈ R are constants.
We say that
• X and Y are uncorrelated if ρ = 0
• X and Y are positively correlated if ρ > 0
• X and Y are negatively correlated if ρ < 0
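A numerical sketch computing the covariance and the correlation coefficient for a hypothetical discrete joint distribution:

```python
import math

# A hypothetical joint distribution of (X, Y).
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}

E = lambda g: sum(p * g(x, y) for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

# Cov[X, Y] = E[XY] - E[X]E[Y]
cov = E(lambda x, y: x * y) - EX * EY
# standard deviations of X and Y
sx = math.sqrt(E(lambda x, y: x * x) - EX ** 2)
sy = math.sqrt(E(lambda x, y: y * y) - EY ** 2)
rho = cov / (sx * sy)

print(cov)  # 0.125
print(rho)  # 0.5
```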
2.5 Conditional Expectation and Variance
Let (X, Y) be a bivariate discrete random variable and $P(y_k|X)$ the conditional probability of $Y = y_k$ given X. Then the conditional expected value or conditional mean of Y given X is

$$E[Y|X] = E_{Y|X}[Y] = \sum_{k=1}^{K} y_k P(y_k|X).$$
Let (X, Y ) be a bivariate continuous random variable and f (y|x) the
conditional density of Y given X. Then the conditional expected value or
conditional mean of Y given X is
$$E[Y|X] = E_{Y|X}[Y] = \int_{-\infty}^{\infty} y f(y|X)\,dy.$$
The law of iterated means or law of iterated expectations holds in general, i.e. for discrete, continuous or mixed random variables:

$$E_X\left[E[Y|X]\right] = E[Y].$$
The conditional variance of Y given X is given by

$$V[Y|X] = E\left[(Y - E[Y|X])^2 \,\middle|\, X\right] = E\left[Y^2 \,\middle|\, X\right] - (E[Y|X])^2.$$

The law of total variance is

$$V[Y] = E_X\left[V[Y|X]\right] + V_X\left[E[Y|X]\right].$$
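Both laws can be verified exactly on a small hypothetical discrete example:

```python
# A hypothetical joint distribution of (X, Y) and the implied marginal of X.
joint = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
px = {0: 0.5, 1: 0.5}

# Conditional pmf of Y given X = x, and conditional mean E[Y|X = x].
cond = lambda y, x: joint[(x, y)] / px[x]
E_Y_given = lambda x: sum(y * cond(y, x) for y in (0, 1))

# Law of iterated expectations: E_X[E[Y|X]] = E[Y].
EY = sum(p * y for (x, y), p in joint.items())
iterated = sum(px[x] * E_Y_given(x) for x in px)
print(iterated == EY)  # True

# Law of total variance: V[Y] = E_X[V[Y|X]] + V_X[E[Y|X]].
V_Y_given = lambda x: sum(y * y * cond(y, x) for y in (0, 1)) - E_Y_given(x) ** 2
VY = sum(p * y * y for (x, y), p in joint.items()) - EY ** 2
within = sum(px[x] * V_Y_given(x) for x in px)
between = sum(px[x] * E_Y_given(x) ** 2 for x in px) - EY ** 2
print(abs(VY - (within + between)) < 1e-15)  # True
```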
3 Random Vectors and Random Matrices
In this section we denote matrices (random or non-random) by bold cap-
ital letters, e.g. X and vectors by small letters, e.g. x.
Let $x = (x_1, \ldots, x_n)'$ be an (n × 1)-dimensional vector such that each element $x_i$ is a random variable. Let X be an (n × k)-dimensional matrix such that each element $x_{ij}$ is a random variable. Let $a = (a_1, \ldots, a_n)'$ be an (n × 1)-dimensional vector of constants and A an (m × n) matrix of constants.
The expectation of a random vector, E[x], and of a random matrix, E[X], summarizes the expected values of its elements, respectively:

$$E[x] = \begin{pmatrix} E[x_1] \\ E[x_2] \\ \vdots \\ E[x_n] \end{pmatrix} \quad \text{and} \quad E[X] = \begin{pmatrix} E[x_{11}] & E[x_{12}] & \ldots & E[x_{1k}] \\ E[x_{21}] & E[x_{22}] & \ldots & E[x_{2k}] \\ \vdots & \vdots & \ddots & \vdots \\ E[x_{n1}] & E[x_{n2}] & \ldots & E[x_{nk}] \end{pmatrix}$$
The following rules hold:
• E[a0 x] = a0 E[x]
• E[Ax] = AE[x]
• E[AX] = AE[X]
• E[tr(X)] = tr(E[X]) for X a square matrix
The variance-covariance matrix of a random vector, V[x], summarizes all variances and covariances of its elements:

$$V[x] = E\left[(x - E[x])(x - E[x])'\right] = E[xx'] - E[x]\,(E[x])'$$

$$= \begin{pmatrix} V[x_1] & Cov[x_1, x_2] & \ldots & Cov[x_1, x_n] \\ Cov[x_2, x_1] & V[x_2] & \ldots & Cov[x_2, x_n] \\ \vdots & \vdots & \ddots & \vdots \\ Cov[x_n, x_1] & Cov[x_n, x_2] & \ldots & V[x_n] \end{pmatrix}.$$
The following rules hold:
• V[a0 x] = a0 V[x] a
• V[Ax] = A V[x] A0
where the (m × n) dimensional matrix A with m ≤ n has full row rank.
If the variance-covariance matrix V[x] is positive definite (p.d.), then all random elements and all linear combinations of its random elements have strictly positive variance:

$$V[a'x] = a'\, V[x]\, a > 0 \quad \text{for all } a \neq 0.$$
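The rule $V[Ax] = A\,V[x]\,A'$ can be sketched numerically with numpy; the matrix below is hypothetical and A is a single row, so the result reproduces $V[x_1 + x_2] = V[x_1] + V[x_2] + 2\,Cov[x_1, x_2]$:

```python
import numpy as np

# A hypothetical variance-covariance matrix of x = (x1, x2)'.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0]])      # a'x = x1 + x2, one linearly independent row

# V[Ax] = A Sigma A'
V_Ax = A @ Sigma @ A.T

# V[x1 + x2] = V[x1] + V[x2] + 2 Cov[x1, x2] = 2 + 1 + 2 * 0.5 = 4
print(V_Ax)  # [[4.]]
```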
4 Important Distributions
4.1 Univariate Normal Distribution
The density of the univariate normal distribution is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
The normal distribution is characterized by the two parameters µ and
σ. The mean of the normal distribution is E[X] = µ and the variance
V[X] = σ 2 . We write X ∼ N(µ, σ 2 ).
The univariate normal distribution with mean µ = 0 and variance $\sigma^2 = 1$ is called the standard normal distribution N(0, 1).
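The standard normal density and cdf can be evaluated with the Python standard library (the cdf via the error function); values for general N(µ, σ²) follow by the change of variable z = (x − µ)/σ. A minimal sketch:

```python
import math

# Standard normal density phi and cdf Phi (via math.erf).
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(phi(0.0), 6))   # 0.398942
print(round(Phi(0.0), 6))   # 0.5
print(round(Phi(1.96), 3))  # 0.975
```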
4.2 Bivariate Normal Distribution
The density of the bivariate normal distribution is

$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 - 2\rho\,\frac{x - \mu_X}{\sigma_X}\,\frac{y - \mu_Y}{\sigma_Y}\right]\right\}.$$
If (X, Y ) follows a bivariate normal distribution, then:
• The marginal densities f (x) and f (y) are univariate normal.
• The conditional densities f (x|y) and f (y|x) are univariate normal.
• $E[X] = \mu_X$, $V[X] = \sigma_X^2$, $E[Y] = \mu_Y$, $V[Y] = \sigma_Y^2$.
• The correlation coefficient between X and Y is ρX,Y = ρ.
• $E[Y|X] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(X - \mu_X)$ and $V[Y|X] = \sigma_Y^2(1 - \rho^2)$.
The above properties characterize the normal distribution. It is the only
distribution with all these properties.
Further important properties:
• If (X, Y) follows a bivariate normal distribution, then aX + bY is also normally distributed:

$$aX + bY \sim N\left(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho\,\sigma_X\sigma_Y\right).$$
The reverse implication is not true.
• If X and Y are bivariate normally distributed with Cov[X, Y ] = 0,
then X and Y are independent.
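The conditional-mean property can be illustrated by simulation (with hypothetical parameters): the population regression slope of Y on X equals $\rho\,\sigma_Y/\sigma_X$, so the sample slope from a large bivariate normal sample should be close to it:

```python
import numpy as np

# Hypothetical parameters of a bivariate normal distribution.
rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
sx, sy, rho = 1.0, 2.0, 0.5
Sigma = np.array([[sx**2, rho * sx * sy],
                  [rho * sx * sy, sy**2]])

draws = rng.multivariate_normal(mu, Sigma, size=200_000)
x, y = draws[:, 0], draws[:, 1]

# Sample regression slope Cov[X, Y] / V[X] vs. the theoretical rho*sy/sx = 1.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(abs(slope - rho * sy / sx) < 0.05)  # True
```

The tolerance is loose because the sample slope is only a simulation-based approximation of the population value.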
4.3 Multivariate Normal Distribution
Let $x = (x_1, \ldots, x_n)'$ be an n-dimensional vector such that each element $x_i$ is a random variable. In addition, let $E[x] = \mu = (\mu_1, \ldots, \mu_n)'$ and $V[x] = \Sigma$ with

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \ldots & \sigma_{nn} \end{pmatrix}$$

where $\sigma_{ij} = Cov[x_i, x_j]$.
An n-dimensional random variable x is multivariate normally distributed with mean µ and variance-covariance matrix Σ, x ∼ N(µ, Σ), if its density is

$$f(x) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left\{-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right\}.$$
Let x ∼ N(µ, Σ) and A an (m × n) matrix with m ≤ n and m linearly independent rows. Then

$$Ax \sim N(A\mu,\; A\Sigma A').$$
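A sketch of the transformation rule with hypothetical µ, Σ and A, computing the implied mean Aµ and variance-covariance matrix AΣA′ of Ax:

```python
import numpy as np

# Hypothetical parameters: x ~ N(mu, Sigma) with independent components here.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.diag([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])    # m = 2 linearly independent rows, m <= n

# Ax ~ N(A mu, A Sigma A')
mean_Ax = A @ mu
cov_Ax = A @ Sigma @ A.T

print(mean_Ax.tolist())  # [1.0, -1.0]
print(cov_Ax.tolist())   # [[3.0, 2.0], [2.0, 5.0]]
```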