Generating Random Variables Mpress
Generating Random Variables Mpress
Some History
Zi = (aZi−1 + c) mod m
where m, a, c and Z0 are non-negative integers. We say that m is the modulus and that Z0 is the seed. The
sequence that we obtain clearly satisfies 0 ≤ Zi < m . In order to generate pseudo-random numbers,
U1 , . . . , Un , . . ., we set Ui = Zi /m. Note that Ui ∈ (0, 1) for each i.
We now act as though the Ui ’s constitute an independent sequence of U (0, 1) random variables.
Generating Random Variables and Stochastic Processes                                                          2
                                            Z0   =   1
                                            Z1   =   (11) mod 16 = 11
                                            Z2   =   (121) mod 16 = 9
                                            Z3   =   (99) mod 16 = 3
                                            Z4   =   (33) mod 16 = 1
                                            Z5   =   ···
                                            Z6   =   ···
Possible Objections
  1. The Zi ’s are notrandom?
  2. They can only take on a finite number of values
  3. The period of the generator can be very poor: see previous example
Responses
Theorem 1 The linear congruential generator has full period if and only if the following 3 conditions hold:
  1. If 4 divides m, then 4 divides a − 1
  2. The only positive integer that exactly divides both m and c is 1, i.e., m and c are relatively prime
  3. If q is a prime number that divides m, then it divides a − 1
                                            Z0   =   1
                                            Z1   =   (26) mod 16 = 10
                                            Z2   =   (143) mod 16 = 15
                                            Z3   =   (248) mod 16 = 0
                                            Z4   =   (13) mod 16 = 13
                                            Z5   =   ···
Constructing and testing good random number generators is very important! However, we will not study these
issues in this course. Instead, we will assume that we have a good random number generator available to us and
we will use it as a black box. See Law and Kelton, Chapter 7, for an excellent treatment and further details.
                   > m = mean(x);
                   % compute the mean of x
θ = E[g(U )]
Proof of Consistency
Example 4
                                      R3
Suppose we wish to estimate θ =        1
                                           (x2 + x) dx using simulation:
   • Know the exact answer is 12.67
   • Using simulation
        – note that                                     Z       3
                                                                    x2 + x
                                                θ = 2                      dx = 2E[X 2 + X]
                                                            1          2
           where X ∼ U (1, 3)
        – so generate n U (0, 1) independent variables
Generating Random Variables and Stochastic Processes                                                                    5
We can also apply Monte Carlo integration to more general problems. For example, if we want to estimate
                                           Z Z
                                       θ=        g(x, y)f (x, y) dx dy
                                                                A
Generating Random Variables and Stochastic Processes                                                               6
θ = E[g(X, Y )]
In the next part of this lecture, we will begin learning how to generate random variables that are not uniformly
distributed.
where p1 + p2 + p3 = 1. We would like to generate a value of X and we can do this by using our U (0, 1)
generator:
    1. Generate U
    2. Set                                   
                                              x1 ,         if     0 ≤ U ≤ p1
                                          X=   x2 ,         if     p1 < U ≤ p 1 + p 2
                                             
                                               x3 ,         if     p1 + p2 < U ≤ 1
P(X = x1 ) = P(0 ≤ U ≤ p1 ) = p1
since U is U (0, 1). The same is true for P(X = x2 ) and P(X = x3 ).
More generally, suppose X can take on n distinct values, x1 < x2 < . . . < xn , with
P(X = xi ) = pi for i = 1, . . . , n.
You should convince yourself that this is correct! How does this compare to the coin-tossing method for
generating X?
   2. Set
                                             X = j if F(j − 1) < U ≤ F(j)
                                     set j = 0, p = e−λ , F = p
                                     while U > F
                                           set p = λp/(j + 1), F = F + p, j = j + 1
                                     set X = j
= P(U ≤ Fx (x))
= Fx (x)
  1. Generate U
  2. Set
                                             X = min{x : Fx (x) ≥ U }
This works for discrete and continuous random variables or mixtures of the two.
So if Y := X1 + . . . + Xn then Y ∼ Gamma(n, λ) !
      where c is a constant
   • How can we use this?
Question: Can we do even better?
Generating Random Variables and Stochastic Processes                                                           10
• Monotonicity
        – inducing correlation
        – ‘1 to 1’, i.e. one U (0, 1) variable produces one X variable
   • F−1
      x may not always be computable
Suppose now that it’s difficult to simulate X directly using the inverse transform method. Then we could use
the composition method instead.
Generating Random Variables and Stochastic Processes                                                             11
Composition Algorithm:
P(I = j) = pj
      How do we do this?
   2. If I = j, then simulate Yj from Fj
3. Set X = Yj
The proof actually suggests that the composition approach might arise naturally from ‘sequential’ type
experiments. Consider the following example.
                                                   α1    =   p1
                                                    α2 =     p2
                                                f1 (x) =     λ1 e−λ1 x
                                                f2 (x) =     λ2 e−λ2 x
                                     generate U1
                                     if U1 ≤ p1 then
                                             set i = 1
                                     else  set i = 2
                                     generate U2
                                     /∗ Now generate X from Exp(λi ) ∗/
                                     set
                                               1
                                        X = − log(U2 )
                                              λi
Example 13 (Splitting)
Suppose
                                             1                 6
                                       fx (x) = 1[−1,0] (x) +    1[0,2] (x).
                                             5                15
How do we simulate a value of X using vertical splitting?
How would horizontal splitting work?
                                                               P ((Y ≤ x) ∩ B)
                                                       =                       .                             (1)
                                                                     P(B)
Then the denominator in (1) is given by
                                                        µ          ¶
                                                            f (Y )
                                              P(B)   = P U≤
                                                            ag(Y )
                                                           1
                                                     =
                                                           a
while the numerator in (1) satisfies
                                  Z    ∞
        P ((Y ≤ x)  ∩  B)    =              P ((Y ≤ x) ∩ B | Y = y) g(y) dy
                                       −∞
                                   Z µ ∞         µ              ¶ ¯      ¶
                                                         f (Y )   ¯
                              =     P (Y ≤ x) ∩ U ≤               ¯ Y = y g(y) dy
                                 −∞                     ag(Y )
                                Z x  µ           ¶
                                           f (y)
                              =     P U≤           g(y) dy (why?)
                                 −∞       ag(y)
                                   Fx (x)
                              =
                                     a
Therefore P(X ≤ x) = Fx (x), as required.
We could, for example, integrate f (·) to find F(·), and then try to use the inverse transform approach. However,
it is hard to find F−1 (·). Instead, let’s use the acceptance-rejection algorithm:
   1. First choose g(y): let’s take g(y) = 1 for y ∈ [0, 1], i.e., Y ∼ U (0, 1)
   2. Then find a. Recall that we must have
                                                      f (x)
                                                            ≤ a for all x,
                                                      g(x)
      which implies
                                              60x3 (1 − x)2 ≤ a for all x ∈ [0, 1].
      So take a = 3. It is easy to check that this value works. We then have the following algorithm.
Generating Random Variables and Stochastic Processes                                                            14
Algorithm
                                           generate Y ∼ U (0, 1)
                                           generate U ∼ U (0, 1)
                                           while U > 20Y 3 (1 − Y )2
                                                 generate Y
                                                 generate U
                                           set X = Y
Let N be the number of loops in the A-R algorithm until acceptance, and as before, let B be the event that Y
                                      f (Y )
has been accepted in a loop, i.e. U ≤ ag(Y  ) . We saw earlier that P(B) = 1/a.
Questions:
1: What is the distribution of N ?
2: What is E[N ]?
How Do We Choose a ?
E[N ] = a, so clearly we would like a to be as small as possible. Usually, this is just a matter of calculus.
                                            f (x)
                                                  ≤a        for all x ∈ [0, 1]
                                            g(x)
and we chose a = 3.
We can do better by choosing
                                                            f (x)
                                             a = max              ≈ 2.073.
                                                  x∈[0,1]   g(x)
         – if g(·) ‘close’ to f (·) then will probably also be hard to simulate from g(·)
   • So often need to find a balance between having a ‘nice’ g(·) and a small a
Generating Random Variables and Stochastic Processes                                                          15
                                               generate Y
                                               generate U
                                         set X = Y
Generally, we would use this A-R algorithm when we can simulate Y efficiently.
We briefly mentioned this earlier in Example 9 when we described how to generate a Gamma(λ, n) random
variable. The convolution method is not always the most efficient method. Why?
X ∼ g(Y1 , . . . , Yn )
for some random variables Yi and some function g(·). (Note the Yi ’s need not necessarily be IID.) If we know
how to generate each of the Yi ’s then we can generate X as follows:
Example 17 (Convolution)
The convolution method is a special case where g(Y1 , . . . , Yn ) = Y1 + . . . + Yn .
                                                         X
                                                    Z := q
                                                             Y
                                                             n
Note: The proof of the statements in these examples can be found in many probability or statistics textbooks
    • Will therefore look at the following methods for generating N(0, 1) random variables
        1. Box-Muller method
        2. Polar method
        3. Rational approximations
The Box-Muller Algorithm for Generating Two IID N(0, 1) Random Variables
We now show that this algorithm does indeed produce two IID N(0, 1) random variables, X and Y .
Proof: We need to show that
                                                 µ 2¶      µ 2¶
                                             1      x  1      y
                                 f (x, y) = √ exp −   √ exp −
                                             2π     2  2π     2
First, make a change of variables:
                                                               p
                                                        R   :=   X2 + Y 2
                                                                     µ ¶
                                                                      Y
                                                        θ   := tan−1
                                                                      X
Since U1 and U2 are IID, R and θ are independent. Clearly θ ∼ U (0, 2π) so fθ (θ) = 1/2π for 0 ≤ θ ≤ 2π.
                                            2
                                                /2
It is also easy to see that fR (r) = re−r            , so
                                                                          1 −r2 /2
                                                      fR,θ (r, θ) =         re     .
                                                                         2π
This implies
and note that dxdy = rdrdθ, i.e., the Jacobian of the transformation is r.
We then have                                                    Z   x1   Z   y1    µ              ¶
                                                1                                     (x2 + y 2 )
                         P(X ≤ x1 , Y ≤ y1 ) =                                  exp −               dxdy
                                               2π                −∞          −∞           2
Generating Random Variables and Stochastic Processes                                                               19
                                            Z   x1                         Z   y1
                                      1                        1
                                    =√         exp(−x2 /2) dx √                     exp(−y 2 /2) dy
                                       2π   −∞                  2π          −∞
as required.
The Polar Algorithm for Generating Two IID N(0, 1) Random Variables
                               set t = 0, I = 0
                               generate U
                               set t = t − log(U )/λ
                               while t < T
                                      set I = I + 1, S(I) = t
                                      generate U
                                      set t = t − log(U )/λ
Then it can be shown that N (t + s) − N (t) is a Poisson random variable with parameter m(t + s) − m(t), i.e.,
                                                               exp (−mt,s ) (mt,s )r
                                 P (N (t + s) − N (t) = r) =
                                                                      r!
where mt,s := m(t + s) − m(t).
Generating Random Variables and Stochastic Processes                                                          21
Proposition 2 Let N (t) be a Poisson process with constant intensity λ. Suppose that an arrival that occurs
at time t is counted with probability p(t), independently of what has happened beforehand. Then the process of
counted arrivals is a non-homogeneous Poisson process with intensity λ(t) = λp(t).
Suppose now N (t) is a non-homogeneous Poisson process with intensity λ(t) and that there exists a λ such that
λ(t) ≤ λ for all t ≤ T . Then we can use the following algorithm, based on Proposition 2, to simulate N (t).
                           set t = 0, I = 0
                           generate U1
                           set t = t − log(U1 )/λ
                           while t < T
                                  generate U2
                                  if U2 ≤ λ(t)/λ then
                                        set I = I + 1, S(I) = t
                                  generate U1
                                  set t = t − log(U1 )/λ
Questions
1) Can you give a more efficient version of the algorithm when there exists λ > 0 such that min0≤t≤T λ(t) ≥ λ?
2) Can you think of another algorithm for simulating a non-homogeneous Poisson process that is not based on
thinning?
Notation
Xt = x + µt + σBt
where B is an SBM. We will usually write a B(µ, σ 2 ) Brownian motion in this way.
Remark 1 Bachelier (1900) and Einstein (1905) were the first to explore Brownian motion from a
mathematical viewpoint whereas Wiener (1920’s) was the first to show that it actually exists as a well-defined
mathematical entity.
Questions
1) What is E[Bt+s Bs ]?
2) What is E[Xt+s Xs ] where X ∼ B(µ, σ)?
3) Let B be an SBM and let Zt := |Bt |. What is the CDF of Zt for t fixed?
opposed to the statistical error), in estimating θ. We will return to this topic at the end of the course when we
learn how to simulate stochastic differential equations.
In either case, we need to be able to simulate Bti for t1 < t2 < . . . < tn and for a fixed n. We will now see how
to do this. The first observation we make is that
                         set t0 = 0, Bt0 = 0
                         for i = 1 to n
                                 generate X ∼ N(0, ti − ti−1 ))
                                 set Bti = Bti−1 + X
Remark 2 When we generate (Bt1 , Bt2 , . . . , Btn ) we are actually generating a random vector that does not
consist of IID random variables. In fact the method that we use to simulate (Bt1 , Bt2 , . . . , Btn ) is one of the
most common methods for generating correlated random variables and stochastic processes. We will return to
this later.
Remark 3 It is very important that when you generate Bti+1 , you do so conditional on the value of Bti . If you
generate Bti and Bti+1 independently of one another then you are effectively simulating from different sample
paths of the Brownian motion. This is not correct!
The following properties of GBM follow immediately from the definition of BM:
                                       Xt2       Xt3        Xtn
  1. Fix t1 , t2 , . . . , tn . Then   Xt1   ,   Xt3   ,   Xtn−1   are mutually independent.
                        ³          ´      ¡                    ¢
                            Xt+s
  2. For s > 0, log          Xt        ∼ N (µ − σ 2 /2)s, σ 2 s .
  3. Xt is continuous.
Again, we call µ the drift and σ the volatility. If X ∼ GBM (µ,
                                                             ¡ σ), then note that
                                                                                ¢ Xt has a lognormal
distribution. In particular, if X ∼ GBM(µ, σ), then Xt ∼ LN (µ − σ 2 /2)t, σ 2 t .
Question: How would you simulate a sample path of GBM (µ, σ 2 ) at the fixed times 0 < t1 < t2 < . . . < tn ?
Answer: Simulate log(Xti ) first and then take exponentials! (See below for more details.)
         where B is a standard BM. The drift is µ, σ is the volatility and S is a therefore a GBM (µ, σ) process
         that begins at S0
Questions
1) What is E[St ]?
2) What is E[St2 ]?
                              2
2) Show St+s = St e(µ−σ           /2)s+σ(Bt+s −Bt )
                                                           .