Understanding Stationary Time Series

This document discusses stationary and non-stationary time series. It defines a stationary time series as having a constant mean, variance, and covariance over time. Examples provided include white noise (WN), moving average (MA), and autoregressive (AR) processes. Non-stationary examples provided have means or variances that change over time. Autocorrelation (ACF) and partial autocorrelation (PACF) functions are discussed to help determine if a stationary series is white noise or another type of process.

Uploaded by

Carmen Orazzo


02 Stationary time series

Andrius Buteikis, andrius.buteikis@mif.vu.lt

http://web.vu.lt/mif/a.buteikis/
Introduction
All time series may be divided into two big classes - stationary and
non-stationary.

- Stationary process - a random process with a constant mean, variance and covariance. Examples of stationary time series:

[Figure: three simulated series plotted against Time (0 to 200): WN, mean = 0 (x1); MA(3), mean = 5; AR(1), mean = 5 (x3).]

The three example processes fluctuate around their constant mean values. Judging by the graphs, the spread of the fluctuations looks constant for the first two series, but this is less apparent for the third one.
If we plot the last time series for a longer time period:

[Figure: the AR(1), mean = 5 series (x3) plotted against Time, first for 0 to 200 and then for 0 to 400.]

We can see that the fluctuations are indeed around a constant mean and
the variance does not appear to change throughout the period.
Some non-stationary time series examples:

- $Y_t = t + \epsilon_t$, where $\epsilon_t \sim N(0, 1)$;
- $Y_t = \epsilon_t \cdot t$, where $\epsilon_t \sim N(0, \sigma^2)$;
- $Y_t = \sum_{j=1}^{t} Z_j$, where each independent variable $Z_j$ is either $1$ or $-1$, with a 50% probability for either value.

The reasons for their non-stationarity are as follows:

- The first time series is not stationary because its mean is not constant: $E Y_t = t$ depends on $t$;
- The second time series is not stationary because its variance is not constant: $\mathrm{Var}(Y_t) = t^2 \cdot \sigma^2$ depends on $t$. However, $E Y_t = t \cdot E\epsilon_t = 0$ is constant;
- The third time series is not stationary because, even though $E Y_t = \sum_{j=1}^{t} (0.5 \cdot 1 + 0.5 \cdot (-1)) = 0$, the variance $\mathrm{Var}(Y_t) = E(Y_t^2) - (E Y_t)^2 = E(Y_t^2) = t$ depends on $t$, where:

$$E(Y_t^2) = \sum_{j=1}^{t} E(Z_j^2) + \sum_{j \neq k} E(Z_j Z_k) = t \cdot (0.5 \cdot 1^2 + 0.5 \cdot (-1)^2) + 0 = t$$

since $E(Z_j Z_k) = E Z_j \cdot E Z_k = 0$ for independent $Z_j$ and $Z_k$, $j \neq k$.
The sample data graphs are provided below:
[Figure: three sample series plotted against Index (0 to 50): "non stationary in mean" (ns1), "non stationary in variance" (ns2), "no clear tendency" (ns3).]

- White noise (WN) - a stationary process of uncorrelated (sometimes we may demand a stronger property of independence) random variables with zero mean and constant variance. White noise is a model of an absolutely chaotic process of uncorrelated observations - it is a process that immediately forgets its past.

How can we know which of the previous three stationary graphs are not WN? Two functions help us determine this:

- ACF - Autocorrelation function
- PACF - Partial autocorrelation function

If all the bars (except the 0th in the ACF) are within the blue band, the stationary process is WN.
[Figure: sample ACF (top row) and sample PACF (bottom row) against Lag for the WN, MA(3) and AR(1) series.]

The 95% confidence intervals are calculated from:

qnorm(p = c(0.025, 0.975))/sqrt(n)

(more details on the confidence interval calculation are provided later in

these slides)
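# Plot the sample ACF and PACF of simulated white noise side by side
par(mfrow = c(1, 2))
set.seed(10)
n <- 50
x0 <- rnorm(n)  # Gaussian white noise with n observations
acf(x0)
# Add the 95% bounds by hand (red) for comparison with the default band
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")
pacf(x0)
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")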

[Figure: sample ACF and sample PACF of Series x0 against Lag, with the red 95% bounds.]
Covariance-Stationary Time Series
- In cross-sectional data, different observations were assumed to be uncorrelated;
- In time series, we require that there be some dynamics, some persistence, some way in which the present is linked to the past, and the future to the present. Having historical data then allows us to forecast the future.

If we want to forecast a series, at a minimum we would like its mean and covariance structure to be stable over time. In that case, we would say that the series is covariance stationary. There are two requirements for this to be true:

1. The mean of the series is stable over time: $E Y_t = \mu$;

2. The covariance structure is stable over time.

In general, the (auto)covariance between $Y_t$ and $Y_{t-\tau}$ is:

$$\gamma(t, \tau) = \mathrm{cov}(Y_t, Y_{t-\tau}) = E\left[(Y_t - \mu)(Y_{t-\tau} - \mu)\right]$$

If the covariance structure is stable, then the covariance depends on $\tau$ but not on $t$: $\gamma(t, \tau) = \gamma(\tau)$. Note: $\gamma(0) = \mathrm{cov}(Y_t, Y_t) = \mathrm{Var}(Y_t) < \infty$.
Remark
When observing/measuring time series we obtain numbers $y_1, ..., y_T$ which are the realization of random variables $Y_1, ..., Y_T$.
Using probabilistic concepts, we can give a more precise definition of a (weak) stationary series:

- If $E Y_t = \mu$ - the process is called mean-stationary;
- If $\mathrm{Var}(Y_t) = \sigma^2 < \infty$ - the process is called variance-stationary;
- If $\gamma(t, \tau) = \gamma(\tau)$ - the process is called covariance-stationary.

In other words, a time series $Y_t$ is stationary if its mean, variance and covariance do not depend on $t$.
If at least one of the three requirements is not met, then the process is non-stationary.
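Since we often work with the (auto)correlation between $Y_t$ and $Y_{t-\tau}$ rather than the (auto)covariance (because correlations are easier to interpret), we can calculate the autocorrelation function (ACF):

$$\rho(\tau) = \frac{\mathrm{cov}(Y_t, Y_{t-\tau})}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_{t-\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}$$

Note: $\rho(0) = 1$, $|\rho(\tau)| \leq 1$.

The partial autocorrelation function (PACF) measures the association between $Y_t$ and $Y_{t-k}$ after controlling for the intermediate lags $Y_{t-1}, ..., Y_{t-k+1}$:

$$p(k) = \beta_k, \text{ where } Y_t = \alpha + \beta_1 Y_{t-1} + ... + \beta_k Y_{t-k} + \epsilon_t$$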
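The regression definition of the PACF can be tried out directly. The following is a minimal sketch, assuming a simulated AR(1) series with coefficient 0.7 as example data; the helper name `pacf_by_regression` is hypothetical. The built-in `pacf()` uses a different estimation method, so the two sets of values should be close but need not be identical.

```r
# Sketch: PACF at lag k as the last coefficient of an OLS regression on k lags.
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 2000))  # assumed example data

pacf_by_regression <- function(y, k) {
  lagged <- embed(y, k + 1)                 # columns: Y_t, Y_{t-1}, ..., Y_{t-k}
  fit <- lm(lagged[, 1] ~ lagged[, -1, drop = FALSE])
  unname(coef(fit)[k + 1])                  # coefficient 1 is the intercept, so beta_k is at k + 1
}

p_reg <- sapply(1:3, function(k) pacf_by_regression(y, k))
p_builtin <- as.numeric(pacf(y, lag.max = 3, plot = FALSE)$acf)
print(round(cbind(regression = p_reg, builtin = p_builtin), 3))
```

For an AR(1) series, the lag 1 value should be near the AR coefficient and the higher lags near zero.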

The sample autocorrelation coefficient at lag $k$, $r_k$, is normally distributed in the limit, and its variance can be approximated by:

$$\mathrm{Var}(r_k) \approx \frac{1}{T}$$

(where $T$ is the number of observations).
As such, we want to create lower and upper 95% confidence bounds for the normal distribution $N(0, 1/T)$, whose standard deviation is $1/\sqrt{T}$.
The 95% confidence interval (of a stationary time series) is:

$$\Delta = 0 \pm \frac{1.96}{\sqrt{T}}$$

In general, the critical value of a standard normal distribution and its confidence interval can be found in these steps:

- Compute $\alpha = \frac{1 - Q}{2}$, where $Q$ is the confidence level;
- To express the critical value as a z-score, find the $z_{1-\alpha}$ value.

For example, if $Q = 0.95$, then $\alpha = 0.025$. Then, the $1 - \alpha = 0.975$ quantile of the standard normal distribution is $z_{0.975} \approx 1.96$.
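The two steps above can be sketched in R as follows. The sample size `T_obs = 50` is an assumption chosen only for illustration.

```r
# Sketch: critical value and confidence band for a given confidence level Q.
Q <- 0.95
alpha <- (1 - Q) / 2                   # tail probability on each side
z <- qnorm(1 - alpha)                  # z-score of the (1 - alpha) quantile
T_obs <- 50                            # assumed number of observations
band <- c(-1, 1) * z / sqrt(T_obs)     # 0 +/- z / sqrt(T)
print(c(alpha = alpha, z = z))
print(band)
```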
White Noise
White noise processes are the fundamental building blocks of all
stationary time series.
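We denote it $\epsilon_t \sim WN(0, \sigma^2)$ - a zero mean, constant variance and serially uncorrelated ($\rho(t, \tau) = 0$ for $\tau > 0$ and any $t$) random variable process.
Sometimes we demand a stronger property of independence.
From the definition it follows that:

- $E(\epsilon_t) = 0$;
- $\mathrm{Var}(\epsilon_t) = \sigma^2 < \infty$;
- $\gamma(t, \tau) = E\left[(\epsilon_t - E\epsilon_t)(\epsilon_{t-\tau} - E\epsilon_{t-\tau})\right] = E(\epsilon_t \epsilon_{t-\tau})$, where:

$$E(\epsilon_t \epsilon_{t-\tau}) = \begin{cases} 0, & \text{if } \tau \neq 0 \\ \sigma^2, & \text{if } \tau = 0 \end{cases}$$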
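Example of how to check if a process is stationary.
Let us check if $Y_t = \epsilon_t + \beta_1 \epsilon_{t-1}$, where $\epsilon_t \sim WN(0, \sigma^2)$, is stationary:

1. $E Y_t = E(\epsilon_t + \beta_1 \epsilon_{t-1}) = 0 + \beta_1 \cdot 0 = 0$;

2. $\mathrm{Var}(Y_t) = \mathrm{Var}(\epsilon_t + \beta_1 \epsilon_{t-1}) = \sigma^2 + \beta_1^2 \sigma^2 = \sigma^2 (1 + \beta_1^2)$;
3. The autocovariance for $\tau > 0$:

$$\gamma(t, \tau) = E(Y_t Y_{t-\tau}) = E\left[(\epsilon_t + \beta_1 \epsilon_{t-1})(\epsilon_{t-\tau} + \beta_1 \epsilon_{t-\tau-1})\right]$$
$$= E\epsilon_t \epsilon_{t-\tau} + \beta_1 E\epsilon_t \epsilon_{t-\tau-1} + \beta_1 E\epsilon_{t-1} \epsilon_{t-\tau} + \beta_1^2 E\epsilon_{t-1} \epsilon_{t-\tau-1}$$
$$= \beta_1 E\epsilon_{t-1} \epsilon_{t-\tau} = \begin{cases} \beta_1 \sigma^2, & \text{if } \tau = 1 \\ 0, & \text{if } \tau > 1 \end{cases}$$

None of these characteristics depend on $t$, which means that the process is stationary. This process has a very short memory (i.e. if $Y_t$ and $Y_{t+\tau}$ are separated by more than one time period, they are uncorrelated).
On the other hand, this process is not WN.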
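The three results can be checked by simulation. This is a minimal sketch, assuming Gaussian shocks with $\sigma = 1$ and $\beta_1 = 0.5$ (both values are arbitrary choices for illustration); the sample autocovariances at lags 0, 1 and 2 should be close to $\sigma^2(1 + \beta_1^2)$, $\beta_1 \sigma^2$ and $0$.

```r
# Sketch: simulate Y_t = eps_t + beta1 * eps_{t-1} and compare moments with theory.
set.seed(42)
n <- 100000
beta1 <- 0.5     # assumed coefficient
sigma <- 1       # assumed shock standard deviation
eps <- rnorm(n + 1, mean = 0, sd = sigma)
y <- eps[-1] + beta1 * eps[-(n + 1)]   # current shock plus beta1 times the previous shock

gamma_hat <- as.numeric(acf(y, lag.max = 2, type = "covariance", plot = FALSE)$acf)
theory <- c(sigma^2 * (1 + beta1^2), beta1 * sigma^2, 0)
print(round(rbind(sample = gamma_hat, theory = theory), 3))
```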
The Lag Operator
The lag operator $L$ is used to lag a time series: $L Y_t = Y_{t-1}$. Similarly: $L^2 Y_t = L(L Y_t) = L(Y_{t-1}) = Y_{t-2}$ etc. In general, we can write:

$$L^p Y_t = Y_{t-p}$$

Typically, we operate on a time series with a polynomial in the lag operator. A lag operator polynomial of degree $m$ is:

$$B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + ... + \beta_m L^m$$

For example, if $B(L) = 1 + 0.9L - 0.6L^2$, then:

$$B(L) Y_t = Y_t + 0.9 Y_{t-1} - 0.6 Y_{t-2}$$

A well known operator - the first-difference operator $\Delta$ - is a first-order polynomial in the lag operator: $\Delta Y_t = Y_t - Y_{t-1} = (1 - L) Y_t$, i.e. $B(L) = 1 - L$.
We can also write an infinite-order lag operator polynomial as:

$$B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + ... = \sum_{j=0}^{\infty} \beta_j L^j$$
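A finite lag polynomial can be applied to data with a one-sided linear filter. The sketch below uses `stats::filter()`; the helper name `apply_lag_poly` and the toy series are assumptions made for illustration.

```r
# Sketch: apply B(L) = beta_0 + beta_1 L + ... + beta_m L^m to a series.
apply_lag_poly <- function(y, beta) {
  # beta = c(beta_0, beta_1, ..., beta_m); the first m results are NA (no history yet)
  as.numeric(stats::filter(y, filter = beta, method = "convolution", sides = 1))
}

y <- c(1, 2, 3, 4, 5)                       # toy data
b <- apply_lag_poly(y, c(1, 0.9, -0.6))     # Y_t + 0.9 Y_{t-1} - 0.6 Y_{t-2}
d <- apply_lag_poly(y, c(1, -1))            # (1 - L) Y_t, the first difference
print(b)
print(d)
```

The second call should agree with `diff(y)` apart from the leading `NA`.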
The General Linear Process
Wold’s representation theorem points to the appropriate model for
stationary processes.

Wold’s Representation Theorem

Let $\{Y_t\}$ be any zero-mean covariance-stationary process. Then we can write it as:

$$Y_t = B(L)\epsilon_t = \sum_{j=0}^{\infty} \beta_j \epsilon_{t-j}, \quad \epsilon_t \sim WN(0, \sigma^2)$$

where $\beta_0 = 1$ and $\sum_{j=0}^{\infty} \beta_j^2 < \infty$. On the other hand, any process of the above form is stationary.

- If $\beta_1 = \beta_2 = ... = 0$ - this corresponds to a WN process. This shows once again that WN is a stationary process.
- If $\beta_k = \phi^k$ and $|\phi| < 1$, then $\sum_{j=0}^{\infty} \beta_j^2 = 1 + \phi^2 + \phi^4 + ... = 1/(1 - \phi^2) < \infty$, so the process $Y_t = \epsilon_t + \phi \epsilon_{t-1} + \phi^2 \epsilon_{t-2} + ...$ is a stationary process.
In Wold’s theorem, we assumed a zero mean, though this is not as restrictive as it may seem. Whenever you see $Y_t$, analyse the process $Y_t - \mu$, so that the process is expressed in deviations from its mean. The deviation from the mean has a zero mean by construction. So, there is no loss of generality when analyzing zero-mean processes.
Wold’s representation theorem points to the importance of models with infinite distributed (weighted) lags. Infinite distributed lag models may seem to be of no immediate practical use since they contain infinitely many parameters, but this is not always the case. From the previous slide, the choice $\beta_k = \phi^k$ makes the infinite polynomial $B(L)$ depend on only one parameter.
Estimation and Inference for the Mean, ACF and PACF

Suppose we have sample data of a stationary time series but we do not know the true model that generated the data (we only know that it was generated by some lag polynomial $B(L)$), nor the mean, ACF or PACF associated with the model.
We want to use the data to estimate the mean, ACF and PACF, which we might use to help us decide on a suitable model to fit the data.
Sample Mean
The mean of a stationary series is $E Y_t = \mu$. A fundamental principle of estimation, called the analog principle, suggests that we develop estimators by replacing expectations with sample averages. Thus, our estimator of the population mean, given a sample of size $T$, is the sample mean:

$$\bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t$$

Typically, we are not interested in estimating the mean but it is needed for estimating the autocorrelation function.
Sample Autocorrelations
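The autocorrelation at displacement, or lag, $\tau$ for the covariance stationary series $\{Y_t\}$ is:

$$\rho(\tau) = \frac{E\left[(Y_t - \mu)(Y_{t-\tau} - \mu)\right]}{E\left[(Y_t - \mu)^2\right]}$$

Application of the analog principle yields a natural estimator of $\rho(\tau)$ (the numerator sum starts at $t = \tau + 1$ because $Y_{t-\tau}$ is not observed for $t \leq \tau$):

$$\hat{\rho}(\tau) = \frac{\frac{1}{T} \sum_{t=\tau+1}^{T} (Y_t - \bar{Y})(Y_{t-\tau} - \bar{Y})}{\frac{1}{T} \sum_{t=1}^{T} (Y_t - \bar{Y})^2}$$

This estimator is called the sample autocorrelation function (sample ACF).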
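The sample ACF can be computed by hand and compared with R's `acf()`. This is a minimal sketch; the helper name `sample_acf` is hypothetical, the numerator sum runs over the $T - \tau$ available pairs while both sums are divided by $T$, and the white noise data reuse the seed from the earlier example.

```r
# Sketch: sample autocorrelation at lag tau via the analog principle.
sample_acf <- function(y, tau) {
  T_obs <- length(y)
  y_bar <- mean(y)
  # pairs (Y_t, Y_{t-tau}) for t = tau + 1, ..., T
  num <- sum((y[(tau + 1):T_obs] - y_bar) * (y[1:(T_obs - tau)] - y_bar)) / T_obs
  den <- sum((y - y_bar)^2) / T_obs
  num / den
}

set.seed(10)
x0 <- rnorm(50)
manual <- sapply(1:5, function(k) sample_acf(x0, k))
builtin <- as.numeric(acf(x0, lag.max = 5, plot = FALSE)$acf)[-1]  # drop lag 0
print(round(cbind(manual, builtin), 4))
```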
It is often of interest to assess whether a series is reasonably approximated as white noise, i.e. whether all of its autocorrelations are zero in population.
If a series is white noise, then the sample autocorrelations $\hat{\rho}(\tau)$, $\tau = 1, ..., K$, in large samples are independent and have the $N(0, 1/T)$ distribution, i.e. their standard deviation is $1/\sqrt{T}$.
Thus, if the series is WN, ~95% of the sample autocorrelations should fall in the interval $\pm 1.96/\sqrt{T}$.
Exactly the same holds for both sample ACF and sample PACF. We typically plot the sample ACF and sample PACF along with their error bands.
The aforementioned error bands provide 95% confidence bounds only for the sample autocorrelations taken one at a time.
We are often interested in whether a series is white noise, i.e. whether all its autocorrelations are jointly zero. Because the sample is finite, we can only test a finite number of autocorrelations. We want to test:

$$H_0: \rho(1) = 0, \rho(2) = 0, ..., \rho(k) = 0$$

Under the null hypothesis the Ljung-Box statistic:

$$Q = T(T + 2) \sum_{\tau=1}^{k} \frac{\hat{\rho}^2(\tau)}{T - \tau}$$

is approximately distributed as a $\chi^2_k$ random variable.

To test the null hypothesis, we have to calculate the p-value $= P(\chi^2_k > Q)$: if p-value < 0.05, we reject the null hypothesis $H_0$ and conclude that $Y_t$ is not white noise.
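The statistic can be computed directly from the formula and compared with `Box.test()`. This is a minimal sketch; the helper name `ljung_box` is hypothetical and the white noise sample is an assumption chosen for illustration.

```r
# Sketch: Ljung-Box statistic and p-value from the sample autocorrelations.
ljung_box <- function(y, k) {
  T_obs <- length(y)
  rho_hat <- as.numeric(acf(y, lag.max = k, plot = FALSE)$acf)[-1]  # lags 1..k
  Q <- T_obs * (T_obs + 2) * sum(rho_hat^2 / (T_obs - 1:k))
  c(Q = Q, p.value = pchisq(Q, df = k, lower.tail = FALSE))         # P(chi^2_k > Q)
}

set.seed(10)
x0 <- rnorm(50)
manual <- ljung_box(x0, k = 5)
builtin <- Box.test(x0, lag = 5, type = "Ljung-Box")
print(manual)
print(c(Q = as.numeric(builtin$statistic), p.value = as.numeric(builtin$p.value)))
```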
Example: Canadian employment data

We will illustrate the provided ideas by examining quarterly Canadian employment index data. The data is seasonally adjusted and displays no trend, however it does appear to be highly serially correlated.

suppressPackageStartupMessages({require("forecast")})
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "caemp.txt"
caemp <- read.csv(url(paste0(txt1, txt2)),
header = TRUE, as.is = TRUE)
caemp <- ts(caemp, start = c(1960, 1), freq = 4)
tsdisplay(caemp)
[Figure: tsdisplay(caemp) output: the caemp series for 1960 to 1995 (top), with its sample ACF and PACF against Lag (bottom).]

- The sample autocorrelations are large and display a slow one-sided decay;
- The sample partial autocorrelations are large at first, but are statistically negligible beyond displacement $\tau = 2$.

We shall once again test the WN hypothesis, this time using the Ljung-Box test statistic.
Box.test(caemp, lag = 1, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 127.73, df = 1, p-value < 2.2e-16

with p < 0.05, we reject the null hypothesis H0 : ρ(1) = 0.

Box.test(caemp, lag = 2, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 240.45, df = 2, p-value < 2.2e-16

with p < 0.05, we reject the null hypothesis H0 : ρ(1) = 0, ρ(2) = 0,

and so on. We can see that the time series is not a WN.
We will now present a few more examples of stationary processes.

Moving-Average (MA) Models

Finite-order moving-average processes are approximations to the Wold representation (an infinite-order moving average process).
The fact that all variation in time series, one way or another, is driven by shocks of various sorts suggests the possibility of modelling time series directly as distributed lags of current and past shocks - as moving-average processes.
The MA(1) Process
The first-order moving average or MA(1) process is:

$$Y_t = \epsilon_t + \theta \epsilon_{t-1} = (1 + \theta L)\epsilon_t, \quad -\infty < \theta < \infty, \quad \epsilon_t \sim WN(0, \sigma^2)$$

Defining characteristic of an MA process: the current value of the observed series can be expressed as a function of current and lagged unobservable shocks $\epsilon_t$.
Whatever the value of $\theta$ (as long as $|\theta| < \infty$), MA(1) is always a stationary process and:

- $E(Y_t) = E(\epsilon_t) + \theta E(\epsilon_{t-1}) = 0$;
- $\mathrm{Var}(Y_t) = \mathrm{Var}(\epsilon_t) + \theta^2 \mathrm{Var}(\epsilon_{t-1}) = (1 + \theta^2)\sigma^2$;
- $\rho(\tau) = \begin{cases} 1, & \text{if } \tau = 0 \\ \theta/(1 + \theta^2), & \text{if } \tau = 1 \\ 0, & \text{otherwise} \end{cases}$

Key feature of MA(1): (sample) ACF has a sharp cutoff beyond $\tau = 1$.
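The theoretical ACF formula can be compared with the values that `ARMAacf()` from base R returns. This is a small sketch, assuming $\theta = 0.5$ for illustration.

```r
# Sketch: theoretical ACF of MA(1), formula versus ARMAacf().
theta <- 0.5                                        # assumed parameter value
rho_formula <- c(1, theta / (1 + theta^2), 0, 0)    # lags 0, 1, 2, 3
rho_builtin <- as.numeric(ARMAacf(ma = theta, lag.max = 3))
print(rbind(formula = rho_formula, builtin = rho_builtin))
```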

We can write MA(1) another way. Since:

$$Y_t = (1 + \theta L)\epsilon_t \Rightarrow \epsilon_t = \frac{1}{1 + \theta L} Y_t$$

Recalling the formula of a geometric series, if $|\theta| < 1$:

$$\epsilon_t = (1 - \theta L + \theta^2 L^2 - \theta^3 L^3 + ...) Y_t = Y_t - \theta Y_{t-1} + \theta^2 Y_{t-2} - \theta^3 Y_{t-3} + ...$$

and we can express $Y_t$ as an infinite AR process:

$$Y_t = \theta Y_{t-1} - \theta^2 Y_{t-2} + \theta^3 Y_{t-3} - ... + \epsilon_t = \sum_{j=1}^{\infty} (-1)^{j+1} \theta^j Y_{t-j} + \epsilon_t$$

Remembering the definition of the PACF, we have that for an MA(1) process it will decay gradually to zero.

- If $\theta < 0$, then the pattern of decay will be one-sided;
- If $0 < \theta < 1$, then the pattern of decay will be oscillating.
An example of how the sample ACF and PACF of MA(1) processes look:

[Figure: sample ACF and sample PACF against Lag (0 to 5) for MA(1) with θ = 0.5 (top row) and MA(1) with θ = −0.5 (bottom row).]
The MA(q) Process
We will now consider a general finite-order moving average process of order $q$, MA(q):

$$Y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + ... + \theta_q \epsilon_{t-q} = \Theta(L)\epsilon_t, \quad -\infty < \theta_j < \infty, \quad \epsilon_t \sim WN(0, \sigma^2)$$

where

$$\Theta(L) = 1 + \theta_1 L + ... + \theta_q L^q$$

is the $q$th-order lag polynomial. The MA(q) process is a generalization of the MA(1) process. Compared to MA(1), MA(q) can capture richer dynamic patterns which can be used for improved forecasting.
The properties of an MA(q) process are parallel to those of an MA(1) process in all respects:

- The finite-order MA(q) process is covariance stationary for any value of its parameters ($|\theta_j| < \infty$, $j = 1, ..., q$);
- In the MA(q) case, all autocorrelations in the ACF beyond displacement $q$ are 0 (a distinctive property of the MA process);
- The PACF of the MA(q) decays gradually in accordance with the infinite autoregressive representation, similar to MA(1): $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + ... + \epsilon_t$ (with certain conditions for $a_j$).
An example of how the sample ACF and PACF of an MA(3) process look:

[Figure: sample ACF and sample PACF against Lag (0 to 8) for MA(3) with θ1 = 1.2, θ2 = 0.65, θ3 = −0.35.]

ACF is cut off at τ = 3 and PACF decays gradually.

Autoregressive (AR) Models
The autoregressive process is also a natural approximation of the Wold
representation. We have seen that, under certain conditions, a
moving-average process has an autoregressive representation. So, an
autoregressive process is, in a sense, the same as a moving average
process.
The AR(1) Process

The first-order autoregressive or AR(1) process is:

$$Y_t = \phi Y_{t-1} + \epsilon_t, \quad \epsilon_t \sim WN(0, \sigma^2)$$

or:

$$(1 - \phi L) Y_t = \epsilon_t \Rightarrow Y_t = \frac{1}{1 - \phi L} \epsilon_t$$

Note the special interpretation of the errors, or disturbances, or shocks $\epsilon_t$ in time series theory: in contrast to the regression theory where they were understood as the summary of all unobserved $X$’s, now they are treated as economic effects which have developed in period $t$.
As we will see when analyzing ACF, the AR(1) model is capable of capturing much more persistent dynamics (depending on its parameter value) than the MA(1) model, which has a very short memory, regardless of its parameter value.
Recall that a finite-order moving-average process is always covariance stationary, but that certain conditions must be satisfied for AR(1) to be stationary. The AR(1) process can be rewritten as:

$$Y_t = \frac{1}{1 - \phi L} \epsilon_t = (1 + \phi L + \phi^2 L^2 + ...)\epsilon_t = \epsilon_t + \phi \epsilon_{t-1} + \phi^2 \epsilon_{t-2} + ...$$

This Wold’s moving-average representation for $Y$ is convergent if $|\phi| < 1$, thus:

AR(1) is stationary if $|\phi| < 1$

Equivalently, the condition for covariance stationarity is that the root, $z_1$, of the autoregressive lag operator polynomial (i.e. $1 - \phi z_1 = 0 \Leftrightarrow z_1 = 1/\phi$) be greater than 1 in absolute value (a similar condition on the roots is important for the AR(p) case).
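We can also get the above equation by recursively applying the equation of AR(1) to get the infinite MA process:

$$Y_t = \phi Y_{t-1} + \epsilon_t = \phi(\phi Y_{t-2} + \epsilon_{t-1}) + \epsilon_t = \epsilon_t + \phi \epsilon_{t-1} + \phi^2 Y_{t-2} = ... = \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}$$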
From the moving average representation of the covariance stationary AR(1) process:

- $E(Y_t) = E(\epsilon_t + \phi \epsilon_{t-1} + \phi^2 \epsilon_{t-2} + ...) = 0$;
- $\mathrm{Var}(Y_t) = \mathrm{Var}(\epsilon_t) + \phi^2 \mathrm{Var}(\epsilon_{t-1}) + \phi^4 \mathrm{Var}(\epsilon_{t-2}) + ... = \sigma^2/(1 - \phi^2)$;

Or, alternatively: when $|\phi| < 1$ the process is stationary, i.e. $E Y_t = m$ for all $t$, therefore $E Y_t = \phi E Y_{t-1} + E\epsilon_t \Rightarrow m = \phi m + 0 \Rightarrow m = 0$.
This allows us to easily calculate the mean of the generalized AR(1) process: if $Y_t = \alpha + \phi Y_{t-1} + \epsilon_t$, then $m = \alpha/(1 - \phi)$.
The correlogram (ACF & PACF) of AR(1) is in a sense symmetric to that of MA(1):

- $\rho(\tau) = \phi^\tau$, $\tau = 0, 1, 2, ...$ - ACF decays exponentially;
- $p(\tau) = \begin{cases} \phi, & \tau = 1 \\ 0, & \tau > 1 \end{cases}$ - PACF cuts off abruptly.
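Both patterns can be checked against `ARMAacf()` from base R. This is a small sketch, assuming $\phi = 0.85$ (the value used in the example figure of these slides).

```r
# Sketch: theoretical ACF and PACF of AR(1), formulas versus ARMAacf().
phi <- 0.85
acf_formula <- phi^(0:5)                                        # rho(tau) = phi^tau
acf_builtin <- as.numeric(ARMAacf(ar = phi, lag.max = 5))       # lags 0..5
pacf_formula <- c(phi, rep(0, 4))                               # phi at lag 1, zero afterwards
pacf_builtin <- as.numeric(ARMAacf(ar = phi, lag.max = 5, pacf = TRUE))  # lags 1..5
print(round(rbind(acf_formula, acf_builtin), 4))
print(round(rbind(pacf_formula, pacf_builtin), 4))
```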
An example of how the sample ACF and PACF of an AR(1) process look:

[Figure: sample ACF and sample PACF against Lag (0 to 5) for AR(1) with φ = 0.85.]
The AR(p) Process
The general $p$th order autoregressive process, AR(p), is:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p} + \epsilon_t, \quad \epsilon_t \sim WN(0, \sigma^2)$$

In lag operator form, we write:

$$\Phi(L) Y_t = (1 - \phi_1 L - \phi_2 L^2 - ... - \phi_p L^p) Y_t = \epsilon_t$$

Similar to the AR(1) case, the AR(p) process is covariance stationary if and only if all the roots $z_i$ of the autoregressive lag operator polynomial $\Phi(z)$ are outside the complex unit circle:

$$1 - \phi_1 z - \phi_2 z^2 - ... - \phi_p z^p = 0 \Rightarrow |z_i| > 1$$

So:

AR(p) is stationary if all the roots $|z_i| > 1$

For a quick check of stationarity, use the following rule:

If $\sum_{i=1}^{p} \phi_i \geq 1$, the process isn’t stationary
In the covariance stationary case, we can write the process in the infinite moving average MA(∞) form:

$$Y_t = \frac{1}{\Phi(L)} \epsilon_t$$

- The ACF for the general AR(p) process decays gradually when the lag increases;
- The PACF for the general AR(p) process has a sharp cutoff at displacement $p$.
An example of how the sample ACF and PACF of the AR(2) process $Y_t = 1.5 Y_{t-1} - 0.9 Y_{t-2} + \epsilon_t$ look:

[Figure: sample ACF and sample PACF against Lag (0 to 20) for AR(2) with φ1 = 1.5, φ2 = −0.9.]

The corresponding lag operator polynomial is $1 - 1.5L + 0.9L^2$ with two complex conjugate roots: $z_{1,2} = 0.83 \pm 0.65i$, $|z_{1,2}| = \sqrt{0.83^2 + 0.65^2} = 1.05423 > 1$ - thus the process is stationary.
The ACF for an AR(2) is:

$$\rho(\tau) = \begin{cases} 1, & \tau = 0 \\ \phi_1/(1 - \phi_2), & \tau = 1 \\ \phi_1 \rho(\tau - 1) + \phi_2 \rho(\tau - 2), & \tau = 2, 3, ... \end{cases}$$

Because the roots are complex, the ACF oscillates and because the roots are close to the unit circle, the oscillation damps slowly.
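The root condition and the ACF recursion for this AR(2) example can be verified numerically. This is a minimal sketch using `polyroot()` and `ARMAacf()` from base R; the variable names are illustrative only.

```r
# Sketch: stationarity check and ACF recursion for Y_t = 1.5 Y_{t-1} - 0.9 Y_{t-2} + eps_t.
phi <- c(1.5, -0.9)
roots <- polyroot(c(1, -phi))          # roots of 1 - 1.5 z + 0.9 z^2
is_stationary <- all(Mod(roots) > 1)   # all roots outside the unit circle
print(roots)
print(Mod(roots))

rho <- numeric(11)                     # rho[i] holds rho(i - 1), i.e. lags 0..10
rho[1] <- 1
rho[2] <- phi[1] / (1 - phi[2])
for (i in 3:11) rho[i] <- phi[1] * rho[i - 1] + phi[2] * rho[i - 2]
rho_builtin <- as.numeric(ARMAacf(ar = phi, lag.max = 10))
print(round(rbind(recursion = rho, builtin = rho_builtin), 3))
```

The exact modulus of the roots is $\sqrt{1/0.9}$, which differs slightly from the value obtained with the rounded roots in the text.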
Stationarity and Invertibility
The AR(p) is a generalization of the AR(1) strategy for approximating the Wold representation. The moving-average representation associated with the stationary AR(p) process:

$$Y_t = \frac{1}{\Phi(L)} \epsilon_t, \text{ where } \frac{1}{\Phi(L)} = \sum_{j=0}^{\infty} \psi_j L^j, \quad \psi_0 = 1$$

depends on $p$ parameters only. This gives us the infinite process from Wold’s Representation Theorem:

$$Y_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$$

which is known as the infinite moving-average process, MA(∞). Because the AR process is stationary, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ and $Y_t$ takes finite values.
Thus, a stationary AR process can be rewritten as an MA(∞) process.
In some cases the AR form of a stationary process is preferred to that of MA. Just as we can write an AR process as an MA(∞), we can write an MA process as an AR(∞). By definition, an MA process is called invertible if it can be expressed as an AR process. So, the MA(q) process:

$$Y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + ... + \theta_q \epsilon_{t-q} = \Theta(L)\epsilon_t, \quad -\infty < \theta_i < \infty, \quad \epsilon_t \sim WN(0, \sigma^2)$$

is invertible if all the roots of $\Theta(x) = 1 + \theta_1 x + ... + \theta_q x^q$ lie outside the unit circle:

$$1 + \theta_1 x + ... + \theta_q x^q = 0 \Rightarrow |x_i| > 1$$
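Then we can write the process as:

$$\epsilon_t = \frac{1}{\Theta(L)} Y_t, \text{ where } \frac{1}{\Theta(L)} = \sum_{j=0}^{\infty} \pi_j L^j, \quad \pi_0 = 1$$

$$\epsilon_t = \sum_{j=0}^{\infty} \pi_j Y_{t-j} = Y_t + \sum_{j=1}^{\infty} \pi_j Y_{t-j}$$

which gives us the infinite-order autoregressive process, AR(∞):

$$Y_t = \sum_{j=1}^{\infty} \tilde{\pi}_j Y_{t-j} + \epsilon_t, \quad \tilde{\pi}_j = -\pi_j$$

Because the MA process is invertible, the infinite series converges to a finite value.
For example, MA(1) of the form $Y_t = \epsilon_t - \epsilon_{t-1}$ is not invertible since $1 - x = 0 \Rightarrow x = 1$, which lies on the unit circle rather than outside it.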
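The invertibility condition can be checked numerically from the roots of $\Theta(x)$. This is a minimal sketch; the helper name `is_invertible` and the small tolerance `tol`, which guards against rounding for roots lying exactly on the unit circle, are assumptions made for illustration.

```r
# Sketch: an MA(q) process is invertible if all roots of Theta(x) lie outside the unit circle.
is_invertible <- function(theta, tol = 1e-8) {
  roots <- polyroot(c(1, theta))     # roots of 1 + theta_1 x + ... + theta_q x^q
  all(Mod(roots) > 1 + tol)
}

print(is_invertible(0.5))   # Y_t = eps_t + 0.5 eps_{t-1}
print(is_invertible(-1))    # Y_t = eps_t - eps_{t-1}, the example above
print(is_invertible(2))     # Y_t = eps_t + 2 eps_{t-1}
```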
Autoregressive Moving-Average (ARMA) Models

AR and MA models are often combined in attempts to obtain better approximations to the Wold representation. The result is the ARMA(p,q) process. The motivation for using ARMA models is as follows:

- If the random shock that drives an AR process is itself an MA process, then we obtain an ARMA process;
- ARMA processes arise from aggregation - sums of AR processes, sums of AR and MA processes;
- AR processes observed subject to measurement error also turn out to be ARMA processes.
ARMA(1,1) process
The simplest ARMA process that is not a pure AR or pure MA is the ARMA(1,1) process:

$$Y_t = \phi Y_{t-1} + \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_t \sim WN(0, \sigma^2)$$

or in lag operator form:

$$(1 - \phi L) Y_t = (1 + \theta L)\epsilon_t$$

where:
1. $|\phi| < 1$ - required for stationarity;
2. $|\theta| < 1$ - required for invertibility.

If the covariance stationarity conditions are satisfied, then we have the MA representation:

$$Y_t = \frac{1 + \theta L}{1 - \phi L} \epsilon_t = \epsilon_t + b_1 \epsilon_{t-1} + b_2 \epsilon_{t-2} + ...$$

which is an infinite distributed lag of current and past innovations.
Similarly, if the invertibility condition is satisfied, we can rewrite it in the infinite AR form:

$$\frac{1 - \phi L}{1 + \theta L} Y_t = Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} + ... = \epsilon_t$$
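The MA(∞) weights $b_j$ can be obtained with `ARMAtoMA()` from base R. The sketch below, assuming $\phi = 0.85$ and $\theta = 0.5$, also derives them by matching coefficients in $(1 - \phi L)(1 + b_1 L + b_2 L^2 + ...) = 1 + \theta L$, which gives $b_1 = \phi + \theta$ and $b_j = \phi b_{j-1}$ for $j \geq 2$.

```r
# Sketch: MA(infinity) weights of an ARMA(1,1) process.
phi <- 0.85      # assumed AR coefficient
theta <- 0.5     # assumed MA coefficient
b_builtin <- ARMAtoMA(ar = phi, ma = theta, lag.max = 5)   # b_1, ..., b_5

b_manual <- numeric(5)
b_manual[1] <- phi + theta
for (j in 2:5) b_manual[j] <- phi * b_manual[j - 1]
print(rbind(builtin = b_builtin, manual = b_manual))
```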
ARMA(p,q) process

A natural generalization of the ARMA(1,1) is the ARMA(p,q) process that allows for multiple moving-average and autoregressive lags. We can write it as:

$$Y_t = \phi_1 Y_{t-1} + ... + \phi_p Y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + ... + \theta_q \epsilon_{t-q}, \quad \epsilon_t \sim WN(0, \sigma^2)$$

or:

$$\Phi(L) Y_t = \Theta(L)\epsilon_t$$

- If all the roots of $\Phi(L)$ are outside the unit circle, then the process is stationary and has a convergent infinite moving average representation: $Y_t = (\Theta(L)/\Phi(L))\epsilon_t$;
- If all roots of $\Theta(L)$ are outside the unit circle, then the process is invertible and can be expressed as the convergent infinite autoregression: $(\Phi(L)/\Theta(L)) Y_t = \epsilon_t$.
An example of an ARMA(1,1) process: $Y_t = 0.85 Y_{t-1} + \epsilon_t + 0.5 \epsilon_{t-1}$:

[Figure: sample ACF and sample PACF against Lag (0 to 20) for ARMA(1,1) with φ = 0.85, θ = 0.5.]
ARMA models are often both highly accurate and highly parsimonious.
In a particular situation, for example, it might take an AR(5) model to
get the same approximation accuracy as could be obtained with an
ARMA(1,1), but the AR(5) has five parameters to be estimated, whereas
the ARMA(1,1) has only two.

The rule to determine the number of AR and MA terms:

- AR(p) - ACF declines, PACF = 0 if τ > p;
- MA(q) - ACF = 0 if τ > q, PACF declines;
- ARMA(p,q) - both ACF and PACF decline.
Estimation
Autoregressive process parameter estimation
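Let us say we want to estimate the parameters of our AR(1) process:

$$Y_t = \phi Y_{t-1} + \epsilon_t$$

- The OLS estimator of $\phi$ for the AR(1) case:

$$\hat{\phi} = \frac{\sum_{t=2}^{T} Y_t Y_{t-1}}{\sum_{t=2}^{T} Y_{t-1}^2}$$

- The Yule-Walker estimator of $\phi$ for AR(1) can be calculated by multiplying $Y_t = \phi Y_{t-1} + \epsilon_t$ by $Y_{t-1}$ and taking its expectation. We will get the equation:

$$\gamma(1) = \phi \gamma(0)$$

Recall that $\gamma(\tau)$ is the covariance between $Y_t$ and $Y_{t-\tau}$. Replacing the covariances by their sample counterparts gives $\hat{\phi} = \hat{\gamma}(1)/\hat{\gamma}(0)$.

For the AR(p) case, we would need $p$ different equations, i.e.:

$$\gamma(k) = \phi_1 \gamma(k - 1) + ... + \phi_p \gamma(k - p), \quad k = 1, ..., p$$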
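Both AR(1) estimators are short to compute by hand. This is a minimal sketch on simulated data, assuming a true coefficient of 0.6 and a zero-mean process (so the OLS formula has no intercept); the variable names are illustrative only.

```r
# Sketch: OLS and Yule-Walker estimates of phi for an AR(1) process.
set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 2000))  # assumed example data
n <- length(y)

# OLS: sum over t = 2..T of Y_t * Y_{t-1}, divided by the sum of Y_{t-1}^2
phi_ols <- sum(y[-1] * y[-n]) / sum(y[-n]^2)

# Yule-Walker: gamma(1) = phi * gamma(0), with sample autocovariances
gamma_hat <- as.numeric(acf(y, lag.max = 1, type = "covariance", plot = FALSE)$acf)
phi_yw <- gamma_hat[2] / gamma_hat[1]

print(c(ols = phi_ols, yule_walker = phi_yw))
```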

Moving-average process parameter estimation

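Let us say we want to estimate the parameter of our invertible MA(1) process (i.e. $|\theta| < 1$):

$$Y_t = \epsilon_t + \theta \epsilon_{t-1} \Rightarrow \epsilon_t = Y_t - \theta \epsilon_{t-1} = Y_t - \theta Y_{t-1} + \theta^2 Y_{t-2} - ...$$

Let $S(\theta) = \sum_{t=1}^{T} \epsilon_t^2$ and $\epsilon_0 = 0$. We can find the parameter $\theta$ by minimizing $S(\theta)$.

ARMA process parameter estimation

For the ARMA(1,1): $Y_t = \phi Y_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$ we would need to minimize $S(\theta, \phi) = \sum_{t=1}^{T} \epsilon_t^2$ with $\epsilon_0 = Y_0 = 0$.
For the ARMA(p,q), we would need to minimize $S(\theta, \phi)$ by setting $\epsilon_k = Y_k = 0$ for $k \leq 0$.

We can also estimate the parameters using the maximum likelihood method.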
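The minimization of $S(\theta)$ for an MA(1) process can be sketched with a one-dimensional optimizer. The example below is an illustration under stated assumptions: simulated data with true $\theta = 0.5$, the starting value $\epsilon_0 = 0$, and a hypothetical helper `css_ma1`; it is not meant as a replacement for a proper estimation routine.

```r
# Sketch: conditional sum of squares estimation of theta for an MA(1) process.
set.seed(7)
y <- as.numeric(arima.sim(model = list(ma = 0.5), n = 2000))  # assumed example data

css_ma1 <- function(theta, y) {
  eps_prev <- 0                         # starting value eps_0 = 0
  s <- 0
  for (i in seq_along(y)) {
    eps_i <- y[i] - theta * eps_prev    # eps_t = Y_t - theta * eps_{t-1}
    s <- s + eps_i^2
    eps_prev <- eps_i
  }
  s                                     # S(theta), the sum of squared residuals
}

# Search only over the invertible region |theta| < 1
theta_hat <- optimize(css_ma1, interval = c(-0.99, 0.99), y = y)$minimum
print(theta_hat)
```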
Forecasting
So far we thought of the information set as containing the available past
history of the series, ΩT = {YT , YT −1 , ...}, where we imagined the
history as having begun in the infinite past. Based upon that information
set, we want to find the optimal forecast of Y at some future time T + h.
If Yt is a stationary process, then the forecast tends to the process mean
as h increases. Therefore, the forecast is only interesting for several small
values of h.
Our forecast method is always the same: write out the process for the future time period $T + h$ and project it on what is known at time $T$, when the forecast is made. We denote the forecast as $Y_{T+h|T}$, $h \geq 1$.
Point forecasts can be calculated using the following three steps.

1. If needed, expand the equation so that $Y_t$ is on the left hand side and all other terms are on the right;
2. Rewrite the equation by replacing $t$ by $T + h$;
3. On the right hand side of the equation, replace future observations by their forecasts, future errors ($\epsilon_{T+j}$, $0 < j \leq h$) by zero, and past errors by the corresponding residuals.
Forecasting MA(q) process

Consider, for example, an MA(1) process:

$$Y_t = \mu + \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_t \sim WN(0, \sigma^2)$$

We have:

$$Y_{T+1} = \mu + \epsilon_{T+1} + \theta \epsilon_T \Rightarrow Y_{T+1|T} = \mu + 0 + \theta \epsilon_T$$
$$Y_{T+2} = \mu + \epsilon_{T+2} + \theta \epsilon_{T+1} \Rightarrow Y_{T+2|T} = \mu + 0 + 0$$
$$...$$
$$Y_{T+h|T} = \mu, \quad h \geq 2$$

The forecast quickly approaches the (sample) mean of the process and, starting at $h = q + 1$, coincides with it. When $h$ increases, the accuracy of the forecast diminishes up to the moment $h = q + 1$, whereupon it becomes constant.
An example of an MA(1) process: $Y_t = \epsilon_t + 0.5 \epsilon_{t-1}$:

[Figure: Forecasts from ARIMA(0,0,1) with zero mean, time axis 0 to 120.]
Forecasting AR(p) process

Consider, for example, an AR(1) process:

$$Y_t = \phi Y_{t-1} + \epsilon_t, \quad \epsilon_t \sim WN(0, \sigma^2)$$

We have:

$$Y_{T+1} = \phi Y_T + \epsilon_{T+1} \Rightarrow Y_{T+1|T} = \phi Y_T + 0$$
$$Y_{T+2} = \phi Y_{T+1} + \epsilon_{T+2} \Rightarrow Y_{T+2|T} = \phi Y_{T+1|T} + 0 = \phi^2 Y_T$$
$$...$$
$$Y_{T+h|T} = \phi^h Y_T$$

The forecast tends to the (sample) mean exponentially fast, but never reaches it. When $h$ increases, the accuracy of the forecast diminishes but never reaches the limit.
An example of an AR(1) process: $Y_t = 0.85 Y_{t-1} + \epsilon_t$:

[Figure: Forecasts from ARIMA(1,0,0) with zero mean, time axis 0 to 120.]
Forecasting ARMA(p,q) process

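Consider, for example, an ARMA(1,1) process:

$$Y_t = \phi Y_{t-1} + \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_t \sim WN(0, \sigma^2)$$

We have:

$$Y_{T+1} = \phi Y_T + \epsilon_{T+1} + \theta \epsilon_T \Rightarrow Y_{T+1|T} = \phi Y_T + 0 + \theta \epsilon_T$$
$$Y_{T+2} = \phi Y_{T+1} + \epsilon_{T+2} + \theta \epsilon_{T+1} \Rightarrow Y_{T+2|T} = \phi Y_{T+1|T} + 0 + 0 = \phi^2 Y_T + \phi \theta \epsilon_T$$
$$...$$
$$Y_{T+h|T} = \phi^h Y_T + \phi^{h-1} \theta \epsilon_T$$

Similar to the AR(p) process, the forecast of the ARMA(p,q) process tends to the average, but never reaches it.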
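The three forecasting steps can be written as a short recursion. This is a minimal sketch for a zero-mean ARMA(1,1); the helper name `forecast_arma11` and the input values are hypothetical, and setting `theta = 0` or `phi = 0` gives the AR(1) and MA(1) forecasts as special cases.

```r
# Sketch: h-step point forecasts for Y_t = phi * Y_{t-1} + eps_t + theta * eps_{t-1}.
forecast_arma11 <- function(y_T, eps_T, phi, theta, h) {
  f <- numeric(h)
  f[1] <- phi * y_T + theta * eps_T    # future shock eps_{T+1} is replaced by zero
  if (h > 1) {
    for (j in 2:h) f[j] <- phi * f[j - 1]   # both shocks are in the future, so they are zero
  }
  f
}

# Assumed last observation Y_T = 2 and last residual eps_T = 0.4
f_arma <- forecast_arma11(y_T = 2, eps_T = 0.4, phi = 0.85, theta = 0.5, h = 5)
f_ar   <- forecast_arma11(y_T = 2, eps_T = 0,   phi = 0.85, theta = 0,   h = 3)
f_ma   <- forecast_arma11(y_T = 2, eps_T = 0.4, phi = 0,    theta = 0.5, h = 3)
print(f_arma)
print(f_ar)
print(f_ma)
```

The recursion should reproduce the closed form $\phi^h Y_T + \phi^{h-1} \theta \epsilon_T$.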
An example of an ARMA(1,1) process: $Y_t = 0.85 Y_{t-1} + \epsilon_t + 0.5 \epsilon_{t-1}$:

[Figure: Forecasts from ARIMA(1,0,1) with zero mean, time axis 0 to 120.]

- The forecast $Y_{T+h|T}$ of an MA(q) process reaches its average at $h = q + 1$ steps and then does not change anymore;
- The forecast $Y_{T+h|T}$ of an AR(p) or ARMA(p,q) process tends to the average, but never reaches it. The speed of convergence depends on the coefficients.
Financial Volatility
Consider $Y_t$ growing annually at rate $r$:

$$Y_t = (1 + r) Y_{t-1} = (1 + r)^2 Y_{t-2} = ... = (1 + r)^t Y_0 = e^{t \cdot \log(1 + r)} Y_0$$

The values of $Y_t$ lie on an exponential curve:

[Figure: Y_t with Y_0 = 1 and r = 0.05, plotted against Time (0 to 50).]

In order for the model to represent a more realistic growth, let us introduce an economic shock component, $\epsilon_t \sim WN(0, \sigma^2)$.
Thus, our model is now:

$$Y_t = (1 + r + \epsilon_t) Y_{t-1} = \prod_{s=1}^{t} (1 + r + \epsilon_s) \cdot Y_0 = e^{\sum_{s=1}^{t} \log(1 + r + \epsilon_s)} \cdot Y_0$$

The values of $Y_t$ are again close to the exponential curve:

[Figure: Y_t with Y_0 = 1, r = 0.05 and ε_t ~ WN(0, 0.05²), plotted against Time (0 to 50).]

Note: $E Y_t = e^{t \cdot \log(1 + r)} Y_0$, thus $Y_t$ is not stationary.

We can take the differences: $\Delta Y_t = Y_t - Y_{t-1}$ but they are also not stationary. We can also take the logarithms and use the approximation $\log(1 + x) \approx x$ (using Taylor’s expansion of a function around 0):

$$\tilde{Y}_t = \log Y_t = \log Y_0 + \sum_{s=1}^{t} \log(1 + r + \epsilon_s) \approx \log Y_0 + rt + \sum_{s=1}^{t} \epsilon_s$$

[Figure: log(Y_t) plotted against Time (0 to 50).]

$\tilde{Y}_t$ is still not stationary, however its differences $\Delta \tilde{Y}_t \approx r + \epsilon_t$ are stationary.
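The growth model and the effect of log-differencing can be reproduced with a few lines. This is a minimal sketch, assuming Gaussian shocks and the parameter values from the figure captions ($Y_0 = 1$, $r = 0.05$, $\sigma = 0.05$); the seed is arbitrary.

```r
# Sketch: simulate Y_t = (1 + r + eps_t) * Y_{t-1} and inspect the log differences.
set.seed(5)
n <- 50
r <- 0.05
eps <- rnorm(n, mean = 0, sd = 0.05)   # assumed Gaussian shocks
Y <- cumprod(c(1, 1 + r + eps))        # Y_0 = 1 followed by Y_1, ..., Y_n

dlogY <- diff(log(Y))                  # exactly log(1 + r + eps_t)
approx_err <- max(abs(dlogY - (r + eps)))   # error of the approximation log(1 + x) ~ x
print(summary(dlogY))
print(approx_err)
```

The differenced log series should fluctuate around a level close to `r`, in line with the approximation above.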
[Figure: Δlog(Y_t) plotted against Time (0 to 50), with its sample ACF and PACF against Lag.]

The differences, in this case, also have an economic interpretation - it is

the series of (logarithmic) returns, i.e. annual growth of Yt .
Stock and bond returns (or similar financial series) can be described as
having an average return of r but otherwise seemingly unpredictable from
the past values (i.e. resembling WN): Yt = r + t , t ∼ WN(0, σ 2 ).
Although the sequence may initially appear to be WN, there is strong
evidence to suggest that it is not an independent process.
As such, we shall try to create a model of residuals: et = ˆt , i.e. centered
returns Yt − Ȳt = Yt − r̂ of real stocks that posses some interesting
empirical properties:

I high volatility events tend to cluster in time (i.e. persistency or

inertia of volatility);
I Yt is uncorrelated with its lags, but Yt2 is correlated with
2 2
Yt−1 , Yt−2 , ...;
I Yt is heavy-tailed, i.e. the right tail of its density decreases slower
than that of the Gaussian density (this means that Yt take big
values more often than Gaussian random variables).

Note: volatility is the conditional standard deviation $\sigma_t$ of the stock return, where $\sigma_t^2 = \mathrm{Var}(r_t | \Omega_{t-1})$ and $\Omega_{t-1}$ is the information set available at time $t - 1$.
An introductory example:
Let $P_t$ denote the price of a financial asset at time $t$. Then, the log returns:

$$R_t = \log(P_t) - \log(P_{t-1})$$

could typically be modeled as a stationary time series. An ARMA model for the series $R_t$ would have the property that the conditional variance of $R_t$ is independent of $t$. However, in practice this is not the case. Let's say our $R_t$ data is generated by the following process:

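set.seed(346)
n = 1000
# alpha[1] is the constant term, alpha[2] multiplies the squared lagged value
alpha = c(1, 0.5)
epsilon = rnorm(mean = 0, sd = 1, n = n)
# Preallocate the series instead of growing it inside the loop
R.t = numeric(n)
R.t[1] = sqrt(alpha[1]) * epsilon[1]
for(j in 2:n){
  # The conditional variance depends on the previous squared value
  R.t[j] = sqrt(alpha[1] + alpha[2] * R.t[j-1]^2) * epsilon[j]
}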

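i.e., $R_t$, $t > 1$, depends nonlinearly on its past values: $R_t = \sqrt{\alpha_0 + \alpha_1 R_{t-1}^2} \cdot \epsilon_t$, with $\alpha_0 = 1$ and $\alpha_1 = 0.5$ (alpha[1] and alpha[2] in the code).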

If we plot the data and the ACF and PACF plots:

forecast::tsdisplay(R.t)

[Figure: R.t against time, with its ACF and PACF against Lag]

and perform the Ljung-Box test:

Box.test(R.t, lag = 10, type = "Ljung-Box")$p.value

## [1] 0.9082987

Box.test(R.t, lag = 20, type = "Ljung-Box")$p.value

## [1] 0.3846643

Box.test(R.t, lag = 25, type = "Ljung-Box")$p.value

## [1] 0.4572007

We see that in all cases the p-value > 0.05, so we do not reject the null hypothesis that the autocorrelations are zero. The series appears to be WN.
But we know from the data generation code that this is not the case.
If we check the ACF and PACF of the squared log-returns, $R_t^2$:

forecast::tsdisplay(R.t^2)

[Figure: R.t^2 against time, with its ACF and PACF against Lag]

The squared log-returns are autocorrelated in the first couple of lags.

From the Ljung-Box test:

Box.test(R.t^2, lag = 10, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: R.t^2
## X-squared = 174.37, df = 10, p-value < 2.2e-16

we reject the null hypothesis that the squared log-returns are not autocorrelated.
In comparison, for a simple $\epsilon_t \sim \mathrm{WN}(0, 1)$ process:

set.seed(123)
epsilon = rnorm(mean = 0, sd = 1, n = 5000)
The $\epsilon_t$ process is not serially correlated:
par(mfrow = c(1, 2))
forecast::Acf(epsilon, lag.max = 20)
forecast::Pacf(epsilon, lag.max = 20)
[Figure: ACF and Partial ACF of Series epsilon against Lag]

Box.test(epsilon, lag = 10, type = "Ljung-Box")$p.value

## [1] 0.872063
The $\epsilon_t^2$ process is also not serially correlated:
par(mfrow = c(1, 2))
forecast::Acf(epsilon^2, lag.max = 20)
forecast::Pacf(epsilon^2, lag.max = 20)
[Figure: ACF and Partial ACF of Series epsilon^2 against Lag]

Box.test(epsilon^2, lag = 10, type = "Ljung-Box")$p.value

## [1] 0.7639204

So, $R_t$ only appears to be a WN process until we also analyse $R_t^2$.

The following example stock data contains weekly data for logarithms of
stock prices, log(Pt ):
suppressPackageStartupMessages({require(readxl)})
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "stock.xls"
tmp = tempfile(fileext = ".xls")
#Download the file
download.file(url = paste0(txt1, txt2),
destfile = tmp, mode = "wb")
#Read it as an excel file
stocks <- read_excel(path = tmp)
plot.ts(stocks$lStock)
[Figure: stocks$lStock against Time]
The differences do not pass WN checks:
forecast::tsdisplay(diff(stocks$lStock))
[Figure: diff(stocks$lStock) against time, with its ACF and PACF against Lag]

Box.test(diff(stocks$lStock), lag = 10)$p.value

## [1] 3.014097e-05

The basic idea behind volatility study is that the series is serially uncorrelated, but it is a dependent series.
Let us calculate the volatility as $\hat{u}_t^2$ from $\Delta \log(Y_t) = \alpha + u_t$:

mdl <- lm(diff(stocks$lStock) ~ 1)

u <- residuals(mdl)
u2<- u^2
plot.ts(data.frame(diff(stocks$lStock), u2),
main = "returns and volatility")

[Figure: "returns and volatility": diff(stocks$lStock) and u2 against Time]

Note the small volatility in stable times and large volatility in fluctuating return periods.
We have learned that the AR process is able to model persistency, which, in our case, may be called clustering of volatility. Consider an AR(1) model of volatility (for this example we assume $u_t$ is WN):

$$u_t^2 = \alpha + \phi u_{t-1}^2 + w_t, \quad w_t \sim \mathrm{WN}$$

library(forecast)
u2.mdl <- Arima(u2, order = c(1, 0, 0), include.mean = TRUE)
coef(u2.mdl)

## ar1 intercept
## 7.335022e-01 9.187829e-06

Remember that for a stationary process $u_t^2$: $\mathbb{E} u_t^2 = \mu$. So $\mu = \alpha / (1 - \phi)$.

The Arima function reports an intercept; however, if the model has an autoregressive part, this value is actually the process mean $\mu$.
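The relation between the constant term $\alpha$ and the process mean $\mu$ follows from taking expectations on both sides of the AR(1) equation for $u_t^2$. A short derivation, assuming stationarity (so that $\mathbb{E} u_t^2 = \mathbb{E} u_{t-1}^2 = \mu$) and $\mathbb{E} w_t = 0$:

```latex
\begin{aligned}
\mathbb{E} u_t^2 &= \alpha + \phi \, \mathbb{E} u_{t-1}^2 + \mathbb{E} w_t \\
\mu &= \alpha + \phi \mu \\
\mu &= \frac{\alpha}{1 - \phi}
\qquad \Longleftrightarrow \qquad
\alpha = \mu (1 - \phi)
\end{aligned}
```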

#To get the alpha coefficient of an AR process:

#alpha = mu *(1-phi)
unname(coef(u2.mdl)[2] * (1 - coef(u2.mdl)[1]))

## [1] 2.448536e-06
The resulting model:

$$u_t^2 = 0.00000245 + 0.7335 u_{t-1}^2 + w_t$$

might be of great interest to an investor wanting to purchase this stock.

- Suppose an investor has just observed that $u_{t-1}^2 = 0$, i.e. the stock price changed by its average amount in period $t - 1$. The investor is interested in predicting volatility in period $t$ in order to judge the likely risk involved in purchasing the stock. Since the error is unpredictable, the investor ignores it (it could be positive or negative). So, the predicted volatility in period $t$ is 0.00000245.
- If the investor observed $u_{t-1}^2 = 0.0001$, then the predicted volatility in period $t$ would be 0.00000245 + 0.00007335 = 7.58e-05, which is almost 31 times bigger.

This kind of information can be incorporated into financial models of investor behavior.
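The investor's calculation is a one-step-ahead prediction from the fitted AR(1) model with the unpredictable error set to zero. A minimal sketch, assuming the coefficient values printed by coef(u2.mdl) are typed in by hand (the function name predict_u2 is hypothetical):

```r
# One-step-ahead prediction from u2_t = alpha + phi * u2_{t-1} + w_t,
# with the unpredictable error w_t replaced by its mean, zero
predict_u2 <- function(u2_prev, alpha, phi) {
  alpha + phi * u2_prev
}

phi   <- 0.7335022        # ar1 coefficient reported by coef(u2.mdl)
mu    <- 9.187829e-06     # "intercept" reported by Arima, i.e. the process mean
alpha <- mu * (1 - phi)   # constant term of the AR(1) equation

pred_calm  <- predict_u2(0,      alpha, phi)   # after an average period
pred_shock <- predict_u2(0.0001, alpha, phi)   # after a large deviation
ratio      <- pred_shock / pred_calm           # how much larger the second prediction is
```

Writing the prediction as a function makes it easy to evaluate other hypothetical values of the last observed squared deviation.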
Weak WN and Strong WN
- A sequence of uncorrelated random variables (with zero mean and constant variance) is called a weak WN;
- A sequence of independent random variables (with zero mean and constant variance) is called a strong WN.

If $\epsilon_t$ is a strong WN then so is $\epsilon_t^2$ or any other function of $\epsilon_t$ (after centering).

Let $\Omega_s = \mathcal{F}(\epsilon_s, \epsilon_{s-1}, \ldots)$ be the set containing all the information on the past of the process.
If $\epsilon_t$ is a strong WN, then:

- the conditional mean $\mathbb{E}(\epsilon_t | \Omega_{t-1}) = 0$;
- the conditional variance $\mathrm{Var}(\epsilon_t | \Omega_{t-1}) = \mathbb{E}(\epsilon_t^2 | \Omega_{t-1}) = \sigma^2$.

Now we shall present a model of a weak WN process (its variance is constant) such that its conditional variance, or volatility, may change over time. The simplest way to model this kind of phenomenon is to use the ARCH(1) model.
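From the rules for the mean (for a random variable $X$ with mean $\mu$ and variance $\sigma^2$):

$$\mathbb{E}(X + \alpha) = \mu + \alpha$$

and the variance:

$$\mathrm{Var}(\beta \cdot X + \alpha) = \beta^2 \cdot \sigma^2$$

we can modify the random variables to have a different mean and variance:

$$\epsilon_t \sim N(0, 1) \Rightarrow (\beta \cdot \epsilon_t + \mu) \sim N(\mu, \beta^2 \cdot 1)$$

If we take $\beta = \sigma_t$, we can have the variance change depending on the time $t$. We can then specify the volatility (i.e. standard deviation) as a separate equation and estimate its parameters.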
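The shift and scale rules can be checked numerically. The sketch below, assuming Gaussian draws, an arbitrary seed and a purely illustrative two-regime pattern for $\sigma_t$, first rescales a standard normal sample with constants and then lets the scale vary with time.

```r
set.seed(42)                 # arbitrary seed (an assumption)
n <- 1000
z <- rnorm(n)                # z_t ~ N(0, 1)

# Constant shift and scale: x = beta * z + mu
mu   <- 5
beta <- 2
x    <- beta * z + mu

# The sample mean and sample variance obey the same rules exactly
mean_ok <- isTRUE(all.equal(mean(x), beta * mean(z) + mu))
var_ok  <- isTRUE(all.equal(var(x), beta^2 * var(z)))

# Time-varying scale, beta = sigma_t: a calm first half, a volatile second half
sigma_t <- ifelse(seq_len(n) <= n / 2, 0.5, 2)   # illustrative pattern only
eps     <- sigma_t * z

sd_first  <- sd(eps[1:(n / 2)])
sd_second <- sd(eps[(n / 2 + 1):n])
```

In an ARCH model the pattern of $\sigma_t$ is not fixed in advance as it is here; it is driven by the past values of the series itself.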
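Auto Regressive Conditional Heteroscedastic (ARCH) model
The core idea of the ARCH model is to effectively describe the dependence of volatility on recent (centered) returns $r_t$.
The ARCH(1) model can be written as:

$$\begin{cases} r_t = \epsilon_t \\ \epsilon_t = \sigma_t z_t \\ \sigma_t^2 = \mathbb{E}(\epsilon_t^2 | \Omega_{t-1}) = \omega + \alpha_1 \epsilon_{t-1}^2 \end{cases}$$

where:

- $z_t$ are i.i.d. random variables with zero mean and unit variance (strong WN), e.g. Gaussian or Student (or similarly symmetric);
- $\omega, \alpha_1 > 0$;
- $\mathbb{E}(\epsilon_t) = 0$, $\mathrm{Var}(\epsilon_t) = \omega / (1 - \alpha_1)$, $\mathrm{Cov}(\epsilon_{t+h}, \epsilon_t) = 0$, $\forall t \geq 0$ and $|h| \geq 1$. Also, $\mathrm{Var}(\epsilon_t) \geq 0 \Rightarrow 0 \leq \alpha_1 < 1$.

An ARCH process is stationary. If the returns are not centered, then the first equation is $r_t = \mu + \epsilon_t$.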
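The unconditional variance of the ARCH(1) process can be obtained with the law of iterated expectations. A short derivation, assuming the process is stationary, so that $\mathbb{E}\epsilon_t^2 = \mathbb{E}\epsilon_{t-1}^2 = \sigma^2$, and using $\mathbb{E}\epsilon_t = 0$:

```latex
\begin{aligned}
\mathrm{Var}(\epsilon_t) = \mathbb{E}\epsilon_t^2
  &= \mathbb{E}\left[ \mathbb{E}(\epsilon_t^2 | \Omega_{t-1}) \right]
   = \mathbb{E}\sigma_t^2
   = \omega + \alpha_1 \mathbb{E}\epsilon_{t-1}^2 \\
\sigma^2 &= \omega + \alpha_1 \sigma^2
\quad \Rightarrow \quad
\sigma^2 = \frac{\omega}{1 - \alpha_1}
\end{aligned}
```

For the simulated introductory example, with constant term 1 and coefficient 0.5 on the squared lagged value, this gives an unconditional variance of $1 / (1 - 0.5) = 2$.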
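ARCH(q):
The ARCH process can also be generalized:

$$\begin{cases} r_t = \mu + \epsilon_t \\ \epsilon_t = \sigma_t z_t \\ \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \ldots + \alpha_q \epsilon_{t-q}^2 \end{cases}$$

AR(p) - ARCH(q):
It may also be possible that the returns $r_t$ themselves are autocorrelated:

$$\begin{cases} r_t = \mu + \phi_1 r_{t-1} + \ldots + \phi_p r_{t-p} + \epsilon_t \\ \epsilon_t = \sigma_t z_t \\ \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \ldots + \alpha_q \epsilon_{t-q}^2 \end{cases}$$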
Continuing the stock example (1)
Recall that our 'naive' log stock return data volatility model was:

$$\hat{u}_t^2 = 0.00000245 + 0.7335 \hat{u}_{t-1}^2$$

Because the coefficient of $u_{t-1}^2$ was significant, it could indicate that $u_t$ is probably an ARCH(1) process.

suppressPackageStartupMessages({library(fGarch)})
mdl.arch <- garchFit(~ garch(1,0), diff(stocks$lStock),
trace = FALSE)
mdl.arch@fit$matcoef

## Estimate Std. Error t value Pr(>|t|)

## mu 1.048473e-03 1.132355e-04 9.259222 0.000000e+00
## omega 2.400242e-06 3.904157e-07 6.147914 7.850864e-10
## alpha1 6.598808e-01 1.571422e-01 4.199260 2.677887e-05
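So, our model looks like:

$$\begin{cases} \widehat{\Delta \log(stock_t)} = \hat{\mu} = 0.001048 \\ \hat{\sigma}_t^2 = \hat{\omega} + \hat{\alpha}_1 \hat{\epsilon}_{t-1}^2 = 2.4 \cdot 10^{-6} + 0.660 \hat{\epsilon}_{t-1}^2 \end{cases}$$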

Recall from tsdisplay(diff(stocks$lStock)) that the returns are not WN (they might be an AR(6) process). To find the proper conditional mean model for the returns, we use the auto.arima function.

mdl.ar <- auto.arima(diff(stocks$lStock), max.p = 10, max.q = 0)

mdl.ar$coef #AR(7) model is recommended

## ar1 ar2 ar3 ar4

## -0.134997783 0.249189502 -0.095223779 -0.167506460 -0.024943
## ar6 ar7 intercept
## 0.159953621 -0.028619401 0.000983335
We combine it with ARCH(1) to create an AR(7)-ARCH(1) model:

mdl.arch.final <- garchFit(~ arma(7,0) + garch(1,0),

diff(stocks$lStock),
trace = FALSE)
mdl.arch.final@fit$matcoef

## Estimate Std. Error t value Pr(>|t|)

## mu 1.193945e-03 1.730481e-04 6.8994954 5.218714e-12
## ar1 -1.236738e-01 7.070313e-02 -1.7491979 8.025682e-02
## ar2 8.081154e-02 4.427947e-02 1.8250341 6.799588e-02
## ar3 -3.825929e-02 4.558812e-02 -0.8392383 4.013356e-01
## ar4 -1.069443e-01 3.932896e-02 -2.7192253 6.543502e-03
## ar5 7.208729e-03 3.970051e-02 0.1815777 8.559141e-01
## ar6 1.635547e-01 3.580176e-02 4.5683442 4.915924e-06
## ar7 -1.124515e-01 3.388652e-02 -3.3184725 9.051122e-04
## omega 2.045548e-06 3.566767e-07 5.7350195 9.750115e-09
## alpha1 6.503373e-01 1.721740e-01 3.7772104 1.585947e-04
The Generalized ARCH (GARCH) model
Although the ARCH model is simple, it often requires many parameters to adequately describe the volatility process of an asset return. To reduce the number of coefficients, an alternative model must be sought.
If an ARMA type model is assumed for the error variance, then a GARCH(p, q) model should be considered:

$$\begin{cases} r_t = \mu + \epsilon_t \\ \epsilon_t = \sigma_t z_t \\ \sigma_t^2 = \omega + \sum_{j=1}^{q} \alpha_j \epsilon_{t-j}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 \end{cases}$$

A GARCH model can be regarded as an application of the ARMA idea to the series $\epsilon_t^2$.
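To see the ARMA connection in the simplest case, consider GARCH(1,1). The derivation below is a sketch that introduces the auxiliary symbol $\nu_t = \epsilon_t^2 - \sigma_t^2$ (not used elsewhere in these slides), for which $\mathbb{E}(\nu_t | \Omega_{t-1}) = 0$:

```latex
\begin{aligned}
\sigma_t^2 &= \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
\qquad \sigma_t^2 = \epsilon_t^2 - \nu_t \\
\epsilon_t^2 - \nu_t &= \omega + \alpha_1 \epsilon_{t-1}^2
  + \beta_1 \left( \epsilon_{t-1}^2 - \nu_{t-1} \right) \\
\epsilon_t^2 &= \omega + (\alpha_1 + \beta_1) \epsilon_{t-1}^2 + \nu_t - \beta_1 \nu_{t-1}
\end{aligned}
```

So $\epsilon_t^2$ follows an ARMA(1,1)-type equation with autoregressive coefficient $\alpha_1 + \beta_1$ and moving-average coefficient $-\beta_1$.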
Both ARCH and GARCH are (weak) WN processes with a special
structure of their conditional variance.
Such processes are described by an almost endless family of ARCH
models: ARCH, GARCH, TGARCH, GJR − GARCH, EGARCH,
GARCH − M, AVGARCH, APARCH, NGARCH, NAGARCH, IGARCH
etc.
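The GARCH recursion is easy to simulate directly. The following is a minimal base-R sketch of a GARCH(1,1) process with $\mu = 0$, assuming Gaussian $z_t$ and starting the recursion at the unconditional variance $\omega / (1 - \alpha_1 - \beta_1)$; the function name simulate_garch11 and the parameter values are hypothetical.

```r
# Simulate eps_t = sigma_t * z_t with
# sigma_t^2 = omega + alpha1 * eps_{t-1}^2 + beta1 * sigma_{t-1}^2
simulate_garch11 <- function(n, omega, alpha1, beta1, seed = 1) {
  stopifnot(n >= 2, omega > 0, alpha1 >= 0, beta1 >= 0, alpha1 + beta1 < 1)
  set.seed(seed)
  z      <- rnorm(n)             # Gaussian strong WN (an assumption)
  sigma2 <- numeric(n)
  eps    <- numeric(n)
  # Start the recursion at the unconditional variance
  sigma2[1] <- omega / (1 - alpha1 - beta1)
  eps[1]    <- sqrt(sigma2[1]) * z[1]
  for (t in 2:n) {
    sigma2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
    eps[t]    <- sqrt(sigma2[t]) * z[t]
  }
  list(eps = eps, sigma2 = sigma2)
}

sim <- simulate_garch11(n = 500, omega = 0.1, alpha1 = 0.1, beta1 = 0.8)
```

Setting beta1 = 0 reduces the recursion to ARCH(1), and setting both alpha1 and beta1 to zero gives a constant conditional variance equal to omega.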
Volatility Model Building
Building a volatility model consists of the following steps:

1. Specify a mean equation of $r_t$ by testing for serial dependence in the data and, if necessary, build an econometric model (e.g. an ARMA model) to remove any linear dependence.
2. Use the residuals of the mean equation, $\hat{e}_t = r_t - \hat{r}_t$, to test for ARCH effects.
3. If ARCH effects are found to be significant, one can use the PACF of $\hat{e}_t^2$ to determine the ARCH order (this may not be effective when the sample size is small). Specifying the order of a GARCH model is not easy. Only lower order GARCH models are used in most applications, say, GARCH(1, 1), GARCH(2, 1), and GARCH(1, 2) models.
4. Specify a volatility model if ARCH effects are statistically significant and perform a joint estimation of the mean and volatility equations.
5. Check the fitted model carefully and refine it if necessary.
Testing for ARCH Effects
Let $\hat{\epsilon}_t = r_t - \hat{r}_t$ be the residuals of the mean equation. Then $\hat{\epsilon}_t^2$ are used to check for conditional heteroscedasticity (i.e. the ARCH effects). Two tests are available:

1. Apply the usual Ljung-Box statistic $Q(k)$ to $\hat{\epsilon}_t^2$. The null hypothesis is that the first $k$ lags of the ACF of $\hat{\epsilon}_t^2$ are zero:
$$H_0: \rho(1) = 0, \rho(2) = 0, \ldots, \rho(k) = 0$$
2. The second test for conditional heteroscedasticity is the Lagrange Multiplier (LM) test, which is equivalent to the usual F-statistic for testing $H_0: \alpha_1 = \ldots = \alpha_k = 0$ in the linear regression:
$$\hat{\epsilon}_t^2 = \alpha_0 + \sum_{j=1}^{k} \alpha_j \hat{\epsilon}_{t-j}^2 + e_t, \quad t = k + 1, \ldots, T$$
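The auxiliary regression behind the second test can be written in a few lines of base R. This is a minimal sketch, not the routine used by any particular package: the function name arch_lm_test is hypothetical, and the LM statistic is computed in its common (number of observations) times $R^2$ form, which is compared with a $\chi^2_k$ distribution.

```r
arch_lm_test <- function(e, k) {
  e2 <- e^2
  # embed() returns a matrix whose first column is e2[t] and whose
  # remaining k columns are the lags e2[t-1], ..., e2[t-k]
  lagged <- embed(e2, k + 1)
  y <- lagged[, 1]
  X <- lagged[, -1, drop = FALSE]
  fit <- summary(lm(y ~ X))

  # F-statistic for H0: alpha_1 = ... = alpha_k = 0
  f   <- fit$fstatistic
  p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

  # LM statistic: (number of observations used) * R^2
  lm_stat <- length(y) * fit$r.squared
  p_lm    <- pchisq(lm_stat, df = k, lower.tail = FALSE)

  c(F = f[["value"]], p.F = p_f, LM = lm_stat, p.LM = p_lm)
}

# Usage on the simulated series from the introductory example
set.seed(346)
n   <- 1000
z   <- rnorm(n)
R.t <- numeric(n)
R.t[1] <- z[1]
for (j in 2:n) R.t[j] <- sqrt(1 + 0.5 * R.t[j - 1]^2) * z[j]
res <- arch_lm_test(R.t, k = 5)
```

Small p-values in res point to ARCH effects, in line with the Ljung-Box result for the squared simulated series.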
Continuing the stock example (2)
Going through each of the steps:

tsdisplay(diff(stocks$lStock))

The log-returns are autocorrelated. So we need to specify an ARMA model for the mean equation via auto.arima:

mdl.auto <- auto.arima(diff(stocks$lStock))

rbind(names(mdl.auto$coef)[1:3], names(mdl.auto$coef)[4:6])

## [,1] [,2] [,3]

## [1,] "ar1" "ar2" "ar3"
## [2,] "ma1" "ma2" "intercept"

The output is an ARMA(3,2) model:

$$r_t = \mu + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \phi_3 r_{t-3} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}$$

Now, we examine the residuals of this model:
par(mfrow = c(1,3))
forecast::Acf(mdl.auto$residuals)
forecast::Acf(mdl.auto$residuals^2)
forecast::Pacf(mdl.auto$residuals^2)

[Figure: ACF of mdl.auto$residuals; ACF and Partial ACF of mdl.auto$residuals^2, against Lag]

We see that the residuals are not autocorrelated; however, the squared residuals are. So, we need to create a volatility model. Because the first lag of the PACF plot of the squared residuals is significantly different from zero, we specify an ARCH(1) model for the residuals.
The final model is an ARMA(3, 2) - ARCH(1):

mdl.arch.final <- garchFit(~ arma(3, 2) + garch(1, 0),

diff(stocks$lStock),
trace = FALSE)
mdl.arch.final@fit$matcoef

## Estimate Std. Error t value Pr(>|t|)

## mu 1.980586e-03 3.367634e-04 5.8812393 4.072058e-09
## ar1 -2.743818e-01 1.943599e-01 -1.4117200 1.580324e-01
## ar2 -6.001322e-01 1.365386e-01 -4.3953299 1.106047e-05
## ar3 -1.065850e-01 8.060903e-02 -1.3222460 1.860863e-01
## ma1 1.258717e-01 1.831323e-01 0.6873265 4.918770e-01
## ma2 7.018161e-01 1.486765e-01 4.7204244 2.353530e-06
## omega 2.488709e-06 4.030309e-07 6.1749835 6.617036e-10
## alpha1 6.216022e-01 1.525975e-01 4.0734767 4.631649e-05

mdl.arch.final@fit$ics

## AIC BIC SIC HQIC

## -9.359004 -9.230203 -9.361846 -9.306918
Finally, we examine the standardized residuals $\hat{w}_t = \hat{\epsilon}_t / \hat{\sigma}_t$ to check whether $\hat{w}_t$ and $\hat{w}_t^2$ are WN:
par(mfrow = c(2,2))
stand.res = mdl.arch.final@residuals / mdl.arch.final@sigma.t
forecast::Acf(stand.res); forecast::Pacf(stand.res)
forecast::Acf(stand.res^2); forecast::Pacf(stand.res^2)
[Figure: ACF and Partial ACF of stand.res (top row) and of stand.res^2 (bottom row), against Lag]

Unfortunately, the residuals $\hat{w}_t$ still seem to be autocorrelated. In this case, more complex models should be considered, like the ones mentioned in the GARCH model slide. But this may not be necessary!
The corresponding tests are performed and provided in the model output:
capture.output(summary(mdl.arch.final))[46:56]

## [1] "Standardised Residuals Tests:"

## [2] " Statistic p-Value "
## [3] " Jarque-Bera Test R Chi^2 2.981865 0.2251626"
## [4] " Shapiro-Wilk Test R W 0.9941911 0.6029121"
## [5] " Ljung-Box Test R Q(10) 14.81308 0.1390265"
## [6] " Ljung-Box Test R Q(15) 17.92572 0.2665907"
## [7] " Ljung-Box Test R Q(20) 21.14201 0.3888168"
## [8] " Ljung-Box Test R^2 Q(10) 5.334754 0.8677243"
## [9] " Ljung-Box Test R^2 Q(15) 8.492303 0.9025344"
## [10] " Ljung-Box Test R^2 Q(20) 12.02647 0.9151619"
## [11] " LM Arch Test R TR^2 8.228338 0.7670416"

We see that the Jarque-Bera Test and Shapiro-Wilk Test p-values are > 0.05, so we do NOT reject the null hypothesis of normality of the standardized residuals R. The Ljung-Box Test p-values for the standardized residuals R and R^2 are > 0.05, so we do not reject the hypothesis that the residuals form a WN. Finally, the LM Arch Test p-value > 0.05 indicates that there are no remaining ARCH effects in the residuals. So, our estimated model is adequately specified, in the sense that the residual autocorrelation seen in the ACF/PACF plots is relatively weak!
To explore the predictions of volatility, we calculate and plot 51 observations from the middle of the data along with the one-step-ahead predictions of the corresponding volatility $\hat{\sigma}_t^2$:
d_lstock <- ts(diff(stocks$lStock))
sigma = mdl.arch.final@sigma.t
plot(window(d_lstock, start = 75, end = 125),
ylim = c(-0.02, 0.035), ylab = "diff(stocks$lStock)",
main = "returns and their +- 2sigma confidence region")
lines(window(d_lstock - 2*sigma, start = 75, end = 125),
lty = 2, col = 4)
lines(window(d_lstock + 2*sigma, start = 75, end = 125),
lty = 2, col = 4)
[Figure: "returns and their +- 2sigma confidence region": diff(stocks$lStock) against Time, observations 75 to 125]
predict(mdl.arch.final, n.ahead = 2, mse = "cond", plot = TRUE)

[Figure: "Prediction with confidence intervals": $\hat{X}_{t+h}$ together with $\hat{X}_{t+h} \pm 1.96 \sqrt{MSE}$, against Index]

## meanForecast meanError standardDeviation lowerInterval up

## 1 0.0008520921 0.002132817 0.002132817 -0.003328152
## 2 0.0010536363 0.002327369 0.002305715 -0.003507924
Data Sources
A useful R package for downloading financial data directly from open sources, like Yahoo Finance, Google Finance, etc., is the quantmod package.

suppressPackageStartupMessages({library(quantmod)})
suppressMessages({
getSymbols("GOOG", from = "2007-01-03", to = "2018-01-01")
})
tail(GOOG, 3)

## [1] "GOOG"
## GOOG.Open GOOG.High GOOG.Low GOOG.Close
## 2017-12-27 1057.39 1058.37 1048.05 1049.37
## 2017-12-28 1051.60 1054.75 1044.77 1048.14
## 2017-12-29 1046.72 1049.70 1044.90 1046.40
## GOOG.Volume GOOG.Adjusted
## 2017-12-27 1271900 1049.37
## 2017-12-28 837100 1048.14
## 2017-12-29 887500 1046.40
Time plots of daily closing price and trading volume of Google from the
last 365 trading days:
chartSeries(tail(GOOG, 365), theme = "white", name = "GOOG")

[Figure: GOOG daily closing price and trading volume, 2016-07-21 to 2017-12-29; last close 1046.40, last volume 887,500]
GOOG.rtn = diff(log(GOOG[, "GOOG.Adjusted"]))
chartSeries(GOOG.rtn, theme = "white",
name = "Daily log return data of GOOGLE stocks")

[Figure: "Daily log return data of GOOGLE stocks", 2007-01-04 to 2017-12-29; last value -0.00166]
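The log-return calculation performed by diff(log(...)) can be traced by hand on the last three adjusted closing prices printed by tail(GOOG, 3). A minimal sketch, assuming the prices are typed in with the two decimals shown in the output (so the last digits differ slightly from the value reported on the chart):

```r
# Last three adjusted closing prices printed by tail(GOOG, 3)
p <- c(1049.37, 1048.14, 1046.40)

# Log returns: R_t = log(P_t) - log(P_{t-1})
log_ret <- diff(log(p))

# Simple (arithmetic) returns for comparison: P_t / P_{t-1} - 1
simple_ret <- p[-1] / p[-length(p)] - 1
```

For small price changes the two kinds of return nearly coincide, which is the $\log(1 + x) \approx x$ approximation at work.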
Example of getting non-financial data. Unemployment rates from FRED:
getSymbols("UNRATE", src = "FRED")

## [1] "UNRATE"
chartSeries(UNRATE, theme = "white", up.col = 'black')
[Figure: UNRATE, 1948-01-01 to 2018-01-01; last value 4.1]
Summary of Volatility Modelling (1)

Quite often, the process we want to investigate for ARCH effects is stationary but not WN.

- Let $\epsilon_t$ be a weak $\mathrm{WN}(0, \sigma^2)$ and consider the model $Y_t = r + \epsilon_t$, or $Y_t = \beta_0 + \beta_1 X_t + \epsilon_t$, or $Y_t = \alpha + \phi Y_{t-1} + \epsilon_t$, or similar.
- Test whether the WN shocks $\epsilon_t$ make an ARCH process: plot the graph of $e_t^2$ ($= \hat{\epsilon}_t^2$). If $\epsilon_t$ is an ARCH process, this graph must show a clustering property.
- Further test whether the shocks $\epsilon_t$ form an ARCH process: test them for normality (the hypothesis must be rejected), e.g. using the Shapiro-Wilk test of normality.
- Further test whether the shocks $\epsilon_t$ form an ARCH process: draw the correlogram of $e_t$. The correlogram must indicate WN, but that of $e_t^2$ must not (it should be similar to the correlogram of an AR(p) process).
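The informal checks listed above can be collected into one helper. The sketch below is an assumption-laden illustration rather than a standard routine: the function name arch_diagnostics is hypothetical, and it is applied to the simulated series from the introductory example instead of real residuals.

```r
# p-values of the checks: WN test for e, WN test for e^2, normality test for e
arch_diagnostics <- function(e, lag = 10) {
  c(
    lb.e    = Box.test(e,   lag = lag, type = "Ljung-Box")$p.value,
    lb.e2   = Box.test(e^2, lag = lag, type = "Ljung-Box")$p.value,
    shapiro = shapiro.test(e)$p.value   # requires between 3 and 5000 observations
  )
}

# Re-create the simulated series from the introductory example
set.seed(346)
n   <- 1000
z   <- rnorm(n)
R.t <- numeric(n)
R.t[1] <- z[1]
for (j in 2:n) R.t[j] <- sqrt(1 + 0.5 * R.t[j - 1]^2) * z[j]

diag_pvals <- arch_diagnostics(R.t, lag = 10)
```

The pattern that suggests an ARCH process is a large p-value for the series itself together with small p-values for its squares and for the normality test.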
Summary of Volatility Modelling (2)

- To formally test whether the shocks $\epsilon_t$ form ARCH(q), test the null hypothesis $H_0: \alpha_1 = \ldots = \alpha_q = 0$ (i.e. no ARCH in $\sigma_t^2 = \omega + \sum_{j=1}^{q} \alpha_j \epsilon_{t-j}^2$):
  1. Choose the proper AR(q) model of the auxiliary regression $e_t^2 = \alpha + \alpha_1 e_{t-1}^2 + \ldots + \alpha_q e_{t-q}^2 + w_t$ (proper means minimum AIC and WN residuals $w_t$);
  2. To test $H_0$, use the F-test (or the LM test).
- Instead of using ARCH(q) with a high order q, a more parsimonious description of $\epsilon_t$ is usually given by GARCH(1,1) (or a similar lower order GARCH process);
- In order to show that the selected ARCH(q) or GARCH(1,1) model is 'good', test whether the residuals $\hat{w}_t = \hat{\epsilon}_t / \hat{\sigma}_t$ and $\hat{w}_t^2$ make WN (as they are expected to).
