Understanding Stationary Time Series
[Figure: three simulated stationary series x1, x2 and x3, and an AR(1) series with mean = 5.]
We can see that the fluctuations are indeed around a constant mean and
the variance does not appear to change throughout the period.
Some non-stationary time series examples:
I The first time series is not stationary because its mean is not constant: EYt = t depends on t;
I The second time series is not stationary because its variance is not constant: Var(Yt ) = t²·σ² depends on t. However, EYt = 0·t = 0 is constant;
I The third time series is not stationary because, even though EYt = Σ_{j=1}^t (0.5 + (−0.5)) = 0, the variance Var(Yt ) = E(Yt²) − (E(Yt ))² = E(Yt²) = t, where:
E(Yt²) = Σ_{j=1}^t E(Zj²) + 2 Σ_{j≠k} E(Zj Zk ) = t·(0.5·1² + 0.5·(−1)²) = t
The sample data graphs are provided below:
[Figure: the simulated series ns1 ("non stationary in mean"), ns2 ("non stationary in variance") and ns3 ("no clear tendency"), t = 0, ..., 50.]
How can we know which of the previous three stationary graphs are not WN? Two functions help us determine this: the sample ACF and the sample PACF.
If all the bars (except the 0th bar of the ACF) are within the blue confidence band, the stationary process is WN.
[Figure: sample ACF (top row) and sample PACF (bottom row) for the WN, MA(3) and AR(1) series.]
[Figure: sample ACF and PACF of the series x0.]
Covariance-Stationary Time Series
I In cross-sectional data, different observations were assumed to be uncorrelated;
I In time series, we require that there be some dynamics, some persistence: some way in which the present is linked to the past, and the future to the present. Having historical data would then allow us to forecast the future.
∆ = 0 ± 1.96/√T

I Compute α = (1 − Q)/2, where Q is the confidence level;
I To express the critical value as a z-score, find the z1−α value.
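As a quick numerical check of these formulas (the confidence level Q = 0.95 and sample size T = 100 below are only illustrative choices):

Q <- 0.95                    # confidence level
alpha <- (1 - Q) / 2         # tail probability
z <- qnorm(1 - alpha)        # z-score, approx. 1.96
T.len <- 100                 # illustrative sample size
c(-z, z) / sqrt(T.len)       # the band 0 +/- 1.96/sqrt(T)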
A white noise process εt satisfies:
I E(εt ) = 0;
I Var(εt ) = σ² < ∞;
I γ(t, τ ) = E(εt − Eεt )(εt−τ − Eεt−τ ) = E(εt εt−τ ), where:
E(εt εt−τ ) = 0 if τ ≠ 0, and E(εt εt−τ ) = σ² if τ = 0.
Example on how to check if a process is stationary.
Let us check whether Yt = εt + β1 εt−1 , where εt ∼ WN(0, σ²), is stationary:
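The check amounts to verifying that the mean, the variance and the autocovariances do not depend on t:
EYt = Eεt + β1 Eεt−1 = 0
Var(Yt ) = σ² + β1²σ² = (1 + β1²)σ²
γ(1) = E(Yt Yt−1 ) = E(εt + β1 εt−1 )(εt−1 + β1 εt−2 ) = β1 σ²
γ(τ ) = 0 for τ ≥ 2
None of these depend on t, so the process is stationary.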
The MA(1) above is a special case of the general linear process Yt = Σ_{j=0}^∞ βj εt−j , where β0 = 1 and Σ_{j=0}^∞ βj² < ∞. On the other hand, any process of the above form is stationary.
ρ(τ ) = E(Yt − µ)(Yt−τ − µ) / E(Yt − µ)²

ρ̂(τ ) = [ (1/T) Σ_{t=1}^T (Yt − Ȳ)(Yt−τ − Ȳ) ] / [ (1/T) Σ_{t=1}^T (Yt − Ȳ)² ]

This estimator is called the sample autocorrelation function (sample ACF).
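As an illustration, the sample ACF can be computed directly from this definition and compared with R's built-in acf() (the AR(1) series simulated below is only for demonstration):

set.seed(123)
Y <- arima.sim(model = list(ar = 0.5), n = 200)   # illustrative series
Ybar <- mean(Y); T.len <- length(Y)
rho.hat <- function(tau) {
  sum((Y[(tau + 1):T.len] - Ybar) * (Y[1:(T.len - tau)] - Ybar)) / sum((Y - Ybar)^2)
}
sapply(1:5, rho.hat)                        # manual sample ACF, lags 1-5
acf(Y, lag.max = 5, plot = FALSE)$acf[-1]   # built-in estimate, same values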
It is often of interest to assess whether a series is reasonably approximated as white noise, i.e. whether all of its autocorrelations are zero in population.
If a series is white noise, then in large samples the sample autocorrelations ρ̂(τ ), τ = 1, ..., K, are approximately independent and normally distributed with mean 0 and standard deviation 1/√T.
Thus, if the series is WN, ~95% of the sample autocorrelations should fall in the interval ±1.96/√T.
Exactly the same holds for both the sample ACF and the sample PACF. We typically plot the sample ACF and sample PACF along with their error bands.
The aforementioned error bands provide 95% confidence bounds for the sample autocorrelations taken only one at a time.
We are often interested in whether a series is white noise, i.e. whether all its autocorrelations are jointly zero. Because of the finite sample size, we can only examine a finite number of autocorrelations. We want to test the joint hypothesis:
H0 : ρ(1) = 0, ρ(2) = 0, ..., ρ(k) = 0
suppressPackageStartupMessages({require("forecast")})
# download the caemp data set
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "caemp.txt"
caemp <- read.csv(url(paste0(txt1, txt2)),
                  header = TRUE, as.is = TRUE)
# quarterly series starting in 1960
caemp <- ts(caemp, start = c(1960, 1), freq = 4)
tsdisplay(caemp)
[Figure: tsdisplay(caemp) output: the caemp series with its sample ACF and PACF.]
I The sample ACF values are large and display slow, one-sided decay;
I The sample PACF values are large at first, but are statistically negligible beyond displacement τ = 2.
We shall once again test the WN hypothesis, this time using the
Ljung-Box test statistic.
Box.test(caemp, lag = 1, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 127.73, df = 1, p-value < 2.2e-16

Box.test(caemp, lag = 2, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 240.45, df = 2, p-value < 2.2e-16
εt = (1 − θL + θ²L² − θ³L³ + ...)Yt = Yt − θYt−1 + θ²Yt−2 − θ³Yt−3 + ...
[Figures: ACF and PACF plots, lags 0-5.]
The MA(q) Process
We will now consider a general finite-order moving average process of order q, MA(q):

Yt = εt + θ1 εt−1 + ... + θq εt−q = Θ(L)εt , εt ∼ WN(0, σ²)

where
Θ(L) = 1 + θ1 L + ... + θq Lq
is the qth-order lag polynomial. The MA(q) process is a generalization of the MA(1) process. Compared to MA(1), MA(q) can capture richer dynamic patterns, which can be used for improved forecasting.
The properties of an MA(q) process parallel those of an MA(1) process in all respects:
[Figure: ACF (top) and PACF (bottom) of an MA(q) process, lags 0-8.]
Yt = φYt−1 + εt , εt ∼ WN(0, σ²)
or:
(1 − φL)Yt = εt ⇒ Yt = (1/(1 − φL)) εt
Or, alternatively: when |φ| < 1, the process is stationary, i.e. EYt = m; therefore EYt = φEYt−1 + Eεt ⇒ m = φm + 0 ⇒ m = 0.
This allows us to easily estimate the mean of the generalized AR(1) process: if Yt = α + φYt−1 + εt , then m = α/(1 − φ).
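A quick simulation illustrates the formula m = α/(1 − φ); the parameter values below are illustrative:

set.seed(42)
n <- 10000; alpha <- 2; phi <- 0.5          # implied mean: alpha/(1 - phi) = 4
Y <- numeric(n); Y[1] <- alpha / (1 - phi)
for (t in 2:n) Y[t] <- alpha + phi * Y[t - 1] + rnorm(1)
mean(Y)                                     # close to 4
alpha / (1 - phi)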
The correlogram (ACF & PACF) of AR(1) is in a sense symmetric to that
of MA(1):
[Figure: ACF (top) and PACF (bottom) of an AR(1) process, lags 0-5.]
The AR(p) Process
The general pth order autoregressive process, AR(p) is:
Yt = φ1 Yt−1 + φ2 Yt−2 + ... + φp Yt−p + εt , εt ∼ WN(0, σ²)
In lag operator form, we write:
Φ(L)Yt = (1 − φ1 L − φ2 L² − ... − φp L^p )Yt = εt
Similar to the AR(1) case, the AR(p) process is covariance stationary
if and only if all the roots zi of the autoregressive lag operator polynomial
Φ(z) are outside the complex unit circle:
1 − φ1 z − φ2 z² − ... − φp z^p = 0 ⇒ |zi | > 1
So:
Yt = (1/Φ(L)) εt
I The ACF for the general AR(p) process decays gradually when the
lag increases;
I The PACF for the general AR(p) process has a sharp cutoff at
displacement p.
An example of how the sample ACF and PACF of the AR(2) process Yt = 1.5Yt−1 − 0.9Yt−2 + εt would look:
[Figure: sample ACF and PACF of the AR(2) process, lags 0-20.]
Because the roots are complex, the ACF oscillates and because the roots
are close to the unit circle, the oscillation damps slowly.
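This can be verified numerically: the lag polynomial of the AR(2) example is Φ(z) = 1 − 1.5z + 0.9z², and its roots can be inspected with polyroot():

z <- polyroot(c(1, -1.5, 0.9))   # roots of 1 - 1.5z + 0.9z^2
z                                # a complex pair, approx. 0.83 +/- 0.65i
Mod(z)                           # both moduli approx. 1.05 > 1: stationary, but close to the unit circle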
Stationarity and Invertibility
The AR(p) is a generalization of the AR(1) strategy for approximating
the Wold representation. The moving-average representation associated
with the stationary AR(p) process:
Yt = (1/Φ(L)) εt , where 1/Φ(L) = Σ_{j=0}^∞ ψj L^j , ψ0 = 1

i.e.

Yt = Σ_{j=0}^∞ ψj εt−j
Similarly, an invertible MA process Yt = Θ(L)εt can be written as an infinite autoregression:

εt = (1/Θ(L)) Yt , where 1/Θ(L) = Σ_{j=0}^∞ πj L^j , π0 = 1

i.e.

εt = Σ_{j=0}^∞ πj Yt−j = Yt + Σ_{j=1}^∞ πj Yt−j
Combining the autoregressive and moving average parts gives the ARMA(p, q) process, written in lag operator form as:
Φ(L)Yt = Θ(L)εt
I If all the roots of Φ(L) are outside the unit circle, then the process is stationary and has a convergent infinite moving average representation: Yt = (Θ(L)/Φ(L)) εt ;
I If all the roots of Θ(L) are outside the unit circle, then the process is invertible and can be expressed as the convergent infinite autoregression: (Φ(L)/Θ(L)) Yt = εt .
An example of an ARMA(1,1) process: Yt = 0.85Yt−1 + εt + 0.5εt−1 :
[Figure: ACF (top) and PACF (bottom) of the ARMA(1,1) process with φ = 0.85, θ = 0.5, lags 0-20.]
ARMA models are often both highly accurate and highly parsimonious.
In a particular situation, for example, it might take an AR(5) model to
get the same approximation accuracy as could be obtained with an
ARMA(1,1), but the AR(5) has five parameters to be estimated, whereas
the ARMA(1,1) has only two.
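This can be illustrated on simulated data (the ARMA(1,1) data-generating process and sample size below are illustrative assumptions):

set.seed(1)
y <- arima.sim(model = list(ar = 0.85, ma = 0.5), n = 300)  # illustrative ARMA(1,1) data
AIC(arima(y, order = c(5, 0, 0)))   # AR(5): five AR coefficients plus the mean
AIC(arima(y, order = c(1, 0, 1)))   # ARMA(1,1): usually a comparable or lower AIC with fewer parameters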
For the AR(1) process Yt = φ1 Yt−1 + εt , the autocovariances satisfy γ(1) = φ1 γ(0).
Forecasting MA(q) process
Yt = µ + εt + θεt−1 , εt ∼ WN(0, σ²)
We have:
YT+1 = µ + εT+1 + θεT ⇒ YT+1|T = µ + 0 + θεT
YT+2 = µ + εT+2 + θεT+1 ⇒ YT+2|T = µ + 0 + 0
...
YT+h|T = µ
The forecast quickly approaches the (sample) mean of the process and, starting at h = q + 1, coincides with it. As h increases, the accuracy of the forecast diminishes up to the moment h = q + 1, whereupon it becomes constant.
An example of an MA(1) process: Yt = εt + 0.5εt−1 :
[Figure: a simulated MA(1) series and its forecast.]
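A minimal sketch of how such a forecast can be produced (the simulated series is only for illustration); for an MA(1), the point forecasts coincide with the estimated mean from h = 2 onwards:

library(forecast)
set.seed(10)
y <- arima.sim(model = list(ma = 0.5), n = 120)   # illustrative MA(1) data
fit <- Arima(y, order = c(0, 0, 1))
forecast(fit, h = 4)$mean    # h = 1 uses theta * e_T; h >= 2 equals the estimated mean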
Forecasting AR(p) process
Yt = φYt−1 + εt , εt ∼ WN(0, σ²)
We have:
YT+1 = φYT + εT+1 ⇒ YT+1|T = φYT + 0
YT+2 = φYT+1 + εT+2 ⇒ YT+2|T = φYT+1|T + 0 = φ²YT
...
YT+h|T = φ^h YT
The forecast tends to the (sample) mean exponentially fast, but never exactly reaches it. As h increases, the accuracy of the forecast diminishes, approaching, but never reaching, its limit (the unconditional variance of the process).
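For instance, with φ = 0.85 (as in the example below), YT+1|T = 0.85 · YT , YT+5|T = 0.85⁵ · YT ≈ 0.44 · YT and YT+20|T ≈ 0.04 · YT , so the point forecast shrinks toward the mean geometrically.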
An example of an AR(1) process: Yt = 0.85Yt−1 + εt :
[Figure: a simulated AR(1) series and its forecast.]
Forecasting ARMA(p,q) process
For the ARMA(1,1) case we have:
YT+1 = φYT + εT+1 + θεT ⇒ YT+1|T = φYT + 0 + θεT
YT+2 = φYT+1 + εT+2 + θεT+1 ⇒ YT+2|T = φYT+1|T + 0 + 0 = φ²YT + φθεT
...
YT+h|T = φ^h YT + φ^{h−1} θεT
[Figure: forecast of the ARMA(1,1) process.]
[Figure: time plot of log(Yt ).]
[Figure: sample ACF and PACF plots.]
set.seed(346)
n = 1000
alpha = c(1, 0.5)
epsilon = rnorm(mean = 0, sd = 1, n = n)
# simulate an ARCH(1)-type series: R.t[j] = sigma_j * epsilon[j],
# where sigma_j^2 = alpha[1] + alpha[2] * R.t[j-1]^2
R.t = numeric(n)
R.t[1] = sqrt(alpha[1]) * epsilon[1]
for(j in 2:n){
  R.t[j] = sqrt(alpha[1] + alpha[2] * R.t[j-1]^2) * epsilon[j]
}
forecast::tsdisplay(R.t)
[Figure: tsdisplay(R.t) output: the simulated R.t series with its sample ACF and PACF.]
and perform the Ljung-Box test (p-values shown below):
## [1] 0.9082987
## [1] 0.3846643
## [1] 0.4572007
We see that in all cases the p-value > 0.05, so we do not reject the null hypothesis that the autocorrelations are zero; the series appears to be WN.
However, we know from the data-generating code that this is not the case.
If we check the ACF and PACF of the squared log-returns, Rt2 :
forecast::tsdisplay(R.t^2)
[Figure: tsdisplay(R.t^2) output: the squared series R.t^2 with its sample ACF and PACF, which show clearly significant autocorrelations.]
##
## Box-Ljung test
##
## data: R.t^2
## X-squared = 174.37, df = 10, p-value < 2.2e-16
we reject the null hypothesis that the squared log-returns are not autocorrelated, i.e. the squared series exhibits significant autocorrelation.
In comparison, for a simple εt ∼ WN(0, 1) process:
set.seed(123)
epsilon = rnorm(mean = 0, sd = 1, n = 5000)
The εt process is not serially correlated:
par(mfrow = c(1, 2))
forecast::Acf(epsilon, lag.max = 20)
forecast::Pacf(epsilon, lag.max = 20)
[Figure: sample ACF and PACF of epsilon.]
## [1] 0.872063
The εt² process is also not serially correlated:
par(mfrow = c(1, 2))
forecast::Acf(epsilon^2, lag.max = 20)
forecast::Pacf(epsilon^2, lag.max = 20)
[Figure: sample ACF and PACF of epsilon^2.]
## [1] 0.7639204
[Figure: time plot of the log stock price series lStock (values around 3.2-3.3).]
The differences do not pass WN checks:
tsdisplay(diff(stocks$lStock))
diff(stocks$lStock)
[Figure: tsdisplay(diff(stocks$lStock)) output: the differenced log prices with their sample ACF and PACF.]
## [1] 3.014097e-05
The basic idea behind volatility modelling is that the series is serially uncorrelated, but it is nevertheless a dependent series.
Let us calculate the volatility as ût² from ∆log(Yt ) = α + ut
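A minimal sketch of this computation (assuming, as in the code above, that stocks$lStock holds the log stock prices; this is one way the u2 series used below could be constructed):

dl <- diff(stocks$lStock)          # log returns Delta log(Y_t)
mean.mdl <- lm(dl ~ 1)             # Delta log(Y_t) = alpha + u_t
u2 <- ts(residuals(mean.mdl)^2)    # squared residuals u_t^2 as the volatility proxy
plot(u2)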
[Figure: the differenced log prices (top) and the squared residuals u2 (bottom) over time.]
Note the small volatility in stable times and large volatility in fluctuating
return periods.
We have learned that the AR process is able to model persistence which, in our case, may be called clustering of volatility. Consider an AR(1) model of volatility (for this example we assume ut² is WN):
ut² = α + φ ut−1² + wt , wt ∼ WN
library(forecast)
u2.mdl <- Arima(u2, order = c(1, 0, 0), include.mean = TRUE)
coef(u2.mdl)
## ar1 intercept
## 7.335022e-01 9.187829e-06
## [1] 2.448536e-06
The resulting model (note that Arima() reports the process mean as ‘intercept’):
ût² − 9.19 · 10⁻⁶ = 0.7335 · (ût−1² − 9.19 · 10⁻⁶) + ŵt
Recalling that, for a random variable X with EX = µ and Var(X) = σ²,
E(X + α) = µ + α
Var(β · X + α) = β² · σ²
we can modify the random variables to have a different mean and variance:
εt ∼ N (0, 1) ⇒ (β · εt + µ) ∼ N (µ, β² · 1)
An ARCH process is stationary. If the returns are not centered, then the first equation below is rt = µ + εt.
The ARCH process can also be generalized to ARCH(q):
rt = µ + εt
εt = σt zt , zt ∼ i.i.d. N (0, 1)
σt² = ω + α1 εt−1² + ... + αq εt−q²
AR(p)-ARCH(q):
It may also be possible that the returns rt themselves are autocorrelated:
rt = µ + φ1 rt−1 + ... + φp rt−p + εt
εt = σt zt
σt² = ω + α1 εt−1² + ... + αq εt−q²
Continuing the stock example (1)
Recall that our ‘naive’ log stock return volatility model was:
ut² = α + φ ut−1² + wt
Because the coefficient of ut−1² was significant, it could indicate that ut² is probably an ARCH(1) process.
suppressPackageStartupMessages({library(fGarch)})
# in fGarch, garch(1, 0) specifies an ARCH(1) model
mdl.arch <- garchFit(~ garch(1,0), diff(stocks$lStock),
                     trace = FALSE)
mdl.arch@fit$matcoef
∆log(stockt ) = µ + εt , µ̂ = 0.001048
σ̂t² = ω̂ + α̂1 ε̂t−1² = 2.4 · 10⁻⁶ + 0.660 · ε̂t−1²
1. Apply the usual Ljung-Box statistic Q(k) to ε̂t². The null hypothesis is that the first k lags of the ACF of εt² are zero:
H0 : ρ(1) = 0, ρ(2) = 0, ..., ρ(k) = 0
2. The second test for conditional heteroscedasticity is the Lagrange Multiplier (LM) test, which is equivalent to the usual F-statistic for testing H0 : α1 = ... = αk = 0 in the linear regression:
εt² = α0 + Σ_{j=1}^k αj εt−j² + et , t = k + 1, ..., T
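A hedged sketch of the LM test computed by hand (the helper name arch.lm.test and the default lag order k = 5 are our own illustrative choices; eps stands for the residual series under study). The statistic T·R² from this auxiliary regression is asymptotically χ²(k) under H0:

arch.lm.test <- function(eps, k = 5) {
  e2  <- eps^2
  X   <- embed(e2, k + 1)                    # column 1: e2_t; columns 2..(k+1): its k lags
  fit <- lm(X[, 1] ~ X[, -1])
  stat <- nrow(X) * summary(fit)$r.squared   # LM statistic: T * R^2
  p    <- pchisq(stat, df = k, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}
# usage: arch.lm.test(residual.series, k = 5)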
Continuing the stock example (2)
Going through each of the steps:
tsdisplay(diff(stocks$lStock))
[Figure: sample ACF and PACF plots of the residuals and of the squared residuals.]
We see that the residuals are not autocorrelated; however, the squared residuals are. So, we need to create a volatility model. Because the first lag of the PACF of the squared residuals is significantly different from zero, we specify an ARCH(1) model for the residuals.
The final model is an ARMA(3, 2) − ARCH(1):
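The call that produced mdl.arch.final is not shown in these notes; a minimal sketch of how such a model could be estimated with fGarch (the exact specification used may differ):

# assumed specification: ARMA(3,2) mean equation with ARCH(1) errors
mdl.arch.final <- garchFit(~ arma(3, 2) + garch(1, 0),
                           data = diff(stocks$lStock), trace = FALSE)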
mdl.arch.final@fit$ics
[Figures: sample ACF and PACF diagnostic plots for the fitted model and a time plot of the series.]
predict(mdl.arch.final, n.ahead = 2, mse ="cond", plot = T)
[Figure: prediction plot showing X̂t+h with the X̂t+h ± 1.96·MSE bands.]
suppressPackageStartupMessages({library(quantmod)})
suppressMessages({
getSymbols("GOOG", from = "2007-01-03", to = "2018-01-01")
})
tail(GOOG, 3)
## [1] "GOOG"
## GOOG.Open GOOG.High GOOG.Low GOOG.Close
## 2017-12-27 1057.39 1058.37 1048.05 1049.37
## 2017-12-28 1051.60 1054.75 1044.77 1048.14
## 2017-12-29 1046.72 1049.70 1044.90 1046.40
## GOOG.Volume GOOG.Adjusted
## 2017-12-27 1271900 1049.37
## 2017-12-28 837100 1048.14
## 2017-12-29 887500 1046.40
Time plots of daily closing price and trading volume of Google from the
last 365 trading days:
chartSeries(tail(GOOG, 365), theme = "white", name = "GOOG")
[Figure: GOOG candlestick chart, 2016-07-21 to 2017-12-29; last price 1046.40, last volume 887,500.]
GOOG.rtn = diff(log(GOOG[, "GOOG.Adjusted"]))
chartSeries(GOOG.rtn, theme = "white",
name = "Daily log return data of GOOGLE stocks")
[Figure: daily log return data of GOOGLE stocks, 2007-2017.]
Example of getting non-financial data. Unemployment rates from FRED:
getSymbols("UNRATE", src = "FRED")
## [1] "UNRATE"
chartSeries(UNRATE, theme = "white", up.col = 'black')
[Figure: UNRATE, 1948-01-01 to 2018-01-01; last value 4.1.]
Summary of Volatility Modelling (1)
Quite often, the process we want to investigate for the ARCH effects is
stationary but not WN.
I To formally test whether the shocks εt form ARCH(q), test the null hypothesis H0 : α1 = ... = αq = 0 (i.e. no ARCH in σt² = ω + Σ_{j=1}^q αj εt−j²):
1. Choose the proper AR(q) model of the auxiliary regression et² = α + α1 et−1² + ... + αq et−q² + wt (proper means minimum AIC and WN residuals wt );
2. To test H0 , use the F-test (or the LM test).
I Instead of using ARCH(q) with a high order q, a more parsimonious description of εt is often given by GARCH(1,1) (or a similar low-order GARCH process);
I In order to show that the selected ARCH(q) or GARCH(1,1) model is ‘good’, test whether the residuals ŵt = ε̂t /σ̂t and ŵt² form WN (as they are expected to).
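A minimal sketch of that last step with fGarch (the data object and the GARCH(1,1) order are illustrative):

library(fGarch)
mdl.g11 <- garchFit(~ garch(1, 1), data = diff(stocks$lStock), trace = FALSE)
w.hat <- residuals(mdl.g11, standardize = TRUE)    # standardized residuals w_t = e_t / sigma_t
Box.test(w.hat,   lag = 10, type = "Ljung-Box")    # should not reject WN for w_t
Box.test(w.hat^2, lag = 10, type = "Ljung-Box")    # should not reject WN for w_t^2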