Week 11 - Deep Temporal Models

The document discusses Deep Temporal Models and their applications in time series analysis, highlighting the importance of forecasting, anomaly detection, and classification. It covers various modeling techniques, including ARIMA models, and emphasizes the need to address trends, seasonality, and heteroskedasticity in time series data. The document also presents examples of time series data and outlines the objectives and properties of time series models.


Deep Temporal Models

Sourangshu Bhattacharya
Department of Computer Science and Engg.
IIT Kharagpur
https://cse.iitkgp.ac.in/~sourangshu/
Time Series Analysis
Time Series Data is Ubiquitous

● A wide range of time series data
o AIOps
o IoT
o Business data, e.g., sales volume, stock price
o Many others

[Figure: example series including stocks, sales, goods consumption, sensor readings, power demand, cloud service monitoring, DNA sequence, motion detection, ECG]
Typical Applications of Time Series

● Time Series Forecasting
● Time Series Anomaly Detection
● Time Series Search/Query
● Time Series Classification/Clustering

Forecasting Use Case: AutoScaling
● Autoscaling in cloud computing is an effective method to improve the usage of
computing resources
o It automatically allocates resources for cloud-based applications while
maintaining SLA (service level agreement)
o Horizontal scaling (add/delete instances or VMs) vs vertical scaling
(up/downgrade CPU, RAM, network, etc.)
o Time series forecasting and decision-making on resources
Introduction

Plotting a time series is an important early step in its analysis. In general, a plot can reveal:
Trend: upward or downward pattern that might be extrapolated into the future
Periodicity: repetition of behavior in a regular pattern
Seasonality: periodic behavior with a known period (hourly, monthly, every 2 months, ...)
Heteroskedasticity: changing variance
Dependence: positive (successive observations are similar) or negative (successive observations are dissimilar)
Missing data, outliers, breaks, ...
Example Time Series
Example 1: Global Warming
The data are the global mean land-ocean temperature index from 1880 to 2009.
We note an apparent upward trend in the series during the latter part of the 20th century that has been used as an argument for the global warming hypothesis (whether the overall trend is natural or whether it is caused by some human-induced interference).
Example Time Series
Example 4: Airline passengers from 1949-1961
Trend? Seasonality? Heteroskedasticity? ...
Upward trend, seasonality on a 12-month interval, increasing variability

[Figure: monthly totals of international airline passengers, 1949-1961]
Example Time Series
Example 5: Monthly Employed persons from 1980-1991
Trend? Seasonality? Heteroskedasticity? ...
Upward trend, seasonality with a structural break
Example Time Series
Example 7: Annual number of Canadian Lynx trapped near the McKenzie River
Trend? Seasonality? Heteroskedasticity? Breaks? ...
No trend; no clear seasonality, since the cycles do not correspond to a known period, but there is periodicity
Objectives of Time Series Analysis
What do we hope to achieve with time series analysis?

• Provide a model of the data (testing of scientific hypotheses, etc.)

• Predict future values (a very common goal of analysis)

• Produce a compact description of the data (a good model can be used for "data compression")
Modeling Time Series
We take the approach that the data are a realization of random variables.

However, many statistical tools are based on assuming the random variables are IID.

In time series:
R.V.s are usually not independent (affected by trend and seasonality)
Variance may change significantly
R.V.s are usually not identically distributed

The first goal in time series modeling is to reduce the analysis to a simpler case: eliminate trend, seasonality, and heteroskedasticity, then model the remainder as dependent but identically distributed.
Probabilistic Model: Stochastic Process

A complete probabilistic model/description of a time series Xt, observed as a collection of n random variables at times t1, t2, . . . , tn for any positive integer n, is provided by the joint probability distribution

F(c1, c2, ..., cn) = P(Xt1 ≤ c1, ..., Xtn ≤ cn)

This is generally difficult to write down, unless the variables are jointly normal.
Thus, we look for other statistical tools => quantifying dependencies
Properties of Time Series Model

A time series model is a discrete-time stochastic process.

A time series model for the observed data xt specifies:

The mean function µX(t) = E(Xt)

The covariance function
γX(r, s) = E((Xr − µX(r))(Xs − µX(s))) for all integers r and s

The focus will be to determine the mean function and the covariance function to define the time series model.
Some Zero-Mean Models
iid Noise
The simplest model for a time series: no trend or seasonal component, and the observations are IID with zero mean.
We can write, for any integer n and real numbers x1, x2, ..., xn,
P(X1 ≤ x1, ..., Xn ≤ xn) = P(X1 ≤ x1)...P(Xn ≤ xn)
It plays an important role as a building block for more complicated time series models.

[Figure: a simulated white noise series]
Some Zero-Mean Models
Random Walk
The random walk {St}, t = 0, 1, 2, ..., is obtained by cumulatively summing iid random variables, with S0 = 0:
St = X1 + X2 + · · · + Xt,  t = 1, 2, ...
where Xt is iid noise. It plays an important role as a building block for more complicated time series models.

[Figure: a simulated random walk]
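As a quick illustration (a sketch, not from the slides), iid Gaussian noise and the corresponding random walk can be simulated with NumPy:

import numpy as np

rng = np.random.default_rng(0)

n = 500
x = rng.normal(loc=0.0, scale=1.0, size=n)   # iid noise X_1, ..., X_n
s = np.cumsum(x)                             # random walk S_t = X_1 + ... + X_t

print(x[:5])
print(s[:5])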
Models with Trend

[Figure: population of the U.S.A. (millions), 1800-1950]

In this case a zero-mean model for the data is clearly inappropriate. The graph suggests trying a model of the form:

Xt = mt + Yt

where mt is a function known as the trend component and Yt has zero mean. Estimating mt?
Models with Seasonality

In this case a zero-mean model for the data is clearly inappropriate. The graph suggests trying a model of the form:

Xt = St + Yt

where St is a function known as the seasonal component and Yt has zero mean. Estimating St?
Time series Modeling

Plot the series => examine the main characteristics (trend, seasonality, ...)

Remove the trend and seasonal components to get stationary residuals/models

Choose a model to fit the residuals using sample statistics (sample autocorrelation function)

Forecasting will be given by forecasting the residuals to arrive at forecasts of the original series Xt
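A minimal Python sketch of this workflow (illustrative only, assuming statsmodels is available and a monthly period of 12); the synthetic series below is just a stand-in for real data:

import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# Toy monthly series with a linear trend and yearly seasonality (stand-in for real data)
rng = np.random.default_rng(1)
t = np.arange(144)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=2, size=t.size)

# Classical additive decomposition: y_t = trend_t + seasonal_t + residual_t
result = seasonal_decompose(y, model="additive", period=12)
residual = result.resid   # detrended, deseasonalized remainder (NaN at the edges)

# The residual is what we would then model as a (dependent) stationary series
print(np.nanstd(residual))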
Stationarity and Autocorrelation function
Definitions
Xt is strictly stationary if {X1, . . . , Xn} and {X1+h, . . . , Xn+h} have the same joint distributions for all integers h and n > 0.
Xt is weakly stationary if
µX(t) is independent of t, and
γX(t + h, t) is independent of t for each h.

Let Xt be a stationary time series. The autocovariance function (ACVF) of Xt at lag h is
γX(h) = Cov(Xt+h, Xt)
The autocorrelation function (ACF) of Xt at lag h is
ρX(h) = γX(h) / γX(0)
The Sample Autocorrelation function
In practical problems, we do not start with a model, but with observed data (x1, x2, . . . , xn). To assess the degree of dependence in the data and to select a model for the data, one of the important tools we use is the sample autocorrelation function (sample ACF).

Definition
Let x1, x2, . . . , xn be observations of a time series. The sample mean of x1, x2, . . . , xn is
x̄ = (1/n) Σ_{t=1}^{n} x_t
The sample autocovariance function at lag h is
γ̂(h) = (1/n) Σ_{t=1}^{n−|h|} (x_{t+|h|} − x̄)(x_t − x̄), for −n < h < n,
and the sample autocorrelation function is
ρ̂(h) = γ̂(h) / γ̂(0)
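As an illustration (a sketch, not part of the slides), the sample ACF can be computed directly from this definition with NumPy:

import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(h) for h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    # gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
    gamma = np.array([np.sum(d[h:] * d[:n - h]) / n for h in range(max_lag + 1)])
    return gamma / gamma[0]

rng = np.random.default_rng(0)
white = rng.normal(size=500)
print(sample_acf(white, 5))   # near zero for h >= 1, as expected for iid noise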
The Sample Autocorrelation function
Remarks
The sample autocorrelation function (ACF) can be computed for any data set and is not restricted to observations from a stationary time series.

For data containing a trend, |ρ̂(h)| will display slow decay as h increases.

For data containing a substantial deterministic periodic component, |ρ̂(h)| will exhibit similar behavior with the same periodicity.
The Sample Autocorrelation function
Remarks
We may recognize the sample autocorrelation function of many time series:

• White Noise => Zero
• Trend => Slow decay
• Periodic => Periodic
• Moving Average (q) => Zero for |h| > q
• AutoRegression (p) => Decays to zero exponentially


The Airlines Dataset

[Figure]

The Sample Autocorrelation function

[Figure]
Time Series Forecasting
Forecasting: Background
● Different forecasting types
○ Short-term forecasting: predict the near future
○ Long-term forecasting: predict the future over an extended period
○ Extreme value forecasting: predict the extreme values
○ Point or probabilistic forecasting: predict a point value or an interval/probability distribution
● Challenges:
○ Accuracy, robustness
● Models:
○ Traditional: Statistical (ARIMA, ETS, Prophet)
○ Ensemble: Tree, MLP
○ Deep Models: CNN, RNN, Transformers

[Figure: examples of short-term, long-term, extreme value, and probabilistic forecasting]
ARIMA Models: General framework
An ARIMA model is a numerical expression indicating how the observations of a target variable are statistically correlated with past observations of the same variable.

▪ ARIMA models are, in theory, the most general class of models for forecasting a time series which can be "stationarized" by transformations such as differencing and lagging

▪ The easiest way to think of ARIMA models is as fine-tuned versions of random-walk models: the fine-tuning consists of adding lags of the differenced series and/or lags of the forecast errors to the prediction equation, as needed to remove any remaining autocorrelation from the forecast errors

An ARIMA model, in its most complete formulation, considers:

▪ An Autoregressive (AR) component, seasonal and non-seasonal
▪ A Moving Average (MA) component, seasonal and non-seasonal
▪ The order of Integration (I) of the series

That's why we call it ARIMA (Autoregressive Integrated Moving Average)

ARIMA Models: General framework
The most common notation used for ARIMA models is:

ARIMA(p, d, q)

where:
▪ p is the number of autoregressive terms
▪ d is the number of non-seasonal differences
▪ q is the number of lagged forecast errors in the equation

In the next slides we will explain each component of ARIMA models!
ARIMA Models: Autoregressive part (AR)
In a multiple regression model, we predict the target variable Y using a linear combination of independent variables (predictors).

In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable itself.

The term autoregression indicates that it is a regression of the variable against itself.
▪ An autoregressive model of order p, denoted AR(p), can be written as

yt = c + ϕ1 yt−1 + ϕ2 yt−2 + ⋯ + ϕp yt−p + εt

where:
▪ yt = dependent variable
▪ yt−1, yt−2, ..., yt−p = independent variables (i.e. lagged values of yt used as predictors)
▪ ϕ1, ϕ2, ..., ϕp = regression coefficients
▪ εt = error term (must be white noise)
ARIMA Models: Autoregressive part (AR)
Autoregressive simulated process examples:

[Figure: AR(1) process example (ϕ1 = 0.5); AR(2) process example (ϕ1 = 0.5, ϕ2 = 0.2)]

Consider that, in the case of an AR(1) model:

▪ When ϕ1 = 0, yt is white noise
▪ When ϕ1 = 1 and c = 0, yt is a random walk
▪ In order to have a stationary series, the following condition must hold: −1 < ϕ1 < 1
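A small NumPy sketch (illustrative, not from the slides) for simulating AR(p) processes such as the AR(1) and AR(2) examples above:

import numpy as np

def simulate_ar(phi, n=300, c=0.0, sigma=1.0, seed=0):
    """Simulate y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    y = np.zeros(n + p)                       # p zeros used as start-up values
    eps = rng.normal(scale=sigma, size=n + p)
    for t in range(p, n + p):
        y[t] = c + phi @ y[t - p:t][::-1] + eps[t]
    return y[p:]

ar1 = simulate_ar([0.5])          # AR(1) with phi_1 = 0.5
ar2 = simulate_ar([0.5, 0.2])     # AR(2) with phi_1 = 0.5, phi_2 = 0.2
print(ar1[:5], ar2[:5])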
ARIMA Models: Moving Average part (MA)
Rather than using past values of the forecast variable in a regression, a Moving Average model uses past forecast errors in a regression-like model.

In general, a moving average process of order q, MA(q), is defined as:

yt = c + εt + θ1 εt−1 + θ2 εt−2 + ⋯ + θq εt−q

The lagged values of εt are not actually observed, so this is not a standard regression.

Moving average models should not be confused with moving average smoothing (the process used in classical decomposition in order to obtain the trend component).

A moving average model is used for forecasting future values, while moving average smoothing is used for estimating the trend-cycle of past values.
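Analogously, an MA(q) process can be simulated by filtering white noise (a NumPy sketch, for illustration only):

import numpy as np

def simulate_ma(theta, n=300, c=0.0, sigma=1.0, seed=0):
    """Simulate y_t = c + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    q = theta.size
    eps = rng.normal(scale=sigma, size=n + q)
    y = np.array([c + eps[t] + theta @ eps[t - q:t][::-1] for t in range(q, n + q)])
    return y

ma1 = simulate_ma([0.7])         # MA(1) with theta_1 = 0.7
ma2 = simulate_ma([0.8, 0.5])    # MA(2) with theta_1 = 0.8, theta_2 = 0.5
print(ma1[:5], ma2[:5])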
ARIMA Models: Moving Average part (MA)
Moving Average simulated process examples:

[Figure: MA(1) process example (θ1 = 0.7); MA(2) process example (θ1 = 0.8, θ2 = 0.5)]

▪ Looking just at the time plot, it is hard to distinguish between an AR process and an MA process!
ARIMA Models: ARMA and ARIMA
If we combine autoregression and a moving average model, we obtain an ARMA(p,q) model:

yt = c + ϕ1 yt−1 + ϕ2 yt−2 + ⋯ + ϕp yt−p + θ1 εt−1 + θ2 εt−2 + ⋯ + θq εt−q + εt

(autoregressive component of order p; moving average component of order q)

To use an ARMA model, the series must be STATIONARY!

▪ If the series is NOT stationary, before estimating an ARMA model we need to apply one or more differences in order to make the series stationary: this is the integration process, called I(d), where d = number of differences needed to get stationarity

▪ If we model the integrated series using an ARMA model, we get an ARIMA(p,d,q) model, where p = order of the autoregressive part; d = order of integration; q = order of the moving average part
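As an illustration, an ARMA/ARIMA model can be fit by maximum likelihood with statsmodels (assumed installed); the simulated ARMA(2,1) series below is only a stand-in for real data:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARMA(2,1) series just for illustration
rng = np.random.default_rng(0)
n, phi, theta = 500, np.array([0.5, 0.4]), 0.8
eps = rng.normal(size=n + 2)
y = np.zeros(n + 2)
for t in range(2, n + 2):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + eps[t] + theta * eps[t - 1]
y = y[2:]

# Fit ARIMA(p,d,q); d = 0 because the simulated series is already stationary
result = ARIMA(y, order=(2, 0, 1)).fit()
print(result.summary())
print(result.forecast(steps=10))   # point forecasts for the next 10 steps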
ARIMA Models: ARMA and ARIMA

ARIMA simulated process examples

[Figure: ARMA(2,1) process example, equal to ARIMA(2,0,1), with ϕ1 = 0.5, ϕ2 = 0.4, θ1 = 0.8; ARIMA(2,1,1) process example with ϕ1 = 0.5, ϕ2 = 0.4, θ1 = 0.8]
ARIMA Models: Model identification
General rules for model identification based on ACF and PACF plots:

The data may follow an ARIMA(p, d, 0) model if the ACF and PACF plots of the differenced data show the following patterns:
▪ the ACF is exponentially decaying or sinusoidal
▪ there is a significant spike at lag p in the PACF, but none beyond lag p

The data may follow an ARIMA(0, d, q) model if the ACF and PACF plots of the differenced data show the following patterns:
▪ the PACF is exponentially decaying or sinusoidal
▪ there is a significant spike at lag q in the ACF, but none beyond lag q

For a general ARIMA(p, d, q) model (with both p and q > 0), both the ACF and PACF plots show exponential or sinusoidal decay, and it is more difficult to read off the structure of the model.
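In practice, the ACF and PACF of the (differenced) series can be plotted with statsmodels and matplotlib (a sketch; the placeholder series below should be replaced by your own differenced data):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y = rng.normal(size=300)           # placeholder; substitute the differenced series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=30, ax=axes[0])   # cut-off suggests the MA order q; decay suggests AR
plot_pacf(y, lags=30, ax=axes[1])  # cut-off suggests the AR order p; decay suggests MA
plt.tight_layout()
plt.show()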
ARIMA Models: Seasonal ARIMA
A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far:

ARIMA(p, d, q)(P, D, Q)s

where s = number of periods per season (i.e. the frequency of the seasonal cycle).
We use uppercase notation for the seasonal parts of the model, and lowercase notation for the non-seasonal parts.

As usual, d / D are the numbers of differences / seasonal differences necessary to make the series stationary.
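An illustrative sketch using statsmodels' SARIMAX class (assumed available); the toy monthly series merely stands in for real seasonal data:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy monthly series with trend + yearly seasonality (stand-in for real data)
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

# ARIMA(1,1,1)(1,1,1)_12: non-seasonal and seasonal AR/I/MA terms, period s = 12
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.aic)
print(result.forecast(steps=12))   # forecast one full seasonal cycle ahead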
ARIMA Models: estimation and AIC
Parameter estimation
In order to estimate an ARIMA model, Maximum Likelihood Estimation (MLE) is normally used.

This technique finds the values of the parameters which maximize the probability of obtaining the data that we have observed. For given values of (p, d, q)(P, D, Q) (i.e. the model order), the algorithm will try to maximize the log likelihood when finding the parameter estimates.

ARIMA model order

A commonly used criterion to compare different ARIMA models (i.e. with different values of (p, q)(P, Q) but fixed d, D) and to determine the optimal ARIMA order is the Akaike Information Criterion (AIC):

AIC = −2 log(Likelihood) + 2k

▪ where k is the number of estimated parameters in the model
▪ AIC is a goodness-of-fit measure penalized for model complexity
▪ The best ARIMA model is the one with the lowest AIC; most automatic model selection methods (e.g. auto.arima in R) use the AIC to determine the optimal ARIMA model order
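A simple AIC-based order search might look like the following sketch (illustrative; tools such as auto.arima automate this more carefully, e.g. with unit-root tests for d):

import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))      # toy non-stationary series (random walk)

best = None
for p, q in itertools.product(range(3), range(3)):   # small grid over (p, q), d fixed at 1
    try:
        res = ARIMA(y, order=(p, 1, q)).fit()
        if best is None or res.aic < best[0]:
            best = (res.aic, (p, 1, q))
    except Exception:
        continue                                      # skip orders that fail to converge

print("Best order by AIC:", best[1], "AIC =", best[0])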
ARIMA Models: Hands on
ARIMA Models: Hands on
Deep Time Series Forecasting
Deep Learning: Models

● MLP (multilayer perceptron)
○ Fully connected feedforward artificial neural network
● CNN (convolutional neural network)
○ Shared-weight architecture of filters that slide along input features
● RNN (recurrent neural network)
○ Connections between nodes form a directed or undirected graph along a temporal sequence
● Transformer
○ Deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input sequence
Forecasting: Deep Ensemble (MLP based Models)
● N-BEATS
○ Doubly residual stacking with forward and backward residual links
○ Forecasts are aggregated in a hierarchical way
○ Trend and seasonal models for interpretability
○ Ensemble: e.g., 18 to 180 models
○ Fit on different metrics: sMAPE, MASE, MAPE
○ Train on input windows of different lengths
○ Train with different random initializations

Oreshkin, B. N., Carpov, D., Chapados, N., & Bengio, Y. “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting.” ICLR 2020.
Forecasting: RNN based Models

● Recurrent neural networks
○ From RNN to LSTM/GRU: control information flow by gates, mitigating the vanishing gradient problem
○ DeepAR: time series probabilistic forecasting through an autoregressive recurrent network

[Figure: Vanilla RNN, LSTM, GRU, and DeepAR architectures]

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation, 1997.
Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555, 2014.
Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 2020.
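As an illustrative sketch (assuming PyTorch is installed; this is a minimal one-step-ahead LSTM forecaster, not the DeepAR model):

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Encode a window of past values with an LSTM and predict the next value."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, window_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # use the last hidden state

# Toy training loop on a sine wave split into sliding windows
t = torch.arange(0, 100, 0.1)
series = torch.sin(t)
window = 24
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print(float(loss))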
Forecasting with Transformer: Informer

● Informer
○ ProbSparse self-attention for an efficient and robust attention mechanism
○ Self-attention distilling: extract the dominating attention and reduce the network size
○ Generative-style decoder: produce long sequence forecasts with only one forward step, avoiding cumulative error spreading during inference

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H. and Zhang, W., "Informer: Beyond efficient transformer for long sequence time-series forecasting." AAAI, 2021.
Long Sequence Time-series Forecasting

[Figure/table slides: problem setup and experimental results of Informer on long sequence time-series forecasting]

Transformer: Attention Computation

[Figure slide: the attention computation in Informer]

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H. and Zhang, W., "Informer: Beyond efficient transformer for long sequence time-series forecasting." AAAI, 2021.
Forecasting with Transformer: Latest Methods
● Autoformer: Transformer with auto-correlation mechanism
○ Decomposition architecture to disentangle complex temporal patterns (seasonality, trend)
○ Auto-correlation instead of point-wise self-attention to utilize period-based dependencies and reduce complexity

● FEDformer: frequency enhanced decomposed Transformer
○ Efficient and robust frequency domain processing: to capture important structures in time series
○ Frequency enhanced block: substitutes self-attention
○ Frequency enhanced attention: substitutes cross-attention
○ Mixture-of-experts seasonal-trend decomposition: to better capture global properties in time series

Wu, H., Xu, J., Wang, J. and Long, M., “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting”, NeurIPS 2021.

Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin, "FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting," ICML 2022.
Forecasting with Transformer: Results

[Table: empirical comparison of FEDformer on six benchmark datasets]

[Figure: linear complexity of FEDformer]


Temporal Point Process
Many discrete events in continuous time

Events are (noisy) observations of a variety of complex dynamic processes…

[Figure: examples include disease dynamics, online actions, financial trading, mobility dynamics (image credit: Qmee, 2013)]


Example I: Information propagation

[Figure: an information cascade on a social network; users (Bob, Christine, Beth, Joe, David) repost at successive times (3.00pm, 3.25pm, 3.27pm, 4.15pm); "D follows S" denotes the follower relationship. Friggeri et al., 2014]

They can have an impact in the off-line world
Example II: Knowledge creation

[Figure: timelines of knowledge-creation events, e.g. additions and refutations, questions, answers, and upvotes]
Temporal point processes

Temporal point process: a random process whose realization consists of discrete events localized in time

[Figure: discrete events on a timeline up to t = T; the history of past events; each event drawn as a Dirac delta function]
Formally:
Model the time of the next event as a random variable, characterized (given the history) by:
- the conditional density f*(t): probability of an event in [t, t+dt)
- the conditional survival function S*(t): probability of no event before t

Likelihood of a timeline:
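For concreteness, these quantities and the likelihood can be written as follows (standard temporal point process definitions, with H(t) the history of events before t and t1 < ... < tn the events observed in [0, T)):

f*(t) = f(t | H(t))                         (conditional density of the next event time)
S*(t) = P(next event occurs after t | H(t)) = 1 − F*(t)   (conditional survival function)

L(t1, ..., tn) = [ ∏_{i=1}^{n} f*(ti) ] · S*(T)

i.e. each observed event contributes its conditional density, and the final survival factor accounts for observing no further event up to T.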
Problems of density parametrization (I)

[Figure: a timeline of events up to t = T]

It is difficult for model design and interpretability:

1. Densities need to integrate to 1 (i.e., partition function)
2. Difficult to combine timelines
Intensity function

[Figure: density f*(t) = Prob. of an event in [t, t+dt); survival = Prob. of no event before t; history; timeline up to t = T]

Intensity λ*(t):
Probability of an event in [t, t+dt), given no event before t (and given the history)

Observation: it is a rate = # of events / unit of time

Advantages of intensity parametrization (I)

[Figure: a timeline of events up to t = T]

Suitable for model design and interpretable:

1. Intensities only need to be nonnegative
2. Easy to combine timelines
Relation between f*, F*, S*, λ*

The intensity λ* is the central quantity we will use!
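For reference, the standard relations among these quantities (with tn the last event before t):

λ*(t) = f*(t) / S*(t)
S*(t) = exp( − ∫_{tn}^{t} λ*(τ) dτ )
f*(t) = λ*(t) · S*(t)
F*(t) = 1 − S*(t)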
Poisson process

[Figure: a timeline of events up to t = T]

Intensity of a Poisson process: λ*(t) = λ (a constant)

Observations:
1. Intensity independent of history
2. Uniformly random occurrence
3. Time intervals follow an exponential distribution
Fitting & sampling from a Poisson

[Figure: a timeline of events up to t = T]

Fitting by maximum likelihood: maximize Σᵢ log λ − λT, which gives λ̂ = n / T (events per unit time)

Sampling using inversion sampling: inter-event times are Exponential(λ)
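A minimal NumPy sketch (for illustration) of sampling a homogeneous Poisson process by inversion and fitting λ by maximum likelihood:

import numpy as np

rng = np.random.default_rng(0)

# --- Sampling on [0, T) via inversion: inter-event times ~ Exponential(lam) ---
lam, T = 2.0, 50.0
times, t = [], 0.0
while True:
    t += rng.exponential(1.0 / lam)
    if t >= T:
        break
    times.append(t)
times = np.array(times)

# --- Fitting by maximum likelihood: lambda_hat = (# events) / T ---
lam_hat = len(times) / T
print(len(times), lam_hat)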


Inhomogeneous Poisson process

[Figure: a timeline of events up to t = T]

Intensity of an inhomogeneous Poisson process: λ*(t) = λ(t), a deterministic function of time (independent of history)

Example: [figure of a time-varying intensity]
Fitting & sampling from inhomogeneous Poisson

[Figure: a timeline of events up to t = T]

Fitting by maximum likelihood: maximize Σᵢ log λ(tᵢ) − ∫₀ᵀ λ(τ) dτ

Sampling using thinning (rejection sampling) + inverse sampling:

1. Sample a candidate from a Poisson process with constant intensity λ̄ ≥ λ(t), using inverse sampling
2. Generate u ~ Uniform(0, 1)
3. Keep the sample with probability λ(t)/λ̄, i.e. keep it if u ≤ λ(t)/λ̄
Self-exciting (or Hawkes) process

[Figure: a timeline of events up to t = T, with the history and the triggering kernel]

Intensity of a self-exciting (or Hawkes) process:
λ*(t) = µ + α Σ_{tᵢ < t} κ(t − tᵢ)
where µ ≥ 0 is the base intensity, α ≥ 0 the excitation weight, and κ a triggering kernel (e.g. exponential)

Observations:
1. Clustered (or bursty) occurrence of events
2. Intensity is stochastic and history dependent
Fitting a Hawkes process from a recorded timeline

[Figure: a timeline of events up to t = T]

Fitting by maximum likelihood: maximize Σᵢ log λ*(tᵢ) − ∫₀ᵀ λ*(τ) dτ

The maximum likelihood problem is jointly convex in µ and α.

Sampling using thinning (rejection sampling) + inverse sampling:

Key idea: the maximum of the intensity changes over time, so the thinning upper bound must be updated as events arrive.
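As an illustrative sketch (not a reference implementation), a Hawkes process with an exponential triggering kernel can be simulated by Ogata's thinning algorithm and its log-likelihood evaluated as follows:

import numpy as np

rng = np.random.default_rng(0)

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda*(t) = mu + alpha * sum_{t_i < t} beta * exp(-beta (t - t_i))."""
    past = np.asarray(events)
    past = past[past < t]
    return mu + alpha * np.sum(beta * np.exp(-beta * (t - past)))

def simulate_hawkes(mu, alpha, beta, T):
    """Ogata's thinning: the current intensity is an upper bound until the next event."""
    events, t = [], 0.0
    while t < T:
        lam_bar = hawkes_intensity(t + 1e-10, events, mu, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        if rng.uniform() <= hawkes_intensity(t, events, mu, alpha, beta) / lam_bar:
            events.append(t)            # accept the candidate event
    return np.array(events)

def hawkes_loglik(events, mu, alpha, beta, T):
    """log L = sum_i log lambda*(t_i) - integral_0^T lambda*(tau) d tau."""
    ll = sum(np.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events)
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - events)))
    return ll - compensator

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, T=100.0)
print(len(ev), hawkes_loglik(ev, 0.5, 0.8, 1.0, 100.0))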
Mutually exciting process

[Figure: two coupled timelines (Bob and Christine), each with its own history; events on one timeline raise the intensity of the other]

Clustered occurrence affected by neighbors


Temporal Point Process
Deep Temporal Point Process Prediction
Transformer Hawkes Process
Embedding Layers
Multi-head Self-attention Modules
Continuous Time Conditional Intensity
Comparison Results
Summary
● Understanding different properties of time series is important
○ Stationarity, Trend, Seasonality, etc.
● Time series forecasting:
○ Identify trend and seasonality, and "remove" them
○ Model the stationary time series: ARIMA
● Deep time-series forecasting:
○ Transformer-based models dominate – Informer
● Temporal point process models:
○ Event data can be modeled using the self-exciting Hawkes process
○ Transformer Hawkes process – deep TPP modelling
Thanks

questions?

Email: sourangshu@cse.iitkgp.ac.in
