KEMBAR78
A Practical Time-Series Tutorial With MATLAB | PDF | Cluster Analysis | Parameter (Computer Programming)
0% found this document useful (0 votes)
5K views95 pages

A Practical Time-Series Tutorial With MATLAB

This document provides an overview of a tutorial on time-series analysis using MATLAB. The tutorial aims to demonstrate that time-series research can be made enjoyable through visualizing ideas using concise MATLAB programming. It covers an introduction to MATLAB features for efficient exploratory data analysis and visualization. The timeline indicates that the tutorial will cover MATLAB basics, time-series representations, distance measures, clustering, classification, visualization and applications. It notes what topics are outside the scope, such as mathematical formulas and specific statistical models.

Uploaded by

Nabila Chowdhury
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5K views95 pages

A Practical Time-Series Tutorial With MATLAB

This document provides an overview of a tutorial on time-series analysis using MATLAB. The tutorial aims to demonstrate that time-series research can be made enjoyable through visualizing ideas using concise MATLAB programming. It covers an introduction to MATLAB features for efficient exploratory data analysis and visualization. The timeline indicates that the tutorial will cover MATLAB basics, time-series representations, distance measures, clustering, classification, visualization and applications. It notes what topics are outside the scope, such as mathematical formulas and specific statistical models.

Uploaded by

Nabila Chowdhury
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 95

Tutorial | PKDD 2005

A practical Time-Series
Tutorial with MATLAB
Michalis Vlachos
IBM T.J.
T.J. Watson Research Center
Hawthorne, NY, 10532

Tutorial | Time-Series with Matlab

About this tutorial


 The goal of this tutorial is to show you that time-series
research (or research in general) can be made fun, when it
involves visualizing ideas, that can be achieved with concise
programming.
 Matlab enables us to do that.
Will I be able
to use this
MATLAB
right away
after the
tutorial?

I am definately
smarter than her,
but I am not a timetimeseries person, perper-se.
I wonder what I gain
from this tutorial
tutorial

Tutorial | Time-Series with Matlab

Disclaimer
 I am not affiliated with Mathworks in any way
 but I do like using Matlab a lot
since it makes my life easier

 Errors and bugs are most likely contained in this tutorial.


 I might be responsible for some of them.

Tutorial | Time-Series with Matlab

Timeline of tutorial
 Matlab introduction
I will try to convince you that Matlab is cool
Brief introduction to its many features
 TimeTime-series with Matlab
Introduction
TimeTime-Series Representations
Distance Measures
Lower Bounding
Clustering/Classification/Visualization
Applications
4

Tutorial | Time-Series with Matlab

What this tutorial is NOT about


 Moving averages
 Autoregressive models
 Forecasting/Prediction
 Stationarity

Letsnot
notescape
escapeinto
into
Lets
mathematics.
Lets
mathematics. Lets

Oh...buy
Oh...buy
mybooks
books
my
too!
too!

stickwith
withreality.
reality.
stick

 Seasonality
 ..or about complex mathematical formulas

Michael Crichton
5

Tutorial | Time-Series with Matlab

PART I: Matlab Introduction

Tutorial | Time-Series with Matlab

Why does anyone need Matlab?


 Matlab enables the efficient
Exploratory Data Analysis (EDA)
Science progresses through observation
-- Isaac Newton
Isaac Newton

The greatest value of a picture is that is forces us to


notice what we never expected to see
-- John Tukey

John Tukey
7

Tutorial | Time-Series with Matlab

Matlab
 Interpreted Language
Easy code maintenance (code is very compact)
Very fast array/vector manipulation
Support for OOP
 Easy plotting and visualization
 Easy Integration with other Languages/OSs
Interact with C/C++, COM Objects, DLLs
Build in Java support (and compiler)
Ability to make executable files
Multi-Platform Support (Windows, Mac, Linux)
 Extensive number of Toolboxes
Image, Statistics, Bioinformatics, etc
8

Tutorial | Time-Series with Matlab

History of Matlab (MATrix LABoratory)


The most important thing in the programming language is the name.
I have recently invented a very good name and now I am looking for a
suitable language. -- R. Knuth

Programmed by Cleve Moler as an interface for


EISPACK & LINPACK

Cleve Moler

 1957: Moler goes to Caltech. Studies numerical


Analysis
 1961: Goes to Stanford. Works with G. Forsythe on
Laplacian eigenvalues.
 1977: First edition of Matlab; 2000 lines of Fortran
80 functions (now more than 8000 functions)
 1979: Met with Jack Little in Stanford. Started working
on porting it to C
 1984: Mathworks is founded
Video:http://www.mathworks.com/company/aboutus/founders/origins_of_matlab_wm.html
9

Tutorial | Time-Series with Matlab

10

Tutorial | Time-Series with Matlab

Current State of Matlab/Mathworks


 Matlab, Simulink, Stateflow
 Matlab version 7, service pack 2
 Used in variety of industries
Aerospace, defense, computers, communication, biotech

 Mathworks still is privately owned


 Used in >3,500 Universities, with >500,000 users worldwide
 2004 Revenue: 300 M.
 2004 Employees: 1,000
 Pricing:
~2000$ (Commercial use),
~100$ (Student Edition)
Money is better than poverty, if only for financial reasons. Woody Allen

11

Tutorial | Time-Series with Matlab

Who needs Matlab?


 R&D companies for easy application deployment
 Professors
Lab assignments
Matlab allows focus on algorithms not on language features

 Students
Batch processing of files
No more incomprehensible perl code!

Great environment for testing ideas


Quick coding of ideas, then porting to C/Java etc

Easy visualization
Its cheap! (for students at least)
12

Tutorial | Time-Series with Matlab

Starting up Matlab

Personally I'm always ready to learn, although I do not always like


being taught.
Sir Winston Churchill

 Dos/Unix like directory navigation


 Commands like:
cd
pwd
mkdir
 For navigation it is easier to just
copy/paste the path from explorer
E.g.:
cd c:\documents\

13

Tutorial | Time-Series with Matlab

Matlab Environment

Command Window:
- type commands
- load scripts

Workspace:
Loaded Variables/Types/Size

14

Tutorial | Time-Series with Matlab

Matlab Environment

Command Window:
- type commands
- load scripts

Workspace:
Loaded Variables/Types/Size

Help contains a comprehensive


introduction to all functions

15

Tutorial | Time-Series with Matlab

Matlab Environment

Command Window:
- type commands
- load scripts

Workspace:
Loaded Variables/Types/Size

Excellent demos and


tutorial of the various
features and toolboxes
16

Tutorial | Time-Series with Matlab

Starting with Matlab


 Everything is arrays
 Manipulation of arrays is faster than regular manipulation
with for-loops
a = [1 2 3 4 5 6 7 9 10] % define an array

17

Tutorial | Time-Series with Matlab

Populating arrays
 Plot sinusoid function
a = [0:0.3:2*pi] % generate values from 0 to 2pi (with step of 0.3)
b = cos(a)
cos(a) % access cos at positions contained in array [a]
plot(a,b)
plot(a,b) % plot a (x(x-axis) against b (y(y-axis)

Related:
linspace(-100,100,15); % generate 15 values between -100 and 100
18

Tutorial | Time-Series with Matlab

Array Access
 Access array elements
>> a(1)

>> a(1:3)
ans =
0

ans =

0.3000

0.6000

 Set array elements


>> a(1) = 100

>> a(1:3) = [100 100 100]


100]

19

Tutorial | Time-Series with Matlab

2D Arrays
 Can access whole columns or rows

Lets define a 2D array


>> a = [1 2 3; 4 5 6]
a =

>> a(1,:)

Row-wise access

ans =
1
4

2
5

3
6

>> a(2,2)

>> a(:,1)

ans =

ans =

3
Column-wise access

1
4

A good listener is not only popular everywhere, but after a while he gets to know something. Wilson Mizner 20

10

Tutorial | Time-Series with Matlab

Column-wise computation
 For arrays greater than 1D, all computations happen
column-by-column
>> a = [1 2 3; 3 2 1]
a =

>> max(a)
max(a)
ans =

1
3

2
2

3
1

>> mean(a)
mean(a)

>> sort(a)

ans =

ans =

2.0000

2.0000

2.0000

1
3

2
2

1
3

21

Tutorial | Time-Series with Matlab

Concatenating arrays
 Column-wise or row-wise

Row next to row

>> a = [1 2 3];
>> b = [4 5 6];
>> c = [a b]

>> a = [1;2];
>> b = [3;4];
>> c = [a b]
c =

Column next to column

c =
1

>>
>>
>>

a
b
c

=
=
=

Row below to row

[1 2 3];
[4 5 6];
[a; b]

c =

1
2

>>
>>
>>

a
b
c

=
=
=

3
4

[1;2];
[3;4];
[a; b]

Column below column

c =
1
4

2
5

3
6

1
2
3
4
22

11

Tutorial | Time-Series with Matlab

Initializing arrays
 Create array of ones [ones]
>> a = ones(1,3)
a =
1

>> a = ones(2,2)*5;
a =
1

5
5

5
5

>> a = ones(1,3)*inf
a =
Inf Inf Inf

 Create array of zeroes [zeros]


Good for initializing arrays
>> a = zeros(1,4)
a =
0

>> a = zeros(3,1) + [1 2 3]
3]
a =
1
2
3

23

Tutorial | Time-Series with Matlab

Reshaping and Replicating Arrays


 Changing the array shape [reshape]
(eg, for easier column-wise computation)
>> a = [1 2 3 4 5 6]
6]; % make it into a column
>> reshape(a,2,3)
ans =
1
2

3
4

reshape(X,[M,N]):
[M,N] matrix of
columnwise version
of X

5
6

 Replicating an array [repmat]


>> a = [1 2 3];
>> repmat(a,1,2)
ans =

repmat(X,[M,N]):
make [M,N] tiles of X

>> repmat(a,2,1)
repmat(a,2,1)
ans =
1
2
1
2

3
3

24

12

Tutorial | Time-Series with Matlab

Useful Array functions


 Last element of array [end]
>> a = [1 3 2 5];
>> a(end)

>> a = [1 3 2 5];
>> a(enda(end-1)

ans =

ans =
2

 Length of array [length]

Length = 4

>> length(a)
a=

ans =

>> [rows, columns] = size(a)


rows = 1
columns = 4

columns = 4

rows = 1

 Dimensions of array [size]

25

Tutorial | Time-Series with Matlab

Useful Array functions


 Find a specific element [find] **
>> a = [1 3 2 5 10 5 2 3];
>> b = find(a==2)
b =
3

 Sorting [sort] ***


>> a = [1 3 2 5];
>> [s,i]=sort(a)

a=

s=

i=

s =
1

i =
1

Indicates the index


where the element
came from

26

13

Tutorial | Time-Series with Matlab

Visualizing Data and Exporting Figures


 Use Fishers Iris dataset
>> load fisheriris

4 dimensions, 3 species
Petal length & width, sepal length & width
Iris:
virginica/versicolor/setosa
meas (150x4 array):
Holds 4D measurements
...
'versicolor'
'versicolor'
'versicolor'
'versicolor'
'versicolor'
'virginica'
'virginica'
'virginica'
'virginica

species (150x1 cell array):


Holds name of species for
the specific measurement

...
27

strcmp, scatter, hold on

Tutorial | Time-Series with Matlab

Visualizing Data (2D)


>>
>>
>>
>>
>>
>>
>>
>>

idx_setosa = strcmp(species, setosa


setosa); % rows of setosa data
idx_virginica = strcmp(species, virginica
virginica); % rows of virginica
setosa = meas(idx_setosa,[1:2]);
virgin = meas(idx_virginica,[1:2]);
scatter(setosa(:,1), setosa(:,2)); % plot in blue circles by default
hold on;
scatter(virgin(:,1), virgin(:,2), rs
rs); % red[r
red[r]] squares[s
squares[s]] for these

idx_setosa
...
1
1
1
0
0
0

...

The world is governed more by appearances rather than realities --Daniel Webster

An array of zeros and


ones indicating the
positions where the
keyword setosa was
found

28

14

scatter3

Tutorial | Time-Series with Matlab

Visualizing Data (3D)


>>
>>
>>

idx_setosa = strcmp(species, setosa


setosa); % rows of setosa data
idx_virginica = strcmp(species, virginica
virginica); % rows of virginica
idx_versicolor = strcmp(species, versicolor
versicolor); % rows of versicolor

>>
>>
>>
>>
>>
>>
>>

setosa = meas(idx_setosa,[1:3]);
virgin = meas(idx_virginica,[1:3]);
versi = meas(idx_versicolor,[1:3]);
scatter3(setosa(:,1), setosa(:,2),setosa(:,3)); % plot in blue circles by default
hold on;
scatter3(virgin(:,1), virgin(:,2),virgin(:,3), rs
rs); % red[r
red[r]] squares[s
squares[s]] for these
scatter3(versi(:,1), virgin(:,2),versi(:,3), gx
gx); % green xs

7
6
5

>> grid on; % show grid on axis


>> rotate3D on; % rotate with mouse

4
3
2
1
4.5
4
3.5
3
2.5
2

4.5

5.5

6.5

7.5

29

Tutorial | Time-Series with Matlab

Changing Plots Visually


Zoom out

Zoom in

Create line

Create Arrow

Select Object

Add text

Computers are useless. They can only give you answers. Pablo Picasso

30

15

Tutorial | Time-Series with Matlab

Changing Plots Visually


 Add titles
 Add labels on axis
 Change tick labels
 Add grids to axis
 Change color of line
 Change thickness/
Linestyle
 etc

31

Tutorial | Time-Series with Matlab

Changing Plots Visually (Example)


Change color and
width of a line

A
Right click
C

32

16

Tutorial | Time-Series with Matlab

Changing Plots Visually (Example)


The result

Other Styles:
3
2
1
0
-1
-2
-3

0
3

10

20

30

40

50

60

70

80

90

10

20

30

40

50

60

70

80

90

100

2
1
0
-1
-2
-3
0

33

100

Tutorial | Time-Series with Matlab

Changing Figure Properties with Code


 GUIs are easy, but sooner or later we realize that
coding is faster
>> a = cumsum(randn(365,1)); % random walk of 365 values

If this represents a years


worth of measurements of an
imaginary quantity, we will
change:
x-axis annotation to months
Axis labels
Put title in the figure
Include some greek letters
in the title just for fun

Real men do it command-line --Anonymous

34

17

Tutorial | Time-Series with Matlab

Changing Figure Properties with Code


 Axis annotation to months
>> axis tight; % irrelevant but useful...
>> xx = [15:30:365];
>> set(gca, xtick
xtick,xx)

The result

Real men do it command-line --Anonymous

35

Tutorial | Time-Series with Matlab

Changing Figure Properties with Code


 Axis annotation to months
>> set(gca,
set(gca,xticklabel
xticklabel,[
,[Jan
Jan; ...
Feb
Feb;Mar
Mar])

The result

Real men do it command-line --Anonymous

36

18

Tutorial | Time-Series with Matlab

Changing Figure Properties with Code


Other latex examples:

 Axis labels and title

\alpha, \beta, e^{-\alpha} etc

>> title(
title(My measurements (\
(\epsilon/\
epsilon/\pi)
pi))

>> ylabel(
ylabel(Imaginary Quantity
Quantity)

>> xlabel(
xlabel(Month of 2005
2005)

Real men do it command-line --Anonymous

37

Tutorial | Time-Series with Matlab

Saving Figures
 Matlab allows to save the figures (.fig) for later
processing

.fig can be later


opened through
Matlab

You can always put-off for tomorrow, what you can do today. -Anonymous

38

19

Tutorial | Time-Series with Matlab

Exporting Figures

Export to:
emf, eps, jpg, etc

39

Tutorial | Time-Series with Matlab

Exporting figures (code)


 You can also achieve the same result with Matlab code

Matlab code:
% extract to color eps
print -depsc myImage.eps;
myImage.eps; % from commandcommand-line
print(gcf,
print(gcf,-depsc
depsc,myImage
myImage) % using variable as name
40

20

Tutorial | Time-Series with Matlab

Visualizing Data - 2D Bars


1
2
3
4

colormap

bars
time = [100 120 80 70]; % our data
h = bar(time);
bar(time); % get handle
cmap = [1 0 0; 0 1 0; 0 0 1; .5 0 1]; % colors
colormap(cmap);
colormap(cmap); % create colormap
cdata = [1 2 3 4]; % assign colors
set(h,'CDataMapping','direct','CData',cdata);
set(h,'CDataMapping','direct','CData',cdata);
41

Tutorial | Time-Series with Matlab

Visualizing Data - 3D Bars


data
10

10
9
8
6
6
3

8
6
4
2
0

8
6
6
5
3
2

colormap
7
5
4
4
2
1

64

1
2
3
5
6
7

0
0.0198
0.0397
0.0595
0.0794
0.0992
1.0000
1.0000
1.0000
1.0000

0
0.0124
0.0248
0.0372
0.0496
0.0620
...
0.7440
0.7564
0.7688
0.7812

0
0.0079
0.0158
0.0237
0.0316
0.0395
0.4738
0.4817
0.4896
0.4975

3
data = [ 10 8 7; 9 6 5; 8 6 4; 6 5 4; 6 3 2; 3 2 1];
bar3([1 2 3 5 6 7], data);
c = colormap(gray);
colormap(gray); % get colors of colormap
c = c(20:55,:); % get some colors
colormap(c);
colormap(c); % new colormap

42

21

Tutorial | Time-Series with Matlab

Visualizing Data - Surfaces


data
10
9

10

7
6

9 10

5
4

10

3
2
1
10
8

10
6

8
6

2
0

The value at position


x-y of the array
indicates the height of
the surface

data = [1:10];
data = repmat(data,10,1);
repmat(data,10,1); % create data
surface(data,'FaceColor',[1 1 1], 'Edgecolor
', [0 0 1]); % plot data
'Edgecolor',
view(3);
view(3); grid on; % change viewpoint and put axis lines
43

Tutorial | Time-Series with Matlab

Creating .m files
 Standard text files
Script: A series of Matlab commands (no input/output arguments)
Functions: Programs that accept input and return output

Right click

44

22

Tutorial | Time-Series with Matlab

Creating .m files

M editor
Double click

45

cumsum, num2str, save

Tutorial | Time-Series with Matlab

Creating .m files
 The following script will create:
An array with 10 random walk vectors
Will save them under text files: 1.dat, , 10.dat

Sample Script

myScript.m

a = cumsum(randn(100,10)); % 10 random walk data of length 100


for i=1:size(a,2),
% number of columns
data = a(:,i) ;
fname = [num2str(i) .dat
.dat]; % a string is a vector of characters!
save(fname, data
data,-ASCII
ASCII); % save each column in a text file
end
Write this in the
M editor

A random walk time-series

cumsum(A)

10

15

10

-5

10

20

30

40

50

60

70

80

90 100

and execute by typing the


name on the Matlab
command line

46

23

Tutorial | Time-Series with Matlab

Functions in .m scripts


When we need to:


Organize our code
Frequently change parameters in our scripts
keyword output argument function name
input argument

function dataN = zNorm(data)


% ZNORM zNormalization of vector
% subtract mean and divide by std

Help Text
(help function_name)

if (nargin
<1), % check parameters
(nargin<1),
error(
error(Not enough arguments
arguments);
end
data = data mean(data);
mean(data); % subtract mean
data = data/std(data
); % divide by std
data/std(data);
dataN = data;

Function Body

function [a,b] = myFunc(data, x, y) % pass & return more arguments

See also:varargin, varargout

47

Tutorial | Time-Series with Matlab

Cell Arrays
 Cells that hold other Matlab arrays
Lets read the files of a directory
>> f = dir(
dir(*.dat
*.dat) % read file contents
f =
15x1 struct array with fields:
name
date
bytes
isdir
for i=1:length(f),
a{i} = load(f(i).name);
N = length(a{i});
plot3([1:N], a{i}(:,1), a{i}(:,2), ...
r-, Linewidth
Linewidth, 1.5);
grid on;
pause;
600
500
cla;
400
end

Struct Array
1

name
date
bytes
isdir

a
).n
f(1

me

2
3
4
5

300
200
100
0
1000
1500
500

1000

48

500
0

24

Tutorial | Time-Series with Matlab

Reading/Writing Files
 Load/Save are faster than C style I/O operations
But fscanf, fprintf can be useful for file formatting
or reading non-Matlab files
fid = fopen('fischer.txt', 'wt');
for i=1:length(species),
fprintf(fid, '%6.4f %6.4f %6.4f %6.4f %s\
%s\n', meas(i,:), species{i});
end
fclose(fid);

Output file:

 Elements are accessed column-wise (again)


x = 0:.1:1; y = [x; exp(x)];
fid = fopen('exp.txt','w');
fprintf(fid,'%6.2f %12.8f\
%12.8f\n',y);
fclose(fid);
0
1

0.1
1.1052

0.2
1.2214

0.3
1.3499

0.4
0.4
1.4918

0.5
1.6487
1.6487

0.6
1.8221

0.7
2.0138

49

Tutorial | Time-Series with Matlab

Flow Control/Loops
 if (else/elseif) , switch
Check logical conditions

 while
Execute statements infinite number of times

 for
Execute statements a fixed number of times

 break, continue
 return
Return execution to the invoking function

Life is pleasant. Death is peaceful. Its the transition thats troublesome. Isaac Asimov

50

25

Tutorial | Time-Series with Matlab

For-Loop or vectorization?
clear all;
tic;
for i=1:50000
a(i)
a(i) = sin(i);
sin(i);
end
toc

clear all;
a = zeros(1,50000);
zeros(1,50000);
tic;
for i=1:50000
a(i)
a(i) = sin(i);
sin(i);
end
toc

clear all;
tic;
i = [1:50000];
a = sin(i);
sin(i);
toc;
toc;

elapsed_time =
5.0070

tic, toc, clear all

 Pre-allocate arrays that


store output results
No need for Matlab to
resize everytime
 Functions are faster than
scripts

elapsed_time =
0.1400

Compiled into pseudocode


 Load/Save faster than
Matlab I/O functions
 After v. 6.5 of Matlab there
is for-loop vectorization
(interpreter)

elapsed_time =
0.0200

Time not importantonly life important. The Fifth Element

 Vectorizations help, but


not so obvious how to
achieve many times
51

Tutorial | Time-Series with Matlab

Matlab Profiler
 Find which portions of code take up
most of the execution time
Identify bottlenecks
Vectorize offending code

Time not importantonly life important. The Fifth Element

52

26

Tutorial | Time-Series with Matlab

Hints &Tips
 There is always an easier (and faster) way
Typically there is a specialized function for what you want to
achieve

 Learn vectorization techniques, by peaking at the


actual Matlab files:
edit [fname], eg
edit mean
edit princomp

 Matlab Help contains many


vectorization examples

53

Tutorial | Time-Series with Matlab

Debugging

Beware of bugs in the above code; I have only proved it correct, not tried it
-- R. Knuth

 Not as frequently required as in C/C++


Set breakpoints, step, step in, check variables values
Set breakpoints

54

27

Tutorial | Time-Series with Matlab

Debugging
 Full control over variables and execution path
F10: step, F11: step in (visit functions, as well)
A

F10
C

Either this man is dead or my watch has stopped. Groucho Marx

55

Tutorial | Time-Series with Matlab

Advanced Features 3D modeling/Volume Rendering


 Very easy volume manipulation and rendering

56

28

Tutorial | Time-Series with Matlab

Advanced Features Making Animations (Example)


 Create animation by changing the camera viewpoint

-1

-2

0
0

0
-1
-2

-3

-3
0

-1

-2

50

50

50

-3
-1

1
0

100

100

-1

100

0
-1

azimuth = [50:100 99:99:-1:50]; % azimuth range of values


for k = 1:length(azimuth),
1:length(azimuth),
plot3(1:length(a),
);
plot3(1:length(a), a(:,1), a(:,2), 'r', 'Linewidth',2
'Linewidth',2);
grid on;
view(azimuth(k),30);
view(azimuth(k),30); % change new
M(k)
M(k) = getframe;
getframe; % save the frame
end
movie(M,20);
movie(M,20); % play movie 20 times
See also:movie2avi

57

Tutorial | Time-Series with Matlab

Advanced Features GUIs


 Built-in Development Environment
Buttons, figures, Menus, sliders, etc
 Several Examples in Help
Directory listing
Address book reader
GUI with multiple axis

58

29

Tutorial | Time-Series with Matlab

Advanced Features Using Java


 Matlab is shipped with Java Virtual
Machine (JVM)
 Access Java API (eg I/O or networking)
 Import Java classes and construct objects
 Pass data between Java objects and
Matlab variables

59

Tutorial | Time-Series with Matlab

Advanced Features Using Java (Example)


 Stock Quote Query
Connect to Yahoo server
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?
objectId=4069&objectType=file
disp('Contacting YAHOO server using ...');
disp(['url = java.net.URL('
java.net.URL(' urlString ')']);
end;
url = java.net.URL(urlString);
java.net.URL(urlString);
try
stream = openStream(url);
openStream(url);
ireader = java.io.InputStreamReader(stream);
java.io.InputStreamReader(stream);
breader = java.io.BufferedReader(ireader);
java.io.BufferedReader(ireader);
connect_query_data=
connect_query_data= 1; %connect made;
catch
connect_query_data=
connect_query_data= -1; %could not connect
case;
disp(['URL:
disp(['URL: ' urlString]);
urlString]);
error(['Could not connect to server. It may
be unavailable. Try again later.']);
stockdata={};
stockdata={};
return;
end

60

30

Tutorial | Time-Series with Matlab

Matlab Toolboxes
 You can buy many specialized toolboxes from Mathworks
Image Processing, Statistics, Bio-Informatics, etc

 There are many equivalent free toolboxes too:


SVM toolbox
http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/

Wavelets
http://www.math.rutgers.edu/~ojanen/wavekit/

Speech Processing
http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

Bayesian Networks
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

61

Tutorial | Time-Series with Matlab

In case I get stuck


 help [command] (on the command line)
eg. help fft

Ivehad
hadaawonderful
wonderful
Ive
evening.
Butthis
this
evening. But
wasnt
it
wasnt it

 Menu: help -> matlab help


Excellent introduction on various topics
 Matlab webinars
http://www.mathworks.com/company/events/archived_webinars.html?fp

 Google groups
comp.soft-sys.matlab
You can find *anything* here
Someone else had the same
problem before you!

62

31

Tutorial | Time-Series with Matlab

PART II: Time Series Analysis


Eightpercent
percentof
of
Eight
success
is
showing
success is showing
up.
up.

63

Tutorial | Time-Series with Matlab

What is a time-series
Definition:
Definition:AAsequence
sequenceof
ofmeasurements
measurementsover
overtime
time
ECG

Medicine

Stock Market

Meteorology

Geology

Astronomy

Chemistry

Biometrics

Robotics

64.0
62.8
62.0
66.0
62.0
32.0
86.4
...
21.6
45.2
43.2
53.0
43.2
42.8
43.2
36.4
16.9
10.0

Sunspot

Earthquake

time
64

32

Tutorial | Time-Series with Matlab

Applications (Image Matching)


Cluster 1
Many types of data can be
converted to time-series
Image

Color Histogram
600

Cluster 2

400
200
0

50

100

150

200

250

50

100

150

200

250

50

100

150

200

250

400

200

800
600
400
200
0

Time-Series

65

Tutorial | Time-Series with Matlab

Applications (Shapes)
Recognize type of leaf based on its shape

Ulmus carpinifolia

Acer platanoides

Salix fragilis

Tilia

Quercus robur

Convert perimeter into a sequence of values

Special thanks to A. Ratanamahatana &


E. Keogh for the leaf video.
66

33

Tutorial | Time-Series with Matlab

Applications (Motion Capture)


Motion-Capture (MOCAP) Data (Movies, Games)
Track position of several joints over time
3*17 joints = 51 parameters per frame

MOCAPdata
data
MOCAP
my
precious
my precious

67

Tutorial | Time-Series with Matlab

Applications (Video)
Video-tracking / Surveillance
Visual tracking of body features (2D time-series)
Sign Language recognition (3D time-series)
Video Tracking of body feature
over time (Athens1, Athens2)

68

34

Tutorial | Time-Series with Matlab

Time-Series and Matlab


Time-series can be represented as vectors or arrays
Fast vector manipulation
Most linear operations (eg euclidean distance, correlation) can
be trivially vectorized

Easy visualization
Many built-in functions
Specialized Toolboxes

69

Tutorial | Time-Series with Matlab

Becoming
Becoming sufficiently
sufficiently
familiar
familiar with
with something
something
is
a
substitute
is a substitute for
for
understanding
understanding it.
it.

PART II: Time Series Matching


Introduction

70

35

Tutorial | Time-Series with Matlab

Basic Data-Mining problem


Todays databases are becoming too large. Search is difficult.
How can we overcome this obstacle?
Basic structure of data-mining solution:
Represent data in a new format
Search few data in the new representation
Examine even fewer original data
Provide guarantees about the search results
Provide some type of data/result visualization

71

Tutorial | Time-Series with Matlab

Basic Time-Series Matching Problem


Distance
query
D = 7.3

Linear Scan:
Objective: Compare the query with
all sequences in DB and return
the k most similar sequences to
the query.
Database
Databasewith
withtime-series:
time-series:
Medical
sequences
Medical sequences
Images,
Images,etc
etc
Sequence
Length:100-1000pts
Sequence Length:100-1000pts
DB
DBSize:
Size:11TByte
TByte

D = 10.2

D = 11.8

D = 17

D = 22

72

36

Tutorial | Time-Series with Matlab

What other problems can we solve?


Clustering: Place time-series into similar groups

Classification: To which group is a time-series most similar to?


query
?

?
?

73

Tutorial | Time-Series with Matlab

Hierarchical Clustering
 Very generic & powerful tool
 Provides visual data grouping

Pairwise
distances
D1,1
D2,1

DM,N

1. Merge objects with


smallest distance
2. Reevaluate distances
3. Repeat process

Z = linkage(D);
H = dendrogram(Z);

74

37

Tutorial | Time-Series with Matlab

Partitional Clustering
 Faster than hierarchical clustering
 Typically provides suboptimal solutions (local minima)
 Not good performance for high dimensions

K-Means Algorithm:

0.9

0.8

1. Initialize k clusters (k specified


by user) randomly.

0.7

0.6

2. Repeat until convergence

0.5

1. Assign each object to the


nearest cluster center.

0.4

0.3

2. Re-estimate cluster centers.

0.2

0.1

0.2

0.3

0.4

See: kmeans

0.5

0.6

0.7

0.8

0.9

75

Tutorial | Time-Series with Matlab

K-Means Demo

1.4
1.2
1
0.8
0.6
0.4
0.2
0
-0.2
-0.4
-0.5

0.5

1.5

76

38

Tutorial | Time-Series with Matlab

K-Means Clustering for Time-Series


 So how is kMeans applied for Time-Series that are high-dimensional?
 Perform kMeans on a compressed dimensionality

Original
sequences

Compressed
sequences

Clustering
space
0.4

0.2

-0.2

-0.4

-0.6

-0.6

-0.4

-0.2

0.2

0.4

0.6

0.8

77

Tutorial | Time-Series with Matlab

Classification
Typically classification can be made easier if we have clustered the objects
Class A
0.4

0.2

-0.2

Project query in the


new space and find
its closest cluster

So, query Q is more


similar to class B

-0.4

-0.6

-0.6

-0.4

-0.2

0.2

0.4

0.6

0.8

Class B

78

39

Tutorial | Time-Series with Matlab

Nearest Neighbor Classification


We need not perform clustering before classification. We can classify an
object based on the class majority of its nearest neighbors/matches.

Elfs
Hobbits

10
9

Hair Length

8
7
6
5
4
3
2
1
1

9 10

Height
79

Tutorial | Time-Series with Matlab

Example

What do we need?
1. Define Similarity
2. Search fast
Dimensionality Reduction
(compress data)
80

40

Tutorial | Time-Series with Matlab

All
All models
models are
are wrong,
wrong,
but
some
are
useful
but some are useful

PART II: Time Series Matching


Similarity/Distance functions

81

Tutorial | Time-Series with Matlab

Notion of Similarity I
 Solution to any time-series problem, boils down to a proper
definition of *similarity*

Similarity is always subjective.


(i.e. it depends on the application)

82

41

Tutorial | Time-Series with Matlab

Notion of Similarity II
Similarity depends on the features we consider
(i.e. how we will describe or compress the sequences)

83

Tutorial | Time-Series with Matlab

Metric and Non-metric Distance Functions


Distance functions
Metric

Non-Metric

 Euclidean Distance

 Time Warping

 Correlation

 LCSS

Properties
Positivity:
Positivity:d(x,y)
d(x,y)0
0and
andd(x,y)=0,
d(x,y)=0,ififx=y
x=y
Symmetry:
Symmetry:d(x,y)
d(x,y)==d(y,x)
d(y,x)

IfIfany
anyof
ofthese
theseis
isnot
not
obeyed
then
the
obeyed then thedistance
distance
is
isaanon-metric
non-metric

Triangle
TriangleInequality:
Inequality:d(x,z)
d(x,z)d(x,y)
d(x,y)++d(y,z)
d(y,z)
84

42

Tutorial | Time-Series with Matlab

Triangle Inequality
Triangle
TriangleInequality:
Inequality:d(x,z)
d(x,z)d(x,y)
d(x,y)++d(y,z)
d(y,z)

z
Metric distance
functions can exploit
the triangle inequality
to speed-up search

Intuitively, if:
- x is similar to y and,
- y is similar to z, then,
- x is similar to z too.

85

Tutorial | Time-Series with Matlab

Triangle Inequality (Importance)


Triangle
TriangleInequality:
Inequality:d(x,z)
d(x,z)d(x,y)
d(x,y)++d(y,z)
d(y,z)

Assume:

d(Q,bestMatch) = 20

and

d(Q,B) =150

Then, since d(A,B)=20

d(Q,A) d(Q,B) d(B,A)


d(Q,A) 150 20 = 130

So we dont have to retrieve A from disk

20

110

20

90

110

90

0
86

43

Tutorial | Time-Series with Matlab

Non-Metric Distance Functions

Man
similar to
bat??

Matching
Matchingflexibility
flexibility
Robustness
Robustnessto
tooutliers
outliers

Bat
similar to
batman

Stretching
Stretchingin
intime/space
time/space
Support
Supportfor
fordifferent
differentsizes/lengths
sizes/lengths
Batman
similar
to man

Speeding-up
Speeding-upsearch
searchcan
canbe
be
difficult
difficult

87

Tutorial | Time-Series with Matlab

Euclidean Distance
 Most widely used distance measure
n

 Definition: L2 =

(a[i] b[i])

i =1

20

40

60

80

100

L2 = sqrt(sum((asqrt(sum((a-b).^2)); % return Euclidean distance

88

44

Tutorial | Time-Series with Matlab

Euclidean Distance (Vectorization)


Question: If I want to compare many sequences to each other do I have to
use a for-loop?
Answer: No, one can use the following equation to perform matrix
computations only

||A-B|| = sqrt ( ||A||2 + ||B||2 - 2*A.B )


M sequences

B: DxN matrix
Result is MxN matrix

A=

Of length D

A: DxM matrix

result
D1,1
D2,1

DM,N

aa=
.*b); ab=a'*b;
aa=sum(a.*a);
sum(a.*a); bb=sum(b
bb=sum(b.*b);
ab=a'*b;
d = sqrt(repmat(aa',[1 size(bb,2)])
);
size(bb,2)]) + repmat(bb,[size(aa,2)
repmat(bb,[size(aa,2) 1]) - 2*ab
2*ab);
89

Tutorial | Time-Series with Matlab

Data Preprocessing (Baseline Removal)

average value of A
average value of B

a = a mean(a);
mean(a);

90

45

Tutorial | Time-Series with Matlab

Data Preprocessing (Rescaling)

a = a ./ std(a);
std(a);

91

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (Motivation)


Euclidean distance or warping cannot compensate for small distortions in
time axis.
A
B

According to Euclidean distance


B is more similar to A than to C

C
Solution: Allow for compression & decompression in time

92

46

Tutorial | Time-Series with Matlab

Dynamic Time-Warping
First used in speech recognition
for recognizing words spoken at
different speeds

Same idea can work equally well for


generic time-series data

---Maat--llaabb-------------------

----Mat-lab--------------------------

93

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (how does it work?)


The intuition is that we copy an element multiple times so as to achieve a
better matching

Euclidean
Euclideandistance
distance
T1
=
[1,
1,
T1 = [1, 1,2,2,2]
2]
dd==11
T2
T2==[1,
[1,2,2,2,2,2]
2]
One-to-one linear alignment

Warping
Warpingdistance
distance
T1
=
[1,
1,
T1 = [1, 1,2,2,2]
2]
dd==00
T2
T2==[1,
[1,2,2,2,2,2]
2]
One-to-many non-linear alignment

94

47

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (implementation)


It is implemented using dynamic programming. Create an array that stores
all solutions for all possible subsequences.

c(i,j)
)+
c(i,j) ==D(A
D(Ai,B
i,Bj j) +
min{
c(i-1,j-1)
min{ c(i-1,j-1), ,c(i-1,j
c(i-1,j)), ,c(i,j-1)
c(i,j-1)}}
Recursive equation

95

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (Examples)


So does it work better than Euclidean? Well yes! After all it is more costly..
Dynamic Time Warping
18
20
17
13
16
14
12
19
15
11
3
9
8
7
5
6
2
10
4
1

Euclidean Distance
18
16
7
13
14
3
9
6
2
15
11
19
10
20
17
5
12
8
4
1

MIT arrhythmia database


96

48

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (Can we speed it up?)


Complexity is O(n2). We can reduce it to O(n) simply by restricting the
warping path.

A
We now only fill only a small
portion of the array

Minimum
Bounding
Envelope
(MBE)
97

Tutorial | Time-Series with Matlab

Dynamic Time-Warping (restricted warping)


Camera-Mouse dataset

The restriction of the warping path helps:


A. Speed-up execution
B. Avoid extreme (degenerate) matchings
C. Improve clustering/classification
accuracy
Classification Accuracy
Camera Mouse

Australian Sign Language

10% warping is adequate

Warping Length
98

49

Tutorial | Time-Series with Matlab

Longest Common Subsequence (LCSS)


With Time Warping extreme values (outliers) can destroy the distance
estimates. The LCSS model can offer more resilience to noise and impose
spatial constraints too.
ignore majority
of noise

match
match

Matching within time and in space


Everything that is outside the bounding
envelope can never be matched

99

Tutorial | Time-Series with Matlab

Longest Common Subsequence (LCSS)


LCSS is more resilient to noise than DTW.

Disadvantages of DTW:
A. All points are matched
B. Outliers can distort distance
ignore majority
of noise

C. One-to-many mapping

Advantages of LCSS:
A. Outlying values not matched
B. Distance/Similarity distorted less
C. Constraints in time & space

match
match

100

50

Tutorial | Time-Series with Matlab

Longest Common Subsequence (Implementation)


Similar dynamic programming solution as DTW, but now we measure
similarity not distance.

Can also be expressed as distance

101

Tutorial | Time-Series with Matlab

Distance Measure Comparison


Dataset
Camera-Mouse

ASL

ASL+noise

Method

Time (sec)

Accuracy

Euclidean

34

20%

DTW

237

80%

LCSS

210

100%

Euclidean

2.2

33%

DTW

9.1

44%

LCSS

8.2

46%

Euclidean

2.1

11%

DTW

9.3

15%

LCSS

8.3

31%

LCSS offers enhanced robustness under noisy conditions

102

51

Tutorial | Time-Series with Matlab

Distance Measure Comparison (Overview)


Method

Complexity

Elastic Matching

One-to-one Matching

Noise
Robustness

O(n)

DTW

O(n*)

LCSS

O(n*)

Euclidean

103

Tutorial | Time-Series with Matlab

PART II: Time Series Matching


Lower Bounding

104

52

Tutorial | Time-Series with Matlab

Basic Time-Series Problem Revisited

Objective: Instead of comparing the query to the


original sequences (Linear Scan/LS) , lets compare
the query to simplified versions of the DB timeseries.

query

This
ThisDB
DBcan
cantypically
typically
fit
fitin
inmemory
memory

105

Tutorial | Time-Series with Matlab

Compression Dimensionality Reduction


Project all sequences into a new space, and
search this space instead (eg project timeseries from 100-D space to 2-D space)

Feature 1

A
B
C

Feature 2
query

One can also organize the low-dimensional


points into a hierarchical index structure. In
this tutorial we will not go over indexing
techniques.

Question: When searching the original space it is guaranteed that we


will find the best match. Does this hold (or under which circumstances)
in the new compressed space?

106

53

Tutorial | Time-Series with Matlab

Concept of Lower Bounding




You can guarantee similar results to Linear Scan in the original


dimensionality, as long as you provide a Lower Bounding (LB) function
(in low dim) to the original distance (high dim.)
GEMINI, GEneric Multimedia INdexIng
So, for projection from high dim. (N) to low dim. (n): Aa, Bb etc

DDLB (a,b)
<= D (A,B)
LB (a,b) <= Dtrue
true(A,B)
5

Projection onto X-axis


C B

4
0

C
3

EF
4

Projection on some other axis

0
0

D
2

False alarm (not a problem)

B C

EF

False dismissal (bad!)

Find everything within range of 1 from A


107

Tutorial | Time-Series with Matlab

Generic Search using Lower Bounding


simplified
DB

original
DB

Answer
Superset

Final
Answer
set

Verify
against
original
DB

simplified
query

query
108

54

Tutorial | Time-Series with Matlab

Lower Bounding Example


sequences

query

109

Tutorial | Time-Series with Matlab

Lower Bounding Example


sequences

query

110

55

Tutorial | Time-Series with Matlab

Lower Bounding Example


sequences

Lower Bounds
4.6399
37.9032
19.5174
72.1846
67.1436
78.0920
70.9273
63.7253
1.4121

111

Tutorial | Time-Series with Matlab

Lower Bounding Example


sequences

Lower Bounds

True Distance

4.6399

46.7790

37.9032

108.8856

19.5174

113.5873

72.1846

104.5062

67.1436

119.4087

78.0920

120.0066

70.9273

111.6011

63.7253

119.0635

1.4121

17.2540

BestSoFar

112

56

Tutorial | Time-Series with Matlab

Lower Bounding the Euclidean distance


There are many dimensionality reduction (compression ) techniques for time-series
data. The following ones can be used to lower bound the Euclidean distance.

20 40 60 80 100 120

DFT

20 40 60 80 100 120

DWT

20 40 60 80 100 120

SVD

20 40 60 80 100 120

APCA

20 40 60 80 100 120

PAA

20 40 60 80 100 120

PLA

Figure by Eamonn Keogh, Time-Series Tutorial

113

Tutorial | Time-Series with Matlab

Fourier Decomposition
Decompose a time-series into sum of sine waves
DFT:
IDFT:

Everysignal
signalcan
can
Every
be
represented
as
be represented as
a
superposition
of
a superposition of
sinesand
andcosines
cosines
sines
(alas
nobody
(alas nobody
believesme)
me)
believes

114

57

Tutorial | Time-Series with Matlab

Fourier Decomposition
Decompose a time-series into sum of sine waves
DFT:
IDFT:

fa = fft(a);
fft(a); % Fourier decomposition
fa(5:end) = 0; % keep first 5 coefficients (low frequencies)
reconstr = real(ifft(fa));
real(ifft(fa)); % reconstruct signal

X(f)

x(n)

-0.3633

-0.4446

-0.6280 + 0.2709i

-0.9864

-0.4929 + 0.0399i

-0.3254

-1.0143 + 0.9520i

-0.6938

0.7200 - 1.0571i

-0.1086

-0.0411 + 0.1674i

-0.3470

-0.5120 - 0.3572i

0.5849

0.9860 + 0.8043i

1.5927

-0.3680 - 0.1296i

-0.9430

-0.0517 - 0.0830i

-0.3037

-0.9158 + 0.4481i

-0.7805

1.1212 - 0.6795i

-0.1953

0.2667 + 0.1100i

-0.3037

0.2667 - 0.1100i

0.2381

1.1212 + 0.6795i

2.8389

-0.9158 - 0.4481i

-0.7046

-0.0517 + 0.0830i

-0.5529

-0.3680 + 0.1296i

-0.6721

0.9860 - 0.8043i

0.1189

-0.5120 + 0.3572i

0.2706

-0.0411 - 0.1674i

-0.0003

0.7200 + 1.0571i

1.3976

-1.0143 - 0.9520i

-0.4987

-0.4929 - 0.0399i

-0.2387

-0.6280 - 0.2709i

-0.7588

Life is complex, it has both real and imaginary parts.

115

Tutorial | Time-Series with Matlab

Fourier Decomposition
How much space we gain by compressing random walk data?

Reconstruction using 1coefficients

-5
50

100

150

200

250

 1 coeff > 60% of energy


 10 coeff > 90% of energy

116

58

Tutorial | Time-Series with Matlab

Fourier Decomposition
How much space we gain by compressing random walk data?

Reconstruction using 2coefficients

-5
50

100

150

200

250

 1 coeff > 60% of energy


 10 coeff > 90% of energy

117

Tutorial | Time-Series with Matlab

Fourier Decomposition
How much space we gain by compressing random walk data?

Reconstruction using 7coefficients

-5
50

100

150

200

250

 1 coeff > 60% of energy


 10 coeff > 90% of energy

118

59

Tutorial | Time-Series with Matlab

Fourier Decomposition
How much space we gain by compressing random walk data?

Reconstruction using 20coefficients

-5
50

100

150

200

250

 1 coeff > 60% of energy


 10 coeff > 90% of energy

119

Tutorial | Time-Series with Matlab

Fourier Decomposition
How much space we gain by compressing random walk data?
Energy Percentage

Error
1

1500

0.95
0.9
1000

0.85
0.8
0.75

500

0.7
0.65
0

20

40

60
80
Coefficients

100

120

20

40

60
80
Coefficients

100

120

 1 coeff > 60% of energy


 10 coeff > 90% of energy

120

60

Tutorial | Time-Series with Matlab

Fourier Decomposition
Which coefficients are important?
We can measure the energy of each coefficient
Energy = Real(X(fk))2 + Imag(X(fk))2
Most of data-mining research
uses first k coefficients:


Good for random walk


signals (eg stock market)

Easy to index

Not good for general signals

fa = fft(a);
fft(a); % Fourier decomposition
N = length(a);
length(a); % how many?
fa = fa(1:ceil(N/2));
fa(1:ceil(N/2)); % keep first half only
mag = 2*abs(fa).^2
; % calculate energy
2*abs(fa).^2;
121

Tutorial | Time-Series with Matlab

Fourier Decomposition
Which coefficients are important?
We can measure the energy of each coefficient
Energy = Real(X(fk))2 + Imag(X(fk))2
Usage of the coefficients with
highest energy:


Good for all types of signals

Believed to be difficult to
index

CAN be indexed using


metric trees

122

61

Tutorial | Time-Series with Matlab

Code for Reconstructed Sequence


a = load('randomWalk.dat');
a = aa-mean(a)/std(a);

X(f)
0
-0.6280 + 0.2709i

keep

% zz-normalization

-0.4929 + 0.0399i
-1.0143 + 0.9520i
0.7200 - 1.0571i

fa = fft(a);

-0.0411 + 0.1674i

maxInd = ceil(length(a)/2);
N = length(a);

-0.5120 - 0.3572i

% until the middle

0.9860 + 0.8043i
-0.3680 - 0.1296i

energy = zeros(maxIndzeros(maxInd-1, 1);


E = sum(a.^2);

-0.0517 - 0.0830i

% energy of a

-0.9158 + 0.4481i
1.1212 - 0.6795i

for ind=2:maxInd,
fa_N = fa;
fa_N(ind+1:Nfa_N(ind+1:N-ind+1) = 0;
r = real(ifft(fa_N));

end

Ignore
% copy fourier
% zero out unused
% reconstruction

0.2667 + 0.1100i
0.2667 - 0.1100i
1.1212 + 0.6795i
-0.9158 - 0.4481i
-0.0517 + 0.0830i

plot(r, 'r','LineWidth',2); hold on;


plot(a,'k');
title(['Reconstruction using ' num2str(indnum2str(ind-1) 'coefficients']);
set(gca,'plotboxaspectratio', [3 1 1]);
axis tight
pause;
% wait for key
cla;
% clear axis
keep

-0.3680 + 0.1296i
0.9860 - 0.8043i
-0.5120 + 0.3572i
-0.0411 - 0.1674i
0.7200 + 1.0571i
-1.0143 - 0.9520i
-0.4929 - 0.0399i
-0.6280 - 0.2709i
123

Tutorial | Time-Series with Matlab

Code for Plotting the Error


a = load('randomWalk.dat');
a = aa-mean(a)/std(a);
fa = fft(a);
maxInd = ceil(length(a)/2);
N = length(a);
energy = zeros(maxIndzeros(maxInd-1, 1);
E = sum(a.^2);
for ind=2:maxInd,
fa_N = fa;
fa_N(ind+1:Nfa_N(ind+1:N-ind+1) = 0;
r = real(ifft(fa_N));

% zz-normalization
This is the same
% until the middle

% energy of a

% copy fourier
% zero out unused
% reconstruction

energy(indenergy(ind-1) = sum(r.^2); % energy of reconstruction


error(inderror(ind-1) = sum(abs(rsum(abs(r-a).^2); % error
end
E = ones(maxIndones(maxInd-1, 1)*E;
error = E - energy;
ratio = energy ./ E;
subplot(1,2,1);
% left plot
plot([1:maxIndplot([1:maxInd-1], error, 'r', 'LineWidth',1.5);
subplot(1,2,2);
% right plot
plot([1:maxIndplot([1:maxInd-1], ratio, 'b', 'LineWidth',1.5);
124

62

Tutorial | Time-Series with Matlab

Lower Bounding using Fourier coefficients


Parsevals Theorem states that energy in the frequency domain equals the
energy in the time domain:

Euclidean distance

or, that

If we just keep some of the coefficients, their sum of squares always


underestimates (ie lower bounds) the Euclidean distance:

125

Tutorial | Time-Series with Matlab

Lower Bounding using Fourier coefficients -Example

x
y

Note the normalization


x = cumsum(randn(100,1));
cumsum(randn(100,1));
y = cumsum(randn(100,1));
cumsum(randn(100,1));
euclid_Time = sqrt(sum((xsqrt(sum((x-y).^2));

120.9051

fx = fft(x)/sqrt(length(x));
fft(x)/sqrt(length(x));
fy = fft(y)/sqrt(length(x));
fft(y)/sqrt(length(x));
euclid_Freq = sqrt(sum(abs(fx - fy).^2));
fy).^2));

Keeping 10 coefficients
the distance is:
115.5556 < 120.9051

120.9051
126

63

Tutorial | Time-Series with Matlab

Fourier Decomposition

 O(nlogn)
O(nlogn)complexity
complexity
 Tried
Triedand
andtested
tested
 Hardware
Hardwareimplementations
implementations
 Many
Manyapplications:
applications:
compression
compression

 Not
Notgood
goodapproximation
approximationfor
for
bursty
signals
bursty signals
 Not
Notgood
goodapproximation
approximationfor
for
signals
with
signals withflat
flatand
andbusy
busy
sections
sections
(requires
(requiresmany
manycoefficients)
coefficients)

smoothing
smoothing
periodicity
periodicitydetection
detection

127

Tutorial | Time-Series with Matlab

Wavelets Why exist?


 Similar concept with Fourier decomposition
 Fourier coefficients represent global contributions,
wavelets are localized

Fourier is good for smooth, random walk data,


but not for bursty data or flat data

128

64

Tutorial | Time-Series with Matlab

Wavelets (Haar) - Intuition


 Wavelet coefficients, still represent an inner product
(projection) of the signal with some basis functions.
 These functions have lengths that are powers of two (full
sequence length, half, quarter etc)
An arithmetic example

c-d00

X = [9,7,3,5]
c+d00
D

etc

Haar coefficients: {c, d00, d10, d11,}

Haar = [6,2,1,-1]

c = 6 = (9+7+3+5)/4
c + d00 = 6+2 = 8 = (9+7)/2
c - d00 = 6-2 = 4 = (3+5)/2
etc

See also:wavemenu

129

Tutorial | Time-Series with Matlab

Wavelets in Matlab
Specialized Matlab interface
for wavelets

See also:wavemenu

130

65

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 1coefficients
2
1
0
-1
-2
50

100

150

200

250

131

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 2coefficients
2
1
0
-1
-2
50

100

150

200

250

132

66

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 4coefficients
2
1
0
-1
-2
50

100

150

200

250

133

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 8coefficients
2
1
0
-1
-2
50

100

150

200

250

134

67

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 16coefficients
2
1
0
-1
-2
50

100

150

200

250

135

Tutorial | Time-Series with Matlab

PAA (Piecewise Aggregate Approximation)


also featured as Piecewise Constant Approximation

 Represent time-series as a sequence of segments


 Essentially a projection of the Haar coefficients in time
Reconstruction using 32coefficients
2
1
0
-1
-2
50

100

150

200

250

136

68

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (Nx1 or Nx1)
% numCoeff: number of PAA segments
% data: PAA sequence (Nx1)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
data = repmat(avg, segLen, 1);
data = data(:);

%
%
%
%

break in segments
average segments
expand segments
make column

numCoeff

137

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (Nx1 or Nx1)
% numCoeff: number of PAA segments
% data: PAA sequence (Nx1)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
data = repmat(avg, segLen, 1);
data = data(:);

%
%
%
%

N=8
segLen = 2

break in segments
average segments
expand segments
make column

numCoeff

138

69

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (Nx1 or Nx1)
% numCoeff: number of PAA segments
% data: PAA sequence (Nx1)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer
2

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
data = repmat(avg, segLen, 1);
data = data(:);

s
sN

N=8
segLen = 2

%
%
%
%

break in segments
average segments
expand segments
make column

numCoeff

139

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (Nx1 or Nx1)
% numCoeff: number of PAA segments
% data: PAA sequence (Nx1)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
data = repmat(avg, segLen, 1);
data = data(:);

sN

1.5

3.5

5.5

7.5

avg

%
%
%
%

N=8
segLen = 2

break in segments
average segments
expand segments
make column

numCoeff

140

70

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (1xN)
% numCoeff: number of PAA segments
% data: PAA sequence (1xN)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
2
data = repmat(avg, segLen, 1);
data = data(:)
data(:);

sN

1.5

3.5

5.5

7.5

avg

%
%
%
%

N=8
segLen = 2

break in segments
average segments
expand segments
make row

numCoeff

data

1.5

3.5

5.5

7.5

1.5

3.5

5.5

7.5

141

Tutorial | Time-Series with Matlab

PAA Matlab Code


function data = paa(s, numCoeff)
% PAA(s, numcoeff)
% s: sequence vector (1xN)
% numCoeff: number of PAA segments
% data: PAA sequence (1xN)
N = length(s);
segLen = N/numCoeff;

% length of sequence
% assume it's integer

sN = reshape(s, segLen, numCoeff);


avg = mean(sN);
data = repmat(avg, segLen, 1);
data = data(:)
data(:);

sN

1.5

3.5

5.5

7.5

avg

%
%
%
%

break in segments
average segments
expand segments
make row

numCoeff

data
data

N=8
segLen = 2

1.5

3.5

5.5

7.5

1.5

3.5

5.5

7.5

1.5

1.5

3.5

3.5

5.5

5.5

7.5

7.5
142

71

Tutorial | Time-Series with Matlab

APCA (Adaptive Piecewise Constant Approximation)


PAA

 Not all haar/PAA coefficients


are equally important

Segments of
equal size

 Intuition: Keep ones with the


highest energy
 Segments of variable length

APCA
Segments of
variable size

 APCA is good for bursty


signals
 PAA requires 1 number per
segment, APCA requires
2: [value, length]
E.g. 10 bits for a
sequence of 1024 points
143

Tutorial | Time-Series with Matlab

Wavelet Decomposition

 O(n)
O(n)complexity
complexity
 Hierarchical
Hierarchicalstructure
structure
 Progressive
Progressivetransmission
transmission

 Most
Mostdata-mining
data-miningresearch
research
still
utilizes
still utilizesHaar
Haarwavelets
wavelets
because
becauseof
oftheir
theirsimplicity.
simplicity.

 Better
Betterlocalization
localization
 Good
Goodfor
forbursty
burstysignals
signals
 Many
Manyapplications:
applications:
compression
compression
periodicity
periodicitydetection
detection
144

72

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation
 Many implementations
Optimal
Greedy Bottom-Up
Greedy Top-down
Genetic, etc
 You can find a bottom-up implementation here:
http://www.cs.ucr.edu/~eamonn/TSDMA/time_series_toolbox/
145

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation

146

73

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation

147

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation

148

74

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation

149

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)


 Approximate a sequence
with multiple linear
segments
 First such algorithms
appeared in cartography
for map approximation

150

75

Tutorial | Time-Series with Matlab

Piecewise Linear Approximation (PLA)

 O(nlogn)
O(nlogn)complexity
complexityfor
for
bottom
up
bottom upalgorithm
algorithm

 Visually
Visuallynot
notvery
verysmooth
smoothor
or
pleasing.
pleasing.

 Incremental
Incrementalcomputation
computation
possible
possible
 Provable
Provableerror
errorbounds
bounds
 Applications
Applicationsfor:
for:
Image
Image//signal
signal
simplification
simplification
Trend
Trenddetection
detection
151

Tutorial | Time-Series with Matlab

Singular Value Decomposition (SVD)


 SVD attempts to find the optimal basis for describing a set
of multidimensional points
 Objective: Find the axis (directions) that describe better the
data variance

y
We need 2 numbers (x,y)
for every point

y
Now we can describe each
point with 1 number, their
projection on the line

New axis and position of points


(after projection and rotation)

152

76

Tutorial | Time-Series with Matlab

Singular Value Decomposition (SVD)


 Each time-series is essentially a multidimensional point
 Objective: Find the eigenwaves (basis) whose linear
combination describes best the sequences. Eigenwaves are
data-dependent.

AMxn = UMxr * rxr * VTnxr

eigenwave 0

Factoring of data array into 3


matrices
eigenwave 1

each of length n

eigenwave 4

A linear combination of the


eigenwaves can produce any
sequence in the database

M sequences

eigenwave 3

[U,S,V]
U,S,V] = svd(A)
svd(A)

153

Tutorial | Time-Series with Matlab

Singular Value Decomposition

 Optimal
Optimaldimensionality
dimensionality
reduction
reductionin
inEuclidean
Euclidean
distance
sense
distance sense

 Cannot
Cannotbe
beapplied
appliedfor
forjust
just
one
sequence.
A
set
one sequence. A setof
of
sequences
sequencesis
isrequired.
required.

 SVD
SVDis
isaavery
verypowerful
powerfultool
tool
in
inmany
manydomains:
domains:
Websearch
Websearch(PageRank)
(PageRank)

 Addition
Additionof
ofaasequence
sequencein
in
database
databaserequires
requires
recomputation
recomputation
 Very
Verycostly
costlyto
tocompute.
compute.
2
2
Time:
min{
O(M
Time: min{ O(M2n),
n),O(Mn
O(Mn2)})}
Space:
Space:O(Mn)
O(Mn)
MMsequences
sequencesof
oflength
lengthnn

154

77

Tutorial | Time-Series with Matlab

Symbolic Approximation
 Assign a different symbol based on range of values
 Find ranges either from data histogram or uniformly

c
c

c
b

b
a

20

a
40

60

80

100

120

baabccbc
 You can find an implementation here:
http://www.ise.gmu.edu/~jessica/sax.htm
155

Tutorial | Time-Series with Matlab

Symbolic Approximations

 Linear
Linearcomplexity
complexity
 After
Aftersymbolization
symbolizationmany
many
tools
toolsfrom
frombioinformatics
bioinformatics
can
canbe
beused
used
Markov
Markovmodels
models

 Number
Numberof
ofregions
regions
(alphabet
(alphabetlength)
length)can
canaffect
affect
the
quality
of
result
the quality of result

Suffix-Trees,
Suffix-Trees,etc
etc

156

78

Tutorial | Time-Series with Matlab

Multidimensional Time-Series

Ari,are
areyou
yousure
surethe
the
Ari,
worldisisnot
not1D?
1D?
world

 Catching momentum lately


 Applications for mobile trajectories, sensor
networks, epidemiology, etc

 Lets see how to approximate 2D


trajectories with
Minimum Bounding Rectangles

Aristotle

157

Tutorial | Time-Series with Matlab

Multidimensional MBRs
Find Bounding rectangles that completely contain a trajectory
given some optimization criteria (eg minimize volume)

On my income tax 1040 it says "Check this box


if you are blind." I wanted to put a check mark
about three inches away.
- Tom Lehrer
158

79

Tutorial | Time-Series with Matlab

Comparison of different Dim. Reduction Techniques

159

Tutorial | Time-Series with Matlab

So which dimensionality reduction is the best?


APCAisis
APCA
better
better

Fourierisis
Fourier
good
good

1993

PAA!
PAA!

thanPAA!
PAA!
than

2000

2001

Chebyshev
Chebyshev
better
isisbetter
thanAPCA!
APCA!
than

2004

The
The
future
future isis
symbolic!
symbolic!

2005

Absence of proof is no proof of absence.


- Michael Crichton
160

80

Tutorial | Time-Series with Matlab

Comparisons
Lets see how tight the lower bounds are for a variety on 65 datasets
Average Lower Bound

A. No approach
is better on all
datasets

Median Lower Bound

B. Best coeff.
techniques
can offer
tighter
bounds
C. Choice of
compression
depends on
application

Note: similar results also reported by Keogh in SIGKDD02

161

Tutorial | Time-Series with Matlab

PART II: Time Series Matching


Lower Bounding the DTW and LCSS

162

81

Tutorial | Time-Series with Matlab

Lower Bounding the Dynamic Time Warping


Recent approaches use the Minimum Bounding Envelope
for bounding the DTW
Create Minimum Bounding Envelope (MBE) of query Q
Calculate distance between MBE of Q and any sequence A
One can show that: D(MBE(Q)
D(MBE(Q),A) < DTW(Q,A)
DTW(Q,A)
LB = sqrt(sum([[A > U].* [A-U]; [A < L].* [L-A]].^2));
One Matlab command!
U

MBE(Q)
A

However, this representation


is uncompressed. Both MBE
and the DB sequence can be
compressed using any of the
previously mentioned
techniques.
163

Tutorial | Time-Series with Matlab

Lower Bounding the Dynamic Time Warping


LB by Keogh
approximate MBE and
sequence using MBRs
LB = 13.84
Q

LB by Zhu and Shasha


approximate MBE and
sequence using PAA
LB = 25.41
Q
A

164

82

Tutorial | Time-Series with Matlab

Lower Bounding the Dynamic Time Warping


An even tighter lower bound can be achieved by warping the MBE
approximation against any other compressed signal.
LB_Warp = 29.05

Lower Bounding approaches for DTW,


will typically yield at least an order of
magnitude speed improvement
compared to the nave approach.
Lets compare the 3 LB approaches:
165

Tutorial | Time-Series with Matlab

Time Comparisons
We will use DTW (and the corresponding LBs) for recognition of hand-written
digits/shapes.

Accuracy: Using DTW we can achieve recognition above 90%.


Running Time: runTime LB_Warp < runTime LB_Zhu < runTime LB-Keogh
Pruning Power: For some queries LB_Warp can examine up to 65 time
fewer sequences
166

83

Tutorial | Time-Series with Matlab

Upper Bounding the LCSS


Since LCSS measures similarity and similarity is the inverse of distance, to
speed up LCSS we need to upper bound it.

LCSS(MBE
LCSS(MBEQQ,A)
,A)>=
>=LCSS(Q,A)
LCSS(Q,A)
Query

Indexed Sequence

Sim.=50/77
= 0.64

44 points

6 points
167

Tutorial | Time-Series with Matlab

LCSS Application Image Handwriting




Library of Congress has 54 million


manuscripts (20TB of text)

Increasing interest for automatic


transcribing

Word annotation:
1.1.Extract
Extractwords
wordsfrom
fromdocument
document
2.2.Extract
image
features
Extract image features
3.3.Annotate
Annotateaasubset
subsetof
ofwords
words
4.4.Classify
Classifyremaining
remainingwords
words

Feature Value

0.8
0.6
0.4
0.2
0

50

100

150

200
Column

250

300

350

400

Features:

George Washington Manuscript

- Black pixels / column


- Ink-paper transitions/ col , etc
168

84

Tutorial | Time-Series with Matlab

LCSS Application Image Handwriting


Utilized 2D time-series (2 features)
Returned 3-Nearest Neighbors of following words
Classification accuracy > 70%

169

Tutorial | Time-Series with Matlab

PART II: Time Series Analysis


Test Case and Structural Similarity Measures

170

85

Tutorial | Time-Series with Matlab

Analyzing Time-Series Weblogs

PKDD 2005

Porto

Weblog of user
requests over
time

Priceline

171

Tutorial | Time-Series with Matlab

Weblog Data Representation


Record
We
can aggregate information, eg, number of requests per day for each
keyword
Query: Spiderman
Requests

May 2002. Spiderman 1


was released in theaters

Jan

Feb Mar

Apr May Jun

Jul

Aug Sep Okt

Capture trends and periodicities

Privacy preserving

Nov Dec

Google Zeitgeist

172

86

Tutorial | Time-Series with Matlab

Finding similar patterns in query logs


We can find useful patterns and correlation in the user demand
patterns which can be useful for:
 Search engine optimization
 Recommendations
 Advertisement pricing (e.g. keyword more expensive at the popular
months)

Requests

Query: xbox

Query: ps2
Jan

Feb Mar

Apr May

Jun

Jul

Aug Sep Okt

Nov Dec

Game consoles are more


popular closer to Christmas
173

Tutorial | Time-Series with Matlab

Matching of Weblog data


Use Euclidean distance to match time-series. But which dimensionality
reduction technique to use?
Lets look at the data:

Query Bach
1 year span

The data is smooth and highly


periodic, so we can use Fourier
decomposition.
Instead of using the first Fourier
coefficients we can use the best ones
instead.
Lets see how the approximation will
look:

Query stock market


174

87

Tutorial | Time-Series with Matlab

First Fourier Coefficients vs Best Fourier Coefficients

Using the best coefficients, provides a


very high quality approximation of the
original time-series

175

Tutorial | Time-Series with Matlab

Matching results I
Query = Lance Armstrong

2000

2001

2002

LeTour
0

2000

2001

2002

Tour De France
0

2000

2001

2002

176

88

Tutorial | Time-Series with Matlab

Matching results II
Query = Christmas

2000

2001

2002

Knn4: Christmas coloring


books
Knn8: Christmas baking
Knn12: Christmas clipart
Knn20: Santa Letters

177

Tutorial | Time-Series with Matlab

Finding Structural Matches


The Euclidean distance cannot distill all the potentially useful
information in the weblog data.
 Some data are periodic, while other are bursty. We will attempt to
provide similarity measures that are based on periodicity and
burstiness.
Query cinema. Weakly periodicity.
Peak of period every Friday.

Query Elvis. Burst in demand on


16th August. Death anniversary of
Elvis Presley

178

89

Tutorial | Time-Series with Matlab

Periodic Matching
Frequency

F ( x), F ( y )

Ignore Phase/
Keep important
components

Calculate
Distance

arg max || F ( x) ||, F ( x + )


k
arg max || F ( y ) ||, F ( y + )
k

cinema

D1 =|| F ( x + ) F ( y + ) ||

Periodogram

D2 =|| F ( x + ) F ( y + ) ||
stock

easter
0

10

15

20

25

30

35

40

45

50

10

15

20

25

30

35

40

45

50

christmas

179

Tutorial | Time-Series with Matlab

Matching Results with Periodic Measure


Now we can discover more flexible matches. We observe a clear
separation between seasonal and periodic sequences.

180

90

Tutorial | Time-Series with Matlab

Matching Results with Periodic Measure


Compute pairwise periodic distances and do a mapping of the
sequences on 2D using Multi-dimensional scaling (MDS).

181

Tutorial | Time-Series with Matlab

Matching Based on Bursts


Another method of performing structural matching can be achieved
using burst features of sequences.
Burst feature detection can be useful for:
 Identification of important events
 Query-by-burst
Harry Potter 2 (November 15 2002)

Harry Potter 1
(Movie)

50

100

Harry Potter 1
(DVD)

150

200

2002: Harry Potter demand

250

300

350

182

91

Tutorial | Time-Series with Matlab

Burst Detection
Burst detection is similar to anomaly detection.
 Create distribution of values (eg gaussian model)
 Any value that deviates from the observed distribution (eg more than 3
std) can be considered as burst.

Valentines
Day
Mothers
Day

183

Tutorial | Time-Series with Matlab

Query-by-burst
To perform query-by-burst we can perform the following steps:
1.

Find burst regions in given query

2.

Represent query bursts as time segments

3.

Find which sequences in DB have overlapping burst regions.

184

92

Tutorial | Time-Series with Matlab

Query-by-burst Results

Queries

www.nhc.noaa.gov

Pentagon attack

Cheap gifts

Matches
Nostradamus prediction

Tropical Storm

Scarfs

185

Tutorial | Time-Series with Matlab

Structural Similarity Measures


Periodic similarity achieves high clustering/classification accuracy in
ECG data
DTW
34
33
30
35
27
26
36
31
28
32
29
25
24
21
17
13
23
20
22
19
15
18
16
14
11
7
9
6
3
2
10
4
12
8
5
1

Periodic Measure

Incorrect
Grouping

36
35
33
28
27
26
32
34
30
31
29
25
18
23
20
19
17
24
22
16
14
15
21
13
12
8
2
7
11
5
9
3
10
6
4
1

186

93

Tutorial | Time-Series with Matlab

Structural Similarity Measures


Periodic similarity is a very powerful visualization tool.
Random W alk
Random W alk
Sunspots: 1869 to 1990
Sunspots: 1749 to 1869
Great Lakes (Ontario)
Great Lakes (Erie)
Power Demand: April-June (Dutch)
Power Demand: Jan-March (Dutch)
Power Demand: April-June (Italian)
Power Demand: Jan-March (Italian)
Random
Random
Video Surveillance: Eamonn, no gun
Video Surveillance: Eamonn, gun
Video Surveillance: Ann, no gun
Video Surveillance: Ann, gun
Koski ECG: fast 2
Koski ECG: fast 1
Koski ECG: slow 2
Koski ECG: slow 1
MotorCurrent: healthy 2
MotorCurrent: healthy 1
MotorCurrent: broken bars 2
MotorCurrent: broken bars 1

187

Tutorial | Time-Series with Matlab

Structural Similarity Measures


Burst correlation can provide useful insights for understanding which
sequences are related/connected. Applications for:
 Gene Expression Data
 Stock market data (identification of causal chains of events)
Query: Which stocks exhibited trading bursts during 9/11 attacks?

PRICELINE:
Stock value dropped

NICE SYSTEMS:
Stock value increased
(provider of air traffic
control systems)

188

94

Tutorial | Time-Series with Matlab

Conclusion
The traditional shape matching measures cannot address all timeseries matching problems and applications.
Structural distance measures can provide more flexibility.
There are many other exciting time-series problems that havent been
covered in this tutorial:
 Anomaly Detection

dontwant
wantto
to
IIdont
achieve
immortality
achieve immortality
throughmy
myworkI
workI
through
want
to
achieve
want to achieve itit
throughnot
notdying.
dying.
through

 Frequent pattern Discovery

 Rule Discovery
 etc
189

95

You might also like