KEMBAR78
Anomaly Detection | PDF | Json | Sequence
0% found this document useful (0 votes)
89 views17 pages

Anomaly Detection

The article presents Palisade, a distributed framework for real-time anomaly detection in embedded systems, designed to handle multiple detection algorithms for various anomalies. It includes a comprehensive taxonomy of anomaly symptoms and demonstrates the framework's effectiveness through case studies involving autonomous vehicles. Palisade aims to provide low-latency detection and is extensible for future anomaly detection needs.

Uploaded by

yashwanthb110606
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views17 pages

Anomaly Detection

The article presents Palisade, a distributed framework for real-time anomaly detection in embedded systems, designed to handle multiple detection algorithms for various anomalies. It includes a comprehensive taxonomy of anomaly symptoms and demonstrates the framework's effectiveness through case studies involving autonomous vehicles. Palisade aims to provide low-latency detection and is extensible for future anomaly detection needs.

Uploaded by

yashwanthb110606
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Journal of Systems Architecture 113 (2021) 101876

Contents lists available at ScienceDirect

Journal of Systems Architecture


journal homepage: www.elsevier.com/locate/sysarc

Palisade: A framework for anomaly detection in embedded systems


Sean Kauffman a ,∗, Murray Dunne a , Giovani Gracioli b , Waleed Khan a , Nirmal Benann a ,
Sebastian Fischmeister a
a Real-Time Embedded Software Group, University of Waterloo, Canada
b
Software/Hardware Integration Lab, Federal University of Santa Catarina, Brazil

ARTICLE INFO ABSTRACT

Keywords: In this article, we propose Palisade, a distributed framework for streaming anomaly detection. Palisade is
Anomaly detection motivated by the need to apply multiple detection algorithms for distinct anomalies in the same scenario. Our
Real-time embedded systems solution blends low latency detection with deployment flexibility and ease-of-modification. This work includes
Software architecture
a thorough description of the choices made in designing Palisade and the reasons for making those choices.
Streaming-based architecture
We carefully define symptoms of anomalies that may be detected, and we use this taxonomy in characterizing
our work. The article includes two case studies using a variety of anomaly detectors on streaming data to
demonstrate the effectiveness of our approach in an embedded setting.

1. Introduction In this article, we introduce a taxonomy of anomaly symptoms and a


framework for low-latency online detection of these anomaly symptoms
Real-time embedded systems are part of modern daily life and called Palisade. Our system is designed to monitor remote, real-time
are integral components of safety-critical devices in the automotive, embedded systems, and to provide a unified mechanism for responding
aerospace, and medical device fields. Failure of these critical com- quickly to faults and attacks. Palisade is also designed to be extensible
ponents may mean the loss of millions of dollars or even human
to facilitate the incorporation of new anomaly detection algorithms to
lives. Increasingly, these systems require online anomaly detection to
detect the symptoms of unforeseen anomalies.
mitigate risks for operators from failures caused by faults and attacks.
The main contributions of this article are:
Failures may be caused by logical faults, which can be avoided with
good design principles, or execution faults, which can be difficult or • We describe a comprehensive taxonomy of symptoms of anoma-
impossible to detect at design time [1]. Execution faults may be caused lies that may occur in embedded systems and give examples of
by software errors such as memory leakage or deadlocks, or hard-
instances in the literature where they occur. This taxonomy is
ware errors like degraded sensors. Malicious attacks may also cause
a valuable resource for anomaly detection work as it supplies
execution faults, exploiting weaknesses in hardware and software to
a shared language with which researchers and practitioners can
cause unpredictable behaviors. As real-time embedded systems become
more ubiquitous and connected, such attacks become more scalable discuss their capabilities and requirements.
and realistic [2]. Examples of attacks on embedded systems include • We propose Palisade, a data streaming framework that supports
disabling a car’s braking system [3] and injecting a fatal dose of insulin online anomaly detection for embedded systems. We present its
in an insulin pump [4]. software architecture, design choices, included anomaly detec-
Detection systems must be able to identify faults and attacks quickly tors, and two evaluations of its detection latency and extensibility.
and accurately for their results to be useful. If an anomaly is missed • We evaluate the applicability of Palisade through two case stud-
it cannot be corrected, and if an anomaly is detected too late, the ies: one using real data from an autonomous car and a second
correction may be unable to prevent a failure. To use an analogy, using data generated from an autonomous driving development
anomalies such as faults and attacks can be thought of as diseases platform. We show that, by integrating different anomaly detec-
in a system. To prevent a disease from causing a system failure, it is tors, Palisade is able to detect more anomalies than a stand-alone
necessary to understand the disease’s symptoms, carefully watch for
detector.
them, and act quickly if any are detected.

∗ Corresponding author.
E-mail addresses: sean.kauffman@uwaterloo.ca (S. Kauffman), mdunne@uwaterloo.ca (M. Dunne), giovani@lisha.ufsc.br (G. Gracioli),
wqkhan@uwaterloo.ca (W. Khan), njbenann@uwaterloo.ca (N. Benann), sfischme@uwaterloo.ca (S. Fischmeister).

https://doi.org/10.1016/j.sysarc.2020.101876
Received 24 January 2020; Received in revised form 1 July 2020; Accepted 25 August 2020
Available online 12 September 2020
1383-7621/© 2020 Elsevier B.V. All rights reserved.
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Palisade deviates from conventional anomaly detection frameworks 3. Anomaly symptoms


in that it is designed to monitor remote, embedded systems and to
provide online verdicts with low latency. Many anomaly detection This section logically groups the anomaly symptoms that are present
algorithms have been proposed in recent years but the vast majority in the observable outputs of a system. These symptoms represent the
are designed to operate offline, by analyzing recorded traces [5–10]. realization of a perturbation in an internal, unobserved state ma-
Offline anomaly detection is useful for post mortem analysis of prob- chine. These symptoms do not prove an anomaly by their mere pres-
lems, but not to prevent failures at runtime. To monitor remote systems, ence, but an anomaly may cause one or more symptoms, hence the
events and readings must be transmitted to detection algorithms us- disease-symptom analogy. The list in this section is not exhaustive, but
ing a data streaming architecture. Palisade uses the publish–subscribe categorizes common anomaly symptoms.
interface from the Redis in-memory database [11]. This taxonomy relates to Mitre’s Common Attack Pattern Enumer-
Our contributions improve on the state-of-the-art of both taxon- ation and Classification (CAPEC) [12], in that both CAPEC and this
omies of anomalous behavior and anomaly detection frameworks. We taxonomy can be used to classify capabilities and behaviors. They
propose a finer grained and more extensive taxonomy of anomalies differ substantially in that CAPEC describes possible attacks, while this
from prior works, including symptoms of anomalies in an event series. section simply describes symptoms of attacks or other, non-malicious
Ours is also the first framework that unifies many algorithms into anomalies.
an online anomaly detection system that can be applied to remote, Many of the symptoms defined in this section are expressed in
embedded systems. relation to a set of parameters. Examples include the constant factor
The rest of this paper is organized as follows. Section 2 presents 𝑐 expressing the spike height for spike symptoms, or the difference 𝓁
the notation used in the paper. Section 3 describes anomaly symptoms expressing the difference required to define a level change symptom.
in both continuous signals and event series. Section 4 introduces and For a given system, a user can determine these constants using a system
discusses the Palisade architecture. Sections 5 and 6 present the two simulation based on prior knowledge, or traditional parameter-finding
case studies and performance results. Section 7 discusses a number approaches such as grid search [13,14].
of ways to evaluate the framework. Section 8 summarizes prior work
related to this article. Section 9 concludes the paper. 3.1. Continuous-signal anomaly symptoms

2. Notation For the purposes of anomaly symptoms, we define a continuous


signal as a digitally sampled signal with a constant sample rate, rep-
We denote the set of all natural numbers by N and the set of all real resented here as a time series. This signal is expected to be the result
numbers by R. We write 𝐴 × 𝐵 to denote the cross product of 𝐴 and 𝐵, of readings from a single sensor, not an amalgamation of many sources.
and 𝐴 → 𝐵 to denote the set of total functions from 𝐴 to 𝐵. The constant sample rate means that the sample time of each value is
A sequence 𝜎 of 𝑛 values is written 𝜎 = [𝑥1 , … , 𝑥𝑛 ] where both 𝑥𝑖 and known from its index in the time series.
𝜎(𝑖) mean the 𝑖’th item in the sequence. A subsequence of 𝜎 beginning
at and including the 𝑖th index and ending at and including the 𝑗th index 3.1.1. Spikes and S-waves
is denoted 𝜎[𝑖,𝑗] . A value 𝑥 is in 𝜎, denoted by 𝑥 ∈ 𝜎 iff ∃ 𝑖 ∈ N such We define a Spike (Fig. 1(a)) as a subsequence of contiguous sam-
that 𝜎(𝑖) = 𝑥. A time series is a sequence where each successive item of ples that lie farther than a given number of standard deviations from
the sequence represents a sample taken at a regular time interval after the current mean of the signal. To account for signals with means that
its predecessor. The alphabet of a sequence denotes the set of possible change over time, we consider the distance to the mean of a window of
items in a sequence and, given an alphabet 𝛴, the set of all possible samples prior to the subsequence. More formally, given a time series 𝑦,
finite sequences of its members is denoted 𝛴 ∗ . Given an alphabet 𝛴, a a window size 𝑛, and a constant factor 𝑐, the subsequence 𝑦[𝑝+1,𝑞] is a
string is of type 𝛴 ∗ , meaning it is a finite sequence of members of 𝛴 of spike iff
length ≥ 0.
We use  to denote the type of names, which are also called topics, ∀𝑦𝑡 ∶ 𝑝 < 𝑡 ≤ 𝑞, |𝑦𝑡 − 𝑦̄[𝑝−𝑛,𝑝] | > 𝑐 ⋅ 𝑠𝑡𝑑𝑒𝑣(𝑦[𝑝−𝑛,𝑝] )
and are used as a kind of identifier. For clarity of notation, we use 𝑙𝑜𝑐𝑘
We define S-waves (Fig. 1(b)) as spikes with an additional deviation
for the type of dense clock time which is defined as 𝑙𝑜𝑐𝑘 = R. The type
in the opposite direction immediately following the spike. S-waves can
of values is denoted by V, which can be any of strings, numbers (natural
mimic spikes if the counter-spike is sufficiently dampened.
or real), and Booleans. We denote the type of maps by M =  ↦ V,
Example: A flooding attack over a vehicle Controller Area Network
where a map is a partial function from names to values with a finite
(CAN) may falsely indicate that the collision prevention system issued
domain. The empty map is given by ∅. An event is a three-tuple of type
a command to engage the brakes [15]. Such an attack falls under
E =  × 𝑙𝑜𝑐𝑘 × M, that is, it contains a name (a topic), a clock time,
the category of ⟨CAPEC-125: Flooding⟩ [12] and could be detected by
and a map. A sequence of events is called a trace (or event series) and
monitoring for Spike anomalies in the volume of CAN packets.
is of type T.
The arithmetic mean, sometimes just called the mean, of 𝑛 values
3.1.2. Drifting
𝜎 = [𝑥1 , … , 𝑥𝑛 ] is 𝜎̄ = 1𝑛 𝛴𝑖=0
𝑛 𝑥 . The mean of a sequence 𝜎 is denoted
𝑖 A Drift (Fig. 1(c)) is a slow movement of the signal mean over
̄ The variance of 𝑛 values 𝜎 = [𝑥1 , … , 𝑥𝑛 ] is 𝑣𝑎𝑟(𝜎) = 1𝑛 𝛴𝑖=0
𝜎. 𝑖 ̄ 2
𝑛 (𝑥 − 𝜎)
a period of time. We consider only linear drift here; logarithmic and
and the standard
√ deviation (stdev) is the square root of the variance sub-linear drifts are rare, and higher order drifting encroaches on the
𝑠𝑡𝑑𝑒𝑣(𝜎) = 𝑣𝑎𝑟(𝜎). The absolute value of a number 𝑛, denoted |𝑛| is 𝑛 definition of level changes or spikes. Mathematically, a continuous
if 𝑛 ≥ 0 or −𝑛 if 𝑛 < 0. signal 𝑦 is offset by 𝑡𝑐, where 𝑡 is the time index and 𝑐 is a constant
The standard normal distribution is given as the probability density representing the slope of the drift. Formally, given a time series 𝑦, a
1 2
function 𝑁(𝑥) = √1 𝑒− 2 𝑥 . The general normal distribution is the nominal version of that time series 𝑦,
̂ and a slope 𝑐, a subsequence 𝑦[𝑝,𝑞]
2𝜋
standard normal distribution with the domain stretched to a specified has linear drift iff
stdev and translated to a specified mean. We denote the general normal
∀𝑦𝑡 ∶ 𝑝 ≤ 𝑡 ≤ 𝑞, 𝑦𝑡 = 𝑦̂𝑡 + 𝑡𝑐
distribution as  (𝑥|𝑚, 𝑠) = 1𝑠 𝑁( 𝑥−𝑚
𝑠
).
The Discrete Fourier Transform (DFT) of a time series is a sequence Example: An infrared combustible sensor, when functioning over
of the same length of the frequency components of the time series. the operational temperature limit, may drift or fail [16]. Such a fail-
Given a time series 𝜎 with length 𝑁, its DFT 𝑋 is defined by 𝑋𝑘 = ure could be detected by monitoring temperature readings for Drift
2𝜋𝑖
𝑁 𝜎 𝑒− 𝑁
𝛴𝑛=1 𝑘(𝑛−1)
. We write  (𝜎) to denote the DFT of 𝜎. anomalies.
𝑛

2
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

3.1.3. Noise 3.1.7. Amplification


Noise (Fig. 1(d)), an expected component of any signal, is consid- Amplification (Fig. 1(h)) is a simple gain on the target signal. For
ered a symptom of an anomaly only when it is more pronounced than amplification of an original signal we multiply every sample by some
is typical. We define noise as a normally distributed offset around the factor. Given the magnitude of the amplification 𝛼, and an unamplified
time series 𝑦,
̂ a sample 𝑦𝑡 is amplified iff
true value of the signal. Given a time series 𝑦, some noisiness coefficient
𝑛 and nominal time series 𝑦, ̂ a subsequence 𝑦[𝑝,𝑞] is noisy iff 𝑦𝑡 = 𝛼 𝑦̂𝑡

∀𝑦𝑡 ∶ 𝑝 ≤ 𝑡 ≤ 𝑞, 𝑦𝑡 = 𝑦̂𝑡 +  (0, 𝑛) Example: Analog to Digital Converter (ADC) can be attacked by am-
plifying analog signals past the dynamic range of the device. These at-
where  (0, 𝑛) is a standard normal distribution centered at zero with tacks can obscure other malicious behavior and damage hardware [21].
standard deviation 𝑛. This type of attack is an example of ⟨CAPEC-153: Input Data
Example: Compressed air in truck brakes may generate acoustical Manipulation⟩ [12]. It could be detected by monitoring the analog
interference and cause metallic friction noise from track vehicles in signal for Amplification anomalies.
ultrasonic sensors [17]. Brake failure could be detected by correlating
3.1.8. Level change
Noise anomalies in ultrasonic sensors with air brake usage.
A Level Change (Fig. 1(i)) symptom is observed when the mean of
a signal changes in a short period and then remains consistent at the
3.1.4. Clipping new level. Slower changes may fall under drifting. Given a time series
𝑦, an acceptable minimum level change threshold 𝓁, and a minimum
We define Clipping as a loss of data at the extrema of a signal
number of samples the mean change must persist 𝑛, a level change has
range (Fig. 1(e)), where a signal is of a higher amplitude than is
occurred over a window of 𝑤 samples 𝑦[𝑡,𝑡+𝑤−1] iff
supported by the sensor or transmission medium. Thus a clipped signal
can be represented by a series of identical samples at the maximum or |𝑦̄[𝑡+𝑤,𝑡+𝑤+𝑛] − 𝑦̄[𝑡−𝑛−1,𝑡−1] | > 𝓁
minimum extent of the sample medium. Example: An attack that increases the amount of code execu-
Example: A partially blinding attack on a camera of a vehicle by tion will increase the power consumption of the Central Processing
emitting light can hide objects [18]. This light can exceed the input Unit (CPU), which can be observed as a Level Change [22–24]. Such
range of the camera and would appear as clipped. This attack is an an attack could be an example of ⟨CAPEC-175: Code Inclusion⟩, or
example of ⟨CAPEC-607: Obstruction⟩ [12]. Such blinding light attacks ⟨CAPEC-242: Code Injection⟩ [12]. Many attacks with this profile can
could be detected by monitoring for Clipping anomalies. be detected by monitoring the power consumption of the CPU for Level
Change anomalies.

3.1.5. Loss 3.1.9. Frequency change


While Loss (Fig. 1(f)) may more typically refer to high noise levels A Frequency Change (Fig. 1(j)) occurs when the primary frequency
making it difficult to decode a signal, here we use loss to indicate a of a signal changes over a short period. We say a Frequency Change
complete loss of a signal. Although trivially an anomaly, a total loss of occurs if the primary frequency in a sliding window moves more than
some threshold over some time window. Given a time series 𝑦, a
signal may be a symptom of temporary network disruption without any
function 𝑃 which extracts the frequency of the highest peak from a
more dangerous cause. We represent a total loss of signal as a sudden
DFT (denoted  ), a threshold 𝜏, and a minimum number of samples the
transition to a fixed sample value. This can be observed as a special
frequency change must persist 𝑛, a subsequence of 𝑤 samples 𝑦[𝑡,𝑡+𝑤−1]
case of Clipping, where the extrema of the signal are identical for a experiences frequency change iff
short time.
Example: An attack sending a large volume of request messages |𝑃 ( (𝑦[𝑡+𝑤,𝑡+𝑤+𝑛] )) − 𝑃 ( (𝑦[𝑡−𝑛−1,𝑡−1] ))| > 𝜏
over the J1939 protocol increases the computational load of the re- It may be useful to consider more frequencies, but we restrict our
cipient ECU until it is not able to perform regular activities like definition to only consider the primary frequency for simplicity.
transmitting periodic messages [19]. Such an attack is an example of Example: An attack inserting a burst of light into a vehicle camera
⟨CAPEC-125: Flooding⟩ [12] and could be detected by monitoring CAN may change the tonal distribution (light frequency) of the captured
traffic for Loss anomalies. images, resulting in misclassification [18]. This attack is an example of
⟨CAPEC-607: Obstruction⟩ [12] and it could be detected by monitoring
captured images for Frequency Change anomalies.
3.1.6. Smoothing
We define Smoothing to be a reduction in the short term variance 3.1.10. Echo/reflection
of a signal compared to recent history. Smoothing (Fig. 1(g)) is the We consider an Echo (Fig. 1(k)) to be a duplication of a previous
series of samples on top of the underlying signal at a later position.
rarest of the symptoms presented here, with few natural causes. Given
A Reflection is identical to an Echo, except that the repeated signal is
a constant 𝑘 representing how far back the recent historical signal
inverted. Given a time series 𝑦, an echo length 𝑒, an echo coefficient
variance should be considered, and the factor threshold 𝜏 at which the
(the factor at which the echo is played back) 𝑞, and the nominal form
signal is considered smoothed, we say a subsequence of 𝑛 samples 𝑦[𝑡,𝑡+𝑛]
of the time series 𝑦,
̂ if we consider the subsequence 𝑦[𝑡,𝑡+𝑒] as the origin
is smoothed iff of the echo, we say that the subsequence 𝑦[𝑡′ ,𝑡′ +𝑒] exhibits Echo iff
𝑣𝑎𝑟(𝑦[𝑡,𝑡+𝑛] ) < 𝑣𝑎𝑟(𝑦[𝑡−(𝑛𝑘)−1,𝑡−1] )𝜏
𝑦[𝑡′ ,𝑡′ +𝑒] = 𝑦̂[𝑡′ ,𝑡′ +𝑒] + 𝑦[𝑡,𝑡+𝑒] × 𝑞
Example: In an attack of a control system, the attacker may ob- Example: A relay attack on the original signal sent from the vehicle
serve and record sensor readings and then continuously repeat the LiDAR creates fake echoes and can make real objects appear closer
recorded values during the attack [20]. This is an example where the or further than their actual location, thus affecting the mission plan-
sensor values are smoothed. Such an attack falls under the category of ning [18]. This attack is an example of ⟨CAPEC-586: Object Injection⟩
⟨CAPEC-148: Content⟩ Spoofing [12]. Such spoofing attacks could be [12]. Such an relay attack could be detected by monitoring the LiDAR
detected by monitoring sensor readings for Smoothing anomalies. signal for Echo and Reflection anomalies.

3
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Fig. 1. Time Series Anomaly Symptoms. The thick red line below the signal indicates the period of the described anomaly. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)

3.2. Event-series anomaly symptoms then the change in event frequency may indicate an anomaly. Given
a trace 𝑇 ∈ T, an event name 𝜇 ∈  , a window size 𝑤 ∈ 𝑙𝑜𝑐𝑘, and
Symptoms of anomalies also appear in event series, which are a threshold 𝜖, an event frequency change may be defined as ∃𝑡1 , 𝑡2 , 𝑡3 ∶
defined as a sequence of discrete events rather than a continuous signal. 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡1 , 𝑡2 )) − 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡2 , 𝑡3 )) > 𝜖.
An event series represents a trace of the execution of an automaton Example: Lin and Siewiorek introduced their Dispersion Frame
where each event describes a state transition. Event series differ from Technique (DFT) to predict hardware failures [25]. From analyzing
discretized continuous signals in that their events are not required to the logs of file servers, they observed that there exists a period of an
occur at regular time intervals and they may carry more complex data increasing rate of intermittent errors before most hardware failures.
than only a single real value. Many such failures could be detected by monitoring error reports for
Event Frequency Change anomalies.
3.2.1. Event frequency change
When events of the same name are periodic or semi-periodic, they
3.2.2. Unexpected event
have fairly consistent inter-arrival times and, by extension, frequency.
Most traces only contain events with a limited vocabulary of event
When that frequency changes suddenly, it can be a symptom of a system
names. While events themselves are unique, due to their varying clock
anomaly. Similarly, when the frequency of all events in a trace change
times, the event names are repeated many times. When an event occurs
suddenly, it may be due to an anomaly.
in a trace with a name that has not come earlier in the trace, it may be
The inter-arrival time of an event is the difference between the
a symptom of an anomaly.
clock times of successive events of the same name. It can be thought of
as the period of the event. More precisely, given a trace 𝑇 ∈ T, an Given a trace 𝑇 ∈ T, an unexpected event may be defined as an
event name 𝜇 ∈  , and a non-empty time interval defined by the event (𝜇, 𝑡1 , ⋅) ∈ 𝑇 ∶ ∄(𝜇, 𝑡2 , ⋅) ∈ 𝑇 where 𝑡2 < 𝑡1 . That is, we can think
end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡1 < 𝑡2 , the inter-arrival time is defined of an unexpected event as the first event with a new name.
as 𝑖𝑛𝑡𝑒𝑟𝐴𝑟𝑟𝑖𝑣𝑎𝑙(𝜇, (𝑡1 , 𝑡2 )) ≜ ((max 𝑡 ∶ (𝜇, 𝑡, ⋅) ∈ 𝑆) − (min 𝑡 ∶ (𝜇, 𝑡, ⋅) ∈ By this definition, however, most events at the beginning of a trace
𝑆))∕(|𝑆| − 1) where 𝑆 = {(𝜇, 𝑡, ⋅) ∈ 𝑇 ∶ 𝑡1 ≤ 𝑡 ≤ 𝑡2 }. will be considered unexpected. To solve this problem, we can restate
Event frequency measures how often events occur in a given time the definition in terms of the probability that an event occurs. Given a
span. It is given as the inverse of inter-arrival time, or given a trace trace 𝑇 ∈ T and a threshold 𝜖, an unexpected event may be defined as
𝑇 ∈ T, an event name 𝜇 ∈  , and a non-empty time interval defined an event 𝑒 ∈ 𝑇 ∶ (𝑒 ∈ 𝑇 ) < 𝜖.
by the end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡1 < 𝑡2 , the event frequency of 𝜇 is Example: Bellovin reported receiving broadcast packets meant
given as 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡1 , 𝑡2 )) ≜ 1∕𝑖𝑛𝑡𝑒𝑟𝐴𝑟𝑟𝑖𝑣𝑎𝑙(𝜇, (𝑡1 , 𝑡2 )). for local networks, requests to unused ports, and requests to un-
A sudden change in event frequency, then, is when the first deriva- occupied addresses over the public Internet at AT&T in his classic
tive of event frequency is high. A rapid change in event frequency can whitepaper [26]. These types of requests are examples of ⟨CAPEC-169:
be found by taking the difference between successive time intervals Footprinting⟩ [12] and they could be detected by monitoring network
(or windows) in the trace. If the difference exceeds some threshold, traffic for Unexpected Event anomalies.

4
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

3.2.3. Periods of silence


A period of silence in a trace is a segment of time where no, or
few, events occur. Events may occur more-or-less frequently during the
operation of a system as different behaviors result in different patterns
in the trace. However, nominal system behavior usually results in some
events appearing. When a period occurs where no events appear in the
trace, it may be a symptom of an anomaly.
Given a trace 𝑇 ∈ T and a minimum number of events 𝜈, a period
of silence may be defined as a non-empty time interval defined by the
end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡1 < 𝑡2 where |{(⋅, 𝑡, ⋅) ∈ 𝑇 ∶ 𝑡1 ≤ 𝑡 ≤ 𝑡2 }| < 𝜈.
The threshold for when a time interval is considered a period of
silence varies from system to system. Some high priority tasks may
monopolize system resources while not emitting any events. To solve
this problem, we can restate the definition to specify a minimum length
of the interval. Given a trace 𝑇 ∈ T, a minimum number of events 𝜈, Fig. 2. UML information flow of the Palisade architecture.
and a minimum length 𝓁 > 0, a period of silence may be defined as a
time interval defined by the end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡2 − 𝑡1 ≥ 𝓁 where
the interval meets the previous definition.
scaling. Requirement 4 is related to Requirements 1 and 3, since
Example: Missing log messages can indicate problems and failures
operating multiple detectors in series over multiple machines would
in High Performance Computing (HPC) logs that are too large for hu-
delay detection and require complex sequencing. Requirement 5 is to
mans to manually analyze [27]. These types of failures can be detected
reduce the barriers to adoption of the system: if it is hard to install, no
by monitoring logs for Periods of Silence anomalies.
one will use it.
Palisade is designed as a set of distributed micro-services built
3.2.4. Sampled value anomaly symptom
around a data streaming architecture. These micro-services are im-
When an event trace includes sampled values from a continuous
plemented as three types of nodes: sources, processors, and sinks.
signal, those sampled values may include the same trace anomalies
Nodes are typically written in Python as Palisade provides a Python
defined in Section 3.1. An event trace is not sampled at a fixed rate like
Application Programming Interface (API) to simplify their creation
a time series, however. To test for sampled value anomaly symptoms,
and maintenance. However, it is also possible to integrate existing
it may be necessary to extrapolate the values between samples to
tools written in other languages with Palisade. Fig. 2 presents an
approximate a continuous signal.
overview of the Palisade architecture through a UML information flow
There are several popular methods for recovering a continuous sig-
diagram [33]. Each node uses the publish–subscribe interface of the
nal from irregular samples. These algorithms include, for example, the
Redis data streaming infrastructure to receive and send data. We do
projections onto convex sets (POCS) method [28], the Wiley/Marvasti
not discuss the Graphical User Interface (GUI) in this article, as it is
method [29], the Sauer/Allebach Algorithm (also called the Voronoi
out-of-scope.
method) [30], and the Adaptive Weights Method [31]. These methods
generally construct an auxiliary signal from some sample values and
4.1. Data streaming
then obtain an initial approximation by applying a low-pass filter. The
error between this approximation and the samples is then fed into an
To support Requirements 1, 3, and 4, we needed to find a dis-
iterative algorithm which can recover the signal if the sampling density
tributed streaming architecture to transport information between dif-
is high enough.
ferent nodes. We evaluated the latency of four streaming frameworks
Example: Changes in byte frequency patterns in network payloads
while considering their inclusion in the Palisade architecture. Latency
to the same host and port can be accurate predictors of network in-
is defined as the time difference between the instant data is generated
trusions [32]. Such attacks can be detected by monitoring for Sampled by a source and the instant it is received by a processor or a sink [34].
Value Frequency Change anomalies. As a result of this experiment we chose Redis as the data streaming
architecture for Palisade.
4. Palisade architecture
We designed a set of experiments to measure the latency of four data
streaming frameworks: Redis [11], RabbitMQ [35], Kafka [36], and
Palisade is motivated by the need to remotely detect a dynamic
NATS [37]. We used two first generation Raspberry Pis (single 700 MHz
range of anomaly symptoms in an embedded system at the time they ARM6 core, 128 MB system RAM) for the subscriber and the publisher
occur. We are further motivated by the desire to combine multiple and a Raspberry Pi 2 (quad-core ARM Cortex-A7, 1G RAM) for the
anomaly detectors to leverage their different performance character- server. The clients and the server were synchronized using Precision
istics into a single, more reliable, detector. These motivations lead to Time Protocol (PTP), a network level time synchronization protocol
the following requirements: capable of microsecond accuracy. We varied the transmitted message
1. the anomaly detection must have low latency, sizes (256 bytes, 1 KB, 100 KB, and 1 MB) and the publishing frequency
2. it must be easy to implement and maintain detectors, (30 Hz, 60 Hz, and 100 Hz). Latency was measured by including the
3. the detectors must be able to be run on separate machines, timestamp at which a message was sent within the message itself. The
4. multiple detectors must be able to run in parallel on the same subscriber then noted the timestamp at which it received the message
data, and and subtracted the sending timestamp to find the latency. We ran each
configuration of the experiment five times and extracted the average
5. deployment of the system must be simple.
and standard deviation of the latency.
Requirement 1 is due to the need to respond to anomalies with enough Table 1 shows the results for Redis latency. We note that mean
time to mitigate their effects. Requirement 2 is important to any latency increases as the packet size increases, as does the standard
serious software framework, since it should always be a goal to reduce deviation. The throughput also increases when both frequency and
engineering costs. Requirement 3 allows Palisade to run on separate packet size increase, reaching 6 MB/s at 1 MB and 100 Hz. Redis
machines from the monitored system and to support anomaly detection presents the lowest latency of the four data streaming frameworks.
appliances to plug into existing networks. It also facilitates horizontal Refer to our previous work for a complete comparison [34].

5
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Table 1
Redis latency results in seconds.
Freq. (Hz)/Packet size 256 B 1 KB 100 KB 1 MB

30 0.0125 0.184 0.0731 0.315


± 0.005 ± 0.005 ± 0.035 ± 0.3
60 0.0218 0.0213 0.0461 0.337
± 0.005 ± 0.005 ± 0.035 ± 0.3
100 0.0425 0.0245 0.0946 0.365
± 0.005 ± 0.005 ± 0.035 ± 0.3

REmote DIctionary Server (Redis) is an open-source, in-memory,


key–value database that provides a publish–subscribe interface. Redis
clients publish data to channels using the REdis Serialization Protocol
(RESP), and subscribers receive data in the same order it was published.
Redis also supports integration with on-disk databases. Moreover, Redis
has low memory consumption; in a 64-bit system, 1 million keys (hash
values), representing an object with five fields, use around 160 MB
of memory [11]. Redis provides a master–slave replication mechanism
in which slave server instances are exact copies of master servers. To
reduce the network round trip latency, Redis implements pipelining, Fig. 3. Examples of data formats used in Palisade.
making it possible to send multiple commands to the server without
waiting for individual replies [11]. These replies are instead batched
together into a single response. Fig. 3 shows different JSON formats for event and time-series data.
Time-series data (Fig. 3(a)) contains the initial timestamp for the re-
4.2. Data format
ceived message as well as a frequency, which is used to calculate the
timestamp for each datum (by using the frequency and the timestamp
Formatting the data transmitted between Palisade nodes requires
of the first data). For the event-based format (Fig. 3(b)), timestamp
choices to be made between the relative importance of the require-
represents the instant in which the event is produced and the data
ments listed in Section 4. If Requirement 2 (implementation) and
field can be composed of arbitrary subfields (also in JSON format).
Requirement 5 (deployment) are more important, then a textual format
Fig. 3(c) represents the binary format for time series. There are
is preferable due to its increased interoperability and human readabil-
three fields in little-endian format. The first field is the millisecond
ity. If Requirement 1 (latency) is the most important one, then a simple,
component of the message time, followed by the two magic bytes set
offset-based binary format is better because of its lower bandwidth
to zero, followed by the seconds portion of time, and the data in 16-
requirements and parsing costs. Palisade handles this conflict by sup-
bit signed two’s complement integer format. In Fig. 3(c), there are 100
porting both JSON and a custom binary format for data transmission
samples, which makes the data field have 200 bytes (100 integers, 2
between nodes.
bytes each).
JSON is a standardized data format [38] with support in most
programming languages [39], which makes it easy to consume and
4.3. Source nodes
produce messages compatible with Palisade from different tools. JSON
is a textual format, meaning the data is represented as sequences of
characters and, as such, must undergo some processing to convert Source nodes are responsible for streaming data into Redis. These
to-and-from internal program state. While this does increase the pro- nodes select and transmit data from a database, file, or an embedded
cessing requirements of JSON formatted data, it has the benefit of being system. Examples of source nodes are one that reads data from a
human readable which substantially eases debugging. Formatting with relational database instance, a node that reads a Comma Separated
JSON also means that new parameters may be added without breaking Value (CSV) file, and a User Datagram Protocol (UDP) sniffer that
backward compatibility since a parser will ignore JSON object keys that reads packets from the network. Data might include system log entries,
are not expected. aggregate network states, or commands from an autonomous driving
In contrast, an offset-based binary format needs to be strictly speci- stack. Usually, a source node should read data from some data source,
fied and any changes require updates to producers and consumers alike. change it to JSON or binary Palisade formats, and publish it to a Redis
Such a binary format has the advantage of transmitting less data and channel.
does not require parsing since its fields may be found by their memory
offsets (like a C struct). Some binary formats, like Google’s Protocol 4.4. Processor nodes
Buffers, are more complex and allow some modification without up-
dating all uses. However these features always incur additional costs Processors in Palisade are responsible for detecting anomalies and
that must be considered. For example, it is only possible to extend a forwarding the results to sink nodes (see Section 4.5). Each processor
Protocol Buffer message if field numbers were previously allocated for implements a different anomaly detection algorithm.
that purpose [40]. All processors are sub-classes of the abstract Processor class. A
To support both sampled continuous value signals and discrete processor object is instantiated by the ProcessorLauncher class, which
event series data, Palisade includes JSON formats for both time series associates the processor with an instance of the DataSource class, which
and event series content. Palisade only supports a binary format for supplies data from either Redis or an offline file source for unit testing
time series data. The time series binary format also enables integration purposes.
with high throughput data sources such as a logic analyzer. Palisade Individual processors must inherit from the Processor class and
does not currently support more complex binary formats that attempt implement an interface used by the ProcessorLauncher to pass data from
trade-offs between extensibility and latency as there are no current use the DataSource. The relevant methods required are configure (which
cases for such a format. collected metadata) and test_on_data (which is invoked when new data

6
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

4.6.1. Continuous signal example detectors


• Autoencoders uses a neural network that can encode and then
decode windows of non-anomalous time series [41]. The network
is trained by comparing its input to its output and trying to find
an encoding that minimizes the difference. Depending on the
configuration, the encoding can be on the order of 10 bits for
Fig. 4. JSON format used when an anomaly is detected. a window of hundreds of samples. When running, the detector
encodes and then decodes its input time series using the same
network it trained on nominal data. If the network does a poor
is ready from the DataSource). Other optional methods that may be job of producing output that matches its input (measured by a
implemented by processors are: begin (called on startup), end (called difference metric), then the detector concludes that the network
on shutdown), load_model (called if a model is specified in configure ) must not have seen similar data during training, and therefore the
and train (used to build models where applicable). input contains symptoms of anomalies.
• Clip detect checks for a number of contiguous identically valued
All non-source nodes subscribe to a channel, named command.
symbols at the extremes of the range of a time series. The detector
This is motivated by Requirements 3 and 5, that deployment must be
assumes that a sufficiently long sequence of such values is a
distributed and simple. Without this channel, each Palisade node would
symptom.
need to be individually controlled from a local interface. The command
• SAX + HMM uses Symbolic Aggregate approXimation (SAX) to
channel supports the control commands for nodes (such as restart and
discretize a time series into a sequence of symbols [42]. The
info (a status command)) the are used for general system maintenance.
symbols are based on the distribution of a non-anomalous time
Once a detector finds an anomaly, it publishes the timestamp at series where a new symbol is generated based on thresholds
which the anomaly occurred, a note about its cause, and the source in the learned distribution. The sequence of those symbols are
channel of the anomaly to a specific Redis channel. Fig. 4 shows used to build a Hidden Markov Model (HMM) approximating
an example of the JSON data format for anomalies in Palisade. The the sequence [43]. The input to the detector is encoded into
timestamp field contains the time at which the anomaly occurred. symbols using the same distributions. Sequences that are suffi-
The anomaly field is a measure of the confidence (𝑐 ∈ R ∶ 0 < 𝑐 ≤ 1) ciently unlikely in the learned HMM are considered symptoms of
of the detector that an anomaly has occurred, where 1 represents 100% anomalies.
confidence. The note field is a textual description of the anomaly, and • Sixnum tracks changes in six standard statistical metrics: mean,
channel contains the Redis channel in which the anomaly happened. standard deviation, maximum, minimum, upper hinge, and lower
hinge. Changes to these metrics above configurable thresholds are
considered symptoms.
4.5. Sink nodes
• Spike detect continuously computes the variance of the most re-
cent window of a configurable number of samples. A new sample
Sinks are nodes that subscribe to Redis channels to perform final that falls too far outside the variance of the current window is
processing and do not publish their results. They usually serve as considered a symptom of an anomaly.
interfaces to other systems, such as GUIs, or alarm systems. Examples of • Spectrum detect stores a model of the frequency distribution
sink nodes include the insertion of received data into databases, writing calculated from the Fourier transform of a non-anomalous time
to I/O ports when an anomaly is received, and storing results into files. series averaged over time. Windows of the input time series
are transformed into the frequency domain and compared to
the average frequency model. Deviations beyond a configurable
4.6. Example anomaly detectors distance metric are considered symptoms of anomalies.

Palisade currently includes more than 20 example anomaly detec- 4.6.2. Event series example detectors
tors. All of these detectors are based on existing methods and are • HMM is similar to the SAX + HMM detector, except that the input
distributed with Palisade to provide out-of-the-box detection of the is already composed of events that represent state change instead
anomaly symptoms listed in Section 3. Tables 2 and 3 show detectors in of continuous samples of a signal, so there is no need to transform
Palisade and the anomaly symptoms they are capable of detecting. Ta- them using SAX. Since events have a variable time between them,
ble 2 shows the detectors for continuous-signal anomaly symptoms (see the HMM can consider the time between events, as well as the
Section 3.1) while Table 3 shows the detectors for event-series anomaly type of the event when evaluating the likelihood of the sequence
symptoms (see Section 3.2). In both tables the left-most column shows against the learned model.
the detectors while the top-most row shows the anomaly symptoms. • Nfer is a recently introduced language and system for inferring
The intersection of row and column contains a check mark (✓) if the event stream abstractions [44,45] that utilizes a syntax based on
detector is sensitive to the anomaly symptom and is blank otherwise. Allen’s Temporal Logic (ATL) [46]. Nfer transforms an event
stream that represent state transitions into a series of intervals
For some detectors, the capability to detect an anomaly symptom
that represent state. These intervals create a hierarchy of abstrac-
depends on the magnitude of the symptom. For example, the spike
tions that simplify human and machine trace comprehension. If
detector can detect the noise symptom as long as the noise falls outside
an interval has been designated as anomalous, its generation is
of the variance of the previous time window. Even though multiple considered a symptom of an anomaly. Palisade supports both
detectors may be able to detect the same anomaly symptom, they hand-written and mined nfer specifications [47].
complement each other by detecting it in different situations. The • SiPTA uses the expected periodicity in events from embedded
distributed nature of Palisade allows running multiple detectors in systems to apply signal processing techniques to compare the
parallel, increasing the robustness of the system (we show such a input traces to non-anomalous data. For more information, see
situation in Section 6). Zedah et al.’s 2014 paper [48].
Below is a brief description of each of the example detectors. For • TPG trains a Task Precedence Graph based on a non-anomalous
the more complex detectors, see the relevant citations for a detailed event stream. This method exploits the periodicity of tasks ex-
explanation of the algorithm. ecuted in an embedded system. If the input event stream does

7
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Table 2
Example detectors and their detected continuous-signal anomaly symptoms.

SAX + HMM [42,43]

Spectrum detect
Autoencoders

Range check

Spike detect
Clip detect

Sixnum
Name
Spike ✓ ✓ ✓ ✓ ✓
S-Wave ✓ ✓ ✓ ✓ ✓
Drifting ✓ ✓ ✓ ✓ Fig. 5. Overview of the software/hardware organization in the autonomous car.
Noise ✓ ✓ ✓
Clipping ✓ ✓ ✓
Loss ✓ ✓ ✓ ✓ ✓ ✓
Smoothing ✓ ✓ ✓ stack. The output of the autonomy stack (control commands) is sent,
Amplification ✓ ✓ ✓ ✓
for example, to actuators controlling the steering and brakes. Two
Level change ✓ ✓ ✓ ✓
Frequency change ✓ ✓
Renesas automotive computers were installed on the vehicle to run the
Echo/Reflection ✓ autonomous driving software. Each computer was equipped with two
Systems-on-Chips (SoCs) with multiple ARM CPU cores and a single
ASIL-D certified Micro-controller Unit (MCU).
Table 3
The autonomous driving software was built using Robot Operating
Example detectors and their detected event-series anomaly symptoms.
System (ROS). ROS is an open source framework for robotic application
TRE detector [50] development in C++ and Python for POSIX-based Operating Systems
nfer [44,45,47]

(OSs). ROS employs the concept of nodes (a process that performs


SiPTA [48]
HMM [43]

TPG [49]

computation), which provides modularity and development isolation.


ROS nodes operate on a periodic loop, are event-driven, or both (they
Name publish data at different frequencies into topics). Like Palisade, ROS
Event Freq. change ✓ ✓ ✓ ✓ ✓ uses a publish–subscribe model to communicate between nodes.
Unexpected event ✓ ✓ We implemented a new ROS node, named ros2redis (see Fig. 5), to
Periods of silence ✓ ✓ ✓ ✓ ✓ receive messages published to ROS topics and republish them to Redis
Sampled value ✓
channels. A Palisade sink node received the command and sensor data
and stored them in a database. The stored data were obtained from
several ROS topics with different publish frequencies. For instance, GPS
not follow the learned graph, it is considered a symptom of an information was published at a frequency of 50 Hz, while throttle and
anomaly. For more information, see Iegorov and Fischmeister’s gear reports were sent at 20 and 10 Hz, respectively. We logged the
2018 paper [49]. data during several autonomous driving sessions and then replayed the
• Timed regular expression (TRE) trains Timed Regular Expres- recorded data as input to Palisade.
sions based on a non-anomalous event stream. If an event from In the next sections, we present the results of running Palisade with
the input stream of the detector violates the learned expres- the collected data from the autonomous car in two scenarios: gear
sions, then it is considered a symptom of an anomaly. For more flip-flop and autonomy mode flip-flop.
information, see Narayan et al.’s 2018 paper [50].
5.1. Gear flip-flop

4.7. Fault handling The autonomous vehicle in the case study reports its current gear in
messages published to the _vehicle_gear_report topic. The val-
Palisade is designed to detect anomalies in streams of data from ues reported in the data item _vehicle_gear_report_cmd_gear
remote systems, not in its own operation. However, Palisade includes reflect the requested shift position of the automatic transmission, and
some failure handling capabilities. All nodes in Palisade are monitored the values in _vehicle_gear_report_state_gear reflect the
by a system service and restarted in the presence of a failure. Detector reported shift position. Both values are encoded as integers with the
node failures are interpreted as though an anomaly has been reported following mapping: {0: No change, 1: Park, 2: Reverse, 3: Neutral, 4:
by the failed detector, including failures to respond in a configurable Drive, 5: Low}.
time window. This can lead to false-positives, but it is a simple mech- The autonomous driving software sends gear change requests over
anism to alert operators to a situation that deserves their attention. an Internet Protocol (IP) network to a separate controller that interfaces
Source node failures are interpreted as loss or period of silence anomaly with the vehicle. The controller then converts the requests into CAN
symptoms by the relevant detectors, and will be reported as anomalies. messages that the vehicle transmission understands, and also converts
Sink node failure handling varies depending on the node, but many are messages from the transmission back into IP packets sent to the driving
obvious (GUI failures) or fail-warn (alarm systems). software. Messages on the _vehicle_gear_report_cmd_gear
channel are only sent when the software requests a change. The request
5. Case study 1: Autonomous vehicle is repeated over a short interval until a message is received on the
_vehicle_gear_report_state_gear channel reporting that the
We evaluated the performance and applicability of Palisade using new gear has been reached. Conversely, the transmission regularly
the University of Waterloo’s autonomous car as a case study. The reports its current gear on the _vehicle_gear_report_state
vehicle was a 2016 Lincoln MKZ fitted with a range of sensor arrays _gear channel regardless of whether or not it has recently changed.
including LiDAR, a GPS receiver, Inertial Measurement Units (IMUs), The reports of the current gear from the transmission exhibit the
cameras, and radars. Fig. 5 shows an overview of the software and Sampled Value Anomaly Symptom of Noise. When the value of the gear
hardware organization in the vehicle. Sensors, such as LiDAR and is stable, the signal appears to be nominal. However, when the gear
cameras, produce data that is the input of the autonomy software is changing, the signal varies wildly before finally stabilizing on the

8
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Fig. 7. nfer specification for detection of Gear Flip-Flops.


Fig. 6. Gear flip-flop anomaly example.

5.2. Autonomy mode flip-flop


correct gear. While it is not clear if intermediate gear values should be
reported by the transmission during a gear change, it is clear that the The autonomous vehicle reports its current autonomy state in mes-
transition should be approximately linear. The fluctuations in the signal sages published to specific topics, such as _vehicle_brake_report
amount to noise, and are a symptom that an anomaly has occurred in and _vehicle_throttle_report. When the data item enabled
either the transmission itself or in the CAN controller. is true, the car is running in autonomy mode. When enabled is false,
Fig. 6 shows a brief sample of the two channels over a period when a the vehicle is controlled by the human driver.
gear change into Drive was requested. The _vehicle_gear_report Once enabled, autonomy mode should remain enabled, as indicated
_cmd_gear messages (the blue points) begin at zero, indicating no by the enabled data item, for the duration of a trip. However, in
change is requested, then change to four to request a change to Drive, track testing, we found instances when the vehicle would switch from
then switch back to No change once the gear change is complete. autonomy enabled back to autonomy disabled during a lap, giving the
The _vehicle_gear_report_state_gear value (the red line) driver control of the vehicle. We discovered that the Dataspeed CAN
demonstrates the Noise Sampled Value Anomaly Symptom as it tran- module of the autonomous vehicle would disable autonomy mode if
sitions from Park to Drive. no command was given to the vehicle for at least 80 ms. This timeout
We used nfer to detect the Gear Flip-Flop anomaly. To highlight was unexpected and caused an unsafe driving condition. When the
the flexibility of Palisade, we implemented two different integrations timeout occurs, and autonomy mode is disabled, we call the condition
with nfer. The first built Redis and JSON support directly into the Autonomy Mode Flip-Flop.
C implementation of nfer, and the second used a Python processor We used TREs to monitor the Autonomy Mode Flip-Flop anomaly.
node to call the nfer Python API. The advantage of building support Regular expressions provide a declarative way to express patterns for
directly in C is its execution speed, while the advantage of calling the a system specification. TREs define timing constraints in a regular
tool through its Python API is its simplicity: the Python nfer processor expression [50]. For instance, a regular expression specification of the
is 42 lines of code. form ‘‘state 𝑎 is followed by state 𝑏’’ can be modified to ‘‘state 𝑎 is
Our nfer specification for detecting the Gear Flip-Flop anomaly is followed by state 𝑏 within 𝑡 time units’’ to obtain a TRE specification.
given in Fig. 7. The specification contains one rule which defines con- The TRE processor uses a manual specification of a TRE to monitor and
ditions which, each time they are met, cause a new interval abstraction detect anomalies. To integrate with Palisade, we added Redis support
to be produced with the associated label (topic), timestamps, and data to the C implementation of the TRE detector.
items also being specified by the rule. The rule says that a new interval The TRE specification used for detecting the Autonomy Mode Flip
abstraction should be published to the gear_flip_flop topic when Flop anomaly is the following
there are three messages published to the _vehicle_gear_report
((⟨𝑃 .𝑆⟩ [0, 3]) ∥ (⟨𝑆.𝑃 ⟩ [0, 3]))+
topic within 200 time units (ms) such that the first and third specify
the same gear state, but the second specifies a different gear state. 𝑃 and 𝑆 above represents the two Autonomy Mode Flip-Flop states
Line 1 specifies the topic of the resulting message, while Lines 2 and — Autonomy Enabled and Disabled respectively. The above specifica-
3 give the topics on which the tool will listen for events. Lines 2 and 3 tion can be translated as the occurrence of patterns where the flip-flop
also partially specify the temporal relationship of those events, and changes from enabled state to disabled state or vice versa within 3 time
assigns shorthand identifiers (g1, g2, and g3) to each of the events. The units.
where clause, from Lines 4 through 9 specifies further conditions that
must be met to produce a new interval abstraction. The map clause, 5.3. Comparison with Siddhi
from Lines 10 through 16 specifies the data items associated with the
published abstraction so that consumers of the message can access the To evaluate Palisade’s rules-based capabilities, we matched the
details of the anomaly. nfer processor against the Complex Event Processing (CEP) system,
We ran nfer using our Gear Flip-Flop specification with the min- Siddhi. We developed a query in the Siddhi query language to find
imality restriction disabled (--full in the tool) and using the window all instances of the gear flip-flop anomaly described in Section 5.1,
optimization with the window size set to 200. Disabling the minimality then modified the nfer specification in Fig. 7 to exactly match the
restriction causes all matched intervals to be reported instead of the results from Siddhi. By first configuring Siddhi, we were able to keep
default of only reporting the shortest (minimal) intervals [44]. The its query short and natural, while the specification for nfer to exactly
window optimization restricts reported intervals to those shorter than match its result was more complex. The purpose of this choice was to
the window size, instead of the default of no restriction on interval avoid biasing the test results against Siddhi. The data and configuration
length [45]. The results are reported in Section 5.3. used for this comparison is available at [51]. In this test, Palisade

9
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Table 4
Results of comparing Palisade with Siddhi.
Round-trip latency Siddhi Palisade
Mean (lower is better) 60.6 ms 1.58 ms
Standard deviation (lower is better) 20.9 ms 0.50 ms

6. Case study 2: ADAS-on-a-Treadmill

Fig. 8. Partial Siddhi specification for detection of Gear Flip-Flops.


Advanced driver-assistance systems (ADAS)-on-a-Treadmill is a re-
search platform of the Real-time Embedded Software Group of the
University of Waterloo [54]. The platform mimics the movement of
detected anomalies over 35 times faster than Siddhi and with much car on a straight road using a treadmill. A model car with on-board
lower variance in the latency.
ADAS features emulates various driving conditions, such as Adaptive
Siddhi is a good choice to compare with the nfer processor of Cruise Control (ACC), Lane Keeping Assistance (LKA), Lane Depar-
Palisade because it is a specification-based stream-processing frame- ture Warning (LDW), and Forward Collision Detection and Avoidance
work. Both Siddhi and nfer require the user to write rules in a (FCDA). As in the autonomous vehicle case study, ADAS algorithms
declarative language to generate facts from a stream of input events. are implemented on top of ROS. We have integrated Palisade with
Both languages are complex enough to define queries for the Gear Flip- the ADAS-on-a-Treadmill platform and run four anomaly detection
Flop anomaly: nfer is Turing complete when circular references are algorithms (spike, clipping, loss, and range) in two different scenarios
permitted [45] and Siddhi is likely Turing complete, although no com- (GPS spoof and dead spot). The next sections describe each scenario
plexity analysis is available [52]. Like Palisade, Siddhi supports data and present an evaluation.
streaming frameworks where sources and sinks may be remote from
the processor. While Siddhi supports cloud-based installations, it can
also be installed locally and deployed where internet connections are 6.1. GPS spoof attack
not available. The ability to install the software locally was important
both for automotive use cases and for our ability to accurately measure
The GPS spoof scenario simulates an attack on the vehicle posi-
the tool’s detection latency. Siddhi is also easy to install, which is a
tioning system. In this scenario, one car moves from one side of the
requirement for Palisade that many CEP systems fail to meet.
treadmill to the other on the 𝑌 -axis, while keeping the same position
We used Siddhi and the nfer Palisade processor to independently
on the 𝑋-axis. The GPS spoof attack changes the car’s Y position by
monitor events sent over a network. We ran the test on a desktop
fooling the controller into correcting for an inaccurate reading. In a
computer with an Intel I5-5200U 2.7 GHz processor and 8 GB of
real autonomous vehicle, such an attack could result in an accident,
memory running Linux 3.10.0. For Palisade, we streamed the data
for instance. We performed three GPS spoof attacks in a five minute
over Redis and for Siddhi, as it did not support Redis, we sent the
period and ran two Palisade detector nodes, spike and clipping.
data directly over HTTP. For the data, we chose a period of about 68
minutes during which 73079 events occurred. To simulate an online The spike detector keeps each data point in a buffer. Whenever
environment, we delayed publication between messages for the same the buffer is full and new data is received, the detector discards the
period as the difference in their timestamps. For example, the difference oldest data point. Then, it compares the value of the newly received
between the timestamps of sequential messages 𝐴 and 𝐵 was 75 ms, we data with the mean and standard deviation (std) of the data in the
would publish message 𝐴 and then delay 75 ms before publishing 𝐵. buffer. If the received value is greater or smaller then the std multiplied
Fig. 8 contains part of the Siddhi query to detect the Gear Flip-Flop by a constant plus the mean, then an anomaly is reported. Once an
anomaly. In the query, Line 1 matches the input stream when three anomaly is detected, the detector waits for a period before starting to
events occur within 200 ms where the first and third event report gear compute the mean again. The buffer length considered was 50, the
zero but the second reports a different gear. If there is a match, the code std multiplicative constant was 4.6, and the waiting period was 8 s.
in Line 2 is a select statement for the data to be published, while Line 3 These are all configurable parameters in the detector. We chose these
sends the data on the output stream over HTTP where we record the parameters because they output anomalies without false positives (we
capture. The omitted portions of the query repeat Lines 1–3, but replace discuss the choice of detectors parameters in Section 7).
the gear value of zero in the first line with the other possible gears. The clip detector also buffers incoming messages to avoid detection
For more information on the Siddhi query language, see the Siddhi jitter. When the clip detector fills its buffer, it counts how many data
documentation [53] and previous publication [52]. points in the buffer are within a configured distance of new values
To measure round-trip detection latency, we added the time of (buffer value + distance > received value and buffer value - distance
publication to each event and then copied those values to the output < received value). When there are ten or more matching data points
when a gear flip-flop anomaly was detected. A second script monitored within the interval, a clipping anomaly is reported. The buffer length
the results and recorded the timestamp when the anomaly report was used in the experiment was 60 and the distance parameter was 0.0005
received. We then subtracted the passed-through publish timestamp (difference in the received 𝑌 -axis position).
of the message that triggered the anomaly report (the most recent Fig. 9 shows the car’s Y position when the attacks were performed
message) from the timestamp when the anomaly report was received. and the output from the two detectors. The spike detector identifies
The comparison in Table 4 shows that, in this case, Palisade detects two out of the three anomalies, while the clip detector finds all three
anomalies over 35 times faster than Siddhi. In the table, lower values inserted anomalies. The first anomaly is identified quickly by the spike
are better for both the mean and standard deviation of the latency. Not detector because there is a sharp jump in the received Y position.
only is Palisade’s mean detection latency much faster, but the standard We also ran both detectors with the same parameters using a dataset
deviation of its latency is also about 2.4% of that of Siddhi. without anomalies, and neither detected any anomalies, as expected.

10
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Fig. 9. Car positioning with four inserted anomalies and the anomaly detection points
(Spike and Clipping detector).

6.2. Dead spot in a platooning formation

This scenario simulates a situation where a lighting inconsistency


in the environment causes ‘‘dead spots’’ on the conveyor, causing any
car that drives into the spot to lose its positioning data. This is inspired
by a real experience running the University of Waterloo autonomous
car at CES 2018. In this scenario, two cars drive on the treadmill in
a platoon formation. A dead spot is inserted in the treadmill and a
command is issued to move the first car forward on the 𝑋-axis. The car
then moves to the desired point, stays for a while, and returns to its
original position. We ran the experiment for five minutes and inserted
four dead spots while executing two Palisade detector nodes, loss and
range check.
The loss detector keeps track of the average period of message
reception and, when a data point takes longer to arrive than the average Fig. 10. Car positioning with inserted dead spots and the anomaly detection points
multiplied by a constant factor, an anomaly is reported. The detector (Loss and Range detector).
checks for anomalies after a minimum number of samples is received. In
the experiment, we set the minimum number of samples to 30 and the
constant factor to four. The range detector tests whether a data point 6.3. Comparison with Beep Beep 3
is between a maximum and minimum value. If a received data point is
outside the range, then an anomaly is reported. We trained the range We evaluated Palisade against the stream-processing system
check with a dataset without anomalies. Beep Beep 3. We developed a Beep Beep 3 processor to find all in-
Fig. 10(a) illustrates the expected behavior (without anomalies) of stances of the dead-spot anomaly described in Section 6.2. To provide
the platooning formation for the first car. For instance, around 100 s an accurate comparison, we used the Beep Beep 3 HTTP palette to con-
into the experiment a command to move the first car forward is issued, struct a distributed detector, with events sent and anomalies reported
which causes it to move from the position of about 1.0 to 1.5. When over a network connection. We found that Palisade and Beep Beep 3
the vehicle reaches the desired position, it takes a second or two to attained comparable anomaly detection latency in this case, but that
stabilize, causing the small spikes after each acceleration. building such a distributed processor with Beep Beep 3 was more
cumbersome than with Palisade. The data and configuration used for
Fig. 10(b) shows the car 𝑋-axis position with inserted dead spots.
this comparison is available at [51].
The first command to move forward is issued around 25 s into the
Beep Beep 3 is a good choice for a comparison with Palisade
experiment. The first car attempts to move to the desired point but
because it is specifically designed for online, streaming data processing.
reaches a dead spot where it loses its positioning signal for a short
Futhermore, the tool is highly flexible, supports arbitrary data types,
time. This causes the ‘‘shaking’’ at the bottom of the figure as the
and allows distributed processors to be created using official libraries.
controller tries to reestablish the car’s position. The range detector Like Palisade, Beep Beep 3 is more of an architecture and set of APIs
is able to identify such a situation because those values are lower than a standalone tool. However, unlike Palisade, Beep Beep 3 does
than the minimum in the dataset without anomalies. The loss detector not include out-of-the-box processors designed for anomaly detection,
recognizes the loss of communication while the car passes through a although such extensions exist [55].
dead spot. Around 140 s into the experiment, we can see two lines that We used Beep Beep 3 along with the Palisade RangeCheck and Loss-
move up and down in a short period. This happens because the first car Detect processors to independently monitor events sent over a network.
passes the dead spot by its side, corrects its trajectory by returning to We ran all the detectors on a desktop computer with an Intel I5-
the dead spot, and then returns to its position before the command to 6300U 2.4 GHz processor and 16 GB of memory running Linux 4.19.72.
move was issued. This is another scenario where Palisade improves the Palisade was run on Python 3.6.10 and Beep Beep 3 was executed using
anomaly detection by providing means to easily run two detectors using OpenJDK 1.8.0_252 (IcedTea 3.16.0). For Palisade, we streamed the
the same input data. We discuss how Palisade improves the anomaly data over Redis and for Beep Beep 3, as it did not support Redis, we
detection in Section 7. used the Beep Beep 3 HTTP Palette and its serialization library. For

11
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Table 5
Results of comparing Palisade with Beep Beep 3.
Round-trip latency Beep Beep 3 Palisade
Mean (lower is better) 2.87 ms 2.85 ms
Standard deviation (lower is better) 3.2 ms 0.66 ms

Palisade components are loosely coupled is a fundamental advantage


for distributed stream processing.
Although it was not necessary to write new Palisade processors
for the comparison, the existing processors are much simpler than the
equivalent processor written with Beep Beep 3. The reason for this
disparity is that the programming model for Beep Beep 3 is not aligned
with the programming language in which it must be implemented.
That is, Beep Beep 3 programs do not resemble Java programs. This
can most clearly be seen in the code to apply a function to check if
an event is outside a fixed range. In Palisade, this check is written
in Python and requires a series of three conditional statements with
inequality expressions. In Beep Beep 3, the check requires the creation
of 14 objects, essentially requiring the author to create an abstract
syntax tree by hand. Beep Beep 3 does support the creation of domain
Fig. 11. Graphical representation of the Beep Beep 3 processor omitting the details of specific languages (DSLs) but intentionally avoids providing one.1
how Range and Loss were computed. The comparison in Table 5 shows that, in this case, Palisade and
Beep Beep 3 had similar mean detection latency but the standard
deviation of Palisade’s latency was lower (lower is better). The higher
the data, we used the same period of about 5 minutes from Fig. 10(b) standard deviation for Beep Beep 3’s detection latency appears to be
during which the car’s position was reported 7030 times. To simulate the result of the Java Virtual Machine’s Just-In-Time (JIT) compiler
an online environment, we delayed publication between messages for as a few anomalies early in the experiment had many times longer
the same period as the difference in their timestamps. For example, detection latencies than the rest. These longer detection times could
the difference between the timestamps of sequential messages 𝐴 and 𝐵 probably be avoided by implementing a boot-strapping process for the
was 33 ms, we would publish message 𝐴 and then delay 33 ms before Beep Beep 3 processor, but this would require another component and
publishing 𝐵. added complexity not required by Palisade.
To measure round-trip detection latency, we added the time of pub-
lication to each event and then copied those values to the output when 7. Discussion
a loss or range anomaly was reported. A second program monitored
the results and recorded the timestamp when the anomaly report was This section discusses the Palisade results and design choices. We
received. We then subtracted the passed-through publish timestamp divide the discussion in three parts: software architecture, performance,
of the message that triggered the anomaly report (the most recent and anomaly detection.
message) from the timestamp when the anomaly report was received.
To compare between Palisade and Beep Beep 3, we constructed 7.1. Software architecture evaluation
a Beep Beep 3 stream processor that mimicked the behavior of both
the RangeCheck and LossDetect Palisade processors. Fig. 11 shows an Evaluating software architectures is not a straightforward task.
outline of this processor, along with Beep Beep 3 programs for reading There is no common language to describe different software architec-
and printing events [56]. tures and no clear way to understand and compare different software
Fig. 11 uses the official Beep Beep 3 drawing guide to show how concerns, such as maintainability, portability, modularity, and reusabil-
events are read, transmitted, filtered, retransmitted, and printed. The ity [57]. Also, the effectiveness of the software architecture is related
top diagram in the figure shows the events being read from a file and to the experience and knowledge of the development team, thus quality
transmitted using the HTTP palette to the central processor. The central must be considered in this context.
diagram in the figure shows how events are filtered to only be included We examined two software architecture evaluation methods, Soft-
in the output if they are out-of-range or occur after a long delay. The ware Architecture Analysis Method (SAAM) [57] and Architecture
events arrive via HTTP and are duplicated to follow two paths: in the Tradeoff Analysis Method (ATAM) [58], to determine if we could ob-
lower path, they are tested to see if they are out-of-range or occur after jectively evaluate Palisade’s architecture. We found that both methods
a long delay (there is more logic here, not shown in the figure), in the are intended for evaluating monolithic software projects that serve
upper path, they reach a filter (shown as a traffic light) which is gated specific business cases, and that they do not map well onto Palisade.
based on the result of the lower path test. Events for which the test However, both SAAM and ATAM argue that modularity and extensibility
is true are transmitted via HTTP to the final program, shown as the are important metrics for evaluating the quality of an architecture.
third diagram. The third diagram shows how events that arrive are then While we cannot quantitatively measure these metrics, here we present
printed to the console. an argument that they are well satisfied by our design.
Beep Beep 3 is not designed for distributed processing, however. One measurement of the extensibility of software is the number of
The only officially supported networking mechanism is direct HTTP lines of code that must be written to meaningfully extend it. The most
connections, which necessitates tightly coupled components. That is, common way to extend Palisade is to write a new processor node.
the program that reads the events from a file must know the address of The average number of lines in the current Palisade processor nodes
the processor, which must, in turn, know the address of the program
that prints the events. This is why the range and loss checks are per-
formed in one processor; if they were separated into two processors, the 1
The API provided by Beep Beep 3 could arguably be described as a deep
file reader would need to send events directly to both end points. That internal DSL.

12
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

(written in Python) is 144.25 lines (including comments). Autoencoder and we hope that our work motivates others to design comparable
is the longest processor node with 601, while Clip detector is the tools.
shortest with 72 lines. The Processor abstract base class has 239 lines. An important design decision in Palisade regards the copying of
The nfer detector, which uses the nfer Python API, has only 42 lines messages instead of passing message IDs. Once data arrives into a
of Python code. channel, Redis copies the messages to all nodes that subscribed to
As discussed in Section 4.4, processor nodes are independent mod- that specific channel. Another approach, found in Zero-Copy message
ules that share infrastructure from a base class. Editing a processor protocols [67] for example, would be to pass just the message ID to
node has no effect on upstream or sibling processor nodes. Only nodes all destination nodes. The ID would then be used to access a central
dependent on the output of the edited node may themselves require database to retrieve the data. When most nodes require the data,
editing. however, the ID passing approach causes a performance bottleneck
Constructing a new source node does not affect other source nodes due to access serialization at the central database (increased latency).
in the system. Only processor nodes that will be subscribing to a new We assume that a node that subscribes to a channel needs the data
source may need adjustment, and then only if the new source differs on that channel, so the message copying approach reduces latency
from those that already exist. Adding a new processor node has a lesser while not affecting the processing or memory requirements. This is a
impact than editing one, as no downstream nodes should be affected, reasonable assumption given that it only requires different types of data
typically. Instead a new processor node can be added to the system to be assigned separate channels. The ID passing approach is usually
without a single modification to any other component. used in micro-service architectures and is preferable when the target
For adding a new processor node, we consider the basic code to application needs all of the data (good for batch processing) [68].
extend the base class. A new processor node requires at least 24 lines
of code in Python. Obviously, the total number of lines depends on the 7.3. Anomaly detection
complexity of the algorithm, but the processor abstract base class makes
extending Palisade straightforward. The multiple anomalies detected by different processors can be
We compared the extensibility of Palisade with the CEP/Runtime compared against each other to verify anomalies and thereby decrease
Verification (RV) system Beep Beep 3 In Section 6.3. While it was possi- the false positive rate of anomaly detection by Palisade (this could be
ble to build processors in Beep Beep 3 that mimicked those in Palisade, done by a voter sink node, for instance). There are also cases where
they required tight coupling between components. In Beep Beep 3, some anomalies are detected by only a subset of the detectors. Palisade
constructing a new processor or sink node would require modifying the covers these cases as a variety of detectors can be integrated with low-
other nodes. Palisade’s loose coupling between components means that development effort (due to our design choices — command channel,
these similar modifications are not required to support new or modified abstract base class, and data formats).
nodes. The choice of parameters in the detector nodes plays a central
Palisade can be used in any embedded system that provides a role in the efficiency of such detectors. In our experiments, we varied
network interface. As the core Palisade functionality is built around the detector parameters until we found a configuration without false
the Redis publish–subscribe interface, any system that has a network positives (Sections 5 and 6). This was possible because we could repeat
interface can send data directly to Redis or to a server that then sends the execution of the detectors several times. When the execution cannot
to Redis. Also, RESP is simple and would be easy to port to an embed- be repeated, we suggest tuning the detector parameters using a system
ded system without Linux support. Consequently, we believe that the simulation.
integration of Palisade with any embedded system is a straightforward
task. 8. Related work

7.2. Performance evaluation This section describes existing work related to Palisade. We divide
the discussion by subject area: anomaly detection, Information Flow
Palisade is built for low latency anomaly detection and this is Processing (IFP) systems, anomaly detection with streaming frame-
evident from our comparisons with other frameworks. In the case study works, offline frameworks, and outdated or commercial frameworks.
evaluation in Section 5.3, an nfer Palisade processor detected anoma- Few existing works combine the central features of Palisade: online,
lies over 35 times faster than a comparable detector using the CEP distributed anomaly detection for both time series and event streams.
system Siddhi. In the case study evaluation in Section 6.3, two Palisade Our work is motivated by the lack of options in this niche area.
processors detected anomalies with similar latency to a comparable
detector using Beep Beep 3, which required tight coupling between 8.1. Anomaly detection
components and a using a complicated API for performing simple data
comparisons. Anomaly detection, sometimes called outlier detection, attempts to
We looked for other appropriate frameworks to compare against find unexpected or non-conforming patterns in data [5–10]. Anomalies
Palisade detectors such as the Autoencoder processor, but we discov- are distinct from noise, in that noise is not of interest and hinders
ered that no such framework exists. It does not make sense to compare analysis. The output of an anomaly detector may be either a score or
Palisade’s performance against a framework which does not support a label, but the purpose is always to provide a verdict on whether an
many of the same core features or which is unusable in the same anomaly was detected at a given time.
environments. Frameworks such as Extendible and Generic Anomaly Anomaly detection has appeared in statistics literature for many
Detection System (EGADS) [59] and Datastream.io [60] only support decades [69,70], but more recently it has found application and been
CSV input for offline detection, while Palisade operates online. Other studied in other fields. In healthcare, anomaly detection is used to look
frameworks like Esper [61] and TeSSLa [62] support online stream for cardiac irregularities that might indicate heart failure or patterns
processing, but lack support for distributing processors over a network. of disease outbreak [71,72]. In computer network security, anomaly
Detectors like Hogzilla [63], StreamMill [64], and NiagaraCQ [65] are detection is widely used in intrusion detection systems to look for
abandoned projects that cannot be installed. Others, like Thirdeye [66], suspicious activity [43,73,74]. Banks, insurance companies, and ad-
can only be run in a cloud environment, making them ill-suited for vertising firms, among others, employ anomaly detection to search for
latency comparisons. The lack of online, streaming, distributed, locally instances of fraud [75–77]. Heavy industry and safety-critical systems
runnable anomaly detection frameworks shows the need for Palisade, operators like airlines use anomaly detection for equipment damage

13
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

Table 6 CEP. That is, Triceps is meant to be used as a library and to be


Anomaly symptoms defined in [83] compared to our proposed symptoms.
embedded into other programs. This fills an interesting niche, but it
Symptoms in [83] Our Symptoms
is not a framework for distributed anomaly detection like Palisade. In
Sine anomaly S-Wave the future it would be interesting to build a processor using Triceps,
Plateau stuck anomaly Loss
Peak anomaly Spike
similar to the already existing nfer processor.
Negative peak anomaly Spike Esper is a CEP and DSMS for Java and .Net (Nesper) with a SQL
Noise Noise variant called Event Processing Language (EPL) that Esper compiles to
Plateau rise/fall anomaly Clipping
Zero fall anomaly Clipping/Loss
byte code [61]. Esper is designed for low latency and high throughput,
as well as extensibility and low resource utilization. These traits make
Esper a good candidate for online anomaly detection. Esper is designed
to work well running inside a distributed stream processing framework
detection [78,79]. Recent work has shown how anomaly detection can and includes examples of integrations with Java networking libraries.
be applied to detect events in a power grid [80]. However, Esper is not itself a distributed stream processing framework,
Lightweight online detector of anomalies (LODA) is a data stream- and integrating Esper with Palisade would require building a variety of
ing online anomaly detection system [81]. LODA uses a collection of custom components to handle networking, serialization, and command.
one-dimensional histograms to improve the anomaly detection. The Siddhi is an open source CEP system deployed by companies such
rationale behind the use of a collection of weak classifier is because as Uber, eBay, and PayPal for use cases like fraud analysis and policy
together they can form a strong classifier [82]. LODA presented the enforcement [52]. Siddhi uses a specification language called Stream-
same performance in terms of precision of HS-Tress, but with better ing SQL and supports input from a variety of streaming sources such
time to process a data stream. as Apache Kafka and NATS in diverse formats like JSON and Extensi-
Weber et al. proposed a two-stage anomaly detection framework for ble Markup Language (XML). It supports streaming input and output,
vehicle signals [83]. The first stage is based on static checkers (for multiple end-points, and specification-based anomaly detection, and is
CAN messages) and the second stage is based on machine learning one of the existing works closest to supporting Palisade’s requirements
algorithms (named learning checks). The learning checks stage imple- (defined in Section 4). We compared the detection latency of Palisade
ments a Recurrent Neural Network (RNN) and LODA. In a performance with Siddhi, where detection latency is defined as the time difference
evaluation using CAN messages from a vehicle, RNN had a false positive between the instant data is generated by a source and the instant it
rate of 0.065%. Palisade also supports both static and machine learning- is reported as anomalous by a detector. The results, reported in Sec-
based algorithms. However, Palisade supports the execution of all tion 5.3, show that Palisade responds over 35 times faster on average
detectors in parallel or in any number of stages (not only two). In the than Siddhi for our case study.
same work, the authors defined seven types of anomalies that can occur
in a sensor or Electronic control unit (ECU) [83]. Table 6 compares 8.3. Anomaly detection with streaming frameworks
the seven types of anomalies propose in [83] with our nomenclature
described in Section 3. We can note that the seven types of anomalies Several other works have used data streaming frameworks for online
are a subset of ours. In this sense, we provide a more comprehensive detection of errors or anomalies. Lopez et al. discuss the characteristics
overview of anomaly symptoms that can occur in embedded real-time and compare the throughput of three stream processing platforms
systems. (Apache Spark, Flink, and Storm) using a threat detection applica-
tion [87]. Solaimani et al. used Apache Spark to detect anomalies for a
8.2. Information flow processing systems multi-source VMware-based cloud data center [88]. Subramaniam et al.
proposed a framework to detect anomalies online (outlier detection) in
Palisade processes flows of information online, deriving high level wireless sensor networks [89]. However, the authors only implemented
events and alerts from the flow as data is received. This online flow the framework in a simulator. Du et al. built a streaming detector
processing has similarities to the definition by Cugola and Margara of using Apache Storm that used k-Nearest Neighbors (k-NN) to detect
a CEP system [84]. CEP systems are a kind of IFP system that sup- anomalies in IP network traffic [90]. Shi et al. implemented an online
ports continuous and timely processing of low-level event streams into fault diagnosis system based on Apache Spark for power grid equip-
high-level abstractions according to predefined rules. CEP systems are ment [91]. Song et al. proposed an integrated system for explainable
differentiated from Data Stream Management System (DSMS) systems anomaly detection using Apache Spark called EXAD [92].
in that DSMS systems work on generic data streams rather than event Thirdeye is an anomaly detection framework based on Apache
streams and their output is similarly unconstrained. Palisade supports Spark [66]. It uses machine learning and artificial intelligence algo-
both event and non-event stream information and includes processors, rithms for cybersecurity, data analytics, and outlier detection. Thirdeye
like nfer, that use predefined rules, and learning-based processors that is designed for deployment on Amazon’s AWS cloud computing plat-
use training data to construct behavioral models. form. Palisade, by contrast, is designed to run on a curated local area
This section discusses some CEP systems in the literature that could network to reduce communication latency.
be applied to some of the use cases for Palisade. However, all of these Beep Beep 3 is a stream-processing system that combines some
systems require predefined rules to construct their event abstractions, aspects of CEP systems with ideas from RV [93]. Beep Beep 3 is
which most Palisade processors do not. As a result, these CEP sys- primarily a set of Java APIs to build synchronous processors for ar-
tems should only be compared to Palisade processors with the same bitrary data types. The standard Beep Beep 3 APIs may be augmented
requirements. with modules called palettes that implement interfaces such as network
Gigascope is a DSMS designed for network monitoring, intrusion communication, temporal logic, and plots.
detection, and traffic analysis [85]. Gigascope uses a Structured Query Beep Beep 3 has been studied for use in online, streaming anomaly
Language (SQL)-like query language called Gigascope Query Language detection [55]. However, Beep Beep 3 has different goals from Palisade
(GSQL) that uses data streams as its input and output. Gigascope is that make it less well suited for distributed anomaly detection. We
explicitly aimed at intrusion detection in networked systems and is not compared the detection latency of Palisade with Beep Beep 3, as well
a general solution for anomaly detection. as the experience of building anomaly detectors using the two frame-
Triceps is an open source CEP system that does not define its own works, in Section 6.3. The results show that the two tools have similar
SQL variant, but rather has the user implement queries and operations detection latency, but that Palisade is better suited than Beep Beep 3
directly in C++ or Perl [86]. Triceps is unique in that it is an embedded for detecting anomalies in a distributed environment.

14
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

TeSSLa is a stream-based specification language and monitoring 9. Conclusion


system designed for specifying and analyzing the behavior of systems
where timing is important [62]. TeSSLa, like Beep Beep 3, combines In this article, we presented Palisade, a software framework for
aspects of CEP systems with RV. However, unlike Beep Beep 3, TeSSLa anomaly detection in embedded systems. We introduced a new tax-
may only be used through an external DSL and its interpreter only onomy of anomaly symptoms, and we designed Palisade to support
accepts input via file arguments or standard in. While TeSSLa’s lan- their detection using a variety of algorithms. Palisade is built around
guage and theoretical foundation are exciting, its lack of network the Redis publish–subscribe interface, which allows running differ-
support means that it cannot currently operate as a distributed, online ent anomaly detectors with the same input data across a distributed
anomaly detection framework. Integrating TeSSLa with Palisade is also network. We demonstrated the viability of the proposed framework
using two case studies, one using data from an autonomous vehicle
impossible because TeSSLa is only distributed as a compiled binary.
and another one using data from an ADAS platform. We argued that
Palisade is easy to operate and modify and that it detects anomalies
8.4. Offline frameworks with low latency.
As future work, we plan to integrate Palisade with the University
Datastream.io is an open-source anomaly detection framework that of Waterloo’s autonomous car and implement more learning-based
allows users to integrate their custom detectors for testing and train- anomaly detectors. The datasets used in the experiments is available
ing [60]. The project plans to support online streaming but presently online [51] and the Palisade source code is available upon request.
only supports the use of CSV files as input to perform offline detection.
Declaration of competing interest
EGADS is an open-source anomaly detection framework by Ya-
hoo [59]. EGADS is a self-contained Java package developed for
The authors declare that they have no known competing finan-
time-series anomaly detection, providing access to multiple detectors.
cial interests or personal relationships that could have appeared to
EGADS accepts input only in the form of CSV and standard-input and
influence the work reported in this paper.
is no longer actively maintained.
Frankowski et al. used a variety of CEP systems, including Siddhi References
and their own SECOR CEP, to detect intrusions and anomalies [94].
Their work combined several CEP systems to periodically analyze log [1] X. Liu, H. Ding, K. Lee, L. Sha, M. Caccamo, Feedback fault tolerance of real-
files and store the results in a database. They showed that it is possible time embedded systems: Issues and possible solutions, SIGBED Rev. 3 (2) (2006)
23–28.
to build an effective, signature-free anomaly detection framework using [2] A. Taylor, S. Leblanc, N. Japkowicz, Anomaly detection in automobile control
off-the-shelf components. However, they did not construct an online network data with long short-term memory networks, in: 2016 IEEE International
detector. Conference on Data Science and Advanced Analytics, DSAA, 2016, pp. 130–139.
[3] C. McCarthy, K. Harnett, A. Carter, Characterization of Potential Security Threats
in Modern Automobiles: A Composite Modeling Approach, Tech. Rep., National
8.4.1. Outdated or commercial frameworks Highway Traffic Safety Administration, Washington, 2014.
Other frameworks have been proposed in the past but are un- [4] E. Marin, D. Singelée, B. Yang, I. Verbauwhede, B. Preneel, On the feasibility
of cryptography for a wireless insulin pump system, in: Proceedings of the Sixth
usable in practice because they are either expensive to license or ACM Conference on Data and Application Security and Privacy, in: CODASPY
unmaintained. NiagaraCQ was an early and influential continuous ’16, ACM, New York, NY, USA, 2016, pp. 113–120.
query system, but the software has not been available for at least a [5] S. Agrawal, J. Agrawal, Survey on anomaly detection using data mining
techniques, Procedia Comput. Sci. 60 (2015) 708–713, Knowledge-Based and In-
decade [65]. SASE was a stream processing system designed to support
telligent Information & Engineering Systems 19th Annual Conference, KES-2015,
complex queries and high throughput, but it was last maintained in Singapore, September 2015 Proceedings.
2014 [95]. Cayuga was a stateful publish–subscribe system based on [6] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: A survey, ACM Comput.
Non-deterministic Finite AutomatonNon-deterministic Finite Automata Surv. 41 (3) (2009) 15:1–15:58.
[7] A. Patcha, J.-M. Park, An overview of anomaly detection techniques: Exist-
(NFAs) that was adapted as an event monitoring system, but it was
ing solutions and latest technological trends, Comput. Netw. 51 (12) (2007)
last maintained in 2013 [96]. Stream Mill was a DSMS that com- 3448–3470.
bines predefined rules with statistical learning algorithms for mining [8] Z.A. Bakar, R. Mohemad, A. Ahmad, M.M. Deris, A comparative study for outlier
queries [64], but it was last maintained in 2012. GEM is a commercial detection techniques in data mining, in: 2006 IEEE Conference on Cybernetics
and Intelligent Systems, 2006, pp. 1–6.
CEP vendor in the industrial space and, as such, their software is not [9] M. Agyemang, K. Barker, R. Alhajj, A comprehensive survey of numeric and
freely available [97]. symbolic outlier mining techniques, Intell. Data Anal. 10 (6) (2006) 521–538.
Hogzilla is an open-source anomaly-based Intrusion Detection Sys- [10] M.A. Pimentel, D.A. Clifton, L. Clifton, L. Tarassenko, A review of novelty de-
tection, Signal Process. 99 (2014) 215–249, http://dx.doi.org/10.1016/j.sigpro.
tem (IDS) targeted towards network communications [63]. Hogzilla
2013.12.026.
purports to detect a wide range of network attacks including zero-day [11] Redis, Website, 2018, https://redis.io/, (Accessed: May 2018).
attacks. At the time of publishing, the software no longer runs and [12] Mitre, Common attack pattern enumeration and classification, 2018, https://
has not been maintained for some time. However, in October 2019 the capec.mitre.org/data/definitions/1000.html, (Accessed: Sep 2018).
[13] Y. Bengio, Gradient-based optimization of hyperparameters, Neural Comput. 12
project website was updated to report that the tool will be maintained (8) (2000) 1889–1900, http://dx.doi.org/10.1162/089976600300015187.
and supported by a commercial partner, so Hogzilla may return to [14] M. Feurer, F. Hutter, Hyperparameter optimization, in: Automated Machine
relevance in the area. Learning: Methods, Systems, Challenges, Springer International Publishing,
Cham, 2019, pp. 3–33, http://dx.doi.org/10.1007/978-3-030-05318-5_1.
Many of the IFP systems Cugola and Margara review are either
[15] C. Miller, C. Valasek, Remote exploitation of an unaltered passenger vehicle, in:
no longer maintained or locked up behind commercial licenses [84]. Blackhat USA, IOActive, 2015, pp. 1–91.
Other promising systems from their study that are unavailable or [16] K. Rollick, A. Roczko, L. Mitchell, Combustible gas detector sensor drift: Catalytic
impractical include: the Borealis stream processor (abandoned 2008), vs. infrared, InTech Mag. (2010).
[17] M. Seiter, H.J. Mathony, P. Knoll, Parking assist, in: Handbook of Intelligent
StreamBase (commercial), SQLStream (commercial), Oracle CEP (com- Vehicles, Springer, 2012, pp. 829–864.
mercial), Tribeca (disappeared), and TelegraphCQ (abandoned 2003). [18] J. Petit, B. Stottelaar, M. Feiri, Remote attacks on automated vehicles sensors :
Palisade is intentionally designed as a set of distributed micro- Experiments on camera and lidar, in: Black Hat Europe, 2015, pp. 1–13.
[19] S. Mukherjee, H. Shirazi, I. Ray, J. Daily, R. Gamble, Practical DoS attacks on
services built around a data streaming architecture. Palisade provides
embedded networks in commercial vehicles, in: I. Ray, M.S. Gaur, M. Conti, D.
a balance between low latency anomaly detection and loosely coupled Sanghi, V. Kamakoti (Eds.), Information Systems Security, Springer, Cham, 2016,
services. pp. 23–42.

15
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

[20] Y. Mo, B. Sinopoli, Secure control against replay attacks, in: 2009 47th Annual [51] S.K.M.D.G.G.W.K.N.B.S. Fischmeister, Palisade: A Framework for Anomaly De-
Allerton Conference on Communication, Control, and Computing, Allerton, 2009, tection in Embedded Systems Dataset, IEEE Dataport, 2020, http://dx.doi.org/
pp. 911–918. 10.21227/44z5-9k90.
[21] A. Bolshev, J. Larsen, M. Krotofil, R. Wightman, A rising tide: Design exploits in [52] S. Suhothayan, K. Gajasinghe, I. Loku Narangoda, S. Chaturanga, S. Perera,
industrial control systems, in: 10th USENIX Workshop on Offensive Technologies, V. Nanayakkara, Siddhi: A second look at complex event processing archi-
WOOT 16, USENIX Association, Austin, TX, 2016, pp. 1–11. tectures, in: Proceedings of the 2011 ACM Workshop on Gateway Computing
[22] C. Moreno, S. Fischmeister, M.A. Hasan, Non-intrusive program tracing and Environments, in: GCE ’11, ACM, New York, NY, USA, 2011, pp. 43–50, http:
debugging of deployed embedded systems through side-channel analysis, in: Proc. //dx.doi.org/10.1145/2110486.2110493.
of the 14th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and [53] Siddhi, Website, 2019, https://siddhi.io/, (Accessed: Jun 2020).
Tools for Embedded Systems, LCTES, ACM, New York, USA, 2013, pp. 77–88. [54] D. Shin, A Platform for Generating Anomalous Traces Under Cooperative Driving
[23] C. Moreno, S. Kauffman, S. Fischmeister, Efficient program tracing and moni- Scenarios, UWSpace, University of Waterloo, 2018, URL http://hdl.handle.net/
toring through power consumption – with a little help from the compiler, in: 10012/13534.
Design, Automation & Test in Europe Conference & Exhibition, in: DATE ’16, [55] M. Roudjane, D. Rebaïne, R. Khoury, S. Hallé, Real-time data mining for
IEEE, 2016, pp. 1556–1561, http://dx.doi.org/10.3850/9783981537079_0829. event streams, in: 2018 IEEE 22nd International Enterprise Distributed Object
[24] S. Kauffman, C. Moreno, S. Fischmeister, Static transformation of power Computing Conference, EDOC, 2018, pp. 123–134, http://dx.doi.org/10.1109/
consumption for software attestation, in: 22nd International Conference on EDOC.2018.00025.
Embedded and Real-Time Computing Systems and Applications, in: RTCSA ’16, [56] S. Hallé, Event Stream Processing with BeepBeep 3: Log Crunching and Analysis
IEEE, 2016, pp. 188–194, http://dx.doi.org/10.1109/RTCSA.2016.45. Made Easy, Presses de l’Université du Québec, 2018.
[25] T.T.Y. Lin, D.P. Siewiorek, Error log analysis: statistical modeling and heuristic
[57] R. Kazman, L. Bass, M. Webb, G. Abowd, SAAM: A method for analyzing the
trend analysis, IEEE Trans. Reliab. 39 (4) (1990) 419–432.
properties of software architectures, in: Proceedings of the 16th International
[26] S.M. Bellovin, Packets found on an internet, SIGCOMM Comput. Commun. Rev.
Conference on Software Engineering, IEEE Computer Society Press, 1994, pp.
23 (3) (1993) 26–31.
81–90.
[27] A. Haque, A. DeLucia, E. Baseman, Markov Chain modeling for anomaly
[58] R. Kazman, M. Klein, P. Clements, ATAM: Method For Architecture Evaluation,
detection in high performance computing system logs, in: Proceedings of the
Tech. Rep., Carnegie-Mellon Univ Pittsburgh PA Software Engineering Inst, 2000.
Fourth International Workshop on HPC User Support Tools, in: HUST’17, ACM,
[59] Y. EGADS, Website, 2019, https://github.com/yahoo/egads, (Accessed: Sep
New York, USA, 2017, pp. 3:1–3:8.
2019).
[28] H. Stark, Image Recovery: Theory and Application, Elsevier, 1987.
[29] F. Marvasti, M. Analoui, M. Gamshadzahi, Recovery of signals from nonuniform [60] Datastream.io, Website, 2019, https://blog.ment.at/datastream-io-open-source-
samples using iterative methods, IEEE Trans. Signal Process. 39 (4) (1991) anomaly-detection-64db282735e0, (Accessed: Sep 2019).
872–878. [61] E. Tech, Website, 2019, http://www.espertech.com/, (Accessed: Jun 2020).
[30] K. Sauer, J. Allebach, Iterative reconstruction of bandlimited images from [62] L. Convent, S. Hungerecker, M. Leucker, T. Scheffel, M. Schmitz, D. Thoma,
nonuniformly spaced samples, IEEE Trans. Circuits Syst. 34 (12) (1987) TeSSLa: Temporal stream-based specification language, in: T. Massoni, M.R.
1497–1506. Mousavi (Eds.), Formal Methods: Foundations and Applications, Springer Inter-
[31] H. Feichtinger, Discretization of convolutions and the generalized sampling prin- national Publishing, Cham, 2018, pp. 144–162, http://dx.doi.org/10.1007/978-
ciple, in: Progress in Approximation Theory, Academic Press, Boston, Toronto, 3-030-03044-5_10.
1991, pp. 333–345. [63] H. ids Data, Website, 2020, http://ids-hogzilla.org/, (Accessed: Jun 2020).
[32] K. Wang, S.J. Stolfo, Anomalous payload-based network intrusion detection, in: E. [64] H. Thakkar, B. Mozafari, C. Zaniolo, Designing an inductive data stream
Jonsson, A. Valdes, M. Almgren (Eds.), Recent Advances in Intrusion Detection, management system: the stream mill experience, in: Proceedings of the 2nd
Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 203–222. International Workshop on Scalable Stream Processing System, ACM, 2008, pp.
[33] J. Rumbaugh, I. Jacobson, G. Booch, Unified Modeling Language Reference 79–88.
Manual, second ed., Pearson Higher Education, 2004. [65] J. Chen, D.J. DeWitt, F. Tian, Y. Wang, NiagaraCQ: A scalable continuous
[34] M. Dunne, G. Gracioli, S. Fischmeister, A comparison of data streaming frame- query system for internet databases, in: Proceedings of the 2000 ACM SIGMOD
works for anomaly detection in embedded systems, in: Proceedings of the International Conference on Management of Data, in: SIGMOD ’00, Association
1st International Workshop on Security and Privacy for the Internet-of-Things, for Computing Machinery, New York, NY, USA, 2000, pp. 379–390, http://dx.
IoTSec, Orlando, FL, USA, 2018, pp. 30–33. doi.org/10.1145/342009.335432.
[35] M.Q. Rabbit, Website, 2018, http://www.rabbitmq.com/, (Accessed: Jan 2018). [66] T. Data, Website, 2019, https://thirdeyedata.io/, (Accessed: Sep 2019).
[36] A. Kafka, Website, 2018, https://kafka.apache.org/, (Accessed: Jan 2018). [67] B.P. Swenson, G.F. Riley, A new approach to zero-copy message passing with re-
[37] NATS, Website, 2018, https://nats.io/, (Accessed: Jan 2018). versible memory allocation in multi-core architectures, in: 2012 ACM/IEEE/SCS
[38] ECMA, The JSON Data Interchange Syntax 2nd Edition, ECMA, (404) European 26th Workshop on Principles of Advanced and Distributed Simulation, 2012, pp.
Computer Manufacturers Association, Geneva, Switzerland, 2017. 44–52.
[39] Json.org, 2018, http://json.org/, (Accessed: 25 May 2018). [68] N. Dragoni, S. Giallorenzo, A.L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, L.
[40] G.P.B. Documentation, Website, 2020, https://developers.google.com/protocol- Safina, Microservices: yesterday, today, and tomorrow, in: Present and Ulterior
buffers/docs/overview#extensions, (Accessed: Jun 2020). Software Engineering, Springer, 2017, pp. 195–216.
[41] M. Kramer, Nonlinear principal component analysis using autoassociative neural
[69] F. Edgeworth, Xli. on discordant observations, London Edinb. Dublin Phil. Mag.
networks, Am. Inst. Chem. Eng. 37 (2) (1991) 233–243.
J. Sci. 23 (143) (1887) 364–375.
[42] J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series,
[70] J. Pickands, Statistical inference using extreme order statistics, Ann. Statist. 3
with implications for streaming algorithms, in: Proceedings of the 8th ACM
(1) (1975) 119–131, URL http://www.jstor.org/stable/2958083.
SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery,
[71] A.L. Goldberger, L.A. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark,
in: DMKD ’03, ACM, New York, NY, USA, 2003, pp. 2–11.
J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit,
[43] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intrusions using system calls:
and PhysioNet: components of a new research resource for complex physiologic
alternative data models, in: Proceedings of the 1999 IEEE Symposium on Security
signals, Circulation 101 (23) (2000) e215–e220.
and Privacy, 1999, pp. 133–145.
[72] W.-K. Wong, A.W. Moore, G.F. Cooper, M.M. Wagner, Bayesian network anomaly
[44] S. Kauffman, K. Havelund, R. Joshi, Nfer–A notation and system for inferring
pattern detection for disease outbreaks, in: Proceedings of the 20th International
event stream abstractions, in: International Conference on Runtime Verification,
Conference on Machine Learning, ICML-03, 2003, pp. 808–815.
Springer, 2016, pp. 235–250.
[45] S. Kauffman, K. Havelund, R. Joshi, S. Fischmeister, Inferring event stream [73] D. Snyder, On-Line Intrusion Detection Using Sequences of System Calls, Florida
abstractions, Form. Methods Syst. Des. 53 (1) (2018) 54–82. State University, 2001.
[46] J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 [74] B. Agarwal, N. Mittal, Hybrid approach for detection of anomaly network
(11) (1983) 832–843. traffic using data mining techniques, Proc. Technol. 6 (2012) 996–1003,
[47] S. Kauffman, S. Fischmeister, Mining temporal intervals from real-time system http://dx.doi.org/10.1016/j.protcy.2012.10.121, 2nd International Conference
traces, in: Proceedings of the 6th International Workshop on Software Mining, on Communication, Computing & Security [ICCCS-2012].
SoftwareMining, Champaign, USA, 2017, pp. 1–8. [75] T. Fawcett, F. Provost, Activity monitoring: Noticing interesting changes in
[48] M.M.Z. Zadeh, M. Salem, N. Kumar, G. Cutulenco, S. Fischmeister, SiPTA: Signal behavior, in: Proceedings of the Fifth ACM SIGKDD International Conference
processing for trace-based anomaly detection, in: Proc. of the International on Knowledge Discovery and Data Mining, in: KDD ’99, ACM, New York, NY,
Conference on Embedded Software, EMSOFT, New Dehli, India, 2014, pp. 1–6. USA, 1999, pp. 53–62.
[49] O. Iegorov, S. Fischmeister, Mining task precedence graphs from real-time [76] S. Donoho, Early detection of insider trading in option markets, in: Proceedings
embedded system traces, in: 2018 IEEE Real-Time and Embedded Technology of the Tenth ACM SIGKDD International Conference on Knowledge Discovery
and Applications Symposium, RTAS, 2018, pp. 251–260. and Data Mining, in: KDD ’04, ACM, New York, NY, USA, 2004, pp. 420–429.
[50] A. Narayan, G. Cutulenco, Y. Joshi, S. Fischmeister, Mining timed regular [77] Ghosh, Reilly, Credit card fraud detection with a neural-network, in: 1994
specifications from system traces, ACM Trans. Embed. Comput. Syst. 17 (2) Proceedings of the Twenty-Seventh Hawaii International Conference on System
(2018) 46:1–46:21. Sciences, Vol. 3, 1994, pp. 621–630.

16
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876

[78] E. Keogh, J. Lin, S.-H. Lee, H.V. Herle, Finding the most unusual time series Murray Dunne is a Ph.D. student in the Real-time Em-
subsequence: algorithms and applications, Knowl. Inf. Syst. 11 (1) (2007) 1–27. bedded System Lab (RESL) at the University of Waterloo
[79] S. Basu, M. Meckesheimer, Automatic outlier detection for time series: an where he previously earned a Master of Mathematics. He
application to sensor data, Knowl. Inf. Syst. 11 (2) (2007) 137–154. received his BSc. in Computer Science from the University
[80] Y. Zhou, H. Zou, R. Arghandeh, W. Gu, C.J. Spanos, Non-parametric outliers of Victoria. His research interests include cyber security,
detection in multiple time series a case study: Power grid data analysis, in: machine learning, embedded, and distributed systems.
Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 4605–4612.
[81] T. Pevný, Loda: Lightweight on-line detector of anomalies, Mach. Learn. 102 (2)
(2016) 275–304.
[82] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms,
Wiley-Interscience, New York, NY, USA, 2004.
[83] M. Weber, G. Wolf, E. Sax, B. Zimmer, Online detection of anomalies in vehicle
signals using replicator neural networks, in: 6th Embedded Security in Cars USA,
Escar USA, 2018, pp. 1–14. Giovani Gracioli received his Ph.D. in Automation and
[84] G. Cugola, A. Margara, Processing flows of information: From data stream to Systems Engineering from the Federal University of Santa
complex event processing, ACM Comput. Surv. 44 (3) (2012) 15. Catarina (UFSC), Brazil in 2014. He currently is assis-
[85] C. Cranor, T. Johnson, O. Spataschek, V. Shkapenyuk, Gigascope: a stream tant professor at UFSC and his research interests include
database for network applications, in: Proceedings of the 2003 ACM SIGMOD real-time, embedded, and operating systems.
International Conference on Management of Data, ACM, 2003, pp. 647–651.
[86] Triceps: An innovative CEP, 2017, http://triceps.sourceforge.net/, (Accessed: 18
October 2017).
[87] M.A. Lopez, A.G.P. Lobato, O.C.M.B. Duarte, A performance comparison of
open-source stream processing platforms, in: 2016 IEEE GLOBECOM, 2016, pp.
1–6.
[88] M. Solaimani, M. Iftekhar, L. Khan, B. Thuraisingham, J. Ingram, S.E. Seker,
Online anomaly detection for multi-source VMware using a distributed streaming
framework, Softw. Pract. Exper. 46 (11) (2016) 1479–1497. Waleed Khan is a Ph.D. student in the Real-time Embedded
[89] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, D. Gunopulos, System Lab (RESL) at the University of Waterloo. Prior to
Online outlier detection in sensor data using non-parametric models, in: Proc. becoming a Ph.D. student, he also received his Bachelor’s
of the 32Nd VLDB, 2006, pp. 187–198. degree in Mechatronics Engineering and his Master’s degree
[90] Y. Du, J. Liu, F. Liu, L. Chen, A real-time anomalies detection system based on in Science at the University of Waterloo. His research
streaming technology, in: 2014 Sixth IHMSC, Vol. 2, 2014, pp. 275–279. interests include embedded systems, cyber security and IoT.
[91] W. Shi, Y. Zhu, T. Huang, G. Sheng, Y. Lian, G. Wang, Y. Chen, An integrated
data preprocessing framework based on Apache Spark for fault diagnosis of
power grid equipment, J. Signal Process. Syst. 86 (2) (2017) 221–236.
[92] F. Song, Y. Diao, J. Read, A. Stiegler, A. Bifet, EXAD: A system for explainable
anomaly detection on big data traces, in: 2018 IEEE International Conference on
Data Mining Workshops, ICDMW, 2018, pp. 1435–1440.
[93] S. Hallé, When RV meets CEP, in: International Conference on Runtime
Verification, Springer, 2016, pp. 68–91. Nirmal Benann graduated with a Master of Science degree
[94] G. Frankowski, M. Jerzak, M. Miłostan, T. Nowak, M. Pawłowski, Application from the Real-time Embedded System Lab (RESL) at the Uni-
of the complex event processing system for anomaly detection and network versity of Waterloo. Prior to becoming a Master’s student,
monitoring, Comput. Sci. 16 (4) (2015) 351–371, http://dx.doi.org/10.7494/csci. he received his Bachelor’s degree in Computer Engineering
2015.16.4.351. at the University of Waterloo. His research interests include
[95] H. Zhang, Y. Diao, N. Immerman, Recognizing patterns in streams with imprecise embedded systems, parallel and distributed computing.
timestamps, Proc. VLDB Endow. 3 (1–2) (2010) 244–255, http://dx.doi.org/10.
14778/1920841.1920875.
[96] A. Demers, J. Gehrke, M. Hong, M. Riedewald, W. White, Towards expressive
publish/subscribe systems, in: Y. Ioannidis, M.H. Scholl, J.W. Schmidt, F.
Matthes, M. Hatzopoulos, K. Boehm, A. Kemper, T. Grust, C. Boehm (Eds.),
Advances in Database Technology - EDBT 2006, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2006, pp. 627–644, http://dx.doi.org/10.1007/11687238_38.
[97] GEM, Website, 2020, https://www.softwareag.com/gem/default.html, (Accessed: Sebastian Fischmeister received his Ph.D. degree in com-
Jun 2020). puter science from the University of Salzburg, Salzburg,
Austria, in 2002. He is an Associate Professor with the De-
Sean Kauffman is a Ph.D. candidate in the Real-time partment of Electrical and Computer Engineering, University
Embedded System Lab (RESL) at the University of Waterloo of Waterloo, Waterloo, ON, Canada. His primary application
where he previously earned a Master of Mathematics. He domains include automotive systems, aerospace, and medi-
worked as a software engineer on in industry for eight cal devices. His research interests include systems research
years after receiving a Bachelor of Arts in Computer Science in safety-critical software and embedded networking of
from Goshen College. His research interests include runtime time-sensitive systems.
verification, specification mining, and functional safety.

17

You might also like