Anomaly Detection
Anomaly Detection
Keywords: In this article, we propose Palisade, a distributed framework for streaming anomaly detection. Palisade is
Anomaly detection motivated by the need to apply multiple detection algorithms for distinct anomalies in the same scenario. Our
Real-time embedded systems solution blends low latency detection with deployment flexibility and ease-of-modification. This work includes
Software architecture
a thorough description of the choices made in designing Palisade and the reasons for making those choices.
Streaming-based architecture
We carefully define symptoms of anomalies that may be detected, and we use this taxonomy in characterizing
our work. The article includes two case studies using a variety of anomaly detectors on streaming data to
demonstrate the effectiveness of our approach in an embedded setting.
∗ Corresponding author.
E-mail addresses: sean.kauffman@uwaterloo.ca (S. Kauffman), mdunne@uwaterloo.ca (M. Dunne), giovani@lisha.ufsc.br (G. Gracioli),
wqkhan@uwaterloo.ca (W. Khan), njbenann@uwaterloo.ca (N. Benann), sfischme@uwaterloo.ca (S. Fischmeister).
https://doi.org/10.1016/j.sysarc.2020.101876
Received 24 January 2020; Received in revised form 1 July 2020; Accepted 25 August 2020
Available online 12 September 2020
1383-7621/© 2020 Elsevier B.V. All rights reserved.
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
2
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
∀𝑦𝑡 ∶ 𝑝 ≤ 𝑡 ≤ 𝑞, 𝑦𝑡 = 𝑦̂𝑡 + (0, 𝑛) Example: Analog to Digital Converter (ADC) can be attacked by am-
plifying analog signals past the dynamic range of the device. These at-
where (0, 𝑛) is a standard normal distribution centered at zero with tacks can obscure other malicious behavior and damage hardware [21].
standard deviation 𝑛. This type of attack is an example of ⟨CAPEC-153: Input Data
Example: Compressed air in truck brakes may generate acoustical Manipulation⟩ [12]. It could be detected by monitoring the analog
interference and cause metallic friction noise from track vehicles in signal for Amplification anomalies.
ultrasonic sensors [17]. Brake failure could be detected by correlating
3.1.8. Level change
Noise anomalies in ultrasonic sensors with air brake usage.
A Level Change (Fig. 1(i)) symptom is observed when the mean of
a signal changes in a short period and then remains consistent at the
3.1.4. Clipping new level. Slower changes may fall under drifting. Given a time series
𝑦, an acceptable minimum level change threshold 𝓁, and a minimum
We define Clipping as a loss of data at the extrema of a signal
number of samples the mean change must persist 𝑛, a level change has
range (Fig. 1(e)), where a signal is of a higher amplitude than is
occurred over a window of 𝑤 samples 𝑦[𝑡,𝑡+𝑤−1] iff
supported by the sensor or transmission medium. Thus a clipped signal
can be represented by a series of identical samples at the maximum or |𝑦̄[𝑡+𝑤,𝑡+𝑤+𝑛] − 𝑦̄[𝑡−𝑛−1,𝑡−1] | > 𝓁
minimum extent of the sample medium. Example: An attack that increases the amount of code execu-
Example: A partially blinding attack on a camera of a vehicle by tion will increase the power consumption of the Central Processing
emitting light can hide objects [18]. This light can exceed the input Unit (CPU), which can be observed as a Level Change [22–24]. Such
range of the camera and would appear as clipped. This attack is an an attack could be an example of ⟨CAPEC-175: Code Inclusion⟩, or
example of ⟨CAPEC-607: Obstruction⟩ [12]. Such blinding light attacks ⟨CAPEC-242: Code Injection⟩ [12]. Many attacks with this profile can
could be detected by monitoring for Clipping anomalies. be detected by monitoring the power consumption of the CPU for Level
Change anomalies.
3
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Fig. 1. Time Series Anomaly Symptoms. The thick red line below the signal indicates the period of the described anomaly. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)
3.2. Event-series anomaly symptoms then the change in event frequency may indicate an anomaly. Given
a trace 𝑇 ∈ T, an event name 𝜇 ∈ , a window size 𝑤 ∈ 𝑙𝑜𝑐𝑘, and
Symptoms of anomalies also appear in event series, which are a threshold 𝜖, an event frequency change may be defined as ∃𝑡1 , 𝑡2 , 𝑡3 ∶
defined as a sequence of discrete events rather than a continuous signal. 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡1 , 𝑡2 )) − 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡2 , 𝑡3 )) > 𝜖.
An event series represents a trace of the execution of an automaton Example: Lin and Siewiorek introduced their Dispersion Frame
where each event describes a state transition. Event series differ from Technique (DFT) to predict hardware failures [25]. From analyzing
discretized continuous signals in that their events are not required to the logs of file servers, they observed that there exists a period of an
occur at regular time intervals and they may carry more complex data increasing rate of intermittent errors before most hardware failures.
than only a single real value. Many such failures could be detected by monitoring error reports for
Event Frequency Change anomalies.
3.2.1. Event frequency change
When events of the same name are periodic or semi-periodic, they
3.2.2. Unexpected event
have fairly consistent inter-arrival times and, by extension, frequency.
Most traces only contain events with a limited vocabulary of event
When that frequency changes suddenly, it can be a symptom of a system
names. While events themselves are unique, due to their varying clock
anomaly. Similarly, when the frequency of all events in a trace change
times, the event names are repeated many times. When an event occurs
suddenly, it may be due to an anomaly.
in a trace with a name that has not come earlier in the trace, it may be
The inter-arrival time of an event is the difference between the
a symptom of an anomaly.
clock times of successive events of the same name. It can be thought of
as the period of the event. More precisely, given a trace 𝑇 ∈ T, an Given a trace 𝑇 ∈ T, an unexpected event may be defined as an
event name 𝜇 ∈ , and a non-empty time interval defined by the event (𝜇, 𝑡1 , ⋅) ∈ 𝑇 ∶ ∄(𝜇, 𝑡2 , ⋅) ∈ 𝑇 where 𝑡2 < 𝑡1 . That is, we can think
end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡1 < 𝑡2 , the inter-arrival time is defined of an unexpected event as the first event with a new name.
as 𝑖𝑛𝑡𝑒𝑟𝐴𝑟𝑟𝑖𝑣𝑎𝑙(𝜇, (𝑡1 , 𝑡2 )) ≜ ((max 𝑡 ∶ (𝜇, 𝑡, ⋅) ∈ 𝑆) − (min 𝑡 ∶ (𝜇, 𝑡, ⋅) ∈ By this definition, however, most events at the beginning of a trace
𝑆))∕(|𝑆| − 1) where 𝑆 = {(𝜇, 𝑡, ⋅) ∈ 𝑇 ∶ 𝑡1 ≤ 𝑡 ≤ 𝑡2 }. will be considered unexpected. To solve this problem, we can restate
Event frequency measures how often events occur in a given time the definition in terms of the probability that an event occurs. Given a
span. It is given as the inverse of inter-arrival time, or given a trace trace 𝑇 ∈ T and a threshold 𝜖, an unexpected event may be defined as
𝑇 ∈ T, an event name 𝜇 ∈ , and a non-empty time interval defined an event 𝑒 ∈ 𝑇 ∶ (𝑒 ∈ 𝑇 ) < 𝜖.
by the end points 𝑡1 , 𝑡2 ∈ 𝑙𝑜𝑐𝑘 ∶ 𝑡1 < 𝑡2 , the event frequency of 𝜇 is Example: Bellovin reported receiving broadcast packets meant
given as 𝑒𝑣𝑒𝑛𝑡𝐹 𝑟𝑒𝑞(𝜇, (𝑡1 , 𝑡2 )) ≜ 1∕𝑖𝑛𝑡𝑒𝑟𝐴𝑟𝑟𝑖𝑣𝑎𝑙(𝜇, (𝑡1 , 𝑡2 )). for local networks, requests to unused ports, and requests to un-
A sudden change in event frequency, then, is when the first deriva- occupied addresses over the public Internet at AT&T in his classic
tive of event frequency is high. A rapid change in event frequency can whitepaper [26]. These types of requests are examples of ⟨CAPEC-169:
be found by taking the difference between successive time intervals Footprinting⟩ [12] and they could be detected by monitoring network
(or windows) in the trace. If the difference exceeds some threshold, traffic for Unexpected Event anomalies.
4
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
5
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Table 1
Redis latency results in seconds.
Freq. (Hz)/Packet size 256 B 1 KB 100 KB 1 MB
6
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Palisade currently includes more than 20 example anomaly detec- 4.6.2. Event series example detectors
tors. All of these detectors are based on existing methods and are • HMM is similar to the SAX + HMM detector, except that the input
distributed with Palisade to provide out-of-the-box detection of the is already composed of events that represent state change instead
anomaly symptoms listed in Section 3. Tables 2 and 3 show detectors in of continuous samples of a signal, so there is no need to transform
Palisade and the anomaly symptoms they are capable of detecting. Ta- them using SAX. Since events have a variable time between them,
ble 2 shows the detectors for continuous-signal anomaly symptoms (see the HMM can consider the time between events, as well as the
Section 3.1) while Table 3 shows the detectors for event-series anomaly type of the event when evaluating the likelihood of the sequence
symptoms (see Section 3.2). In both tables the left-most column shows against the learned model.
the detectors while the top-most row shows the anomaly symptoms. • Nfer is a recently introduced language and system for inferring
The intersection of row and column contains a check mark (✓) if the event stream abstractions [44,45] that utilizes a syntax based on
detector is sensitive to the anomaly symptom and is blank otherwise. Allen’s Temporal Logic (ATL) [46]. Nfer transforms an event
stream that represent state transitions into a series of intervals
For some detectors, the capability to detect an anomaly symptom
that represent state. These intervals create a hierarchy of abstrac-
depends on the magnitude of the symptom. For example, the spike
tions that simplify human and machine trace comprehension. If
detector can detect the noise symptom as long as the noise falls outside
an interval has been designated as anomalous, its generation is
of the variance of the previous time window. Even though multiple considered a symptom of an anomaly. Palisade supports both
detectors may be able to detect the same anomaly symptom, they hand-written and mined nfer specifications [47].
complement each other by detecting it in different situations. The • SiPTA uses the expected periodicity in events from embedded
distributed nature of Palisade allows running multiple detectors in systems to apply signal processing techniques to compare the
parallel, increasing the robustness of the system (we show such a input traces to non-anomalous data. For more information, see
situation in Section 6). Zedah et al.’s 2014 paper [48].
Below is a brief description of each of the example detectors. For • TPG trains a Task Precedence Graph based on a non-anomalous
the more complex detectors, see the relevant citations for a detailed event stream. This method exploits the periodicity of tasks ex-
explanation of the algorithm. ecuted in an embedded system. If the input event stream does
7
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Table 2
Example detectors and their detected continuous-signal anomaly symptoms.
Spectrum detect
Autoencoders
Range check
Spike detect
Clip detect
Sixnum
Name
Spike ✓ ✓ ✓ ✓ ✓
S-Wave ✓ ✓ ✓ ✓ ✓
Drifting ✓ ✓ ✓ ✓ Fig. 5. Overview of the software/hardware organization in the autonomous car.
Noise ✓ ✓ ✓
Clipping ✓ ✓ ✓
Loss ✓ ✓ ✓ ✓ ✓ ✓
Smoothing ✓ ✓ ✓ stack. The output of the autonomy stack (control commands) is sent,
Amplification ✓ ✓ ✓ ✓
for example, to actuators controlling the steering and brakes. Two
Level change ✓ ✓ ✓ ✓
Frequency change ✓ ✓
Renesas automotive computers were installed on the vehicle to run the
Echo/Reflection ✓ autonomous driving software. Each computer was equipped with two
Systems-on-Chips (SoCs) with multiple ARM CPU cores and a single
ASIL-D certified Micro-controller Unit (MCU).
Table 3
The autonomous driving software was built using Robot Operating
Example detectors and their detected event-series anomaly symptoms.
System (ROS). ROS is an open source framework for robotic application
TRE detector [50] development in C++ and Python for POSIX-based Operating Systems
nfer [44,45,47]
TPG [49]
4.7. Fault handling The autonomous vehicle in the case study reports its current gear in
messages published to the _vehicle_gear_report topic. The val-
Palisade is designed to detect anomalies in streams of data from ues reported in the data item _vehicle_gear_report_cmd_gear
remote systems, not in its own operation. However, Palisade includes reflect the requested shift position of the automatic transmission, and
some failure handling capabilities. All nodes in Palisade are monitored the values in _vehicle_gear_report_state_gear reflect the
by a system service and restarted in the presence of a failure. Detector reported shift position. Both values are encoded as integers with the
node failures are interpreted as though an anomaly has been reported following mapping: {0: No change, 1: Park, 2: Reverse, 3: Neutral, 4:
by the failed detector, including failures to respond in a configurable Drive, 5: Low}.
time window. This can lead to false-positives, but it is a simple mech- The autonomous driving software sends gear change requests over
anism to alert operators to a situation that deserves their attention. an Internet Protocol (IP) network to a separate controller that interfaces
Source node failures are interpreted as loss or period of silence anomaly with the vehicle. The controller then converts the requests into CAN
symptoms by the relevant detectors, and will be reported as anomalies. messages that the vehicle transmission understands, and also converts
Sink node failure handling varies depending on the node, but many are messages from the transmission back into IP packets sent to the driving
obvious (GUI failures) or fail-warn (alarm systems). software. Messages on the _vehicle_gear_report_cmd_gear
channel are only sent when the software requests a change. The request
5. Case study 1: Autonomous vehicle is repeated over a short interval until a message is received on the
_vehicle_gear_report_state_gear channel reporting that the
We evaluated the performance and applicability of Palisade using new gear has been reached. Conversely, the transmission regularly
the University of Waterloo’s autonomous car as a case study. The reports its current gear on the _vehicle_gear_report_state
vehicle was a 2016 Lincoln MKZ fitted with a range of sensor arrays _gear channel regardless of whether or not it has recently changed.
including LiDAR, a GPS receiver, Inertial Measurement Units (IMUs), The reports of the current gear from the transmission exhibit the
cameras, and radars. Fig. 5 shows an overview of the software and Sampled Value Anomaly Symptom of Noise. When the value of the gear
hardware organization in the vehicle. Sensors, such as LiDAR and is stable, the signal appears to be nominal. However, when the gear
cameras, produce data that is the input of the autonomy software is changing, the signal varies wildly before finally stabilizing on the
8
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
9
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Table 4
Results of comparing Palisade with Siddhi.
Round-trip latency Siddhi Palisade
Mean (lower is better) 60.6 ms 1.58 ms
Standard deviation (lower is better) 20.9 ms 0.50 ms
10
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Fig. 9. Car positioning with four inserted anomalies and the anomaly detection points
(Spike and Clipping detector).
11
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
Table 5
Results of comparing Palisade with Beep Beep 3.
Round-trip latency Beep Beep 3 Palisade
Mean (lower is better) 2.87 ms 2.85 ms
Standard deviation (lower is better) 3.2 ms 0.66 ms
12
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
(written in Python) is 144.25 lines (including comments). Autoencoder and we hope that our work motivates others to design comparable
is the longest processor node with 601, while Clip detector is the tools.
shortest with 72 lines. The Processor abstract base class has 239 lines. An important design decision in Palisade regards the copying of
The nfer detector, which uses the nfer Python API, has only 42 lines messages instead of passing message IDs. Once data arrives into a
of Python code. channel, Redis copies the messages to all nodes that subscribed to
As discussed in Section 4.4, processor nodes are independent mod- that specific channel. Another approach, found in Zero-Copy message
ules that share infrastructure from a base class. Editing a processor protocols [67] for example, would be to pass just the message ID to
node has no effect on upstream or sibling processor nodes. Only nodes all destination nodes. The ID would then be used to access a central
dependent on the output of the edited node may themselves require database to retrieve the data. When most nodes require the data,
editing. however, the ID passing approach causes a performance bottleneck
Constructing a new source node does not affect other source nodes due to access serialization at the central database (increased latency).
in the system. Only processor nodes that will be subscribing to a new We assume that a node that subscribes to a channel needs the data
source may need adjustment, and then only if the new source differs on that channel, so the message copying approach reduces latency
from those that already exist. Adding a new processor node has a lesser while not affecting the processing or memory requirements. This is a
impact than editing one, as no downstream nodes should be affected, reasonable assumption given that it only requires different types of data
typically. Instead a new processor node can be added to the system to be assigned separate channels. The ID passing approach is usually
without a single modification to any other component. used in micro-service architectures and is preferable when the target
For adding a new processor node, we consider the basic code to application needs all of the data (good for batch processing) [68].
extend the base class. A new processor node requires at least 24 lines
of code in Python. Obviously, the total number of lines depends on the 7.3. Anomaly detection
complexity of the algorithm, but the processor abstract base class makes
extending Palisade straightforward. The multiple anomalies detected by different processors can be
We compared the extensibility of Palisade with the CEP/Runtime compared against each other to verify anomalies and thereby decrease
Verification (RV) system Beep Beep 3 In Section 6.3. While it was possi- the false positive rate of anomaly detection by Palisade (this could be
ble to build processors in Beep Beep 3 that mimicked those in Palisade, done by a voter sink node, for instance). There are also cases where
they required tight coupling between components. In Beep Beep 3, some anomalies are detected by only a subset of the detectors. Palisade
constructing a new processor or sink node would require modifying the covers these cases as a variety of detectors can be integrated with low-
other nodes. Palisade’s loose coupling between components means that development effort (due to our design choices — command channel,
these similar modifications are not required to support new or modified abstract base class, and data formats).
nodes. The choice of parameters in the detector nodes plays a central
Palisade can be used in any embedded system that provides a role in the efficiency of such detectors. In our experiments, we varied
network interface. As the core Palisade functionality is built around the detector parameters until we found a configuration without false
the Redis publish–subscribe interface, any system that has a network positives (Sections 5 and 6). This was possible because we could repeat
interface can send data directly to Redis or to a server that then sends the execution of the detectors several times. When the execution cannot
to Redis. Also, RESP is simple and would be easy to port to an embed- be repeated, we suggest tuning the detector parameters using a system
ded system without Linux support. Consequently, we believe that the simulation.
integration of Palisade with any embedded system is a straightforward
task. 8. Related work
7.2. Performance evaluation This section describes existing work related to Palisade. We divide
the discussion by subject area: anomaly detection, Information Flow
Palisade is built for low latency anomaly detection and this is Processing (IFP) systems, anomaly detection with streaming frame-
evident from our comparisons with other frameworks. In the case study works, offline frameworks, and outdated or commercial frameworks.
evaluation in Section 5.3, an nfer Palisade processor detected anoma- Few existing works combine the central features of Palisade: online,
lies over 35 times faster than a comparable detector using the CEP distributed anomaly detection for both time series and event streams.
system Siddhi. In the case study evaluation in Section 6.3, two Palisade Our work is motivated by the lack of options in this niche area.
processors detected anomalies with similar latency to a comparable
detector using Beep Beep 3, which required tight coupling between 8.1. Anomaly detection
components and a using a complicated API for performing simple data
comparisons. Anomaly detection, sometimes called outlier detection, attempts to
We looked for other appropriate frameworks to compare against find unexpected or non-conforming patterns in data [5–10]. Anomalies
Palisade detectors such as the Autoencoder processor, but we discov- are distinct from noise, in that noise is not of interest and hinders
ered that no such framework exists. It does not make sense to compare analysis. The output of an anomaly detector may be either a score or
Palisade’s performance against a framework which does not support a label, but the purpose is always to provide a verdict on whether an
many of the same core features or which is unusable in the same anomaly was detected at a given time.
environments. Frameworks such as Extendible and Generic Anomaly Anomaly detection has appeared in statistics literature for many
Detection System (EGADS) [59] and Datastream.io [60] only support decades [69,70], but more recently it has found application and been
CSV input for offline detection, while Palisade operates online. Other studied in other fields. In healthcare, anomaly detection is used to look
frameworks like Esper [61] and TeSSLa [62] support online stream for cardiac irregularities that might indicate heart failure or patterns
processing, but lack support for distributing processors over a network. of disease outbreak [71,72]. In computer network security, anomaly
Detectors like Hogzilla [63], StreamMill [64], and NiagaraCQ [65] are detection is widely used in intrusion detection systems to look for
abandoned projects that cannot be installed. Others, like Thirdeye [66], suspicious activity [43,73,74]. Banks, insurance companies, and ad-
can only be run in a cloud environment, making them ill-suited for vertising firms, among others, employ anomaly detection to search for
latency comparisons. The lack of online, streaming, distributed, locally instances of fraud [75–77]. Heavy industry and safety-critical systems
runnable anomaly detection frameworks shows the need for Palisade, operators like airlines use anomaly detection for equipment damage
13
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
14
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
15
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
[20] Y. Mo, B. Sinopoli, Secure control against replay attacks, in: 2009 47th Annual [51] S.K.M.D.G.G.W.K.N.B.S. Fischmeister, Palisade: A Framework for Anomaly De-
Allerton Conference on Communication, Control, and Computing, Allerton, 2009, tection in Embedded Systems Dataset, IEEE Dataport, 2020, http://dx.doi.org/
pp. 911–918. 10.21227/44z5-9k90.
[21] A. Bolshev, J. Larsen, M. Krotofil, R. Wightman, A rising tide: Design exploits in [52] S. Suhothayan, K. Gajasinghe, I. Loku Narangoda, S. Chaturanga, S. Perera,
industrial control systems, in: 10th USENIX Workshop on Offensive Technologies, V. Nanayakkara, Siddhi: A second look at complex event processing archi-
WOOT 16, USENIX Association, Austin, TX, 2016, pp. 1–11. tectures, in: Proceedings of the 2011 ACM Workshop on Gateway Computing
[22] C. Moreno, S. Fischmeister, M.A. Hasan, Non-intrusive program tracing and Environments, in: GCE ’11, ACM, New York, NY, USA, 2011, pp. 43–50, http:
debugging of deployed embedded systems through side-channel analysis, in: Proc. //dx.doi.org/10.1145/2110486.2110493.
of the 14th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and [53] Siddhi, Website, 2019, https://siddhi.io/, (Accessed: Jun 2020).
Tools for Embedded Systems, LCTES, ACM, New York, USA, 2013, pp. 77–88. [54] D. Shin, A Platform for Generating Anomalous Traces Under Cooperative Driving
[23] C. Moreno, S. Kauffman, S. Fischmeister, Efficient program tracing and moni- Scenarios, UWSpace, University of Waterloo, 2018, URL http://hdl.handle.net/
toring through power consumption – with a little help from the compiler, in: 10012/13534.
Design, Automation & Test in Europe Conference & Exhibition, in: DATE ’16, [55] M. Roudjane, D. Rebaïne, R. Khoury, S. Hallé, Real-time data mining for
IEEE, 2016, pp. 1556–1561, http://dx.doi.org/10.3850/9783981537079_0829. event streams, in: 2018 IEEE 22nd International Enterprise Distributed Object
[24] S. Kauffman, C. Moreno, S. Fischmeister, Static transformation of power Computing Conference, EDOC, 2018, pp. 123–134, http://dx.doi.org/10.1109/
consumption for software attestation, in: 22nd International Conference on EDOC.2018.00025.
Embedded and Real-Time Computing Systems and Applications, in: RTCSA ’16, [56] S. Hallé, Event Stream Processing with BeepBeep 3: Log Crunching and Analysis
IEEE, 2016, pp. 188–194, http://dx.doi.org/10.1109/RTCSA.2016.45. Made Easy, Presses de l’Université du Québec, 2018.
[25] T.T.Y. Lin, D.P. Siewiorek, Error log analysis: statistical modeling and heuristic
[57] R. Kazman, L. Bass, M. Webb, G. Abowd, SAAM: A method for analyzing the
trend analysis, IEEE Trans. Reliab. 39 (4) (1990) 419–432.
properties of software architectures, in: Proceedings of the 16th International
[26] S.M. Bellovin, Packets found on an internet, SIGCOMM Comput. Commun. Rev.
Conference on Software Engineering, IEEE Computer Society Press, 1994, pp.
23 (3) (1993) 26–31.
81–90.
[27] A. Haque, A. DeLucia, E. Baseman, Markov Chain modeling for anomaly
[58] R. Kazman, M. Klein, P. Clements, ATAM: Method For Architecture Evaluation,
detection in high performance computing system logs, in: Proceedings of the
Tech. Rep., Carnegie-Mellon Univ Pittsburgh PA Software Engineering Inst, 2000.
Fourth International Workshop on HPC User Support Tools, in: HUST’17, ACM,
[59] Y. EGADS, Website, 2019, https://github.com/yahoo/egads, (Accessed: Sep
New York, USA, 2017, pp. 3:1–3:8.
2019).
[28] H. Stark, Image Recovery: Theory and Application, Elsevier, 1987.
[29] F. Marvasti, M. Analoui, M. Gamshadzahi, Recovery of signals from nonuniform [60] Datastream.io, Website, 2019, https://blog.ment.at/datastream-io-open-source-
samples using iterative methods, IEEE Trans. Signal Process. 39 (4) (1991) anomaly-detection-64db282735e0, (Accessed: Sep 2019).
872–878. [61] E. Tech, Website, 2019, http://www.espertech.com/, (Accessed: Jun 2020).
[30] K. Sauer, J. Allebach, Iterative reconstruction of bandlimited images from [62] L. Convent, S. Hungerecker, M. Leucker, T. Scheffel, M. Schmitz, D. Thoma,
nonuniformly spaced samples, IEEE Trans. Circuits Syst. 34 (12) (1987) TeSSLa: Temporal stream-based specification language, in: T. Massoni, M.R.
1497–1506. Mousavi (Eds.), Formal Methods: Foundations and Applications, Springer Inter-
[31] H. Feichtinger, Discretization of convolutions and the generalized sampling prin- national Publishing, Cham, 2018, pp. 144–162, http://dx.doi.org/10.1007/978-
ciple, in: Progress in Approximation Theory, Academic Press, Boston, Toronto, 3-030-03044-5_10.
1991, pp. 333–345. [63] H. ids Data, Website, 2020, http://ids-hogzilla.org/, (Accessed: Jun 2020).
[32] K. Wang, S.J. Stolfo, Anomalous payload-based network intrusion detection, in: E. [64] H. Thakkar, B. Mozafari, C. Zaniolo, Designing an inductive data stream
Jonsson, A. Valdes, M. Almgren (Eds.), Recent Advances in Intrusion Detection, management system: the stream mill experience, in: Proceedings of the 2nd
Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp. 203–222. International Workshop on Scalable Stream Processing System, ACM, 2008, pp.
[33] J. Rumbaugh, I. Jacobson, G. Booch, Unified Modeling Language Reference 79–88.
Manual, second ed., Pearson Higher Education, 2004. [65] J. Chen, D.J. DeWitt, F. Tian, Y. Wang, NiagaraCQ: A scalable continuous
[34] M. Dunne, G. Gracioli, S. Fischmeister, A comparison of data streaming frame- query system for internet databases, in: Proceedings of the 2000 ACM SIGMOD
works for anomaly detection in embedded systems, in: Proceedings of the International Conference on Management of Data, in: SIGMOD ’00, Association
1st International Workshop on Security and Privacy for the Internet-of-Things, for Computing Machinery, New York, NY, USA, 2000, pp. 379–390, http://dx.
IoTSec, Orlando, FL, USA, 2018, pp. 30–33. doi.org/10.1145/342009.335432.
[35] M.Q. Rabbit, Website, 2018, http://www.rabbitmq.com/, (Accessed: Jan 2018). [66] T. Data, Website, 2019, https://thirdeyedata.io/, (Accessed: Sep 2019).
[36] A. Kafka, Website, 2018, https://kafka.apache.org/, (Accessed: Jan 2018). [67] B.P. Swenson, G.F. Riley, A new approach to zero-copy message passing with re-
[37] NATS, Website, 2018, https://nats.io/, (Accessed: Jan 2018). versible memory allocation in multi-core architectures, in: 2012 ACM/IEEE/SCS
[38] ECMA, The JSON Data Interchange Syntax 2nd Edition, ECMA, (404) European 26th Workshop on Principles of Advanced and Distributed Simulation, 2012, pp.
Computer Manufacturers Association, Geneva, Switzerland, 2017. 44–52.
[39] Json.org, 2018, http://json.org/, (Accessed: 25 May 2018). [68] N. Dragoni, S. Giallorenzo, A.L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, L.
[40] G.P.B. Documentation, Website, 2020, https://developers.google.com/protocol- Safina, Microservices: yesterday, today, and tomorrow, in: Present and Ulterior
buffers/docs/overview#extensions, (Accessed: Jun 2020). Software Engineering, Springer, 2017, pp. 195–216.
[41] M. Kramer, Nonlinear principal component analysis using autoassociative neural
[69] F. Edgeworth, Xli. on discordant observations, London Edinb. Dublin Phil. Mag.
networks, Am. Inst. Chem. Eng. 37 (2) (1991) 233–243.
J. Sci. 23 (143) (1887) 364–375.
[42] J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series,
[70] J. Pickands, Statistical inference using extreme order statistics, Ann. Statist. 3
with implications for streaming algorithms, in: Proceedings of the 8th ACM
(1) (1975) 119–131, URL http://www.jstor.org/stable/2958083.
SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery,
[71] A.L. Goldberger, L.A. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark,
in: DMKD ’03, ACM, New York, NY, USA, 2003, pp. 2–11.
J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit,
[43] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intrusions using system calls:
and PhysioNet: components of a new research resource for complex physiologic
alternative data models, in: Proceedings of the 1999 IEEE Symposium on Security
signals, Circulation 101 (23) (2000) e215–e220.
and Privacy, 1999, pp. 133–145.
[72] W.-K. Wong, A.W. Moore, G.F. Cooper, M.M. Wagner, Bayesian network anomaly
[44] S. Kauffman, K. Havelund, R. Joshi, Nfer–A notation and system for inferring
pattern detection for disease outbreaks, in: Proceedings of the 20th International
event stream abstractions, in: International Conference on Runtime Verification,
Conference on Machine Learning, ICML-03, 2003, pp. 808–815.
Springer, 2016, pp. 235–250.
[45] S. Kauffman, K. Havelund, R. Joshi, S. Fischmeister, Inferring event stream [73] D. Snyder, On-Line Intrusion Detection Using Sequences of System Calls, Florida
abstractions, Form. Methods Syst. Des. 53 (1) (2018) 54–82. State University, 2001.
[46] J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 [74] B. Agarwal, N. Mittal, Hybrid approach for detection of anomaly network
(11) (1983) 832–843. traffic using data mining techniques, Proc. Technol. 6 (2012) 996–1003,
[47] S. Kauffman, S. Fischmeister, Mining temporal intervals from real-time system http://dx.doi.org/10.1016/j.protcy.2012.10.121, 2nd International Conference
traces, in: Proceedings of the 6th International Workshop on Software Mining, on Communication, Computing & Security [ICCCS-2012].
SoftwareMining, Champaign, USA, 2017, pp. 1–8. [75] T. Fawcett, F. Provost, Activity monitoring: Noticing interesting changes in
[48] M.M.Z. Zadeh, M. Salem, N. Kumar, G. Cutulenco, S. Fischmeister, SiPTA: Signal behavior, in: Proceedings of the Fifth ACM SIGKDD International Conference
processing for trace-based anomaly detection, in: Proc. of the International on Knowledge Discovery and Data Mining, in: KDD ’99, ACM, New York, NY,
Conference on Embedded Software, EMSOFT, New Dehli, India, 2014, pp. 1–6. USA, 1999, pp. 53–62.
[49] O. Iegorov, S. Fischmeister, Mining task precedence graphs from real-time [76] S. Donoho, Early detection of insider trading in option markets, in: Proceedings
embedded system traces, in: 2018 IEEE Real-Time and Embedded Technology of the Tenth ACM SIGKDD International Conference on Knowledge Discovery
and Applications Symposium, RTAS, 2018, pp. 251–260. and Data Mining, in: KDD ’04, ACM, New York, NY, USA, 2004, pp. 420–429.
[50] A. Narayan, G. Cutulenco, Y. Joshi, S. Fischmeister, Mining timed regular [77] Ghosh, Reilly, Credit card fraud detection with a neural-network, in: 1994
specifications from system traces, ACM Trans. Embed. Comput. Syst. 17 (2) Proceedings of the Twenty-Seventh Hawaii International Conference on System
(2018) 46:1–46:21. Sciences, Vol. 3, 1994, pp. 621–630.
16
S. Kauffman et al. Journal of Systems Architecture 113 (2021) 101876
[78] E. Keogh, J. Lin, S.-H. Lee, H.V. Herle, Finding the most unusual time series Murray Dunne is a Ph.D. student in the Real-time Em-
subsequence: algorithms and applications, Knowl. Inf. Syst. 11 (1) (2007) 1–27. bedded System Lab (RESL) at the University of Waterloo
[79] S. Basu, M. Meckesheimer, Automatic outlier detection for time series: an where he previously earned a Master of Mathematics. He
application to sensor data, Knowl. Inf. Syst. 11 (2) (2007) 137–154. received his BSc. in Computer Science from the University
[80] Y. Zhou, H. Zou, R. Arghandeh, W. Gu, C.J. Spanos, Non-parametric outliers of Victoria. His research interests include cyber security,
detection in multiple time series a case study: Power grid data analysis, in: machine learning, embedded, and distributed systems.
Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 4605–4612.
[81] T. Pevný, Loda: Lightweight on-line detector of anomalies, Mach. Learn. 102 (2)
(2016) 275–304.
[82] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms,
Wiley-Interscience, New York, NY, USA, 2004.
[83] M. Weber, G. Wolf, E. Sax, B. Zimmer, Online detection of anomalies in vehicle
signals using replicator neural networks, in: 6th Embedded Security in Cars USA,
Escar USA, 2018, pp. 1–14. Giovani Gracioli received his Ph.D. in Automation and
[84] G. Cugola, A. Margara, Processing flows of information: From data stream to Systems Engineering from the Federal University of Santa
complex event processing, ACM Comput. Surv. 44 (3) (2012) 15. Catarina (UFSC), Brazil in 2014. He currently is assis-
[85] C. Cranor, T. Johnson, O. Spataschek, V. Shkapenyuk, Gigascope: a stream tant professor at UFSC and his research interests include
database for network applications, in: Proceedings of the 2003 ACM SIGMOD real-time, embedded, and operating systems.
International Conference on Management of Data, ACM, 2003, pp. 647–651.
[86] Triceps: An innovative CEP, 2017, http://triceps.sourceforge.net/, (Accessed: 18
October 2017).
[87] M.A. Lopez, A.G.P. Lobato, O.C.M.B. Duarte, A performance comparison of
open-source stream processing platforms, in: 2016 IEEE GLOBECOM, 2016, pp.
1–6.
[88] M. Solaimani, M. Iftekhar, L. Khan, B. Thuraisingham, J. Ingram, S.E. Seker,
Online anomaly detection for multi-source VMware using a distributed streaming
framework, Softw. Pract. Exper. 46 (11) (2016) 1479–1497. Waleed Khan is a Ph.D. student in the Real-time Embedded
[89] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, D. Gunopulos, System Lab (RESL) at the University of Waterloo. Prior to
Online outlier detection in sensor data using non-parametric models, in: Proc. becoming a Ph.D. student, he also received his Bachelor’s
of the 32Nd VLDB, 2006, pp. 187–198. degree in Mechatronics Engineering and his Master’s degree
[90] Y. Du, J. Liu, F. Liu, L. Chen, A real-time anomalies detection system based on in Science at the University of Waterloo. His research
streaming technology, in: 2014 Sixth IHMSC, Vol. 2, 2014, pp. 275–279. interests include embedded systems, cyber security and IoT.
[91] W. Shi, Y. Zhu, T. Huang, G. Sheng, Y. Lian, G. Wang, Y. Chen, An integrated
data preprocessing framework based on Apache Spark for fault diagnosis of
power grid equipment, J. Signal Process. Syst. 86 (2) (2017) 221–236.
[92] F. Song, Y. Diao, J. Read, A. Stiegler, A. Bifet, EXAD: A system for explainable
anomaly detection on big data traces, in: 2018 IEEE International Conference on
Data Mining Workshops, ICDMW, 2018, pp. 1435–1440.
[93] S. Hallé, When RV meets CEP, in: International Conference on Runtime
Verification, Springer, 2016, pp. 68–91. Nirmal Benann graduated with a Master of Science degree
[94] G. Frankowski, M. Jerzak, M. Miłostan, T. Nowak, M. Pawłowski, Application from the Real-time Embedded System Lab (RESL) at the Uni-
of the complex event processing system for anomaly detection and network versity of Waterloo. Prior to becoming a Master’s student,
monitoring, Comput. Sci. 16 (4) (2015) 351–371, http://dx.doi.org/10.7494/csci. he received his Bachelor’s degree in Computer Engineering
2015.16.4.351. at the University of Waterloo. His research interests include
[95] H. Zhang, Y. Diao, N. Immerman, Recognizing patterns in streams with imprecise embedded systems, parallel and distributed computing.
timestamps, Proc. VLDB Endow. 3 (1–2) (2010) 244–255, http://dx.doi.org/10.
14778/1920841.1920875.
[96] A. Demers, J. Gehrke, M. Hong, M. Riedewald, W. White, Towards expressive
publish/subscribe systems, in: Y. Ioannidis, M.H. Scholl, J.W. Schmidt, F.
Matthes, M. Hatzopoulos, K. Boehm, A. Kemper, T. Grust, C. Boehm (Eds.),
Advances in Database Technology - EDBT 2006, Springer Berlin Heidelberg,
Berlin, Heidelberg, 2006, pp. 627–644, http://dx.doi.org/10.1007/11687238_38.
[97] GEM, Website, 2020, https://www.softwareag.com/gem/default.html, (Accessed: Sebastian Fischmeister received his Ph.D. degree in com-
Jun 2020). puter science from the University of Salzburg, Salzburg,
Austria, in 2002. He is an Associate Professor with the De-
Sean Kauffman is a Ph.D. candidate in the Real-time partment of Electrical and Computer Engineering, University
Embedded System Lab (RESL) at the University of Waterloo of Waterloo, Waterloo, ON, Canada. His primary application
where he previously earned a Master of Mathematics. He domains include automotive systems, aerospace, and medi-
worked as a software engineer on in industry for eight cal devices. His research interests include systems research
years after receiving a Bachelor of Arts in Computer Science in safety-critical software and embedded networking of
from Goshen College. His research interests include runtime time-sensitive systems.
verification, specification mining, and functional safety.
17