Operating System Noise in The Linux Kernel
IEEE TRANSACTIONS ON COMPUTERS, VOL. 72, NO. 1, JANUARY 2023
Abstract—As modern network infrastructure moves from hardware-based to software-based using Network Function Virtualization,
a new set of requirements is raised for operating system developers. By using the real-time kernel options and advanced CPU isolation
features common to the HPC use-cases, Linux is becoming a central building block for this new architecture that aims to enable a new
set of low latency networked services. Tuning Linux for these applications is not an easy task, as it requires a deep understanding of the
Linux execution model and the mix of user-space tooling and tracing features. This paper discusses the internal aspects of Linux that
influence the Operating System Noise from a timing perspective. It also presents Linux’s osnoise tracer, an in-kernel tracer that
enables the measurement of the Operating System Noise as observed by a workload, and the tracing of the sources of the noise, in
an integrated manner, facilitating the analysis and debugging of the system. Finally, this paper presents a series of experiments
demonstrating both Linux’s ability to deliver low OS noise (in the single-digit μs order), and the ability of the proposed tool to provide
precise information about the root cause of timing-related OS noise problems.
Index Terms—Linux kernel, operating system noise, high-performance computing, soft real-time systems
are not a side effect of the operating system, it is impossible to observe the events via trace. This noise, observed by the workload but not by the trace, creates a grey area that often misleads the analysis.
3.1 Proposed Approach
In this paper, we propose an integrated tracing and synthetic workload solution that aims to combine the benefits of the workload- and tracing-based approaches while minimizing the drawbacks of each.
The steps taken for such an approach include:
• Define the composition of the OS noise on Linux from the real-time HPC point of view;
• Define the minimum set of tracing events that provides evidence of the root cause of each noise, at a limited overhead;
• Create a synthetic workload aware of tracing, enabling an unambiguous correlation between the trace and the noise;
• Make the approach production-ready, with a standard and easy-to-use interface.
3.2 OS Noise Composition for Real-Time HPC Workload
This paper adopts the following generalized definition of OS noise:
Definition 1 (Generalized (OS)-Noise). The OS noise is defined as all the time spent by a CPU executing instructions not belonging to a given application task assigned to that CPU while the task is ready to run.
The definition generalizes the usual interpretation of OS noise, which typically only includes OS-related activities and overheads, by also accounting for the time used by any interfering computational activity, not limited to the OS but also including regular user-space threads. This makes a difference when multiple user threads can run on the same CPU, as any computational activity that can interfere with the measurement thread would also interfere with any user thread running with the same scheduler and scheduler settings (e.g., priority), irrespective of whether it belongs to the OS or not. Therefore, it constitutes an actual source of noise that the fine-grained tuning of the system needs to account for.
This extended definition establishes an interesting link between the OS noise, a metric from the HPC domain, and the high-priority interference commonly considered in real-time systems theory.
This generalizes the approach beyond the HPC and NFV use cases, making it practical to profile all the sources of interference that can affect a task running with a given configuration of the scheduler: for example, a thread running at a given priority under the fixed-priority scheduler of Linux. Indeed, thanks to this generalized definition, the osnoise tracer can be used to monitor not only the noise strictly related to the operating system but all high-priority interference in a broader sense.
Generalized (OS)-Noise Under Fixed-Priority Scheduling. As a highly relevant example, we consider the case in which a designer needs to determine whether a thread of interest $\tau_i$ (for example, a constrained-deadline sporadic task [20]) will complete within its deadline under a partitioned scheduling setting, where workloads are considered for each processor separately. To this end, the classical worst-case response time equation can be leveraged:
$$R_i = e_i + \sum_{\tau_h \in hp_i} \eta_h(R_i)\, e_h, \tag{1}$$
where $e_h$ is the worst-case execution time (WCET) of $\tau_h$, $\eta_h(\Delta)$ is its arrival curve [21] bounding the maximum number of release events of $\tau_h$ in a time window³ of length $\Delta$, and the set $hp_i$ contains the higher-priority activities that can interfere with the thread $\tau_i$ under analysis.
³ e.g., if $\tau_h$ is a sporadic thread with minimum inter-arrival time $T_h$, it holds $\eta_h(\Delta) = \lceil \Delta / T_h \rceil$.
While using Equation (1) at design time is in principle possible, it is quite often hard. Indeed, in modern heterogeneous computing platforms, many design principles used to increase average-case performance (e.g., complex cache hierarchies [22], unrevealed memory controller policies [23], out-of-order execution, etc.) make it hard to obtain reliable WCET estimates for user threads. This is even harder for OS threads and interrupt service routines, for which the arrival pattern is also unknown, and therefore it is difficult to obtain an arrival curve.
Adopting such computing platforms in a small subset of strictly hard real-time systems, e.g., avionics, calls for comprehensive solutions that make it possible to know all the parameters involved in Equation (1), e.g., by leveraging static analysis tools for WCET estimation [24].
However, most real-time systems are robust enough to tolerate small uncertainties in the estimation of the parameters, and they can tolerate a small number of deadline misses [25] (e.g., in multimedia [26]).
In these cases, osnoise can be used to empirically measure the high-priority interference in Equation (1). For example, to estimate the high-priority interference faced by an NFV workload running at a given priority under SCHED_FIFO (a common use case), the system engineer can set up osnoise to run under SCHED_FIFO at the same priority, thus exposing the measurement thread to the same sources of noise.
4 RELATED WORK
The adverse effects of operating system noise on workload performance have been known for a long time [27]. One of the first works addressing the problem of detecting OS noise is due to Petrini et al. [15], who identified and eliminated some sources of noise for an HPC application running on the ASCI Q supercomputer. The study was extended in a later paper [28]. Ferreira et al. [29] provided a characterization of application sensitivity to noise by injecting interference at the OS level.
As discussed at the beginning of the paper, Linux tools for detecting OS noise are divided into two categories: workload-based and trace-based methods.
Some workload-based methods run micro-benchmarks with a known duration and measure the difference between the expected duration of the microbenchmark and
the actual time needed to process it. For this category, one relevant example is due to Sottile and Minnich [30], who proposed the Finite-Time Quantum (FTQ) benchmark. FTQ measures the number of basic operations done within a fixed time quantum. Another work, by Tsafrir et al. [31], used microbenchmarks in conjunction with a mechanism based on “smart timers” to measure OS noise.
A workload-based tool widely used by practitioners is sysjitter [32]. It measures OS noise by running a thread on each CPU and keeping track of the duration of each time interval in which the thread is not running, e.g., due to OS activities. Other similar tools are oslat [33], jHiccup [34], and MicroJitterSampler [35]. However, none of these tools is capable of pointing to the root causes of the OS noise.
This problem can be solved with other tools, which use a trace-based approach. For example, De et al. [36] presented a tool to identify the sources of OS jitter using kernel instrumentation by leveraging OProfile. The authors obtained statistics on preemptions and interruptions. Later, Morari et al. [16] proposed a different tool to measure the OS noise, using a similar (but more extensive) technique based on kernel instrumentation and building on the LTTng [37] tracer. With respect to [36], the work by Morari et al. allows for capturing additional causes of OS noise, e.g., softirqs. Nataraj et al. [38] proposed another approach to instrument the Linux kernel and measure the OS noise during application execution by using KTAU [39]. More recent work is due to Gonzalez et al. [40], who present Jitter-Trace, a tool that uses the information provided by the perf tracer (see Section 5.3). However, none of these trace-based methods accounts for how workloads perceive the noise.
Unlike previous work, the osnoise tool proposed in this paper takes the best of both workload-based and trace-based methods, pointing to the root causes of the operating system noise while also accounting for how the workload perceives the noise.
Finally, and most relevantly, osnoise is the only tool for directly monitoring the OS noise that has recently been made available in the mainline Linux kernel [41], and hence it is ready to use on billions of devices.
5 THE OSNOISE TRACER
This section presents the osnoise tracer, which leverages the rules presented in Section 2.1 to correctly profile the execution time of each task by subtracting the time required by each interfering activity from its measured runtime. The tool is not limited to a specific preemption model of Linux, and it can work with any of its preemption models, from the non-preemptive kernel to PREEMPT_RT.
Before discussing the internals, we present the tool at a high level. As mentioned, osnoise has two components: the workload and the tracing components.
5.1 The osnoise Workload Threads
The osnoise workload threads used for measurements work on a per-CPU basis. By default, osnoise creates a periodic kernel thread on each CPU. The kernel thread can be assigned to any Linux scheduler, such as SCHED_DEADLINE, SCHED_FIFO, SCHED_RR, or CFS.
Each thread runs for a predetermined amount of runtime. The primary purpose of the workload thread is to detect the time stolen from its execution, which is considered OS noise. Each osnoise thread works by reading the time in a loop. When it detects a gap between two consecutive readings higher than a given tolerance threshold, a new noise sample is collected. The time is read using the trace_local_clock() function. This architecture-specific, non-blocking function provides a lightweight, CPU-level coherent timestamp with nanosecond granularity and the same accuracy used by other ftrace tracing mechanisms.
The thread runs with preemption and IRQs enabled. This way, it can be preempted at any time by any task abstraction present in Linux.
After runtime microseconds have elapsed since the first time read of the current period, the workload reports a summary of the OS noise faced by the current activation. This summary is reported using the tracing features of Linux, as in Fig. 2.
The osnoise summary reports:
• RUNTIME IN US, i.e., the amount of time in μs in which osnoise looped reading the timestamp.
• NOISE IN US, i.e., the overall amount of noise in μs observed in the associated runtime.
• PERCENTAGE OF CPU AVAILABLE, i.e., the percentage of CPU available to the osnoise thread in the measuring period.
• MAX SINGLE NOISE IN US, i.e., the longest observed occurrence of noise in μs during the runtime.
• The interference counters: for each type of interference among the classes NMI, IRQs, softirqs, and threads,
DE OLIVEIRA ET AL.: OPERATING SYSTEM NOISE IN THE LINUX KERNEL 201
Fig. 3. Example of tracepoints: IRQ and thread context switch events read from the ftrace interface⁴.
osnoise maintains an interference counter that is increased at each entry event of an activity of that type.
It is worth noting that Fig. 2 shows a high number of hardware noise samples: this is because osnoise was running on a virtual machine, and the interference due to virtualization is detected as hardware noise.
⁴ All ftrace excerpts share the same column description, as in the header in Fig. 2.
5.2 The osnoise Parameters
The osnoise tracer has a set of parameters. These options are accessible via ftrace’s interface, and they are:
• osnoise/cpus: CPUs on which the osnoise threads will execute.
• osnoise/period_us: the period (in μs) of the osnoise threads.
• osnoise/runtime_us: how long (in μs) the osnoise threads will look for noise occurrences.
• osnoise/stop_tracing_us: stop the system tracing if a single noise occurrence longer than the configured value (in μs) happens. Writing 0 disables this option.
• osnoise/stop_tracing_total_us: stop the system tracing if the total noise is higher than the configured value (in μs). Writing 0 disables this option.
• tracing_threshold: the minimum delta between two time reads to be considered as a noise occurrence, in μs. When set to 0, the default value will be used, which is currently 5 μs.
5.3 The osnoise Tracing Features
Tracepoints are one of the key pillars of Linux kernel tracing. They are points in the kernel code where it is possible to attach a probe to run a function, and they are most commonly used to collect trace information. For example, ftrace registers callback functions to the tracepoints. These callback functions collect the data, saving it to a trace buffer. The data in the trace buffer can then be accessed by a tracing interface. Fig. 3 shows an example of tracepoint output via the ftrace interface.
The usage of tracepoints is not limited to saving data to the buffer. They have been leveraged for many other use cases, for instance, to patch the kernel at runtime or to transform network packets [42]. Tracepoints can also be used to optimize tracing itself. While saving data to the trace buffers has been optimized to the minimum overhead, it is also possible to pre-process data in the tracepoints in such a way as to minimize the amount of data written to the trace buffer. This method has shown good results, reducing the tracing overhead when the trace processing presents lower overhead than writing the trace to the buffer [42].
The osnoise tracer leverages the current tracing infrastructure in two ways. It adds probes to existing tracepoints to collect information, and it adds a new set of tracepoints with pre-processed information.
Linux already has tracepoints that intercept the entry and exit of IRQs, softirqs, and threads. osnoise attaches a probe to all entry and exit events and uses it to: 1) account for the number of times each of these classes of tasks added noise to the workload; 2) compute the value of the interference counter used by the workload to identify how many interferences occurred between two consecutive reads of the time;⁵ 3) compute the execution time of the current interfering task; 4) subtract the duration of a nested noise occurrence from the noise occurrence it preempted, by leveraging the rules discussed in Section 2.1.
⁵ The single per-CPU NMI is a particular case without tracepoints; in this case, a special function was added to collect the same information.
At the exit probe of each of these interference sources, a single tracepoint from osnoise is generated, reporting the interference-free execution time of the task, i.e., its noise as observed via trace.
In addition to the tracepoints and the summary at the end of the period, the osnoise workload emits a tracepoint whenever noise is identified. This tracepoint reports the noise observed via workload and the amount of interference that happened between the two consecutive time reads. The interference counters are fundamental to unambiguously defining the root cause of a given noise.
For example, in Fig. 4, the first four lines represent the noise as identified by the trace, while the last line is the tracepoint generated by the workload, mentioning the previous four interferences.
Both Figs. 3 and 4 were extracted from the same trace file. The difference is that the former contains the pre-existing tracepoints, while the latter includes the new tracepoints added to the kernel with osnoise. With these two examples, it is possible to notice that the information reported by the osnoise tracepoints is reduced and more intuitive.
Regarding the noise reported in Fig. 4, it is important to notice that the durations reported by irq_noise and thread_noise are free of interference. For example, local_timer:236 has a start time later than that of sleep-5843. This means that local_timer:236 preempted sleep-5843, in a case of nested noise. The local_timer:236, however,
Fig. 4. Example of tracepoints: osnoise events read from the ftrace interface with equivalent data highlighted⁴.
discounted its own duration from the duration of sleep-5843. This facilitates the debugging of the system by removing the tedious work of computing these values manually or via a script in user space. This also reduces the amount of data saved in the trace buffer, reducing resource usage and overhead.
Another important thing to notice is that the total noise observed via trace accounts for 1409532 ns,⁶ but the noise observed via workload reports 5092 ns more (1414624 ns), as illustrated in Fig. 5. There are multiple reasons behind this difference, for example: the overhead added by the tracepoints enabled in Figs. 3 and 4; the delays added by the hardware to manage context switches and the dispatch of IRQ handlers; delays caused by the loss of cache locality after an interrupt [43]; the low-level code that enables tracing in IRQ context, like making RCU aware of the current context;⁷ and the scheduler call caused by the thread noise.
This justifies the dual approach of using both the measuring thread and the tracing, which is one of the main factors distinguishing this work from prior work that used only one of them (as extensively discussed in Section 4). Indeed, the trace cannot be used as the only source of information because it cannot account for the overheads occurring outside the scope of the tracing. Similarly, the measurement thread alone cannot capture the reasons for the OS noise, and hence it does not provide essential information to understand and reduce the OS-related interference.
Hardware-Induced Noise. To identify hardware-induced noise, introduced in Section 2, Linux includes a tracer named hwlat. It works by running a workload in the kernel with preemption and IRQs disabled, avoiding all the sources of interference except hardware and NMI noise, which cannot be masked. While running a busy loop reading the time, when hwlat detects a gap between two subsequent reads, it reports a hardware-induced noise.
The resemblance between hwlat and osnoise is not a coincidence, because the latter was indeed inspired by the former tool. osnoise is also able to detect hardware noise. Because it tracks the execution of all tasks, when a noise sample is detected without a corresponding increase in any of the interference counters, it is safe to assume that a layer below the operating system generated the noise.
The rtla Interface. Since version 5.17, Linux includes the real-time Linux analysis tool, named rtla. This is a meta-tool that aims at analyzing the real-time properties of Linux by exploiting its tracing features to provide information on the causes of the measured values. rtla includes a user-space tool named rtla osnoise [44], which transforms the tracer into a benchmark-like tool. This tool works by configuring, dispatching, and collecting data from the osnoise tracer via the ftrace interface. rtla osnoise can either collect the periodic workload summary, creating a long-run summary, or create a histogram with data from the sample_threshold tracepoint.
5.5 The osnoise Internals
The osnoise tracer aims to measure possible sources of noise at the single-digit μs scale, and this represents a challenge when dealing with parallel and re-entrant code as in the Linux kernel. This section presents some of these challenges and how they have been tackled.
5.5.1 Task and Memory Model
osnoise aims to simulate applications built using the SPMD model, presented in Section 2. When dispatched, the tracer creates a per-CPU kernel thread to run the osnoise workload component. Each per-CPU thread has its affinity configured to run on the target CPU only. All threads share the same configuration, which the user can update as the workload runs. This data is only accessed outside the main workload loop, so it does not represent a problem for the measurement phase. A mutex protects the configuration. The runtime data used during the measurement phase is organized in per-CPU structures, and each thread only accesses the one related to the CPU where it is executing.
osnoise aims to simulate a user-space workload that follows the rules reported in Section 2.1. Specifically, a user-space workload on Linux can be preempted by all types of OS tasks mentioned in Section 2. While there are methods to temporarily disable thread preemption, softirqs, and interrupts, they have some undesired drawbacks. For example, disabling interrupts is a costly operation, and it should be avoided in cases where overhead needs to be minimized, like in the Linux tracing subsystem. Moreover, it is not
Fig. 7. Code excerpt of set_int_safe_time(): how osnoise deals with reentrancy problems.
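To make the sampling rule of Section 5.1 and the hardware-noise rule described above concrete, the following Python sketch post-processes a recorded measurement loop. It is an illustration under stated assumptions, not the in-kernel osnoise implementation: it assumes that each time read (in ns, as trace_local_clock() would return) was recorded together with a snapshot of a single, total interference counter, and the names NoiseSample, Summary, and summarize() are hypothetical. A gap larger than the threshold (the role played by tracing_threshold) becomes a noise sample, and a sample with no counter increment is classified as hardware noise.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class NoiseSample:
    """A gap between two consecutive time reads that exceeded the threshold."""
    start_ns: int       # timestamp of the read taken just before the gap
    duration_ns: int    # length of the gap, i.e., the noise seen by the workload
    interference: int   # interference-counter increments observed across the gap
    hw_noise: bool      # True if no interference was counted (blamed on a layer below the OS)


@dataclass
class Summary:
    """Per-period figures analogous to the osnoise summary (hypothetical names)."""
    runtime_ns: int
    noise_ns: int
    max_single_noise_ns: int
    cpu_available_pct: float
    samples: List[NoiseSample] = field(default_factory=list)


def summarize(times_ns: Sequence[int], counters: Sequence[int],
              threshold_ns: int) -> Summary:
    """Turn paired (time read, interference counter) snapshots into noise samples.

    times_ns[k] is the k-th time read of the measurement loop; counters[k] is
    the total interference counter (NMI + IRQ + softirq + thread entries)
    sampled together with that read. This pairing is an assumption of the sketch.
    """
    if len(times_ns) != len(counters) or len(times_ns) < 2:
        raise ValueError("need at least two paired time/counter snapshots")

    samples: List[NoiseSample] = []
    for k in range(1, len(times_ns)):
        gap = times_ns[k] - times_ns[k - 1]
        if gap > threshold_ns:  # gaps at or below the threshold are ignored
            intf = counters[k] - counters[k - 1]
            samples.append(NoiseSample(times_ns[k - 1], gap, intf, intf == 0))

    runtime = times_ns[-1] - times_ns[0]
    noise = sum(s.duration_ns for s in samples)
    max_single = max((s.duration_ns for s in samples), default=0)
    # Share of the runtime not stolen by noise, as a percentage.
    available = 100.0 * (runtime - noise) / runtime if runtime > 0 else 100.0
    return Summary(runtime, noise, max_single, available, samples)
```

For instance, with reads at 0, 100, 200, 10200, 10300, 15300, and 15400 ns, counter snapshots 0, 0, 0, 2, 2, 2, 2, and a 1000 ns threshold, the sketch reports two samples: a 10000 ns gap with two interferences (traceable to OS activity) and a 5000 ns gap with none (classified as hardware noise). The sketch charges the whole gap as noise and collapses the per-class counters into one total, so it can only separate "some OS activity" from "below the OS"; attributing the noise to NMIs, IRQs, softirqs, or threads still requires the tracepoints described in Section 5.3.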
deployments with dynamic and diverse workloads in the same host.
6.2 OS Noise Occurrence Analysis
A six-hour experiment was conducted for all tune/FIFO priority cases, collecting a histogram of each detected noise occurrence. This experiment is important for the NFV use case because a single long noise occurrence might cause the overflow of queues in network packet processing. The results are presented in Fig. 9.
With this experiment, it is possible to see the main problem of using the system As-is in Fig. 9a. The osnoise workload detected 230 out-of-scale noise samples, with the maximum value as long as 13045 μs. Fig. 9b also shows that using FIFO:1 in the system As-is represents an easy-to-use option to reduce the maximum single noise occurrence value. This is because the workload causes starvation of non-real-time threads, so these threads are migrated to the CPUs with time available for them to run.
As-is using FIFO:1, however, has two major drawbacks when compared against the Tuned options with or without FIFO:1 in Figs. 9c and 9d. The first is the high count of noise occurrences. The Tuned experiment includes the nohz_full option, which reduces the occurrence of the scheduler tick, reducing the execution of the ksoftirqd kernel thread that checks for expired timers, and of the activities that follow. The second is the tail latency, which is lower in the Tuned cases. This difference is explored in Section 6.3.
The results with the system Tuned in Figs. 9c and 9d show that the tuning dramatically changes the number and duration of noise occurrences when compared with the system As-is. Figs. 9e and 9f have been added to better visualize the Tuned cases.
Fig. 8. Average percentage of OS noise observed by the workload in different scenarios. Error bars represent the range between the minimum and maximum percentage.
The Tuned kernel was able to deliver consistent results, and the Tuned kernel using FIFO:1 was able to keep the maximum single noise occurrence below 5 μs. That is because background OS activities that run as threads are deferred by the real-time scheduler without creating a fault in the system. This is the case, for example, of jobs dispatched on all CPUs via kworker threads, which execute deferrable work [45]. However, these threads still need the opportunity to run to avoid major problems. Thus, a wise choice for the development of high-performance applications with low latency requirements is to be aware of this property and yield some CPU time when it would not cause performance issues (for instance, when network buffers are empty), even for a small amount of time, like 100 ms every 10 seconds.
These experiments also serve to show the low impact that the osnoise internals impose on the evaluation, allowing the user to receive information at the μs granularity used by practitioners in other tools like cyclictest.
It is important to highlight that the results presented in this section are only valid for this specific scenario. Different hardware, CPU count, auxiliary operating system services, and conditions will likely provide different results. Hence the importance of such a tool, which provides an integrated OS noise benchmark and guidance for fine-tuning the system.
6.3 Using osnoise to Trace Sources of Latency
The experiment of the system As-is with FIFO:1 presented an interesting result with regard to the tail latency, with only a few samples exceeding the 30 μs mark. To understand the reasons behind these cases that go over 30 μs, the osnoise tracer was set to trace the osnoise events, stopping the tracer when a noise occurrence over 30 μs was detected. The trace with only the osnoise events is shown in Fig. 10. It shows an interrupt noise caused by interrupt 62, which is responsible for the eno1 ethernet driver. Right after it, ksoftirqd is scheduled, causing a long-duration noise occurrence. The ksoftirqd thread is responsible for running softirq jobs in the PREEMPT_RT kernel (recall from Section 2.1 that the softirq context does not exist under PREEMPT_RT, and softirq jobs run in thread context).
Following the evidence that the problem is caused by softirqs, only the events that provide information about softirqs were enabled, as selecting a small number of tracing events helps avoid perturbing the timing behavior of the system due to overhead. The trace again reported a similar output, shown in Fig. 11. This trace shows that the network receive (NET_RX) softirq was the reason for
Fig. 9. osnoise noise occurrence per-CPU histogram under different system setups, mixing CPU isolation tuning and real-time priority for the workload (shorter noise occurrences and lower occurrence counts are better).
Fig. 11. osnoise tracer finding the source of latencies, augmented with other events⁴.
the ksoftirqd activation. The NET_RX softirq is activated by the network driver that is running on the same CPU. This is a side effect of the eno1 ethernet driver causing the interrupt. Following this evidence, IRQ 62 was configured to fire on CPUs [0:1]. With this configuration applied, the experiment in Fig. 9b was re-executed for six hours; the results are shown in Fig. 12. This configuration alone was responsible for reducing the tail latency to
REFERENCES
[1] L. Tung, “SpaceX: We’ve launched 32,000 linux computers
into space for starlink internet,” Jun. 2011. [Online]. Available:
https://www.zdnet.com/article/spacex-weve-launched-32000-
linux-computers-into-space-for-starlink-internet/
[2] Network Functions Virtualisation – Introductory White Paper – An
Introduction, Benefits, Enablers, Challenges & Call for Action,”
SDN and OpenFlow World Congress. Oct. 2012. [Online]. Avail-
able: https://portal.etsi.org/NFV/NFV_White_Paper.pdf
[3] D. Kreutz, F. M. V. Ramos, P. E. Verıssimo, C. E. Rothenberg, S. Azo-
dolmolky, and S. Uhlig, “Software-defined networking: A. compre-
hensive survey,” Proc. IEEE, vol. 103, no. 1, pp. 14–76, Jan. 2015.
[4] Z. Li, L. Ruan, C. Yao, Y. Dong, and N. Cui, “A best practice of 5G
layer 2 network on Intel architecture,” in Proc. IEEE Globecom
Fig. 12. As-is using FIFO:1 after moving the network IRQ as suggest Workshops, 2019, pp. 1–5.
from trace in Fig. 11. [5] T. Cucinotta, L. Abeni, M. Marinoni, R. Mancini, and C. Vitucci,
“Strong temporal isolation among containers in OpenStack for
figures like the system Tuned, with the debug facilitated by NFV services,” IEEE Trans. Cloud Comput., to be published,
doi: 10.1109/TCC.2021.3116183.
osnoise. [6] GSM Association, “Cloud infrastructure reference model version
1.0,” [Online]. Available: https://www.gsma.com/newsroom/
7 CONCLUSION AND FUTURE WORK wp-content/uploads//NG.126-v1.0-2.pdf
[7] N. Bhushan et al., “Industry perspective,” IEEE Wireless Commun.,
Network function virtualization and modern low latency vol. 24, no. 5, pp. 6–8, Oct. 2017.
communications are creating the need for Linux systems [8] L. Mandyam and S. Hoenisch, “RAN workload performance is
equivalent on bare metal and vSphere,” [Online]. Available:
with low latency for both scheduling latency and OS noise. https://blogs.vmware.com/telco/ran-workload-performance-
These real-time HPC workloads require noise to be in the tests-on-vmware-vsphere/
order of a few tens of microseconds. [9] Intel, “FlexRAN,” [Online]. Available: https://github.com/intel/
FlexRAN
However, debugging these cases is not an easy task. [10] Red Hat, “What is NFV?” [Online]. Available: https://www.
Workload-based tools are precise for measurements but do redhat.com/en/topics/virtualization/what-is-nfv
not point to a root cause. Trace-based measurements pro- [11] P. E. McKenney, J. Fernandes, S. Boyd-Wickizer, and J. Walpole,
vide information about the cause but without an accurate “RCU usage in the linux kernel: Eighteen years later,” SIGOPS
Oper. Syst. Rev., vol. 54, no. 1, pp. 47–63, Aug. 2020. [Online].
picture of the actual noise observed by the thread.
Practitioners use both methods together, but this requires advanced knowledge of the tracing features, and it can often mislead the investigation because the trace is not synchronized with the workload or adds too much overhead.
The osnoise tool combines tracing and the workload, providing precise information at low overhead: it processes and exports only the information necessary to point to the root causes of the latency, serving as a good starting point for the investigation.
The experimental results show that the tool can serve both as a tracer and as a benchmark tool, with data collection facilitated by the rtla osnoise interface. The experiments show that Linux can deliver extremely low OS noise, with maximum sample noise below 5 μs. More importantly, the tool is able to follow the kernel, delivering results at the desired scale.
Both the osnoise tool and the rtla osnoise interface are integral parts of the Linux kernel, and are thus accessible to the entire Linux user base.
Because the osnoise tracer uses the most basic building blocks of the Linux tracing subsystem, it can be combined with many other existing tracing tools, such as the performance counters provided via the perf tool, or used with the graphical interfaces provided by LTTng and KernelShark. This opens an endless set of possibilities for future work, for example, extending the osnoise measurements to include data from the memory/cache, workload-dependent methods, other clock sources, and energy-aware methods. Extending the analysis with a more formal approach is another possibility, as is conducting experimental evaluations based on other real-time schedulers of Linux, e.g., SCHED_DEADLINE.
Available: https://doi.org/10.1145/3421473.3421481
[12] D. B. de Oliveira and R. S. de Oliveira, “Timing analysis of the PREEMPT_RT linux kernel,” Softw., Pract. Exper., vol. 46, no. 6, pp. 789–819, 2016.
[13] D. B. de Oliveira, D. Casini, R. S. de Oliveira, and T. Cucinotta, “Demystifying the real-time linux scheduling latency,” in Proc. 32nd Euromicro Conf. Real-Time Syst., 2020, pp. 9:1–9:23.
[14] F. Cerqueira and B. Brandenburg, “A comparison of scheduling latency in linux, PREEMPT-RT, and LITMUS RT,” in Proc. 9th Annu. Workshop Operating Syst. Platforms Embedded Real-Time Appl., 2013, pp. 19–29.
[15] F. Petrini, D. Kerbyson, and S. Pakin, “The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q,” in Proc. ACM/IEEE Conf. Supercomputing, 2003, pp. 55–55.
[16] A. Morari, R. Gioiosa, R. W. Wisniewski, F. J. Cazorla, and M. Valero, “A quantitative analysis of OS noise,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2011, pp. 852–863.
[17] D. B. de Oliveira, R. S. de Oliveira, and T. Cucinotta, “A thread synchronization model for the PREEMPT_RT linux kernel,” J. Syst. Archit., vol. 107, 2020, Art. no. 101729.
[18] J. Lelli, C. Scordino, L. Abeni, and D. Faggioli, “Deadline scheduling in the linux kernel,” Softw., Pract. Exper., vol. 46, no. 6, pp. 821–839, 2016.
[19] A. C. Dusseau, R. H. Arpaci, and D. E. Culler, “Effective distributed scheduling of parallel workloads,” in Proc. ACM SIGMETRICS Int. Conf. Meas. Model. Comput. Syst., 1996, pp. 25–36.
[20] J. Lehoczky, L. Sha, and Y. Ding, “The rate monotonic scheduling algorithm: Exact characterization and average case behavior,” in Proc. Real-Time Syst. Symp., 1989, pp. 166–171.
[21] R. Henia, A. Hamann, M. Jersak, R. Racu, K. Richter, and R. Ernst, “System level performance analysis - the SymTA/S approach,” in IEEE Proc. - Comput. Digit. Techn., 2005.
[22] G. Gracioli, A. Alhammad, R. Mancuso, A. A. Fröhlich, and R. Pellizzoni, “A survey on cache management mechanisms for real-time embedded systems,” ACM Comput. Surv., vol. 48, no. 2, Nov. 2015, Art. no. 32.
[23] D. Casini, A. Biondi, G. Nelissen, and G. Buttazzo, “A holistic memory contention analysis for parallel real-time tasks under partitioned scheduling,” in Proc. IEEE Real-Time Embedded Technol. Appl. Symp., 2020, pp. 239–252.
[24] D. Hardy, B. Rouxel, and I. Puaut, “The heptane static worst-case execution time estimation tool,” in Proc. 17th Int. Workshop Worst-Case Execution Time Anal., 2017, pp. 8:1–8:12.
[25] B. Brandenburg, “The case for an opinionated, theory-oriented real-time operating system,” in Proc. 1st Int. Workshop Next-Gener. Oper. Syst. Cyber-Phys. Syst., 2019.
[26] L. Abeni and G. Buttazzo, “Adaptive bandwidth reservation for multimedia computing,” in Proc. 6th Int. Conf. Real-Time Comput. Syst. Appl., 1999, pp. 70–77.
[27] R. Mraz, “Reducing the variance of point to point transfers in the IBM 9076 parallel computer,” in Proc. Conf. Supercomputing, 1994, pp. 620–629.
[28] R. Gioiosa, F. Petrini, K. Davis, and F. Lebaillif-Delamare, “Analysis of system overhead on parallel computers,” in Proc. 4th IEEE Int. Symp. Signal Process. Inf. Technol., 2004, pp. 387–390.
[29] K. B. Ferreira, P. Bridges, and R. Brightwell, “Characterizing application sensitivity to OS interference using kernel-level noise injection,” in Proc. ACM/IEEE Conf. Supercomputing, 2008, pp. 1–12.
[30] M. Sottile and R. Minnich, “Analysis of microbenchmarks for performance tuning of clusters,” in Proc. IEEE Int. Conf. Cluster Comput., 2004, pp. 371–377.
[31] D. Tsafrir, Y. Etsion, D. G. Feitelson, and S. Kirkpatrick, “System noise, OS clock ticks, and fine-grained parallel applications,” in Proc. 19th Annu. Int. Conf. Supercomputing, 2005, pp. 303–312. [Online]. Available: https://doi.org/10.1145/1088149.1088190
[32] D. Riddoch, “sysjitter v1.4,” [Online]. Available: https://github.com/alexeiz/sysjitter
[33] RT-Tests. [Online]. Available: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
[34] G. Tene, “jHiccup,” [Online]. Available: http://www.azulsystems.com/jHiccup
[35] P. Lawrey, “MicroJitterSampler,” [Online]. Available: http://blog.vanillajava.blog/2013/07/micro-jitter-busy-waiting-and-binding.html
[36] P. De, R. Kothari, and V. Mann, “Identifying sources of operating system jitter through fine-grained kernel instrumentation,” in Proc. IEEE Int. Conf. Cluster Comput., 2007, pp. 331–340.
[37] The LTTng Project, “LTTng,” [Online]. Available: https://lttng.org/
[38] A. Nataraj, A. Morris, A. D. Malony, M. Sottile, and P. Beckman, “The ghost in the machine: Observing the effects of kernel operation on parallel application performance,” in Proc. ACM/IEEE Conf. Supercomputing, 2007, pp. 1–12.
[39] A. Nataraj, A. D. Malony, S. Shende, and A. Morris, “Kernel-level measurement for integrated parallel performance views: The KTAU project,” in Proc. IEEE Int. Conf. Cluster Comput., 2006, pp. 1–12.
[40] N. M. Gonzalez, A. Morari, and F. Checconi, “Jitter-trace: A low-overhead OS noise tracing tool based on linux perf,” in Proc. 7th Int. Workshop Runtime Oper. Syst. Supercomputers, 2017, Art. no. 2.
[41] D. B. de Oliveira, “Osnoise tracer,” 2021. [Online]. Available: https://www.kernel.org/doc/html/latest/trace/osnoise-tracer.html
[42] D. B. de Oliveira, T. Cucinotta, and R. S. de Oliveira, “Efficient formal verification for the linux kernel,” in Proc. Int. Conf. Softw. Eng. Formal Methods, 2019, pp. 315–332.
[43] L. Soares and M. Stumm, “FlexSC: Flexible system call scheduling with exception-less system calls,” in Proc. 9th USENIX Conf. Oper. Syst. Des. Implementation, 2010, pp. 33–46.
[44] D. B. de Oliveira, “rtla-osnoise: Measure the operating system noise,” 2022. [Online]. Available: https://www.kernel.org/doc/html/latest/tools/rtla/rtla-osnoise.html
[45] J. Corbet, A. Rubini, and G. Kroah-Hartman, Linux Device Drivers, 3rd ed. Sebastopol, CA, USA: O’Reilly Media, 2005.
Daniel Bristot de Oliveira received the joint PhD degree in automation engineering from UFSC (BR) and in embedded real-time systems from Scuola Superiore Sant’Anna (IT). He is currently a senior principal software engineer with Red Hat, working on developing the real-time features of the Linux kernel. He helps maintain real-time related tracers and tooling for the Linux kernel and SCHED_DEADLINE. He is an affiliate researcher with the Retis Lab, and researches real-time systems and formal methods. He is an active member of the real-time academic community, participating in the technical program committees of academic conferences such as RTSS, RTAS, and ECRTS.
Daniel Casini (Member, IEEE) received the graduate (cum laude) degree in embedded computing systems engineering, a master’s degree jointly offered by the Scuola Superiore Sant’Anna of Pisa and the University of Pisa, and the PhD degree in computer engineering from the Scuola Superiore Sant’Anna of Pisa (with honors), working under the supervision of Prof. Alessandro Biondi and Prof. Giorgio Buttazzo. He is an assistant professor with the Real-Time Systems (ReTiS) Laboratory of the Scuola Superiore Sant’Anna of Pisa. In 2019, he was a visiting scholar with the Max Planck Institute for Software Systems, Germany. His research interests include software predictability in multi-processor systems, schedulability analysis, synchronization protocols, and the design and implementation of real-time operating systems and hypervisors.
Tommaso Cucinotta (Member, IEEE) received the MSc degree in computer engineering from the University of Pisa, Italy, and the PhD degree in computer engineering from Scuola Superiore Sant’Anna (SSSA), in Pisa, where he has been investigating real-time scheduling for soft real-time and multimedia applications, and predictability in infrastructures for cloud computing and NFV. He was an MTS at Bell Labs in Dublin, Ireland, investigating the security and real-time performance of cloud services. He was a software engineer at Amazon Web Services in Dublin, Ireland, where he worked on improving the performance and scalability of DynamoDB. Since 2016, he has been an associate professor with SSSA, and since 2019 he has been head of the Real-Time Systems Lab (RETIS).
Open Access funding provided by ‘Scuola Superiore ‘S.Anna’ di Studi Universitari e di Perfezionamento’
within the CRUI CARE Agreement