Hardware Breakpoint (Or Watchpoint) Usage in Linux Kernel
Prasad Krishnan
IBM Linux Technology Centre
prasad@linux.vnet.ibm.com
The hardware breakpoint registers in several processors provide a mechanism to interrupt the programmed execution path to notify the user through of a hardware

While debug registers would treat every breakpoint address in the same way, there is a fundamental difference in the way kernel and user-space breakpoint
requests are effected. A user-space breakpoint belonging to one thread (and hence stored in struct thread_struct) will be active on only one processor at any given point of time. Kernel-space breakpoints, on the other hand, must remain active on all processors of the system to be effective, since any of them can run kernel code at any time. Kernel-space (un)registration requests must therefore be propagated to all processors, which is done through inter-processor interrupts (IPIs). A per-thread user-space breakpoint takes effect only just before the thread is scheduled. This means that a system at run-time can have as many breakpoint requests as the product of the number of running threads and the number of free (i.e., not in use by the kernel) breakpoint registers (number of threads x number of available breakpoint registers), since they can be active simultaneously without interfering with each other.

On architectures (such as x86) containing more than one debug register per processor, the infrastructure arbitrates between requests from multiple sources. To achieve this, the implementation submitted to the Linux community (see [3]) makes certain assumptions about the nature of requests for breakpoint registers from user-space through the ptrace syscall, and simplifies the design based on them.

Register allocation is done on a first-come, first-served basis. Kernel-space requests are accommodated starting from the highest numbered debug register and growing towards the lowest, while user-space requests are granted debug registers starting from the lowest numbered register. Thus, on x86, the infrastructure looks for free registers starting from DR0 for user-space requests and from DR3 for kernel-space requests, reducing the scope for conflict between requests.

In order to avoid fragmentation of debug registers after an unregistration operation, all kernel-space breakpoints are “compacted” by shifting the debug register values by one level. This is not possible for user-space requests, as it would break the semantics of the existing ptrace implementation. This implies that even if a user thread reduces its usage of breakpoints from n to n - 1, the breakpoint infrastructure will continue to reserve n debug registers. A solution for this is proposed in Section 8.1.

3.2 Register Bookkeeping

Accounting of free and used debug registers is essential for effective arbitration of requests, and allows multiple users to exist concurrently. Debug register bookkeeping is done with the help of the following variables and structures.

hbp_kernel[] – An array containing the list of kernel-space breakpoint structures.

this_hbp_kernel[] – A per-cpu copy of hbp_kernel, maintained to mitigate the problem discussed in Section 7.2.

hbp_kernel_pos – Variable denoting the next available debug register number past the last kernel breakpoint. It is equal to HBP_NUM at the time of initialisation.

hbp_user_refcount[] – An array containing the refcount of threads using a given debug register number. Thus a value x in the element at index n indicates that there are x threads in user-space that currently use n breakpoints, and so on.

A system can accommodate new requests for breakpoints as long as the kernel-space breakpoints and those
2009 Linux Symposium • 151
of any given thread (after accounting for the new request) in the system can be fit into the available debug registers. In essence,

int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
int register_user_hw_breakpoint(struct task_struct *tsk,
                                struct hw_breakpoint *bp);

A call to register a breakpoint is accompanied by a pointer to the breakpoint structure populated with certain attributes, some of which are architecture-specific.

While the breakpoint register can usually store one address, the processor can be configured to monitor accesses for a range of addresses (using the stored address
[Figure 1 diagram. Labels: User-space debuggers (GDB) in USER-SPACE; ptrace() crossing into KERNEL-SPACE; arch_update_user_hw_breakpoint(); struct thread_struct holding the hardware breakpoint regs as struct hw_breakpoint *hbp[HBP_NUM]; on_each_cpu(arch_update_kernel_hw_breakpoint); Context Switch - switch_to() (shown four times).]
Figure 1: This figure illustrates the handling of requests from kernel and user-space by the breakpoint infrastructure
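The allocation policy and the bookkeeping variables of Section 3.2 can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code of the patchset: HBP_NUM, hbp_kernel_pos and hbp_user_refcount[] are taken from the text, whereas hbp_kernel_addr[] (a stand-in for hbp_kernel[]), the model_* helper functions, and the choice to index hbp_user_refcount[] by debug register number are hypothetical. Propagation to other CPUs through IPIs is left out.

```c
#define HBP_NUM 4                 /* debug registers per CPU; 4 on x86 (DR0-DR3) */

/* Kernel breakpoints occupy registers [hbp_kernel_pos, HBP_NUM). */
static int hbp_kernel_pos = HBP_NUM;

/* Addresses of kernel breakpoints, indexed by register number
 * (hypothetical stand-in for the hbp_kernel[] array of structures). */
static unsigned long hbp_kernel_addr[HBP_NUM];

/* Assumed indexing: hbp_user_refcount[i] counts threads using register i. */
static int hbp_user_refcount[HBP_NUM];

/* Kernel request: grant the highest free register, growing downwards. */
static int model_register_kernel(unsigned long addr)
{
    if (hbp_kernel_pos == 0 || hbp_user_refcount[hbp_kernel_pos - 1] != 0)
        return -1;                /* would collide with a user-space request */
    hbp_kernel_pos--;
    hbp_kernel_addr[hbp_kernel_pos] = addr;
    return hbp_kernel_pos;        /* register number granted */
}

/* User request by one thread for register number pos (DR0 upwards). */
static int model_register_user(int pos)
{
    if (pos < 0 || pos >= hbp_kernel_pos)
        return -1;                /* register belongs to the kernel */
    hbp_user_refcount[pos]++;
    return pos;
}

/* Kernel unregistration: shift lower-numbered kernel entries up by one
 * slot so that the kernel's registers stay contiguous ("compaction"). */
static int model_unregister_kernel(int pos)
{
    if (pos < hbp_kernel_pos || pos >= HBP_NUM)
        return -1;
    for (int i = pos; i > hbp_kernel_pos; i--)
        hbp_kernel_addr[i] = hbp_kernel_addr[i - 1];
    hbp_kernel_addr[hbp_kernel_pos] = 0;
    hbp_kernel_pos++;
    return 0;
}
```

In this model a kernel request fails as soon as the register just below hbp_kernel_pos is used by any thread, which is one way of expressing the condition that the kernel-space breakpoints and those of any given thread must together fit into the available debug registers.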
Table 2: Time taken for (un)register_kernel operation in microseconds

Table 3: Time taken for breakpoint handler with a dummy callback function (in nanoseconds)
• Table 2 – Contains overhead measurements for register and unregister requests on two systems.

• Table 3 – Average time taken for the breakpoint handler execution with a dummy trigger in four different trials on two systems.

The trials were conducted on two machines, System A and System B, whose specifications are given below.

System A – 24 CPU x86_64 machine Intel(R) Xeon(R) MP 4000 MHz

System B – 2 CPU i386 Intel(R) Pentium(R) 4 CPU 3.20GHz

These systems were chosen because they are sufficiently diverse in their number of CPUs to expose the overhead caused by IPIs in the (un)register_kernel_hw_breakpoint() operations. The readings were taken without any true workload on the systems.

While the overhead for unregister operations is greater on System A (with many CPUs), interestingly this behaviour does not manifest during the register operations (refer to Table 2).

7 Challenges

Among the goals set during the design of the hardware breakpoint infrastructure, a few to mention are:

• provide a generic interface that abstracts out the arch-specific variations in the breakpoint facility and allows the end-user to harness this facility through a consistent interface

• balance the need for uniform behaviour against the exploitation of unique processor features

The implementation of such goals gave rise to challenges, some of which are discussed here.

7.1 Ptrace integration

User-space has been the most common user of hardware breakpoints, through the ptrace system call. The ptrace interface's ability to read from or write into any physical register has been exploited to enable breakpoints for user-space addresses. While the interface itself required little or no knowledge of the host architecture's debug registers, it remained the responsibility of the application invoking ptrace (such as the GNU Debugger, GDB) to be a knowledgeable user and to activate or disable them through appropriate control information.

For instance, on x86 processors containing multiple debug registers and dedicated control and status registers (unlike PPC64, where the control and debug address registers are composite), operations such as read and write become non-trivial; i.e., every request for a new breakpoint requires one write operation on a debug address register (DR0 - DR3) and one on the control register.

Since ptrace is exposed to user-space as a system call, it is important to preserve its error return behaviour. Achieving this is complicated by the fact that ptrace and its users in user-space assume exclusive availability of the debug registers and are ignorant
of any kernel space users. Hence, the number of available registers may be fewer than the ptrace user assumes, and a request may fail when this is not expected.

On architectures like x86, where the status of multiple breakpoint requests can be modified through one ptrace call (using a write operation on the debug control register DR7), care is taken to avoid a partially fulfilled request, so that the debug registers never acquire a set of values that differs both from the values requested through ptrace and from their past state. Consider a case where, among the four debug registers, one was active and the remaining three were disabled in the initial state. If the new request through ptrace is to de-activate the single active breakpoint and enable the rest of them, then we do not effect the breakpoint unregistration first but begin with the registration requests, and this is done for a reason.

Suppose that one of the breakpoint register operations fails (due to one of the reasons noted above in Section 4.1) and that it was preceded by the unregister operation. The result of the ptrace call is still considered a failure. The state of the debug registers must now be restored to its previous one, which implies that the breakpoint unregistration operation must be reversed. Under certain conditions this may not be possible, leaving the debug registers with an altogether new set of values.

Thus all breakpoint disable requests in ptrace for x86 are processed only after the registration requests, if any, have succeeded. This closes the window in which other requests could grab the freed debug registers and cause the problem described above.

7.2 Synchronised removal of kernel breakpoints

A kernel breakpoint unregistration request requires updating the global kernel breakpoint structure and the debug registers of all CPUs in the system (similar to the process of registration). However, every processor may still receive a breakpoint exception from the breakpoint that is pending removal, although the related global data structures may have been cleared by then, causing indeterminate behaviour.

This potential issue was circumvented by storing a per-cpu copy of the global kernel breakpoint structures, which is updated in the context of IPI processing. It enables every processor to continue to receive and handle exceptions through its own copy of the breakpoint data until the breakpoint is removed. Although this generates multiple copies of the same global data, it is much preferred over alternatives such as global disabling of breakpoints (through IPIs) before every unregister operation, due to the overhead associated with processing the IPIs (refer to Table 2 for the turnaround time of register/unregister operations).

8 Future enhancements

Enhanced abstraction of the interface to include definitions of attributes that are common to several architectures (such as read/write breakpoint types); support for more processors; improvements to the capabilities, interface and output of ksym_tracer; and the creation of more end-users of the breakpoint infrastructure, such as “perfcounters” and SystemTap, that use it in innovative ways are just a few of the enhancements contemplated at the moment for this feature.

Virtualised debug registers were a feature in one of the versions of the patchset submitted to the Linux community, but were eventually dropped in favour of a simplified approach to register allocation. The feature and its benefits are described below.

8.1 Virtualisation of Debug registers

In processors having multiple debug registers, such as x86, a request for a breakpoint from ptrace targets a specific numbered debug register and is not a generic request. While this mechanism works well in the absence of any register allocation mechanism and when requests from user-space have exclusive access to the debug registers, its inter-operability with other users is affected. The hardware breakpoint infrastructure discussed here mitigates this problem to a certain extent by using the fact that requests from ptrace tend to grow upwards, i.e., starting from the lower numbered registers and moving to the higher ones.

A true solution to this problem lies in creating a thin layer that maps the physical debug registers to those requested by ptrace and allows any free debug register to be allocated irrespective of the requested register number. The ptrace request can continue to access the breakpoint through the virtual debug register thus allocated.
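The ordering rule of Section 7.1 (on a DR7 write, process every registration before any unregistration, so that a failure can be undone without having given up a register) can also be modelled in a few lines. This is a hedged sketch and not the ptrace code of the patchset: try_register(), do_unregister(), write_dr7(), the active[] array and first_kernel_reg are hypothetical, and the initial state (only DR0 active, kernel assumed to hold DR3) is chosen to mirror the example in the text. The only hardware fact relied upon is that bits 2i and 2i+1 of DR7 are the local and global enable bits for DRi.

```c
#include <stdbool.h>
#include <stdint.h>

#define HBP_NUM 4

/* Registers at or above this number are assumed to be held by the kernel. */
static int first_kernel_reg = 3;

/* Initial state from the example in the text: only DR0 is active. */
static bool active[HBP_NUM] = { true };

/* In DR7, bits 2i (local) and 2i+1 (global) enable the breakpoint in DRi. */
static bool dr7_enabled(uint32_t dr7, int i)
{
    return ((dr7 >> (2 * i)) & 0x3) != 0;
}

/* Hypothetical stand-ins for the infrastructure's (un)register calls. */
static int try_register(int i)
{
    if (i >= first_kernel_reg)
        return -1;                /* register is not available to the thread */
    active[i] = true;
    return 0;
}

static void do_unregister(int i)
{
    active[i] = false;
}

/* Apply a new DR7 value: registrations first, unregistrations last. */
static int write_dr7(uint32_t old_dr7, uint32_t new_dr7)
{
    int i;

    for (i = 0; i < HBP_NUM; i++)
        if (dr7_enabled(new_dr7, i) && !dr7_enabled(old_dr7, i))
            if (try_register(i) != 0)
                goto rollback;

    for (i = 0; i < HBP_NUM; i++)
        if (!dr7_enabled(new_dr7, i) && dr7_enabled(old_dr7, i))
            do_unregister(i);
    return 0;

rollback:
    /* Undo only what this call registered; nothing was unregistered yet. */
    while (--i >= 0)
        if (dr7_enabled(new_dr7, i) && !dr7_enabled(old_dr7, i))
            do_unregister(i);
    return -1;
}
```

Because the single active breakpoint is still registered when a registration fails, the rollback path only has to drop the registrations made by the failing call, and the thread is left exactly in its previous state.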
8.2 Prioritisation of breakpoint requests

This enhancement would allow the user to specify a priority for each breakpoint request. If a breakpoint request with a higher priority arrives, the existing breakpoint yields its debug register to accommodate the former. An accompaniment to this feature would be callback routines that are invoked whenever a breakpoint request is pre-empted or regains the debug registers on the processor. This would be done at the time of every new registration, to balance the requests and accommodate them based on their priorities.

The Hardware Breakpoint infrastructure and the associated consumers of the infrastructure, such as ksym_tracer, put a hitherto scarcely used hardware resource to good use in newer ways such as profiling and tracing, apart from its vital role in debugging. The overhead of taking a breakpoint, as our results in Section 6 show, is tolerable even in production environments; any significant overhead would be the result of the user-defined callback function. It is hoped that when the patches head into the mainline kernel, wider user feedback and testing will help the infrastructure evolve into a more powerful and robust one than that proposed.

10 Acknowledgements

The author wishes to thank his team at the Linux Technology Centre, IBM, and the management for their encouragement and support during the creation of the hardware breakpoint patchset and the paper.

The profound work done by Alan Stern, whose patchset and ideas were the foundation for the present code in the -tip tree, and an earlier patchset from Prasanna S Panchamukhi deserve a mention of thanks from the author.

The design of this feature is heavily influenced by suggestions from Ingo Molnar, and the code was vetted by Ananth N Mavinakayanahalli, Frederic Weisbecker and Maneesh Soni; it also benefited from the in-depth review of the patches by Alan Stern. The author gratefully acknowledges their contribution.

Special thanks to Balbir Singh for initiating the author into the creation of this paper and being a great source of encouragement throughout.

The author wishes to thank Naren A Devaiah and the IBM management who generously provided an opportunity to work on this feature and paper, without which its presentation at the Linux Symposium 2009 wouldn't have been possible.

This work represents the view of the author and does not necessarily represent the view of IBM.

IBM, IBM logo, ibm.com are trademarks of International Business Machines Corporation in the United States, other countries, or both.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
References
Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights
to all as a condition of submission.