KVM Architecture Overview
2015 Edition
Stefan Hajnoczi <stefanha@redhat.com>
1 February 16, 2015 | Stefan Hajnoczi
Introducing KVM virtualization
KVM hypervisor runs virtual machines on Linux hosts
Mature on x86, recent progress on ARM and ppc
Most popular and best supported hypervisor on OpenStack
https://wiki.openstack.org/wiki/HypervisorSupportMatrix
Built in to Red Hat Enterprise Linux
Qumranet startup created KVM, joined Red Hat in 2008
Virtualization goals
Efficiently and securely running virtual machines on a Linux host
Linux, Windows, etc. guest operating systems
Access to networking and storage in a controlled fashion
[Diagram: a Linux guest and a Windows guest running on a host that provides Net and Disk]
Where does KVM fit into the stack?
OpenStack, RHEV: management for datacenters and clouds
libvirt: management for one host
QMP: interface between libvirt and QEMU
QEMU: emulation for one guest
Host kernel kvm.ko: host hardware access and resource mgmt
More on QEMU and kvm.ko
QEMU
Device emulation: RAM, QXL gfx card, virtio-blk disk, ...
Virtualization features: live migration, VNC remote display, storage migration, ...
Host kernel kvm.ko
Intel VMX
Guest/host mode switching
In-kernel device emulation
Hardware virtualization support with Intel VMX
Allows safe guest code execution at native speed
Certain operations trap out to the hypervisor
[Diagram: VMXON enters host mode; VMLAUNCH and VMRESUME switch from host mode to guest mode; VMEXIT switches from guest mode back to host mode]
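The mode switches on this slide can be modelled as a small state machine. The sketch below is a toy model, not real VMX code: the names toy_cpu, toy_cpu_step and the EV_* events are hypothetical and exist only for illustration. It assumes the usual rule that a guest is entered with VMLAUNCH the first time and with VMRESUME afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the host/guest mode switches; no VMX instructions are executed. */
enum cpu_mode { VMX_OFF, HOST_MODE, GUEST_MODE };
enum vmx_event { EV_VMXON, EV_VMLAUNCH, EV_VMRESUME, EV_VMEXIT };

struct toy_cpu {
    enum cpu_mode mode;
    bool launched;              /* has this guest been entered with VMLAUNCH? */
};

/* Apply one event. Returns true if the transition is legal in this model. */
bool toy_cpu_step(struct toy_cpu *cpu, enum vmx_event ev)
{
    assert(cpu);
    switch (ev) {
    case EV_VMXON:              /* enable VMX operation, CPU is now in host mode */
        if (cpu->mode != VMX_OFF)
            return false;
        cpu->mode = HOST_MODE;
        return true;
    case EV_VMLAUNCH:           /* first entry into the guest */
        if (cpu->mode != HOST_MODE || cpu->launched)
            return false;
        cpu->launched = true;
        cpu->mode = GUEST_MODE;
        return true;
    case EV_VMRESUME:           /* later entries into an already launched guest */
        if (cpu->mode != HOST_MODE || !cpu->launched)
            return false;
        cpu->mode = GUEST_MODE;
        return true;
    case EV_VMEXIT:             /* a trapped operation returns control to the host */
        if (cpu->mode != GUEST_MODE)
            return false;
        cpu->mode = HOST_MODE;
        return true;
    }
    return false;
}
```

A hypervisor run loop in this model is VMXON once, VMLAUNCH once, then alternating VMEXIT and VMRESUME for as long as the guest runs.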
Memory virtualization with Intel EPT
Extended Page Tables (EPT) add a level of address translation for guest physical memory.
[Diagram: guest memory address -> guest page table -> host page table (EPT) -> physical RAM]
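The two translation stages can be illustrated with a toy example. This is a sketch under strong simplifying assumptions: both tables are single-level arrays with made-up contents, pages are 4 KiB, and the function names (lookup, translate) are hypothetical. Real hardware walks multi-level tables and also applies EPT to the guest page table accesses themselves.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
#define NPAGES     4
#define NO_PAGE    UINT32_MAX

/* Toy single-level tables indexed by page number (contents are made up). */
static const uint32_t guest_pt[NPAGES] = { 2, 3, NO_PAGE, 1 }; /* guest virtual -> guest physical */
static const uint32_t ept[NPAGES]      = { 7, 5, 9, NO_PAGE }; /* guest physical -> host physical */

/* Translate addr through one table. Returns 0 on success, -1 if unmapped. */
static int lookup(const uint32_t *table, uint32_t addr, uint32_t *out)
{
    uint32_t page = addr >> PAGE_SHIFT;

    if (page >= NPAGES || table[page] == NO_PAGE)
        return -1;
    *out = (table[page] << PAGE_SHIFT) | (addr & PAGE_MASK);
    return 0;
}

/* Full translation: returns 0 on success, -1 for a miss in the guest page
 * table (guest page fault), -2 for a miss in the EPT (handled by the host). */
int translate(uint32_t gva, uint32_t *hpa)
{
    uint32_t gpa;

    assert(hpa);
    if (lookup(guest_pt, gva, &gpa) < 0)
        return -1;
    if (lookup(ept, gpa, hpa) < 0)
        return -2;
    return 0;
}
```

With these tables, guest address 0x0123 maps to guest physical 0x2123 and then to host physical 0x9123. The guest only ever manages the first table; the second one is under host control.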
How QEMU uses kvm.ko
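The QEMU userspace process uses the kvm.ko driver to execute guest code:
    int kvm_fd  = open("/dev/kvm", O_RDWR);
    int vm_fd   = ioctl(kvm_fd, KVM_CREATE_VM, 0);
    int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
    /* run points to the vCPU's struct kvm_run, mmap()ed from vcpu_fd */
    for (;;) {
        ioctl(vcpu_fd, KVM_RUN, 0);   /* run guest code until the next exit */
        switch (run->exit_reason) {
        case KVM_EXIT_IO:  /* ... */ break;
        case KVM_EXIT_HLT: /* ... */ break;
        }
    }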
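The first step of the sequence above, opening /dev/kvm, can be tried on its own. The following is a minimal sketch for a Linux host with the kernel headers installed; the function name kvm_api_version is hypothetical. It only queries the API version and does not create a VM, and it returns -1 when KVM is unavailable or the process lacks permission.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Returns the KVM API version reported by the kernel, or -1 if /dev/kvm
 * cannot be opened or queried (no KVM support, no permission, etc.). */
int kvm_api_version(void)
{
    int fd = open("/dev/kvm", O_RDWR);
    int version;

    if (fd < 0)
        return -1;
    version = ioctl(fd, KVM_GET_API_VERSION, 0);
    close(fd);
    return version;
}
```

On a working host the result should equal KVM_API_VERSION from linux/kvm.h. A caller would normally check that before going on to KVM_CREATE_VM.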
QEMU process model
QEMU is a userspace process
Unprivileged and isolated using SELinux for security
Each KVM vCPU is a thread
Host kernel scheduler decides when vCPUs run
[Diagram: QEMU process containing guest RAM, running on the host kernel]
Linux concepts apply to QEMU/KVM
Since QEMU is a userspace process, the usual Linux tools work:
ps(1), top(1), etc. see QEMU processes and threads
tcpdump(8) sees tap network traffic
blktrace(8) sees disk I/O requests
SystemTap and perf see QEMU activity
etc
Architecture: Event-driven multi-threaded
Event loops are used for timers, file descriptor monitoring, etc.
Non-blocking I/O
Callbacks or coroutines
Multi-threaded architecture but with big lock
VCPU threads execute in parallel
Specific tasks that would block the event loop are done in threads, e.g. remote display encoding, RAM live migration work, virtio-blk dataplane, etc.
Rest of QEMU code runs under global mutex
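The event loop idea (monitor file descriptors, dispatch a callback when one is ready) can be sketched with poll(2). This is not QEMU's event loop; the struct and function names below are hypothetical, and timers, coroutines and the global mutex are left out to keep the example short.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_HANDLERS 8

typedef void (*fd_handler)(int fd, void *opaque);

struct event_loop {
    struct pollfd fds[MAX_HANDLERS];
    fd_handler handlers[MAX_HANDLERS];
    void *opaque[MAX_HANDLERS];
    nfds_t count;
};

/* Register a callback to run when fd becomes readable. Returns 0 or -1 if full. */
int loop_add_reader(struct event_loop *loop, int fd, fd_handler cb, void *opaque)
{
    assert(cb);
    if (loop->count == MAX_HANDLERS)
        return -1;
    loop->fds[loop->count] = (struct pollfd){ .fd = fd, .events = POLLIN };
    loop->handlers[loop->count] = cb;
    loop->opaque[loop->count] = opaque;
    loop->count++;
    return 0;
}

/* One loop iteration: wait up to timeout_ms, then dispatch ready callbacks.
 * Returns the number of callbacks run, 0 on timeout, or -1 on poll() error. */
int loop_run_once(struct event_loop *loop, int timeout_ms)
{
    int ready = poll(loop->fds, loop->count, timeout_ms);
    int dispatched = 0;

    if (ready <= 0)
        return ready;
    for (nfds_t i = 0; i < loop->count; i++) {
        if (loop->fds[i].revents & POLLIN) {
            loop->handlers[i](loop->fds[i].fd, loop->opaque[i]);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: consume one byte and count it in *opaque. */
void count_byte(int fd, void *opaque)
{
    char c;

    if (read(fd, &c, 1) == 1)
        (*(int *)opaque)++;
}
```

Callbacks are only invoked for descriptors that poll() reported as readable, so the loop thread never sleeps inside a handler waiting for data. Work that cannot be made non-blocking is what the slide moves into separate threads.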
Architecture: Emulated and pass-through devices
Guest sees CPU, RAM, disk, etc like on real machines
Unmodified operating systems can run
Paravirtualized devices for better performance
Most devices are emulated and not real
Isolation from host for security
Sharing of resources between guests
Pass-through PCI adapters, disks, etc also possible
Dedicated hardware
Architecture: Host/guest device emulation split
Guest device: device model visible to guest (rtl8139, Intel e1000, virtio-net)
Host device: performs I/O on behalf of guest (tap, L2TPv3, socket)
Decouples hardware emulation from I/O mechanism
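One way to picture this split is a device model that only knows an abstract backend interface. The sketch below is illustrative only and does not use QEMU's real data structures: net_backend, nic_frontend, counting_send and nic_transmit are hypothetical names, and the backend merely counts bytes instead of doing real I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Host side: how packets actually leave the process (tap, socket, ...). */
struct net_backend {
    const char *name;
    size_t bytes_sent;
    int (*send)(struct net_backend *be, const void *buf, size_t len);
};

/* Guest side: the device model that the guest driver talks to. */
struct nic_frontend {
    const char *model;              /* e.g. "e1000" or "virtio-net" */
    struct net_backend *backend;
};

/* Hypothetical backend that only counts bytes instead of doing real I/O. */
int counting_send(struct net_backend *be, const void *buf, size_t len)
{
    (void)buf;
    be->bytes_sent += len;
    return 0;
}

/* Called by the device model when the guest transmits a packet. */
int nic_transmit(struct nic_frontend *nic, const void *pkt, size_t len)
{
    assert(nic->backend && nic->backend->send);
    return nic->backend->send(nic->backend, pkt, len);
}
```

Because the frontend never looks inside the backend, the same emulated NIC can be pointed at a different host I/O mechanism without changing what the guest sees.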
Architecture: virtio devices
KVM implements virtio device models
net, blk, scsi, serial, rng, balloon
See http://docs.oasis-open.org/virtio/ for specs
Open standard for paravirtualized I/O devices
Red Hat contributes to Linux and Windows guest drivers
Architectural exception: vhost in-kernel devices
Most device emulation is best done in userspace
Some APIs or performance features are only available in the host kernel
vhost drivers emulate virtio devices in host kernel
vhost_net.ko high-performance virtio-net emulation takes advantage of kernel-only zero-copy and interrupt handling features
Other devices could be developed in theory, but usually userspace is a better choice
Storage in QEMU
Block drivers fall in two categories:
Formats: image file formats (qcow2, vmdk, etc.), e.g. qcow2, raw
Protocols: I/O transports (POSIX file, rbd/Ceph, etc.), e.g. raw-posix, rbd (Ceph)
Plus additional block drivers that interpose, like quorum, blkdebug, blkverify
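The format-on-top-of-protocol layering can be illustrated with two stacked read functions. This is a sketch, not QEMU's block layer API: block_node, mem_read and hdr_format_read are hypothetical, the "protocol" is an in-memory buffer standing in for a file, and the "format" is a made-up image layout with a fixed-size header in front of the guest data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct block_node {
    /* Read len bytes at offset into buf. Returns 0 on success, -1 on error. */
    int (*read)(struct block_node *bn, size_t offset, void *buf, size_t len);
    struct block_node *child;   /* next layer down, NULL for a protocol */
    const unsigned char *data;  /* protocol layer: backing bytes */
    size_t size;                /* protocol layer: number of backing bytes */
    size_t header_len;          /* format layer: metadata bytes to skip */
};

/* "Protocol" layer: the I/O transport, here a plain memory buffer. */
int mem_read(struct block_node *bn, size_t offset, void *buf, size_t len)
{
    if (offset > bn->size || len > bn->size - offset)
        return -1;
    memcpy(buf, bn->data + offset, len);
    return 0;
}

/* "Format" layer: maps guest offsets to offsets inside the image file. */
int hdr_format_read(struct block_node *bn, size_t offset, void *buf, size_t len)
{
    assert(bn->child);
    return bn->child->read(bn->child, bn->header_len + offset, buf, len);
}
```

An interposing driver in the spirit of blkdebug or blkverify would be one more node with a child pointer, sitting between the two layers or above them.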
Storage stack
Guest: Application, VFS, Block layer (guest application plus full file system and block layer)
QEMU: Format, Protocol (image format, storage migration, I/O throttling)
Host: VFS, Block layer, Disk (host full file system and block layer)
Beware double caching and anticipatory scheduling delays!
Walkthrough: virtio-blk disk read request (Part 1)
1. Guest fills in request descriptors
2. Guest writes to virtio-blk virtqueue notify register
[Diagram: request header, data buffer and request footer in guest RAM; guest notifies QEMU device emulation; kvm.ko underneath]
Walkthrough: virtio-blk disk read request (Part 2)
3. QEMU issues I/O request on behalf of guest
[Diagram: QEMU device emulation -> Linux AIO -> VFS -> Block layer -> Physical disk; data is read into the guest's data buffer]
Walkthrough: virtio-blk disk read request (Part 3)
4. QEMU fills in request footer and injects completion interrupt
[Diagram: QEMU device emulation writes the request footer in guest RAM; the interrupt is delivered to the guest through kvm.ko]
Walkthrough: virtio-blk disk read request (Part 4)
5. Guest receives interrupt and executes handler
6. Guest reads data from buffer
[Diagram: guest handles the interrupt and reads the request header, data buffer and request footer from guest RAM]
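The walkthrough can be condensed into a toy model of one read request. The header fields and constant values below follow the virtio block device specification, but everything else is a simplifying assumption: the three request parts live in one C struct instead of separate descriptors, the "disk" is a memory buffer, and toy_device_handle is a hypothetical stand-in for QEMU's device emulation. The virtqueue, the notify register and the interrupt are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE          512
#define VIRTIO_BLK_T_IN      0      /* read request */
#define VIRTIO_BLK_S_OK      0
#define VIRTIO_BLK_S_IOERR   1
#define VIRTIO_BLK_S_UNSUPP  2

/* Request header filled in by the guest driver (step 1). */
struct blk_req_header {
    uint32_t type;
    uint32_t reserved;
    uint64_t sector;
};

/* The three parts of a request, all located in guest RAM. */
struct blk_request {
    struct blk_req_header header;
    uint8_t data[SECTOR_SIZE];      /* data buffer */
    uint8_t status;                 /* request footer */
};

/* Device side (steps 3 and 4): do the I/O, then fill in the footer. */
void toy_device_handle(struct blk_request *req, const uint8_t *disk,
                       uint64_t nsectors)
{
    assert(req && disk);
    if (req->header.type != VIRTIO_BLK_T_IN) {
        req->status = VIRTIO_BLK_S_UNSUPP;
        return;
    }
    if (req->header.sector >= nsectors) {
        req->status = VIRTIO_BLK_S_IOERR;
        return;
    }
    memcpy(req->data, disk + req->header.sector * SECTOR_SIZE, SECTOR_SIZE);
    req->status = VIRTIO_BLK_S_OK;
}
```

In step 6 the guest would check that status equals VIRTIO_BLK_S_OK before trusting the contents of the data buffer.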
Thank you!
Technical discussion: qemu-devel@nongnu.org
IRC
#qemu on irc.oftc.net
#kvm on chat.freenode.net
http://qemu-project.org/
http://linux-kvm.org/
More on my blog: http://blog.vmsplice.net/