BROUGHT TO YOU IN PARTNERSHIP WITH

DZONE REFCARD #236

INTRODUCTION TO DOCKER MONITORING

BY APURVA DAVÉ

CONTENTS:
- Overview
- The Docker Monitoring Challenge
- Architectural Models
- Intro to Docker Monitoring
- Troubleshooting Options
- Docker Stats API... and more!
OVERVIEW

Docker started as a tool for developers and test engineers to simplify software delivery, but it has rapidly evolved into a production-ready infrastructure platform. It promises to deliver software more flexibly and more scalably to your end users, while at the same time making microservices a reality.

As any new platform moves into production, monitoring becomes an important aspect of its viability. That's especially true with a platform like Docker, where its architectural model actually changes how you need to instrument your systems in order to monitor them properly.

This Refcard will lay out the basics of the Docker monitoring challenge, give you hands-on experience with basic monitoring options, and also spell out some more advanced options.

THE DOCKER MONITORING CHALLENGE

Containers have gained prominence as the building blocks of microservices. The speed, portability, and isolation of containers made it easy for developers to embrace a microservice model. There's been a lot written on the benefits of containers, so we won't recount it all here.

INTRODUCTION TO DOCKER MONITORING

Containers are black boxes to most systems that live around them. That's incredibly useful for development, enabling a high level of portability from Dev through Prod, from developer laptop to cloud. But when it comes to operating, monitoring, and troubleshooting a service, black boxes make common activities harder, leading us to wonder: what's running in the container? How is the application code performing? Is it spitting out important custom metrics? From a DevOps perspective, you need deep visibility inside containers rather than just knowing that some containers exist.

The typical process for instrumentation in a non-containerized environment (an agent that lives in the user space of a host or VM) doesn't work particularly well for containers. That's because containers benefit from being small, isolated processes with as few dependencies as possible. If you deploy the agent outside of the container, the agent cannot easily see into the container to monitor the activity in the container. It will also require complex, brittle, and insecure networking among containers. If you deploy the agent inside the container, you have to modify each container to add the agent and deploy N agents for N containers. This increases dependencies and makes image management more difficult. And, at scale, running thousands of monitoring agents for even a modestly-sized deployment is an expensive use of resources.

ARCHITECTURAL MODELS FOR MONITORING CONTAINERS

Models for collecting instrumented data from containers do not stray too far afield from the past, and can generally be broken down into push and pull models. Push models have an agent that actively pushes metrics out to a central collection facility; pull models periodically query the monitoring target for the desired information.

As mentioned above, the most standard approach to infrastructure monitoring in a VM-based world is a push-based agent living in the user space. Two potential alternative solutions arise for containers: 1) ask your developers to instrument their code directly and push that data to a central collection point, or 2) leverage a transparent form of push-based instrumentation to see all application and container activity on your hosts.
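To make option 1 concrete, here is a minimal, hypothetical push-based sketch: each instrumented service formats a StatsD-style metric and fires it at a central collector over UDP. The collector address, port, and metric names are all assumptions for illustration, not part of any specific product.

```python
import socket

def push_metric(name, value, metric_type="g", host="127.0.0.1", port=8125):
    """Format one StatsD-style metric line and push it to a (hypothetical)
    central collector as a fire-and-forget UDP datagram."""
    payload = f"{name}:{value}|{metric_type}"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
    return payload  # returned so callers/tests can inspect what was sent

# Each instrumented service pushes its own metrics to the collector:
push_metric("orders.processed", 1, metric_type="c")  # a counter
push_metric("cart.value", 42.5)                      # a gauge
```

Because the datagram is fire-and-forget, instrumentation adds almost no latency to the service itself, which is why this pattern suits dense container deployments.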
There is an additional, advanced topic that I'll touch on briefly in this Refcard: Docker containers are often also managed by an orchestration system that aggregates containers into services. These orchestration systems provide additional metadata that can be used to better monitor Docker. We will see an example later on of using Docker labels in this way to assist in service-level monitoring.

Let's now put some of this into practice with some common, open-source-based ways of gleaning metrics from Docker.

DOCKER MONITORING & TROUBLESHOOTING OPTIONS

There are of course a lot of commercial tools available that monitor Docker in various ways. For your purposes in getting started, it's more useful to focus on open-source Docker monitoring options. Not only will you be able to roll your sleeves up right away, you'll get a better understanding of the primitives that underpin Docker.

OPEN-SOURCE TOOL | DESCRIPTION | PROS & CONS
Docker Stats API | Poll basic metrics directly from Docker Engine. | Basic stats output from the CLI. No aggregation or visualization.
cAdvisor | Google-provided agent that graphs 1-minute data from the Docker Stats API. | Limited time frame, limited metrics.
Time-series databases | Category of products like InfluxDB and Graphite that can store metrics data. | Good for historical trending. Requires you to set up a database and glue together ingestion, storage, and visualization.
Sysdig | Container-focused Linux troubleshooting and monitoring tool. | Useful for deep troubleshooting and historical captures, but doesn't provide historical trending on its own.

DOCKER STATS API

Docker has one unified API, and in fact all commands you'd run from a CLI are simply tapping that endpoint. For example, if you have a host running Docker, docker ps returns output that is just a reformatting of API data.

To show this, let's query the API via curl and ask for all running containers. For brevity we're showing the JSON blob below for just one container, and we've prettied up the JSON.

curl --unix-socket /var/run/docker.sock http:/containers/json | python -m json.tool

{
    "Command": "python bot.py",
    "Created": 1470960101,
    "HostConfig": {
        "NetworkMode": "default"
    },
    "Id": "8a9973a456b3af0601c44cf0ec19b35f01355622b5d5061552f5b84a6a335d25",
    "Image": "sysdig/sysdig-bot",
    "ImageID": "sha256:90d5bbf5afc4ce73223b5d57a249c5e05451fd4ab2414a799498e333503ffe4a",
    "Labels": {},
    "Mounts": [],
    "Names": [
        "/sysdig-bot"
    ],
    "NetworkSettings": {
        "Networks": {
            "bridge": {
                "Aliases": null,
                "EndpointID": "253f60152b62e4d0a551657895bc84ec2e0e15657df90d403f09ca6021425227",
                "Gateway": "172.17.0.1",
                "GlobalIPv6Address": "",
                "GlobalIPv6PrefixLen": 0,
                "IPAMConfig": null,
                "IPAddress": "172.17.0.2",
                "IPPrefixLen": 16,
                "IPv6Gateway": "",
                "Links": null,
                "MacAddress": "02:42:ac:11:00:02",
                "NetworkID": "…"
            }
        }
    },
    "Ports": [],
    "State": "running",
    "Status": "Up 20 hours"
}
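Rather than eyeballing raw JSON, you can reduce each /containers/json entry to the few fields you usually care about. A short sketch follows, using an abbreviated copy of the entry above as sample data; the field paths come from the payload shown, but treat the helper itself as illustrative:

```python
import json

def summarize_container(entry):
    """Reduce one /containers/json entry to the fields we usually care about."""
    bridge = entry.get("NetworkSettings", {}).get("Networks", {}).get("bridge", {})
    return {
        "id": entry["Id"][:12],                 # short ID, as docker ps prints it
        "name": entry["Names"][0].lstrip("/"),  # Docker prefixes names with "/"
        "image": entry["Image"],
        "state": entry["State"],
        "ip": bridge.get("IPAddress"),
    }

# Abbreviated sample entry, copied from the API response above:
sample = json.loads("""{
    "Id": "8a9973a456b3af0601c44cf0ec19b35f01355622b5d5061552f5b84a6a335d25",
    "Names": ["/sysdig-bot"],
    "Image": "sysdig/sysdig-bot",
    "State": "running",
    "NetworkSettings": {"Networks": {"bridge": {"IPAddress": "172.17.0.2"}}}
}""")

print(summarize_container(sample))
```

The truncation to 12 characters mirrors what docker ps displays, which makes the summary easy to cross-reference with CLI output.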
Now let's apply this API to our monitoring needs. The /stats/ endpoint gives you streaming output of a wide selection of resource-oriented metrics for your containers. Let's get the available stats for just one container:
curl --unix-socket /var/run/docker.sock http:/containers/8a9973a456b3/stats

...,"system_cpu_usage":266670930000000,"throttling_data":{...}},"cpu_stats":{"cpu_usage":{...},"system_cpu_usage":266671910000000,"throttling_data":{...}},"memory_stats":{"usage":27516928,"max_usage":31395840,"stats":{"active_anon":17494016,"active_file":5144576,"cache":10022912,...

Not pretty, but an awful lot of metrics for us to work with!

If you want a one-shot set of metrics instead of streaming, use the stream=false option:

curl --unix-socket /var/run/docker.sock http:/containers/8a9973a456b3/stats?stream=false

DOCKER MONITORING OVER TIME & IN-DEPTH

As you've probably guessed, the API is useful to get started but likely not the only thing you need to robustly monitor your applications running in Docker. The API is limiting in two ways: 1) it doesn't allow you to perform time-based trending and analysis, and 2) it doesn't give you the ability to do deep analysis on application- or system-level data. Let's attack these problems with cAdvisor and sysdig.

cAdvisor is a simple server that taps the Docker API and provides one minute of historical data in 1-second increments. It's a useful way to visualize what's going on at a high level with your Docker containers on a given host. cAdvisor simply requires one container per host that you'd like to visualize:

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

cAdvisor is now running (in the background) on http://localhost:8080. The setup includes directories with Docker state that cAdvisor needs to observe. Accessing the web interface gives you live, per-container graphs of CPU, memory, and network usage.

If you are looking to historically graph this data, you could also route data from cAdvisor to numerous time-series datastores via its storage plugins. Tying an open-source visualization engine like Grafana on top of this will let you build historical dashboards.

In most of these cases, however, we're limited to basic CPU, memory, and network data from these tools. What if we wanted to get deeper: to not only monitor resource usage, but processes, files, ports, and more?

DOCKER MONITORING AND DEEP TROUBLESHOOTING WITH SYSDIG

That's where another open-source tool, sysdig, comes into play. It's a Linux visibility tool with powerful command-line options that allow you to control what to look at and display it. You can also use csysdig, its curses-based interface, for an easier way to start. Sysdig also has the concept of chisels, which are pre-defined modules that simplify common actions.

Once you install sysdig as a process or a container on your machine, it sees every process, every network action, and every file action on the host. You can use sysdig live or view any amount of historical data via a system capture file.

As a next step, we can take a look at the total CPU usage of each running container:
$ sudo sysdig -c topcontainers_cpu
CPU% container.name
----------------------------------------------------------
90.13% mysql
15.93% wordpress1
7.27% haproxy
3.46% wordpress2
...
This tells us which containers are consuming the machine's CPU. What if we want to observe the CPU usage of a single process, but don't know which container the process belongs to? Before answering this question, let me introduce the -pc (or -pcontainer) command-line switch. This switch tells sysdig that we are requesting container context in the output.

For instance, sysdig offers a chisel called topprocs_cpu, which we can use to see the top processes in terms of CPU usage. Invoking this chisel in conjunction with -pc will add information about which container each process belongs to.
$ sudo sysdig -pc -c topprocs_cpu
As you can see, this includes details such as both the external and the internal PID and the container name.

Keep in mind: -pc will add container context to many of the command lines that you use, including the vanilla sysdig output.

By the way, you can do all of these actions live or create a capture of historical data. Captures are written with the -w flag:

$ sudo sysdig -w myfile.scap

And then analysis works exactly the same (read the capture back with -r).
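Whether you read CPU from sysdig or straight from the /stats/ endpoint shown earlier, percentages come from deltas between consecutive samples. The sketch below mirrors the calculation docker stats performs (container cpu-time delta over system cpu-time delta, scaled by CPU count); the sample values are hypothetical, merely shaped like the payload above:

```python
def cpu_percent(sample, prev):
    """Docker-CLI-style CPU %: container cpu-time delta divided by system
    cpu-time delta between two samples, scaled by the number of CPUs."""
    cpu_delta = (sample["cpu_stats"]["cpu_usage"]["total_usage"]
                 - prev["cpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = (sample["cpu_stats"]["system_cpu_usage"]
                 - prev["cpu_stats"]["system_cpu_usage"])
    ncpus = len(sample["cpu_stats"]["cpu_usage"].get("percpu_usage", [])) or 1
    if sys_delta <= 0:          # clock went nowhere; avoid division by zero
        return 0.0
    return cpu_delta / sys_delta * ncpus * 100.0

# Two abbreviated, hypothetical samples taken one second apart:
prev = {"cpu_stats": {"cpu_usage": {"total_usage": 100_000_000,
                                    "percpu_usage": [100_000_000]},
                      "system_cpu_usage": 266_670_930_000_000}}
sample = {"cpu_stats": {"cpu_usage": {"total_usage": 350_000_000,
                                      "percpu_usage": [350_000_000]},
                        "system_cpu_usage": 266_671_930_000_000}}

print(cpu_percent(sample, prev))  # 250ms of CPU over a 1s window -> 25.0
```

This is why a single stats snapshot is not enough to chart CPU: you always need at least two samples to produce a rate.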
Now let's move on to the network, where things get even more interesting.
We can see network utilization broken up by process:

sudo sysdig -pc -c topprocs_net
Bytes Process Host_pid Container_pid container.name
----------------------------------------------------------
72.06KB haproxy 7385 13 haproxy
56.96KB docker.io 1775 7039 host
44.45KB mysqld 6995 91 mysql
44.45KB mysqld 6995 99 mysql
29.36KB apache2 7893 124 wordpress1
29.36KB apache2 26895 126 wordpress4
29.36KB apache2 26622 131 wordpress2
29.36KB apache2 27935 132 wordpress3
29.36KB apache2 27306 125 wordpress4
22.23KB mysqld 6995 90 mysql
Note how this includes the internal PID and the container name
of the processes that are causing most network activity, which is
useful if we need to attach to the container to fix stuff. We can
also see the top connections on this machine:
sudo sysdig -pc -c topconns
Bytes container.name Proto Conn
----------------------------------------------------------
22.23KB wordpress3 tcp 172.17.0.5:46955->172.17.0.2:3306
22.23KB wordpress1 tcp 172.17.0.3:47244->172.17.0.2:3306
22.23KB mysql tcp 172.17.0.5:46971->172.17.0.2:3306
22.23KB mysql tcp 172.17.0.3:47244->172.17.0.2:3306
22.23KB wordpress2 tcp 172.17.0.4:55780->172.17.0.2:3306
22.23KB mysql tcp 172.17.0.4:55780->172.17.0.2:3306
14.21KB host tcp 127.0.0.1:60149->127.0.0.1:80

What if we want to zoom into a single container and only see the processes running inside it? It's just a matter of using the same topprocs_cpu chisel, but this time with a filter:

$ sudo sysdig -pc -c topprocs_cpu container.name=client
CPU% Process container.name
----------------------------------------------------------
31.04% curl client
02.69% bash client
0.74% sleep client

Compared to docker top and friends, this filtering functionality gives us the flexibility to decide which containers we see. For example, this command line shows processes from all of the wordpress containers:

$ sudo sysdig -pc -c topprocs_cpu container.name contains wordpress
CPU% Process container.name
----------------------------------------------------------
7.37% apache2 wordpress2
6.96% apache2 wordpress1
6.38% apache2 wordpress3
5.89% apache2 wordpress4

So to recap, we can:
- See every process running in each container, including internal and external PIDs
- Dig down into individual containers
- Filter to any set of containers using simple, intuitive filters

...all without installing a single thing inside each container.
This command line shows the top files in terms of file I/O, and
tells you which container they belong to:
$ sudo sysdig -pc -c topfiles_bytes
Bytes container.name Filename
----------------------------------------------------------
63.21KB mysql /tmp/#sql_1_0.MYI
6.50KB client /lib/x86_64-linux-gnu/libc.so.6
3.25KB client /lib/x86_64-linux-gnu/libpthread.so.0
3.25KB client /lib/x86_64-linux-gnu/libgcrypt.so.11
3.25KB client /usr/lib/x86_64-linux-gnu/libwind.so.0
3.25KB client /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2
3.25KB client /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2
3.25KB client /lib/x86_64-linux-gnu/libssl.so.1.0.0
3.25KB client /usr/lib/x86_64-linux-gnu/libheimbase.so.1
3.25KB client /lib/x86_64-linux-gnu/libcrypt.so.1
Naturally there is a lot more you can do with a tool like this, but
that should be a sufficient start to put our knowledge to work in
some real-life examples.
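One pattern worth carrying forward from all of the above: per-container numbers often want to be rolled up per service. A small sketch of that aggregation follows; the service label and byte counts are hypothetical sample data shaped like the topprocs_net table above, and the doc returns to Docker labels for exactly this purpose in a later section:

```python
from collections import defaultdict

def aggregate_by_label(samples, label="service"):
    """Sum per-container byte counts into per-service totals, grouping on a
    (hypothetical) label attached to each container."""
    totals = defaultdict(int)
    for s in samples:
        totals[s["labels"].get(label, "unlabeled")] += s["bytes"]
    return dict(totals)

# Hypothetical per-container samples:
samples = [
    {"name": "wordpress1", "bytes": 29360, "labels": {"service": "wordpress"}},
    {"name": "wordpress2", "bytes": 29360, "labels": {"service": "wordpress"}},
    {"name": "mysql",      "bytes": 44450, "labels": {"service": "mysql"}},
]
print(aggregate_by_label(samples))  # {'wordpress': 58720, 'mysql': 44450}
```

Grouping on a label rather than a container name keeps the rollup stable as containers are replaced or rescheduled.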
REAL-WORLD EXAMPLES: WHAT TO MONITOR, WHY, AND HOW

So now we've done some of the basics, and it's time to take the training wheels off. Let's take a look at some more complex, real-world metrics you should pay attention to. We'll show you the metrics, talk about why they're important, and what they might mean. For this section we've visualized the data using Sysdig Cloud, the commercial version of sysdig that's designed to aggregate data across many hosts and display it within a web UI. You could do the following examples via any of the open-source time-series databases, provided you're collecting the correct information.

VISUALIZING CPU SHARES & QUOTA

For those of you used to monitoring in a VM-based world, you're likely familiar with the concepts of CPU allocation, stolen CPU, and greedy VMs. Those same issues apply with containers, except they are magnified significantly. Because you may be packing containers densely on a machine, and because workloads are typically much more dynamic than in VM-based environments, you may encounter significantly more resource conflict if you're not carefully monitoring and managing allocation. Let's focus on CPU, as it's a bit more complex than memory.

Let's start by visualizing CPU shares. Imagine a host with 1 core and 3 containers using as much CPU as possible. We assign 1024 shares to one container (First) and 512 shares to each of the other two (Second and Third). First uses 2 times the host CPU of the others because it has 2 times the shares, and all of them use 100% of the CPU shares assigned to them. But what happens if Third does not need any CPU at all?
The amount of unused shares is given to the others relative to their weight. So if Third is not using any of its CPU shares, First and Second instead use 140% of their CPU shares. In general, it's OK to consume more shares than originally allocated, because the kernel tries not to waste CPU.

A percentage of shares used that's consistently over 100 means we are not allocating enough resources to our services. The implication in the example above is that First and Second were able to consume much more CPU than they were originally allocated. If either of those were, for example, a web server, it likely means we are allocating less CPU than it needs to complete current user requests (that's not a good situation). If either were a batch processing job, it means the job can use more CPU to finish faster (good, but maybe not critical).
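The share arithmetic above is easy to model. Here is a toy sketch; it is an assumption-level simplification of the kernel's proportional-share behavior, not the scheduler's exact algorithm:

```python
def allocate_cpu(shares, demand, capacity=1.0):
    """Toy model of CPU shares on one core: each busy container gets CPU in
    proportion to its shares, and share left unused by idle containers is
    redistributed to the remaining busy ones."""
    alloc = [0.0] * len(shares)
    active = {i for i, d in enumerate(demand) if d > 0}
    remaining = capacity
    while remaining > 1e-9 and active:
        total = sum(shares[i] for i in active)
        satisfied, used = set(), 0.0
        for i in active:
            # tentative proportional slice, capped by what i still wants
            take = min(remaining * shares[i] / total, demand[i] - alloc[i])
            alloc[i] += take
            used += take
            if demand[i] - alloc[i] < 1e-12:
                satisfied.add(i)
        active -= satisfied
        remaining -= used
        if used < 1e-12:   # nobody could take anything; avoid spinning
            break
    return alloc

# The scenario above: one core, shares 1024 / 512 / 512, everyone busy:
print(allocate_cpu([1024, 512, 512], [1.0, 1.0, 1.0]))  # [0.5, 0.25, 0.25]
# If Third goes idle, its share is redistributed to First and Second:
print(allocate_cpu([1024, 512, 512], [1.0, 1.0, 0.0]))
```

With Third idle, First ends up with about two thirds of the core and Second with about one third, which is the redistribution behavior described above.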
VISUALIZING CPU QUOTA

Giving processes the maximum available CPU may not always be what you want. If your cluster is multi-tenant, or if you just need a safe ceiling for an unpredictable application, you might like to implement a hard limit on CPU utilization. The Linux kernel supports absolute CPU limits with CPU quotas. You assign a quota in milliseconds relative to a period, and the process will be able to spend on CPU only that fraction of time in each period.

For example, let's consider the same case as above, now with a quota of 50ms/100ms for First and 25ms/100ms for Second and Third:
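This configuration is easy to sanity-check with a simplified model: under a quota, each container gets at most quota/period of one CPU, and unlike shares, unused headroom is not redistributed.

```python
def quota_cap(demand, quota_ms, period_ms=100):
    """CFS quota as a hard ceiling: a container may consume at most
    quota/period of one CPU each period, regardless of demand."""
    return min(demand, quota_ms / period_ms)

# First: 50ms/100ms, Second and Third: 25ms/100ms, all wanting a full core:
busy = [quota_cap(1.0, 50), quota_cap(1.0, 25), quota_cap(1.0, 25)]
print(busy, sum(busy))   # [0.5, 0.25, 0.25] 1.0
# If Third idles, its headroom is NOT handed to the others:
idle = [quota_cap(1.0, 50), quota_cap(1.0, 25), quota_cap(0.0, 25)]
print(sum(idle))         # 0.75
```

The 75% total in the idle case is exactly the host-level number discussed below: the kernel enforces the ceiling rather than reassigning the spare quarter of the core.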
The result is the same as with shares. The difference occurs when Third does not use the CPU allocated to it: now, instead of giving that CPU to the other containers, the kernel enforces the absolute quota given. The total CPU usage we will see reported for the host will be 75%.

BASIC NETWORKING DATA

Regardless of your platform, some things don't change, and that's certainly true when it comes to networking data. Especially with Docker in the mix, networking can become more complex and communication patterns can become more convoluted. It's important to keep track of basic information, such as: how much data is a container consuming? Emitting?

This type of data collection requires something more full-featured than the Docker API, so instead you could collect this type of information from open-source sysdig. Let's look at some basic network data for a set of three containers each running the same Java application.

As you can see, there is some slight variation among these three containers. If, however, we saw an extreme variation, we might want to investigate further.

At the same time, since these containers are all running the same Java application, it may be more useful to consider them a service and see how they are performing in aggregate. This leads us to our last example.

FROM CONTAINER TO MICROSERVICE DATA WITH LABELS

Docker provides a concept called labels. These are much like they sound: additional, contextual information that is applied on a per-container basis. They are unstructured and non-hierarchical. As such, you can use them to broadly identify subcategories of your containers. All the containers of a given service could carry the same label, non-standard containers could carry another label, and different versions of software could have yet another label. If you're a filer and an organizer, labels will be heaven for you.

So what can we do with a label? Well, the first thing is that we can aggregate data. From the example above, let's suppose we applied the label javapp to those three containers. Now, when we show our network data we see something much simpler: one line, that's it. In this case we're showing the average network data across all three containers, but you could easily calculate anything that helps you better understand the performance of this collection of containers.

But let's go a little further with labels, network data, and the top connections example we showed in the open-source section. Using this information and an appropriate visualization, we can do more than create a table of network data: we can actually create a map of our services, the containers that make them up, and who they are communicating with. Such a map shows the aggregated java service, the individual containers that make up the service, and (in a more complete view) would show all the other services in your environment that the java service communicates with. Note that this is a little more advanced than the other examples, and in particular the visualization may require some coding in D3 or something similar if you want to stay fully open source.

In the resulting visualization we see a few different things: our javapp consists of three containers (blue) and a service called javapp (grey), which is just an abstraction created by whoever is routing requests to those containers. We see each of those containers communicating with a Mongo service and a Redis service, and presumably those are made up of containers as well (hidden here to avoid too much complexity).

This view helps us in a few different ways:
- We can quickly understand the logical composition of our application.
- We can aggregate containers into higher-level services.
- We can easily see communication patterns among containers.
- We may be able to easily spot outliers or anomalies.

CONCLUSION

In this Refcard, we've walked from first principles using the Docker Stats API all the way up to more complex analysis of our system's performance. We've used data sources such as cAdvisor and sysdig to analyze real-world use cases such as greedy containers or mapping network communication.
As you can see, Docker monitoring can start very simply but grow
complex as you actually take containers into production. Get
experience early and then grow your monitoring sophistication to
what your environment requires.
ABOUT THE AUTHOR

APURVA DAVÉ (@ApurvaBDave) is the VP of marketing at Sysdig. He's in marketing and (gasp!) not afraid of a command line. He's been helping people analyze and accelerate infrastructure for the better part of two decades. He previously worked at Riverbed on both WAN acceleration and Network Analysis products, and at Inktomi on infrastructure products. He has a computer science degree from Brown University and an MBA from UC Berkeley.

RESOURCES

Docker Stats Documentation: https://docs.docker.com/engine/reference/api/docker_remote_api

Sysdig Open Source Documentation: http://www.sysdig.org/wiki
Copyright © 2016 DZone, Inc. All rights reserved.