Part II Cloud Infrastructure And Virtualization
Chapter 6 Containers
Introduction
• The previous chapter examines one form of server virtualization: virtual machines (VMs).
• This chapter considers an alternative form of server virtualization that has become popular for use in data centers:
container technology. The chapter explains the motivation for containers and describes the underlying
technology.
The Advantages And Disadvantages Of VMs
• The chief advantage of the VM approach lies in its support of arbitrary operating systems. VM technology
virtualizes processor hardware and creates an emulation so close to an actual processor that a conventional
operating system built to run directly on hardware can run inside a VM with no change. Thus, a cloud customer
who leases a VM has the freedom to choose all the software that runs in the VM, including the operating system.
• VM technology has two disadvantages: creating a VM entails the overhead of hosting an operating system, and
running multiple VMs on a server imposes computational overhead because each operating system schedules apps
and runs background processes.
Figure 6.1 Illustration of four VMs, each running its own operating system.
Traditional Apps And Elasticity On Demand
• VM technology works well in situations where a virtual server persists for a long time (e.g., days) or a user needs
the freedom to choose an operating system.
• All of this overhead is unnecessary if a user runs only a single application and does not need all the facilities in an
operating system.
• A conventional operating system includes a facility that satisfies most of the need: support for concurrent
processes. However, processes alone do not suffice in a multi-tenant cloud service because processes share
facilities, such as a network address and a file system, that allow an app to obtain information about other apps.
Isolation Facilities In An Operating System
• Most operating systems use virtual memory hardware to provide each process with a separate memory address
space and ensure that a running app cannot see or alter memory locations owned by other apps.
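• The effect of separate address spaces can be observed with a minimal Python sketch. Python and the function name are illustrative choices, not part of the chapter, and the sketch assumes a POSIX system where os.fork is available. A child process changes a variable, and the parent's copy stays untouched because each process has its own address space.

```python
import os


def child_write_is_invisible_to_parent():
    """Fork a child that modifies a variable; return the parent's copy afterward."""
    value = ["original"]
    pid = os.fork()                      # the child gets its own copy of the address space
    if pid == 0:
        value[0] = "changed by child"    # alters only the child's memory
        os._exit(0)                      # exit the child immediately
    os.waitpid(pid, 0)                   # parent waits until the child has finished
    return value[0]


if __name__ == "__main__":
    print(child_write_is_invisible_to_parent())
```

The parent still sees the value it stored before the fork, even though the child overwrote its own copy.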
• User IDs provide additional isolation by assigning an owner to each running process and each file, and forbidding
a process owned by one user from accessing or removing an object owned by another user. However, the user ID
mechanism in most operating systems does not scale to handle a cloud service with arbitrary numbers of
customers.
Linux Namespaces Used For Isolation
• Significant advances in isolation mechanisms have arisen in the open-source community. Under various names,
such as jails, the community has incorporated isolation mechanisms into the Linux operating system. Known as
namespaces, a current set of mechanisms can be used to isolate various aspects of an application. Below are
seven major namespaces used with containers.
• The mechanisms control various aspects of isolation. For example, the process ID namespace allows each
isolated app to use its own set of process IDs: 1, 2, 3, and so on.
• Instead of all processes sharing the single Internet address that the host owns, the network namespace allows a
process to be assigned a unique Internet address.
• Isolation facilities in an operating system make it possible to run multiple apps on a computer without
interference and without allowing an app to learn about other isolated apps.
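• On a Linux host, the namespaces to which a process belongs can be inspected through the /proc file system. The following Python sketch (the function name is hypothetical) reads the symbolic links in /proc/self/ns and returns an empty result on systems that do not provide that directory. Two processes that show the same identifier for a namespace share it; an isolated app shows different identifiers.

```python
import os


def current_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result                    # not Linux, or no such process
    for name in sorted(names):
        try:
            # Each entry is a symbolic link whose target names the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue                     # skip entries that cannot be read
    return result


if __name__ == "__main__":
    for name, identifier in current_namespaces().items():
        print(name, identifier)
```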
The Container Approach For Isolated Apps
• Industry uses the term container to capture the idea of an environment that surrounds and protects an app while
the app runs.
• At any time, a server computer runs a set of containers, each of which contains an app.
Figure 6.3 The conceptual organization of software when a server computer runs a set of containers.
• A container consists of an isolated environment in which an application can run. A container runs on a
conventional operating system, and multiple containers can run on the same operating system concurrently. Each
container provides isolation, which means an application in one container cannot interfere with an application in
another container.
Docker Containers
• The Docker approach has become prominent for use with cloud systems for four main reasons:
o Tools that enable rapid and easy development of containers
o An extensive registry of software for use with containers
o Techniques that allow rapid instantiation of an isolated app
o Reproducible execution across hosts
• Development tools - The Docker technology provides an easy way to develop apps that can be deployed in an
isolated environment. Docker uses a high-level approach that allows a programmer to combine large pre-built
code modules into a single image that runs when a container is deployed.
• Extensive registry of software - The Docker project has produced Docker Hub, an extensive registry of open
source software that is ready to use and enables a programmer to share deployable apps, such as a web server,
without writing code. More important, a user (or an operator) can combine pieces from the registry in the same
way that a conventional program uses modules from a library.
• Rapid instantiation - Because a container does not require a full operating system, a container is much smaller
than a VM. Consequently, the time required to download a container can be an order of magnitude less than the
time required to download a VM. In addition, Docker uses early binding: the libraries an app needs are placed in
the image when the image is built. Unlike a conventional app that may require an operating system to load one or
more libraries dynamically, a Docker container does not need the operating system to perform extra work when
the container starts.
• Reproducible execution - Once a Docker container image has been built, the image is immutable: it remains
unchanged, independent of the number of times the image runs in a container. Furthermore, because all the
necessary software components have been built in, a container image behaves the same on any system that can
run it. As a result, container execution gives reproducible results.
• The Docker model does not separate a container from its contents. A programmer creates all the software needed
for a container, including an app to run, and places the software in an image file. A separate image file must be
created for each app. When an image file runs, a container is created to run the app. We say the app has been
containerized.
Docker Terminology And Development Tools
• In addition to tools used to create and launch an app, the extended environment includes tools used to deploy and
control copies of running containers.
• A Docker image is a file that can be run as a container, and a container is the execution of an image. An image can have
two forms: a partial image that forms a building block, and a container image that includes all the software
needed for a particular app. Partial images form a useful part of the Docker programming environment analogous
to library functions in a conventional programming environment.
• Docker uses the term layer to refer to each piece of code a programmer includes in a container image. We say that
the programmer starts by specifying a base image and then adds one or more layers of software, each of which can
be code written from scratch or a pre-built building block downloaded from a registry.
• Docker uses a special build facility analogous to Linux’s make. A programmer creates a text file with items that
specify how to construct a container image. By default Docker expects the specifications to be placed in a text file
named Dockerfile. A programmer runs docker build which reads Dockerfile, follows the specified steps,
constructs a container image, and issues a log of steps taken (along with error messages, if any).
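• To make the idea of a file of build items concrete, the following Python sketch splits Dockerfile text into (keyword, arguments) pairs in the order a build would follow them. It is an illustration only, not Docker's parser: the function name is hypothetical, whole-line comments and blank lines are skipped, and line continuations are not handled. The sample text is the two-line Dockerfile from Figure 6.9.

```python
def dockerfile_items(text):
    """Return the (KEYWORD, arguments) pairs of a Dockerfile, in build order.

    Simplification: blank lines and whole-line comments are skipped, and
    line continuations (a trailing backslash) are not handled.
    """
    items = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)          # keyword, then the rest of the line
        keyword = parts[0].upper()
        arguments = parts[1].strip() if len(parts) > 1 else ""
        items.append((keyword, arguments))
    return items


if __name__ == "__main__":
    example = 'FROM alpine\nENTRYPOINT ["/bin/echo", "hi there"]\n'
    for step, (keyword, arguments) in enumerate(dockerfile_items(example), start=1):
        print(step, keyword, arguments)
```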
• A Docker container image is not an executable file, and cannot be launched the same way one launches a
conventional app. Instead, one must use the command docker run to specify that an image is to be run as a Docker
container.
• The table in Figure 6.4 lists a few terms that Docker uses, along with items from conventional computing
systems that provide an analogy to help clarify the concept.
Docker Software Components
• Containers operate under the control of an application known as a Docker daemon (dockerd). In addition, Docker
provides a user interface program, docker. Collectively, the software is known as the Docker Engine. Figure 6.5
illustrates Docker software and shows that dockerd manages both images that have been saved locally and running
containers.
Figure 6.5 Illustration of the Docker daemon, dockerd, which manages both containers and images.
• The dockerd program runs in background at all times and contains several key subsystems. In addition to a
subsystem that launches and terminates containers, dockerd contains a subsystem used to build images and a
subsystem used to download items from a registry.
• A user does not interact with dockerd directly. Instead, dockerd provides two interfaces through which a user
can make requests and obtain information:
o A RESTful interface - intended for applications
o A Command Line Interface (CLI) - intended for humans
• RESTful Interface - Customers do not typically create containers manually but instead use orchestration
software to deploy and manage sets of containers. When it needs to create, terminate, or otherwise manage
containers, orchestration software uses dockerd’s RESTful interface. As expected, the RESTful interface uses
the HTTP protocol, and allows orchestration software to send requests and receive responses.
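• A minimal Python sketch of a client for the RESTful interface, written under stated assumptions rather than as a definitive implementation. It assumes dockerd listens on the Unix socket /var/run/docker.sock (a common default on Linux) and uses the Docker Engine API request GET /containers/json, which lists running containers. The class and function names are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that travels over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def list_containers(socket_path="/var/run/docker.sock"):
    """Ask dockerd for its running containers; return None if it cannot be reached."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")   # send the HTTP request
        response = conn.getresponse()             # receive the HTTP response
        body = response.read()
        if response.status != 200:
            return None
        return json.loads(body)                   # a JSON array, one entry per container
    except (OSError, http.client.HTTPException, ValueError):
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print(list_containers())
```

Orchestration software follows the same request and response pattern, using other requests to create and terminate containers.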
• Command Line Interface (CLI) - To accommodate situations when a human needs to manage containers
manually, dockerd offers an interactive command-line interface that allows a user to enter one command at a
time and receive a response. To send a command to dockerd, a user can type the following in a terminal window:
docker command arguments...
• Figure 6.6 lists example commands.
Figure 6.6 Examples of Docker commands a user can enter through the CLI.
• To construct a container image, a programmer creates a Dockerfile, places the file in the current directory, and
runs:
docker build .
where the dot specifies that Docker should look in the current directory for a Dockerfile that specifies how to build
the image.
• To run an image as a container, a programmer invokes the run command and supplies the image name. For
example:
docker run f34cd9527ae6
• Docker stores images until the user decides to remove them. To review the list of all saved images, a user can
enter:
docker images
which will print a list of all the saved images along with their names and the time at which each was created.
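• The build and run steps above can also be driven from a program. The following Python sketch invokes the docker CLI through subprocess and is only an illustration: the function names are hypothetical, the -q option of docker build is used so that the command prints only the ID of the new image, and the code returns None when the docker program or dockerd is unavailable.

```python
import shutil
import subprocess


def build_args(context_dir="."):
    """Arguments for `docker build`; -q limits output to the new image ID."""
    return ["docker", "build", "-q", context_dir]


def run_args(image):
    """Arguments for `docker run` with an image name or ID."""
    return ["docker", "run", image]


def build_and_run(context_dir="."):
    """Build the image described by context_dir/Dockerfile, run it once, and
    return the container's standard output (None if a step is unavailable)."""
    if shutil.which("docker") is None:
        return None                               # docker CLI not installed
    built = subprocess.run(build_args(context_dir), capture_output=True, text=True)
    if built.returncode != 0:
        return None                               # build failed or dockerd not reachable
    image_id = built.stdout.strip()
    ran = subprocess.run(run_args(image_id), capture_output=True, text=True)
    return ran.stdout if ran.returncode == 0 else None


if __name__ == "__main__":
    print(build_and_run("."))
```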
Base Operating System And Files
• In addition to layers of app software, most container images include a layer of software known as a base
operating system that acts as an interface to the underlying host operating system.
• Apps running in the container make calls to the base operating system which then makes calls to the host
operating system. When building a container image, a programmer starts by specifying a base operating system.
From a programmer’s perspective, most of the distinctions between a base operating system and the host
operating system are unimportant. For example, when a user runs a container interactively in a terminal window
and an app running in the container writes to standard output, the output will appear in the user’s terminal window
and keystrokes entered in the window will be sent to the app running in the container.
• One aspect of containers differs from conventional apps: the file system. Apps running in the container can read
files and can create new files. However, because a container image is immutable, changes made to local files when
a container runs will not be saved for subsequent invocations of the container image unless a programmer connects
the container to permanent storage.
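• The behavior can be pictured with a toy Python model. It is not how Docker is implemented; the class, the function, and the file names are hypothetical. Each run starts from a fresh copy of the image's files, so local changes vanish between runs, while an attached storage object keeps its contents.

```python
class ImageModel:
    """Toy model: an immutable set of files from which containers are started."""

    def __init__(self, files):
        self._files = dict(files)          # contents fixed when the image is built

    def run(self, app, storage=None):
        """Run app in a new container and return the local files at exit.

        app receives two dicts: the container's local files (a fresh copy of
        the image contents) and the attached storage (shared across runs).
        """
        local_files = dict(self._files)    # fresh writable copy for this run
        app(local_files, storage if storage is not None else {})
        return local_files


def record_run(local_files, storage):
    """Example app: count invocations locally and in attached storage."""
    local_files["/tmp/runs"] = local_files.get("/tmp/runs", 0) + 1
    storage["runs"] = storage.get("runs", 0) + 1


if __name__ == "__main__":
    image = ImageModel({"/bin/app": "binary"})
    persistent = {}
    first = image.run(record_run, persistent)
    second = image.run(record_run, persistent)
    print(first["/tmp/runs"], second["/tmp/runs"], persistent["runs"])
```

In the model, the local counter is 1 after every run because each run begins from the image contents, whereas the counter kept in attached storage reaches 2 after the second run.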
Figure 6.7 Illustration of a small base operating system in a container.
Items In A Dockerfile
• A Dockerfile specifies a sequence of steps to be taken to build a container image.
• Each item in a Dockerfile begins with a keyword.
o FROM. The FROM keyword, which specifies a base operating system to use for the image, must appear
as the first keyword in a Dockerfile. For example, to use the alpine Linux base, one specifies:
FROM alpine
o RUN. The RUN keyword specifies that a program should be run to add a new layer to the image. The
name is unfortunately confusing because the “running” takes place when the image is built rather than
when the container runs. For example, to execute the apk program during the build and request that it
add Python and Pip to the image, a programmer uses:
RUN apk add py3-pip
o ENTRYPOINT. A programmer must specify where execution begins when a container starts running.
To do so, a programmer uses the ENTRYPOINT keyword followed by the name of an executable program
in the image file system and arguments that will be passed to the program. ENTRYPOINT has two
forms: one in which arguments are passed as literal strings and one that uses a shell to invoke the initial
program. Using a shell allows variable substitution, but the non-shell option is preferred because the
program will run with process ID 1 and will receive Unix signals.
o CMD. The CMD keyword, which is related to ENTRYPOINT, has multiple forms and multiple purposes.
As a minor use, CMD can be used as an alternative to ENTRYPOINT to specify a command to be run
by a shell when the container starts; a programmer cannot specify an initial command with both CMD
and ENTRYPOINT. The main purpose of CMD focuses on providing a set of default arguments for
ENTRYPOINT that can be overridden when the container is started.
o COPY and ADD. Both the COPY and the older ADD keywords can be used to add directories and files
to the file system being constructed for the image (i.e., the file system that will be available when the
image runs in a container). COPY is straightforward because it copies files from the computer where the
image is being built into the image file system. As an example, a programmer might build a Python
script, test the script on the local computer, and then specify that the script should be copied into an
image.
o The ADD keyword allows a programmer to specify local files or give a URL that specifies a remote file.
Furthermore, ADD understands how to decompress and open archives. In many cases, however, opening
an archive results in extra, unneeded files in the image, making the image larger than necessary.
Therefore, the recommended approach has become: before building an image, download remote files to
the local computer, open archives, remove unneeded files, and use the COPY keyword to copy the files
into the image.
o EXPOSE and VOLUME. Although EXPOSE deals with Internet access and VOLUME deals with file systems, both
specify a way to connect a container to the outside world. EXPOSE specifies protocol port numbers that
the container is designed to use. For example, an image that contains a web server might specify that the
container is designed to use port 80:
EXPOSE 80
o VOLUME specifies a mount point in the image file system where an external file system can connect.
That is, VOLUME provides a way for a container to connect to persistent storage in a data center.
VOLUME does not specify a remote storage location, nor does it specify how the connection to a
remote location should be made. Instead, such details must be specified when a user starts a container.
Figure 6.8 Examples of keywords used for items in a Dockerfile.
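• EXPOSE and VOLUME leave the actual connections to be specified when a container starts. With the docker CLI, one way to supply them is through the -p (publish a port) and -v (attach storage at a mount point) options of docker run. The Python sketch below only assembles such a command line; the function name, image name, and paths are hypothetical.

```python
def docker_run_args(image, publish=(), volumes=(), arguments=()):
    """Assemble a `docker run` command line.

    publish:   (host_port, container_port) pairs, passed as -p host:container
    volumes:   (source, mount_point) pairs, passed as -v source:mount_point
    arguments: extra arguments handed to the program in the container
    """
    args = ["docker", "run"]
    for host_port, container_port in publish:
        args += ["-p", f"{host_port}:{container_port}"]
    for source, mount_point in volumes:
        args += ["-v", f"{source}:{mount_point}"]
    return args + [image, *arguments]


if __name__ == "__main__":
    # A web server image that exposes port 80 and declares a volume at /data.
    print(" ".join(docker_run_args("webserver",
                                   publish=[(8080, 80)],
                                   volumes=[("/srv/site", "/data")])))
```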
An Example Dockerfile
• Figure 6.9 contains a trivial example of a Dockerfile that specifies alpine to be the base operating system and
specifies that file /bin/echo (i.e., the Linux echo command) should be run when the container starts. When it runs
as a container, the image prints hi there. Once it runs, the echo command exits, which causes the container to exit.
FROM alpine
ENTRYPOINT ["/bin/echo", "hi there"]
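• The interaction between ENTRYPOINT, CMD, and arguments given when a container starts can be summarized with a small Python model. It covers only the non-shell form in which items are written as lists of literal strings, and the function name is hypothetical. Arguments supplied at start time replace the CMD defaults and follow the ENTRYPOINT items.

```python
def initial_command(entrypoint, cmd=(), run_arguments=()):
    """Model the command a container executes (list form only).

    The ENTRYPOINT items come first, followed by the arguments given when the
    container starts or, if none were given, by the CMD defaults.
    """
    trailing = list(run_arguments) if run_arguments else list(cmd)
    return list(entrypoint) + trailing


if __name__ == "__main__":
    # The Dockerfile in Figure 6.9: no CMD, and no arguments at start time.
    print(initial_command(["/bin/echo", "hi there"]))
    # A variant that moves the text into CMD so it can be overridden.
    print(initial_command(["/bin/echo"], cmd=["hi there"], run_arguments=["goodbye"]))
```

For the Dockerfile in Figure 6.9, the model yields /bin/echo with the single argument hi there. In the variant, the argument supplied at start time replaces the CMD default.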
Summary
• Container technology provides an alternate form of virtualization that avoids the overhead incurred when a guest
operating system runs in each VM. Although it operates much like a conventional process, a container uses
namespace mechanisms in the host operating system to remain isolated from other containers.
• To build a container for a Docker project, a programmer writes specifications in a text file (Dockerfile).
Specifications include a base operating system to use, other software that should be added to the container, an
initial program to execute when the container starts, and external network and file system connections that can
be used when an image runs in a container. To construct a container image, a programmer invokes Docker’s
build mechanism, which follows the specifications and produces an image. Once it has been created, an image
can be run in a container.
• Docker software includes two main pieces: a daemon named dockerd that runs in background, and a command
named docker that a user invokes to interact with dockerd through a command-line interface. The docker
interface allows a user to build an image or run and manage containers.