Module 4
Contents
Container fundamentals
Containers versus virtual machines
Different container technologies
Configuring a container engine
Container virtual networking
Container orchestration and clustering
Images and containers
Case study : Docker
Container fundamentals
Containers versus virtual machines

Operating system
  Virtual machine: Runs a complete operating system, including the kernel, thus requiring more system resources (CPU, memory, and storage).
  Container: Runs the user-mode portion of an operating system, and can be tailored to contain just the needed services for your app, using fewer system resources.

Guest compatibility
  Virtual machine: Runs just about any operating system inside the virtual machine.
  Container: Runs on the same operating system version as the host (Hyper-V isolation enables you to run earlier versions of the same OS in a lightweight VM environment).

Fault tolerance
  Virtual machine: VMs can fail over to another server in a cluster, with the VM's operating system restarting on the new server.
  Container: If a cluster node fails, any containers running on it are rapidly recreated by the orchestrator on another cluster node.
Different container technologies
Container runtimes
Docker was the first major open-source container offering, and
quickly emerged as a de facto standard. Now Kubernetes is
evolving as the new standard for clusters and cluster management.
Kubernetes initially supported Docker and rkt (or "rocket")
through custom code. Now, with the creation of the Container
Runtime Interface (CRI), many different container runtimes can be
plugged in, all communicating with Kubernetes through that one
interface.
What is clustering?
Container Orchestration Engines (COEs) are tools that help manage many
containers running across multiple hosts.
Introduction to COE
⇒ Containers provide users with an easy way to package and run their applications.
Packaging involves defining the libraries and tools that are necessary for a user's
application to run. These packages, once converted to images, can be used to create
and run containers.
⇒ These containers can be run anywhere, whether it's on developer laptops, QA
systems, or production machines, without any change in environment. Docker and
other container runtime tools provide the facility to manage the life cycle of such
containers.
⇒ Using these tools, users can build and manage images, run
containers, delete containers, and perform other container life cycle
operations. But these tools can only manage containers on a
single host. When we deploy our application across multiple containers
and multiple hosts, we need some kind of automation tool. This type
of automation is generally called orchestration.
Orchestration tools provide a number of features, including:
→ Provisioning and managing hosts on which containers will run
→ Pulling the images from the repository and instantiating the containers
→ Managing the life cycle of containers
→ Scheduling containers on hosts based on the host's resource availability
→ Starting a new container when one dies
→ Scaling the containers to match the application's demand
→ Providing networking between containers so that they can access each other on
different hosts
→ Exposing these containers as services so that they can be accessed from
outside
→ Health monitoring of the containers
→ Upgrading the containers
Generally, these kinds of orchestration tools provide declarative configuration in
YAML or JSON format.
⇒ These definitions carry all of the information related to containers, including
image, networking, storage, scaling, and other things.
⇒ Orchestration tools use these definitions to apply the same settings and provide
the same environment every time.
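→ As a minimal sketch (assuming Docker's Compose file format and a Swarm-mode cluster), such a declarative definition and its deployment might look like this; the file and stack names are only examples:
$ cat > docker-compose.yml <<'EOF'
version: "3"
services:
  web:
    image: nginx:latest      # container image
    ports:
      - "8080:80"            # networking
    deploy:
      replicas: 3            # scaling
EOF
$ docker stack deploy -c docker-compose.yml mystack   # applies the same settings every time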
There are many container orchestration tools available, such as Docker Machine,
Docker Compose, Kubernetes, Docker Swarm, and Apache Mesos.
The primary clustering and orchestration tools available are
Docker-Swarm, Kubernetes.
Docker Swarm
We can use Docker Swarm to make Docker work across multiple nodes,
allowing them to share containers with each other. It turns a pool of Docker
hosts into a single, virtual Docker host on which your images and containers
run.
Each node of a Docker Swarm is a Docker daemon, and all Docker daemons
interact using the Docker API. Each container within the Swarm can be
deployed and accessed by nodes of the same cluster.
How Does Docker Swarm Work?
In Swarm, containers are launched using services. A service is a group of
containers of the same image that enables the scaling of applications. Before you
can deploy a service in Docker Swarm, you must have at least one node
deployed. There are two kinds of nodes in Docker Swarm:
Manager node: Maintains the cluster state and dispatches tasks to the worker nodes
Worker node: Receives and executes tasks from the manager node
The manager node knows the status of the worker nodes in a cluster, and the
worker nodes accept tasks sent from the manager node.
Every worker node has an agent that reports on the state of the node's tasks to
the manager. This way, the manager node can maintain the desired state of the
cluster.
The worker nodes communicate with the manager node using an API over HTTP.
In Docker Swarm, services can be deployed and accessed by any node of the
same cluster.
(Diagram: Docker Swarm architecture)
Docker Swarm components
⇒ The following sections explain the various components in Docker Swarm.
Node
⇒ A node is an instance of the Docker host participating in the Swarm cluster. There
can be one or multiple nodes in a single Swarm cluster deployment.
⇒ Nodes are categorized into Manager and Worker based on their roles in the
system.
Manager node
⇒ The Swarm manager node manages the nodes in the cluster. It provides the API
to manage the nodes and containers across the cluster.
⇒ Manager nodes distribute units of work, also known as tasks, to worker nodes. If
there are multiple manager nodes, then they select a single leader to perform an
orchestration task.
Worker node
⇒ The worker node receives and executes tasks distributed by manager nodes. By
default, every manager node is also a worker node, but a manager can be configured to
run manager tasks exclusively.
⇒ Worker nodes run agents that keep track of the tasks running on them and report
on their state. The worker node also notifies the manager node about the current state
of the assigned tasks.
Tasks
⇒ A task is an individual Docker container with a command to run inside the container.
⇒ The manager assigns tasks to worker nodes. Tasks are the smallest unit of
scheduling in the cluster.
Services
⇒ A service is the interface for a set of Docker containers or tasks running across the
Swarm cluster.
Discovery service
⇒ The Discovery service stores cluster states and provides node and service
discoverability. (Service discovery is how applications and (micro)services locate each
other on a network.)
⇒ Swarm supports a pluggable backend architecture that supports etcd, Consul,
Zookeeper, static files, lists of IPs, and so on, as discovery services.
Scheduler
⇒ The Swarm scheduler schedules the tasks on different nodes in the system.
⇒ Docker Swarm comes with several built-in scheduling strategies that give users the
ability to guide container placement on nodes in order to maximize or minimize the task
distribution across the cluster.
⇒ The random strategy is also supported by Swarm. It chooses a random node to place
the task on.
Features of Docker Swarm
Some of the most essential features of Docker Swarm are:
Decentralized access: Swarm makes it very easy for teams to access and
manage the environment
High security: Any communication between the manager and client nodes
within the Swarm is highly secure
Automatic load balancing: Swarm load-balances requests within your environment, and
you can script this into how you write out and structure the Swarm environment
High scalability: Load balancing turns the Swarm environment into a highly
scalable infrastructure
Rollback: Swarm allows you to roll back a service to a previous, safe version
Swarm Mode Key Concepts
Service and Tasks
Docker containers are launched using services.
Services can be deployed in two different ways - global and replicated.
Global services run one task of the service on every available node in the
Swarm.
A service is the definition of the tasks to execute on the manager or worker nodes.
When you create a service, you specify which container image to use and which
commands to execute inside running containers.
In the replicated services model, the swarm manager distributes a specific number
of replica tasks among the nodes based upon the scale you set in the desired state.
For a replicated service, you specify the number of identical tasks you want to run.
For example, you decide to deploy an HTTP service with three replicas, each
serving the same content.
For global services, the swarm runs one task for the service on every available node
in the cluster.
A global service is a service that runs one task on every node. There is no pre-
specified number of tasks. Each time you add a node to the swarm, the orchestrator
creates a task and the scheduler assigns the task to the new node.
Good candidates for global services are monitoring agents, anti-virus scanners, or
other types of containers that you want to run on every node in the swarm.
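As a hedged sketch of the two modes with the docker service CLI (the service names and images below are only examples; busybox stands in for a real monitoring agent):
$ docker service create --name web --replicas 3 nginx                         # replicated: three identical tasks spread over the nodes
$ docker service create --name node-agent --mode global busybox sleep 86400   # global: one task on every node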
Swarm mode
⇒ In version 1.12, Docker introduced Swarm mode, built into its engine. To run a
cluster, the user needs to execute just two commands: one on the host that initializes
the cluster and one on each host that joins it:
⇒ To enter Swarm mode:
$ docker swarm init
⇒ To add a node to the cluster:
$ docker swarm join
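⇒ A slightly fuller sketch of the same flow (the address and token below are placeholders; docker swarm init prints the real join token):
# On the first host, which becomes a manager:
$ docker swarm init --advertise-addr <MANAGER-IP>
# On every other host, join the cluster as a worker using the token printed above:
$ docker swarm join --token <WORKER-JOIN-TOKEN> <MANAGER-IP>:2377
# Back on the manager, list the nodes and their roles:
$ docker node ls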
⇒ Unlike the standalone Swarm tool, Swarm mode comes with service discovery, load
balancing, security, rolling updates, scaling, and so on, built into the Docker engine itself.
⇒ Swarm mode makes the management of the cluster easy since it does not require any
orchestration tools to create and manage the cluster.
KUBERNETES
⇒ Kubernetes is a container orchestration engine created by Google, designed to
automate the deployment, scaling, and operating of containerized applications.
⇒ Kubernetes automates the deployment of your application, manages its life cycle, and
maintains and tracks resource allocation in a cluster of servers. It can run application
containers on clusters of physical or virtual machines.
⇒ It provides a unified API to deploy web applications, databases, and batch jobs.
Features of Kubernetes
→ Auto-scaling
→ Self-healing infrastructure
→ Quota management
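→ As an illustrative kubectl sketch of these features (the deployment, pod, and quota names here are assumptions, not fixed Kubernetes names):
$ kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=80   # auto-scaling
$ kubectl delete pod <some-pod>            # self-healing: the controller recreates the pod
$ kubectl create quota team-quota --hard=pods=10,cpu=4                 # quota management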
Kubernetes architecture
kube-apiserver
⇒ It exposes the Kubernetes APIs. All of the internal and external requests
go through the API server. It verifies all of the incoming requests for
authenticity and the right level of access, and then forwards the requests to
targeted components in the cluster.
etcd
⇒ etcd is used for storing all of the cluster state information by Kubernetes. etcd
is a critical component in Kubernetes. Kubernetes uses etcd to store all its data –
its configuration data, its state, and its metadata. Kubernetes is a distributed
system, so it needs a distributed data store like etcd. etcd lets any of the nodes in the
Kubernetes cluster read and write data.
kube-controller-manager
⇒ There are multiple controllers in the Kubernetes cluster, such as the node
controller, replication controller, endpoints controller, and the service account and
token controllers. These controllers run as background threads that handle routine tasks
in the cluster.
kube-scheduler
⇒ It watches all of the newly created pods (A pod is the smallest execution unit in
Kubernetes.) and schedules them to run on a node if they aren't assigned to any node.
Worker nodes
⇒ The worker nodes run the user's applications and services. There can be one or
more worker nodes in the cluster. You can add or remove nodes from the cluster to
achieve scalability in the cluster. Worker nodes also run multiple components to
manage applications.
kubelet
⇒ The kubelet is an agent that runs on each worker node. It makes sure that the
containers described in the pod specifications assigned to its node are running and
healthy, and it reports their status back to the control plane.
Pods
⇒ A pod is the smallest deployable unit in Kubernetes and groups one or more
containers. Pods have:
a unique IP address (which allows them to communicate with each other)
persistent storage volumes (as required)
configuration information that determines how a container should run.
⇒ Although most pods contain a single container, many will have a few containers
that work closely together to execute a desired function.
Concepts in Kubernetes
⇒ In the following sections, we will learn about the concepts of Kubernetes that
are used to represent your cluster.
Replica sets and replication controllers
⇒ Replica sets are the next generation of replication controllers.
⇒ A Replica Set's purpose is to maintain a stable set of replica Pods running at any
given time. As such, it is often used to guarantee the availability of a specified
number of identical Pods.
⇒ A Replication Controller ensures that a specified number of pod replicas are
running at any one time. In other words, a Replication Controller makes sure that
a pod or a homogeneous set of pods is always up and available.
⇒ A pod is ephemeral and won't be rescheduled if the node it is running on goes
down. The replica set ensures that a specific number of pod instances (or replicas)
are running at any given time.
Deployments
⇒ Deployment is high-level abstraction which creates replica sets and
pods. Replica sets maintain the desired number of pods in a running
state.
⇒ Deployment provides an easy way to upgrade, rollback, and scale up
or scale down pods by just changing the deployment specification.
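→ A hedged kubectl sketch of these operations (the deployment name and image tags are illustrative; the same effects can be achieved by editing the deployment specification):
$ kubectl create deployment web --image=nginx:1.24     # creates the deployment, replica set, and pods
$ kubectl scale deployment web --replicas=5            # scale up
$ kubectl set image deployment/web nginx=nginx:1.25    # rolling upgrade to a new image
$ kubectl rollout undo deployment/web                  # roll back to the previous revision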
Secrets
⇒ Secrets are used to store sensitive information such as usernames,
passwords, OAuth tokens, certificates, and SSH keys.
⇒ It's safer and more flexible to store such sensitive information in
secrets rather than putting it in pod templates.
⇒ Pods can refer to these secrets and use the information inside them.
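⇒ For instance, a secret can be created from literal values and then referenced by pods (the secret name and values here are only examples):
$ kubectl create secret generic db-credentials \
    --from-literal=username=appuser \
    --from-literal=password='S0me-secret'
$ kubectl get secret db-credentials -o yaml   # values are stored base64-encoded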
Labels and selectors
⇒ Labels are key-value pairs that can be attached to objects, such as pods and even
nodes.
⇒ They are used to specify the identifying attributes of objects that are meaningful
and relevant to users. Labels can be attached to objects at creation time and added
or modified later.
⇒ They are used to organize and select subsets of objects. Some examples include
environment (development, testing, production, release), stable, pike, and so on.
⇒ Labels don't provide uniqueness. Using label selectors, a client or user can
identify and subsequently manage a group of objects. This is the core grouping
primitive of Kubernetes and it is used in many situations.
⇒ Kubernetes supports two kinds of selectors: equality-based and set-based.
→ Equality-based selectors use key-value pairs to filter based on basic equality or
inequality, whereas set-based selectors are a bit more powerful and allow for the
filtering of keys according to a set of values.
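→ A brief sketch of labels and the two selector styles (the pod name and label values are illustrative):
$ kubectl label pod web-0 environment=production                  # attach a label to an existing pod
$ kubectl get pods -l environment=production                      # equality-based selector
$ kubectl get pods -l 'environment in (development, testing)'     # set-based selector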
Services
⇒ As pods are short-lived objects in Kubernetes, the IP address assigned to them
can't be relied upon to be stable for a long time. This makes the communication
between pods difficult. Hence, Kubernetes has introduced the concept of a
service.
⇒ A service is an abstraction on top of a number of pods and a policy by which to
access them, typically requiring the running of a proxy for other services to
communicate with it via a virtual IP address.
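⇒ For example, a deployment's pods can be put behind a stable virtual IP with kubectl expose (the names and ports below are assumptions for illustration):
$ kubectl expose deployment web --name=web-svc --port=80 --target-port=8080
$ kubectl get service web-svc    # shows the stable ClusterIP fronting the pods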
Volumes
⇒ Volume provides persistent storage to pods or containers. If data is not persisted
on external storage, then once the container crashes, all of its files will be lost.
⇒ Volumes also make data sharing easy between multiple containers inside the pod.
Kubernetes supports many types of volumes, and pods can use any number of
volumes simultaneously.
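⇒ A minimal sketch of a pod using an emptyDir volume, applied via a shell here-document (all names here are illustrative):
$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data      # data written here outlives container restarts within the pod
  volumes:
  - name: scratch
    emptyDir: {}
EOF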
IMAGES AND CONTAINERS
A Docker image is a collection of all of the files that make up a
software application.
→ A Docker image is a file used to execute code in a Docker container.
Docker is used to create, run and deploy applications in containers. A
Docker image contains application code, libraries, tools, dependencies and
other files needed to make an application run.
→ When a user runs an image, it can become one or many instances of
a container.
→ Each change that is made to the original image is stored in a separate
layer. To be precise, any Docker image has to originate from a base image
according to the various requirements.
A Docker container is a runtime instance of an image. From one image you
can create multiple containers (all running the same application) on
multiple Docker platforms.
A container runs as a discrete process on the host machine. Because the
container runs without the need to boot up a guest operating system it is
lightweight and limits the resources (e.g. memory) which are needed to let it
run.
A base image is the image that is used to create all of your container
images. Your base image can be an official Docker image, such as CentOS, or
you can modify an official Docker image to suit your needs, or you can create
your own base image from scratch.
⇒ Additional modules can be attached to the base image for deriving the
various images that can exhibit the preferred behavior.
→ Each time you commit to a Docker image you are creating a new layer on
the Docker image, but the original image and each pre-existing layer
remains unchanged.
→ In other words, images are typically of the read-only type.
→ If they are empowered through the systematic attachment of newer
modules, then a fresh image will be created with a new name.
→ The Docker images are turning out to be a viable base for developing and
deploying the Docker containers.
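→ A hedged illustration of how a commit adds a layer (the container ID and repository name are placeholders):
# Start a container from a base image and make a change inside it
$ sudo docker run -it ubuntu /bin/bash
# ... inside the container: install a package or add a file, then exit ...
# Committing the container records only the changed files as a new, separate layer
$ sudo docker commit <container-id> myuser/ubuntu-customized:v1
$ sudo docker history myuser/ubuntu-customized:v1   # lists the image's layers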
As an illustration, Debian can serve as the base image, and a variety of
desired capabilities in the form of functional modules can be incorporated
on top of it to arrive at multiple different images.
⇒ Every image has a unique ID. Base images can be enhanced to create
parent images, which in turn can be used for creating the child images.
⇒ As per the Docker home page, a Docker image is a read-only template.
⇒ Docker provides a simple way of building new images or updating
existing images. You can also download Docker images that other
people have already created.
⇒ The Docker images are the building components of the Docker containers.
⇒ In general, the base Docker image represents an operating system, and in the
case of Linux, the base image can be one of its distributions, such as Debian.
⇒ Each commit invariably makes a new image. This makes the number of
images go up steadily, so managing them becomes a complicated affair.
→ However, storage space is not a big challenge, because the new image
that is generated comprises only the newly added modules.
A Docker layer
⇒ A Docker layer can be either read-only or read-write. However, the top
layer of a container stack is always the read-write (writable) layer, which
hosts the running Docker container.
Docker Registry
⇒ A Docker Registry is a place where the Docker images can be stored in
order to be publicly found, accessed, and used by the worldwide
developers for quickly crafting fresh and composite applications without
any risks.
→ Using the Docker push command, you can dispatch your Docker
image to the Registry so that it is registered and deposited.
→ As a clarification, the registry is for registering the Docker images,
whereas the repository is for storing those registered Docker images in
a publicly discoverable and centralized place.
→ A Docker image is stored within a Repository in the Docker
Registry. Each Repository is unique for each user or account.
Docker Repository
⇒ A Docker Repository is a namespace that is used for storing a Docker image.
For instance, if your app is named helloworld and your username or
namespace for the Registry is thedockerbook, then the image would be stored in
the Docker Registry under the repository name thedockerbook/helloworld.
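→ Continuing that example, pushing an image to the Registry might look like this (a sketch; it assumes you are already logged in with docker login):
$ sudo docker tag helloworld thedockerbook/helloworld   # name the image under your namespace
$ sudo docker push thedockerbook/helloworld             # register and deposit it in the Registry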
Working with Docker images
⇒ As an example, we are starting a Docker container. It is standard practice
to start with the basic Hello World! application. In the following example,
we will echo Hello World! by using a busybox image, which we have
already downloaded, as shown here:
$ sudo docker run busybox echo "Hello World!"
Output : "Hello World!"
In the example, the docker run subcommand has been used for creating a
container and for printing Hello World! by using the echo command.
⇒ Now, let us check the docker pull subcommand by adding the -a
option
$ sudo docker pull -a busybox
⇒ You can easily check the images that are available on the Docker host
by running the docker images subcommand. It comes in handy here because
it reveals more detail about the :latest tag and the additional images
that were downloaded by the previous command. Let us run this
command:
$ sudo docker images
⇒ You will get the list of images, as follows:
REPOSITORY   TAG                   IMAGE ID       CREATED        VIRTUAL SIZE
busybox      ubuntu-14.04          f6169d24347d   3 months ago   5.609 MB
busybox      ubuntu-12.04          492dad4279ba   3 months ago   5.455 MB
busybox      buildroot-2014.02     4986bf8c1536   3 months ago   2.433 MB
busybox      latest                4986bf8c1536   3 months ago   2.433 MB
busybox      buildroot-2013.08.1   2aed48a4e41d   3 months ago   2.489 MB
⇒ Here is a list of the possible categories:
→ REPOSITORY: This is the name of the repository or image. In the preceding
example, the repository name is busybox.
→ TAG: This is the tag associated with the image, for example buildroot-2014.02,
ubuntu-14.04, latest. One or more tags can be associated with one image.
→ IMAGE ID: Every image is associated with a unique ID. The full image ID is
represented as a 64-hex-digit-long string.
▪ By default, the docker images subcommand shows only the first 12 hex digits.
→ CREATED: Indicates the time when the image was created.
→ VIRTUAL SIZE: Highlights the virtual size of the image.
⇒ In the preceding example, a single pull command with the -a option was
able to download five images, even though we had only specified one
image by the name of busybox.
→ This happened because each Docker image repository can have multiple
variants of the same image and the -a option downloads all the variants that
are associated with that image.
→ In the preceding example, the variants are tagged as buildroot-2013.08.1,
ubuntu-14.04, ubuntu-12.04, buildroot-2014.02 and latest.
⇒ By default, Docker always uses the image that is tagged as latest.
⇒ Each image variant can be directly identified by qualifying it with its tag.
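⇒ For example, using the variants listed above, a specific tag can be requested instead of :latest:
$ sudo docker run busybox:ubuntu-14.04 echo "Hello World!"
$ sudo docker pull busybox:buildroot-2014.02     # pulls only that single variant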
DOCKER
⇒ Docker is an open platform for developing, shipping, and running
applications.
→ Docker enables you to separate your applications from your
infrastructure so you can deliver software quickly.
⇒ With Docker, you can manage your infrastructure in the same ways you
manage your applications.
→ By taking advantage of Docker’s methodologies for shipping,
testing, and deploying code quickly, you can significantly reduce the
delay between writing code and running it in production.
Docker is a tool used to automate the deployment of an application as a lightweight
container so that the application can work efficiently in different environments.
A Docker container is a lightweight software package that consists of the dependencies
(code, frameworks, libraries, etc.) required to run an application.
THE DOCKER PLATFORM
⇒ Docker provides the ability to package and run an application in a loosely
isolated environment called a container.
⇒ The isolation and security allow you to run many containers simultaneously on
a given host.
⇒ Containers are lightweight and contain everything needed to run the
application, so you do not need to rely on what is currently installed on the host.
⇒ You can easily share containers while you work and be sure that everyone you
share with gets the same container that works in the same way.
⇒ Docker provides tooling and a platform to manage the lifecycle of your
containers:
→ Develop your application and its supporting components using
containers.
→ The container becomes the unit for distributing and testing your
application.
→ When you’re ready, deploy your application into your production
environment, as a container or an orchestrated service. This works the
same whether your production environment is a local data center, a
cloud provider, or a hybrid of the two.
WHAT CAN I USE DOCKER FOR?
Fast, consistent delivery of your applications
⇒ Docker streamlines the development lifecycle by allowing developers
to work in standardized environments, using local containers which
provide your applications and services.
→ Containers are great for continuous integration and continuous
delivery (CI/CD) workflows.
⇒ Consider the following example scenario:
→ Your developers write code locally and share their work with their
colleagues using Docker containers.
→ They use Docker to push their applications into a test environment and
execute automated and manual tests.
→ When developers find bugs, they can fix them in the development
environment and redeploy them to the test environment for testing and
validation.
→ When testing is complete, getting the fix to the customer is as simple as
pushing the updated image to the production environment.
Responsive deployment and scaling
⇒ Docker’s container-based platform allows for highly portable
workloads.
→ Docker containers can run on a developer’s local laptop, on
physical or virtual machines in a data center, on cloud providers, or in
a mixture of environments.
⇒ Docker’s portability and lightweight nature also make it easy to
dynamically manage workloads, scaling up or tearing down applications
and services as business needs dictate, in near real time.
Running more workloads on the same hardware
⇒ Docker is lightweight and fast.
→ It provides a viable, cost-effective alternative to hypervisor-based
virtual machines, so you can use more of your compute capacity to
achieve your business goals.
⇒ Docker is perfect for high density environments and for small and
medium deployments where you need to do more with fewer resources.
Docker objects
When you use Docker, you are creating and using objects: images, containers,
networks, volumes, plugins, and other objects.
You can create, start, stop, move, or delete a container using the Docker API or CLI.
You can connect a container to one or more networks, attach storage to it, or even
create a new image based on its current state.
By default, a container is relatively well isolated from other containers and its host
machine. You can control how isolated a container’s network, storage, or other
underlying subsystems are from other containers or from the host machine.
A container is defined by its image as well as any configuration options you
provide to it when you create or start it.
→ When a container is removed, any changes to its state that are not
stored in persistent storage will disappear.
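→ A brief sketch of these lifecycle operations with the Docker CLI (the container and network names are illustrative):
$ docker create --name web nginx        # create a container from an image (not yet running)
$ docker network create appnet
$ docker network connect appnet web     # connect the container to an additional network
$ docker start web
$ docker stop web
$ docker rm web                         # once removed, unsaved state in the container is gone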
A container is simply another process on your machine that has been isolated
from all other processes on the host machine. That isolation leverages kernel
namespaces and cgroups.
→ These features have been in Linux for a long time.
→ Docker has worked to make these capabilities approachable and easy to
use.
The Docker client
The Docker client (docker) is the primary way that many Docker users interact with
Docker. When you use commands such as docker run, the client sends these commands
to dockerd, which carries them out. The docker command uses the Docker API. The
Docker client can communicate with more than one daemon.
Docker registries
A Docker registry stores Docker images. Docker Hub is a public registry that
anyone can use, and Docker is configured to look for images on Docker Hub by
default. You can even run your own private registry.
When you use the docker pull or docker run commands, the required images are
pulled from your configured registry. When you use the docker push command, your
image is pushed to your configured registry.
Example: docker run command
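A minimal, illustrative invocation (the image and container names below are just examples):
$ docker run -d --name web1 -p 8080:80 nginx
$ docker run -d --name web2 -p 8081:80 nginx
# Two containers from the same image run side by side, each isolated, while sharing the host kernel.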
It is possible to run multiple containers at the same time, each with their own
installations and dependencies. Unlike virtual machines, containers share host
resources rather than fully simulating all hardware on the computer, making
containers smaller and faster than virtual machines and reducing overhead.
A container network is a form of virtualization similar to virtual machines (VMs) in
concept but with distinguishing differences. Primarily, the container method is a
form of operating system virtualization as compared to VMs, which are a form of
hardware virtualization.