Container Networking Deep Dive
Container Networking
Gaetano Borgione
Sr. Staff Engineer @ VMware
Gaetano Borgione
Senior Staff Engineer
Cloud Native Applications
VMware
SDN Technologies @ PLUMgrid
Data Center Networking @ Cisco
Passionate engineer with special interests in:
Networking Architecture
Engineering Leadership
Product Management
Customer Advocacy
+
…new Networking / Virtualization ideas !!!
Agenda
§ Containers, Microservices
§ Container Interfaces, Network Connectivity
§ Service Discovery, Load Balancing
§ Multi-Tenancy, Container Isolation, Micro-Segmentation
§ On-Premise Private Cloud design
Containers && Microservices
Containers
• A container image is a lightweight, stand-alone, executable unit of software
• Includes everything needed to run it: code, runtime, system tools, system libraries, settings
• Containerized software runs regardless of the environment (e.g. Host OS distro)
• Containers isolate software from its surroundings
– “smooth out” differences between development and staging environments
• Help reduce conflicts between teams running different software on the same infrastructure
What Developers Want: Portable, Fast, Light
What IT Ops Needs: Network Services, Data Persistence, Rich SLAs, Consistent Management + Security & Isolation
Containers “at-a-glance”
[Diagram: a server with VMs (Physical Server > Hypervisor > Guest OS > Bins/Libraries > App A / App B) next to a server with containers (Physical Server > Host OS > Container Engine > Bins/Libraries > App A / App B). Containers are isolated, but share the OS and (where appropriate) bins/libraries.]
Abstraction at the OS layer rather than hardware layer
Microservices: Application Design is changing !!!
Properties of a Microservice
ü Small code base
ü Easy to scale, deploy and throw away
ü Autonomous
ü Resilient
Benefits of a Microservices Architecture
ü A highly resilient, scalable and resource
efficient application
ü Enables smaller development teams
ü Teams free to use the right languages and
tools for the job
ü Rapid application development
Cloud Native Application
Applications built using the “Microservices” architecture pattern
[Diagram: example services – User mgmt., Payments, Inventory, Billing, Delivery, Notification, API GW, Web UI, Mobile]
• Loosely coupled distributed application
Application tier is decomposed into multiple web services
• Datastore
Each micro service typically has its own datastore
• Packaging
Each microservice is typically packaged in a “Container”
image
• Teams
Typically a team owns one or more Microservices
More on Microservices….
• Microservices != Containers
• The idea behind Microservices is to
separate functionality into small parts that
are created independently, by different teams,
and possibly even in very different languages
• Microservices communicate with each other
using language-agnostic APIs
(e.g. REST)
• The host for each Microservice could be
a VM, but containers are seen as the ideal
packaging unit to deploy a Microservice => low footprint
https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png
Challenges of running Microservices…
• Service Discovery
• Operational Overhead (100s+ of Services !!!)
• Distributed System... inherently complex
• Service Dependencies
– service fan-out
– dependency services running “hot”
• Traffic / Load each service can handle
• Service Health / Fault Tolerance
• Auto-Scale
Applications and Micro-Services
[Diagram: users on the Internet access Services A, B and C through an external network; Service A runs as instances #1–#3, Service B as instances #1–#3 and Service C as instances #1–#2, all overseen by a system administrator.]
Container Interfaces &&
Network Connectivity
Basics of Container Networking
Minimalist Networking requirements:
• IP Connectivity in Container’s Network
• IP Address Management (IPAM) and
Network Device Creation
• External Connectivity via Host NAT or
Route Advertisement
[Diagram: two hosts – bare metal or virtual machine – each providing OS networking to its containers.]
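To make those three requirements concrete, here is a minimal sketch of what any container runtime has to arrange on a Linux host, using plain ip/iptables commands; the namespace name, bridge name and 172.18.0.0/24 subnet are illustrative assumptions, not values from the slides:

    # create an isolated network stack for the "container"
    ip netns add c1
    # veth pair: one end stays on the host, the other moves into the namespace
    ip link add veth-c1 type veth peer name c1-eth0
    ip link set c1-eth0 netns c1
    # a Linux bridge acts as the container network and default gateway
    ip link add br0 type bridge
    ip link set br0 up
    ip addr add 172.18.0.1/24 dev br0
    ip link set veth-c1 master br0
    ip link set veth-c1 up
    # IPAM: hand the container an address and a default route
    ip netns exec c1 ip addr add 172.18.0.2/24 dev c1-eth0
    ip netns exec c1 ip link set c1-eth0 up
    ip netns exec c1 ip link set lo up
    ip netns exec c1 ip route add default via 172.18.0.1
    # external connectivity via host NAT (the alternative is advertising 172.18.0.0/24 upstream)
    iptables -t nat -A POSTROUTING -s 172.18.0.0/24 ! -o br0 -j MASQUERADE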
Container Interfaces &&
Network Connectivity
Docker
Docker is a “Shipping Container” for Code
Docker: The Container Network Model (CNM) Interfacing
• Sandbox
– A Sandbox contains the configuration of a container's network stack. This includes management of the
container's interfaces, routing table and DNS settings. An implementation of a Sandbox could be a
Linux Network Namespace, a FreeBSD Jail or other similar concept.
• Endpoint
– An Endpoint joins a Sandbox to a Network. An implementation of an Endpoint could be a veth pair, an
Open vSwitch internal port or similar
• Network
– A Network is a group of Endpoints that are able to communicate with each-other directly. An
implementation of a Network could be a VXLAN Segment, a Linux bridge, a VLAN, etc.
[Diagram: three container hosts, each running a container (Backend, App, Frontend) inside its own network sandbox; endpoints attach the sandboxes to the Backend and Frontend networks, and a GW bridge on every host provides the path to the external network.]
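As a rough illustration of how the CNM primitives surface in the Docker CLI (the network names, container name and subnet below are made up for the example): creating networks and connecting a container to them makes the engine build one sandbox and one endpoint per attached network.

    docker network create --driver bridge --subnet 10.0.9.0/24 backend   # a Network: a group of Endpoints
    docker network create --driver bridge frontend
    docker run -d --name app --network backend nginx                     # engine creates the Sandbox (netns) plus an Endpoint on "backend"
    docker network connect frontend app                                  # a second Endpoint joins the same Sandbox to "frontend"
    docker inspect --format '{{json .NetworkSettings.Networks}}' app     # shows one endpoint per attached network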
Container Network Model (CNM)
• The intention is for CNM (aka libnetwork) to implement and use any kind of networking
technology to connect and discover containers
• Partitioning, Isolation, and Traffic Segmentation are achieved by dividing network addresses
• CNM does not specify one preferred methodology for any network overlay scheme
Docker networking – Using the defaults
[Diagram: a Docker host (VM) with eth0 at 192.168.178.100 on the 192.168.178.0/24 network and the Linux bridge ‘docker0’ at 172.17.42.1/16; containers at 172.17.0.1/16 and 172.17.0.2/16 attach to docker0 through veth interfaces (veth0f00eed, veth27e6b05), with Linux kernel routing and iptables firewall/NAT rules between the bridge and eth0.]
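A few commands you could run on such a host to inspect the default setup sketched above; the interface names and addresses will of course differ per installation:

    docker run -d --name web nginx        # no --network flag: container lands on the default "bridge" network
    docker network inspect bridge         # shows the 172.17.0.0/16 subnet and the per-container endpoints
    ip addr show docker0                  # the Linux bridge on the host side
    bridge link                           # one vethXXXXXXX interface enslaved to docker0 per running container
    iptables -t nat -S POSTROUTING        # the MASQUERADE rule that gives containers outbound NAT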
Docker Swarm && libnetwork – Built-In Overlay model
[Diagram: a Swarm master and admin clients (“docker network …”) use distributed key-value store node(s); the master writes the available global overlay networks into the KV store, the nodes write the endpoints they see with all their details, and each Swarm node (Docker host) creates the networks found in the KV store as new Linux bridges (docker_gwbridge plus one bridge per user-defined network), all on top of the datacenter or public cloud provider network.]
Each container has two interfaces:
• eth0 = plugs into the overlay
• eth1 = plugs into a local bridge (docker_gwbridge) for NAT internet / uplink access
Overlay networks are implemented with fixed / static MAC-to-VTEP mappings.
Docker Networking – key points
• Docker adopts the Container Network Model (CNM), providing the following contract
between networks and containers:
• All containers on the same network can communicate freely with each other
• Multiple networks are the way to segment traffic between containers and should be supported by all drivers
• Multiple endpoints per container are the way to join a container to multiple networks
• An endpoint is added to a network sandbox to provide it with network connectivity
• Docker Engine can create overlay networks on a single host. Docker Swarm can create
overlay networks that span hosts in the cluster
• A container can be assigned an IP on an overlay network. Containers that use the same
overlay network can communicate, even if they are running on different hosts
• By default, nodes in the swarm encrypt traffic between themselves and other nodes.
Connections between nodes are automatically secured through TLS authentication with
certificates
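A minimal sketch of that overlay workflow on a Swarm cluster (service and network names are illustrative, and exact flags depend on the Docker version in use):

    docker swarm init                                        # first node becomes a manager
    docker network create -d overlay --attachable demo-net   # VXLAN-backed network, published to the cluster store
    docker service create --name api --replicas 3 --network demo-net nginx
    docker service create --name client --network demo-net alpine sleep 1d
    # any task attached to demo-net can now reach the "api" service by name, across hosts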
Container Interfaces &&
Network Connectivity
Kubernetes
Kubernetes Architectural overview
[Diagram: the Kubernetes Master – components colocated or spread across machines – exposes a REST interface (pods, services, replication controllers) with authentication/authorization, runs the scheduler (scheduling actuator) and the Controller Manager (replication controller, etc.), and keeps state in distributed key-value store node(s) (etcd); admin clients (kubectl, …) call the APIs, users access the services, and each Kubernetes node (minion) runs the Docker engine, Kubelet, Kube-Proxy, cadvisor, skyDNS and the pods, including the pause container.]
Quick Overview of Kubernetes
Kubernetes (k8s) = Open Source Container Cluster Manager
• Pods: tightly coupled group of containers
• Replication controller: ensures that a specified number of
pod "replicas" are running at any one time.
• Networking: Each pod gets its own IP address
• Service: Load balanced endpoint for a set of pods with internal and external
IP endpoints
• Service Discovery: Using env variable injection or SkyDNS with the Service
• Uses etcd as distributed key-value store
• Has its roots in ‘borg’, Google’s internal container cluster management
Kubernetes Node (Minion) – Docker networking details
• Traffic destined to a POD is routed by the IaaS network to the Kubernetes node that ‘owns’ the subnet:
ip route 10.24.1.0/24 10.240.0.3
ip route 10.24.2.0/24 10.240.0.4
• Each POD uses one single IP from the node’s IP range
• Every container in the POD shares the same IP
[Diagram: each Kubernetes node (minion) has eth0 on the IaaS network (e.g. 10.240.0.3) and a Linux bridge cbr0 (10.24.1.1 on 10.24.1.0/24); the pods – each with a pause container – attach to cbr0 and take 10.24.1.2, 10.24.1.3 and 10.24.1.4, with kube-proxy and the iptables firewall in the path.]
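Translated into commands, the per-node routes above could be installed like this (addresses taken from the slide; in practice the cloud route tables or a route controller program them for you):

    # on the upstream router / IaaS route table: send each pod subnet to the node that owns it
    ip route add 10.24.1.0/24 via 10.240.0.3
    ip route add 10.24.2.0/24 via 10.240.0.4
    # on the node itself: pods hang off the cbr0 bridge and draw addresses from that node's /24
    ip addr show cbr0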
Container Network Interface (CNI)
• Kubernetes uses the Container Network Interface (CNI) specification and plug-ins to
orchestrate networking
• Unlike CNM, CNI can reach other containers’ IP addresses directly, without
resorting to network address translation (NAT)
• Every time a POD is initialized or removed, the default CNI plug-in is called with the default
configuration
• This CNI plug-in creates a pseudo interface, attaches it to the relevant underlay network, sets
IP Address / Routes and maps it to the POD namespace
/etc/cni/net.d/10-bridge.conf
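The slide references /etc/cni/net.d/10-bridge.conf; the actual listing is not reproduced here, but a typical bridge-plugin configuration of that kind looks roughly like this (the network name, bridge name and subnet are assumptions for the example):

    # illustrative example of a CNI bridge + host-local IPAM configuration
    cat <<'EOF' > /etc/cni/net.d/10-bridge.conf
    {
      "cniVersion": "0.3.1",
      "name": "mynet",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.24.1.0/24",
        "routes": [ { "dst": "0.0.0.0/0" } ]
      }
    }
    EOF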
Kubernetes Networking – key points
• Kubernetes adopts the Container Network Interface (CNI) model to provide a
contract between networks and containers
• From a user perspective, provisioning networking for a container involves two steps:
ØDefine the network JSON
ØConnect container to the network
• Internally, CNI provisioning involves three steps:
ØThe runtime creates a network namespace and gives it a name
ØIt invokes the CNI plugin specified in the “type” field of the network JSON; the type field names the
plugin being used, so CNI invokes the corresponding binary
ØThe plugin code in turn creates a veth pair, checks the IPAM type and data in the JSON, invokes the
IPAM plugin, gets an available IP, and finally assigns the IP address to the interface
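Those internal steps can also be reproduced by hand, which is a useful way to see the contract: the runtime passes its parameters as CNI_* environment variables and feeds the network JSON on stdin (the container ID and netns path below are made-up examples):

    ip netns add demo-ns                      # step 1: a named network namespace
    CNI_COMMAND=ADD \
    CNI_CONTAINERID=demo123 \
    CNI_NETNS=/var/run/netns/demo-ns \
    CNI_IFNAME=eth0 \
    CNI_PATH=/opt/cni/bin \
    /opt/cni/bin/bridge < /etc/cni/net.d/10-bridge.conf   # steps 2+3: the "type" plugin creates the veth, calls IPAM, assigns the IP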
Container Interfaces &&
Network Connectivity
Summary
Container Networking Specifications
Container Networking Model
CNM
• Specification proposed by Docker,
adopted by projects such as
libnetwork
• Plugins built by projects such as
Weave, Project Calico and Kuryr
• Supports only Docker runtime
Container Networking Interface
CNI
• Specification proposed by CoreOS
and adopted by projects such as
Kubernetes, Cloud Foundry and
Apache Mesos
• Plugins built by projects such as
Weave, Project Calico, Contiv
Networking
• Supports any container runtime
CNI and CNM commonalities…
• CNI and CNM models are both driver-based
– provide “freedom of selection” for a specific type of container networking
• Multiple Network drivers can be active and used concurrently
– 1-1 mapping between network type and network driver
• Containers are allowed to join one or more networks
• Container runtime can launch the network in its own namespace
– delegating to the network driver the responsibility of connecting the container to
the network
Container Networking Specifications (cont.)
Service Discovery && Load Balancing
Service Anatomy
[Diagram: a Service is made up of a Load Balancer, a Service Registry and N service instances (Instance #1 … Instance #N).]
Client vs Server side Service discovery
Client Discovery
• Client talks to Service registry and does load balancing.
• Client service needs to be Service registry aware.
eg: Netflix OSS
Server Discovery
• Client talks to load balancer and load balancer talks to Service registry.
• Client service need not be Service registry aware.
eg: Consul, AWS ELB, K8s, Docker
What should Service Discovery provide ?
• Discovery
– Services need to discover each other dynamically, to get IP address and port detail to
communicate with other services in the cluster
– Service Registry maintains a database of services and provides an external API
(HTTP/DNS). Typically implemented as a distributed key-value store
– Registrator registers services dynamically to Service registry by listening to Service
creation and deletion events
• Health check
– Monitors Service Instance health dynamically and updates the Service registry
appropriately
• Load balancing
– Traffic destined to a particular service should be dynamically load balanced to “healthy”
instances providing that service
Health Check options…
• Script based check
– User provided script is run periodically to verify health of the service.
• HTTP based check
– Periodic HTTP based check is done to the service IP and endpoint address.
• TCP based check
– Periodic TCP based check is done to the service IP and specified port.
• Container based check
– Health check application is available as a Container. Health Check Manager invokes the
Container periodically to do the health-check.
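As one concrete example of the HTTP and TCP variants, this is roughly how they look as probes on a Kubernetes pod (the image, paths and ports are placeholders, not taken from the slides):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      containers:
      - name: web
        image: nginx
        livenessProbe:              # HTTP based check against the instance
          httpGet:
            path: /
            port: 80
          periodSeconds: 10
        readinessProbe:             # TCP based check on the service port
          tcpSocket:
            port: 80
          periodSeconds: 5
    EOF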
Service Discovery && Load Balancing
Docker
Service Discovery
Service Discovery in a nutshell
Internal Load Balancer - IPVS
• IPVS (IP Virtual Server) implements transport-layer load balancing inside the Linux kernel, so
called Layer-4 switching
• It’s based on Netfilter and supports TCP, SCTP & UDP, over both IPv4 and IPv6
• IPVS is dynamically configurable, supports 8+ balancing methods, provides health checking
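Driven by hand with ipvsadm, the same idea looks roughly like this (the VIP and backend addresses are illustrative; Swarm programs these entries for you):

    ipvsadm -A -t 10.0.0.10:80 -s rr               # virtual service (VIP:port), round-robin scheduling
    ipvsadm -a -t 10.0.0.10:80 -r 10.24.1.2:80 -m  # add real servers; -m = NAT (masquerading) forwarding
    ipvsadm -a -t 10.0.0.10:80 -r 10.24.1.3:80 -m
    ipvsadm -L -n                                  # list virtual services and their backends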
Ingress Load Balancing
Service Discovery && Load Balancing
Kubernetes
Service Discovery
• Kubernetes provides two options for internal service discovery:
– Environment variable: When a new Pod is created, environment variables from older services
can be imported. This allows services to talk to each other. This approach enforces ordering in
service creation.
– DNS: Every service registers to the DNS service; using this, new services can find and talk to
other services. Kubernetes provides the kube-dns service for this.
• Kubernetes provides several ways to expose services to the outside:
– NodePort: In this method, Kubernetes exposes the service through special ports (30000-32767)
of the node IP address.
– Loadbalancer: In this method, Kubernetes interacts with the cloud provider to create a load
balancer that redirects the traffic to the Pods. This approach is currently available with GCE
– Ingress Controller : Since Kubernetes v1.2.0 it’s possible to use Kubernetes ingress which
includes support for TLS and L7 http-based traffic routing
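For instance, a NodePort Service manifest could look like this (the name, selector and ports are placeholders for the example):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      type: NodePort
      selector:
        app: myapp
      ports:
      - port: 80          # cluster-internal virtual IP port
        targetPort: 8080  # container port on the selected pods
        nodePort: 30080   # exposed on every node, must fall in 30000-32767
    EOF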
Internal Load Balancing
• Service name gets mapped to a Virtual IP and port using SkyDNS
• Kube-proxy watches Service changes and updates iptables. Virtual IP to Service IP / port
remapping is achieved using iptables
• Kubernetes does not use DNS-based load balancing to avoid some of the known issues
associated with it
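On a node you can see the result of that remapping; the chain names are kube-proxy's own (KUBE-SERVICES / KUBE-SVC-* / KUBE-SEP-*):

    iptables -t nat -L KUBE-SERVICES -n | head   # one entry per Service virtual IP, jumping to a KUBE-SVC-* chain
    iptables -t nat -L KUBE-NODEPORTS -n         # NodePort entries
    # each KUBE-SVC-* chain spreads connections across KUBE-SEP-* endpoint chains (one per pod)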
Internal Load Balancing (cont.)
Ingress Load Balancing with an Ingress Controller
• An Ingress is a collection of rules that allow inbound connections to reach the cluster services.
• It can be configured to give services externally-reachable URLs, load balance traffic, terminate
SSL, offer name-based virtual hosting, etc.
– Users request ingress by POSTing the Ingress resource to the API server.
• In order for the Ingress resource to work, the cluster must have an Ingress controller running.
The Ingress controller is responsible for fulfilling the Ingress dynamically by watching the
ApiServer’s /ingresses endpoint.
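A minimal Ingress resource of that era (the extensions/v1beta1 API; the host name comes from the later slide, while the service and secret names are placeholders) would look something like:

    kubectl apply -f - <<'EOF'
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: myapp-ingress
    spec:
      tls:
      - secretName: myapp-tls        # terminate SSL with this certificate
      rules:
      - host: myapp.k8s.com          # name-based virtual hosting
        http:
          paths:
          - path: /
            backend:
              serviceName: myapp
              servicePort: 80
    EOF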
Networking for Services
[Diagram: Node 1 (uplink 10.114.214.100/24) hosts ProjA-1 (10.10.10.2) and ProjB-1 (10.10.10.3) on a guest vSwitch with subnet 10.10.10.0/24; Node 2 (uplink 10.114.214.101/24) hosts ProjA-2 (10.10.20.2) and ProjB-2 (10.10.20.3) on 10.10.20.0/24; node-specific routes send 10.10.10.0/24 to 10.114.214.100 and 10.10.20.0/24 to 10.114.214.101, and an Edge LB resolves myapp.k8s.com to {10.10.10.2, 10.10.20.2}.]
• K8s default networking configures
• Routable IP per POD
• Subnet per node / minion
• K8s Service provides East-West Load Balancing
• Provides DNS based service discovery – Service Name to IP
• Network Security Policy – in beta
• Not in K8s scope
• Edge LB – e.g. external to frontend pods
• Routing of a subnet to k8s node
Multi-Tenancy
Container Isolation
Micro-Segmentation
Multi-Tenancy and Application tiering
Multi-Tenancy and Application tiering (cont.)
Example of Multi-Tenancy Model
[Diagram: Tenants A, B and C, each with projects and quotas spread across Kubernetes, Pivotal CF and Docker running on VMs – Project A (250 GB, 100 vCPU; access for paulf, jamesz and tinga), Project B (200 GB, 200 vCPU; access for kitc, mikep and mikew), Project C (250 GB, 150 vCPU; access for stegeler and francisg), Project D (300 GB, 100 vCPU; access for tinga), Project E (600 GB, 600 vCPU; access for martijnb).]
Multi-Tenancy, Namespaces && Microsegmentation
[Diagram: users on the Internet reach services through the external network; inside the cluster, Tenant 1 and Tenant 2 are mapped to Namespace 1 and Namespace 2 respectively.]
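One way to express that per-namespace isolation in Kubernetes is the NetworkPolicy resource mentioned on the earlier slide as being in beta (shown here in its later networking.k8s.io/v1 form; the namespace name is illustrative, and enforcement depends on the network plugin in use):

    kubectl apply -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-from-other-namespaces
      namespace: tenant-1
    spec:
      podSelector: {}            # applies to every pod in the namespace
      ingress:
      - from:
        - podSelector: {}        # only pods from the same namespace may connect
    EOF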
On-Premise Private Cloud design
From Physical Layout…
[Diagram: Data Center Core connected to the Internet / corporate network.]
…to Overlay-based Networking Model…
• Neutron plugin talks to the SDN Controller via vendor APIs
• SDN Controller manages vSwitches in the Hypervisors
• VMware NSX, Contrail, Nuage, Midokura, …
…to Cluster Deployment on Logical Networks…
[Diagram: the Kubernetes master VM and minion VMs attach to a “Cluster Management Nodes” logical switch; pods are grouped per namespace onto their own logical switches – a kube-system POD logical switch (etcd, API Srv, KubeDNS, kube-proxy), a namespace ‘demo’ POD logical switch (Pods 1–6) and a namespace ‘foo’ POD logical switch (Pods 1–6); a logical router connects these switches to an edge router and on to the Internet / corporate network.]
…to Multi-Cluster / Multi-Tenancy deployments
Multi-Tenancy deployment and Networking constraints
Q & A
Thank You!
@cloudnativeapps
#vmwcna
vmware.github.io
blogs.vmware.com/cloudnative
microservices@vmware.com