DCUS17: Docker networking deep dive
Docker Networking deep dive
Application-Plane to Data-Plane
Madhu Venugopal
Sr. Director Networking,
Docker Inc
Network Layers, Planes and Dimensions
Application dimension
“OSI is a beautiful dream, and TCP/IP is living it!” - Einar Stefferud
OSI Model: Application, Presentation, Session, Transport, Network, Data Link, Physical
TCP/IP Model: Application (HTTP, DNS, SSH, DHCP, …), Transport (TCP, UDP), Network (IPv4, IPv6, ARP), Data Link (Ethernet)
Infrastructure dimension
• Management plane: user/operator/tools managing the network infrastructure (UX, CLI, REST-API, SNMP, …)
• Control plane: signaling between network entities to exchange reachability states; distributed (OSPF, BGP, gossip-based) or centralized (OpenFlow, OVSDB)
• Data plane: actual movement of application data packets (IPTables, IPVS, OVS-DP, DPDK, BPF, routing tables, …)
[Diagram: each layer of the stack (Application, Transport, Network, Data Link) viewed through the management, control, and data planes]
Docker networking
• Provides portable application services
• Service-Discovery
• Load-Balancing
• Built-in and pluggable network drivers
• Overlay, macvlan, bridge
• Remote Drivers / Plugins
• Built-in Management plane
• API, CLI
• Docker Stack / Compose
• Built-in distributed control plane
• Gossip based
• Encrypted Control & Data plane
Deep dive
Application Stack

version: "3"
services:
  web:
    ports:
      - "8080:80"
    networks:
      - frontend
    deploy:
      replicas: 2
  app:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend
networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    driver_opts:
      encrypted: "true"
Stack Deploy

$ docker stack deploy -c d.yml demo
Creating network demo_frontend
Creating network demo_backend
Creating service demo_web
Creating service demo_app
Creating service demo_db
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
n5myqlubepvl demo_backend overlay swarm
4m5e9hn5x0xx demo_frontend overlay swarm
$ docker service ls
ID NAME MODE REPLICAS
69rwee5mbbzm demo_web replicated 2/2
gkwx4z4ksrz1 demo_app replicated 1/1
4m5e9hn5x0xx demo_db replicated 1/1
Application Stack
$ docker stack deploy -c d.yml demo
Creating service demo_web
Creating service demo_app
Creating service demo_db
Creating network demo_frontend
Creating network demo_backend
Day in life of a Stack Deploy
• Manager-only operation
• Reserves network resources at the mgmt plane, such as subnet and vxlan-id; no impact to the data-plane yet
• Manager reserves service and task resources: service VIP and task IPs (verified with the commands below)
• Tasks scheduled to swarm workers
• Network-scoped service registration on the Docker DNS server
  • Service name -> VIP
  • Task name -> task IP
  • tasks.<service-name> -> all task IPs
• Exchange SD & LB states via Gossip
• Prepare data-plane
  • Call driver APIs and exchange driver states via Gossip
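The reservations are easy to check once the stack is up; a short sketch, run on a manager that also hosts at least one task of the demo stack (Go-template filters shown here are one way to pull the fields out):

$ docker service inspect demo_web --format '{{json .Endpoint.VirtualIPs}}'
# service VIP reserved per attached network
$ docker network inspect demo_frontend --format '{{json .IPAM.Config}}'
# subnet reserved by the allocator
$ docker network inspect demo_frontend --format '{{index .Options "com.docker.network.driver.overlay.vxlanid_list"}}'
# vxlan-id reserved for the overlay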
Resource Allocation
[Diagram: Network Create, Service Create and Task Create flow through the manager's Orchestrator, Allocator, Scheduler and Dispatcher; tasks are then dispatched to Worker1 and Worker2, each running the Engine with Libnetwork]
• Centralized resource and policy definition
• Networks are a definition of policy
• Central resource allocation (IP subnets, addresses, VNIs)
• Can mutate state as long as managers are available
De-centralized events
[Diagram: a swarm-scope gossip cluster spanning all workers, and a network-scope gossip cluster spanning only the workers that participate in a given network (W1–W5)]
• Eventually consistent
• State dissemination through de-centralized events
  • Service registration
  • Load-balancer configs
  • Routing states
• Fast convergence: ~O(log n)
• Highly scalable
• Continues to function even if all managers are down
Gossip
State dissemination
[Diagram: Node A creates state and broadcasts the change to three random nodes (C, D, E) in the network scope; each rebroadcasts, so nine more nodes receive it and eventually the entire cluster does. A node accepts a state update only if the entry's Lamport time is greater than the Lamport time of the existing entry. In addition, each node does a periodic bulk sync with a random node in the network scope.]
[Diagram: state created by the demo stack across three workers]
Worker1 runs task1.web; Worker2 runs task2.web and task1.app; Worker3 runs task1.db. The demo_frontend overlay network (vxlan-id 4097) spans Worker1 and Worker2; the demo_backend overlay network (vxlan-id 4098) spans Worker2 and Worker3. Every worker runs a Docker DNS server, reached from inside each container through the embedded resolver at 127.0.0.11.

Worker1, demo_frontend
  Service Discovery states: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9
  Routing states: 10.0.1.6 : {Worker2, 4097}, 10.0.1.9 : {Worker2, 4097}

Worker2, demo_frontend
  Service Discovery states: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9
  Routing states: 10.0.1.5 : {Worker1, 4097}

Worker2, demo_backend
  Service Discovery states: db 10.0.2.4 (vip), app 10.0.2.8 (vip), task1.db 10.0.2.5, task1.app 10.0.2.6
  Routing states: 10.0.2.5 : {Worker3, 4098}

Worker3, demo_backend
  Service Discovery states: db 10.0.2.4 (vip), app 10.0.2.8 (vip), task1.db 10.0.2.5, task1.app 10.0.2.6
  Routing states: 10.0.2.6 : {Worker2, 4098}

These service-discovery and routing states are exchanged between the workers via gossip.
Troubleshooting Control-Plane
$ docker network inspect -v demo_frontend
[
{
"Name": “demo_frontend",
"Id": "m669nibgiwc0mfleq8geaa6mk",
"Created": "2017-04-12T13:18:58.049831936Z",
"Scope": "swarm",
"Driver": “overlay",
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
…
…
"Peers": [
{
"Name": "ip-172-31-28-108",
"IP": "172.31.28.108"
},
{
"Name": "ip-172-31-46-47",
"IP": "172.31.46.47"
},
]
Troubleshooting Control-Plane
"Services": {
"web": {
"VIP": “10.1.0.6”,
"LocalLBIndex": 5,
"Tasks": [
{
"Name": “web.1",
"EndpointID": "1a5323d0e94c",
"EndpointIP": "10.1.0.7",
"Info": {
"Host IP": "172.31.28.108"
}
Troubleshooting Control-Plane
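The verbose output can also be filtered down to the piece being debugged; a small sketch, assuming jq is available on the host:

$ docker network inspect -v demo_frontend | jq '.[0].Peers'
# control-plane peers participating in this network
$ docker network inspect -v demo_frontend | jq '.[0].Services'
# per-service VIP, local LB index and task endpoints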
Service Discovery
[Diagram: the same three-worker view as above; each worker's Docker DNS server holds the network-scoped service-discovery states (service name -> VIP, task name -> task IP) and routing states for the networks it participates in, containers reach it through the embedded resolver at 127.0.0.11, and the states are kept in sync via gossip.]
Dissecting the DNS lookup
[Diagram: task1.web has "nameserver 127.0.0.11" in /etc/resolv.conf. When it resolves "app", the DNS A-record query is sent to 127.0.0.11:53, where an IPTables DNAT rule ({127.0.0.11, 53}) hands it to the Docker DNS server embedded in the Docker daemon. The DNS server holds the network's records: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9, task2.app 10.0.1.10.]
Dissecting the DNS lookup (response)
[Diagram: the Docker DNS server answers the A-record query for "app" with the service VIP 10.0.1.8, and the response travels back through the same {127.0.0.11, 53} DNAT rule to task1.web.]
Dissecting the DNS-rr lookup
[Diagram: when the service is created with dns-rr endpoint mode there is no VIP; the Docker DNS server answers the A-record query for "app" with the task IPs themselves, app : [10.0.1.9, 10.0.1.10], while the task records (task1.app 10.0.1.9, task2.app 10.0.1.10) stay the same. Both modes can be verified below.]

docker service create --name=app --endpoint-mode=dns-rr demo/my-app
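A sketch of checking the lookup from inside a task's namespace, borrowing the netshoot image; it assumes a demo_web task is running on this node, and the helper variable C is illustrative:

$ C=$(docker ps -q -f name=demo_web | head -1)
$ docker run -it --rm --network container:$C nicolaka/netshoot drill app
# VIP mode: a single A record carrying the service VIP
$ docker run -it --rm --network container:$C nicolaka/netshoot drill tasks.app
# one A record per task IP behind the service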
Dataplane
$ docker info
…
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge contiv/v2plugin:latest host macvlan null overlay
Swarm: active
Drivers provide data-plane
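The loaded network drivers can be pulled out of the daemon directly; a small sketch:

$ docker info --format '{{.Plugins.Network}}'
# built-in drivers plus any installed remote drivers
$ docker plugin ls
# managed (v2) plugins such as contiv/v2plugin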
What is Docker Overlay Networking
The overlay driver enables simple and secure multi-host networking.
[Diagram: containers A–F spread across Docker hosts 1–3, all attached to one overlay network; all containers on the overlay network can communicate.]
Docker Overlay
• The overlay driver uses VXLAN technology
• A VXLAN tunnel is created on top of underlay network(s)
• At each end of the tunnel is a VXLAN tunnel end point (VTEP)
• The VTEP performs encapsulation and de-encapsulation
• The VTEP exists in the Docker Host's network namespace (the underlay ports it needs are sketched below)
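For the tunnel and its control plane to form, the underlay only has to pass a handful of ports between the hosts; a minimal sketch, with the network name demo_secure chosen for illustration:

# reachability required between Docker hosts:
#   2377/tcp        swarm cluster management
#   7946/tcp+udp    control-plane gossip
#   4789/udp        VXLAN data plane (VTEP)
#   ESP (proto 50)  only when the overlay is created with encryption
$ docker network create -d overlay --opt encrypted --attachable demo_secure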
Building an Overlay Network (more detailed)
[Diagram: Docker Host 1 (172.31.1.5) and Docker Host 2 (192.168.1.25) each contain a network namespace holding a bridge Br0 and a VTEP bound to :4789/udp; a veth pair connects the bridge to the container (C1: 10.0.0.3, C2: 10.0.0.4). The VXLAN tunnel between the two VTEPs carries container traffic across the Layer 3 IP transport (underlay) network.]
The Ten Commandments
1. docker network <commands>
2. nsenter --net=<net-namespace>
3. tcpdump -nnvvXXS -i <interface> port <port>
4. iptables -nvL -t <table>
5. ipvsadm -L
6. ip <commands>
7. bridge <commands>
8. drill
9. netstat -tulpn
10. iperf <commands>
All-in-one tools container: https://github.com/nicolaka/netshoot
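All of the above ship in the netshoot image, so nothing has to be installed on the host or in the application containers; typical invocations:

$ docker run -it --rm --net host nicolaka/netshoot
# troubleshoot from the host's network namespace
$ docker run -it --rm --net container:<container-id> nicolaka/netshoot
# troubleshoot from a specific container's namespace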
root@my-host $ docker network ls
NETWORK ID NAME DRIVER SCOPE
jm1eohsff6b4 demo_default overlay swarm
a5f124aef90b docker_gwbridge bridge local
root@my-host $ ls /var/run/docker/netns
1-jm1eohsff6 1-o2hnj2jm1f 2229639766c2 79f0ad997956 ingress_sbox
root@my-host $ nsenter --net=/var/run/docker/netns/1-jm1eohsff6
root@my-host $ brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.3a87525fe051 no vxlan0
veth0
veth1
Overlay dataplane
root@my-host $ ip -d link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
mode DEFAULT group default
link/ether 3a:87:52:5f:e0:51 brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 addrgenmode eui64
root@my-host $ ip -d link show veth0
17: veth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
master br0 state UP mode DEFAULT group default
link/ether be:dc:c5:da:8c:0d brd ff:ff:ff:ff:ff:ff link-netnsid 2
promiscuity 1
veth
bridge_slave state forwarding priority 32 cost 2 hairpin off guard off
root_block off fastleave off learning on flood on addrgenmode eui64
Overlay dataplane
root@my-host $ ip -d link show vxlan0
14: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master
br0 state UNKNOWN mode DEFAULT group default
link/ether f6:ae:70:27:6c:9c brd ff:ff:ff:ff:ff:ff link-netnsid 0
promiscuity 1
vxlan id 4097 srcport 0 0 dstport 4789 proxy l2miss l3miss ageing 300
bridge_slave state forwarding priority 32 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on addrgenmode eui64
Overlay dataplane
root@my-host $ ip -s neighbor show
10.0.0.6 dev vxlan0 lladdr 02:42:0a:00:00:06 used 1100/1100/1100 probes 0 PERMANENT
10.0.0.3 dev vxlan0 lladdr 02:42:0a:00:00:03 used 1101/1101/1101 probes 0 PERMANENT
root@my-host $ bridge fdb show
…
f6:ae:70:27:6c:9c dev vxlan0 vlan 1 master br0 permanent
02:42:0a:00:00:03 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
02:42:0a:00:00:06 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
be:dc:c5:da:8c:0d dev veth0 vlan 1 master br0 permanent
3a:87:52:5f:e0:51 dev veth1 vlan 1 master br0 permanent
…
Overlay dataplane
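To watch the encapsulation on the wire, capture on the underlay interface and inside the overlay namespace; a sketch, reusing the interface and namespace names from the examples above:

root@my-host $ tcpdump -nn -i eth0 udp port 4789
# outer VXLAN/UDP frames exchanged between VTEPs
root@my-host $ nsenter --net=/var/run/docker/netns/1-jm1eohsff6 tcpdump -nn -i vxlan0
# decapsulated overlay traffic as seen by the VTEP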
Inside container netns
[Diagram: task1.web on Worker1, task2.web and task1.app on Worker2, task1.db on Worker3. The demo_frontend and demo_backend overlay networks carry east-west traffic between tasks, while each host's docker_gwbridge gives tasks north-south connectivity to the L2/L3 underlay network.]
Inside container netns
root@my-host $ docker inspect demo_app.1.d35s03a7xryoeta34lqys1v5j | grep Key
"SandboxKey": "/var/run/docker/netns/2229639766c2",
root@my-host $ ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0a:00:00:08
inet addr:10.0.0.8 Bcast:0.0.0.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
eth1 Link encap:Ethernet HWaddr 02:42:ac:a8:01:42
inet addr:172.168.1.66 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
Inside container netns
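The same interfaces can be inspected from the host by entering the task's sandbox namespace (the SandboxKey resolved above):

root@my-host $ nsenter --net=/var/run/docker/netns/2229639766c2 ip -br addr
# eth0 on the overlay network, eth1 on docker_gwbridge
root@my-host $ nsenter --net=/var/run/docker/netns/2229639766c2 ip route
# routing table inside the task's namespace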
Load Balancing
Client-side VIP Load Balancing
[Diagram: inside task1.web's namespace, traffic to the resolved service VIP (app : 10.0.1.8) hits an IPTables mangle-table OUTPUT rule (MARK: 10.0.1.8 -> lb-index 5); IPVS matches that firewall mark and round-robins connections across the task IPs 10.0.1.9 and 10.0.1.10, with conntrack keeping each flow on its chosen backend.]
root@my-host $ iptables -nvL -t mangle
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.7 MARK set 0x101
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.4 MARK set 0x100
root@my-host $ ipvsadm -L
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 256 rr
-> 10.0.0.5:0 Masq 1 0 0
-> 10.0.0.6:0 Masq 1 0 0
FWM 257 rr
-> 10.0.0.3:0 Masq 1 0 0
root@my-host $ conntrack -L
tcp 6 431997 ESTABLISHED src=10.0.0.8 dst=10.0.0.4 sport=33635 dport=80
src=10.0.0.5 dst=10.0.0.8 sport=80 dport=33635 [ASSURED] mark=0 use=1
Client-side Load Balancing
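Because this load-balancer state lives inside each task's sandbox, it is easiest to read from there; a sketch, with <task-container> standing in for any task container on the node:

root@my-host $ SANDBOX=$(docker inspect -f '{{.NetworkSettings.SandboxKey}}' <task-container>)
root@my-host $ nsenter --net=$SANDBOX iptables -nvL -t mangle
# the VIP -> fwmark rules for services reachable from this task
root@my-host $ nsenter --net=$SANDBOX ipvsadm -L -n
# fwmark-based round robin across the backend task IPs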
Client-side DNS-rr Load Balancing
[Diagram: with dns-rr endpoint mode there is no service VIP; the Docker DNS server answers the query for "app" with all task IPs, app : [10.0.1.9, 10.0.1.10], and the client in task1.web balances across the returned A records.]

docker service create --name=app --endpoint-mode=dns-rr demo/my-app
Routing Mesh
• Native load balancing of requests coming from an external source
• Services get published on a single port across the entire Swarm
• Incoming traffic to the published port can be handled by all Swarm nodes
• Traffic is internally load balanced as per normal service VIP load balancing
Ingress Network
[Diagram: docker service create -p 8080:80 nginx publishes port 8080 on every node; Docker host 1, 2 and 3 each run IPVS on the ingress network, so a request arriving at port 8080 on any host is forwarded to task1.myservice or task2.myservice wherever they run, as in the sketch below.]
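A quick way to see the mesh in action; the node address is illustrative:

$ docker service create --name myservice --replicas 2 -p 8080:80 nginx
$ curl http://<any-node-ip>:8080/
# answered even by nodes that run no myservice task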
Linux Kernel NetFilter dataflow (Routing Mesh)
[Diagram: ingress packet walk across two hosts]
• Host1, eth0: iptables NAT table, DOCKER-INGRESS chain; DNAT Published-Port -> ingress-sbox
• Host1, ingress-sbox (attached to docker_gwbridge and to the ingress-overlay-bridge): iptables MANGLE table, PREROUTING; MARK Published-Port -> <fw-mark-id>, then IPVS matches <fw-mark-id> and masquerades, round-robin across the container IPs
• Ingress Network: Host1's ingress-overlay-bridge reaches Host2's ingress-overlay-bridge through a vxlan tunnel with the ingress network's VNI (Host2 has the same eth0 / DOCKER-INGRESS / ingress-sbox entry path of its own)
• Host2, Container-sbox: iptables NAT table, PREROUTING; Redirect -> target-port
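Each hop of that walk can be checked on a node; a sketch using the hidden ingress_sbox namespace listed earlier:

root@my-host $ iptables -nvL -t nat | grep -A4 DOCKER-INGRESS
# host-level DNAT of the published port into the ingress sandbox
root@my-host $ nsenter --net=/var/run/docker/netns/ingress_sbox iptables -nvL -t mangle
# PREROUTING mark per published port
root@my-host $ nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -L -n
# fwmark-based round robin across the service's task IPs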
Homework
Deep-dive into Routing-Mesh
Questions?
Tweet : @MadhuVenugopal
Slack : madhu in #dockercommunity org
Thank You.
106270 - Deep Dive in Docker Overlay Networks (Apr 19, 3:45 PM)
110420 - Docker Networking in Production at Visa (Apr 19, 2:25 PM)
@docker #dockercon