Linux Network Fundamentals
and Applications
Andrew Yongjoon Kong
Cell lead, Architect
Kakaocorp
Contents
Linux networking fundamentals
Linux networking applications for VM
Linux networking applications for
containers
Kakao’s applications
Krnet 2016 kakaocorp
Networking Data Structures
The most important structures in the Linux
kernel networking code:
sk_buff (defined in include/linux/skbuff.h)
net_device (defined in
include/linux/netdevice.h)
Linux Network Stack
User space:
  Network Applications
Kernel:
  Socket Interface: BSD Sockets, INET Sockets
  Protocol Layers: TCP, UDP, IP, ARP
  Link Layers: PPP, SLIP, Ethernet
Real View of Network transfer
Simplifying Receiving a Packet
Network card
•receives a frame
•issues an interrupt
Driver
•handles the interrupt
•Frame → RAM
•Allocates sk_buff (called skb)
•Frame → skb
Network Fundamental 1:
sk_buff (skbuff.h)
Generic buffer for all packets
sk_buff represents data and
headers
Almost always sk_buff instances
appear as “skb” in the kernel code
Transport Header Layer 4 (TCP/UDP/ICMP)
Network Header Layer 3 (IPv4/v6/ARP)
MAC Header Layer 2 (MAC)
sk_buff’s 3 header unions
sk_buff (cont.)
struct sk_buff *next
struct sk_buff *prev
struct sk_buff_head *list
struct sock *sk
…
union {tcphdr; udphdr; …} h;      Transport Header
union {iph; ipv6h; arph; …} nh;   Network Header
union {raw} mac;                  MAC Header
…                                 DATA
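To make the three header pointers concrete, here is a minimal user-space sketch in Python. It builds an Ethernet/IPv4/UDP frame by hand and records where each header starts. The `SkBuff` class, `parse_frame` and `build_demo_frame` are hypothetical names for illustration, not kernel APIs, and the sketch stores offsets (as newer kernels do) rather than the unions shown on the slide.

```python
import struct
from dataclasses import dataclass

ETH_HLEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


@dataclass
class SkBuff:
    """Hypothetical stand-in for sk_buff: raw bytes plus header offsets."""
    data: bytes
    mac_header: int = 0
    network_header: int = 0
    transport_header: int = 0


def parse_frame(frame: bytes) -> SkBuff:
    """Locate the MAC, network and transport headers of an IPv4 frame."""
    skb = SkBuff(data=frame)
    skb.mac_header = 0
    skb.network_header = ETH_HLEN
    ihl = frame[ETH_HLEN] & 0x0F  # IPv4 header length in 32-bit words
    skb.transport_header = ETH_HLEN + ihl * 4
    return skb


def build_demo_frame() -> bytes:
    """Assemble a made-up Ethernet/IPv4/UDP frame (checksums left at 0)."""
    payload = b"hello"
    eth = (bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
           + struct.pack("!H", 0x0800))
    udp = struct.pack("!HHHH", 12345, 53, 8 + len(payload), 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp) + len(payload),
                     0, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return eth + ip + udp + payload


skb = parse_frame(build_demo_frame())
src_port = struct.unpack_from("!H", skb.data, skb.transport_header)[0]
```

Each layer only needs its own offset into the same buffer, which is why one generic buffer can serve all protocols.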
sk_buff (cont.)
struct dst_entry *dst – the route for this
sk_buff; this route is determined by the routing
subsystem.
It has 2 important function pointers:
int (*input)(struct sk_buff *);
int (*output)(struct sk_buff *);
input() can be assigned to: ip_local_deliver,
ip_forward, ip_mr_input, ip_error or
dst_discard_in.
output() can be assigned to: ip_output,
ip_mc_output, ip_rt_bug, or dst_discard_out.
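The function-pointer idea can be sketched in Python: the routing step chooses the handlers once, and later code calls `dst.input(skb)` without knowing which one it got. The handlers below are placeholders named after the kernel functions they stand in for. `route_lookup`, `receive` and the simplified `SkBuff` are hypothetical, and the selection rule is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SkBuff:
    """Hypothetical stand-in for sk_buff; only what this sketch needs."""
    daddr: str
    dst: Optional["DstEntry"] = None
    trace: List[str] = field(default_factory=list)


@dataclass
class DstEntry:
    """Models the two function pointers of struct dst_entry."""
    input: Callable[[SkBuff], int]
    output: Callable[[SkBuff], int]


def _handler(name: str, result: int) -> Callable[[SkBuff], int]:
    """Build a placeholder handler that records its name in skb.trace."""
    def handler(skb: SkBuff) -> int:
        skb.trace.append(name)
        return result
    return handler


ip_local_deliver = _handler("ip_local_deliver", 0)
ip_forward = _handler("ip_forward", 0)
ip_output = _handler("ip_output", 0)
dst_discard_in = _handler("dst_discard_in", -1)
dst_discard_out = _handler("dst_discard_out", -1)


def route_lookup(skb: SkBuff, local_addrs: set, forwarding: bool) -> None:
    """Attach a DstEntry to the skb (simplified, assumed decision rule)."""
    if skb.daddr in local_addrs:
        skb.dst = DstEntry(input=ip_local_deliver, output=ip_output)
    elif forwarding:
        skb.dst = DstEntry(input=ip_forward, output=ip_output)
    else:
        skb.dst = DstEntry(input=dst_discard_in, output=dst_discard_out)


def receive(skb: SkBuff) -> int:
    """The caller just follows the pointer chosen by routing."""
    return skb.dst.input(skb)
```

For example, `route_lookup(skb, {"10.0.0.1"}, forwarding=False)` followed by `receive(skb)` runs the local-delivery placeholder when `skb.daddr` is `"10.0.0.1"`.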
sk_buff (cont.)
“Understanding Linux Network Internals”, Christian Benvenuti
Network Fundamental 2:
net_device
net_device represents a network interface card.
It does not always represent a physical device:
there are cases where we work with virtual
devices, for example bonding or VLAN.
Virtual devices are often implemented using the
private data of the device (the void *priv
member of net_device).
net_device (cont.)
unsigned int mtu – Maximum Transmission Unit: the
largest packet the device can carry in one frame,
not counting the link-layer header.
Each link type has an MTU of its own; the default is 1500 bytes for
Ethernet.
unsigned int flags (which you can see or set using the ifconfig
utility): for example, RUNNING or NOARP.
unsigned char dev_addr[MAX_ADDR_LEN]: the MAC
address of the device (6 bytes for Ethernet).
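From user space, the same three fields can be read under /sys/class/net/<device>/ (files `mtu`, `flags` and `address`). The sketch below decodes the flag bits using the IFF_* values from the kernel's `if.h`. `decode_flags` and `read_netdev` are hypothetical helper names, and the demo runs against a throwaway fake sysfs tree so it does not depend on a real interface.

```python
import tempfile
from pathlib import Path

# Subset of the IFF_* interface flag bits.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x1000: "MULTICAST",
}


def decode_flags(flags: int) -> list:
    """Return the names of the flag bits that are set."""
    return [name for bit, name in IFF_FLAGS.items() if flags & bit]


def read_netdev(name: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read mtu, flags and MAC address of one interface from sysfs."""
    dev = Path(sysfs_root) / name
    return {
        "mtu": int((dev / "mtu").read_text()),
        "flags": decode_flags(int((dev / "flags").read_text().strip(), 16)),
        "dev_addr": (dev / "address").read_text().strip(),
    }


# Demo on a fake sysfs tree with made-up values.
with tempfile.TemporaryDirectory() as root:
    fake = Path(root) / "eth0"
    fake.mkdir()
    (fake / "mtu").write_text("1500\n")
    (fake / "flags").write_text("0x1043\n")
    (fake / "address").write_text("02:00:00:00:00:01\n")
    demo = read_netdev("eth0", sysfs_root=root)
```

On a real Linux host, `read_netdev("lo")` should work with the default `sysfs_root`.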
Receiving a Packet (Device)
Driver (cont.)
calls device-independent core/dev.c:netif_rx(skb)
•puts skb into CPU queue
•issues a “soft” interrupt
CPU
calls core/dev.c:net_rx_action()
•removes skb from CPU queue
•passes to network layer, e.g. IP or ARP
•In this case: IPv4, ipv4/ip_input.c:ip_rcv()
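The hand-off between the driver and the protocol layer can be modelled as a queue plus a deferred handler. This is a simplified sketch: the `SoftnetModel` class and its `register` method are hypothetical, packets are plain dicts, and the only real-world constants are the EtherType values.

```python
from collections import deque

ETH_P_IP = 0x0800   # EtherType for IPv4
ETH_P_ARP = 0x0806  # EtherType for ARP


class SoftnetModel:
    """Toy model of the per-CPU receive queue and its soft interrupt."""

    def __init__(self):
        self.backlog = deque()
        self.softirq_pending = False
        self.handlers = {}
        self.dropped = 0

    def register(self, ethertype, handler):
        """Register a network-layer receive function for one EtherType."""
        self.handlers[ethertype] = handler

    def netif_rx(self, skb):
        """Driver side: queue the skb and request deferred processing."""
        self.backlog.append(skb)
        self.softirq_pending = True

    def net_rx_action(self):
        """Soft-interrupt side: drain the queue and dispatch by protocol."""
        while self.backlog:
            skb = self.backlog.popleft()
            handler = self.handlers.get(skb["protocol"])
            if handler is None:
                self.dropped += 1
            else:
                handler(skb)
        self.softirq_pending = False


delivered = []
cpu = SoftnetModel()
cpu.register(ETH_P_IP, lambda skb: delivered.append(("ip_rcv", skb["id"])))
cpu.register(ETH_P_ARP, lambda skb: delivered.append(("arp_rcv", skb["id"])))
cpu.netif_rx({"id": 1, "protocol": ETH_P_IP})
cpu.netif_rx({"id": 2, "protocol": ETH_P_ARP})
cpu.netif_rx({"id": 3, "protocol": 0x1234})  # no handler registered
cpu.net_rx_action()
```

The driver side only queues the packet and flags pending work. Protocol processing happens later, outside the hardware interrupt.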
Receiving a Packet (IP)
ip_input.c:ip_rcv()
checks
•Length >= IP Header (20 bytes)
•Version == 4
•Checksum
•Check length again
calls ip_rcv_finish(), which
calls route.c:ip_route_input()
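The four checks listed above can be sketched as a small validator. This follows the order on the slide and is not a copy of the kernel code; `ip_rcv_checks`, `ip_checksum` and `make_packet` are hypothetical helpers, and the addresses are made up.

```python
import struct


def ip_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words, inverted."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ip_rcv_checks(packet: bytes) -> str:
    """Return "ok" or the name of the first failed sanity check."""
    if len(packet) < 20:
        return "too short"
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4:
        return "bad version"
    if ihl < 5 or len(packet) < ihl * 4:
        return "bad header length"
    # A header that carries a correct checksum sums to zero.
    if ip_checksum(packet[:ihl * 4]) != 0:
        return "bad checksum"
    total_len = struct.unpack("!H", packet[2:4])[0]
    if total_len < ihl * 4 or len(packet) < total_len:
        return "bad total length"
    return "ok"


def make_packet(payload: bytes = b"") -> bytes:
    """Build an IPv4 packet with a valid header checksum (demo values)."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0,
                      64, 17, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    csum = ip_checksum(hdr)
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:] + payload


good = make_packet(b"data")
corrupt = bytearray(good)
corrupt[8] ^= 0xFF  # damage the TTL byte so the checksum no longer matches
```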
Receiving a Packet (routing)
ipv4/route.c:ip_route_input()
Destination == local?
•YES: ip_input.c:ip_local_deliver()
•NO: calls ip_route_input_slow()
ipv4/route.c:ip_route_input_slow()
Can forward?
•Forwarding enabled?
•Know route?
•NO: sends ICMP
Forwarding a Packet
Forwarding is handled on a per-device basis
The receiving device usually does the forwarding
Enable/disable forwarding in Linux:
/proc file system ↔ Kernel
read/write as normal files (in most cases)
•/proc/sys/net/ipv4/conf/<device>/forwarding
•/proc/sys/net/ipv4/conf/default/forwarding
•/proc/sys/net/ipv4/ip_forward
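Because these entries behave like ordinary files, checking the forwarding state takes only a few lines. In this sketch the global switch is read from /proc/sys/net/ipv4/ip_forward and the per-device switch from conf/<device>/forwarding. `forwarding_enabled` and its `proc_root` parameter are hypothetical, and the demo uses a fake /proc tree so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def forwarding_enabled(device=None, proc_root="/proc"):
    """Return True if IPv4 forwarding is on (globally or for one device)."""
    ipv4 = Path(proc_root) / "sys" / "net" / "ipv4"
    if device is None:
        path = ipv4 / "ip_forward"
    else:
        path = ipv4 / "conf" / device / "forwarding"
    return path.read_text().strip() == "1"


# Demo on a fake /proc tree with made-up settings.
with tempfile.TemporaryDirectory() as root:
    ipv4 = Path(root) / "sys" / "net" / "ipv4"
    (ipv4 / "conf" / "eth0").mkdir(parents=True)
    (ipv4 / "conf" / "eth1").mkdir(parents=True)
    (ipv4 / "ip_forward").write_text("1\n")
    (ipv4 / "conf" / "eth0" / "forwarding").write_text("1\n")
    (ipv4 / "conf" / "eth1" / "forwarding").write_text("0\n")
    demo_global = forwarding_enabled(proc_root=root)
    demo_eth0 = forwarding_enabled("eth0", proc_root=root)
    demo_eth1 = forwarding_enabled("eth1", proc_root=root)
```

Writing "1" or "0" to the same files (as root) changes the setting; the sketch only reads.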
Forwarding a Packet (cont.)
ipv4/ip_forward.c:ip_forward()
IP TTL > 1?
•YES: decreases TTL
•NO: sends ICMP
.... a few more calls
core/dev.c:dev_queue_xmit()
Default queue: priority FIFO
sched/sch_generic.c:pfifo_fast_enqueue()
Others: FIFO, Stochastic Fair Queuing, etc.
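A minimal sketch of the two steps above: the TTL test in the forwarding path, then a three-band priority FIFO on transmit. Both are simplified models with hypothetical names. In particular, the real pfifo_fast derives the band from the packet priority, whereas here the caller passes the band directly.

```python
from collections import deque


def ip_forward(pkt: dict) -> str:
    """Decrement the TTL, or report that an ICMP error must be sent."""
    if pkt["ttl"] <= 1:
        return "icmp_time_exceeded"
    pkt["ttl"] -= 1
    return "forward"


class PfifoFast:
    """Toy priority FIFO: band 0 is always served before bands 1 and 2."""

    def __init__(self, bands: int = 3):
        self.bands = [deque() for _ in range(bands)]

    def enqueue(self, pkt, band: int) -> None:
        self.bands[band].append(pkt)

    def dequeue(self):
        """Return the oldest packet of the lowest non-empty band."""
        for queue in self.bands:
            if queue:
                return queue.popleft()
        return None


pkt = {"ttl": 2}
first_hop = ip_forward(pkt)   # TTL 2 -> 1, packet is forwarded
second_hop = ip_forward(pkt)  # TTL 1, packet must not be forwarded

qdisc = PfifoFast()
qdisc.enqueue("bulk-1", band=2)
qdisc.enqueue("interactive-1", band=0)
qdisc.enqueue("bulk-2", band=2)
order = [qdisc.dequeue() for _ in range(4)]
```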
Skb life cycle
Linux Network for L3 (Routing)
Linux System
User daemon: Zebra
•RIP, BGP, OSPF
•Routing Information Base
Netlink
Kernel: Kernel Route
•Forwarding Information Base
Routing Lookup
ip_route_input() in: net/ipv4/route.c
Cache lookup
•Hit: deliver packet by ip_local_deliver()
 or ip_forward() according to result
•Miss: ip_route_input_slow() in: net/ipv4/route.c
  fib_lookup() in ip_fib_local_table
  •Hit: deliver packet according to result
  •Miss: fib_lookup() in ip_fib_main_table
    •Hit: deliver packet according to result
    •Miss: drop packet
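The lookup order (cache, then local table, then main table, then drop) can be sketched with Python's `ipaddress` module. The `Fib` class, the route entries and the string results are assumptions made for illustration. Each table does a plain longest-prefix match, and the `ip_route_input` function here is only a model of the flow.

```python
import ipaddress


class Fib:
    """Toy forwarding table with longest-prefix-match lookup."""

    def __init__(self, routes):
        self.routes = [(ipaddress.ip_network(p), r) for p, r in routes]

    def lookup(self, addr):
        best = None
        for net, result in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, result)
        return best[1] if best else None


def ip_route_input(daddr, cache, local_table, main_table):
    """Model of the lookup flow: cache, local table, main table, drop."""
    if daddr in cache:
        return cache[daddr]
    addr = ipaddress.ip_address(daddr)
    result = local_table.lookup(addr)
    if result is None:
        result = main_table.lookup(addr)
    if result is None:
        return "drop"
    cache[daddr] = result  # later packets to daddr skip the slow path
    return result


# Made-up tables for the demo.
local_table = Fib([("192.168.1.10/32", "ip_local_deliver")])
main_table = Fib([
    ("10.0.0.0/8", "ip_forward via 192.168.1.1"),
    ("10.1.0.0/16", "ip_forward via 192.168.1.2"),
])
cache = {}
results = [ip_route_input(d, cache, local_table, main_table)
           for d in ("192.168.1.10", "10.1.2.3", "10.9.9.9", "172.16.0.1")]
```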
RIB decision by Dynamic Routing
protocols: SDN in L3
The FIB is decided by state and
algorithm.
Isn’t it already Software
Defined Something?
http://www.xorp.org/papers.html
Software forwarding plane:
Linux kernels
Control plane: routing daemons
Interface between control
and forwarding planes:
/proc, ioctl(), netlink, routing socket
•Linux (old): /proc, sysctl, ioctl
•Linux (new): Netlink socket
•BSD: Routing socket
Forwarding plane: Linux kernel, BSD
http://www.xorp.org/papers.html
OpenFlow : SDN for L2
Physical separation of control
and forwarding
Forwarding plane in L2
Flow table instead of FIB
More general than IP
OpenFlow Switch exposes flow table
through simple OpenFlow
protocol
Keep it simple
Vendor can keep platform
closed
Use outboard device for packet
processing
[Diagram: OpenFlow Controller, connected by the OpenFlow Protocol over SSL to the Flow table of an OpenFlow-enabled Layer-2 Switch]
Matches subsets of packet header fields:
Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport
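"Matches subsets of packet header fields" means that a flow entry names only the fields it cares about and leaves the rest as wildcards. A minimal sketch follows: the `FlowTable` class, the field names and the action strings are hypothetical, and sending unmatched packets to the controller is assumed as the table-miss behaviour.

```python
# The ten header fields from the slide, as hypothetical dictionary keys.
FIELDS = ("in_port", "mac_src", "mac_dst", "eth_type", "vlan_id",
          "ip_src", "ip_dst", "ip_proto", "tcp_sport", "tcp_dport")


class FlowTable:
    """Toy flow table: highest-priority matching entry wins."""

    def __init__(self):
        self.entries = []  # list of (priority, match, action)

    def add(self, match: dict, action: str, priority: int = 0) -> None:
        unknown = set(match) - set(FIELDS)
        if unknown:
            raise ValueError("unknown match fields: %s" % sorted(unknown))
        self.entries.append((priority, match, action))
        # Stable sort: equal priorities keep their insertion order.
        self.entries.sort(key=lambda entry: -entry[0])

    def lookup(self, pkt: dict) -> str:
        for _priority, match, action in self.entries:
            # Fields absent from `match` are wildcards.
            if all(pkt.get(name) == value for name, value in match.items()):
                return action
        return "send_to_controller"  # assumed table-miss behaviour


table = FlowTable()
table.add({"eth_type": 0x0800, "ip_dst": "10.0.0.2"}, "output:2", priority=10)
table.add({"eth_type": 0x0800, "ip_proto": 6, "tcp_dport": 22}, "drop", priority=20)
table.add({"in_port": 1}, "output:3")

ssh = {"in_port": 1, "eth_type": 0x0800, "ip_dst": "10.0.0.2",
       "ip_proto": 6, "tcp_dport": 22}
web = dict(ssh, tcp_dport=80)
arp = {"in_port": 1, "eth_type": 0x0806}
other = {"in_port": 7, "eth_type": 0x0806}
```

A FIB entry is the special case that matches on the IP destination only, which is one way to read "more general than IP".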
Linux networking for VM
Basic networking
Ethernet
VLAN
Subnet, ARP
DHCP
IP
TCP/UDP/ICMP
Linux networking for VM, cont.
Network Components
Switch (packet switching vs. flow switching)
Router (vs. gateway)
Firewalls (vs. iptables)
Load balancers (vs. routers)
Linux networking for VM, cont.
Tunnel technologies
Generally known as overlay
GRE
VXLAN
Why not IPsec?
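To show what "overlay" means on the wire, here is a sketch of the 8-byte VXLAN header that is placed in front of the tenant's Ethernet frame: a flags byte with the "VNI valid" bit, the 24-bit VXLAN Network Identifier, and reserved bits. The outer UDP/IP/Ethernet headers are left out, and the helper names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" bit in the flags byte


def vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prefix the inner Ethernet frame with a VXLAN header."""
    return vxlan_header(vni) + inner_frame


def vxlan_decap(payload: bytes):
    """Return (vni, inner_frame) from the payload of the outer UDP packet."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]


inner = b"\x02\x00\x00\x00\x00\x02" + b"\x02\x00\x00\x00\x00\x01" + b"\x08\x00" + b"payload"
packet = vxlan_encap(5000, inner)
vni, frame = vxlan_decap(packet)
```

The VNI plays a role similar to a VLAN ID but with 24 bits, so many more virtual networks can share the same physical network.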
Linux networking for VM, cont.
Network namespaces
One way (not the only one) of scoping networking
functions and components.
VRF: multiple gateways on the same router
at the same time
Linux networking for VM, cont.
SNAT: router modifies source IP in
packet
DNAT: router modifies destination IP in
packet
One-to-one NAT
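The three terms can be illustrated with a small model in which a packet is just a source and a destination address. `OneToOneNat` is a hypothetical class and the addresses are made up: outbound traffic gets its source rewritten (SNAT), inbound traffic gets its destination rewritten (DNAT), and a one-to-one mapping keeps the two directions consistent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class OneToOneNat:
    """Toy router that maps each private IP to exactly one public IP."""

    def __init__(self, private_to_public: dict):
        self.outbound = dict(private_to_public)
        self.inbound = {pub: priv for priv, pub in private_to_public.items()}

    def snat(self, pkt: Packet) -> Packet:
        """Outbound: rewrite the source IP if a mapping exists."""
        return replace(pkt, src=self.outbound.get(pkt.src, pkt.src))

    def dnat(self, pkt: Packet) -> Packet:
        """Inbound: rewrite the destination IP if a mapping exists."""
        return replace(pkt, dst=self.inbound.get(pkt.dst, pkt.dst))


nat = OneToOneNat({"10.0.0.5": "203.0.113.5"})
request = nat.snat(Packet(src="10.0.0.5", dst="198.51.100.7"))
reply = nat.dnat(Packet(src="198.51.100.7", dst="203.0.113.5"))
```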
Linux networking for VM, example
OpenStack networking
Adds more complexity
veth, Open vSwitch, Linux bridge
Linux networking for VM, example
Openstack networking, cont.
SDN: practical
Google’s Jupiter
SDN: practical, Kakao’s case
What we are trying to solve
IP movement inter-rack, inter-zone, inter-DC(?)
IP resource imbalance
Fault resilience
Dynamically check status of network
Simple IP resource planning and
management
SDN: practical, Kakao’s case cont.
Use 32-bit subnets (/32), BGP and switch
namespace
[Diagram, labels as extracted]
Routing Table (eBGP)
•1 10.100.10.2/32 via 192.1.1.201
•192.1.1.202
Compute node (global name space)
•Routing Table: Default GW 192.168.1.1 eth1
•Switch Namespace: dhcp-server process, 192.1.1.201, iBGP, eth1
•Host Route: dest 10.10.100.2/32 to 10.10.100.1
•linux bridge: 10.10.100.1
•neutron-dhcp-agent, neutron-linuxbridge-agent, nova-compute
•eth0
vm
•IP: 10.10.100.2/32
•Routing Table: Default GW x.x.x.x eth0
GW
Controller
What is a container?
A container comprises multiple
namespaces
Standardized resource
Brick or Lego
Typical container orchestrator’s network
Yes, it’s overlay again.
Flannel
Scalable container network: Kakao’s
case
Issues you have to deal with when you
use an overlay:
Have to rethink performance
Have to think about fault resiliency and
migration issues.
Still have to consider how to send packets out of the
system.
Scalable container network: Kakao’s
case, cont.
It has a history
The first approach was Docker libnetwork
[Figure: “Using Docker libnet”,
blog.midonet.org]
BTW, Kubernetes gave up on it! OMG
Scalable container network: Kakao’s
case, cont.
Use node port and load balancer
It’s very easy.
Had a scalability issue: node port has a limited
port range.
Can only have a five-digit number of containers
Load balancers are expensive.
Scalable container network: Kakao’s
case, cont.
Use a routable container bridge subnet and
a BGP injector
Predefine a subnet for each container bridge
Have to provision before resources are depleted.
[Diagram: Routers and a BGP Injector above a Cluster; Node1, Node2 and Node3 each run a Container on subnet1, subnet2 and subnet3]
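The "predefine a subnet per bridge" step can be sketched with Python's `ipaddress` module: carve one routable subnet per node out of a cluster range, then derive the routes a BGP injector would announce. The function names, the cluster range and the node addresses are all made up for illustration, and the sketch only formats the announcements; it does not speak BGP.

```python
import ipaddress


def plan_bridge_subnets(cluster_cidr: str, nodes: list, prefixlen: int) -> dict:
    """Assign one container-bridge subnet to each node, in order."""
    pool = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefixlen)
    plan = {}
    for node in nodes:
        try:
            plan[node] = next(pool)
        except StopIteration:
            # The range has to be provisioned before it runs out.
            raise RuntimeError("subnet pool depleted at %s" % node) from None
    return plan


def bgp_announcements(plan: dict, node_addrs: dict) -> list:
    """Routes to inject: each bridge subnet is reachable via its node."""
    return ["%s via %s" % (subnet, node_addrs[node])
            for node, subnet in plan.items()]


nodes = ["node1", "node2", "node3"]
plan = plan_bridge_subnets("10.200.0.0/22", nodes, prefixlen=24)
routes = bgp_announcements(plan, {
    "node1": "192.0.2.11",
    "node2": "192.0.2.12",
    "node3": "192.0.2.13",
})
```

With these routes in the routers, containers are reachable by plain L3 routing, with no overlay and no per-container load balancer entry.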
BTW
It’s all about connecting/controlling
fundamental network elements (we
didn’t reinvent the wheel).
But we are trying to find the secret composition.
We hope that OpenFlow/overlay-based
solutions will become more popular,
cheaper and simpler.