CMPE 252A : Computer Networks
An Introduction to
Software Defined Networking (SDN)
&
OpenFlow
Huazhe Wang
huazhe.wang@ucsc.edu
Some slides are borrowed from http://www.cs.princeton.edu/courses/archive/spr12/cos461/
http://www-net.cs.umass.edu/kurose-ross-ppt-6e/
The Internet: A Remarkable Story
Tremendous success
From research experiment
to global infrastructure
Brilliance of under-specifying
Network: best-effort packet delivery
Hosts: arbitrary applications
Enables innovation in applications
Web, P2P, VoIP, social networks, virtual worlds
But, change is easy only at the edge…
Inside the ‘Net: A Different Story…
Closed equipment
Software bundled with hardware
Vendor-specific interfaces
Over specified
Slow protocol standardization
Few people can innovate
Equipment vendors write the code
Long delays to introduce new features
Impacts performance, security, reliability, cost…
Networks are Hard to Manage
Operating a network is expensive
More than half the cost of a network
Yet, operator error causes most outages
Buggy software in the equipment
Routers with 20+ million lines of code
Cascading failures, vulnerabilities, etc.
The network is “in the way”
Especially a problem in data centers
… and home networks
Traffic engineering: difficult for traditional routing
[Figure: network graph with routers u, v, w, x, y, z and link weights]
Q: what if network operator wants u-to-z traffic to flow
along uvwz, x-to-z traffic to flow xwyz?
A: need to define link weights so traffic routing
algorithm computes routes accordingly (or need a new
routing algorithm)!
Traffic engineering: difficult
[Figure: same network graph with routers u, v, w, x, y, z and link weights]
Q: what if network operator wants to split u-to-z
traffic along uvwz and uxyz (load balancing)?
A: can’t do it (or need a new routing algorithm)
Traffic engineering: difficult
[Figure: same network graph, with blue and red traffic flows traversing router w]
Q: what if w wants to route blue and red traffic
differently?
A: can’t do it (with destination-based forwarding and LS, DV routing)
Rethinking the “Division of Labor”
Network-layer functions
Recall: two network-layer functions:
forwarding: move packets from router’s input to appropriate router output (data plane)
routing: determine route taken by packets from source to destination (control plane)
Two approaches to structuring network control plane:
per-router control (traditional)
logically centralized control (software defined networking)
Traditional Computer Networks
Data plane:
Packet streaming
Forward, filter, buffer, mark, rate-limit, and measure packets
Traditional Computer Networks
Per-router control plane:
Distributed algorithms
Track topology changes, compute routes, install forwarding rules
Traditional Computer Networks
Management plane:
Human time scale
Collect measurements and configure the equipment
Software Defined Networking (SDN)
Logically-centralized control: smart, slow
API to the data plane (e.g., OpenFlow)
Switches: dumb, fast
Software defined networking (SDN)
[Figure: remote controller running programmable control applications (routing, access control, load balance); control agents (CAs) on each data-plane switch]
1. generalized “flow-based” forwarding (e.g., OpenFlow)
2. control, data plane separation
3. control plane functions external to data-plane switches
4. programmable control applications
Software defined networking (SDN)
Why a logically centralized control plane?
easier network management: avoid router
misconfigurations, greater flexibility of
traffic flows
table-based forwarding allows
“programming” routers
centralized “programming” easier: compute
tables centrally and distribute
distributed “programming” more difficult:
compute tables as result of distributed
algorithm (protocol) implemented in each and
every router
open (non-proprietary) implementation of
control plane
Analogy: mainframe to PC evolution *
Mainframe era (vertically integrated, closed, proprietary, slow innovation, small industry):
specialized applications / specialized operating system / specialized hardware
PC era (horizontal, open interfaces, rapid innovation, huge industry):
many apps / open interface / Windows, Mac OS, or Linux / open interface / microprocessor
* Slide courtesy: N. McKeown
SDN perspective: data plane switches
Data plane switches:
fast, simple, commodity switches implementing generalized data-plane forwarding in hardware
switch flow table computed, installed by controller
API for table-based switch control (e.g., OpenFlow); defines what is controllable and what is not
protocol for communicating with controller (e.g., OpenFlow)
[Figure: network-control applications (routing, access control, load balance) / northbound API / SDN controller (network operating system) / southbound API / SDN-controlled switches]
SDN perspective: SDN controller
SDN controller (network OS):
maintain network state information
interacts with network-control applications “above” via northbound API
interacts with network switches “below” via southbound API
implemented as distributed system for performance, scalability, fault-tolerance, robustness
SDN perspective: control applications
Network-control apps:
“brains” of control: implement control functions using lower-level services, API provided by SDN controller
unbundled: can be provided by a 3rd party, distinct from routing vendor or SDN controller
SDN: control/data plane interaction example (Dijkstra’s link-state routing)
1. S1, experiencing link failure, uses OpenFlow port status message to notify controller
2. SDN controller receives OpenFlow message, updates link status info
3. Dijkstra’s routing algorithm application has previously registered to be called whenever link status changes. It is called.
4. Dijkstra’s routing algorithm accesses network graph info and link-state info in the controller, computes new routes
[Figure: controller internals - application interface layer (RESTful API, intent), network-wide management layer (network graph, statistics, flow tables, link-state info, host info, switch info), communication layer (OpenFlow, SNMP); switches s1-s4 below]
SDN: control/data plane interaction example (continued)
5. link-state routing app interacts with flow-table-computation component in SDN controller, which computes new flow tables needed
6. controller uses OpenFlow to install new tables in switches that need updating
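To make steps 1-6 concrete, here is a minimal, self-contained Python sketch of the event flow. All names (Controller, on_port_status, routing_app) are illustrative, not the API of any real controller framework: a routing application registers for link-status changes and recomputes shortest paths with Dijkstra’s algorithm when notified.

```python
import heapq

class Controller:
    """Toy 'network OS': keeps a link-state graph and notifies registered apps."""
    def __init__(self, graph):
        self.graph = graph            # {node: {neighbor: weight}}
        self.link_listeners = []      # apps registered for link-status changes

    def register_link_listener(self, callback):
        self.link_listeners.append(callback)

    def on_port_status(self, u, v, up):
        # Steps 1-2: a switch reports a port/link change; update link-state info.
        if not up:
            self.graph[u].pop(v, None)
            self.graph[v].pop(u, None)
        # Step 3: call back every application registered for this event.
        for cb in self.link_listeners:
            cb(self.graph)

def dijkstra(graph, src):
    """Step 4: recompute shortest-path distances and predecessors from src."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float('inf')):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return dist, prev

def routing_app(graph):
    # Steps 5-6 would translate the new routes into flow tables and push them
    # to the affected switches (e.g., via OpenFlow flow-mod messages).
    for src in graph:
        dist, _ = dijkstra(graph, src)
        print(src, "->", dist)

# Example: 4-switch topology s1..s4; link s1-s2 fails.
g = {'s1': {'s2': 1, 's3': 5}, 's2': {'s1': 1, 's4': 1},
     's3': {'s1': 5, 's4': 1}, 's4': {'s2': 1, 's3': 1}}
ctrl = Controller(g)
ctrl.register_link_listener(routing_app)
ctrl.on_port_status('s1', 's2', up=False)
```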
OpenFlow (OF) defines the communication
protocol that enables the SDN Controller
to directly interact with the forwarding
plane of network devices.
OpenFlow data plane abstraction
flow: defined by header fields
generalized forwarding: simple packet-handling rules
Pattern: match values in packet header fields
Actions: drop, forward, modify matched packet, or send matched packet to controller
Priority: disambiguate overlapping patterns
Counters: #bytes and #packets
* : wildcard
1. src=1.2.*.*, dest=3.4.5.* drop
2. src = *.*.*.*, dest=3.4.*.* forward(2)
3. src=10.1.2.3, dest=*.*.*.* send to controller
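These rules can be mimicked directly in code. Below is an illustrative Python sketch (not the OpenFlow wire format; only src/dst fields are modeled) of priority-ordered match+action lookup with wildcards, using the three rules above.

```python
# Each rule: (priority, pattern, action). '*' in an octet position is a wildcard.
RULES = [
    (3, {"src": "1.2.*.*",  "dst": "3.4.5.*"},  "drop"),
    (2, {"src": "*.*.*.*",  "dst": "3.4.*.*"},  "forward(2)"),
    (1, {"src": "10.1.2.3", "dst": "*.*.*.*"},  "send-to-controller"),
]

def field_matches(pattern, value):
    """Compare a dotted-quad pattern with octet-level wildcards against an address."""
    if pattern == "*":
        return True
    return all(p == "*" or p == v
               for p, v in zip(pattern.split("."), value.split(".")))

def lookup(packet):
    """Return the action of the highest-priority matching rule (counters omitted)."""
    best = None
    for prio, pattern, action in RULES:
        if all(field_matches(pattern[f], packet[f]) for f in pattern):
            if best is None or prio > best[0]:
                best = (prio, action)
    return best[1] if best else "drop"       # table-miss behavior: a policy choice

print(lookup({"src": "1.2.9.9",  "dst": "3.4.5.6"}))   # drop (priority disambiguates)
print(lookup({"src": "5.6.7.8",  "dst": "3.4.9.9"}))   # forward(2)
print(lookup({"src": "10.1.2.3", "dst": "8.8.8.8"}))   # send-to-controller
```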
OpenFlow data plane abstraction
match+action: unifies different kinds of devices
Router
match: longest destination IP prefix
action: forward out a link
Switch
match: destination MAC address
action: forward or flood
Firewall
match: IP addresses and TCP/UDP port numbers
action: permit or deny
NAT
match: IP address and port
action: rewrite address and port
Examples
Destination-based forwarding:
Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst   | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | *      | 51.6.0.8 | *       | *         | *         | port6
IP datagrams destined to IP address 51.6.0.8 should be forwarded to router output port 6

Firewall:
Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | *      | *      | *       | *         | 22        | drop
do not forward (block) all datagrams destined to TCP port 22

Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src      | IP Dst | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | 128.119.1.1 | *      | *       | *         | *         | drop
do not forward (block) all datagrams sent by host 128.119.1.1
OpenFlow protocol
operates between controller and switch
TCP used to
exchange messages
OpenFlow messages
OpenFlow: controller-to-switch messages
features: controller
queries switch features,
switch replies
configure: controller
queries/sets switch
configuration parameters
modify-state: add, delete,
modify flow entries in the
OpenFlow tables
packet-out: controller can
send this packet out of
specific switch port
OpenFlow: switch-to-controller messages
packet-in: transfer packet
(and its control) to controller.
See packet-out message from
controller
flow-removed: flow table entry
deleted at switch
port status: inform controller
of a change on a port.
Fortunately, network operators don’t “program” switches by creating/sending OpenFlow messages directly. Instead, they use a higher-level abstraction at the controller.
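As noted above, operators normally rely on a controller framework rather than crafting these messages by hand. Purely to illustrate the message types, here is a hedged, self-contained Python sketch that models a packet-in / flow-mod (modify-state) exchange with plain dataclasses; it is not the OpenFlow wire format and not the API of POX, Ryu, or any other real controller.

```python
from dataclasses import dataclass, field

@dataclass
class PacketIn:            # switch-to-controller: packet (or its header) handed up
    switch: str
    in_port: int
    eth_dst: str

@dataclass
class FlowMod:             # controller-to-switch: modify-state (add a flow entry)
    match: dict
    actions: list
    priority: int = 1

@dataclass
class Switch:
    name: str
    flow_table: list = field(default_factory=list)

    def receive(self, msg):
        if isinstance(msg, FlowMod):
            self.flow_table.append(msg)

def handle_packet_in(switch, pkt_in, mac_to_port):
    """On packet-in: install a forwarding rule if the destination MAC is known."""
    out_port = mac_to_port.get(pkt_in.eth_dst)
    if out_port is None:
        return None                      # unknown destination: would flood via packet-out
    mod = FlowMod(match={"eth_dst": pkt_in.eth_dst}, actions=[("output", out_port)])
    switch.receive(mod)                  # controller "sends" the flow-mod over its TCP channel
    return mod

s1 = Switch("s1")
handle_packet_in(s1, PacketIn("s1", in_port=3, eth_dst="aa:bb:cc:dd:ee:ff"),
                 mac_to_port={"aa:bb:cc:dd:ee:ff": 2})
print(s1.flow_table)
```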
OpenFlow in the Wild
Open Networking Foundation
Google, Facebook, Microsoft, Yahoo, Verizon, Deutsche
Telekom, and many other companies
Commercial OpenFlow switches
HP, NEC, Quanta, Dell, IBM, Juniper, …
Network operating systems
NOX, Beacon, Floodlight, OpenDaylight, POX, Pyretic
Network deployments
Eight campuses, and two research backbone networks
Commercial deployments (e.g., Google backbone)
Challenges
Heterogeneous Switches
Number of packet-handling rules
Range of matches and actions
Multi-stage pipeline of packet processing
Offload some control-plane functionality (?)
[Figure: multi-stage packet-processing pipeline - access control, MAC look-up, IP look-up]
Controller Delay and Overhead
Controller is much slower than the switch
Processing packets leads to delay and
overhead
Need to keep most packets in the “fast path”
Distributed Controller
For scalability and reliability
Partition and replicate state
[Figure: multiple controllers, each running control applications on a Network OS]
Testing and Debugging
OpenFlow makes programming possible
Network-wide view at controller
Direct control over data plane
Plenty of room for bugs
Still a complex, distributed system
Need for testing techniques
Controller applications
Controller and switches
Rules installed in the switches
Evolution of OpenFlow
P4 (“OpenFlow 2.0”)
Run OpenFlow Experiments
Mininet
Open vSwitch (OVS)
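A minimal sketch of a Mininet experiment, assuming Mininet 2.x’s Python API, Open vSwitch, and an OpenFlow controller already listening on 127.0.0.1:6633 (the topology, IP, and port here are assumptions to adapt to your setup):

```python
#!/usr/bin/env python
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.topo import Topo
from mininet.cli import CLI

class TwoHostTopo(Topo):
    """h1 -- s1 -- h2: the smallest topology useful for OpenFlow experiments."""
    def build(self):
        s1 = self.addSwitch('s1')
        h1 = self.addHost('h1')
        h2 = self.addHost('h2')
        self.addLink(h1, s1)
        self.addLink(h2, s1)

if __name__ == '__main__':
    # Assumes a controller (e.g., POX, Ryu, Floodlight) is already running locally.
    net = Mininet(topo=TwoHostTopo(),
                  switch=OVSSwitch,
                  controller=lambda name: RemoteController(name, ip='127.0.0.1', port=6633))
    net.start()
    net.pingAll()        # quick connectivity check through the OVS switch
    CLI(net)             # drop into the Mininet CLI for interactive experiments
    net.stop()
```

Flow entries pushed by the controller can then be inspected from a shell with `ovs-ofctl dump-flows s1`.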
Hedera: Dynamic Flow Scheduling
for Data Center Networks
Mohammad Al-Fares Sivasankar Radhakrishnan
Barath Raghavan* Nelson Huang Amin Vahdat
UC San Diego *Williams College
Slides modified from https://people.eecs.berkeley.edu/~istoica/classes/cs294/.../21-Hedera-TD.pptx
Problem
MapReduce/Hadoop style workloads have substantial
BW requirements
ECMP-based multi-path load balancing often leads to oversubscription --> jobs bottlenecked by the network
Key insight/idea
Identify large flows and periodically rearrange them
to balance the load across core switches
Challenges
Estimate bandwidth demand of flows
Find an optimal allocation of network paths for flows
ECMP Paths
Many equal cost paths going up to the core switches
Only one path down from each core switch
Randomly allocate paths to flows using hash
Agnostic to available resources
Long lasting collisions between long (elephant) flows
[Figure: fat-tree topology, source S to destination D]
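ECMP’s per-flow hashing can be sketched in a few lines of Python: hash the flow’s 5-tuple and pick one of the equal-cost uplinks modulo the path count (a simplified illustration; real switches use vendor-specific hash functions and field sets).

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop by hashing the flow 5-tuple.
    All packets of a flow hash the same way, so the flow never reorders,
    but two elephant flows can hash onto the same path and collide."""
    key = "|".join(str(f) for f in flow).encode()
    h = int(hashlib.md5(key).hexdigest(), 16)
    return next_hops[h % len(next_hops)]

uplinks = ["core0", "core1", "core2", "core3"]
flow_a = ("10.0.0.1", "10.0.1.1", 6, 5001, 80)   # (src, dst, proto, sport, dport)
flow_b = ("10.0.0.2", "10.0.1.2", 6, 5002, 80)
print(ecmp_next_hop(flow_a, uplinks), ecmp_next_hop(flow_b, uplinks))
```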
Collisions of elephant flows
Collisions possible in two different ways
Upward path
Downward path
[Figure: fat-tree example with flows from sources S1-S4 to destinations D1-D4 colliding on upward and downward paths]
Hedera Scheduler
1. Detect 2. Estimate 3. Schedule
Large Flows Flow Demands Flows
Detect Large Flows
Flows that need bandwidth but are network-limited
Estimate Flow Demands
Use max-min fairness to allocate bandwidth among flows between src-dst pairs
Place Flows
Use estimated demands to heuristically find better
placement of large flows on the ECMP paths
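Putting the three steps together, the scheduler’s control loop looks roughly like the skeleton below; the three stage functions are deliberately trivial stubs standing in for the detection, estimation, and placement logic detailed on the following slides.

```python
import time

LINK_CAPACITY = 1.0                        # normalized: 1.0 = host line rate
ELEPHANT_THRESHOLD = 0.1 * LINK_CAPACITY   # flows above 10% of the link are "large"

def poll_flow_stats():
    """Stub: Hedera polls edge-switch byte counters; here we return fixed sample rates."""
    return {("h1", "h5"): 0.40, ("h2", "h6"): 0.02, ("h3", "h7"): 0.30}

def estimate_demands(large_flows):
    """Stub: stands in for the max-min fair demand estimator sketched later."""
    return dict(large_flows)

def place_flows(demands):
    """Stub: stands in for Global First-Fit or Simulated Annealing placement."""
    return {flow: "core%d" % i for i, flow in enumerate(sorted(demands))}

def control_loop(rounds, period):
    for _ in range(rounds):
        rates = poll_flow_stats()                                    # 1. detect
        large = {f: r for f, r in rates.items() if r > ELEPHANT_THRESHOLD}
        placement = place_flows(estimate_demands(large))             # 2. estimate, 3. schedule
        print("install placements:", placement)                      # push via OpenFlow
        time.sleep(period)

control_loop(rounds=1, period=0.0)
```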
Elephant Detection
Scheduler continually polls edge switches for
flow byte-counts
Flows exceeding a B/s threshold are “large”
• > 10% of host’s link capacity (i.e., > 100 Mbps)
What if only mice on host?
Default ECMP load-balancing efficient for
small flows
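A sketch of the detection step, assuming 1 Gbps host links and two successive polls of per-flow byte counters from the edge switches (flow identifiers and numbers are made up for illustration):

```python
LINK_CAPACITY_BPS = 1_000_000_000           # 1 Gbps host links (assumption)
THRESHOLD_BPS = 0.10 * LINK_CAPACITY_BPS    # "large" = more than 10% of the host link

def detect_elephants(prev_bytes, curr_bytes, interval_s):
    """Compare two polls of per-flow byte counters and flag flows whose
    average rate over the interval exceeds the elephant threshold."""
    elephants = []
    for flow, now in curr_bytes.items():
        before = prev_bytes.get(flow, 0)
        rate_bps = 8 * (now - before) / interval_s
        if rate_bps > THRESHOLD_BPS:
            elephants.append((flow, rate_bps))
    return elephants

# Two successive polls of edge-switch counters (bytes per flow), one second apart.
t0 = {("h1", "h9"): 0,          ("h2", "h9"): 0}
t1 = {("h1", "h9"): 50_000_000, ("h2", "h9"): 200_000}   # ~400 Mbps vs. ~1.6 Mbps
print(detect_elephants(t0, t1, interval_s=1.0))           # only the first flow is flagged
```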
Demand Estimation
Measured flow rate is misleading
Need to find a flow’s “natural” bandwidth
requirement when not limited by the network
find max-min fair bandwidth allocation
Demand Estimation
Given traffic matrix of large flows, modify each flow’s size at its source and destination iteratively…
Sender equally distributes bandwidth among
outgoing flows that are not receiver-limited
Network-limited receivers decrease exceeded
capacity equally between incoming flows
Repeat until all flows converge
Guaranteed to converge in O(|F|) time
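A simplified Python sketch of this iterative estimator, assuming unit-capacity host NICs and a fixed iteration cap rather than the paper’s exact convergence test:

```python
def estimate_demands(flows, iters=20):
    """Simplified max-min fair demand estimation over a traffic matrix of large flows.
    flows: list of (src, dst) pairs; host NICs have unit capacity (1.0).
    Returns flow -> estimated 'natural' demand."""
    demand = {f: 0.0 for f in flows}
    limited = {f: False for f in flows}            # True once a flow is receiver-limited
    for _ in range(iters):
        # Sender step: each source splits its remaining capacity equally among
        # its flows that are not receiver-limited.
        for src in {s for s, _ in flows}:
            out = [f for f in flows if f[0] == src]
            fixed = sum(demand[f] for f in out if limited[f])
            free = [f for f in out if not limited[f]]
            for f in free:
                demand[f] = (1.0 - fixed) / len(free)
        # Receiver step: an oversubscribed destination caps its incoming flows
        # at an equal share and marks them receiver-limited.
        for dst in {d for _, d in flows}:
            inc = [f for f in flows if f[1] == dst]
            if sum(demand[f] for f in inc) > 1.0:
                share = 1.0 / len(inc)
                for f in inc:
                    if demand[f] > share:
                        demand[f], limited[f] = share, True
    return demand

# h1 sends to h3 and h4; h2 also sends to h3, so h3 is oversubscribed.
# Max-min fair result: every flow converges to 0.5.
print(estimate_demands([("h1", "h3"), ("h1", "h4"), ("h2", "h3")]))
```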
Flow Placement Heuristics
Two approaches
Global First Fit: Greedily choose path that has
sufficient unreserved b/w
Simulated Annealing: Iteratively find a globally
better mapping of paths to flows
Global First-Fit
[Figure: scheduler placing flows A, B, C across core switches 0-3]
New flow detected: linearly search all possible paths from S to D
Place flow on first path whose component links can fit that flow
Global First-Fit
[Figure: flows A, B, C each assigned to a core path]
Once a flow ends, entries + reservations time out
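A minimal sketch of the greedy first-fit idea, using hypothetical link names and unit link capacities; only aggregation/core links are tracked for brevity:

```python
def global_first_fit(demand, paths, reserved, capacity=1.0):
    """Place a newly detected large flow on the first candidate path that still fits it.
    paths: candidate paths for the flow's (src, dst) pair, each a list of links.
    reserved: dict link -> bandwidth already reserved for previously placed flows."""
    for path in paths:
        if all(reserved.get(link, 0.0) + demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None    # no single path fits: leave the flow to default ECMP hashing

# Two candidate aggregation/core paths between the same edge pair (hypothetical names).
paths = [[("a1", "c0"), ("c0", "a2")],
         [("a1", "c1"), ("c1", "a2")]]
reserved = {}
print(global_first_fit(0.6, paths, reserved))   # first flow fits on the c0 path
print(global_first_fit(0.6, paths, reserved))   # c0 path is now full, next flow takes c1
```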
Simulated Annealing
Simulated Annealing:
a numerical optimization technique for searching for a
solution in a space otherwise too large for ordinary
search methods to yield results
Probabilistic search for good flow-to-core mappings
Simulated Annealing
State: A set of mappings from destination hosts to
core switches.
Neighbor State: Swap core switches between 2 hosts
Within same pod,
Within same edge,
etc
Simulated Annealing
Function/Energy: Total exceeded b/w capacity
Using the estimated demand of flows
Minimize the exceeded capacity
Temperature: Iterations left
Fixed number of iterations (1000s)
Achieves good core-to-flow mapping
Sometimes very close to global optimal
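A compact sketch of the annealing loop using the state, neighbor move, energy, and temperature described above (unit core capacities; the single-core-per-destination-host simplification mentioned under Optimizations below is baked in):

```python
import math
import random

def exceeded_capacity(assignment, demands, capacity=1.0):
    """Energy: total demand in excess of capacity at each core, where every
    destination host's incoming traffic is pinned to its assigned core switch."""
    load = {}
    for (_, dst), d in demands.items():
        core = assignment[dst]
        load[core] = load.get(core, 0.0) + d
    return sum(max(0.0, l - capacity) for l in load.values())

def anneal(demands, hosts, cores, iters=1000, seed=0):
    rng = random.Random(seed)
    state = {h: rng.choice(cores) for h in hosts}      # destination host -> core switch
    energy = exceeded_capacity(state, demands)
    best, best_e = dict(state), energy
    for t in range(iters, 0, -1):                      # temperature = iterations left
        h = rng.choice(hosts)                          # neighbor: re-map one host's core
        old = state[h]
        state[h] = rng.choice(cores)
        new_e = exceeded_capacity(state, demands)
        delta = new_e - energy
        temp = t / iters                               # cools linearly toward zero
        if delta <= 0 or rng.random() < math.exp(-delta / max(temp, 1e-9)):
            energy = new_e                             # accept (occasionally uphill)
            if energy < best_e:
                best, best_e = dict(state), energy
        else:
            state[h] = old                             # reject: undo the move
    return best, best_e

# Four elephant flows; the best mapping sends h3's and h4's traffic through
# different cores, leaving 0.2 of exceeded capacity on each (total 0.4).
demands = {("h1", "h3"): 0.6, ("h2", "h3"): 0.6, ("h1", "h4"): 0.6, ("h2", "h4"): 0.6}
print(anneal(demands, hosts=["h3", "h4"], cores=["c0", "c1"]))
```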
Simulated Annealing
[Figure: example run - scheduler proposes core assignments for flows A, B, C across cores 0-3]
Example run: 3 flows, 3 iterations
Simulated Annealing
[Figure: final core assignments for flows A, B, C]
Final state is published to the switches and used as the initial state for next round
Simulated Annealing
Optimizations
Assign a single core switch to each destination host
Incremental calculation of exceeded capacity
Using the previous iteration’s best result as the initial state
Evaluation
Data Shuffle
16 hosts: 120 GB all-to-all in-memory shuffle

                         | ECMP  | GFF   | SA    | Control
Total Shuffle Time (s)   | 438.4 | 335.5 | 336.0 | 306.4
Avg. Completion Time (s) | 358.1 | 258.7 | 262.0 | 226.6
Avg. Bisection BW (Gbps) | 2.81  | 3.89  | 3.84  | 4.44
Avg. host goodput (MB/s) | 20.9  | 29.0  | 28.6  | 33.1

Hedera achieves 39% better bisection BW over ECMP, 88% of ideal non-blocking switch
Reactiveness
Demand Estimation:
27K hosts, 250K flows, converges < 200ms
Simulated Annealing:
Asymptotically dependent on # of flows + #
iterations
• 50K flows and 10K iter: 11ms
Most of the final bisection BW is achieved within the first few hundred iterations
Scheduler control loop:
Polling + Estimation + SA = 145ms for 27K hosts
Limitations
Dynamic workloads with large-flow turnover faster than the control loop:
scheduler will be continually chasing the traffic matrix
Need to include a penalty term for unnecessary SA flow re-assignments
[Figure: regions where ECMP vs. Hedera wins, as a function of traffic-matrix stability and flow size]
Conclusion
Rethinking networking
Open interfaces to the data plane
Separation of control and data
Leveraging techniques from distributed systems
Significant momentum
In both research and industry