Network Algorithmics
황인욱
Atto Research
2018. 6. 15.
Outline
• Building faster routers
– Scheduling packets
• BW guarantee & DRR
• Random early detection
• Token bucket
– Traffic measurement
– Lookups
• Prefix Match
• Exact-Match: inventing bridge
• Introduction to “Network Algorithmics”
2
Router Bottlenecks
• Exact-match lookup
• Prefix-match lookup
• Switching
• QoS
3
Scheduling Packets
• Packet handling at the output queue
– Problems with FIFO with tail-drop
• To do
– BW guarantee, rate-limiting, TCP congestion control
– A and B share the network. How do we guarantee A 80% of the bandwidth, with priority?
– Keep video traffic from exceeding 1 Mbps.
4
BW guarantee
• FIFO queue
• Multi queue with Round robin
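[Figure: a single FIFO queue holding packets A B A B B B (size 200 each), compared with separate queues for A and B served round robin; packet sizes 200, 600, 400]
5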
BW guarantee
• Multi queue with Priority
• Problem
– Priority management: finding the highest priority (heap, log n)
• Rather than going that far, get the shares right over the long term
[Figure: packets from queues A and B (sizes 200, 600, 400) ordered by timestamp]
6
Deficit RR (DRR)
• O(1)
– Active list: the list of queues that have packets to send
– Quantum at least the maximum packet size: every visit to a queue is then guaranteed to send a packet
[Figure: DRR example with queues A, B, and C holding packets of sizes 200, 600, and 400; a round-robin pointer selects the queue to serve, and each queue has a deficit counter that starts at 0]
7
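A minimal sketch of the DRR loop, not taken from the slides. The function name `drr_schedule`, the queue contents, and the quantum of 500 are assumptions chosen for illustration. The quantum is deliberately smaller than the largest packet (600) so that the deficit carrying over to the next round is visible.

```python
from collections import deque

def drr_schedule(queues, quantum, rounds):
    """Run `rounds` DRR passes; return the (queue, size) send order.

    `queues` maps a queue name to a deque of packet sizes and is consumed in place.
    """
    deficit = {name: 0 for name in queues}
    # Active list: only queues that currently have packets to send.
    active = deque(name for name, q in queues.items() if q)
    sent = []
    for _ in range(rounds):
        for _ in range(len(active)):
            name = active.popleft()
            deficit[name] += quantum
            q = queues[name]
            # Send head-of-line packets while the deficit covers them.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if q:
                active.append(name)   # still backlogged: keep the leftover deficit
            else:
                deficit[name] = 0     # idle queues do not save up credit
    return sent

queues = {"A": deque([200, 200]), "B": deque([600]), "C": deque([400, 200])}
order = drr_schedule(queues, quantum=500, rounds=3)
print(order)
```

In round 1, B sends nothing because 600 exceeds its deficit of 500; in round 2 its deficit is 1000 and the packet goes out.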
DRR extensions
• Class based queuing (CBQ)
– Hierarchical DRR
• One scheduler per node
• Modified DRR (Cisco/Juniper)
– VoIP gets top priority
[Figure: CBQ hierarchy]
Tenant A (70%): Web 40%, Other 60%
Tenant B (30%): Web 50%, Other 50%
8
TCP congestion control
• Does IPv4 lack a DECbit for congestion avoidance?
– Proposal on the table for an ECN bit for IPv6
9
Random Early Detection (RED)
• Minimize TCP restarts.
– Start dropping packets once the output queue grows beyond a certain size.
– The drop acts as a kind of signal.
• Implemented in most routers.
– de facto standard
• Weighted RED (WRED)
– Cisco
– Different thresholds depending on the IP TOS bits
• Adaptive RED (ARED), Robust RED (RRED)
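A rough sketch of the classic RED drop decision, under stated assumptions: the parameter names (`min_th`, `max_th`, `max_p`, `weight`) and the values used are illustrative, and the inter-drop count adjustment of the full algorithm is left out. The drop probability rises linearly between two thresholds on the averaged queue length.

```python
import random

def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """0 below min_th, linear ramp up to max_p, and 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(avg_queue, min_th=10, max_th=30, max_p=0.1, rng=random):
    """Randomized early drop: the signal that tells TCP senders to slow down."""
    return rng.random() < red_drop_probability(avg_queue, min_th, max_th, max_p)
```

WRED can be pictured as the same function called with a different `(min_th, max_th, max_p)` triple per traffic class.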
10
Token bucket
• When is it needed?
– Limit a flow's bandwidth to 100 Kbps
– But still allow bursts of about 4 KB
OpenFlow 1.3
11
Token bucket
12
• The bucket never holds more than B tokens.
• In practice, implemented with a counter and a timer
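A minimal sketch of the counter-and-timer form, assuming a policer that drops non-conforming packets. The class name `TokenBucket` and its fields are hypothetical; the 100 Kbps rate and 4 KB burst come from the earlier slide. The counter is refilled lazily from the elapsed time whenever a packet arrives.

```python
class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes      # B: the bucket never holds more than this
        self.tokens = burst_bytes     # counter, in bytes
        self.last = now               # time of the last update

    def allow(self, size_bytes, now):
        """Return True if the packet conforms; tokens are spent only on success."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False

# 100 Kbps = 12,500 bytes/s, with a 4 KB burst allowance.
bucket = TokenBucket(rate_bytes_per_s=100_000 / 8, burst_bytes=4096)
print(bucket.allow(4096, now=0.0))   # the initial burst fits
print(bucket.allow(100, now=0.0))    # the bucket is now empty
```

A shaper would queue the non-conforming packet until enough tokens accumulate instead of returning False.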
Shaping vs. policing
https://www.cisco.com/c/en/us/support/docs/quality-of-service-qos/qos-policing/19645-policevsshape.html
13
Traffic Measurement
• Traffic measurement matters
– Accounting/billing in Internet backbones
– Traffic engineering
– Capacity planning
– Network diagnostics and forensics: intrusion detection, denial-of-service attacks
– Products: NetFlow (Cisco), cflowd (Juniper), NetStream (Huawei)
• And an “emerging field”
14
Counting
[Figure: a set of counters, all starting at 0, incrementing as packets arrive; one counter reaches 24 while the others stay at 3 or below]
• Counting: the representative measurement task
– Per-interface counting: easy
– Filter-based, per-prefix counting: hard
15
Hybrid DRAM-SRAM architecture
• DRAM (large) vs. SRAM (fast)
– DRAMs have access times of 50 - 60 ns
– SRAMs have access times of 4.5 - 7 ns, but capacities of only around 50 - 60 Mb (Micron Tech.)
16
“Expensive infeasible”
Approximate counting
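• Sacrifice accuracy to keep things simpler and use less memory.
• Randomized counting
– Increment the counter probabilistically
• It seems fine to measure only the large flows (elephants) and ignore the small flows (mice).
– But how do we know which flows are elephants?
– Counting every flow just to find the elephants defeats the purpose -> hashing
– Use multiple hashes to reduce false positives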
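One way to picture the multi-hash idea is a multistage filter: each stage hashes the flow ID to a small counter, and a flow is promoted to exact per-flow memory only when all of its counters pass a threshold. The sketch below is an assumption-laden illustration; the class name, the SHA-256-based hash, and all sizes are hypothetical.

```python
import hashlib

class MultistageFilter:
    def __init__(self, stages=3, buckets=64, threshold=1000):
        self.counters = [[0] * buckets for _ in range(stages)]
        self.buckets = buckets
        self.threshold = threshold
        self.elephants = {}   # exact byte counts, kept only for flagged flows

    def _index(self, stage, flow):
        # One independent hash per stage, derived by salting with the stage number.
        digest = hashlib.sha256(f"{stage}:{flow}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.buckets

    def add(self, flow, size):
        if flow in self.elephants:
            self.elephants[flow] += size
            return
        slots = [(s, self._index(s, flow)) for s in range(len(self.counters))]
        for s, i in slots:
            self.counters[s][i] += size
        # A mouse must collide with heavy traffic in every stage to be flagged.
        if all(self.counters[s][i] >= self.threshold for s, i in slots):
            self.elephants[flow] = size

f = MultistageFilter()
for i in range(50):
    f.add(f"mouse-{i}", 10)    # 500 bytes in total, below the threshold
for _ in range(20):
    f.add("big", 100)          # 2000 bytes from one flow
print(f.elephants)
```

The byte count reported for a flagged flow is a lower bound, since the bytes seen before promotion live only in the shared counters.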
17
Overall Architecture
[Figure: flows feeding two kinds of counters]
• Elephant traps: few, deep counters
• Mouse traps: many, shallow counters
• Status bit: indicates overflow
18
Sampling
• Basic NetFlow
– Burden on DRAM, collection overhead
– Typically 1/16 or 1/1000
• Sampled charging
• Trajectory sampling
– All routers use the same hash
19
Trajectory sampling
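A small sketch of why sharing the hash matters: if every router applies the same function to the packet fields that do not change hop by hop, a packet is either sampled everywhere or nowhere, so its path can be reconstructed. The names `is_sampled` and `Router`, the SHA-256 hash, and the packet encoding are assumptions for illustration.

```python
import hashlib

def is_sampled(invariant: bytes, one_in: int = 16) -> bool:
    """Same decision at every router: hash only the fields that stay constant
    along the path (so no TTL or header checksum)."""
    digest = hashlib.sha256(invariant).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0

class Router:
    def __init__(self, name):
        self.name = name
        self.samples = []

    def forward(self, invariant: bytes):
        if is_sampled(invariant):
            self.samples.append(invariant)

packets = [f"10.0.0.1>10.0.1.5 id={i}".encode() for i in range(1000)]
r1, r2 = Router("r1"), Router("r2")
for p in packets:
    r1.forward(p)
    r2.forward(p)
print(r1.samples == r2.samples)
```

With independent random sampling at each router, the two sample sets would overlap only by chance.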
Longest Prefix Matching
20
10.0.0.0/8 00001010 XXXXXXXX XXXXXXXX XXXXXXXX R1
128.0.0.0/9 10000000 0XXXXXXX XXXXXXXX XXXXXXXX R2
10.0.1.0/24 00001010 00000000 00000001 XXXXXXXX R3
0.0.0.0/0 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX R4
Which is the longest prefix that matches 10.0.1.5?
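The question can be checked with a naive linear scan over the four prefixes in the table. This is only a reference sketch using Python's standard `ipaddress` module (the function name `longest_prefix_match` is an assumption), not a fast lookup scheme.

```python
import ipaddress

TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "R1"),
    (ipaddress.ip_network("128.0.0.0/9"), "R2"),
    (ipaddress.ip_network("10.0.1.0/24"), "R3"),
    (ipaddress.ip_network("0.0.0.0/0"), "R4"),
]

def longest_prefix_match(address, table=TABLE):
    """Return the next hop of the longest prefix containing `address`."""
    addr = ipaddress.ip_address(address)
    matches = [(net.prefixlen, hop) for net, hop in table if addr in net]
    return max(matches)[1] if matches else None

print(longest_prefix_match("10.0.1.5"))
```

10.0.1.5 matches /0, /8, and /24, so the /24 entry (R3) wins.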
Longest Prefix Matching
• Non-algorithmic
– Caching
• Map: from 32-bit address to next hop
• “Cache hit ratios in backbone: poor”
– TCAM (Ternary Content-Addressable Memory) issues
• Density Scaling
• Power Scaling
• Time Scaling
• Extra Chips
• Algorithmic
– Trie: a tree data structure specialized for strings
– Binary search
• Semiconductor manufacturers
– “Betting on both sides”: algorithmic and CAM-based
21
TCAM
• Ternary Content Addressable Memory
– Matches on 0, 1, and X (ternary)
– Put data in and the address comes out.
• Great for partial match
– Longest prefix
– Access lists
[Figure: TCAM contents, address: entry]
0: 1 0 1 1 1
1: 1 0 1 1 X X
2: 1 0 1 X X X
3: 1 1 0 1 X X
4: 0 0 1 0 X X
5: X 0 0 X 0 0
6: X 0 0 X 1 0
7: X X X X X X
22
TCAM relative to SRAM: Power 6x, Area 7x, Latency 4x
[Figure: the TCAM searched with key 1 0 1 1 0 0; the first matching entry, 1 0 1 1 X X, is at address 1]
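A software emulation of the TCAM lookup in the figure, as a sketch. Real hardware compares all entries in parallel and a priority encoder returns the lowest matching address; the loop below only mimics that result. The first entry shows five symbols in the slide, so the trailing X on it is an assumption, as is the function name `tcam_search`.

```python
ENTRIES = [
    "10111X",  # five symbols in the slide; the trailing X is an assumption
    "1011XX",
    "101XXX",
    "1101XX",
    "0010XX",
    "X00X00",
    "X00X10",
    "XXXXXX",
]

def tcam_search(entries, key):
    """Return the lowest address whose pattern matches `key` (X matches 0 or 1)."""
    for address, pattern in enumerate(entries):
        if all(p in ("X", k) for p, k in zip(pattern, key)):
            return address
    return None

print(tcam_search(ENTRIES, "101100"))
```

Storing longer prefixes at lower addresses is what makes the first match the longest match.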
One-bit TRIE
23
DRAM access: 60ns
32 * 60ns = 1.92 us
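A minimal one-bit trie sketch that counts node visits, to make the access arithmetic concrete: each level costs one memory access, so an IPv4 lookup can take up to 32 of them. The names `Node`, `insert`, and `lookup` are assumptions, and the prefixes reuse the table from the longest-prefix-matching slide.

```python
import ipaddress

class Node:
    def __init__(self):
        self.children = [None, None]   # one child per bit value
        self.next_hop = None           # set if a prefix ends at this node

def insert(root, prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (bits >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, address):
    """Return (next hop of the longest match, number of nodes visited below the root)."""
    bits = int(ipaddress.ip_address(address))
    node, best, steps = root, root.next_hop, 0
    for i in range(32):
        node = node.children[(bits >> (31 - i)) & 1]
        if node is None:
            break
        steps += 1
        if node.next_hop is not None:
            best = node.next_hop       # remember the longest match so far
    return best, steps

root = Node()
for prefix, hop in [("10.0.0.0/8", "R1"), ("128.0.0.0/9", "R2"),
                    ("10.0.1.0/24", "R3"), ("0.0.0.0/0", "R4")]:
    insert(root, prefix, hop)
print(lookup(root, "10.0.1.5"))
```

A multi-bit trie cuts the number of levels by consuming several bits per node.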
Multi-bit TRIE
Various optimizations: stride, compression, etc.
“This is enough for 10 Gbps”
24
Binary search
• Slower than a multi-bit trie
• Two reasons it is still needed
– Patents
• “Some vendors use it for this reason”
– IPv6
• 8-bit stride multibit trie: 16 accesses
• Binary search on prefix lengths: 7 accesses
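Both counts follow from the 128-bit IPv6 address, assuming one memory access per trie level and one hash-table probe per step of a worst-case binary search over the 128 possible prefix lengths:

```latex
\text{8-bit stride trie: } \frac{128}{8} = 16 \text{ accesses}, \qquad
\text{binary search on prefix lengths: } \log_2 128 = 7 \text{ accesses}
```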
25
Binary search
• Binary search on ranges
• Binary search on prefix lengths
26
[Figure: one table per prefix length (Length-1, Length-2, Length-3, …), with example entries 101, 100, 111, 110]
Exact match lookup: history of the bridge
• Late 1980s
– Limits of Ethernet
– Ethernet needed to be extended.
• Filter repeater with learning (Mark Kempf, DEC)
– “A great idea”
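A sketch of what "filter repeater with learning" amounts to: learn the port of each source address, forward to the known port, filter frames whose destination is on the arrival port, and flood when the destination is unknown. The class name `LearningBridge` and the short MAC strings are hypothetical; entry aging is left out.

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}   # exact-match table: MAC address -> port

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out on."""
        self.table[src] = in_port                  # learn where src lives
        out = self.table.get(dst)                  # exact-match lookup on dst
        if out is None:
            return sorted(self.ports - {in_port})  # unknown: flood like a repeater
        if out == in_port:
            return []                              # same segment: filter
        return [out]

bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive(1, "aa", "bb"))   # bb unknown, so flood
print(bridge.receive(2, "bb", "aa"))   # aa was learned on port 1
```

Each frame needs one lookup for the destination and one for the source.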
27
What was done to reach wire speed
• 10 Mbps
– 2 lookups per port in 51.2 usec
• Architecture
– 4-port cheap DRAM with cycle time of 100 nsec for packet buffers and lookup memory. Bus parallelism, memory bandwidth, page mode.
• Data Copying
– Ethernet chips used DMA; packets copied from one port to another by flipping pointers.
• Control Overhead
– Interrupt overhead minimized by processor polling, staying in a loop after a packet interrupt.
• Lookups
– Used caveats. Wrote software to verify lookup bottleneck
28
Scaling lookups
• 1990s
– DEC's decision: an FDDI bridge to interconnect 100 Mbps rings
– Minimum packet size: 64 bytes -> 40 bytes
– Lookup DB: 8K -> 64K
• Two approaches
– Perfect Hashing (pre-computation)
– HW parallelism
29
Network Algorithmics
30
Network algorithmics is the use of an interdisciplinary systems approach, seasoned
with algorithmic thinking, to design fast implementations of network processing tasks
at servers, routers, and other networking devices
Topics
• Endnode bottlenecks
– Data copy: DMA, programmed IO
– Context switching
• service model (process/thread/event-driven), select()
– Timer: timing wheel
– Demultiplexing
– Protocol processing
• UDP checksum, buffer management, reassembly
• Router bottlenecks
– exact match: bridge
– prefix match: longest-prefix match in routers
– switching
– packet classification
• service differentiation (router)
– QoS: rate-limiting, RED
• Other
– Network Measurement: counter, trajectory sampling
– Network Security: exact/approximate string matching
31
15 implementation principles
32
Polya, “How to solve it”
33
ROUTE COMPUTATION USING DIJKSTRA’S ALGORITHM (4.3)
[Figure: step-by-step run of Dijkstra's algorithm on an example graph with integer link costs; tentative distances start at ∞ and are replaced by finite values as the computation proceeds]
34
ROUTE COMPUTATION USING DIJKSTRA’S ALGORITHM (4.3)
35
[Figure: heap with 2 at the root, 5 and 10 below it, and 12, 15, 16 at the bottom level]
Heap: n log n -> n + diam*maxlinkcost
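One way to reach a bound of this shape, assuming link costs are small positive integers, is to replace the heap with an array of buckets indexed by distance and sweep it once. The sketch below is an illustration under that assumption; the function name and the example graph are hypothetical, and it sizes the array with (n - 1) * max_link_cost as a safe stand-in for diam * maxlinkcost.

```python
def dijkstra_buckets(graph, source, max_link_cost):
    """Shortest distances for integer link costs in 1..max_link_cost.

    `graph` maps a node to a list of (neighbor, cost) pairs.
    """
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    max_dist = max_link_cost * (len(graph) - 1)   # no shortest path is longer
    buckets = [[] for _ in range(max_dist + 1)]   # buckets[d]: nodes at distance d
    buckets[0].append(source)
    for d in range(max_dist + 1):                 # one sweep replaces the heap
        for u in buckets[d]:
            if dist[u] != d:
                continue                          # stale entry, already improved
            for v, w in graph[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    buckets[d + w].append(v)
    return dist

graph = {
    "a": [("b", 2), ("c", 5)],
    "b": [("a", 2), ("c", 1), ("d", 4)],
    "c": [("a", 5), ("b", 1), ("d", 1)],
    "d": [("b", 4), ("c", 1)],
}
print(dijkstra_buckets(graph, "a", max_link_cost=5))
```

Finding the next closest node becomes a matter of advancing an index rather than paying log n per heap operation.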
Updating TCAM (3.1)
[Figure: TCAM with entries at addresses 0-7, searched with key 1 0 1 1 0 0; the match is at address 1]
36
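My own summary of the principles
• Make good use of data structures and hardware
– TCAM, trie, hash, etc.
• Optimize the common case
– Cache
• Relax the constraints so that an easier algorithm applies.
– Example: integers instead of real numbers
• If that still does not work, consider sacrificing accuracy or using probabilistic methods
– Values that need not be very exact (ranking)
– Ethernet, RED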
37
Summary
• Building faster routers
– Scheduling packets
• BW guarantee & DRR
• Random early detection
• Token bucket
– Traffic measurement
– Lookups
• Prefix Match
• Exact-Match: inventing bridge
• Introduction to “Network Algorithmics”
38
